Commit Graph

2021 Commits

Author SHA1 Message Date
Louis Dureuil
da0503ef80
Fix document count 2023-10-31 16:36:27 +01:00
ManyTheFish
94206b0055 Update tests 2023-10-31 13:48:47 +01:00
Louis Dureuil
b40253bf18
update snapshots 2023-10-31 10:30:48 +01:00
Louis Dureuil
d8bf3f3fc2
Remove unused snapshots 2023-10-31 10:12:49 +01:00
Louis Dureuil
9d59e8011a
fix some tests 2023-10-31 10:08:36 +01:00
Louis Dureuil
dad78cbf8d
Bulk facet remove deletes keys from DB when value empty 2023-10-31 09:53:55 +01:00
Louis Dureuil
4e91707a06
Rename test 2023-10-31 09:41:17 +01:00
Louis Dureuil
de10f20732
Fix field distribution again 2023-10-30 17:47:22 +01:00
Louis Dureuil
be395c7944
Change order of arguments to tokenizer_builder 2023-10-30 16:26:29 +01:00
Louis Dureuil
9fedd8101a
Fix tests 2023-10-30 15:11:07 +01:00
Louis Dureuil
54d07a8da3
Update field distribution taking into account both deletions and additions 2023-10-30 14:47:51 +01:00
Louis Dureuil
58690dfb19
Fix tests compilation after changes to ExternalDocumentsIds API 2023-10-30 13:34:07 +01:00
Louis Dureuil
abf424ebfc
Remove unused FromIterator 2023-10-30 11:41:56 +01:00
Clément Renault
dfab6293c9
Use an LMDB database to store the external documents ids 2023-10-30 11:41:23 +01:00
Louis Dureuil
fdf3f7f627
Fix facet distribution test 2023-10-30 11:41:23 +01:00
Louis Dureuil
6260cff65f
Actually delete documents from DB when the merge function says so 2023-10-30 11:41:22 +01:00
Louis Dureuil
8e0d9c9a5e
Recover delete_documents tests that were too eagerly deleted 2023-10-30 11:41:22 +01:00
Louis Dureuil
ae4ec8ea55
Add delete_document_using_wtxn to TempIndex 2023-10-30 11:41:22 +01:00
Louis Dureuil
9a2dccc3bc
Add iterator to find external ids of a bitmap of internal ids 2023-10-30 11:41:22 +01:00
Louis Dureuil
a35988550c
Fix some snapshots 2023-10-30 11:41:22 +01:00
Louis Dureuil
e78281785c
Actually execute the transform even if there are only documents to delete 2023-10-30 11:41:22 +01:00
Louis Dureuil
3c15881818
Add simple delete test 2023-10-30 11:41:22 +01:00
Louis Dureuil
73c06d31d9
snapshot always display stuff in consistent order 2023-10-30 11:41:22 +01:00
Louis Dureuil
290e773d23
remove more warnings and fix some tests 2023-10-30 11:41:22 +01:00
Louis Dureuil
fa6c7f65ca
Add TmpIndex::delete_documents 2023-10-30 11:41:22 +01:00
Louis Dureuil
113527f466
Remove soft-deleted related methods from Index 2023-10-30 11:41:22 +01:00
Louis Dureuil
c534a1b687
Stop using delete documents pipeline in batch runner 2023-10-30 11:41:22 +01:00
Louis Dureuil
2263dff02b
Stop using removed delete pipelines almost everywhere 2023-10-30 11:41:22 +01:00
Louis Dureuil
d651b3ef01
Remove delete documents files 2023-10-30 11:41:20 +01:00
ManyTheFish
762b0b47e6
Use deladd merging function in chunks mergers 2023-10-30 11:40:20 +01:00
Louis Dureuil
01d5eedf2f
Remove some warnings 2023-10-30 11:40:20 +01:00
Louis Dureuil
073f89db79
Fix facet tests 2023-10-30 11:40:20 +01:00
Louis Dureuil
8370fbc92b
Fix snaps 2023-10-30 11:40:20 +01:00
Louis Dureuil
85f42fbc03
Handle external to internal id mapping from TypedChunk::Documents 2023-10-30 11:40:20 +01:00
Louis Dureuil
c6b3c18c85
WIP: Comment out document deletion in other pipelines than update
TODO: fix calls to DELETE route
2023-10-30 11:40:20 +01:00
Louis Dureuil
bafeb892a7
Modify Index after changes to ExternalDocumentsIds 2023-10-30 11:40:20 +01:00
Louis Dureuil
8fb221dae3
Refactor ExternalDocumentsIds
- Remove soft deleted
- Add apply method that takes a list of operations to encapsulate modifications to the external -> internal mapping
2023-10-30 11:40:20 +01:00
Louis Dureuil
946c762d28
WIP: reset documents in TypedChunk::Documents 2023-10-30 11:40:20 +01:00
Louis Dureuil
cda6ca1ee6
Remove TypedChunk::NewDocumentIds 2023-10-30 11:40:18 +01:00
Louis Dureuil
696fcf4d18
Fix document insertion into LMDB 2023-10-30 11:39:31 +01:00
ManyTheFish
476e4d3dbe
Use value buffer instead of the initial value when writing the final result in the sorter 2023-10-30 11:39:31 +01:00
Clément Renault
576fa9c6da
Remove useless comment 2023-10-30 11:39:31 +01:00
Kerollmops
77dcbff6b2
Remove and Insert the DelAdd geo points 2023-10-30 11:39:31 +01:00
Kerollmops
544440c363
Ignore geo fields when the Del and Add content is the same 2023-10-30 11:39:31 +01:00
Clément Renault
a3dae4db9b
Extract the geo fields DelAdd and generate a new DelAdd obkv with it 2023-10-30 11:39:31 +01:00
ManyTheFish
ba90a5ec0e
update extract fid word count docids 2023-10-30 11:39:31 +01:00
Louis Dureuil
b26dc9aabe
Explanatory code comment 2023-10-30 11:39:31 +01:00
Louis Dureuil
66abac9364
Use specialized KvReaderDelAdd type
Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-10-30 11:39:31 +01:00
Louis Dureuil
59f88c14b3
Simplify facet update after removing Index::faceted_documents_ids 2023-10-30 11:39:29 +01:00
Louis Dureuil
14832cb324
Remove Index::faceted_documents_ids 2023-10-30 11:37:32 +01:00
Louis Dureuil
04ec293024
Facet Incremental update 2023-10-30 11:37:30 +01:00
Louis Dureuil
f67ff3a738
Facets Bulk update 2023-10-30 11:36:40 +01:00
Clément Renault
560e8f5613
Introduce the CboRoaringBitmapCodec merge_deladd_into and use it 2023-10-30 11:34:55 +01:00
Clément Renault
2d3f15f82c
Introduce a function to only serialize the Add side of a DelAdd obkv 2023-10-30 11:34:55 +01:00
Clément Renault
40186bf403
Rename FieldIdWordCountDocids correctly 2023-10-30 11:34:50 +01:00
ManyTheFish
87e3d27878
update extract word pair proximity to support deladd obkvs 2023-10-30 11:34:02 +01:00
ManyTheFish
6bcf8b4f8c
update extract word position docids 2023-10-30 11:34:02 +01:00
ManyTheFish
46aa75abdb
update extract word docids 2023-10-30 11:34:02 +01:00
ManyTheFish
2597bbd107
Make script language docids map taking a tuple of roaring bitmaps expressing the deletions and the additions 2023-10-30 11:34:00 +01:00
Clément Renault
e2bc054604
Update extract_facet_string_docids to support deladd obkvs 2023-10-30 11:32:36 +01:00
Clément Renault
fcd3a1434d
Update extract_facet_number_docids to support deladd obkvs 2023-10-30 11:31:04 +01:00
Clément Renault
a82dee21e0
Rename docid_fid into fid_docid 2023-10-30 11:31:02 +01:00
Clément Renault
bc45c1206d
Implement all the facet extraction paths and simplify them 2023-10-30 11:29:08 +01:00
Clément Renault
6ae4100f07
Generate the DelAdd for is_null, is_empty, and exists 2023-10-30 11:29:08 +01:00
Clément Renault
0c47defeee
Work on fid docid facet values rewrite 2023-10-30 11:29:06 +01:00
ManyTheFish
313b16bec2
Support diff indexing on extract_docid_word_positions 2023-10-30 11:24:19 +01:00
ManyTheFish
1dd97578a8
Make the transform struct return diff-based documents obkvs 2023-10-30 11:22:07 +01:00
ManyTheFish
f5ef69293b
deactivate prefix dbs 2023-10-30 11:22:07 +01:00
ManyTheFish
1c5705c164
clean PR warnings 2023-10-30 11:22:05 +01:00
ManyTheFish
66c2c82a18
Split wpp in several sorters 2023-10-30 11:15:02 +01:00
ManyTheFish
28a8d0ccda
Fix word pair proximity 2023-10-30 11:15:02 +01:00
ManyTheFish
96be85396d
Use a vecDeque in wpp database 2023-10-30 11:15:02 +01:00
ManyTheFish
df9e5c8651
Generalize usage of CboRoaringBitmap codec to ease the use 2023-10-30 11:15:02 +01:00
ManyTheFish
b541d48847
Add buffer to the obkv writer 2023-10-30 11:15:02 +01:00
ManyTheFish
8ccf32d1a0
Compute word_fid_docids before word_docids and exact_word_docids 2023-10-30 11:15:02 +01:00
ManyTheFish
db1ca21231
add puffin in sorter into reader function 2023-10-30 11:15:00 +01:00
ManyTheFish
11ea5acff9
Fix 2023-10-30 11:13:10 +01:00
ManyTheFish
8d77736a67
Fix fid_word_docids 2023-10-30 11:13:10 +01:00
ManyTheFish
748b333161
Add useful debug assert before key insertion in database 2023-10-30 11:13:10 +01:00
ManyTheFish
17b647dfe5
Wip 2023-10-30 11:13:08 +01:00
meili-bors[bot]
2614e7d9ca
Merge #4174
4174: Fix warnings r=dureuill a=irevoire

Fix all the warnings found in the CI: https://github.com/meilisearch/meilisearch/actions/runs/6622576021/job/17988323623

Co-authored-by: Tamo <tamo@meilisearch.com>
2023-10-30 10:12:54 +00:00
Tamo
e7244aa485 fix warnings 2023-10-30 11:00:46 +01:00
ManyTheFish
4c6fddb1cb update charabia 2023-10-26 17:01:10 +02:00
Louis Dureuil
2bae9550c8
Add explanatory comment 2023-10-23 12:06:28 +02:00
Vivek Kumar
32c78ac8b1
add/update tests when search with distinct attribute & pagination with no ranking 2023-10-23 12:06:27 +02:00
Vivek Kumar
5fe7c4545a
compute all candidates correctly when skipping 2023-10-23 12:02:45 +02:00
meili-bors[bot]
5e0485d8dd
Merge #4131
4131: Reduce proximity range from 7 to 3 r=Kerollmops a=ManyTheFish

## Summary
This PR aims to reduce the impact of the proximity databases on the indexing time and on the database size by reducing the maximum distance between two words to be indexed in the proximity database.

## Stats

### Impact on database size and indexing time
![Impact on datasets](https://github.com/meilisearch/meilisearch/assets/6482087/28ed3d96-bdde-41c1-bdac-e90c1b1dbb23)

### Impact on search relevancy

<details>

| dataset_name | host_name        | Relevancy rate (Precision) | completion_rate 25.00% | completion_rate 50.00% | completion_rate 75.00% | completion_rate 100.00% |
|--------------|------------------|------------------------------------|-----------------|-----------------|-----------------|-----------------|
| FBIS         | 1_4_0            | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| FBIS         | 1_4_0            | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| FBIS         | 1_4_0            | percentile-50 |           0.00% |           0.00% |           5.00% |           5.56% |
| FBIS         | 1_4_0            | percentile-75 |           0.00% |          12.50% |          35.00% |          45.00% |
| FBIS         | 1_4_0            | percentile-90 |          20.00% |          40.00% |                 |         100.00% |
| FBIS         | 1_4_0            | average       |           5.78% |          11.16% |          21.90% |          26.29% |
| FBIS         | reduce_proximity | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| FBIS         | reduce_proximity | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| FBIS         | reduce_proximity | percentile-50 |           0.00% |           0.00% |           5.00% |           5.56% |
| FBIS         | reduce_proximity | percentile-75 |           0.00% |          15.00% |          35.00% |          40.00% |
| FBIS         | reduce_proximity | percentile-90 |          20.00% |          40.00% |          85.00% |         100.00% |
| FBIS         | reduce_proximity | average       |           5.55% |          11.34% |          21.75% |          26.14% |
| FR94         | 1_4_0            | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| FR94         | 1_4_0            | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| FR94         | 1_4_0            | percentile-50 |           0.00% |           0.00% |           0.00% |           0.00% |
| FR94         | 1_4_0            | percentile-75 |           0.00% |           5.00% |          15.00% |          42.11% |
| FR94         | 1_4_0            | percentile-90 |          15.00% |          54.55% |         100.00% |         100.00% |
| FR94         | 1_4_0            | average       |           5.95% |          12.07% |          18.70% |          25.57% |
| FR94         | reduce_proximity | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| FR94         | reduce_proximity | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| FR94         | reduce_proximity | percentile-50 |           0.00% |           0.00% |           0.00% |           0.00% |
| FR94         | reduce_proximity | percentile-75 |           0.00% |           5.00% |          15.00% |          42.11% |
| FR94         | reduce_proximity | percentile-90 |          15.00% |          54.55% |         100.00% |         100.00% |
| FR94         | reduce_proximity | average       |           5.79% |          12.00% |          18.70% |          25.53% |
| FT           | 1_4_0            | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| FT           | 1_4_0            | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| FT           | 1_4_0            | percentile-50 |           0.00% |           0.00% |           5.00% |          10.00% |
| FT           | 1_4_0            | percentile-75 |           0.00% |          15.00% |          30.00% |          40.00% |
| FT           | 1_4_0            | percentile-90 |          20.00% |          50.00% |          65.00% |         100.00% |
| FT           | 1_4_0            | average       |           5.08% |          12.58% |          20.00% |          25.49% |
| FT           | reduce_proximity | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| FT           | reduce_proximity | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| FT           | reduce_proximity | percentile-50 |           0.00% |           0.00% |           5.00% |          10.00% |
| FT           | reduce_proximity | percentile-75 |           0.00% |          15.00% |          30.00% |          40.00% |
| FT           | reduce_proximity | percentile-90 |          10.00% |          45.00% |          60.00% |         100.00% |
| FT           | reduce_proximity | average       |           5.01% |          12.64% |          20.10% |          25.53% |
| LAT          | 1_4_0            | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| LAT          | 1_4_0            | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| LAT          | 1_4_0            | percentile-50 |           0.00% |           0.00% |           5.00% |           5.00% |
| LAT          | 1_4_0            | percentile-75 |           5.00% |          15.00% |          30.00% |          30.00% |
| LAT          | 1_4_0            | percentile-90 |          15.00% |          45.00% |          60.00% |          80.00% |
| LAT          | 1_4_0            | average       |           4.80% |          11.80% |          17.88% |          21.62% |
| LAT          | reduce_proximity | percentile-10 |           0.00% |           0.00% |           0.00% |           0.00% |
| LAT          | reduce_proximity | percentile-25 |           0.00% |           0.00% |           0.00% |           0.00% |
| LAT          | reduce_proximity | percentile-50 |           0.00% |           0.00% |           5.00% |           5.00% |
| LAT          | reduce_proximity | percentile-75 |           0.00% |          11.11% |          25.00% |          35.00% |
| LAT          | reduce_proximity | percentile-90 |          15.00% |          45.00% |          55.00% |          80.00% |
| LAT          | reduce_proximity | average       |           4.43% |          11.23% |          17.32% |          21.45% |

</details>

### Impact on Search time

| dataset_name | host_name        |      25.00% |      50.00% |      75.00% |     100.00% | Average     |
|--------------|------------------|------------:|------------:|------------:|------------:|-------------|
| FBIS         | 1_4_0            |        3.45 | 7.446666667 | 9.773489933 | 9.620300752 | 7.572614338 |
| FBIS         | reduce_proximity | 2.983333333 | 5.316666667 | 6.911073826 | 7.637218045 | 5.712072968 |
| FR94         | 1_4_0            | 2.236666667 |        4.45 | 5.523489933 | 4.560150376 | 4.192576744 |
| FR94         | reduce_proximity |        2.09 | 3.991666667 | 4.981543624 | 4.266917293 | 3.832531896 |
| FT           | 1_4_0            | 5.956666667 | 9.656666667 | 13.86912752 | 10.83270677 |  10.0787919 |
| FT           | reduce_proximity |        4.51 | 5.981666667 | 7.701342282 | 6.766917293 |  6.23998156 |
| LAT          | 1_4_0            | 5.856666667 | 9.233333333 | 12.98322148 | 10.78759398 | 9.715203865 |
| LAT          | reduce_proximity |        6.91 | 6.706666667 | 8.463087248 | 8.265037594 | 7.586197877 |

## Technical approach

- Ensure the MAX_DISTANCE constant is used everywhere needed
- Reduce the MAX_DISTANCE from 8 to 4
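
The effect of a smaller `MAX_DISTANCE` can be illustrated with a minimal sketch. It assumes that only word pairs whose positional distance is strictly below `MAX_DISTANCE` are recorded, which would be consistent with the PR title (range 7 becoming 3 when the constant goes from 8 to 4). The function `word_pair_proximities` and its signature are hypothetical and are not taken from the Meilisearch code base.

```rust
/// Hypothetical constant mirroring the value this PR describes (previously 8).
const MAX_DISTANCE: u32 = 4;

/// Returns `(left, right, distance)` for every ordered pair of words whose
/// positional distance is strictly below `MAX_DISTANCE`.
/// Assumption: this is an illustration, not the actual extractor.
fn word_pair_proximities<'a>(words: &[&'a str]) -> Vec<(&'a str, &'a str, u32)> {
    let mut pairs = Vec::new();
    for (i, left) in words.iter().enumerate() {
        for (j, right) in words.iter().enumerate().skip(i + 1) {
            let distance = (j - i) as u32;
            // Words further apart than the allowed range are not indexed.
            if distance >= MAX_DISTANCE {
                break;
            }
            pairs.push((*left, *right, distance));
        }
    }
    pairs
}

fn main() {
    let words = ["the", "quick", "brown", "fox", "jumps"];
    for (left, right, distance) in word_pair_proximities(&words) {
        println!("{left} {right} {distance}");
    }
}
```

With fewer pairs kept per document, the proximity databases have fewer entries to write, which is the kind of saving the stats above measure.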

## Related

TBD

Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-10-18 14:56:08 +00:00
ManyTheFish
27eec21415 Fix tests 2023-10-18 16:03:22 +02:00
Clément Renault
62dfd09dc6
Add more puffin logs to the deletion functions 2023-10-13 13:11:09 +02:00
meili-bors[bot]
f343ef5f2f
Merge #4108
4108: Fix bug where search with distinct attribute and no ranking, returns offset+limit hits r=curquiza a=vivek-26

# Pull Request

## Related issue
Fixes #4078 

## What does this PR do?
This PR:
- Fixes bug where search with distinct attribute and no ranking, returns offset+limit hits.
- Adds unit and integration tests.
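
The expected behavior can be sketched as follows: deduplicate on the distinct attribute first, then skip `offset` results and keep at most `limit`. This is only an illustration under stated assumptions; `distinct_page` and its `(id, distinct_value)` input are hypothetical and do not reflect the actual search pipeline.

```rust
use std::collections::HashSet;

/// Hypothetical helper: `candidates` holds `(document_id, distinct_value)`
/// pairs in result order. Returns at most `limit` ids after skipping `offset`
/// distinct results.
fn distinct_page(candidates: &[(u32, &str)], offset: usize, limit: usize) -> Vec<u32> {
    let mut seen = HashSet::new();
    candidates
        .iter()
        // Keep only the first document for each distinct value.
        .filter(|(_, key)| seen.insert(*key))
        // Apply pagination after deduplication.
        .skip(offset)
        .take(limit)
        .map(|(id, _)| *id)
        .collect()
}

fn main() {
    let candidates = [(1, "a"), (2, "a"), (3, "b"), (4, "c"), (5, "b"), (6, "d")];
    let page = distinct_page(&candidates, 1, 2);
    println!("{page:?}");
}
```

The point of the sketch is that the page never holds more than `limit` hits, rather than `offset + limit`.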

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Vivek Kumar <vivek.26@outlook.com>
2023-10-12 07:51:29 +00:00
Vivek Kumar
19ba129165
add unit test for distinct search with no ranking 2023-10-11 19:02:27 +05:30
Vivek Kumar
d4da06ff47
fix bug where distinct search with no ranking returns offset+limit hits 2023-10-11 19:02:16 +05:30
Tamo
c0f2724c2d get rid of the newly introduced error code in favor of an io::Error 2023-10-10 15:12:23 +02:00
Tamo
d772073dfa use a bufreader every time there is a grenad<file> 2023-10-10 15:00:30 +02:00
ManyTheFish
43989fe2e4 Reduce proximity range from 7 to 3 2023-10-03 12:16:48 +02:00
meili-bors[bot]
487d493f49
Merge #4043
4043: Bring back hotfixes from v1.3.3 into v1.4.0 r=Kerollmops a=curquiza



Co-authored-by: curquiza <curquiza@users.noreply.github.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: curquiza <clementine@meilisearch.com>
2023-09-11 12:27:34 +00:00
Vivek Kumar
abfa7ded25
use a new temp index in the test 2023-09-08 12:32:47 +05:30
Vivek Kumar
f2837aaec2
add another test case 2023-09-08 11:39:54 +05:30
Vivek Kumar
11df155598
fix highlighting bug when searching for a phrase with cropping 2023-09-08 11:39:52 +05:30
meili-bors[bot]
256cf33bca
Merge #4039
4039: Fix multiple vectors dimensions r=ManyTheFish a=Kerollmops

This PR fixes #4035, making providing multiple vectors in documents possible. This is fixed by extracting the vectors from the non-flattened version of the documents.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-09-07 09:25:58 +00:00