Commit Graph

529 Commits

Author SHA1 Message Date
ManyTheFish
ebef6bc24d Simplify documents database writing 2023-11-20 10:14:57 +01:00
ManyTheFish
d59b7db8d0 remove unused code 2023-11-20 10:10:45 +01:00
ManyTheFish
263e825619 Fix typos in comments 2023-11-20 10:06:29 +01:00
Many the fish
b0adc73ce6
Merge pull request #4207 from meilisearch/diff-indexing-prefix-databases
Diff indexing prefix databases
2023-11-14 16:04:05 +01:00
Louis Dureuil
772964125d
Factor removal of document from DB 2023-11-13 13:51:22 +01:00
Louis Dureuil
264b10ec20
Fixup documentation 2023-11-09 16:23:20 +01:00
Louis Dureuil
3053e01c05
Batch::remove_documents_from_db_no_batch 2023-11-09 14:23:02 +01:00
Louis Dureuil
9cef800b2a
Enrich uses the new type 2023-11-09 14:22:05 +01:00
ManyTheFish
882ab9cc85 remove warnings 2023-11-09 11:35:33 +01:00
ManyTheFish
70ce40828c Compute word docids prefix cache 2023-11-08 17:01:00 +01:00
ManyTheFish
688266c83e Remove word pair proximity prefix cache and compute it at search time 2023-11-08 14:16:01 +01:00
ManyTheFish
6dab826908 Reactivate prefix databases 2023-11-08 13:58:01 +01:00
Louis Dureuil
cbaa54cafd
Fix clippy issues 2023-11-06 11:19:31 +01:00
Louis Dureuil
1ad1fcc8c8
Remove all warnings 2023-11-06 10:31:14 +01:00
ManyTheFish
87610a5f98 Don't try to delete a document that is not in the database 2023-11-02 16:49:03 +01:00
Clément Renault
ff522c919d
Fix the vector extractions for the diff indexing 2023-11-02 15:58:08 +01:00
ManyTheFish
bf0651f23c Implement iter method on ExternalDocumentsIds 2023-11-02 15:38:00 +01:00
ManyTheFish
5b20e625f3 fix merge 2023-11-02 15:31:37 +01:00
ManyTheFish
bc51d6157a Fix transform reindexing path 2023-11-02 15:26:20 +01:00
ManyTheFish
1b4ff991c0 update typed chunks 2023-11-02 15:26:20 +01:00
ManyTheFish
4b64c33aa2 update vector extractor 2023-11-02 15:26:20 +01:00
ManyTheFish
12323d610e Change the original document sorter key from the internal docid to a concatenation of the internal and the external docid 2023-11-02 15:26:20 +01:00
Clément Renault
4d864f0702
Always sort internal Sorter entries in parallel 2023-11-02 14:47:43 +01:00
Clément Renault
c71b1d33ae
Sort entries using rayon in the transform sorters 2023-11-01 11:07:16 +01:00
Clément Renault
0fc446c62f
Add more timing logs to the Transform 2023-11-01 11:07:16 +01:00
Louis Dureuil
b1d1355b69
remove tests on soft-deleted 2023-10-31 16:36:27 +01:00
Louis Dureuil
f19332466e
Extract field value as values instead of Option<Value> 2023-10-31 16:36:27 +01:00
Louis Dureuil
da0503ef80
Fix document count 2023-10-31 16:36:27 +01:00
Louis Dureuil
b40253bf18
update snapshots 2023-10-31 10:30:48 +01:00
Louis Dureuil
d8bf3f3fc2
Remove unused snapshots 2023-10-31 10:12:49 +01:00
Louis Dureuil
9d59e8011a
fix some tests 2023-10-31 10:08:36 +01:00
Louis Dureuil
4e91707a06
Rename test 2023-10-31 09:41:17 +01:00
Louis Dureuil
de10f20732
Fix field distribution again 2023-10-30 17:47:22 +01:00
Louis Dureuil
be395c7944
Change order of arguments to tokenizer_builder 2023-10-30 16:26:29 +01:00
Louis Dureuil
9fedd8101a
Fix tests 2023-10-30 15:11:07 +01:00
Louis Dureuil
54d07a8da3
Update field distribution taking into account both deletions and additions 2023-10-30 14:47:51 +01:00
Louis Dureuil
58690dfb19
Fix tests compilation after changes to ExternalDocumentsIds API 2023-10-30 13:34:07 +01:00
Louis Dureuil
abf424ebfc
Remove unused FromIterator 2023-10-30 11:41:56 +01:00
Clément Renault
dfab6293c9
Use an LMDB database to store the external documents ids 2023-10-30 11:41:23 +01:00
Louis Dureuil
fdf3f7f627
Fix facet distribution test 2023-10-30 11:41:23 +01:00
Louis Dureuil
6260cff65f
Actually delete documents from DB when the merge function says so 2023-10-30 11:41:22 +01:00
Louis Dureuil
8e0d9c9a5e
Recover delete_documents tests that were too eagerly deleted 2023-10-30 11:41:22 +01:00
Louis Dureuil
a35988550c
Fix some snapshots 2023-10-30 11:41:22 +01:00
Louis Dureuil
e78281785c
Actually execute the transform even if there are only documents to delete 2023-10-30 11:41:22 +01:00
Louis Dureuil
290e773d23
remove more warnings and fix some tests 2023-10-30 11:41:22 +01:00
Louis Dureuil
113527f466
Remove soft-deleted related methods from Index 2023-10-30 11:41:22 +01:00
Louis Dureuil
c534a1b687
Stop using delete documents pipeline in batch runner 2023-10-30 11:41:22 +01:00
Louis Dureuil
2263dff02b
Stop using removed delete pipelines almost everywhere 2023-10-30 11:41:22 +01:00
ManyTheFish
762b0b47e6
Use deladd merging function in chunks mergers 2023-10-30 11:40:20 +01:00
Louis Dureuil
01d5eedf2f
Remove some warnings 2023-10-30 11:40:20 +01:00
Louis Dureuil
85f42fbc03
Handle external to internal id mapping from TypedChunk::Documents 2023-10-30 11:40:20 +01:00
Louis Dureuil
c6b3c18c85
WIP: Comment out document deletion in other pipelines than update
TODO: fix calls to DELETE route
2023-10-30 11:40:20 +01:00
Louis Dureuil
946c762d28
WIP: reset documents in TypedChunk::Documents 2023-10-30 11:40:20 +01:00
Louis Dureuil
cda6ca1ee6
Remove TypedChunk::NewDocumentIds 2023-10-30 11:40:18 +01:00
Louis Dureuil
696fcf4d18
Fix document insertion into LMDB 2023-10-30 11:39:31 +01:00
ManyTheFish
476e4d3dbe
Use value buffer instead of the initial value when writting the final result in the sorter 2023-10-30 11:39:31 +01:00
Clément Renault
576fa9c6da
Remove useless comment 2023-10-30 11:39:31 +01:00
Kerollmops
77dcbff6b2
Remove and Insert the DelAdd geo points 2023-10-30 11:39:31 +01:00
Kerollmops
544440c363
Ignore geo fields when the Del and Add content is the same 2023-10-30 11:39:31 +01:00
Clément Renault
a3dae4db9b
Extract the geo fields DelAdd and generate a new DelAdd obkv with it 2023-10-30 11:39:31 +01:00
ManyTheFish
ba90a5ec0e
update extract fid word count docids 2023-10-30 11:39:31 +01:00
Louis Dureuil
59f88c14b3
Simplify facet update after removing Index::faceted_documents_ids 2023-10-30 11:39:29 +01:00
Louis Dureuil
14832cb324
Remove Index::faceted_documents_ids 2023-10-30 11:37:32 +01:00
Clément Renault
560e8f5613
Introduce the CboRoaringBitmapCodec merge_deladd_into and use it 2023-10-30 11:34:55 +01:00
Clément Renault
2d3f15f82c
Introduce a function to only serialize the Add side of a DelAdd obkv 2023-10-30 11:34:55 +01:00
Clément Renault
40186bf403
Rename FieldIdWordCountDocids correctly 2023-10-30 11:34:50 +01:00
ManyTheFish
87e3d27878
update extract word pair proximity to support deladd obkvs 2023-10-30 11:34:02 +01:00
ManyTheFish
6bcf8b4f8c
update extract word position docids 2023-10-30 11:34:02 +01:00
ManyTheFish
46aa75abdb
update extract word docids 2023-10-30 11:34:02 +01:00
ManyTheFish
2597bbd107
Make script language docids map taking a tuple of roaring bitmaps expressing the deletions and the additions 2023-10-30 11:34:00 +01:00
Clément Renault
e2bc054604
Update extract_facet_string_docids to support deladd obkvs 2023-10-30 11:32:36 +01:00
Clément Renault
fcd3a1434d
Update extract_facet_number_docids to support deladd obkvs 2023-10-30 11:31:04 +01:00
Clément Renault
a82dee21e0
Rename docid_fid into fid_docid 2023-10-30 11:31:02 +01:00
Clément Renault
bc45c1206d
Implement all the facet extraction paths and simplify them 2023-10-30 11:29:08 +01:00
Clément Renault
6ae4100f07
Generate the DelAdd for is_null, is_empty, and exists 2023-10-30 11:29:08 +01:00
Clément Renault
0c47defeee
Work on fid docid facet values rewrite 2023-10-30 11:29:06 +01:00
ManyTheFish
313b16bec2
Support diff indexing on extract_docid_word_positions 2023-10-30 11:24:19 +01:00
ManyTheFish
1dd97578a8
Make the transform struct return diff-based documents obkvs 2023-10-30 11:22:07 +01:00
ManyTheFish
f5ef69293b
deactivate prefix dbs 2023-10-30 11:22:07 +01:00
ManyTheFish
1c5705c164
clean PR warnings 2023-10-30 11:22:05 +01:00
ManyTheFish
66c2c82a18
Split wpp in several sorters 2023-10-30 11:15:02 +01:00
ManyTheFish
28a8d0ccda
Fix word pair proximity 2023-10-30 11:15:02 +01:00
ManyTheFish
96be85396d
Use a vecDeque in wpp database 2023-10-30 11:15:02 +01:00
ManyTheFish
df9e5c8651
Generalize usage of CboRoaringBitmap codec to ease the use 2023-10-30 11:15:02 +01:00
ManyTheFish
b541d48847
Add buffer to the obkv writter 2023-10-30 11:15:02 +01:00
ManyTheFish
8ccf32d1a0
Compute word_fid_docids before word_docids and exact_word_docids 2023-10-30 11:15:02 +01:00
ManyTheFish
db1ca21231
add puffin in sorter into reeder function 2023-10-30 11:15:00 +01:00
ManyTheFish
11ea5acff9
Fix 2023-10-30 11:13:10 +01:00
ManyTheFish
8d77736a67
Fix fid_word_docids 2023-10-30 11:13:10 +01:00
ManyTheFish
748b333161
Add usefull debug assert before key insertion in database 2023-10-30 11:13:10 +01:00
ManyTheFish
17b647dfe5
Wip 2023-10-30 11:13:08 +01:00
Tamo
c0f2724c2d get rids of the new introduced error code in favor of an io::Error 2023-10-10 15:12:23 +02:00
Tamo
d772073dfa use a bufreader everytime there is a grenad<file> 2023-10-10 15:00:30 +02:00
meili-bors[bot]
487d493f49
Merge #4043
4043: Bring back hotfixes from v1.3.3 into v1.4.0 r=Kerollmops a=curquiza



Co-authored-by: curquiza <curquiza@users.noreply.github.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: curquiza <clementine@meilisearch.com>
2023-09-11 12:27:34 +00:00
meili-bors[bot]
256cf33bca
Merge #4039
4039: Fix multiple vectors dimensions r=ManyTheFish a=Kerollmops

This PR fixes #4035, making providing multiple vectors in documents possible. This is fixed by extracting the vectors from the non-flattened version of the documents.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-09-07 09:25:58 +00:00
Kerollmops
679c0b0f97
Extract the vectors from the non-flattened version of the documents 2023-09-06 12:26:00 +02:00
Kerollmops
e02d0064bd
Add a test case scenario 2023-09-06 12:26:00 +02:00
ManyTheFish
66aa6d5871 Ignore tokens with empty normalized value during indexing process 2023-09-05 15:44:14 +02:00
meili-bors[bot]
ccf3ba3f32
Merge #4019
4019: Bringing back changes from `v1.3.2` onto `main` r=irevoire a=Kerollmops



Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: irevoire <irevoire@users.noreply.github.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-08-28 12:14:11 +00:00
Kerollmops
c53841e166
Accept the null JSON value as the value of _vectors 2023-08-14 16:03:55 +02:00