ManyTheFish
ebef6bc24d
Simplify documents database writing
2023-11-20 10:14:57 +01:00
ManyTheFish
d59b7db8d0
remove unused code
2023-11-20 10:10:45 +01:00
ManyTheFish
263e825619
Fix typos in comments
2023-11-20 10:06:29 +01:00
Many the fish
b0adc73ce6
Merge pull request #4207 from meilisearch/diff-indexing-prefix-databases
...
Diff indexing prefix databases
prototype-diff-indexing-1
2023-11-14 16:04:05 +01:00
meili-bors[bot]
72d3fa4898
Merge #4203
...
4203: Extract external document docids from docs on deletion by filter r=Kerollmops a=dureuill
This fixes some of the performance regression observed on `diff-indexing` when doing delete-by-filter with a filter matching many documents.
To delete 19 768 771 documents (hackernews dataset, all documents matching `type = comment`), here are the observed time:
|branch (commit sha1sum)|time|speed-down factor (lower is better)|
|--|--|--|
|`main` (48865470d7aaf42fa5bbfd01cf73423afb77addf)|1212.885536s (~20min)|x1.0 (baseline)|
|`diff-indexing` (523519fdbfd3a28ca15320641cb096f26230a7ca)|5385.550543s (90min)|x4.44|
|**`diff-indexing-extract-primary-key`**(f8289cd974d957d38645ca66c993ca518ec81955)|2582.323324s (43min) | x2.13|
So we're still suffering a speed-down of x2.13, but that's much better than x4.44.
---
Changes:
- Refactor the logic of PrimaryKey extraction to a struct
- Add a trait to abstract the extraction of field id from a name between `DocumentBatch` and `FieldIdMap`.
- Add `Index::external_id_of` to get the external ids of a bitmap of internal ids.
- Use this new method to add new Transform and Batch methods to remove documents that are known to be from the DB.
- Modify delete-by-filter to use the new method
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-11-13 13:02:10 +00:00
Louis Dureuil
772964125d
Factor removal of document from DB
2023-11-13 13:51:22 +01:00
Louis Dureuil
378deb0bef
Rename trait
2023-11-13 13:38:36 +01:00
ManyTheFish
1f36410541
Update tests
2023-11-13 13:36:39 +01:00
Louis Dureuil
264b10ec20
Fixup documentation
2023-11-09 16:23:20 +01:00
Louis Dureuil
825257da76
Use more efficient method for deletion in benchmarks
2023-11-09 16:13:15 +01:00
Louis Dureuil
f8289cd974
Use it from delete-by-filter
2023-11-09 14:23:15 +01:00
Louis Dureuil
3053e01c05
Batch::remove_documents_from_db_no_batch
2023-11-09 14:23:02 +01:00
Louis Dureuil
b11c2afac0
Index::external_id_of
2023-11-09 14:22:43 +01:00
Louis Dureuil
9cef800b2a
Enrich uses the new type
2023-11-09 14:22:05 +01:00
Louis Dureuil
db2fb86b8b
Extract PrimaryKey logic to a type
2023-11-09 14:19:16 +01:00
ManyTheFish
882ab9cc85
remove warnings
2023-11-09 11:35:33 +01:00
ManyTheFish
5a9c96e1db
Compute word integer prefix cache
2023-11-09 11:34:26 +01:00
ManyTheFish
70ce40828c
Compute word docids prefix cache
2023-11-08 17:01:00 +01:00
ManyTheFish
688266c83e
Remove word pair proximity prefix cache and compute it at search time
2023-11-08 14:16:01 +01:00
ManyTheFish
6dab826908
Reactivate prefix databases
2023-11-08 13:58:01 +01:00
ManyTheFish
1e2fbc6a42
revert "REVERT ME: ignore prefix pair databases tests"
...
This reverts commit 1b2ea6cf19309782a2e3b2ff2fe6d7708dd5de4f.
2023-11-08 11:50:52 +01:00
Many the fish
523519fdbf
Merge pull request #4195 from meilisearch/diff-indexing-remove-from-batch
...
Remove `IndexOperation::DocumentDeletion`
2023-11-08 10:29:49 +01:00
Louis Dureuil
ef6fa10f7a
Remove IndexOperation::DocumentDeletion
2023-11-06 12:16:15 +01:00
Louis Dureuil
620fee35f9
Fix benches
prototype-diff-indexing-0
2023-11-06 11:56:46 +01:00
Louis Dureuil
cbaa54cafd
Fix clippy issues
2023-11-06 11:19:31 +01:00
Louis Dureuil
1bccf2079e
Correctly mark non-tests as non-tests
2023-11-06 11:03:56 +01:00
ManyTheFish
1b2ea6cf19
REVERT ME: ignore prefix pair databases tests
2023-11-06 10:46:22 +01:00
Louis Dureuil
1ad1fcc8c8
Remove all warnings
2023-11-06 10:31:14 +01:00
ManyTheFish
87610a5f98
Don't try to delete a document that is not in the database
2023-11-02 16:49:03 +01:00
Many the fish
2544bc1416
Merge pull request #4160 from meilisearch/diff-indexing-vector-points
...
Diff Indexing for the vector points
2023-11-02 16:01:51 +01:00
Clément Renault
ff522c919d
Fix the vector extractions for the diff indexing
2023-11-02 15:58:08 +01:00
Many the fish
1c39459cf4
Merge pull request #4179 from meilisearch/diff-indexing-fix-nested-primary-key
...
Diff indexing fix nested primary key
2023-11-02 15:39:50 +01:00
ManyTheFish
bf0651f23c
Implement iter method on ExternalDocumentsIds
2023-11-02 15:38:00 +01:00
ManyTheFish
5b20e625f3
fix merge
2023-11-02 15:31:37 +01:00
ManyTheFish
bc51d6157a
Fix transform reindexing path
2023-11-02 15:26:20 +01:00
ManyTheFish
1b4ff991c0
update typed chunks
2023-11-02 15:26:20 +01:00
ManyTheFish
4b64c33aa2
update vector extractor
2023-11-02 15:26:20 +01:00
ManyTheFish
12323d610e
Change the original document sorter key from the internal docid to a concatenation of the internal and the external docid
2023-11-02 15:26:20 +01:00
Clément Renault
44e9033b3a
Merge pull request #4181 from meilisearch/diff-indexing-parallel-transform
...
Use rayon to sort entries in parallel
2023-11-02 15:16:10 +01:00
Clément Renault
4d864f0702
Always sort internal Sorter entries in parallel
2023-11-02 14:47:43 +01:00
Clément Renault
b10c060bf7
Cleanup TOML
2023-11-01 14:03:04 +01:00
Clément Renault
e507ef5932
Slow the logging down
2023-11-01 13:49:32 +01:00
Clément Renault
c71b1d33ae
Sort entries using rayon in the transform sorters
2023-11-01 11:07:16 +01:00
Clément Renault
0fc446c62f
Add more timing logs to the Transform
2023-11-01 11:07:16 +01:00
Louis Dureuil
0fb6acefc3
Add snapshots for facets
2023-10-31 17:11:08 +01:00
Louis Dureuil
b1d1355b69
remove tests on soft-deleted
2023-10-31 16:36:27 +01:00
Louis Dureuil
f19332466e
Extract field value as values instead of Option<Value>
2023-10-31 16:36:27 +01:00
Louis Dureuil
03ddb4f310
use deladd in facet update tests
2023-10-31 16:36:27 +01:00
Louis Dureuil
c855cc2721
Remove unused test
2023-10-31 16:36:27 +01:00
Louis Dureuil
da0503ef80
Fix document count
2023-10-31 16:36:27 +01:00