The idea is to directly create a sorted and merged list of bitmaps
in the form of a BTreeMap<FieldId, RoaringBitmap> instead of creating
a grenad::Reader where the keys are field_id and the values are docids.
Then we send that BTreeMap to the thing that handles TypedChunks, which
inserts its content into the database.
588: Fix name of "release_date" facet in movies benchmarks r=ManyTheFish a=loiclec
## What does this PR do?
The `movies.json` file in the benchmark datasets contains a filterable field called "release_date", but the indexing benchmarks wrongly called the field "released_date" instead. This PR fixes that.
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
583: Use BufReader to read datasets in benchmarks r=ManyTheFish a=loiclec
## What does this PR do?
Ensure that the datasets used by the benchmarks are read efficiently by using a `BufReader`.
## Why?
Using a `BufReader` is more representative of how `meilisearch` works. It will also make performance comparisons between different branches of `milli` more accurate.
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
572: Add reindexing benchmarks r=Kerollmops a=irevoire
With #557 coming, we should add benchmarks that measure our impact on the reindexing process.
Co-authored-by: Tamo <tamo@meilisearch.com>
557: Fasten documents deletion and update r=Kerollmops a=irevoire
When a document deletion occurs, instead of deleting the document we mark it as deleted in the new “soft deleted” bitmap. It is then removed from the search and all the other endpoints.
I ran the benchmarks against main;
```
% ./compare.sh indexing_main_83ad1aaf.json indexing_fasten-document-deletion_abab51fb.json
group indexing_fasten-document-deletion_abab51fb indexing_main_83ad1aaf
----- ------------------------------------------ ----------------------
indexing/-geo-delete-facetedNumber-facetedGeo-searchable- 1.05 2.0±0.40ms ? ?/sec 1.00 1904.9±190.00µs ? ?/sec
indexing/-movies-delete-facetedString-facetedNumber-searchable- 1.00 10.3±2.64ms ? ?/sec 961.61 9.9±0.12s ? ?/sec
indexing/-movies-delete-facetedString-facetedNumber-searchable-nested- 1.00 15.1±3.90ms ? ?/sec 554.63 8.4±0.12s ? ?/sec
indexing/-songs-delete-facetedString-facetedNumber-searchable- 1.00 45.1±7.53ms ? ?/sec 710.15 32.0±0.10s ? ?/sec
indexing/-wiki-delete-searchable- 1.00 277.8±7.97ms ? ?/sec 1946.57 540.8±3.15s ? ?/sec
indexing/Indexing geo_point 1.00 12.0±0.20s ? ?/sec 1.03 12.4±0.19s ? ?/sec
indexing/Indexing movies in three batches 1.00 19.3±0.30s ? ?/sec 1.01 19.4±0.16s ? ?/sec
indexing/Indexing movies with default settings 1.00 18.8±0.09s ? ?/sec 1.00 18.9±0.10s ? ?/sec
indexing/Indexing nested movies with default settings 1.00 25.9±0.19s ? ?/sec 1.00 25.9±0.12s ? ?/sec
indexing/Indexing nested movies without any facets 1.00 24.8±0.17s ? ?/sec 1.00 24.8±0.18s ? ?/sec
indexing/Indexing songs in three batches with default settings 1.00 65.9±0.96s ? ?/sec 1.03 67.8±0.82s ? ?/sec
indexing/Indexing songs with default settings 1.00 58.8±1.11s ? ?/sec 1.02 59.9±2.09s ? ?/sec
indexing/Indexing songs without any facets 1.00 53.4±0.72s ? ?/sec 1.01 54.2±0.88s ? ?/sec
indexing/Indexing songs without faceted numbers 1.00 57.9±1.17s ? ?/sec 1.01 58.3±1.20s ? ?/sec
indexing/Indexing wiki 1.00 1065.2±13.26s ? ?/sec 1.00 1065.8±12.66s ? ?/sec
indexing/Indexing wiki in three batches 1.00 1182.4±6.20s ? ?/sec 1.01 1190.8±8.48s ? ?/sec
```
Most things do not change, we lost 0.1ms on the indexing of geo point (I don’t get why), and then we are between 500 and 1900 times faster when we delete documents.
Co-authored-by: Tamo <tamo@meilisearch.com>
577: Fix deserialisation of NDJson documents in benchmarks r=irevoire a=loiclec
Previously, the first document in the NDJson file was read over and over again. So the `geo_point` benchmark was not working properly: it only indexed one document.
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
When a document deletion occurs, instead of deleting the document we mark it as deleted
in the new “soft deleted” bitmap. It is then removed from the search, and all the other
endpoints.
568: Fix not equal filter when field contains both number and strings r=Kerollmops a=GraDKh
Related to https://github.com/meilisearch/meilisearch/issues/2516
Looks like the issue should be moved to this repo, but I'm not sure what the right procedure for it.
Co-authored-by: Dmytro Gordon <dmytro@bigstream.co>
566: Introduce the copy_to_path method on the Index r=irevoire a=Kerollmops
Meilisearch needs this method to do snapshots.
Co-authored-by: Kerollmops <clement@meilisearch.com>