5124: Optimize Prefixes and Merges r=ManyTheFish a=Kerollmops
In this PR, we optimize reads from LMDB so that entries are read in lexicographic order, which makes better use of the OS cache behind the memory mapping:
- Optimize the prefix generation for word position docids (`@manythefish`)
- Optimize the parallel merging of the caches to sort entries before merging the caches (`@kerollmops`)
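The sorting step above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `sorted_entries` and the `Vec<u8>` key / `Vec<u32>` docids types are assumptions chosen for the example. It relies on the fact that LMDB's default key comparison is bytewise lexicographic, so sorting keys as byte strings gives the order in which the database lays them out.

```rust
use std::collections::HashMap;

/// Hypothetical helper: drains an unordered cache into a `Vec` sorted by key
/// bytes, so that later lookups and writes touch the database in key order
/// instead of in the effectively random `HashMap` iteration order.
fn sorted_entries(cache: HashMap<Vec<u8>, Vec<u32>>) -> Vec<(Vec<u8>, Vec<u32>)> {
    let mut entries: Vec<(Vec<u8>, Vec<u32>)> = cache.into_iter().collect();
    // `Vec<u8>` compares bytewise, i.e. lexicographically.
    entries.sort_unstable_by(|(a, _), (b, _)| a.cmp(b));
    entries
}
```

Visiting keys in this order means neighbouring keys tend to fall on the same or adjacent memory-mapped pages, which is the cache-friendliness the description refers to.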
## Benchmarks on 1cpu 2gb gpo3 (5k IOps)
Before, on the tag `meilisearch-v1.12.0-rc.3`:
```
word_position_docids:merge_and_send_docids: 988s
compute_word_fst: 23.3s
word_pair_proximity_docids:merge_and_send_docids: 428s
compute_word_prefix_fid_docids:recompute_modified_prefixes: 76.3s
compute_word_prefix_position_docids:recompute_modified_prefixes:from_prefixes: 429s
```
After, on this branch, which sorts the whole `HashMap`s into a `Vec`:
```
word_position_docids:merge_and_send_docids: 202s
compute_word_fst: 20.4s
word_pair_proximity_docids:merge_and_send_docids: 427s
compute_word_prefix_fid_docids:recompute_modified_prefixes: 65.5s
compute_word_prefix_position_docids:recompute_modified_prefixes:from_prefixes: 62.5s
```
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
5122: Yield the BBQueue writing loop r=ManyTheFish a=Kerollmops
We prefer yielding to let the writing thread do its job instead of spin looping.
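The idea can be sketched as below. This is an illustrative helper, not the loop from this PR: `reserve_or_yield` and its `try_reserve` closure are hypothetical names standing in for "try to get space in the queue". Only `std::thread::yield_now` is a real standard library call.

```rust
use std::thread;

/// Hypothetical helper: retries `try_reserve` until it succeeds. Between
/// attempts it yields the time slice to the scheduler so another thread
/// (here, the writer) can run and free space, instead of burning CPU in a
/// tight spin loop.
fn reserve_or_yield<T>(mut try_reserve: impl FnMut() -> Option<T>) -> T {
    loop {
        if let Some(grant) = try_reserve() {
            return grant;
        }
        thread::yield_now();
    }
}
```

On a machine with few cores this matters most: a spinning producer can otherwise occupy the very core the writing thread needs to make progress.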
Co-authored-by: Kerollmops <clement@meilisearch.com>
5110: Increase margin on deletion of task r=dureuill a=irevoire
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/5077
## What does this PR do?
- Increase the margin we keep to enqueue task deletion
The issue was that we did not have enough space in the reserved memory to write both the batch and the deletion task we had just enqueued.
We could fix it only for this test, as it’s not an issue in production where we have 10GiB of margin, but I thought it wasn’t a bad idea to also increase our margin a bit since we’re effectively writing more to LMDB.
Co-authored-by: Tamo <tamo@meilisearch.com>
5094: Implement a bbqueue channel between the extractors and the writer r=dureuill a=Kerollmops
This PR replaces the bounded crossbeam channel, which only carried allocated entries, with a [BBQueue](https://github.com/jamesmunns/bbqueue)-based system for the communication between the extractors and the writer: a Single Producer Single Consumer channel built on circular (ring) buffers.
- [x] Implement the BBQueue channel system...
- [x] with a crossbeam channel to wake up the receiver.
- [x] Manage the BBQueue allocated memory dynamically.
- [x] Support content that doesn't fit in the bbqueues.
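To illustrate the general shape of such a channel, here is a deliberately simplified, single-threaded sketch. It does not use the BBQueue API and is not the implementation from this PR: `FrameRing`, the 4-byte length prefix, and the method names are all assumptions made for the example. It shows a fixed byte budget holding length-prefixed frames, and a push that reports failure when the content does not fit, so the caller can wait or fall back to another path.

```rust
use std::collections::VecDeque;

/// Hypothetical fixed-budget byte queue holding length-prefixed frames.
struct FrameRing {
    buf: VecDeque<u8>,
    capacity: usize,
}

impl FrameRing {
    fn with_capacity(capacity: usize) -> Self {
        FrameRing { buf: VecDeque::with_capacity(capacity), capacity }
    }

    /// Copies `payload` into the queue, or returns `false` if the frame
    /// (4-byte length prefix plus payload) does not fit in the free space.
    fn try_push(&mut self, payload: &[u8]) -> bool {
        let needed = 4 + payload.len();
        if self.buf.len() + needed > self.capacity {
            return false;
        }
        self.buf.extend((payload.len() as u32).to_le_bytes());
        self.buf.extend(payload.iter().copied());
        true
    }

    /// Removes and returns the oldest frame, if any.
    fn pop(&mut self) -> Option<Vec<u8>> {
        if self.buf.len() < 4 {
            return None;
        }
        let mut len_bytes = [0u8; 4];
        for byte in &mut len_bytes {
            *byte = self.buf.pop_front()?;
        }
        let len = u32::from_le_bytes(len_bytes) as usize;
        Some(self.buf.drain(..len).collect())
    }
}
```

A real producer/consumer pair would additionally need a thread-safe buffer and a way to wake the receiver, which the checklist above handles with a crossbeam channel.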
Co-authored-by: Clément Renault <clement@meilisearch.com>
5109: Fix autobatch r=dureuill a=dureuill
Fixes most SDK tests and flaky failures
Changes:
- Make sure that the settings are not autobatched with document operations, as the new indexer no longer supports this operating mode
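The rule above can be pictured with a small sketch. The `TaskKind` enum and `split_batches` function are hypothetical and far simpler than the real autobatcher. They only show the invariant that a batch never mixes settings changes with document operations.

```rust
/// Hypothetical, reduced set of task kinds for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TaskKind {
    Settings,
    DocumentOperation,
}

/// Groups consecutive tasks of the same kind, starting a new batch whenever
/// the kind changes, so settings and document operations never share a batch.
fn split_batches(tasks: &[TaskKind]) -> Vec<Vec<TaskKind>> {
    let mut batches: Vec<Vec<TaskKind>> = Vec::new();
    for &task in tasks {
        let same_kind = batches.last().map_or(false, |batch| batch[0] == task);
        if same_kind {
            batches.last_mut().unwrap().push(task);
        } else {
            batches.push(vec![task]);
        }
    }
    batches
}
```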
Co-authored-by: Louis Dureuil <louis@meilisearch.com>