Louis Dureuil
c77073efcc
Update::has_changed_for_fields
2024-12-05 15:50:12 +01:00
meili-bors[bot]
cac355bfa7
Merge #5124
5124: Optimize Prefixes and Merges r=ManyTheFish a=Kerollmops
In this PR, we plan to optimize reads from LMDB by reading the entries in lexicographic order, which makes better use of the OS memory-mapping cache:
- Optimize the prefix generation for word position docids (`@manythefish`)
- Optimize the parallel merging of the caches by sorting the entries before merging the caches (`@kerollmops`)
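
To illustrate the idea of sorting entries before merging, here is a minimal sketch and not the code from this PR. The function name `merge_sorted`, the `String` keys, and the `Vec<u32>` values are assumptions made for the example. The real caches hold different key and bitmap types.

```rust
use std::collections::HashMap;

/// Hypothetical helper: flattens several per-thread caches into a single
/// `Vec`, sorts it by key, and merges the values of identical keys.
/// The result is in lexicographic key order, so database accesses made
/// while iterating over it would follow the on-disk key order.
pub fn merge_sorted(caches: Vec<HashMap<String, Vec<u32>>>) -> Vec<(String, Vec<u32>)> {
    // Move every (key, ids) pair out of the hash maps into one Vec.
    let mut entries: Vec<(String, Vec<u32>)> = caches.into_iter().flatten().collect();

    // Hash maps iterate in arbitrary order, so sort explicitly by key.
    entries.sort_unstable_by(|a, b| a.0.cmp(&b.0));

    // Equal keys are now adjacent: fold them into the previous entry.
    let mut merged: Vec<(String, Vec<u32>)> = Vec::with_capacity(entries.len());
    for (key, ids) in entries {
        if let Some((last_key, last_ids)) = merged.last_mut() {
            if *last_key == key {
                last_ids.extend(ids);
                continue;
            }
        }
        merged.push((key, ids));
    }

    // Normalize each merged id list: sorted and without duplicates.
    for (_, ids) in merged.iter_mut() {
        ids.sort_unstable();
        ids.dedup();
    }
    merged
}
```

Calling `merge_sorted` with one cache per worker thread returns a single list whose keys increase monotonically, which is the property the benchmarks below rely on.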
## Benchmarks on 1cpu 2gb gpo3 (5k IOps)
Before, on the tag `meilisearch-v1.12.0-rc.3`:
```
word_position_docids:merge_and_send_docids: 988s
compute_word_fst: 23.3s
word_pair_proximity_docids:merge_and_send_docids: 428s
compute_word_prefix_fid_docids:recompute_modified_prefixes: 76.3s
compute_word_prefix_position_docids:recompute_modified_prefixes:from_prefixes: 429s
```
After, on this branch, with each whole `HashMap` sorted into a `Vec`:
```
word_position_docids:merge_and_send_docids: 202s
compute_word_fst: 20.4s
word_pair_proximity_docids:merge_and_send_docids: 427s
compute_word_prefix_fid_docids:recompute_modified_prefixes: 65.5s
compute_word_prefix_position_docids:recompute_modified_prefixes:from_prefixes: 62.5s
```
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2024-12-05 09:35:52 +00:00
Kerollmops
52843123d4
Clean up and remove the non-sorted merge_caches function
2024-12-05 10:03:05 +01:00
meili-bors[bot]
6298db5bea
Merge #5113
5113: Fix the Minimum BBQueue channel threshold r=Kerollmops a=Kerollmops
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-12-05 09:01:02 +00:00
Louis Dureuil
3a11e39c01
Force max_memory to a min of 100MiB
2024-12-04 17:53:30 +01:00
Louis Dureuil
5f896b1050
Fix geo when spilling
2024-12-04 17:51:12 +01:00
Kerollmops
2e32d0474c
Lexicographically sort all the maps to merge
2024-12-04 17:05:11 +01:00
Kerollmops
cb99ac6f7e
Consume vec instead of draining
2024-12-04 17:00:22 +01:00
Kerollmops
be411435f5
Use the merge_caches_alt function in the docids merging
2024-12-04 16:37:29 +01:00
Kerollmops
29ef164530
Introduce a new semi ordered merge function
2024-12-04 16:33:35 +01:00
ManyTheFish
739c52a3cd
Replace HashSets by BTreeSets for the prefixes
2024-12-04 16:16:48 +01:00
Kerollmops
261d2ceb06
Yield the BBQueue writer instead of spin looping
2024-12-04 14:16:40 +01:00
Kerollmops
96831ed9bb
Send the WakeUp message if necessary in the reserve function
2024-12-04 11:03:01 +01:00
Kerollmops
0459b1a242
Change the reserve and grant function to accept a closure
2024-12-04 10:32:25 +01:00
Kerollmops
8ecb726683
Fix the minimum BBQueue channel threshold
2024-12-03 15:49:11 +01:00
Clément Renault
0ad2f57a92
Update bbqueue repo to point to the meilisearch org
2024-12-03 12:00:04 +01:00
Louis Dureuil
e905a72d73
remove mimalloc on Windows
2024-12-02 18:13:56 +01:00
Louis Dureuil
d040aff101
Stop allocating 1GiB for documents
2024-12-02 16:30:14 +01:00
Clément Renault
767259be7e
Prefer returning an abort indexation rather than throwing a panic
2024-12-02 11:53:42 +01:00
Clément Renault
e9f34fb4b1
Make the frame consumer pulling fair
2024-12-02 11:49:01 +01:00
Clément Renault
d5c07ef7b3
Manage key length conversion error correctly
2024-12-02 11:03:00 +01:00
Clément Renault
5e218f3f4d
Remove a sync_all (mark my words)
2024-12-02 11:03:00 +01:00
Clément Renault
bcab61ab1d
Do spurious wake ups on the receiver side
2024-12-02 11:03:00 +01:00
Clément Renault
263c5a348e
Move the spin looping for BBQueue frames into a dedicated function
2024-12-02 10:33:49 +01:00
Clément Renault
be7d2fbe63
Move the EntryHeader up in the file and document the safety related to the size
2024-12-02 10:19:11 +01:00
Clément Renault
f7f9a131e4
Improve copying bytes into aligned memory area
2024-12-02 10:15:58 +01:00
Clément Renault
5df5eb2db2
Clarify a method name
2024-12-02 10:10:48 +01:00
Clément Renault
30eb0e5b5b
Rename recv and read methods to recv_action and recv_frame
2024-12-02 10:08:01 +01:00
Clément Renault
5b860cb989
Fix English in the doc
2024-12-02 10:06:35 +01:00
Clément Renault
76d0623b11
Reduce the number of unwraps
2024-12-02 10:05:06 +01:00
Clément Renault
db4eaf4d2d
Rename serialize_into into serialize_into_writer
2024-12-02 10:03:27 +01:00
Clément Renault
13f21206a6
Call the serialize_into_writer method from the serialize_into one
2024-12-02 10:03:01 +01:00
Clément Renault
14ee7aa84c
Make sure the BBQueue is at least 50 MiB
2024-11-28 18:02:48 +01:00
Clément Renault
8a35cd1743
Adjust the BBQueue buffers to use 2% instead of 10%
2024-11-28 16:00:15 +01:00
Clément Renault
3c7ac093d3
Take the BBQueue capacity into account in the max memory
2024-11-28 15:43:14 +01:00
Clément Renault
b57dd5c58e
Remove the Vector variant and use the Vectors
2024-11-28 15:20:43 +01:00
Clément Renault
096a28656e
Fix a bug around deleting all the vectors of a doc
2024-11-28 15:15:06 +01:00
Clément Renault
cc4bd54669
Correctly construct the Embeddings struct
2024-11-28 13:53:25 +01:00
Clément Renault
58eab9a018
Send large payload through crossbeam
2024-11-28 12:01:06 +01:00
Clément Renault
5c488e20cc
Send the geo rtree through crossbeam channel
2024-11-27 18:03:45 +01:00
Clément Renault
da650f834e
Plug the NoPanicThreadPool in the tests and benchmarks
2024-11-27 17:04:49 +01:00
Clément Renault
e83534a430
Fix the indexer::index to correctly use the rayon::ThreadPool
2024-11-27 16:27:43 +01:00
Clément Renault
98d4a2909e
Fix the way we spawn the rayon threadpool
2024-11-27 16:05:44 +01:00
Clément Renault
a514ce472a
Make clippy happy
2024-11-27 14:59:04 +01:00
Clément Renault
cc63802115
Modify and return the IndexEmbeddings to write them later
2024-11-27 14:58:03 +01:00
Clément Renault
acec45ad7c
Send a WakeUp when writing data in the BBQueue buffers
2024-11-27 14:33:23 +01:00
Clément Renault
08d6413365
Fix result types
2024-11-27 14:32:42 +01:00
Clément Renault
70802eb7c7
Fix most issues with the lifetimes
2024-11-27 14:32:42 +01:00
Clément Renault
6ac5b3b136
Finish most of the channels types
2024-11-27 14:32:26 +01:00
Clément Renault
e1e76f39d0
Clean up dependencies
2024-11-27 14:30:34 +01:00