mirror of
https://github.com/meilisearch/MeiliSearch
synced 2024-10-30 01:38:49 +01:00
Describe the multi-threaded cache merging
This commit is contained in:
parent 3a76ccb6e1
commit 437940d053
@@ -29,25 +29,47 @@ pub struct CboCachedSorter<'extractor> {
-// # How the Merge Algorithm works
-//
-// - Collect all hashmaps to the main thread
-// - Iterator over all the hashmaps in the different threads
-// - Each thread must take care of its own keys (regarding a hash number)
-// - Also read the spilled content which are inside
-// - Each thread must populate a local hashmap with the entries
-// - Every thread send the merged content to the main writing thread
-//
-// ## Next Step
-//
+// # How the Merge Algorithm works
+//
+// Each extractor creates #Threads caches and balances the entries
+// based on the hash of the keys. To do that we can use the
+// hashbrown::hash_map::RawEntryBuilderMut::from_key_hashed_nocheck.
+// This way we can compute the hash on our own, decide on the cache to
+// target, and insert it into the right HashMap.
+//
+// #Thread -> caches
+// t1 -> [t1c1, t1c2, t1c3]
+// t2 -> [t2c1, t2c2, t2c3]
+// t3 -> [t3c1, t3c2, t3c3]
+//
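The hash-based routing the new comment describes can be sketched with the standard library alone. This is a hypothetical illustration, not the commit's code: the real implementation uses hashbrown's `RawEntryBuilderMut::from_key_hashed_nocheck` to reuse the precomputed hash, while std's `HashMap` has no such entry point, so this sketch recomputes the hash on insert. The names `bucket_of` and `insert_balanced` are invented for the example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Compute which of the `num_caches` per-thread caches a key belongs to.
// Equal keys always hash to the same bucket, no matter which thread
// inserts them, which is what makes the later merge-by-column possible.
fn bucket_of(key: &[u8], num_caches: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() as usize) % num_caches
}

// Insert an entry into the cache selected by the key's hash.
fn insert_balanced(caches: &mut [HashMap<Vec<u8>, u32>], key: &[u8], value: u32) {
    let idx = bucket_of(key, caches.len());
    *caches[idx].entry(key.to_vec()).or_insert(0) += value;
}
```

Because the bucket depends only on the key, two inserts of the same key from the same extractor always land (and accumulate) in the same cache.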
-// - Define the size of the buckets in advance to make sure everything fits in memory.
-// ```
-// let total_buckets = 32;
-// (0..total_buckets).par_iter().for_each(|n| {
-//     let hash = todo!();
-//     if hash % total_bucket == n {
-//         // take care of this key
-//     }
-// });
-// ```
+// When the extractors are done filling the caches, we want to merge
+// the content of all the caches. We do a transpose and each thread is
+// assigned the associated cache. By doing that we know that every key
+// is put in a known cache and will collide with keys in the other
+// caches of the other threads.
+//
+// #Thread -> caches
+// t1 -> [t1c1, t2c1, t3c1]
+// t2 -> [t1c2, t2c2, t3c2]
+// t3 -> [t1c3, t2c3, t3c3]
+//
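The transpose step above can be sketched generically: thread `t` produces the row `[tc1, tc2, ...]`, and merge thread `n` is handed column `n`, i.e. the n-th cache from every thread. This is an illustrative stand-in, not the commit's code; `transpose` is a hypothetical helper.

```rust
// Turn rows of per-thread caches into columns of same-bucket caches.
// Because keys were bucketed by hash, equal keys can only meet inside
// a single column, so each merge thread owns a disjoint key space.
fn transpose<T>(rows: Vec<Vec<T>>) -> Vec<Vec<T>> {
    let cols = rows.first().map_or(0, |row| row.len());
    let mut out: Vec<Vec<T>> = (0..cols).map(|_| Vec::new()).collect();
    for row in rows {
        for (n, cache) in row.into_iter().enumerate() {
            out[n].push(cache);
        }
    }
    out
}
```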
+// When we encounter a miss in the other caches we must still try
+// to find it in the spilled entries. This is the reason why we use
+// a grenad sorter/reader so that we can seek "efficiently" for a key.
+//
+// ## Memory Control
+//
+// We can detect that there is no more memory available when the
+// bump allocator reaches a threshold. When this is the case we
+// freeze the cache. There is one bump allocator per thread and the
+// memory must be well balanced as we manage one type of extraction
+// at a time with well-balanced documents.
+//
+// It means that the unknown new keys added to the
+// cache are directly spilled to disk: basically a key followed by a
+// del/add bitmap. For the known keys we can keep modifying them in
+// the materialized version in the cache: update the del/add bitmaps.
+//
+// For now we can use a grenad sorter for spilling even though I think
+// it's not the most efficient way (too many files open, sorting entries).
+
 impl<'extractor> CboCachedSorter<'extractor> {
     /// TODO may add the capacity
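The freeze-and-spill policy in the Memory Control section can be sketched as follows. This is a simplified stand-in, not the commit's implementation: `budget` stands in for the bump-allocator threshold (the real code uses a bumpalo arena), and the `spill` vector stands in for the on-disk grenad sorter; `FrozenCache` is a hypothetical name.

```rust
use std::collections::HashMap;

// A cache that, once its memory budget is exhausted, is "frozen":
// known keys keep being updated in place, while unknown new keys are
// written straight to the spill instead of growing the map.
struct FrozenCache {
    map: HashMap<Vec<u8>, u32>, // materialized del/add entries
    spill: Vec<(Vec<u8>, u32)>, // stand-in for the on-disk sorter
    budget: usize,              // bytes the map is still allowed to grow
}

impl FrozenCache {
    fn insert(&mut self, key: &[u8], delta: u32) {
        if let Some(v) = self.map.get_mut(key) {
            // Known key: update the materialized entry in place.
            *v += delta;
        } else if self.budget >= key.len() {
            // Still under the threshold: admit the new key.
            self.budget -= key.len();
            self.map.insert(key.to_vec(), delta);
        } else {
            // Frozen: unknown new keys go straight to the spill.
            self.spill.push((key.to_vec(), delta));
        }
    }
}
```

During the merge, a key missed in the in-memory maps must then be looked up in the spilled entries, which is why the real code spills through a seekable sorted structure rather than an append-only log.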