Merge #370

370: Change chunk size to 4MiB to better fit end-user usage r=ManyTheFish a=ManyTheFish

We ran several indexing tests using datasets of different sizes (5 datasets from 9MiB to 100MiB) on several types of VMs (`XS: 1GiB RAM, 1 VCPU`, `S: 2GiB RAM, 2 VCPU`, `M: 4GiB RAM, 3 VCPU`, `L: 8GiB RAM, 4 VCPU`).

The results show that, of the chunk sizes tested (`2MiB`, `4MiB`, `8MiB`, `16MiB`, `32MiB`, `64MiB`, `128MiB`), `4MiB` appears to perform best.

Below is the average time per chunk size:

![Screenshot 2021-09-27 at 14 27 50](https://user-images.githubusercontent.com/6482087/134909368-ef0bc45e-68d5-49d1-aaf9-91113b7c410f.png)

<details>
<summary>Detailed data</summary>
<br>

![Screenshot 2021-09-27 at 14 39 48](https://user-images.githubusercontent.com/6482087/134909952-a36b1457-bbbd-4a6c-bbe5-519e4b926b5a.png)

</br>
</details>

Co-authored-by: many <maxime@meilisearch.com>
2025-06-29 18:08:31 +02:00 · 2021-09-27 12:57:52 +00:00 · 4c09f6838f
commit 4c09f6838f
parent 0f8320bdc2 b188063869
1 changed file with 1 addition and 1 deletion
--- a/milli/src/update/index_documents/mod.rs
+++ b/milli/src/update/index_documents/mod.rs
@@ -248,7 +248,7 @@ impl<'t, 'u, 'i, 'a> IndexDocuments<'t, 'u, 'i, 'a> {
            let chunk_iter = grenad_obkv_into_chunks(
                documents_file,
                params.clone(),
-                self.documents_chunk_size.unwrap_or(1024 * 1024 * 128), // 128MiB
+                self.documents_chunk_size.unwrap_or(1024 * 1024 * 4), // 4MiB
            );

            let result = chunk_iter.map(|chunk_iter| {
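
To illustrate what the new default means in practice, here is a minimal sketch of the fallback logic and a rough chunk count for the smallest and largest benchmark datasets. The helpers `effective_chunk_size` and `estimated_chunks` are hypothetical and not part of milli. The estimate assumes chunks are split purely by byte size, which may not match how `grenad_obkv_into_chunks` actually cuts chunks.

```rust
/// New default from the diff above: 4 MiB, expressed in bytes.
const DEFAULT_CHUNK_SIZE: usize = 1024 * 1024 * 4;

/// Hypothetical helper (not part of milli): resolves the chunk size the same
/// way the patched line does, falling back to the default via `unwrap_or`.
fn effective_chunk_size(configured: Option<usize>) -> usize {
    configured.unwrap_or(DEFAULT_CHUNK_SIZE)
}

/// Hypothetical helper: rough number of chunks, assuming a split purely on
/// byte size (ceiling division). The real splitter may behave differently.
fn estimated_chunks(dataset_bytes: usize, chunk_size: usize) -> usize {
    (dataset_bytes + chunk_size - 1) / chunk_size
}

fn main() {
    const MIB: usize = 1024 * 1024;

    let new_default = effective_chunk_size(None);
    let old_default = 128 * MIB; // previous default, per the removed line

    // Smallest and largest dataset sizes mentioned in the commit message.
    for dataset in [9 * MIB, 100 * MIB] {
        println!(
            "{} MiB dataset: ~{} chunk(s) at 4 MiB, ~{} chunk(s) at 128 MiB",
            dataset / MIB,
            estimated_chunks(dataset, new_default),
            estimated_chunks(dataset, old_default),
        );
    }
}
```

Under these assumptions, both benchmark extremes fit in a single chunk with the old 128MiB default, whereas the 4MiB default splits them into several chunks. Callers that set `documents_chunk_size` explicitly are unaffected by this change.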