MeiliSearch

mirror of https://github.com/meilisearch/MeiliSearch synced 2025-07-04 20:37:15 +02:00

History

meili-bors[bot] ee62d9ce30 Merge #4845	2024-08-19 06:29:48 +00:00

4845: Fix perf regression facet strings r=ManyTheFish a=dureuill

Benchmarks between v1.9 and v1.10 show a performance regression of about x2 (+3dB regression) for most indexing workloads (+44s for hackernews). [Benchmark interpretation in the engine weekly meeting](https://www.notion.so/meilisearch/Engine-weekly-4d49560d374c4a87b4e3d126a261d4a0?pvs=4#98a709683276450295fcfe1f8ea5cef3).

- Initial investigation pointed to #4819 as the origin of the regression.
- Further investigation points towards the hypernormalization of each facet value in `extract_facet_string_docids`
- Most of the slowdown is in `normalize_facet_strings`, and precisely in `detection.language()`.

This PR improves the situation (-10s compared with `main` for hackernews, so only +34s regression compared with `v1.9`) by skipping normalization when it can be skipped.

I'm not sure how to fix the root cause though. Should we skip facet locale normalization for now? Cc `@ManyTheFish`

---

Tentative resolution options:

1. remove locale normalization from facet. I'm not sure why this is required, I believe we weren't doing this before, so maybe we can stop doing that again.
2. don't do language detection when it can be helped: won't help with the regressions in benchmark, but maybe we can skip language detection when the locales contain only one language?
3. use a faster language detection library: `@Kerollmops` told me about https://github.com/quickwit-oss/whichlang which bolsters x10 to x100 throughput compared with whatlang. Should we consider replacing whatlang with whichlang?

Now I understand whichlang supports fewer languages than whatlang, so I also suggest:

4. use whichlang when the list of locales is empty (autodetection), or when it only contains locales that whichlang can detect. If the list of locales contains locales that whichlang cannot detect, then use whatlang instead.

---

> [!CAUTION]
> this PR contains a commit that adds detailed spans, that were used to detect which part of `extract_facet_string_docids` was taking too much time. As this commit adds spans that are called too often and adds 7s overhead, it should be removed before landing.

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
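
Options 2 and 4 above can be read as a small decision procedure over the configured locale list. The following is only a sketch of that reading, not the code merged in #4845: `Lang`, `Strategy`, `FAST_SUPPORTED`, and `choose_strategy` are hypothetical names, the supported-language set is illustrative rather than whichlang's real list, and neither whatlang nor whichlang is actually called.

```rust
/// Hypothetical language identifiers; not milli's real language type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Lang {
    Eng,
    Fra,
    Jpn,
    Tha,
}

/// Which detection path to take for a facet value.
#[derive(Debug, PartialEq, Eq)]
enum Strategy {
    /// Exactly one locale is configured, so there is nothing to detect (option 2).
    Skip(Lang),
    /// Fast detector with a limited language set (stand-in for whichlang).
    Fast,
    /// Slower detector with broader coverage (stand-in for whatlang).
    Broad,
}

/// Assumed set of languages the fast detector can recognise (illustrative only).
const FAST_SUPPORTED: &[Lang] = &[Lang::Eng, Lang::Fra, Lang::Jpn];

/// Picks a strategy from the list of allowed locales, following options 2 and 4.
fn choose_strategy(locales: &[Lang]) -> Strategy {
    match locales {
        // A single locale: skip detection entirely.
        [only] => Strategy::Skip(*only),
        // Empty list means autodetection: use the fast detector.
        [] => Strategy::Fast,
        // Every locale is covered by the fast detector.
        many if many.iter().all(|l| FAST_SUPPORTED.contains(l)) => Strategy::Fast,
        // At least one locale is outside the fast detector's coverage.
        _ => Strategy::Broad,
    }
}

fn main() {
    println!("{:?}", choose_strategy(&[Lang::Fra]));
    println!("{:?}", choose_strategy(&[]));
    println!("{:?}", choose_strategy(&[Lang::Eng, Lang::Jpn]));
    println!("{:?}", choose_strategy(&[Lang::Eng, Lang::Tha]));
}
```

Keeping the choice in one pure function would let the per-value hot path in `normalize_facet_strings` decide once per settings change instead of once per facet value, though whether that fits the actual code structure is an assumption.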
documents	Make milli use edition 2021 (#4770)	2024-07-09 17:25:39 +02:00
facet	Make milli use edition 2021 (#4770)	2024-07-09 17:25:39 +02:00
heed_codec	Implement localized attributes settings	2024-07-25 10:51:27 +02:00
prompt	Remove prompt strategy and fallback	2023-12-14 16:08:41 +01:00
search	Merge #4845	2024-08-19 06:29:48 +00:00
snapshots/index.rs	always push the user defined vectors in arroy	2024-06-06 11:39:29 +02:00
update	Merge #4845	2024-08-19 06:29:48 +00:00
vector	Specialize authorized error message depending on config source	2024-07-31 15:03:44 +02:00
asc_desc.rs	fmt	2023-03-30 23:37:26 +02:00
criterion.rs	update the syntax of the geoboundingbox filter to uses brackets instead of parens around lat and lng	2023-02-06 16:50:27 +01:00
error.rs	Improve errors when indexing documents with a user provided embedder	2024-07-16 13:39:01 +02:00
external_documents_ids.rs	Make milli use edition 2021 (#4770)	2024-07-09 17:25:39 +02:00
fieldids_weights_map.rs	makes clippy and fmt happy	2024-06-06 11:39:29 +02:00
fields_ids_map.rs	provide a method to get all the nested fields ids from a name	2024-06-06 11:36:11 +02:00
index.rs	Use wrapper that forces the desired date format	2024-07-31 17:12:19 +02:00
lib.rs	fix clippy	2024-07-25 10:52:56 +02:00
localized_attributes_rules.rs	Fix PR comments	2024-07-25 10:52:56 +02:00
order_by_map.rs	Revert "Revert "Merge remote-tracking branch 'origin/main' into release-v1.7.1""	2024-03-20 10:08:28 +01:00
proximity.rs	Change the naming of attributeScale and wordScale into byAttribute and byWord	2023-12-14 16:31:00 +01:00
score_details.rs	Do not fail sort comparisons when the field name or target point are different	2024-07-11 16:28:14 +02:00
snapshot_tests.rs	Make milli use edition 2021 (#4770)	2024-07-09 17:25:39 +02:00
thread_pool_no_abort.rs	Introduce the ThreadPoolNoAbort wrapper	2024-04-24 16:40:12 +02:00