Mirror of https://github.com/meilisearch/MeiliSearch, synced 2024-12-05 02:55:46 +01:00 (commit 5e0485d8dd)
4131: Reduce proximity range from 7 to 3 r=Kerollmops a=ManyTheFish

## Summary

This PR aims to reduce the impact of the proximity databases on the indexing time and on the database size by reducing the maximum distance between two words to be indexed in the proximity database.

## Stats

### Impact on database size and indexing time

![Impact on datasets](https://github.com/meilisearch/meilisearch/assets/6482087/28ed3d96-bdde-41c1-bdac-e90c1b1dbb23)

### Impact on search relevancy

<details>

| dataset_name | host_name | Relevancy rate (Precision) | completion_rate 25.00% | completion_rate 50.00% | completion_rate 75.00% | completion_rate 100.00% |
|--------------|------------------|---------------|--------:|--------:|--------:|--------:|
| FBIS | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | 1_4_0 | percentile-50 | 0.00% | 0.00% | 5.00% | 5.56% |
| FBIS | 1_4_0 | percentile-75 | 0.00% | 12.50% | 35.00% | 45.00% |
| FBIS | 1_4_0 | percentile-90 | 20.00% | 40.00% | | 100.00% |
| FBIS | 1_4_0 | average | 5.78% | 11.16% | 21.90% | 26.29% |
| FBIS | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | reduce_proximity | percentile-50 | 0.00% | 0.00% | 5.00% | 5.56% |
| FBIS | reduce_proximity | percentile-75 | 0.00% | 15.00% | 35.00% | 40.00% |
| FBIS | reduce_proximity | percentile-90 | 20.00% | 40.00% | 85.00% | 100.00% |
| FBIS | reduce_proximity | average | 5.55% | 11.34% | 21.75% | 26.14% |
| FR94 | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | 1_4_0 | percentile-50 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | 1_4_0 | percentile-75 | 0.00% | 5.00% | 15.00% | 42.11% |
| FR94 | 1_4_0 | percentile-90 | 15.00% | 54.55% | 100.00% | 100.00% |
| FR94 | 1_4_0 | average | 5.95% | 12.07% | 18.70% | 25.57% |
| FR94 | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | reduce_proximity | percentile-50 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | reduce_proximity | percentile-75 | 0.00% | 5.00% | 15.00% | 42.11% |
| FR94 | reduce_proximity | percentile-90 | 15.00% | 54.55% | 100.00% | 100.00% |
| FR94 | reduce_proximity | average | 5.79% | 12.00% | 18.70% | 25.53% |
| FT | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | 1_4_0 | percentile-50 | 0.00% | 0.00% | 5.00% | 10.00% |
| FT | 1_4_0 | percentile-75 | 0.00% | 15.00% | 30.00% | 40.00% |
| FT | 1_4_0 | percentile-90 | 20.00% | 50.00% | 65.00% | 100.00% |
| FT | 1_4_0 | average | 5.08% | 12.58% | 20.00% | 25.49% |
| FT | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | reduce_proximity | percentile-50 | 0.00% | 0.00% | 5.00% | 10.00% |
| FT | reduce_proximity | percentile-75 | 0.00% | 15.00% | 30.00% | 40.00% |
| FT | reduce_proximity | percentile-90 | 10.00% | 45.00% | 60.00% | 100.00% |
| FT | reduce_proximity | average | 5.01% | 12.64% | 20.10% | 25.53% |
| LAT | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | 1_4_0 | percentile-50 | 0.00% | 0.00% | 5.00% | 5.00% |
| LAT | 1_4_0 | percentile-75 | 5.00% | 15.00% | 30.00% | 30.00% |
| LAT | 1_4_0 | percentile-90 | 15.00% | 45.00% | 60.00% | 80.00% |
| LAT | 1_4_0 | average | 4.80% | 11.80% | 17.88% | 21.62% |
| LAT | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | reduce_proximity | percentile-50 | 0.00% | 0.00% | 5.00% | 5.00% |
| LAT | reduce_proximity | percentile-75 | 0.00% | 11.11% | 25.00% | 35.00% |
| LAT | reduce_proximity | percentile-90 | 15.00% | 45.00% | 55.00% | 80.00% |
| LAT | reduce_proximity | average | 4.43% | 11.23% | 17.32% | 21.45% |

</details>

### Impact on Search time

| dataset_name | host_name | 25.00% | 50.00% | 75.00% | 100.00% | Average |
|--------------|------------------|------------:|------------:|------------:|------------:|-------------|
| FBIS | 1_4_0 | 3.45 | 7.446666667 | 9.773489933 | 9.620300752 | 7.572614338 |
| FBIS | reduce_proximity | 2.983333333 | 5.316666667 | 6.911073826 | 7.637218045 | 5.712072968 |
| FR94 | 1_4_0 | 2.236666667 | 4.45 | 5.523489933 | 4.560150376 | 4.192576744 |
| FR94 | reduce_proximity | 2.09 | 3.991666667 | 4.981543624 | 4.266917293 | 3.832531896 |
| FT | 1_4_0 | 5.956666667 | 9.656666667 | 13.86912752 | 10.83270677 | 10.0787919 |
| FT | reduce_proximity | 4.51 | 5.981666667 | 7.701342282 | 6.766917293 | 6.23998156 |
| LAT | 1_4_0 | 5.856666667 | 9.233333333 | 12.98322148 | 10.78759398 | 9.715203865 |
| LAT | reduce_proximity | 6.91 | 6.706666667 | 8.463087248 | 8.265037594 | 7.586197877 |

## Technical approach

- Ensure the MAX_DISTANCE constant is used everywhere needed
- Reduce the MAX_DISTANCE from 8 to 4

## Related

TBD

Co-authored-by: ManyTheFish <many@meilisearch.com>
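
To make the effect of the `MAX_DISTANCE` change more concrete, here is a minimal sketch of how a cap on word distance limits the pairs that end up in a proximity database. It is not milli's actual extractor: the function name `word_pair_proximities` is hypothetical, and the sketch assumes that proximity is the positional difference between two words and that `MAX_DISTANCE` is an exclusive upper bound (so a value of 4 keeps proximities 1 to 3, matching the PR title).

```rust
use std::collections::BTreeMap;

/// Exclusive upper bound on indexed proximities (assumed semantics).
/// The PR lowers this value from 8 to 4.
const MAX_DISTANCE: u32 = 4;

/// Hypothetical helper, not part of milli: returns the smallest proximity
/// observed for each ordered pair of words, ignoring pairs whose proximity
/// is greater than or equal to `max_distance`.
fn word_pair_proximities<'a>(
    words: &[&'a str],
    max_distance: u32,
) -> BTreeMap<(&'a str, &'a str), u32> {
    let mut pairs = BTreeMap::new();
    for (i, &left) in words.iter().enumerate() {
        for (offset, &right) in words[i + 1..].iter().enumerate() {
            // Adjacent words have a proximity of 1.
            let proximity = offset as u32 + 1;
            if proximity >= max_distance {
                // Every later word is even farther away, so stop here.
                break;
            }
            pairs
                .entry((left, right))
                .and_modify(|p: &mut u32| *p = (*p).min(proximity))
                .or_insert(proximity);
        }
    }
    pairs
}
```

Under these assumptions, a five-word text produces 9 pairs with a bound of `MAX_DISTANCE` (4) instead of 10 with a bound of 8. On longer texts the gap widens, since each word is paired with at most 3 following words rather than 7.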
- examples
- src
- tests
- Cargo.toml
- README.md
a concurrent indexer combined with fast and relevant search algorithms
Introduction
This crate contains the internal engine used by Meilisearch.
It is a library that manages exactly one index; Meilisearch itself handles multiple indexes. Milli does not persist pending updates in a store: that is the job of a layer above it, which is why Milli can only process one update at a time.
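
The division of responsibilities described above can be sketched as follows. This is only an illustration under stated assumptions: `SingleIndex`, `Update`, and `UpdateQueue` are hypothetical names, not milli's or Meilisearch's real API. The engine applies one update at a time, while a separate layer owns the list of pending updates.

```rust
use std::collections::VecDeque;

/// A change to apply to the index (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
enum Update {
    AddDocument(String),
    ClearDocuments,
}

/// Hypothetical stand-in for an engine that manages a single index.
#[derive(Debug, Default)]
struct SingleIndex {
    documents: Vec<String>,
}

impl SingleIndex {
    /// Applies exactly one update. Taking `&mut self` means two updates
    /// can never be processed at the same time.
    fn apply(&mut self, update: Update) {
        match update {
            Update::AddDocument(doc) => self.documents.push(doc),
            Update::ClearDocuments => self.documents.clear(),
        }
    }
}

/// The layer above the engine: it stores pending updates, which the
/// engine itself does not do.
#[derive(Debug, Default)]
struct UpdateQueue {
    pending: VecDeque<Update>,
}

impl UpdateQueue {
    /// Records an update to be processed later, in arrival order.
    fn register(&mut self, update: Update) {
        self.pending.push_back(update);
    }

    /// Hands the oldest pending update to the index.
    /// Returns `false` when there was nothing left to process.
    fn process_next(&mut self, index: &mut SingleIndex) -> bool {
        match self.pending.pop_front() {
            Some(update) => {
                index.apply(update);
                true
            }
            None => false,
        }
    }
}
```

Keeping the queue outside the engine lets the upper layer decide how updates are stored and scheduled, for example across several indexes, while the engine stays focused on a single index.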