mirror of
https://github.com/meilisearch/MeiliSearch
synced 2024-12-29 07:51:39 +01:00
5e0485d8dd
4131: Reduce proximity range from 7 to 3 r=Kerollmops a=ManyTheFish

## Summary

This PR aims to reduce the impact of the proximity databases on the indexing time and on the database size by reducing the maximum distance between two words to be indexed in the proximity database.

## Stats

### Impact on database size and indexing time

![Impact on datasets](https://github.com/meilisearch/meilisearch/assets/6482087/28ed3d96-bdde-41c1-bdac-e90c1b1dbb23)

### Impact on search relevancy

<details>

| dataset_name | host_name | Relevancy rate (Precision) | completion_rate 25.00% | completion_rate 50.00% | completion_rate 75.00% | completion_rate 100.00% |
|--------------|------------------|------------------------------------|-----------------|-----------------|-----------------|-----------------|
| FBIS | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | 1_4_0 | percentile-50 | 0.00% | 0.00% | 5.00% | 5.56% |
| FBIS | 1_4_0 | percentile-75 | 0.00% | 12.50% | 35.00% | 45.00% |
| FBIS | 1_4_0 | percentile-90 | 20.00% | 40.00% | | 100.00% |
| FBIS | 1_4_0 | average | 5.78% | 11.16% | 21.90% | 26.29% |
| FBIS | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | reduce_proximity | percentile-50 | 0.00% | 0.00% | 5.00% | 5.56% |
| FBIS | reduce_proximity | percentile-75 | 0.00% | 15.00% | 35.00% | 40.00% |
| FBIS | reduce_proximity | percentile-90 | 20.00% | 40.00% | 85.00% | 100.00% |
| FBIS | reduce_proximity | average | 5.55% | 11.34% | 21.75% | 26.14% |
| FR94 | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | 1_4_0 | percentile-50 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | 1_4_0 | percentile-75 | 0.00% | 5.00% | 15.00% | 42.11% |
| FR94 | 1_4_0 | percentile-90 | 15.00% | 54.55% | 100.00% | 100.00% |
| FR94 | 1_4_0 | average | 5.95% | 12.07% | 18.70% | 25.57% |
| FR94 | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | reduce_proximity | percentile-50 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | reduce_proximity | percentile-75 | 0.00% | 5.00% | 15.00% | 42.11% |
| FR94 | reduce_proximity | percentile-90 | 15.00% | 54.55% | 100.00% | 100.00% |
| FR94 | reduce_proximity | average | 5.79% | 12.00% | 18.70% | 25.53% |
| FT | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | 1_4_0 | percentile-50 | 0.00% | 0.00% | 5.00% | 10.00% |
| FT | 1_4_0 | percentile-75 | 0.00% | 15.00% | 30.00% | 40.00% |
| FT | 1_4_0 | percentile-90 | 20.00% | 50.00% | 65.00% | 100.00% |
| FT | 1_4_0 | average | 5.08% | 12.58% | 20.00% | 25.49% |
| FT | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | reduce_proximity | percentile-50 | 0.00% | 0.00% | 5.00% | 10.00% |
| FT | reduce_proximity | percentile-75 | 0.00% | 15.00% | 30.00% | 40.00% |
| FT | reduce_proximity | percentile-90 | 10.00% | 45.00% | 60.00% | 100.00% |
| FT | reduce_proximity | average | 5.01% | 12.64% | 20.10% | 25.53% |
| LAT | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | 1_4_0 | percentile-50 | 0.00% | 0.00% | 5.00% | 5.00% |
| LAT | 1_4_0 | percentile-75 | 5.00% | 15.00% | 30.00% | 30.00% |
| LAT | 1_4_0 | percentile-90 | 15.00% | 45.00% | 60.00% | 80.00% |
| LAT | 1_4_0 | average | 4.80% | 11.80% | 17.88% | 21.62% |
| LAT | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | reduce_proximity | percentile-50 | 0.00% | 0.00% | 5.00% | 5.00% |
| LAT | reduce_proximity | percentile-75 | 0.00% | 11.11% | 25.00% | 35.00% |
| LAT | reduce_proximity | percentile-90 | 15.00% | 45.00% | 55.00% | 80.00% |
| LAT | reduce_proximity | average | 4.43% | 11.23% | 17.32% | 21.45% |

</details>

### Impact on Search time

| dataset_name | host_name | 25.00% | 50.00% | 75.00% | 100.00% | Average |
|--------------|------------------|------------:|------------:|------------:|------------:|-------------|
| FBIS | 1_4_0 | 3.45 | 7.446666667 | 9.773489933 | 9.620300752 | 7.572614338 |
| FBIS | reduce_proximity | 2.983333333 | 5.316666667 | 6.911073826 | 7.637218045 | 5.712072968 |
| FR94 | 1_4_0 | 2.236666667 | 4.45 | 5.523489933 | 4.560150376 | 4.192576744 |
| FR94 | reduce_proximity | 2.09 | 3.991666667 | 4.981543624 | 4.266917293 | 3.832531896 |
| FT | 1_4_0 | 5.956666667 | 9.656666667 | 13.86912752 | 10.83270677 | 10.0787919 |
| FT | reduce_proximity | 4.51 | 5.981666667 | 7.701342282 | 6.766917293 | 6.23998156 |
| LAT | 1_4_0 | 5.856666667 | 9.233333333 | 12.98322148 | 10.78759398 | 9.715203865 |
| LAT | reduce_proximity | 6.91 | 6.706666667 | 8.463087248 | 8.265037594 | 7.586197877 |

## Technical approach

- Ensure the MAX_DISTANCE constant is used everywhere needed
- Reduce the MAX_DISTANCE from 8 to 4

## Related

TBD

Co-authored-by: ManyTheFish <many@meilisearch.com>
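
To illustrate why lowering `MAX_DISTANCE` shrinks the proximity database, here is a minimal sketch of word-pair extraction. It is not the actual indexing code: the helper `word_pair_proximities` is hypothetical, and it assumes that `MAX_DISTANCE` is an exclusive bound (so a value of 4 keeps proximities 1 to 3, which would match the "7 to 3" in the title) and that adjacent words have a proximity of 1.

```rust
use std::collections::BTreeMap;

/// Assumed to be an exclusive upper bound on the stored word distance.
/// The PR lowers this value from 8 to 4.
pub const MAX_DISTANCE: u32 = 4;

/// Hypothetical helper, not the real indexer: for every ordered pair of
/// words closer than `max_distance`, keep the smallest distance seen.
pub fn word_pair_proximities(
    words: &[&str],
    max_distance: u32,
) -> BTreeMap<(String, String), u32> {
    let mut pairs = BTreeMap::new();
    for (i, left) in words.iter().enumerate() {
        // Look ahead only while the distance stays below the bound.
        for (offset, right) in words[i + 1..].iter().enumerate() {
            let distance = offset as u32 + 1;
            if distance >= max_distance {
                break;
            }
            let key = (left.to_string(), right.to_string());
            let entry = pairs.entry(key).or_insert(distance);
            if distance < *entry {
                *entry = distance;
            }
        }
    }
    pairs
}
```

Under these assumptions, a text of nine distinct words yields 21 pairs with a bound of 4 and 35 pairs with a bound of 8, which gives a rough sense of where the savings in indexing time and database size could come from.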