meili-bors[bot]
|
5e0485d8dd
|
Merge #4131
4131: Reduce proximity range from 7 to 3 r=Kerollmops a=ManyTheFish
## Summary
This PR aims to reduce the impact of the proximity databases on indexing time and database size by lowering the maximum distance at which two words are still indexed together in the proximity database.
## Stats
### Impact on database size and indexing time
![Impact on datasets](https://github.com/meilisearch/meilisearch/assets/6482087/28ed3d96-bdde-41c1-bdac-e90c1b1dbb23)
### Impact on search relevancy
<details>
| dataset_name | host_name | Relevancy rate (Precision) | completion_rate 25.00% | completion_rate 50.00% | completion_rate 75.00% | completion_rate 100.00% |
|--------------|------------------|------------------------------------|-----------------|-----------------|-----------------|-----------------|
| FBIS | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | 1_4_0 | percentile-50 | 0.00% | 0.00% | 5.00% | 5.56% |
| FBIS | 1_4_0 | percentile-75 | 0.00% | 12.50% | 35.00% | 45.00% |
| FBIS | 1_4_0 | percentile-90 | 20.00% | 40.00% | | 100.00% |
| FBIS | 1_4_0 | average | 5.78% | 11.16% | 21.90% | 26.29% |
| FBIS | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | reduce_proximity | percentile-50 | 0.00% | 0.00% | 5.00% | 5.56% |
| FBIS | reduce_proximity | percentile-75 | 0.00% | 15.00% | 35.00% | 40.00% |
| FBIS | reduce_proximity | percentile-90 | 20.00% | 40.00% | 85.00% | 100.00% |
| FBIS | reduce_proximity | average | 5.55% | 11.34% | 21.75% | 26.14% |
| FR94 | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | 1_4_0 | percentile-50 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | 1_4_0 | percentile-75 | 0.00% | 5.00% | 15.00% | 42.11% |
| FR94 | 1_4_0 | percentile-90 | 15.00% | 54.55% | 100.00% | 100.00% |
| FR94 | 1_4_0 | average | 5.95% | 12.07% | 18.70% | 25.57% |
| FR94 | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | reduce_proximity | percentile-50 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | reduce_proximity | percentile-75 | 0.00% | 5.00% | 15.00% | 42.11% |
| FR94 | reduce_proximity | percentile-90 | 15.00% | 54.55% | 100.00% | 100.00% |
| FR94 | reduce_proximity | average | 5.79% | 12.00% | 18.70% | 25.53% |
| FT | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | 1_4_0 | percentile-50 | 0.00% | 0.00% | 5.00% | 10.00% |
| FT | 1_4_0 | percentile-75 | 0.00% | 15.00% | 30.00% | 40.00% |
| FT | 1_4_0 | percentile-90 | 20.00% | 50.00% | 65.00% | 100.00% |
| FT | 1_4_0 | average | 5.08% | 12.58% | 20.00% | 25.49% |
| FT | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | reduce_proximity | percentile-50 | 0.00% | 0.00% | 5.00% | 10.00% |
| FT | reduce_proximity | percentile-75 | 0.00% | 15.00% | 30.00% | 40.00% |
| FT | reduce_proximity | percentile-90 | 10.00% | 45.00% | 60.00% | 100.00% |
| FT | reduce_proximity | average | 5.01% | 12.64% | 20.10% | 25.53% |
| LAT | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | 1_4_0 | percentile-50 | 0.00% | 0.00% | 5.00% | 5.00% |
| LAT | 1_4_0 | percentile-75 | 5.00% | 15.00% | 30.00% | 30.00% |
| LAT | 1_4_0 | percentile-90 | 15.00% | 45.00% | 60.00% | 80.00% |
| LAT | 1_4_0 | average | 4.80% | 11.80% | 17.88% | 21.62% |
| LAT | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | reduce_proximity | percentile-50 | 0.00% | 0.00% | 5.00% | 5.00% |
| LAT | reduce_proximity | percentile-75 | 0.00% | 11.11% | 25.00% | 35.00% |
| LAT | reduce_proximity | percentile-90 | 15.00% | 45.00% | 55.00% | 80.00% |
| LAT | reduce_proximity | average | 4.43% | 11.23% | 17.32% | 21.45% |
</details>
### Impact on search time
| dataset_name | host_name | 25.00% | 50.00% | 75.00% | 100.00% | Average |
|--------------|------------------|------------:|------------:|------------:|------------:|-------------|
| FBIS | 1_4_0 | 3.45 | 7.446666667 | 9.773489933 | 9.620300752 | 7.572614338 |
| FBIS | reduce_proximity | 2.983333333 | 5.316666667 | 6.911073826 | 7.637218045 | 5.712072968 |
| FR94 | 1_4_0 | 2.236666667 | 4.45 | 5.523489933 | 4.560150376 | 4.192576744 |
| FR94 | reduce_proximity | 2.09 | 3.991666667 | 4.981543624 | 4.266917293 | 3.832531896 |
| FT | 1_4_0 | 5.956666667 | 9.656666667 | 13.86912752 | 10.83270677 | 10.0787919 |
| FT | reduce_proximity | 4.51 | 5.981666667 | 7.701342282 | 6.766917293 | 6.23998156 |
| LAT | 1_4_0 | 5.856666667 | 9.233333333 | 12.98322148 | 10.78759398 | 9.715203865 |
| LAT | reduce_proximity | 6.91 | 6.706666667 | 8.463087248 | 8.265037594 | 7.586197877 |
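To read the Average column as a relative change, the two rows of each dataset can be compared directly. The following is a minimal sketch, not part of the PR: the helper name `relative_reduction` is hypothetical, the figures are copied from the Average column above, and the values are treated as unitless because the table does not state a unit.

```rust
/// Hypothetical helper: fraction by which `after` is lower than `before`.
/// A positive result means the search time went down.
fn relative_reduction(before: f64, after: f64) -> f64 {
    (before - after) / before
}

/// Average search times copied from the table above:
/// (dataset, 1_4_0, reduce_proximity). The unit is not stated in the source.
const AVERAGES: [(&str, f64, f64); 4] = [
    ("FBIS", 7.572614338, 5.712072968),
    ("FR94", 4.192576744, 3.832531896),
    ("FT", 10.0787919, 6.23998156),
    ("LAT", 9.715203865, 7.586197877),
];

/// Builds one human-readable line per dataset.
fn report() -> Vec<String> {
    AVERAGES
        .iter()
        .map(|(name, before, after)| {
            format!("{name}: {:.1}% lower", relative_reduction(*before, *after) * 100.0)
        })
        .collect()
}
```

With these figures the average search time drops by roughly 25% on FBIS, 9% on FR94, 38% on FT and 22% on LAT.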
## Technical approach
- Ensure the MAX_DISTANCE constant is used everywhere needed
- Reduce MAX_DISTANCE from 8 to 4, which matches the reduction of the proximity range from 7 to 3 announced in the title
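
The effect of the constant can be illustrated with a small sketch. It is not the code from this PR: `indexed_proximity` and `word_pairs` are hypothetical helpers, and treating `MAX_DISTANCE` as an exclusive bound is an assumption inferred from the title (range 7 to 3) and the bullet above (constant 8 to 4).

```rust
/// Assumed exclusive upper bound on the distance between two indexed words.
/// The value 4 comes from the PR description; the exclusive reading is an assumption.
const MAX_DISTANCE: u32 = 4;

/// Hypothetical helper: proximity of two word positions, or `None` when the
/// pair is too far apart (or identical) and would not be stored.
fn indexed_proximity(left_pos: u32, right_pos: u32) -> Option<u32> {
    let distance = left_pos.abs_diff(right_pos);
    if distance > 0 && distance < MAX_DISTANCE {
        Some(distance)
    } else {
        None
    }
}

/// Hypothetical helper: every (word1, word2, proximity) triple that would be
/// kept for words sitting at consecutive positions 0, 1, 2, ...
fn word_pairs<'a>(words: &[&'a str]) -> Vec<(&'a str, &'a str, u32)> {
    let mut pairs = Vec::new();
    for (i, w1) in words.iter().enumerate() {
        for (j, w2) in words.iter().enumerate().skip(i + 1) {
            if let Some(prox) = indexed_proximity(i as u32, j as u32) {
                pairs.push((*w1, *w2, prox));
            }
        }
    }
    pairs
}
```

Under these assumptions, a five-word sentence produces 9 pairs instead of the 10 it would produce with a bound of 8, and the gap widens quickly on longer documents, which is where the savings in indexing time and database size come from.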
## Related
TBD
Co-authored-by: ManyTheFish <many@meilisearch.com>
|
2023-10-18 14:56:08 +00:00 |
|
ManyTheFish
|
27eec21415
|
Fix tests
|
2023-10-18 16:03:22 +02:00 |
|
Tamo
|
c0f2724c2d
|
Get rid of the newly introduced error code in favor of an io::Error
|
2023-10-10 15:12:23 +02:00 |
|
Tamo
|
d772073dfa
|
Use a bufreader every time there is a grenad<file>
|
2023-10-10 15:00:30 +02:00 |
|
Kerollmops
|
eef95de30e
|
First iteration on exposing puffin profiling
|
2023-07-18 17:38:13 +02:00 |
|
Loïc Lecrenier
|
777b387dc4
|
Avoid a prefix-related worst-case scenario in the proximity criterion
|
2022-12-22 12:08:00 +01:00 |
|
Louis Dureuil
|
ad9937c755
|
Fix tests after adding DeletionStrategy
|
2022-12-19 10:07:17 +01:00 |
|
Loïc Lecrenier
|
cda4ba2bb6
|
Add document import tests
|
2022-12-05 12:02:49 +01:00 |
|
Loïc Lecrenier
|
f2cf981641
|
Add more tests and allow disabling of soft-deletion outside of tests
Also allow disabling soft-deletion in the IndexDocumentsConfig
|
2022-12-05 10:51:01 +01:00 |
|
Loïc Lecrenier
|
777eb3fa00
|
Add insta-snaps for test of bug 3043
|
2022-11-17 12:21:27 +01:00 |
|
Loïc Lecrenier
|
f00108d2ec
|
Fix name of bug in reproduction test
|
2022-11-17 11:29:18 +01:00 |
|
Loïc Lecrenier
|
f7c8730d09
|
Fix bug in prefix DB indexing
Where the batch's information was not properly updated in cases
where only the proximity changed between two consecutive word pair
proximities.
Closes https://github.com/meilisearch/meilisearch/issues/3043
|
2022-11-17 11:29:18 +01:00 |
|
unvalley
|
c7322f704c
|
Fix cargo clippy errors
Don't apply clippy for tests for now
Fix clippy warnings of filter-parser package
parent 8352febd646ec4bcf56a44161e5c4dce0e55111f
author unvalley <38400669+unvalley@users.noreply.github.com> 1666325847 +0900
committer unvalley <kirohi.code@gmail.com> 1666791316 +0900
Update .github/workflows/rust.yml
Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
Allow clippy lint too_many_arguments
Allow clippy lint needless_collect
Allow clippy lint too_many_arguments and type_complexity
Fix for clippy warnings comparison_chains
Fix for clippy warnings vec_init_then_push
Allow clippy lint should_implement_trait
Allow clippy lint drop_non_drop
Fix lifetime clippy warnings in filter-parser
Execute cargo fmt
Fix clippy remaining warnings
Fix clippy remaining warnings again and allow lint on each place
|
2022-10-27 01:04:23 +09:00 |
|
unvalley
|
811f156031
|
Execute cargo clippy --fix
|
2022-10-27 01:00:00 +09:00 |
|
Ewan Higgs
|
2ce025a906
|
Fixes after rebase to fix new issues.
|
2022-10-25 20:58:31 +02:00 |
|
Loïc Lecrenier
|
9a569d73d1
|
Minor code style change
|
2022-10-24 15:30:43 +02:00 |
|
Loïc Lecrenier
|
a983129613
|
Apply suggestions from code review
|
2022-10-20 09:49:37 +02:00 |
|
Loïc Lecrenier
|
ab2f6f3aa4
|
Refine some details in word_prefix_pair_proximity indexing code
|
2022-10-18 10:37:34 +02:00 |
|
Loïc Lecrenier
|
178d00f93a
|
Cargo fmt
|
2022-10-18 10:37:34 +02:00 |
|
Loïc Lecrenier
|
072b576514
|
Fix proximity value in keys of prefix_word_pair_proximity_docids
|
2022-10-18 10:37:34 +02:00 |
|
Loïc Lecrenier
|
6c3a5d69e1
|
Update snapshots
|
2022-10-18 10:37:34 +02:00 |
|
Loïc Lecrenier
|
264a04922d
|
Add prefix_word_pair_proximity database
Similar to the word_prefix_pair_proximity one but instead the keys are:
(proximity, prefix, word2)
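
A key of this shape can be sketched as follows. This is only an illustration of the (proximity, prefix, word2) layout: the byte encoding (one proximity byte, the prefix, a 0 separator, then word2) and the names `encode_key` and `decode_key` are assumptions, not the codec used in the repository.

```rust
/// Hypothetical encoding of a (proximity, prefix, word2) key:
/// one proximity byte, the prefix bytes, a 0 separator, then the word2 bytes.
fn encode_key(proximity: u8, prefix: &str, word2: &str) -> Vec<u8> {
    let mut key = Vec::with_capacity(1 + prefix.len() + 1 + word2.len());
    key.push(proximity);
    key.extend_from_slice(prefix.as_bytes());
    key.push(0);
    key.extend_from_slice(word2.as_bytes());
    key
}

/// Inverse of `encode_key`; returns `None` on malformed input.
fn decode_key(bytes: &[u8]) -> Option<(u8, &str, &str)> {
    let (&proximity, rest) = bytes.split_first()?;
    // The first 0 byte separates the prefix from word2.
    let sep = rest.iter().position(|&b| b == 0)?;
    let prefix = std::str::from_utf8(&rest[..sep]).ok()?;
    let word2 = std::str::from_utf8(&rest[sep + 1..]).ok()?;
    Some((proximity, prefix, word2))
}
```

Putting the proximity first means that, in a store ordered by key bytes, all entries for a given proximity are contiguous, and within them all entries for a given prefix are contiguous.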
|
2022-10-18 10:37:34 +02:00 |
|