yudrywet
cf864a1c2e
chore: fix some typos in comments
...
Signed-off-by: yudrywet <yudeyao@yeah.net>
2024-04-14 20:11:34 +08:00
Louis Dureuil
6ebb6b55a6
Lazily embed, don't fail hybrid search on embedding failure
2024-04-04 15:58:17 +02:00
Louis Dureuil
928e6e4c05
Breaking change: remove vector for score details
2024-04-04 15:57:29 +02:00
Clément Renault
877f4b1045
Support negative phrases
2024-03-28 15:51:43 +01:00
Clément Renault
69f8b2730d
Fix the tests
2024-03-28 10:47:04 +01:00
Clément Renault
34262c7a0d
Add analytics for the negative operator
2024-03-26 18:01:27 +01:00
Clément Renault
1da9e0f246
Better support space around the negative operator (-)
2024-03-26 17:47:13 +01:00
Clément Renault
e4a3e603b3
Expose a first working version of the negative keyword
2024-03-26 17:47:13 +01:00
Tamo
6079141ea6
snapshot the scores side by side with the score details
2024-03-19 18:30:14 +01:00
Tamo
2c3af8e513
query the detailed score detail in the test
2024-03-19 18:09:02 +01:00
Tamo
b8cda6c300
fix the search cutoff and add a test
2024-03-19 10:35:47 +01:00
Tamo
4a467739cd
implements a first version of the cutoff without settings
2024-03-19 10:28:21 +01:00
shuangcui
5c95b5c933
chore: remove repetitive words
...
Signed-off-by: shuangcui <fliter@qq.com>
2024-03-14 21:28:55 +08:00
Louis Dureuil
25f64ce7df
Replace logging timer by spans
2024-03-05 11:05:42 +01:00
Louis Dureuil
452a343a2b
Fix imports
2024-02-28 18:09:40 +01:00
Tamo
e773dfa9ba
get rid of logs in milli and add logs for the bucket sort
2024-02-08 15:04:05 +01:00
ManyTheFish
5f5a486895
Reduce formatting time
2024-01-11 11:36:41 +01:00
ManyTheFish
5f4fc6c955
Add timer logs
2024-01-11 09:44:16 +01:00
Many the fish
9e1b458010
Merge branch 'main' into change-proximity-precision-settings
2023-12-18 09:08:47 +01:00
ManyTheFish
6425996e36
Change the naming of attributeScale and wordScale into byAttribute and byWord
2023-12-14 16:31:00 +01:00
Louis Dureuil
806e5b6899
Tests pass
2023-12-14 16:08:41 +01:00
Louis Dureuil
e0cc775dc4
Various changes
...
- DistributionShift in Search object (to be set from model in embed?)
- Fix issue where embedder index wasn't computed at search time
- Accept as default embedder either the "default" one, or the only embedder when there is only one
2023-12-14 16:08:41 +01:00
Louis Dureuil
922a640188
WIP multi embedders
...
fixed template bugs
2023-12-14 16:08:41 +01:00
Louis Dureuil
d4715e0c4d
Fix same vector sort bug
2023-12-14 16:08:41 +01:00
Louis Dureuil
11e2a2c1aa
Fix geosort bug
2023-12-14 16:08:41 +01:00
Louis Dureuil
65e49b7092
Remove stuff, add distribution shift (WIP)
2023-12-14 16:08:38 +01:00
Louis Dureuil
cb4ebe163e
WIP
2023-12-14 16:07:49 +01:00
Louis Dureuil
dde3a04679
WIP arroy integration
2023-12-14 16:07:49 +01:00
Louis Dureuil
13c2c6c16b
Small commit to add hybrid search and autoembedding
2023-12-14 16:07:48 +01:00
Clément Renault
56571f762a
Merge remote-tracking branch 'origin/main' into tmp-release-v1.5.1
2023-12-13 11:57:01 +01:00
ManyTheFish
467b49153d
Implement proximityPrecision setting on milli side
2023-12-06 15:49:02 +01:00
ManyTheFish
bddc168d83
List TODOs
2023-12-06 14:59:23 +01:00
ManyTheFish
3b3fa38f27
Put the restrict list in a sub-struct
2023-11-28 18:37:57 +01:00
ManyTheFish
d6c2ee15a9
Filter on attributes before computing the docids when attribute restriction is on
2023-11-28 14:55:29 +01:00
Clément Renault
d32eb11329
Move to the v0.20.0-alpha.9 of heed
2023-11-27 11:52:22 +01:00
Clément Renault
58dac8af42
Remove the panics and unwraps
2023-11-23 15:00:48 +01:00
Clément Renault
0dbf1a16ff
Make clippy happy
2023-11-23 14:11:38 +01:00
Clément Renault
0d4482625a
Make the changes to use heed v0.20-alpha.6
2023-11-23 11:43:58 +01:00
Clément Renault
7cb7e37ba8
Merge branch 'main' into tmp-release-v1.5.0
2023-11-21 16:30:46 +01:00
ManyTheFish
1f36410541
Update tests
2023-11-13 13:36:39 +01:00
Louis Dureuil
8c649d8061
Throw error when the vector search is sent with the wrong size
2023-11-13 09:57:42 +01:00
ManyTheFish
688266c83e
Remove word pair proximity prefix cache and compute it at search time
2023-11-08 14:16:01 +01:00
ManyTheFish
94206b0055
Update tests
2023-10-31 13:48:47 +01:00
ManyTheFish
1c5705c164
clean PR warnings
2023-10-30 11:22:05 +01:00
ManyTheFish
df9e5c8651
Generalize usage of CboRoaringBitmap codec to ease the use
2023-10-30 11:15:02 +01:00
ManyTheFish
17b647dfe5
Wip
2023-10-30 11:13:08 +01:00
Tamo
e7244aa485
fix warnings
2023-10-30 11:00:46 +01:00
Louis Dureuil
2bae9550c8
Add explanatory comment
2023-10-23 12:06:28 +02:00
Vivek Kumar
5fe7c4545a
compute all candidates correctly when skipping
2023-10-23 12:02:45 +02:00
meili-bors[bot]
5e0485d8dd
Merge #4131
...
4131: Reduce proximity range from 7 to 3 r=Kerollmops a=ManyTheFish
## Summary
This PR reduces the impact of the proximity databases on indexing time and database size by lowering the maximum distance between two words that get indexed in the proximity database.
## Stats
### Impact on database size and indexing time
![Impact on datasets](https://github.com/meilisearch/meilisearch/assets/6482087/28ed3d96-bdde-41c1-bdac-e90c1b1dbb23)
### Impact on search relevancy
<details>
| dataset_name | host_name | Relevancy rate (Precision) | completion_rate 25.00% | completion_rate 50.00% | completion_rate 75.00% | completion_rate 100.00% |
|--------------|------------------|------------------------------------|-----------------|-----------------|-----------------|-----------------|
| FBIS | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | 1_4_0 | percentile-50 | 0.00% | 0.00% | 5.00% | 5.56% |
| FBIS | 1_4_0 | percentile-75 | 0.00% | 12.50% | 35.00% | 45.00% |
| FBIS | 1_4_0 | percentile-90 | 20.00% | 40.00% | | 100.00% |
| FBIS | 1_4_0 | average | 5.78% | 11.16% | 21.90% | 26.29% |
| FBIS | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | reduce_proximity | percentile-50 | 0.00% | 0.00% | 5.00% | 5.56% |
| FBIS | reduce_proximity | percentile-75 | 0.00% | 15.00% | 35.00% | 40.00% |
| FBIS | reduce_proximity | percentile-90 | 20.00% | 40.00% | 85.00% | 100.00% |
| FBIS | reduce_proximity | average | 5.55% | 11.34% | 21.75% | 26.14% |
| FR94 | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | 1_4_0 | percentile-50 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | 1_4_0 | percentile-75 | 0.00% | 5.00% | 15.00% | 42.11% |
| FR94 | 1_4_0 | percentile-90 | 15.00% | 54.55% | 100.00% | 100.00% |
| FR94 | 1_4_0 | average | 5.95% | 12.07% | 18.70% | 25.57% |
| FR94 | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | reduce_proximity | percentile-50 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | reduce_proximity | percentile-75 | 0.00% | 5.00% | 15.00% | 42.11% |
| FR94 | reduce_proximity | percentile-90 | 15.00% | 54.55% | 100.00% | 100.00% |
| FR94 | reduce_proximity | average | 5.79% | 12.00% | 18.70% | 25.53% |
| FT | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | 1_4_0 | percentile-50 | 0.00% | 0.00% | 5.00% | 10.00% |
| FT | 1_4_0 | percentile-75 | 0.00% | 15.00% | 30.00% | 40.00% |
| FT | 1_4_0 | percentile-90 | 20.00% | 50.00% | 65.00% | 100.00% |
| FT | 1_4_0 | average | 5.08% | 12.58% | 20.00% | 25.49% |
| FT | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | reduce_proximity | percentile-50 | 0.00% | 0.00% | 5.00% | 10.00% |
| FT | reduce_proximity | percentile-75 | 0.00% | 15.00% | 30.00% | 40.00% |
| FT | reduce_proximity | percentile-90 | 10.00% | 45.00% | 60.00% | 100.00% |
| FT | reduce_proximity | average | 5.01% | 12.64% | 20.10% | 25.53% |
| LAT | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | 1_4_0 | percentile-50 | 0.00% | 0.00% | 5.00% | 5.00% |
| LAT | 1_4_0 | percentile-75 | 5.00% | 15.00% | 30.00% | 30.00% |
| LAT | 1_4_0 | percentile-90 | 15.00% | 45.00% | 60.00% | 80.00% |
| LAT | 1_4_0 | average | 4.80% | 11.80% | 17.88% | 21.62% |
| LAT | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | reduce_proximity | percentile-50 | 0.00% | 0.00% | 5.00% | 5.00% |
| LAT | reduce_proximity | percentile-75 | 0.00% | 11.11% | 25.00% | 35.00% |
| LAT | reduce_proximity | percentile-90 | 15.00% | 45.00% | 55.00% | 80.00% |
| LAT | reduce_proximity | average | 4.43% | 11.23% | 17.32% | 21.45% |
</details>
### Impact on Search time
| dataset_name | host_name | 25.00% | 50.00% | 75.00% | 100.00% | Average |
|--------------|------------------|------------:|------------:|------------:|------------:|-------------|
| FBIS | 1_4_0 | 3.45 | 7.446666667 | 9.773489933 | 9.620300752 | 7.572614338 |
| FBIS | reduce_proximity | 2.983333333 | 5.316666667 | 6.911073826 | 7.637218045 | 5.712072968 |
| FR94 | 1_4_0 | 2.236666667 | 4.45 | 5.523489933 | 4.560150376 | 4.192576744 |
| FR94 | reduce_proximity | 2.09 | 3.991666667 | 4.981543624 | 4.266917293 | 3.832531896 |
| FT | 1_4_0 | 5.956666667 | 9.656666667 | 13.86912752 | 10.83270677 | 10.0787919 |
| FT | reduce_proximity | 4.51 | 5.981666667 | 7.701342282 | 6.766917293 | 6.23998156 |
| LAT | 1_4_0 | 5.856666667 | 9.233333333 | 12.98322148 | 10.78759398 | 9.715203865 |
| LAT | reduce_proximity | 6.91 | 6.706666667 | 8.463087248 | 8.265037594 | 7.586197877 |
## Technical approach
- Ensure the MAX_DISTANCE constant is used everywhere needed
- Reduce the MAX_DISTANCE from 8 to 4
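To illustrate the idea behind the two points above, here is a minimal sketch of a word-pair window bounded by a single constant. It is not the milli implementation: the function name `word_pair_proximities`, the in-memory `BTreeMap`, and the definition of proximity as a plain difference of word positions are all assumptions made for this example. It also assumes `MAX_DISTANCE` is an exclusive bound, which is one way to read "from 8 to 4" alongside the "7 to 3" in the title.

```rust
use std::collections::BTreeMap;

/// Hypothetical constant mirroring the `MAX_DISTANCE` named above.
/// Assumption: it is an exclusive bound, so proximities 1..=3 are kept.
const MAX_DISTANCE: u32 = 4;

/// Returns, for each ordered word pair, the smallest proximity seen in `words`.
/// Assumption: proximity is the difference between the two word positions.
fn word_pair_proximities<'a>(words: &[&'a str]) -> BTreeMap<(&'a str, &'a str), u32> {
    let mut pairs = BTreeMap::new();
    for (i, &left) in words.iter().enumerate() {
        // Only visit the words that follow `left` inside the window.
        for (offset, &right) in words[i + 1..].iter().enumerate() {
            let proximity = offset as u32 + 1;
            if proximity >= MAX_DISTANCE {
                // Every later word is even farther away, so stop here.
                break;
            }
            // Keep the smallest proximity when the same pair occurs twice.
            pairs
                .entry((left, right))
                .and_modify(|p: &mut u32| *p = (*p).min(proximity))
                .or_insert(proximity);
        }
    }
    pairs
}
```

In this sketch, a five-word input such as `["the", "quick", "brown", "fox", "jumps"]` yields 9 pairs, and `("the", "jumps")` is not stored because its proximity reaches the bound. Reading the bound from one constant is what lets the window shrink by changing a single value.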
## Related
TBD
Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-10-18 14:56:08 +00:00