Clément Renault
531e3d7d6a
MultiOps trait for OR operations
2024-06-06 09:17:50 -04:00
Tamo
5d50850e12
always push the user defined vectors in arroy
2024-06-06 11:39:29 +02:00
meili-bors[bot]
fc584f1db3
Merge #4666
...
4666: Add a score threshold search parameter r=ManyTheFish a=dureuill
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4609
## What does this PR do?
- See [usage](https://meilisearch.notion.site/Filter-by-score-usage-224a183ce7b24ca99b6a9a8da755668a?pvs=25#95b76ded400342ba9ab3d67c734836f0 ) and [the known limitation](https://meilisearch.notion.site/Filter-by-score-usage-224a183ce7b24ca99b6a9a8da755668a?pvs=25#e4e32195bf0e4195b5daecdbb7a97a17 )
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-06-03 08:42:44 +00:00
Louis Dureuil
2b6db6541e
Changes after review
2024-06-03 10:30:00 +02:00
meili-bors[bot]
d6bd88ce4f
Merge #4667
...
4667: Frequency matching strategy r=Kerollmops a=ManyTheFish
# Pull Request
## Related issue
Fixes #3773
## What does this PR do?
- add test for matching strategy
- implement frequency matching strategy
See the [PRD for more details](https://www.notion.so/meilisearch/Frequency-Matching-Strategy-0f3ba08833a442a39590a53a1505ab00 ).
[Public API](https://www.notion.so/meilisearch/frequency-matching-strategy-89868fb7fc584026bc56e378eb854a7f ).
Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-05-30 14:53:31 +00:00
ManyTheFish
3f1a510069
Add tests and fix matching strategy
2024-05-30 12:02:42 +02:00
Louis Dureuil
4f03b0cf5b
Add ranking score threshold to similar
2024-05-30 11:20:50 +02:00
Louis Dureuil
aac1d769a7
Add ranking_score_threshold to milli
2024-05-29 14:17:09 +02:00
ManyTheFish
abdc4afcca
Implement Frequency matching strategy
2024-05-29 13:59:08 +02:00
Louis Dureuil
ca6cc4654b
Add similar route
2024-05-28 15:28:19 +02:00
Louis Dureuil
d35278320e
Add support functions for accessing arroy writers and readers
2024-05-28 15:27:43 +02:00
Louis Dureuil
02b3d82c60
filtered_universe accepts index and txn instead of SearchContext
2024-05-28 15:22:12 +02:00
Tamo
c78a2fa4f5
rename method and variable around the attributes to search on feature
2024-05-15 18:04:42 +02:00
Tamo
5542f1d9f1
get back to what we were doingb efore in the DB cache and with the restricted field id
2024-05-15 18:00:39 +02:00
Tamo
7ec4e2a3fb
apply all style review comments
2024-05-15 15:02:26 +02:00
Tamo
9fffb8e83d
make clippy happy
2024-05-14 17:36:32 +02:00
Tamo
caa6a7149a
make the attribute ranking rule use the weights and fix the tests
2024-05-14 17:36:32 +02:00
Tamo
685f452fb2
Fix the indexing of the searchable
2024-05-14 17:00:02 +02:00
Tamo
c22460045c
Stops returning an option in the internal searchable fields
2024-05-14 17:00:02 +02:00
meili-bors[bot]
4d5971f343
Merge #4621
...
4621: Bring back changes from v1.8.0 into main r=curquiza a=curquiza
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-05-06 13:46:39 +00:00
writegr
ab43a8a949
chore: fix some typos in comments
...
Signed-off-by: writegr <wellweek@outlook.com>
2024-04-18 14:12:52 +08:00
Clément Renault
c923adf222
Fix facet distribution for alpha on facet numbers
2024-04-17 16:31:16 +02:00
yudrywet
cf864a1c2e
chore: fix some typos in comments
...
Signed-off-by: yudrywet <yudeyao@yeah.net>
2024-04-14 20:11:34 +08:00
Louis Dureuil
1ff2a2d6fb
Add semanticHitCount
2024-04-04 16:04:06 +02:00
Louis Dureuil
6ebb6b55a6
Lazily embed, don't fail hybrid search on embedding failure
2024-04-04 15:58:17 +02:00
Louis Dureuil
928e6e4c05
Breaking change: remove vector for score details
2024-04-04 15:57:29 +02:00
Clément Renault
877f4b1045
Support negative phrases
2024-03-28 15:51:43 +01:00
Clément Renault
69f8b2730d
Fix the tests
2024-03-28 10:47:04 +01:00
Clément Renault
34262c7a0d
Add analytics for the negative operator
2024-03-26 18:01:27 +01:00
Clément Renault
1da9e0f246
Better support space around the negative operator (-)
2024-03-26 17:47:13 +01:00
Clément Renault
e4a3e603b3
Expose a first working version of the negative keyword
2024-03-26 17:47:13 +01:00
Tamo
6079141ea6
snapshot the scores side by side with the score details
2024-03-19 18:30:14 +01:00
Tamo
2c3af8e513
query the detailed score detail in the test
2024-03-19 18:09:02 +01:00
Louis Dureuil
098ab594eb
A score of 0.0 is now lesser than a sort result
...
handles the niche case 🐩 in the hybrid search where:
1. a sort ranking rule is the first rule.
2. the keyword search is skipped at the first rule.
3. the semantic search is not skipped at the first rule.
Previously, we would have the skipped search winning, whereas we want the non skipped one winning.
2024-03-19 17:32:32 +01:00
Tamo
7b9e0d2944
forward the degraded parameter to the hybrid search
2024-03-19 15:11:21 +01:00
Tamo
bfec9468d4
Update milli/src/search/mod.rs
...
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-03-19 14:49:15 +01:00
Tamo
b8cda6c300
fix the search cutoff and add a test
2024-03-19 10:35:47 +01:00
Tamo
4a467739cd
implements a first version of the cutoff without settings
2024-03-19 10:28:21 +01:00
shuangcui
5c95b5c933
chore: remove repetitive words
...
Signed-off-by: shuangcui <fliter@qq.com>
2024-03-14 21:28:55 +08:00
meili-bors[bot]
abd954755d
Merge #4476
...
4476: Make the `/facet-search` route use the `sortFacetValuesBy` setting r=irevoire a=Kerollmops
This PR fixes #4423 by ensuring that the `/facet-search` route uses the `sortFacetValuesBy` setting.
Note for the documentation team (to be moved in the tracking issue): Using the new `sortFacetValuesBy` setting can slow down the facet-search requests as Meilisearch iterates over the whole list of facet values and computes the count of documents on every entry. That is hardly or even impossible to optimize correctly.
### TODO
- [x] Create a custom HashMap wrapper for the facet `OrderBy` settings.
This wrapper will return the `OrderBy` setting of the facet, if not defined will use the default `*` one, and if not there either (strange) will fall back on the lexicographic one.
- [x] Create a `ValuesCollection` wrapper that implements the logic for the lexicographic and count order by.
- [x] Use it when there is no search query.
- [x] Use it when there is a search query with and without allowed typos.
- [x] Do not change the original logic, only use a wrapper.
- [x] Add tests
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-03-13 14:36:14 +00:00
Clément Renault
f3fc2bd01f
Address some issues with preallocations
2024-03-13 15:22:14 +01:00
Clément Renault
e0dac5a22f
Simplify the algorithm by using the new facet values collection wrapper
2024-03-13 11:31:34 +01:00
Clément Renault
b918b55c6b
Introduce a new facet value collection wrapper to simply the usage
2024-03-13 11:31:34 +01:00
Clément Renault
306b25ad3a
Move the searchForFacetValues struct into a dedicated module
2024-03-13 10:24:21 +01:00
Clément Renault
9f7a4fbfeb
Return the facets of a placeholder facet-search sorted by count
2024-03-13 10:09:01 +01:00
Clément Renault
69c118ef76
Extract the facet order before extracting the facets values
2024-03-12 10:35:39 +01:00
Louis Dureuil
25f64ce7df
Replace logging timer by spans
2024-03-05 11:05:42 +01:00
Louis Dureuil
452a343a2b
Fix imports
2024-02-28 18:09:40 +01:00
Tamo
e773dfa9ba
get rids of log in milli and add logs for the bucket sort
2024-02-08 15:04:05 +01:00
Louis Dureuil
dff2707471
Use MatchingWords from keyword search instead of the one from vector search
2024-02-01 10:33:27 +01:00
Clément Renault
9f9ad4cc05
Fix Clippy warnings
2024-01-16 15:27:24 +01:00
meili-bors[bot]
e93d36d5b9
Merge #4313
...
4313: Fix document formatting performances r=Kerollmops a=ManyTheFish
reduce the formatted option list to the attributes that should be formatted,
instead of all the attributes to display.
The time to compute the `format` list scales with the number of fields to format;
cumulated with `map_leaf_values` that iterates over all the nested fields, it gives a quadratic complexity:
`d*f` where `d` is the total number of fields to display and `f` is the total number of fields to format.
Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-01-11 14:19:44 +00:00
ManyTheFish
5f5a486895
Reduce formatting time
2024-01-11 11:36:41 +01:00
ManyTheFish
5f4fc6c955
Add timer logs
2024-01-11 09:44:16 +01:00
Clément Renault
3f3462ab62
Limit the number of values returned by the facet search
2024-01-10 16:54:08 +01:00
Many the fish
9e1b458010
Merge branch 'main' into change-proximity-precision-settings
2023-12-18 09:08:47 +01:00
ManyTheFish
6425996e36
Change the naming of attributeScale and wordScale into byAttribute and byWord
2023-12-14 16:31:00 +01:00
Louis Dureuil
87bba98bd8
Various changes
...
- fixed seed for arroy
- check vector dimensions as soon as it is provided to search
- don't embed whitespace
2023-12-14 16:08:42 +01:00
Louis Dureuil
217105b7da
hybrid search uses semantic ratio, error handling
2023-12-14 16:08:42 +01:00
ManyTheFish
9991152bbe
Add TODOs
2023-12-14 16:08:42 +01:00
Louis Dureuil
806e5b6899
Tests pass
2023-12-14 16:08:41 +01:00
Louis Dureuil
e0cc775dc4
Various changes
...
- DistributionShift in Search object (to be set from model in embed?)
- Fix issue where embedder index wasn't computed at search time
- Accept as default embedder either the "default" one, or the only embedder when there is only one
2023-12-14 16:08:41 +01:00
Louis Dureuil
922a640188
WIP multi embedders
...
fixed template bugs
2023-12-14 16:08:41 +01:00
Louis Dureuil
d4715e0c4d
Fix same vector sort bug
2023-12-14 16:08:41 +01:00
Louis Dureuil
11e2a2c1aa
Fix geosort bug
2023-12-14 16:08:41 +01:00
Louis Dureuil
65e49b7092
Remove stuff, add distribution shift (WIP)
2023-12-14 16:08:38 +01:00
Louis Dureuil
cb4ebe163e
WIP
2023-12-14 16:07:49 +01:00
Louis Dureuil
dde3a04679
WIP arroy integration
2023-12-14 16:07:49 +01:00
Louis Dureuil
13c2c6c16b
Small commit to add hybrid search and autoembedding
2023-12-14 16:07:48 +01:00
Clément Renault
56571f762a
Merge remote-tracking branch 'origin/main' into tmp-release-v1.5.1
2023-12-13 11:57:01 +01:00
ManyTheFish
467b49153d
Implement proximityPrecision setting on milli side
2023-12-06 15:49:02 +01:00
ManyTheFish
bddc168d83
List TODOs
2023-12-06 14:59:23 +01:00
ManyTheFish
3b3fa38f27
Put the restrict list in a sub-struct
2023-11-28 18:37:57 +01:00
ManyTheFish
d6c2ee15a9
Filter on attributes before computing the docids when attribute restriction is on
2023-11-28 14:55:29 +01:00
Clément Renault
d32eb11329
Move to the v0.20.0-alpha.9 of heed
2023-11-27 11:52:22 +01:00
Clément Renault
58dac8af42
Remove the panics and unwraps
2023-11-23 15:00:48 +01:00
Clément Renault
0dbf1a16ff
Make clippy happy
2023-11-23 14:11:38 +01:00
Clément Renault
0d4482625a
Make the changes to use heed v0.20-alpha.6
2023-11-23 11:43:58 +01:00
Clément Renault
7cb7e37ba8
Merge branch 'main' into tmp-release-v1.5.0
2023-11-21 16:30:46 +01:00
ManyTheFish
1f36410541
Update tests
2023-11-13 13:36:39 +01:00
Louis Dureuil
8c649d8061
Throw error when the vector search is sent with the wrong size
2023-11-13 09:57:42 +01:00
ManyTheFish
688266c83e
Remove word pair proximity prefix cache and compute it at search time
2023-11-08 14:16:01 +01:00
Louis Dureuil
1bccf2079e
Correctly mark non-tests as non-tests
2023-11-06 11:03:56 +01:00
ManyTheFish
94206b0055
Update tests
2023-10-31 13:48:47 +01:00
Louis Dureuil
113527f466
Remove soft-deleted related methods from Index
2023-10-30 11:41:22 +01:00
ManyTheFish
1c5705c164
clean PR warnings
2023-10-30 11:22:05 +01:00
ManyTheFish
df9e5c8651
Generalize usage of CboRoaringBitmap codec to ease the use
2023-10-30 11:15:02 +01:00
ManyTheFish
17b647dfe5
Wip
2023-10-30 11:13:08 +01:00
Tamo
e7244aa485
fix warnings
2023-10-30 11:00:46 +01:00
Louis Dureuil
2bae9550c8
Add explanatory comment
2023-10-23 12:06:28 +02:00
Vivek Kumar
5fe7c4545a
compute all candidates correctly when skipping
2023-10-23 12:02:45 +02:00
meili-bors[bot]
5e0485d8dd
Merge #4131
...
4131: Reduce proximity range from 7 to 3 r=Kerollmops a=ManyTheFish
## Summary
This PR aims to reduce the impact of the proximity databases on the indexing time and on the database size by reducing the maximum distance between two words to be indexed in the proximity database.
## Stats
### Impact on database size and indexing time
![Impact on datasets](https://github.com/meilisearch/meilisearch/assets/6482087/28ed3d96-bdde-41c1-bdac-e90c1b1dbb23 )
### Impact on search relevancy
<details>
| dataset_name | host_name | Relevancy rate (Precision) | completion_rate 25.00% | completion_rate 50.00% | completion_rate 75.00% | completion_rate 100.00% |
|--------------|------------------|------------------------------------|-----------------|-----------------|-----------------|-----------------|
| FBIS | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | 1_4_0 | percentile-50 | 0.00% | 0.00% | 5.00% | 5.56% |
| FBIS | 1_4_0 | percentile-75 | 0.00% | 12.50% | 35.00% | 45.00% |
| FBIS | 1_4_0 | percentile-90 | 20.00% | 40.00% | | 100.00% |
| FBIS | 1_4_0 | average | 5.78% | 11.16% | 21.90% | 26.29% |
| FBIS | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | reduce_proximity | percentile-50 | 0.00% | 0.00% | 5.00% | 5.56% |
| FBIS | reduce_proximity | percentile-75 | 0.00% | 15.00% | 35.00% | 40.00% |
| FBIS | reduce_proximity | percentile-90 | 20.00% | 40.00% | 85.00% | 100.00% |
| FBIS | reduce_proximity | average | 5.55% | 11.34% | 21.75% | 26.14% |
| FR94 | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | 1_4_0 | percentile-50 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | 1_4_0 | percentile-75 | 0.00% | 5.00% | 15.00% | 42.11% |
| FR94 | 1_4_0 | percentile-90 | 15.00% | 54.55% | 100.00% | 100.00% |
| FR94 | 1_4_0 | average | 5.95% | 12.07% | 18.70% | 25.57% |
| FR94 | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | reduce_proximity | percentile-50 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | reduce_proximity | percentile-75 | 0.00% | 5.00% | 15.00% | 42.11% |
| FR94 | reduce_proximity | percentile-90 | 15.00% | 54.55% | 100.00% | 100.00% |
| FR94 | reduce_proximity | average | 5.79% | 12.00% | 18.70% | 25.53% |
| FT | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | 1_4_0 | percentile-50 | 0.00% | 0.00% | 5.00% | 10.00% |
| FT | 1_4_0 | percentile-75 | 0.00% | 15.00% | 30.00% | 40.00% |
| FT | 1_4_0 | percentile-90 | 20.00% | 50.00% | 65.00% | 100.00% |
| FT | 1_4_0 | average | 5.08% | 12.58% | 20.00% | 25.49% |
| FT | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | reduce_proximity | percentile-50 | 0.00% | 0.00% | 5.00% | 10.00% |
| FT | reduce_proximity | percentile-75 | 0.00% | 15.00% | 30.00% | 40.00% |
| FT | reduce_proximity | percentile-90 | 10.00% | 45.00% | 60.00% | 100.00% |
| FT | reduce_proximity | average | 5.01% | 12.64% | 20.10% | 25.53% |
| LAT | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | 1_4_0 | percentile-50 | 0.00% | 0.00% | 5.00% | 5.00% |
| LAT | 1_4_0 | percentile-75 | 5.00% | 15.00% | 30.00% | 30.00% |
| LAT | 1_4_0 | percentile-90 | 15.00% | 45.00% | 60.00% | 80.00% |
| LAT | 1_4_0 | average | 4.80% | 11.80% | 17.88% | 21.62% |
| LAT | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | reduce_proximity | percentile-50 | 0.00% | 0.00% | 5.00% | 5.00% |
| LAT | reduce_proximity | percentile-75 | 0.00% | 11.11% | 25.00% | 35.00% |
| LAT | reduce_proximity | percentile-90 | 15.00% | 45.00% | 55.00% | 80.00% |
| LAT | reduce_proximity | average | 4.43% | 11.23% | 17.32% | 21.45% |
</details>
### Impact on Search time
| dataset_name | host_name | 25.00% | 50.00% | 75.00% | 100.00% | Average |
|--------------|------------------|------------:|------------:|------------:|------------:|-------------|
| FBIS | 1_4_0 | 3.45 | 7.446666667 | 9.773489933 | 9.620300752 | 7.572614338 |
| FBIS | reduce_proximity | 2.983333333 | 5.316666667 | 6.911073826 | 7.637218045 | 5.712072968 |
| FR94 | 1_4_0 | 2.236666667 | 4.45 | 5.523489933 | 4.560150376 | 4.192576744 |
| FR94 | reduce_proximity | 2.09 | 3.991666667 | 4.981543624 | 4.266917293 | 3.832531896 |
| FT | 1_4_0 | 5.956666667 | 9.656666667 | 13.86912752 | 10.83270677 | 10.0787919 |
| FT | reduce_proximity | 4.51 | 5.981666667 | 7.701342282 | 6.766917293 | 6.23998156 |
| LAT | 1_4_0 | 5.856666667 | 9.233333333 | 12.98322148 | 10.78759398 | 9.715203865 |
| LAT | reduce_proximity | 6.91 | 6.706666667 | 8.463087248 | 8.265037594 | 7.586197877 |
## Technical approach
- Ensure the MAX_DISTANCE constant is used everywhere needed
- Reduce the MAX_DISTANCE from 8 to 4
## Related
TBD
Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-10-18 14:56:08 +00:00
ManyTheFish
27eec21415
Fix tests
2023-10-18 16:03:22 +02:00
Vivek Kumar
d4da06ff47
fix bug where distinct search with no ranking returns offset+limit hits
2023-10-11 19:02:16 +05:30
ManyTheFish
43989fe2e4
Reduce porximity range from 7 to 3
2023-10-03 12:16:48 +02:00
Vivek Kumar
abfa7ded25
use a new temp index in the test
2023-09-08 12:32:47 +05:30
Vivek Kumar
f2837aaec2
add another test case
2023-09-08 11:39:54 +05:30
Vivek Kumar
11df155598
fix highlighting bug when searching for a phrase with cropping
2023-09-08 11:39:52 +05:30
meili-bors[bot]
ccf3ba3f32
Merge #4019
...
4019: Bringing back changes from `v1.3.2` onto `main` r=irevoire a=Kerollmops
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: irevoire <irevoire@users.noreply.github.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-08-28 12:14:11 +00:00
Clément Renault
8c0ebd1331
Update milli/src/search/new/bucket_sort.rs
...
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-08-23 16:40:39 +02:00