Louis Dureuil
a61b852695
Add tests
2024-07-15 08:43:21 +02:00
meili-bors[bot]
e9bf4c43a4
Merge #4649
...
4649: Don't store the vectors in the documents database r=dureuill a=irevoire
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4607
## What does this PR do?
- Ensure that anything falling under `_vectors` is NOT searchable, filterable or sortable
- [x] per embedder, add a roaring bitmap of documents that provide "userProvided" embeddings
- [x] in the indexing process in extract_vector_points, set the bit corresponding to the document depending on the "userProvided" subfield in the _vectors field.
- [x] in the document DB in typed chunks, when writing the _vectors field, remove all keys corresponding to an embedder
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-06-17 12:32:03 +00:00
Louis Dureuil
3f212a8202
Update tests
2024-06-12 18:13:34 +02:00
Tamo
600e97d9dc
gate the retrieveVectors parameter behind the vectors feature flag
2024-06-10 18:26:12 +02:00
ManyTheFish
57d066595b
fix Tests almost all features
2024-06-06 17:24:50 +02:00
Tamo
31a793d226
fix the regeneration of the embeddings in the search
2024-06-06 11:39:29 +02:00
Tamo
6b29676e7e
update snapshots
2024-06-06 11:39:29 +02:00
Tamo
5d50850e12
always push the user defined vectors in arroy
2024-06-06 11:39:29 +02:00
meili-bors[bot]
fc584f1db3
Merge #4666
...
4666: Add a score threshold search parameter r=ManyTheFish a=dureuill
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4609
## What does this PR do?
- See [usage](https://meilisearch.notion.site/Filter-by-score-usage-224a183ce7b24ca99b6a9a8da755668a?pvs=25#95b76ded400342ba9ab3d67c734836f0 ) and [the known limitation](https://meilisearch.notion.site/Filter-by-score-usage-224a183ce7b24ca99b6a9a8da755668a?pvs=25#e4e32195bf0e4195b5daecdbb7a97a17 )
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-06-03 08:42:44 +00:00
ManyTheFish
3f1a510069
Add tests and fix matching strategy
2024-05-30 12:02:42 +02:00
Louis Dureuil
41976b82b1
Tests for ranking_score_threshold
2024-05-30 11:22:26 +02:00
meili-bors[bot]
19acc65ad2
Merge #4646
...
4646: Reduce `Transform`'s disk usage r=Kerollmops a=Kerollmops
This PR implements what is described in #4485 . It reduces the number of disk writes and disk usage.
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-05-23 16:06:50 +00:00
Clément Renault
fe17c0f52e
Construct the minimal OBKVs according to the settings diff
2024-05-23 11:23:57 +02:00
Louis Dureuil
d05d49ffd8
Fix tests
2024-05-20 10:36:18 +02:00
Tamo
f2d0a59f1d
when no searchable attributes are defined, makes all the weight equals to zero
2024-05-16 01:06:33 +02:00
Louis Dureuil
355e5282b2
Remove _semanticScore
2024-04-04 16:04:07 +02:00
meili-bors[bot]
5509bafff8
Merge #4535
...
4535: Support Negative Keywords r=ManyTheFish a=Kerollmops
This PR fixes #4422 by supporting `-` before any word in the query.
The minus symbol `-`, from the ASCII table, is not the only character that can be considered the negative operator. You can see the two other matching characters under the `Based on "-" (U+002D)` section on [this unicode reference website](https://www.compart.com/en/unicode/U+002D ).
It's important to notice the strange behavior when a query includes and excludes the same word; only the derivative ( synonyms and split) will be kept:
- If you input `progamer -progamer`, the engine will still search for `pro gamer`.
- If you have the synonym `like = love` and you input `like -like`, it will still search for `love`.
## TODO
- [x] Add analytics
- [x] Add support to the `-` operator
- [x] Make sure to support spaces around `-` well
- [x] Support phrase negation
- [x] Add tests
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-04-04 13:10:27 +00:00
Clément Renault
90e812fc0b
Add some tests
2024-04-04 15:08:37 +02:00
Tamo
c41e1274dc
push and test the search queue datastructure
2024-03-26 15:56:43 +01:00
Tamo
d8fe4fe49d
return the order in the score details
2024-03-19 15:45:04 +01:00
Tamo
6a0c399c2f
rename the search_cutoff parameter to search_cutoff_ms
2024-03-19 10:35:47 +01:00
Tamo
038c26c118
stop returning the degraded boolean when a search was cutoff
2024-03-19 10:35:47 +01:00
Tamo
ad9192fbbf
reduce the size of an integration test
2024-03-19 10:35:47 +01:00
Tamo
b8cda6c300
fix the search cutoff and add a test
2024-03-19 10:35:47 +01:00
Louis Dureuil
05edd85d75
Stabilize scoreDetails
2024-02-06 11:15:19 +01:00
Louis Dureuil
6ff81de401
Fix tests
2023-12-20 17:16:46 +01:00
ManyTheFish
ac68f33194
Add simple test
2023-12-14 16:08:42 +01:00
Louis Dureuil
806e5b6899
Tests pass
2023-12-14 16:08:41 +01:00
Louis Dureuil
12940d79a9
WIP
...
- manual embedder
- multi embedders OK
- clippy + tests OK
2023-12-14 16:08:41 +01:00
Clément Renault
0dbf1a16ff
Make clippy happy
2023-11-23 14:11:38 +01:00
meili-bors[bot]
5e0485d8dd
Merge #4131
...
4131: Reduce proximity range from 7 to 3 r=Kerollmops a=ManyTheFish
## Summary
This PR aims to reduce the impact of the proximity databases on the indexing time and on the database size by reducing the maximum distance between two words to be indexed in the proximity database.
## Stats
### Impact on database size and indexing time
![Impact on datasets](https://github.com/meilisearch/meilisearch/assets/6482087/28ed3d96-bdde-41c1-bdac-e90c1b1dbb23 )
### Impact on search relevancy
<details>
| dataset_name | host_name | Relevancy rate (Precision) | completion_rate 25.00% | completion_rate 50.00% | completion_rate 75.00% | completion_rate 100.00% |
|--------------|------------------|------------------------------------|-----------------|-----------------|-----------------|-----------------|
| FBIS | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | 1_4_0 | percentile-50 | 0.00% | 0.00% | 5.00% | 5.56% |
| FBIS | 1_4_0 | percentile-75 | 0.00% | 12.50% | 35.00% | 45.00% |
| FBIS | 1_4_0 | percentile-90 | 20.00% | 40.00% | | 100.00% |
| FBIS | 1_4_0 | average | 5.78% | 11.16% | 21.90% | 26.29% |
| FBIS | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | reduce_proximity | percentile-50 | 0.00% | 0.00% | 5.00% | 5.56% |
| FBIS | reduce_proximity | percentile-75 | 0.00% | 15.00% | 35.00% | 40.00% |
| FBIS | reduce_proximity | percentile-90 | 20.00% | 40.00% | 85.00% | 100.00% |
| FBIS | reduce_proximity | average | 5.55% | 11.34% | 21.75% | 26.14% |
| FR94 | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | 1_4_0 | percentile-50 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | 1_4_0 | percentile-75 | 0.00% | 5.00% | 15.00% | 42.11% |
| FR94 | 1_4_0 | percentile-90 | 15.00% | 54.55% | 100.00% | 100.00% |
| FR94 | 1_4_0 | average | 5.95% | 12.07% | 18.70% | 25.57% |
| FR94 | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | reduce_proximity | percentile-50 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | reduce_proximity | percentile-75 | 0.00% | 5.00% | 15.00% | 42.11% |
| FR94 | reduce_proximity | percentile-90 | 15.00% | 54.55% | 100.00% | 100.00% |
| FR94 | reduce_proximity | average | 5.79% | 12.00% | 18.70% | 25.53% |
| FT | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | 1_4_0 | percentile-50 | 0.00% | 0.00% | 5.00% | 10.00% |
| FT | 1_4_0 | percentile-75 | 0.00% | 15.00% | 30.00% | 40.00% |
| FT | 1_4_0 | percentile-90 | 20.00% | 50.00% | 65.00% | 100.00% |
| FT | 1_4_0 | average | 5.08% | 12.58% | 20.00% | 25.49% |
| FT | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | reduce_proximity | percentile-50 | 0.00% | 0.00% | 5.00% | 10.00% |
| FT | reduce_proximity | percentile-75 | 0.00% | 15.00% | 30.00% | 40.00% |
| FT | reduce_proximity | percentile-90 | 10.00% | 45.00% | 60.00% | 100.00% |
| FT | reduce_proximity | average | 5.01% | 12.64% | 20.10% | 25.53% |
| LAT | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | 1_4_0 | percentile-50 | 0.00% | 0.00% | 5.00% | 5.00% |
| LAT | 1_4_0 | percentile-75 | 5.00% | 15.00% | 30.00% | 30.00% |
| LAT | 1_4_0 | percentile-90 | 15.00% | 45.00% | 60.00% | 80.00% |
| LAT | 1_4_0 | average | 4.80% | 11.80% | 17.88% | 21.62% |
| LAT | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | reduce_proximity | percentile-50 | 0.00% | 0.00% | 5.00% | 5.00% |
| LAT | reduce_proximity | percentile-75 | 0.00% | 11.11% | 25.00% | 35.00% |
| LAT | reduce_proximity | percentile-90 | 15.00% | 45.00% | 55.00% | 80.00% |
| LAT | reduce_proximity | average | 4.43% | 11.23% | 17.32% | 21.45% |
</details>
### Impact on Search time
| dataset_name | host_name | 25.00% | 50.00% | 75.00% | 100.00% | Average |
|--------------|------------------|------------:|------------:|------------:|------------:|-------------|
| FBIS | 1_4_0 | 3.45 | 7.446666667 | 9.773489933 | 9.620300752 | 7.572614338 |
| FBIS | reduce_proximity | 2.983333333 | 5.316666667 | 6.911073826 | 7.637218045 | 5.712072968 |
| FR94 | 1_4_0 | 2.236666667 | 4.45 | 5.523489933 | 4.560150376 | 4.192576744 |
| FR94 | reduce_proximity | 2.09 | 3.991666667 | 4.981543624 | 4.266917293 | 3.832531896 |
| FT | 1_4_0 | 5.956666667 | 9.656666667 | 13.86912752 | 10.83270677 | 10.0787919 |
| FT | reduce_proximity | 4.51 | 5.981666667 | 7.701342282 | 6.766917293 | 6.23998156 |
| LAT | 1_4_0 | 5.856666667 | 9.233333333 | 12.98322148 | 10.78759398 | 9.715203865 |
| LAT | reduce_proximity | 6.91 | 6.706666667 | 8.463087248 | 8.265037594 | 7.586197877 |
## Technical approach
- Ensure the MAX_DISTANCE constant is used everywhere needed
- Reduce the MAX_DISTANCE from 8 to 4
## Related
TBD
Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-10-18 14:56:08 +00:00
ManyTheFish
27eec21415
Fix tests
2023-10-18 16:03:22 +02:00
meili-bors[bot]
0913373a5e
Merge #4122
...
4122: Bring back changes from `release-v1.4.1` into `main` r=Kerollmops a=curquiza
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Vivek Kumar <vivek.26@outlook.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-10-12 15:57:47 +00:00
Vivek Kumar
d1331d8abf
add integration test for distinct search with no ranking
2023-10-11 19:12:56 +05:30
meili-bors[bot]
86b314626d
Merge #4080
...
4080: Bring back changes from v1.4.0 into main r=Kerollmops a=curquiza
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: curquiza <clementine@meilisearch.com>
Co-authored-by: Vivek Kumar <vivek.26@outlook.com>
Co-authored-by: dogukanakkaya <doguakkaya27@hotmail.com>
2023-09-26 08:13:49 +00:00
Tamo
056b2c387d
refactor the tests suite slightly
2023-09-11 16:56:26 +02:00
ManyTheFish
fc2590fc9d
Add a test
2023-08-08 16:43:08 +02:00
ManyTheFish
88559a2d54
Fix score details casing
2023-07-25 15:49:33 +02:00
Kerollmops
f9d94c5845
Test geo sort with string lat/lng
2023-07-17 18:28:03 +02:00
ManyTheFish
c106906f8f
deactivate camelCase segmentation
2023-07-13 12:06:27 +02:00
ManyTheFish
9c0691156f
Add tests
2023-07-13 11:53:13 +02:00
Louis Dureuil
2d3cec11a7
Search integration test to check score details and vector store
2023-07-06 09:02:02 +02:00
Clément Renault
362e9ff845
Add more tests
2023-06-28 15:28:24 +02:00
ManyTheFish
a61ca4066e
Add tests
2023-06-26 14:55:14 +02:00
ManyTheFish
b744f33530
Add test
2023-03-29 12:01:52 +02:00
Clément Renault
6da54d0cb6
Add a test to fix a diacritic issue
2023-03-09 14:57:38 +01:00
bors[bot]
4f1ccbc495
Merge #3525
...
3525: Fix phrase search containing stop words r=ManyTheFish a=ManyTheFish
# Summary
A search with a phrase containing only stop words was returning an HTTP error 500,
this PR filters the phrase containing only stop words dropping them before the search starts, a query with a phrase containing only stop words now behaves like a placeholder search.
fixes https://github.com/meilisearch/meilisearch/issues/3521
related v1.0.2 PR on milli: https://github.com/meilisearch/milli/pull/779
Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-03-02 10:55:37 +00:00
Louis Dureuil
28d6ab78de
multi-search: Add multi search tests
2023-02-22 17:04:12 +01:00
ManyTheFish
6841f167b4
Add test
2023-02-21 18:02:52 +01:00
ManyTheFish
23f4e82b53
Add test ensuring that Meilisearch works on kanji only requests
2023-02-20 15:43:29 +01:00