ManyTheFish
db1ca21231
add puffin in sorter into reeder function
2023-10-30 11:15:00 +01:00
ManyTheFish
11ea5acff9
Fix
2023-10-30 11:13:10 +01:00
ManyTheFish
8d77736a67
Fix fid_word_docids
2023-10-30 11:13:10 +01:00
ManyTheFish
748b333161
Add usefull debug assert before key insertion in database
2023-10-30 11:13:10 +01:00
ManyTheFish
17b647dfe5
Wip
2023-10-30 11:13:08 +01:00
meili-bors[bot]
5e0485d8dd
Merge #4131
...
4131: Reduce proximity range from 7 to 3 r=Kerollmops a=ManyTheFish
## Summary
This PR aims to reduce the impact of the proximity databases on the indexing time and on the database size by reducing the maximum distance between two words to be indexed in the proximity database.
## Stats
### Impact on database size and indexing time
![Impact on datasets](https://github.com/meilisearch/meilisearch/assets/6482087/28ed3d96-bdde-41c1-bdac-e90c1b1dbb23 )
### Impact on search relevancy
<details>
| dataset_name | host_name | Relevancy rate (Precision) | completion_rate 25.00% | completion_rate 50.00% | completion_rate 75.00% | completion_rate 100.00% |
|--------------|------------------|------------------------------------|-----------------|-----------------|-----------------|-----------------|
| FBIS | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | 1_4_0 | percentile-50 | 0.00% | 0.00% | 5.00% | 5.56% |
| FBIS | 1_4_0 | percentile-75 | 0.00% | 12.50% | 35.00% | 45.00% |
| FBIS | 1_4_0 | percentile-90 | 20.00% | 40.00% | | 100.00% |
| FBIS | 1_4_0 | average | 5.78% | 11.16% | 21.90% | 26.29% |
| FBIS | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FBIS | reduce_proximity | percentile-50 | 0.00% | 0.00% | 5.00% | 5.56% |
| FBIS | reduce_proximity | percentile-75 | 0.00% | 15.00% | 35.00% | 40.00% |
| FBIS | reduce_proximity | percentile-90 | 20.00% | 40.00% | 85.00% | 100.00% |
| FBIS | reduce_proximity | average | 5.55% | 11.34% | 21.75% | 26.14% |
| FR94 | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | 1_4_0 | percentile-50 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | 1_4_0 | percentile-75 | 0.00% | 5.00% | 15.00% | 42.11% |
| FR94 | 1_4_0 | percentile-90 | 15.00% | 54.55% | 100.00% | 100.00% |
| FR94 | 1_4_0 | average | 5.95% | 12.07% | 18.70% | 25.57% |
| FR94 | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | reduce_proximity | percentile-50 | 0.00% | 0.00% | 0.00% | 0.00% |
| FR94 | reduce_proximity | percentile-75 | 0.00% | 5.00% | 15.00% | 42.11% |
| FR94 | reduce_proximity | percentile-90 | 15.00% | 54.55% | 100.00% | 100.00% |
| FR94 | reduce_proximity | average | 5.79% | 12.00% | 18.70% | 25.53% |
| FT | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | 1_4_0 | percentile-50 | 0.00% | 0.00% | 5.00% | 10.00% |
| FT | 1_4_0 | percentile-75 | 0.00% | 15.00% | 30.00% | 40.00% |
| FT | 1_4_0 | percentile-90 | 20.00% | 50.00% | 65.00% | 100.00% |
| FT | 1_4_0 | average | 5.08% | 12.58% | 20.00% | 25.49% |
| FT | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| FT | reduce_proximity | percentile-50 | 0.00% | 0.00% | 5.00% | 10.00% |
| FT | reduce_proximity | percentile-75 | 0.00% | 15.00% | 30.00% | 40.00% |
| FT | reduce_proximity | percentile-90 | 10.00% | 45.00% | 60.00% | 100.00% |
| FT | reduce_proximity | average | 5.01% | 12.64% | 20.10% | 25.53% |
| LAT | 1_4_0 | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | 1_4_0 | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | 1_4_0 | percentile-50 | 0.00% | 0.00% | 5.00% | 5.00% |
| LAT | 1_4_0 | percentile-75 | 5.00% | 15.00% | 30.00% | 30.00% |
| LAT | 1_4_0 | percentile-90 | 15.00% | 45.00% | 60.00% | 80.00% |
| LAT | 1_4_0 | average | 4.80% | 11.80% | 17.88% | 21.62% |
| LAT | reduce_proximity | percentile-10 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | reduce_proximity | percentile-25 | 0.00% | 0.00% | 0.00% | 0.00% |
| LAT | reduce_proximity | percentile-50 | 0.00% | 0.00% | 5.00% | 5.00% |
| LAT | reduce_proximity | percentile-75 | 0.00% | 11.11% | 25.00% | 35.00% |
| LAT | reduce_proximity | percentile-90 | 15.00% | 45.00% | 55.00% | 80.00% |
| LAT | reduce_proximity | average | 4.43% | 11.23% | 17.32% | 21.45% |
</details>
### Impact on Search time
| dataset_name | host_name | 25.00% | 50.00% | 75.00% | 100.00% | Average |
|--------------|------------------|------------:|------------:|------------:|------------:|-------------|
| FBIS | 1_4_0 | 3.45 | 7.446666667 | 9.773489933 | 9.620300752 | 7.572614338 |
| FBIS | reduce_proximity | 2.983333333 | 5.316666667 | 6.911073826 | 7.637218045 | 5.712072968 |
| FR94 | 1_4_0 | 2.236666667 | 4.45 | 5.523489933 | 4.560150376 | 4.192576744 |
| FR94 | reduce_proximity | 2.09 | 3.991666667 | 4.981543624 | 4.266917293 | 3.832531896 |
| FT | 1_4_0 | 5.956666667 | 9.656666667 | 13.86912752 | 10.83270677 | 10.0787919 |
| FT | reduce_proximity | 4.51 | 5.981666667 | 7.701342282 | 6.766917293 | 6.23998156 |
| LAT | 1_4_0 | 5.856666667 | 9.233333333 | 12.98322148 | 10.78759398 | 9.715203865 |
| LAT | reduce_proximity | 6.91 | 6.706666667 | 8.463087248 | 8.265037594 | 7.586197877 |
## Technical approach
- Ensure the MAX_DISTANCE constant is used everywhere needed
- Reduce the MAX_DISTANCE from 8 to 4
## Related
TBD
Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-10-18 14:56:08 +00:00
ManyTheFish
27eec21415
Fix tests
2023-10-18 16:03:22 +02:00
Clément Renault
62dfd09dc6
Add more puffin logs to the deletion functions
2023-10-13 13:11:09 +02:00
meili-bors[bot]
f343ef5f2f
Merge #4108
...
4108: Fix bug where search with distinct attribute and no ranking, returns offset+limit hits r=curquiza a=vivek-26
# Pull Request
## Related issue
Fixes #4078
## What does this PR do?
This PR -
- Fixes bug where search with distinct attribute and no ranking, returns offset+limit hits.
- Adds unit and integration tests.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Vivek Kumar <vivek.26@outlook.com>
2023-10-12 07:51:29 +00:00
Vivek Kumar
d4da06ff47
fix bug where distinct search with no ranking returns offset+limit hits
2023-10-11 19:02:16 +05:30
Tamo
c0f2724c2d
get rids of the new introduced error code in favor of an io::Error
2023-10-10 15:12:23 +02:00
Tamo
d772073dfa
use a bufreader everytime there is a grenad<file>
2023-10-10 15:00:30 +02:00
ManyTheFish
43989fe2e4
Reduce porximity range from 7 to 3
2023-10-03 12:16:48 +02:00
meili-bors[bot]
487d493f49
Merge #4043
...
4043: Bring back hotfixes from v1.3.3 into v1.4.0 r=Kerollmops a=curquiza
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: curquiza <clementine@meilisearch.com>
2023-09-11 12:27:34 +00:00
Vivek Kumar
abfa7ded25
use a new temp index in the test
2023-09-08 12:32:47 +05:30
Vivek Kumar
f2837aaec2
add another test case
2023-09-08 11:39:54 +05:30
Vivek Kumar
11df155598
fix highlighting bug when searching for a phrase with cropping
2023-09-08 11:39:52 +05:30
meili-bors[bot]
256cf33bca
Merge #4039
...
4039: Fix multiple vectors dimensions r=ManyTheFish a=Kerollmops
This PR fixes #4035 , making providing multiple vectors in documents possible. This is fixed by extracting the vectors from the non-flattened version of the documents.
Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-09-07 09:25:58 +00:00
Kerollmops
679c0b0f97
Extract the vectors from the non-flattened version of the documents
2023-09-06 12:26:00 +02:00
Kerollmops
e02d0064bd
Add a test case scenario
2023-09-06 12:26:00 +02:00
meili-bors[bot]
dc3d9c90d9
Merge #3994
...
3994: Fix synonyms with separators r=Kerollmops a=ManyTheFish
# Pull Request
## Related issue
Fixes #3977
## Available prototype
```
$ docker pull getmeili/meilisearch:prototype-fix-synonyms-with-separators-0
```
## What does this PR do?
- add a new test
- filter the empty synonyms after normalization
Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-09-05 14:42:46 +00:00
ManyTheFish
66aa6d5871
Ignore tokens with empty normalized value during indexing process
2023-09-05 15:44:14 +02:00
Kerollmops
8ac5b765bc
Fix synonyms normalization
2023-09-04 16:12:48 +02:00
Kerollmops
085aad0a94
Add a test
2023-09-04 14:39:33 +02:00
meili-bors[bot]
ccf3ba3f32
Merge #4019
...
4019: Bringing back changes from `v1.3.2` onto `main` r=irevoire a=Kerollmops
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: irevoire <irevoire@users.noreply.github.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-08-28 12:14:11 +00:00
Clément Renault
8c0ebd1331
Update milli/src/search/new/bucket_sort.rs
...
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-08-23 16:40:39 +02:00
Kerollmops
5130e06b41
Temporarily disable an assert in the ranking rules
2023-08-23 16:11:54 +02:00
meili-bors[bot]
914b125c5f
Merge #3945
...
3945: Do not leak field information on error r=Kerollmops a=vivek-26
# Pull Request
## Related issue
Fixes #3865
## What does this PR do?
This PR ensures that `InvalidSortableAttribute`and `InvalidFacetSearchFacetName` errors do not leak field information i.e. fields which are not part of `displayedAttributes` in the settings are hidden from the error message.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Vivek Kumar <vivek.26@outlook.com>
2023-08-22 18:55:27 +00:00
Kerollmops
c53841e166
Accept the null JSON value as the value of _vectors
2023-08-14 16:03:55 +02:00
meili-bors[bot]
e4e49e63d0
Merge #3993
...
3993: Bringing back changes from v1.3.1 to `main` r=irevoire a=curquiza
Co-authored-by: irevoire <irevoire@users.noreply.github.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-08-10 14:30:02 +00:00
ManyTheFish
5a7c1bde84
Fix clippy
2023-08-10 11:27:56 +02:00
ManyTheFish
6b2d671be7
Fix PR comments
2023-08-10 10:44:07 +02:00
Many the fish
43c13faeda
Update milli/src/update/index_documents/extract/extract_docid_word_positions.rs
...
Co-authored-by: Tamo <tamo@meilisearch.com>
2023-08-10 10:05:03 +02:00
meili-bors[bot]
44c1900f36
Merge #3986
...
3986: Fix geo bounding box with strings r=ManyTheFish a=irevoire
# Pull Request
When sending a document with one geofield of type string (i.e.: `{ "_geo": { "lat": 12, "lng": "13" }}`), the geobounding box would exclude this document.
This PR fixes this issue by automatically parsing the string value in case we're working on a geofield.
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3973
## What does this PR do?
- Automatically parse the facet value iif we're working on a geofield.
- Make insta works with snapshots in loops or closure executed multiple times. (you may need to update your cli if it panics after this PR: `cargo install cargo-insta`).
- Add one integration test in milli and in meilisearch to ensure it works forever.
- Add three snapshots for the dump that mysteriously disappeared I don't know how
Co-authored-by: Tamo <tamo@meilisearch.com>
2023-08-09 07:58:15 +00:00
ManyTheFish
8dc5acf998
Try fix
2023-08-08 16:52:36 +02:00
ManyTheFish
35758db9ec
Truncate the the normalized long facets used in search for facet value
2023-08-08 16:38:30 +02:00
Tamo
4988199bb9
ensure the geoboundingbox works with strings and int geofields in milli and meilisearch
2023-08-08 16:29:25 +02:00
Tamo
9d061cec26
automatically parse the filterable attribute to float if it's a geo field
2023-08-08 16:28:07 +02:00
ManyTheFish
4a21fecf67
Merge branch 'main' into settings-customizing-tokenization
2023-08-08 16:08:16 +02:00
Vivek Kumar
dd57873f8e
hide fields not in the displayedAttributes list from errors
2023-08-05 16:03:10 +05:30
ManyTheFish
b45c36cd71
Merge branch 'main' into tmp-release-v1.3.0
2023-08-01 15:05:17 +02:00
ManyTheFish
9d5e3457e5
Fix clippy
2023-07-27 14:21:19 +02:00
meili-bors[bot]
939b2fc6fd
Merge #3949
...
3949: Fix score details casing r=Kerollmops a=ManyTheFish
# Pull Request
Fixes #3941
Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-07-26 14:14:59 +00:00
ManyTheFish
b0c1a9504a
ensure the synonyms are updated when the tokenizer settings are changed
2023-07-26 09:33:42 +02:00
meili-bors[bot]
be72be7c0d
Merge #3942
...
3942: Normalize for the search the facets values r=ManyTheFish a=Kerollmops
This PR improves and fixes the search for facet values feature. Searching for _bre_ wasn't returning facet values like _brévent_ or _brô_.
The issue was related to the fact that facets are normalized but not in the same way as the `searchableAttributes` are. We decided to normalize them further and add another intermediate database where the key is the normalized facet value, and the value is a set of the non-normalized facets. We then use these non-normalized ones to get the correct counts by fetching the associated databases.
### What's missing in this PR?
- [x] Apply the change to the whole set of `SearchForFacetValue::execute` conditions.
- [x] Factorize the code that does an intermediate normalized value fetch in a function.
- [x] Add or modify the search for facet value test.
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-07-25 14:37:17 +00:00
ManyTheFish
88559a2d54
Fix score details casing
2023-07-25 15:49:33 +02:00
ManyTheFish
d57026cd96
Support synonyms sinergies
2023-07-25 15:01:42 +02:00
Kerollmops
29ab54b259
Replace the hnsw crate by the instant-distance one
2023-07-25 12:37:35 +02:00
ManyTheFish
d4ff59fcf5
Fix clippy
2023-07-24 18:42:26 +02:00
ManyTheFish
9c485f8563
Make the search and the indexing work
2023-07-24 18:35:20 +02:00
Kerollmops
691a536893
Implement the facet search with the normalized index
2023-07-24 17:56:17 +02:00
ManyTheFish
d8d12d5979
Be able to set and reset settings
2023-07-24 17:00:18 +02:00
Clément Renault
df528b41d8
Normalize for the search the facets values
2023-07-20 17:57:07 +02:00
Kerollmops
eef95de30e
First iteration on exposing puffin profiling
2023-07-18 17:38:13 +02:00
Kerollmops
d383afc82b
Fix the geo sort when lat and lng are strings
2023-07-17 18:28:04 +02:00
Louis Dureuil
4310928803
Fixes #3912
2023-07-12 10:08:56 +02:00
Louis Dureuil
74315b4ea8
Fixes #3911
2023-07-12 10:08:29 +02:00
Louis Dureuil
40fa59d64c
Sort by lexicographic order after normalization
2023-07-10 09:26:59 +02:00
Louis Dureuil
55cd7738b9
Update snapshots
2023-07-04 16:31:01 +02:00
Louis Dureuil
48409c9183
Add missing exactness.matchingWords, exactness.maxMatchingWords
2023-07-04 16:31:01 +02:00
Louis Dureuil
324d448236
Format let-else ❤️ 🎉
2023-07-03 10:20:28 +02:00
meili-bors[bot]
661d1f90dc
Merge #3866
...
3866: Update charabia v0.8.0 r=dureuill a=ManyTheFish
# Pull Request
Update Charabia:
- enhance Japanese segmentation
- enhance Latin Tokenization
- words containing `_` are now properly segmented into several words
- brackets `{([])}` are no more considered as context separators so word separated by brackets are now considered near together for the proximity ranking rule
- fixes #3815
- fixes #3778
- fixes [product#151](https://github.com/meilisearch/product/discussions/151 )
> Important note: now the float numbers are segmented around the `.` so `3.22` is segmented as [`3`, `.`, `22`] but the middle dot isn't considered as a hard separator, which means that if we search `3.22` we find documents containing `3.22`
Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-06-29 15:24:36 +00:00
ManyTheFish
6ec7541026
Update inta snapshots
2023-06-29 17:18:39 +02:00
ManyTheFish
a82c49ab08
Update test
2023-06-29 15:56:36 +02:00
ManyTheFish
84845de9ef
Update Charabia
2023-06-29 15:56:32 +02:00
Clément Renault
7c157fc442
Document that the LevelEntry fields order is important
2023-06-29 14:33:32 +02:00
Clément Renault
0b97596c93
Replace unwraps with ?
2023-06-29 14:33:32 +02:00
Clément Renault
a0e0fce677
Simplify a Rust lifetime trick
2023-06-29 14:33:32 +02:00
Clément Renault
b951830461
Add more tests
2023-06-29 14:33:32 +02:00
Kerollmops
b132e859f7
Make clippy happy
2023-06-29 14:33:31 +02:00
Kerollmops
9917bf046a
Move the sortFacetValuesBy in the faceting settings
2023-06-29 14:33:31 +02:00
Kerollmops
d9fea0143f
Make Clippy happy
2023-06-29 14:33:31 +02:00
Kerollmops
a385642ec3
Replace the BTreeMap by an IndexMap to return values in order
2023-06-29 14:33:31 +02:00
Kerollmops
34b2e98fe9
Expose a sortFacetValuesBy parameter to the user
2023-06-29 14:33:00 +02:00
Kerollmops
80bbd4b6f3
Clean and make the facet order configurable internally
2023-06-29 14:31:17 +02:00
Kerollmops
f42bef2f66
Make the search to always return the facets ordered by count
2023-06-29 14:31:17 +02:00
Kerollmops
bd3c026406
First to-test version of the algorithm
2023-06-29 14:31:17 +02:00
Kerollmops
84f8938f33
Rename facet distribution to be explicit on the order to find them
2023-06-29 14:31:15 +02:00
Clément Renault
efbe7ce78b
Clean the facet string FSTs when we clear the documents
2023-06-28 15:36:32 +02:00
Kerollmops
26f0fa678d
Change the error message when a facet is not searchable
2023-06-28 15:06:09 +02:00
Kerollmops
60ddd53439
Return one of the original facet values when doing a facet search
2023-06-28 15:06:09 +02:00
Kerollmops
2bcd8d2983
Make sure the facet queries are normalized
2023-06-28 15:06:09 +02:00
Kerollmops
41760a9306
Introduce a new invalid_facet_search_facet_name error code
2023-06-28 15:06:07 +02:00
Kerollmops
e9a3029c30
Use the right field id to write the string facet values FST
2023-06-28 15:01:51 +02:00
Kerollmops
ed0ff47551
Return an empty list of results if attribute is set as filterable
2023-06-28 15:01:51 +02:00
Clément Renault
e1b8fb48ee
Use the minWordSizeForTypos index settings
2023-06-28 15:01:51 +02:00
Clément Renault
87e22e436a
Fix compilation issues
2023-06-28 15:01:51 +02:00
Clément Renault
0252cfe8b6
Simplify the placeholder search of the facet-search route
2023-06-28 15:01:50 +02:00
Clément Renault
f35ad96afa
Use the disableOnAttributes parameter on the facet-search route
2023-06-28 15:01:50 +02:00
Clément Renault
2ceb781c73
Use the disableOnWords parameter on the facet-search route
2023-06-28 15:01:50 +02:00
Clément Renault
7bd67543dd
Support the typoTolerant.enabled parameter
2023-06-28 15:01:50 +02:00
Clément Renault
8e86eb91bb
Log an error when a facet value is missing from the database
2023-06-28 15:01:50 +02:00
Clément Renault
55c17aa38b
Rename the SearchForFacetValues struct
2023-06-28 15:01:50 +02:00
Clément Renault
aadbe88048
Return an internal error when a field id is missing
2023-06-28 15:01:50 +02:00
Clément Renault
f36de2115f
Make clippy happy
2023-06-28 15:01:50 +02:00
Clément Renault
702041b7e1
Improve the returned errors from the facet-search route
2023-06-28 15:01:48 +02:00
Clément Renault
a05074e675
Fix the max number of facets to be returned to 100
2023-06-28 14:58:42 +02:00
Clément Renault
93f30e65a9
Return the correct response JSON object from the facet-search route
2023-06-28 14:58:42 +02:00
Clément Renault
e81809aae7
Make the search for facet work
2023-06-28 14:58:41 +02:00
Kerollmops
ce7e7f12c8
Introduce the facet search route
2023-06-28 14:58:41 +02:00