Commit Graph

1931 Commits

Author SHA1 Message Date
Vivek Kumar
19ba129165
add unit test for distinct search with no ranking 2023-10-11 19:02:27 +05:30
Vivek Kumar
d4da06ff47
fix bug where distinct search with no ranking returns offset+limit hits 2023-10-11 19:02:16 +05:30
Tamo
c0f2724c2d get rids of the new introduced error code in favor of an io::Error 2023-10-10 15:12:23 +02:00
Tamo
d772073dfa use a bufreader everytime there is a grenad<file> 2023-10-10 15:00:30 +02:00
ManyTheFish
43989fe2e4 Reduce porximity range from 7 to 3 2023-10-03 12:16:48 +02:00
meili-bors[bot]
487d493f49
Merge #4043
4043: Bring back hotfixes from v1.3.3 into v1.4.0 r=Kerollmops a=curquiza



Co-authored-by: curquiza <curquiza@users.noreply.github.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: curquiza <clementine@meilisearch.com>
2023-09-11 12:27:34 +00:00
Vivek Kumar
abfa7ded25
use a new temp index in the test 2023-09-08 12:32:47 +05:30
Vivek Kumar
f2837aaec2
add another test case 2023-09-08 11:39:54 +05:30
Vivek Kumar
11df155598
fix highlighting bug when searching for a phrase with cropping 2023-09-08 11:39:52 +05:30
meili-bors[bot]
256cf33bca
Merge #4039
4039: Fix multiple vectors dimensions r=ManyTheFish a=Kerollmops

This PR fixes #4035, making providing multiple vectors in documents possible. This is fixed by extracting the vectors from the non-flattened version of the documents.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-09-07 09:25:58 +00:00
Kerollmops
679c0b0f97
Extract the vectors from the non-flattened version of the documents 2023-09-06 12:26:00 +02:00
Kerollmops
e02d0064bd
Add a test case scenario 2023-09-06 12:26:00 +02:00
meili-bors[bot]
dc3d9c90d9
Merge #3994
3994: Fix synonyms with separators r=Kerollmops a=ManyTheFish

# Pull Request

## Related issue
Fixes #3977

## Available prototype
```
$ docker pull getmeili/meilisearch:prototype-fix-synonyms-with-separators-0
```

## What does this PR do?
- add a new test
- filter the empty synonyms after normalization


Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-09-05 14:42:46 +00:00
ManyTheFish
66aa6d5871 Ignore tokens with empty normalized value during indexing process 2023-09-05 15:44:14 +02:00
Kerollmops
8ac5b765bc
Fix synonyms normalization 2023-09-04 16:12:48 +02:00
Kerollmops
085aad0a94
Add a test 2023-09-04 14:39:33 +02:00
meili-bors[bot]
ccf3ba3f32
Merge #4019
4019: Bringing back changes from `v1.3.2` onto `main` r=irevoire a=Kerollmops



Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: irevoire <irevoire@users.noreply.github.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-08-28 12:14:11 +00:00
Clément Renault
8c0ebd1331
Update milli/src/search/new/bucket_sort.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-08-23 16:40:39 +02:00
Kerollmops
5130e06b41
Temporarily disable an assert in the ranking rules 2023-08-23 16:11:54 +02:00
meili-bors[bot]
914b125c5f
Merge #3945
3945: Do not leak field information on error r=Kerollmops a=vivek-26

# Pull Request

## Related issue
Fixes #3865

## What does this PR do?
This PR ensures that `InvalidSortableAttribute`and `InvalidFacetSearchFacetName` errors do not leak field information i.e. fields which are not part of `displayedAttributes` in the settings are hidden from the error message.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Vivek Kumar <vivek.26@outlook.com>
2023-08-22 18:55:27 +00:00
Kerollmops
717b069907
Bump charabia to 0.8.3 2023-08-22 16:25:00 +02:00
Kerollmops
c53841e166
Accept the null JSON value as the value of _vectors 2023-08-14 16:03:55 +02:00
ManyTheFish
cab27c2ab4 upgrade indexmap = "2.0.0" 2023-08-10 18:09:02 +02:00
ManyTheFish
624fa9052f upgrade deserr = "0.6.0" 2023-08-10 18:09:02 +02:00
ManyTheFish
60c11dbdbd upgrade rstar - "0.11.0" 2023-08-10 18:09:02 +02:00
ManyTheFish
dacee40ebc upgrade memmap2 = "0.7.1" 2023-08-10 18:09:02 +02:00
ManyTheFish
cc2c19d4c3 upgrade itertools = "0.10.5" 2023-08-10 18:09:02 +02:00
meili-bors[bot]
e4e49e63d0
Merge #3993
3993: Bringing back changes from v1.3.1 to `main` r=irevoire a=curquiza



Co-authored-by: irevoire <irevoire@users.noreply.github.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-08-10 14:30:02 +00:00
ManyTheFish
5a7c1bde84 Fix clippy 2023-08-10 11:27:56 +02:00
ManyTheFish
6b2d671be7 Fix PR comments 2023-08-10 10:44:07 +02:00
Many the fish
43c13faeda
Update milli/src/update/index_documents/extract/extract_docid_word_positions.rs
Co-authored-by: Tamo <tamo@meilisearch.com>
2023-08-10 10:05:03 +02:00
meili-bors[bot]
44c1900f36
Merge #3986
3986: Fix geo bounding box with strings r=ManyTheFish a=irevoire

# Pull Request

When sending a document with one geofield of type string (i.e.: `{ "_geo": { "lat": 12, "lng": "13" }}`), the geobounding box would exclude this document.

This PR fixes this issue by automatically parsing the string value in case we're working on a geofield.

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3973

## What does this PR do?
- Automatically parse the facet value iif we're working on a geofield.
- Make insta works with snapshots in loops or closure executed multiple times. (you may need to update your cli if it panics after this PR: `cargo install cargo-insta`).
- Add one integration test in milli and in meilisearch to ensure it works forever.
- Add three snapshots for the dump that mysteriously disappeared I don't know how


Co-authored-by: Tamo <tamo@meilisearch.com>
2023-08-09 07:58:15 +00:00
ManyTheFish
8dc5acf998 Try fix 2023-08-08 16:52:36 +02:00
ManyTheFish
35758db9ec Truncate the the normalized long facets used in search for facet value 2023-08-08 16:38:30 +02:00
Tamo
4988199bb9 ensure the geoboundingbox works with strings and int geofields in milli and meilisearch 2023-08-08 16:29:25 +02:00
Tamo
9d061cec26 automatically parse the filterable attribute to float if it's a geo field 2023-08-08 16:28:07 +02:00
ManyTheFish
4a21fecf67 Merge branch 'main' into settings-customizing-tokenization 2023-08-08 16:08:16 +02:00
Vivek Kumar
dd57873f8e
hide fields not in the displayedAttributes list from errors 2023-08-05 16:03:10 +05:30
ManyTheFish
b45c36cd71 Merge branch 'main' into tmp-release-v1.3.0 2023-08-01 15:05:17 +02:00
meili-bors[bot]
151c31c18f
Merge #3963
3963: Fix the milli crate r=ManyTheFish a=irevoire

Milli was using the serde feature of either without enabling it first; thus, it wasn't working.

It was working in meilisearch, though, because `meilisearch-types` was using the feature which enables it globally for all the other crates.

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3962

Co-authored-by: Tamo <tamo@meilisearch.com>
2023-07-31 09:32:08 +00:00
Tamo
a8ad0902d3 Fix the milli crate
Milli was using the serde feature of either without enabling it first, thus it wasn't working
2023-07-31 11:08:27 +02:00
ThatOneCalculator
ba919b6123
fix: ⬆️ up mimalloc 2023-07-28 20:35:47 -07:00
ManyTheFish
9d5e3457e5 Fix clippy 2023-07-27 14:21:19 +02:00
meili-bors[bot]
3a3414270d
Merge #3952
3952: Use the new safe `read-txn-no-tls` heed feature r=ManyTheFish a=Kerollmops

[We recently found out](https://github.com/meilisearch/heed/issues/191#issuecomment-1650280513) that the `read-sync-txn` heed feature was invalid and must be removed from this crate. We were declaring it in milli/meilisearch but, fortunately, not sharing the `RoTxn`s across threads 😮‍💨

[I recently introduced the `read-txn-no-tls` heed feature](https://github.com/meilisearch/heed/pull/194), which implements `RoTxn: Send` and allows multiple read transactions on a single thread (which we use).

This PR removes the `sync-read-txn` heed feature from the _Cargo.toml_ file. I will fix this in heed v0.20.0 and will fill a RustSec advisory in the meantime.

Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-07-26 16:40:58 +00:00
meili-bors[bot]
939b2fc6fd
Merge #3949
3949: Fix score details casing r=Kerollmops a=ManyTheFish

# Pull Request

Fixes #3941


Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-07-26 14:14:59 +00:00
Clément Renault
d8b47b689e
Use the new read-txn-no-tls heed feature 2023-07-26 15:45:15 +02:00
ManyTheFish
b0c1a9504a ensure the synonyms are updated when the tokenizer settings are changed 2023-07-26 09:33:42 +02:00
meili-bors[bot]
be72be7c0d
Merge #3942
3942: Normalize for the search the facets values r=ManyTheFish a=Kerollmops

This PR improves and fixes the search for facet values feature. Searching for _bre_ wasn't returning facet values like _brévent_ or _brô_.

The issue was related to the fact that facets are normalized but not in the same way as the `searchableAttributes` are. We decided to normalize them further and add another intermediate database where the key is the normalized facet value, and the value is a set of the non-normalized facets. We then use these non-normalized ones to get the correct counts by fetching the associated databases.

### What's missing in this PR?
 - [x] Apply the change to the whole set of `SearchForFacetValue::execute` conditions.
 - [x] Factorize the code that does an intermediate normalized value fetch in a function.
 - [x] Add or modify the search for facet value test.

Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-07-25 14:37:17 +00:00
ManyTheFish
88559a2d54 Fix score details casing 2023-07-25 15:49:33 +02:00
ManyTheFish
d57026cd96 Support synonyms sinergies 2023-07-25 15:01:42 +02:00
Kerollmops
29ab54b259
Replace the hnsw crate by the instant-distance one 2023-07-25 12:37:35 +02:00
ManyTheFish
d4ff59fcf5 Fix clippy 2023-07-24 18:42:26 +02:00
ManyTheFish
9c485f8563 Make the search and the indexing work 2023-07-24 18:35:20 +02:00
Kerollmops
691a536893
Implement the facet search with the normalized index 2023-07-24 17:56:17 +02:00
ManyTheFish
d8d12d5979 Be able to set and reset settings 2023-07-24 17:00:18 +02:00
Clément Renault
df528b41d8
Normalize for the search the facets values 2023-07-20 17:57:07 +02:00
ManyTheFish
0497f93494 Update Charabia to the last version 2023-07-19 15:19:32 +02:00
Kerollmops
eef95de30e
First iteration on exposing puffin profiling 2023-07-18 17:38:13 +02:00
Kerollmops
d383afc82b
Fix the geo sort when lat and lng are strings 2023-07-17 18:28:04 +02:00
meili-bors[bot]
7745cc9d3c
Merge #3921
3921: Deactivate camel case segmentation r=dureuill a=ManyTheFish

# Pull Request
This PR deactivates the camel case segmentation to retrieve the possibility to accept typos over camel-cased words

## Related issue
Fixes #3869
Fixes #3818

## What does this PR do?
- deactivates camelcase segmentation

related to #3919



Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-07-13 11:00:14 +00:00
ManyTheFish
c106906f8f deactivate camelCase segmentation 2023-07-13 12:06:27 +02:00
Louis Dureuil
4310928803
Fixes #3912 2023-07-12 10:08:56 +02:00
Louis Dureuil
74315b4ea8
Fixes #3911 2023-07-12 10:08:29 +02:00
Louis Dureuil
40fa59d64c
Sort by lexicographic order after normalization 2023-07-10 09:26:59 +02:00
Louis Dureuil
55cd7738b9
Update snapshots 2023-07-04 16:31:01 +02:00
Louis Dureuil
48409c9183
Add missing exactness.matchingWords, exactness.maxMatchingWords 2023-07-04 16:31:01 +02:00
Kerollmops
a442af6a7c
Update the features of the either dependency to compile milli successfully 2023-07-03 18:51:43 +02:00
Louis Dureuil
324d448236
Format let-else ❤️ 🎉 2023-07-03 10:20:28 +02:00
meili-bors[bot]
661d1f90dc
Merge #3866
3866: Update charabia v0.8.0 r=dureuill a=ManyTheFish

# Pull Request

Update Charabia:
- enhance Japanese segmentation
- enhance Latin Tokenization
  - words containing `_` are now properly segmented into several words
  - brackets `{([])}` are no more considered as context separators so word separated by brackets are now considered near together for the proximity ranking rule
- fixes #3815
- fixes #3778
- fixes [product#151](https://github.com/meilisearch/product/discussions/151)

> Important note: now the float numbers are segmented around the `.` so `3.22` is segmented as [`3`, `.`, `22`] but the middle dot isn't considered as a hard separator, which means that if we search `3.22` we find documents containing `3.22`

Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-06-29 15:24:36 +00:00
ManyTheFish
6ec7541026 Update inta snapshots 2023-06-29 17:18:39 +02:00
ManyTheFish
a82c49ab08 Update test 2023-06-29 15:56:36 +02:00
ManyTheFish
84845de9ef Update Charabia 2023-06-29 15:56:32 +02:00
Clément Renault
7c157fc442
Document that the LevelEntry fields order is important 2023-06-29 14:33:32 +02:00
Clément Renault
0b97596c93
Replace unwraps with ? 2023-06-29 14:33:32 +02:00
Clément Renault
a0e0fce677
Simplify a Rust lifetime trick 2023-06-29 14:33:32 +02:00
Clément Renault
b951830461
Add more tests 2023-06-29 14:33:32 +02:00
Kerollmops
b132e859f7
Make clippy happy 2023-06-29 14:33:31 +02:00
Kerollmops
9917bf046a
Move the sortFacetValuesBy in the faceting settings 2023-06-29 14:33:31 +02:00
Kerollmops
d9fea0143f
Make Clippy happy 2023-06-29 14:33:31 +02:00
Kerollmops
a385642ec3
Replace the BTreeMap by an IndexMap to return values in order 2023-06-29 14:33:31 +02:00
Kerollmops
34b2e98fe9
Expose a sortFacetValuesBy parameter to the user 2023-06-29 14:33:00 +02:00
Kerollmops
80bbd4b6f3
Clean and make the facet order configurable internally 2023-06-29 14:31:17 +02:00
Kerollmops
f42bef2f66
Make the search to always return the facets ordered by count 2023-06-29 14:31:17 +02:00
Kerollmops
bd3c026406
First to-test version of the algorithm 2023-06-29 14:31:17 +02:00
Kerollmops
84f8938f33
Rename facet distribution to be explicit on the order to find them 2023-06-29 14:31:15 +02:00
Clément Renault
efbe7ce78b
Clean the facet string FSTs when we clear the documents 2023-06-28 15:36:32 +02:00
Kerollmops
26f0fa678d
Change the error message when a facet is not searchable 2023-06-28 15:06:09 +02:00
Kerollmops
60ddd53439
Return one of the original facet values when doing a facet search 2023-06-28 15:06:09 +02:00
Kerollmops
2bcd8d2983
Make sure the facet queries are normalized 2023-06-28 15:06:09 +02:00
Kerollmops
41760a9306
Introduce a new invalid_facet_search_facet_name error code 2023-06-28 15:06:07 +02:00
Kerollmops
e9a3029c30
Use the right field id to write the string facet values FST 2023-06-28 15:01:51 +02:00
Kerollmops
ed0ff47551
Return an empty list of results if attribute is set as filterable 2023-06-28 15:01:51 +02:00
Clément Renault
e1b8fb48ee
Use the minWordSizeForTypos index settings 2023-06-28 15:01:51 +02:00
Clément Renault
87e22e436a
Fix compilation issues 2023-06-28 15:01:51 +02:00
Clément Renault
0252cfe8b6
Simplify the placeholder search of the facet-search route 2023-06-28 15:01:50 +02:00
Clément Renault
f35ad96afa
Use the disableOnAttributes parameter on the facet-search route 2023-06-28 15:01:50 +02:00
Clément Renault
2ceb781c73
Use the disableOnWords parameter on the facet-search route 2023-06-28 15:01:50 +02:00
Clément Renault
7bd67543dd
Support the typoTolerant.enabled parameter 2023-06-28 15:01:50 +02:00
Clément Renault
8e86eb91bb
Log an error when a facet value is missing from the database 2023-06-28 15:01:50 +02:00
Clément Renault
55c17aa38b
Rename the SearchForFacetValues struct 2023-06-28 15:01:50 +02:00