3571: Introduce two filters to select documents with `null` and empty fields r=irevoire a=Kerollmops
# Pull Request
## Related issue
This PR implements the `X IS NULL`, `X IS NOT NULL`, `X IS EMPTY`, `X IS NOT EMPTY` filters that [this comment](https://github.com/meilisearch/product/discussions/539#discussioncomment-5115884) is describing in a very detailed manner.
## What does this PR do?
### `IS NULL` and `IS NOT NULL`
This PR will be exposed as a prototype for now. Below is the copy/pasted version of a spec that defines this filter.
- `IS NULL` matches fields that `EXISTS` AND `= IS NULL`
- `IS NOT NULL` matches fields that `NOT EXISTS` OR `!= IS NULL`
1. `{"name": "A", "price": null}`
2. `{"name": "A", "price": 10}`
3. `{"name": "A"}`
`price IS NULL` would match 1
`price IS NOT NULL` or `NOT price IS NULL` would match 2,3
`price EXISTS` would match 1, 2
`price NOT EXISTS` or `NOT price EXISTS` would match 3
common query : `(price EXISTS) AND (price IS NOT NULL)` would match 2
### `IS EMPTY` and `IS NOT EMPTY`
- `IS EMPTY` matches Array `[]`, Object `{}`, or String `""` fields that `EXISTS` and are empty
- `IS NOT EMPTY` matches fields that `NOT EXISTS` OR are not empty.
1. `{"name": "A", "tags": null}`
2. `{"name": "A", "tags": [null]}`
3. `{"name": "A", "tags": []}`
4. `{"name": "A", "tags": ["hello","world"]}`
5. `{"name": "A", "tags": [""]}`
6. `{"name": "A"}`
7. `{"name": "A", "tags": {}}`
8. `{"name": "A", "tags": {"t1":"v1"}}`
9. `{"name": "A", "tags": {"t1":""}}`
10. `{"name": "A", "tags": ""}`
`tags IS EMPTY` would match 3,7,10
`tags IS NOT EMPTY` or `NOT tags IS EMPTY` would match 1,2,4,5,6,8,9
`tags IS NULL` would match 1
`tags IS NOT NULL` or `NOT tags IS NULL` would match 2,3,4,5,6,7,8,9,10
`tags EXISTS` would match 1,2,3,4,5,7,8,9,10
`tags NOT EXISTS` or `NOT tags EXISTS` would match 6
common query : `(tags EXISTS) AND (tags IS NOT NULL) AND (tags IS NOT EMPTY)` would match 2,4,5,8,9
## What should the reviewer do?
- Check that I tested the filters
- Check that I deleted the ids of the documents when deleting documents
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
3688: Following release v1.1.1: bring back changes into `main` r=curquiza a=curquiza
`@meilisearch/engine-team` ensure the changes we bring to `main` are the ones you want
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: dureuill <dureuill@users.noreply.github.com>
3641: Bring back changes from `release v1.1.0` into `main` after v1.1.0 release r=curquiza a=curquiza
Replace https://github.com/meilisearch/meilisearch/pull/3637 since we don't want to pull commits from `main` into `release-v1.1.0` when fixing git conflicts
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
Co-authored-by: Charlotte Vermandel <charlottevermandel@gmail.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: curquiza <clementine@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
3525: Fix phrase search containing stop words r=ManyTheFish a=ManyTheFish
# Summary
A search with a phrase containing only stop words was returning an HTTP error 500,
this PR filters the phrase containing only stop words dropping them before the search starts, a query with a phrase containing only stop words now behaves like a placeholder search.
fixes https://github.com/meilisearch/meilisearch/issues/3521
related v1.0.2 PR on milli: https://github.com/meilisearch/milli/pull/779
Co-authored-by: ManyTheFish <many@meilisearch.com>
3524: Update the metrics route r=irevoire a=irevoire
Fixes#3523
Make the metrics available by default without a feature flag.
+ Rename the cli-flag to `experimental-enable-metrics`.
Co-authored-by: Tamo <tamo@meilisearch.com>
3331: Limit the number of concurrently opened indexes r=dureuill a=dureuill
# Pull Request
## Related issue
Relevant to #1841, fixes#3382
## What does this PR do?
### User standpoint
- Limit the number of concurrently opened indexes (currently, the number of indexes that can be concurrently opened is computed at startup)
- When too many an index is opened, the least recently used one is closed and its virtual memory released.
- This allows a user to have an arbitrary number of indexes of an arbitrary size
### Implementation standpoint
- Added a LRU cache map in `index-scheduler::lru`. A more complete implementation (eg with helper functions not used here) is available but would better fit a dedicated crate.
- Use the LRU cache map in the `IndexScheduler`. To simplify the lifecycle of indexes, they are never removed from the cache when they are in the middle of a resize or delete operation. To achieve this, an intermediate `Vec` stores the UUIDs of the indexes that are in the middle of such an operation.
- Upon creating the index scheduler object, compute the total virtual memory that is adressable by using a dichotomic search on the max size of an index. Use this as a base to compute the number of indexes that can be open with 2TiB per index. If the virtual memory address space is lower than 2TiB, then only allow for 1 index of a fraction of that size.
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3530: Fix highlighter bug r=Kerollmops a=ManyTheFish
# Pull Request
There was a highlighting issue on CJK's character, we were highlighting too many characters and these additional characters were duplicated after the highlight tag.
## Related issue
Fixes#3517Fixes#3526
## What does this PR do?
- add a test showcasing the bug
- fix the bug by activating the char_map creation of the tokenizer during the highlighting process
Co-authored-by: ManyTheFish <many@meilisearch.com>
3534: Update the csv error code from InvalidIndexCsvDelimiter to InvalidDocumentCsvDelimiter r=Kerollmops a=irevoire
Fixes#3533
Co-authored-by: Tamo <tamo@meilisearch.com>
3347: Enhance language detection r=irevoire a=ManyTheFish
## Summary
Some completely unrelated Languages can share the same characters, in Meilisearch we detect the Languages using `whatlang`, which works well on large texts but fails on small search queries leading to a bad segmentation and normalization of the query.
This PR now stores the Languages detected during the indexing in order to reduce the Languages list that can be detected during the search.
## Detail
- Create a 19th database mapping the scripts and the Languages detected with the documents where the Language is detected
- Fill the newly created database during indexing
- Create an allow-list with this database and pass it to Charabia
- Add a test ensuring that a Japanese request containing kanjis only is detected as Japanese and not Chinese
## Related issues
Fixes#2403Fixes#3513
Co-authored-by: f3r10 <frledesma@outlook.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
3505: Csv delimiter r=irevoire a=irevoire
Fixes https://github.com/meilisearch/meilisearch/issues/3442
Closes https://github.com/meilisearch/meilisearch/pull/2803
Specified in https://github.com/meilisearch/specifications/pull/221
This PR is a reimplementation of https://github.com/meilisearch/meilisearch/pull/2803, on the new engine. Thanks for your idea and initial PR `@MixusMinimax;` sorry I couldn’t update/merge your PR. Way too many changes happened on the engine in the meantime.
**Attention to reviewer**; I had to update deserr to implement the support of deserializing `char`s
-------
It introduces four new error messages;
- Invalid value in parameter csvDelimiter: expected a string of one character, but found an empty string
- Invalid value in parameter csvDelimiter: expected a string of one character, but found the following string of 5 characters: doggo
- csv delimiter must be an ascii character. Found: 🍰
- The Content-Type application/json does not support the use of a csv delimiter. The csv delimiter can only be used with the Content-Type text/csv.
And one error code;
- `invalid_index_csv_delimiter`
The `invalid_content_type` error code is now also used when we encounter the `csvDelimiter` query parameter with a non-csv content type.
Co-authored-by: Tamo <tamo@meilisearch.com>