3331: Limit the number of concurrently opened indexes r=dureuill a=dureuill
# Pull Request
## Related issue
Relevant to #1841, fixes#3382
## What does this PR do?
### User standpoint
- Limit the number of concurrently opened indexes (currently, the number of indexes that can be concurrently opened is computed at startup)
- When too many an index is opened, the least recently used one is closed and its virtual memory released.
- This allows a user to have an arbitrary number of indexes of an arbitrary size
### Implementation standpoint
- Added a LRU cache map in `index-scheduler::lru`. A more complete implementation (eg with helper functions not used here) is available but would better fit a dedicated crate.
- Use the LRU cache map in the `IndexScheduler`. To simplify the lifecycle of indexes, they are never removed from the cache when they are in the middle of a resize or delete operation. To achieve this, an intermediate `Vec` stores the UUIDs of the indexes that are in the middle of such an operation.
- Upon creating the index scheduler object, compute the total virtual memory that is adressable by using a dichotomic search on the max size of an index. Use this as a base to compute the number of indexes that can be open with 2TiB per index. If the virtual memory address space is lower than 2TiB, then only allow for 1 index of a fraction of that size.
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3530: Fix highlighter bug r=Kerollmops a=ManyTheFish
# Pull Request
There was a highlighting issue on CJK's character, we were highlighting too many characters and these additional characters were duplicated after the highlight tag.
## Related issue
Fixes#3517Fixes#3526
## What does this PR do?
- add a test showcasing the bug
- fix the bug by activating the char_map creation of the tokenizer during the highlighting process
Co-authored-by: ManyTheFish <many@meilisearch.com>
3534: Update the csv error code from InvalidIndexCsvDelimiter to InvalidDocumentCsvDelimiter r=Kerollmops a=irevoire
Fixes#3533
Co-authored-by: Tamo <tamo@meilisearch.com>
3347: Enhance language detection r=irevoire a=ManyTheFish
## Summary
Some completely unrelated Languages can share the same characters, in Meilisearch we detect the Languages using `whatlang`, which works well on large texts but fails on small search queries leading to a bad segmentation and normalization of the query.
This PR now stores the Languages detected during the indexing in order to reduce the Languages list that can be detected during the search.
## Detail
- Create a 19th database mapping the scripts and the Languages detected with the documents where the Language is detected
- Fill the newly created database during indexing
- Create an allow-list with this database and pass it to Charabia
- Add a test ensuring that a Japanese request containing kanjis only is detected as Japanese and not Chinese
## Related issues
Fixes#2403Fixes#3513
Co-authored-by: f3r10 <frledesma@outlook.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
3496: Fix metrics feature r=irevoire a=james-2001
# Pull Request
## Related issue
Resolves: #3469
See also: #2763
## What does this PR do?
As reported the metrics feature was broken by still using and old reference to `meilisearch_auth::actions`. This commit switches to the new location, `meilisearch_types::keys::actions`.
The original issue was not *that* clear as to exactly what was broken, and the build logs have disappeared, but it seemed to just be this one line fix. If this is not the case and I've missed the mark let me know, and i'll head back to the drawing board.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Co-authored-by: James <james.a.may.2001@gmail.com>
3505: Csv delimiter r=irevoire a=irevoire
Fixes https://github.com/meilisearch/meilisearch/issues/3442
Closes https://github.com/meilisearch/meilisearch/pull/2803
Specified in https://github.com/meilisearch/specifications/pull/221
This PR is a reimplementation of https://github.com/meilisearch/meilisearch/pull/2803, on the new engine. Thanks for your idea and initial PR `@MixusMinimax;` sorry I couldn’t update/merge your PR. Way too many changes happened on the engine in the meantime.
**Attention to reviewer**; I had to update deserr to implement the support of deserializing `char`s
-------
It introduces four new error messages;
- Invalid value in parameter csvDelimiter: expected a string of one character, but found an empty string
- Invalid value in parameter csvDelimiter: expected a string of one character, but found the following string of 5 characters: doggo
- csv delimiter must be an ascii character. Found: 🍰
- The Content-Type application/json does not support the use of a csv delimiter. The csv delimiter can only be used with the Content-Type text/csv.
And one error code;
- `invalid_index_csv_delimiter`
The `invalid_content_type` error code is now also used when we encounter the `csvDelimiter` query parameter with a non-csv content type.
Co-authored-by: Tamo <tamo@meilisearch.com>
3467: Identify builds git tagged with `prototype-...` in CLI and analytics r=curquiza a=dureuill
# Pull Request
## What does this PR do?
- Parses the last git tag to extract a prototype name if:
- Current build uses the prototype tag (not after the tag) precisely
- The prototype tag name respects the following conditions:
1. starts with `prototype-`
2. ends with a number
3. the hyphen-separated segment right before the number is not a number (required to reject commits after the tag).
- Display the prototype name in the launch summary in the CLI
- Send the prototype name to analytics if any
- Update prototypes instructions in CONTRIBUTING.md
|`VERGEN_GIT_SEMVER_LIGHTWEIGHT` value | Prototype |
|---|---|
| `Some("prototype-geo-bounding-box-0-139-gcde89018")` | `None` (does not end with a number) |
| `Some("prototype-geo-bounding-box-0-139-89018")` | `None` (before the last segment is a number) |
| `Some("prototype-geo-bounding-box-0")` | `Some("prototype-geo-bounding-box-0")` |
| `Some("prototype-geo-bounding-box")` | `None` (does not end with a number") |
| `Some("geo-bounding-box-0")` | `None` (does not start with "prototype") |
| `None` | `None` |
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Metrics feature was relying on old references. Refactored with inspiration from the `get_stats` method in `meilisearch/src/routes/lib.rs`. `enable_metrics_routes` added to options in `segment_analytics`.
Resolves: #3469
See also: #2763
As reported the metrics feature was broken by still using and old reference to `meilisearch_auth::actions`. This commit switches to the new location, `meilisearch_types::keys::actions`.
Resolves: #3469
See also: #2763
3461: Bring v1 changes into main r=curquiza a=Kerollmops
Also bring back changes in milli (the remote repository) into main done during the pre-release
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Philipp Ahlner <philipp@ahlner.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
3441: Fix import of dump v2 r=dureuill a=irevoire
# Pull Request
This bug was introduced because of a mistake we did earlier: We said the last version to export dump v2 was the v0.21.0 while it was the v0.22.0.
To fix the bug I updated our whole v2 reader to use the code from meilisearch v0.22.0.
Also:
- Import the bugged dump in the tests
- Test the import of this dump in the v2 reader and current reader
## Related issue
Fixes#3435
Co-authored-by: Tamo <tamo@meilisearch.com>
3412: When adding documents, trying to update the primary-key now throw an error r=Kerollmops a=irevoire
While updating the test suite, I also noticed an issue with the indexed_documents value of failed tasks and had to update it. I also named a bunch of snapshots that had no name, sorry 😬
Fixes https://github.com/meilisearch/meilisearch/issues/3385
Fixes https://github.com/meilisearch/meilisearch/issues/3411
Co-authored-by: Tamo <tamo@meilisearch.com>
3406: Master Key: Implements errors and warnings from the specification r=irevoire a=dureuill
<sub>Now in technicolor</sub>
# Pull Request
## What does this PR do?
- Uses `atty` and `termcolor` as dependency
- Use these dependencies to print colored background for warning messages
- Update messages to match https://github.com/meilisearch/specifications/pull/209
## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
While updating the test suite I also noticed an issue with the indexed_documents value of failed task and had to update it.
I also named a bunch of snapshots that had no name sorry 😬
3389: Return `invalid_search_facets` rather than `bad_request` when using facet on a non filterable attribute r=irevoire a=dureuill
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3384
## What does the PR does
- title
- also adds a test
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3370: make the swap indexes not found errors return an IndexNotFound error-code r=irevoire a=irevoire
Fix https://github.com/meilisearch/meilisearch/issues/3368
3373: fix a wrong error code and add tests on the document resource r=irevoire a=irevoire
Fix https://github.com/meilisearch/meilisearch/issues/3371
3375: Avoid deleting all task invalid canceled by r=irevoire a=Kerollmops
Fixes#3369 by making sure that at least one `canceledBy` task filter parameter matches something.
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
3334: Add specific error codes `immutable_...` r=irevoire a=loiclec
Add the following error codes:
When an immutable field of API key is sent to the `PATCH /keys` route:
- `ImmutableApiKeyUid`
- `ImmutableApiKeyKey`
- `ImmutableApiKeyActions`
- `ImmutableApiKeyIndexes`
- `ImmutableApiKeyExpiresAt`
- `ImmutableApiKeyCreatedAt`
- `ImmutableApiKeyUpdatedAt`
When an immutable field of Index is sent to the `PATCH /indexes/{uid}` route:
- `ImmutableIndexUid`
- `ImmutableIndexCreatedAt`
- `ImmutableIndexUpdatedAt`
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
Co-authored-by: Tamo <tamo@meilisearch.com>