(Some of) these specialized tokenizations include huge dictionaries
that currently account for 90% (!) of the meilisearch binary size.
This commit adds chinese, hebrew, japanese, and thai feature flags
that are propagated via milli down to the charabia crate. To keep it
backward compatible, they are enabled by default.
Related to meilisearch/milli#632
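As a rough illustration of how such flags can gate code at compile time, here is a minimal sketch. The feature names come from this commit; the function `enabled_specialized_languages` is hypothetical and is not part of meilisearch, milli, or charabia.

```rust
/// Hypothetical helper: lists which specialized tokenizations were compiled in.
/// `cfg!(feature = "...")` evaluates to a compile-time boolean, so a build
/// without a given feature never reports (or needs) the matching dictionary.
fn enabled_specialized_languages() -> Vec<&'static str> {
    let mut languages = Vec::new();
    if cfg!(feature = "chinese") {
        languages.push("chinese");
    }
    if cfg!(feature = "hebrew") {
        languages.push("hebrew");
    }
    if cfg!(feature = "japanese") {
        languages.push("japanese");
    }
    if cfg!(feature = "thai") {
        languages.push("thai");
    }
    languages
}
```

In a real crate, the dictionaries themselves would sit behind `#[cfg(feature = "...")]` items so that they are left out of the binary when the flag is disabled.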
2727: Don't panic when the error length is slightly over 100 r=Kerollmops a=onyxcherry
# Pull Request
## What does this PR do?
Fixes PR #2207: [the last commit](7ece7a9d9e) changed the number of characters kept at the end of the string from `50` to `85`, **but the lower limit on the string length wasn't changed**.
Therefore, any data producing an error message slightly longer than 100 characters (e.g. the example string from issue #2680) caused `meilisearch` to **panic**.
So I simply raised the minimum value from `100` to `135` (`50 + 85`) to ensure that `replace_range()` won't panic due to an inverted range.
At the same time, I am in favor of the `85` value introduced in `@CNLHC`'s last commit.
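The arithmetic can be sketched as follows. This is a minimal illustration and not the code of the PR: the name `truncate_middle`, the constants, and the `" ... "` placeholder are assumptions, and lengths are counted in bytes.

```rust
/// Bytes kept at the start of the message (assumed to match the `50` above).
const HEAD: usize = 50;
/// Bytes kept at the end of the message (assumed to match the `85` above).
const TAIL: usize = 85;

/// Hypothetical helper: replaces the middle of a long message with " ... ".
/// With a threshold of 100 and a 120-byte message, the range would be
/// 50..35 (inverted) and `replace_range` would panic. Using HEAD + TAIL
/// as the threshold guarantees start < end.
fn truncate_middle(message: &mut String) {
    if message.len() <= HEAD + TAIL {
        return; // Too short to truncate safely.
    }
    // Move both ends onto UTF-8 character boundaries.
    let mut start = HEAD;
    while !message.is_char_boundary(start) {
        start -= 1;
    }
    let mut end = message.len() - TAIL;
    while !message.is_char_boundary(end) {
        end += 1;
    }
    message.replace_range(start..end, " ... ");
}
```

A 120-byte message is left untouched by this sketch, while a 200-byte message keeps its first 50 and last 85 bytes around the placeholder.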
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing ~issue~ pull request?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Tomasz Wiśniewski <tomasz@wisniewski.app>
2744: Minor fixes in the just added update-version CI r=ManyTheFish a=curquiza
These fixes do not prevent us from using the current CI.
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2740: Update checkout v2 to v3 in CI manifests and use a unique GitHub PAT r=Kerollmops a=curquiza
Upgrade the remaining checkout v2 usages to v3.
They were probably lost to a bad merge-conflict resolution when merging `stable` into `main` after the v0.28.0 release.
Also, use `MEILI_BOT_GH_PAT` instead of `PUBLISH_TOKEN` and the default GitHub token, which allows us to remove useless GitHub secrets (once this PR is merged and v0.29.0 is released, because `PUBLISH_TOKEN` is still used on `release-v0.29.0`).
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2738: Add missing env vars for dumps and snapshots features r=irevoire a=gmourier
# Pull Request
## What does this PR do?
Fixes #2721
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Guillaume Mourier <guillaume@meilisearch.com>
2741: Add CI to update the Meilisearch version in Cargo.toml files r=ManyTheFish a=curquiza
Add a CI we can trigger manually to create a PR updating the Meilisearch version
The next step is to create a Slack bot that will trigger this CI
In the meantime, we can trigger this CI manually in the [Actions tab](https://github.com/meilisearch/milli/actions)
The `MEILI_BOT_GH_PAT` secret has been added at the organization level, and is accessible to the following repositories (so far): Meilisearch, Milli and Charabia
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2726: Add dry run for publishing binaries: check the compilation works r=Kerollmops a=curquiza
To avoid discovering during a release that one type of binary fails to compile, I created a dry run of the binary compilation that runs every day at 2am 😇
See the problem we had recently because this CI was missing: https://github.com/meilisearch/meilisearch/issues/2718
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2713: Move prometheus behind a feature flag r=Kerollmops a=irevoire
We decided we wanted to continue working on this feature before making it public.
Co-authored-by: Tamo <tamo@meilisearch.com>
2702: Add link to the main image r=curquiza a=brunoocasali
I have wrapped the image with an `<a>` link, and it seems to be working fine, WDYT?
Co-authored-by: Bruno Casali <brunoocasali@gmail.com>
Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
2504: New README 🌟 r=curquiza a=curquiza
⚠️ Please do not only look at the Markdown but also at how GitHub renders the README 😇👉👉 [Rendered](https://github.com/meilisearch/meilisearch/blob/new-readme/README.md) 👈👈
2697: Accept an environment variable to enable the metrics route r=ManyTheFish a=Kerollmops
With this PR, Meilisearch accepts the `MEILI_ENABLE_METRICS_ROUTE` environment variable to enable the newly introduced metrics route.
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2696: Add the new `metrics.get` and `metrics.all` actions rights r=Kerollmops a=Kerollmops
Follow the specification and add the new `metrics.get` and `metrics.all` actions, making the `/metrics` route only accessible with those rights.
Co-authored-by: Clément Renault <clement@meilisearch.com>
2636: Upgrade milli to v0.33.0 r=Kerollmops a=ManyTheFish
# Summary
- Update milli to v0.33.0
- Classify the new InvalidLmdbOpenOptions error as an Internal error
- Update filter error check in tests
- Introduce Terms Matching Policies
fixes #2479
fixes #2484
fixes #2486
fixes #2516
fixes #2578
fixes #2580
fixes #2583
fixes #2600
fixes #2640
fixes #2672
fixes #2679
fixes #2686
# Terms Matching Policies
This PR allows end users to customize matching term policies
## Todo
- [x] Update the API to return the number of pages and allow users to directly choose a page instead of computing an offset
- [x] Change generation of the query tree depending on the chosen settings https://github.com/meilisearch/milli/pull/598
## Small Documentation
### Default search query
**request**:
```sh
curl \
  -X POST 'http://localhost:7700/indexes/movies/search' \
  -H 'Content-Type: application/json' \
  --data-binary '{ "q": "doctor of tokio" }'
```
**result**:
```json
{
  "hits": [...],
  "estimatedTotalHits": 32,
  "query": "doctor of tokio",
  "limit": 20,
  "offset": 0,
  "processingTimeMs": 7
}
```
The default behavior is unchanged from the current Meilisearch behavior:
If we don't have enough documents to fit the requested limit, we remove the query words from the last to the first typed word.
### Search query with `matchingStrategy` parameter
**request**:
```sh
curl \
  -X POST 'http://localhost:7700/indexes/movies/search' \
  -H 'Content-Type: application/json' \
  --data-binary '{ "q": "doctor of tokio", "matchingStrategy": "all" }'
```
**result**:
```json
{
  "hits": [...],
  "estimatedTotalHits": 1,
  "query": "doctor of tokio",
  "limit": 20,
  "offset": 0,
  "processingTimeMs": 7
}
```
### Allowed `matchingStrategy` values
#### `last`
The default behavior. If we don't have enough documents to fit the requested limit, we remove the query words from the last to the first typed word.
#### `all`
No word is removed. If we don't have enough documents to fit the requested limit, we return only the documents we found.
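To make the difference between the two strategies concrete, here is a toy sketch over an in-memory list of documents. It is not how milli builds its query tree: the names `MatchingStrategy`, `matching_docs`, and `search` are hypothetical, matching is done on whole whitespace-separated words, and the sketch assumes that at least one query word is always kept.

```rust
/// Mirrors the two `matchingStrategy` values described above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum MatchingStrategy {
    Last,
    All,
}

/// Toy matcher: a document matches when it contains every query word.
fn matching_docs<'a>(docs: &[&'a str], words: &[&str]) -> Vec<&'a str> {
    docs.iter()
        .copied()
        .filter(|doc| {
            let doc_words: Vec<&str> = doc.split_whitespace().collect();
            words.iter().all(|w| doc_words.contains(w))
        })
        .collect()
}

/// Hypothetical search: with `Last`, words are dropped from the last typed
/// to the first until `limit` documents are found; with `All`, no word is
/// ever dropped.
fn search<'a>(
    docs: &[&'a str],
    query: &str,
    limit: usize,
    strategy: MatchingStrategy,
) -> Vec<&'a str> {
    let mut words: Vec<&str> = query.split_whitespace().collect();
    let mut hits = matching_docs(docs, &words);
    if strategy == MatchingStrategy::Last {
        while hits.len() < limit && words.len() > 1 {
            words.pop(); // Drop the last typed word.
            for doc in matching_docs(docs, &words) {
                if !hits.contains(&doc) {
                    hits.push(doc);
                }
            }
        }
    }
    hits.truncate(limit);
    hits
}
```

With the documents `"doctor of tokio"`, `"doctor strange"`, and `"tokio drift"`, the query `doctor of tokio` returns only the first document under `All`, while `Last` also returns `"doctor strange"` once the query has been reduced to `doctor`.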
### In charge of the feature
Core: `@ManyTheFish` & `@curquiza`
Docs: TBD
Integration: `@bidoubiwa`
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
2689: Use mimalloc as the global allocator r=Kerollmops a=loiclec
milli has switched its global allocator to mimalloc already, and we have seen some performance gains as a result. Furthermore, we can use mimalloc as the global allocator on all platforms whereas jemalloc was only activated on Linux.
This PR brings mimalloc to Meilisearch as well.
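For context, Rust selects the process-wide allocator through the `#[global_allocator]` attribute. The sketch below shows that mechanism with a hypothetical `CountingAlloc` wrapper around the standard `System` allocator, so that it runs without dependencies; it is not the code of this PR. With the `mimalloc` crate as a dependency, the static would instead hold that crate's `MiMalloc` type.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in allocator: forwards to `System` and counts calls.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Every heap allocation in the program now goes through `CountingAlloc`.
// The attribute is not platform-specific, so a single declaration like
// this one can apply to all targets.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Number of allocations observed so far.
fn allocation_count() -> usize {
    ALLOCATIONS.load(Ordering::Relaxed)
}
```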
2690: Add LTO and codegen-units=1 to release compile options r=Kerollmops a=loiclec
This PR brings Meilisearch's release compile options in line with milli (see https://github.com/meilisearch/milli/pull/606).
Adding LTO and codegen-units=1 will make compile times longer, but they also speed up the final binary significantly.
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>