474: Disable typos on exact word r=MarinPostma a=MarinPostma
This PR introduces the `exact_words` setting to disable typo tolerance on custom words.
If a user query contains a word from `exact_words`, no typo derivation is made for that particular word.
I chose to store the words in an FST to avoid deserialization costs and to allow fast lookups.
I had some trouble with the `serde` module and had to rename it `serde_impl`.
## steps
- [x] introduce new settings to register words to disable typos on
- [x] in `typos`, return an exact match if the current word is one of the words to disable typos for.
- [x] update `Context` to return the exact words dictionary.
- [x] merge #473
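A minimal sketch of the lookup described above. `QueryKind` and the `typos` signature here are simplified and hypothetical, not milli's actual types, and `max_typos` is assumed to come from the rest of the typo configuration. The PR stores the words in an FST; to stay dependency-free, a `BTreeSet<String>` stands in for it, since both answer the same sorted-set membership query.

```rust
use std::collections::BTreeSet;

/// Simplified, hypothetical stand-in for the query tree's query kind.
#[derive(Debug, PartialEq)]
enum QueryKind {
    /// Match the word exactly, with no typo derivation.
    Exact { word: String },
    /// Allow up to `typo` typos when deriving words.
    Tolerant { typo: u8, word: String },
}

/// Decides how a query word is matched (hypothetical signature).
/// `exact_words` plays the role of the exact-words dictionary: an FST in
/// the PR, a `BTreeSet` here. `max_typos` is what the typo configuration
/// would otherwise allow for this word.
fn typos(word: String, exact_words: &BTreeSet<String>, max_typos: u8) -> QueryKind {
    if max_typos == 0 || exact_words.contains(&word) {
        // Words listed in the exact-words setting never get typo derivations.
        QueryKind::Exact { word }
    } else {
        QueryKind::Tolerant { typo: max_typos, word }
    }
}
```

With `exact_words` containing `"meilisearch"`, `typos("meilisearch".to_string(), &exact_words, 2)` yields an `Exact` node, while any other word keeps its tolerant derivation.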
Co-authored-by: ad hoc <postma.marin@protonmail.com>
473: set minimum word len for typos r=MarinPostma a=MarinPostma
This PR makes the minimum word length for typos configurable.
The default values are unchanged.
## steps
- [x] introduce settings for the minimum word length for 1 and 2 typos
- [x] update the settings update flow to set this setting
- [x] create a structure `TypoConfig` to configure typo tolerance in the query builder
- [x] in `typos`, use the configuration to create the appropriate query tree node.
- [x] extend `Context` to return the setting for minimum word length for typos
- [x] return the correct error message for invalid settings.
- [x] merge #469
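A minimal sketch of the threshold logic described above, assuming a hypothetical `TypoConfig` with two fields and defaults of 5 and 9 characters (assumed to reproduce the previously hardcoded behavior). The constructor shows the kind of validation behind the settings error: the 2-typo threshold may not be lower than the 1-typo one. The error text is illustrative only.

```rust
/// Hypothetical typo configuration holding the two minimum word lengths.
#[derive(Debug, Clone, Copy, PartialEq)]
struct TypoConfig {
    /// Words with at least this many characters may contain one typo.
    word_len_one_typo: u8,
    /// Words with at least this many characters may contain two typos.
    word_len_two_typo: u8,
}

impl Default for TypoConfig {
    fn default() -> Self {
        // Assumed defaults, matching the previous hardcoded thresholds.
        TypoConfig { word_len_one_typo: 5, word_len_two_typo: 9 }
    }
}

impl TypoConfig {
    /// Builds a configuration, rejecting a 2-typo threshold below the 1-typo one.
    fn new(word_len_one_typo: u8, word_len_two_typo: u8) -> Result<Self, String> {
        if word_len_one_typo > word_len_two_typo {
            return Err(format!(
                "minimum word length for 1 typo ({}) must not exceed the one for 2 typos ({})",
                word_len_one_typo, word_len_two_typo
            ));
        }
        Ok(TypoConfig { word_len_one_typo, word_len_two_typo })
    }

    /// Maximum number of typos allowed for `word`, counted in characters.
    fn max_typos(&self, word: &str) -> u8 {
        let len = word.chars().count();
        if len >= self.word_len_two_typo as usize {
            2
        } else if len >= self.word_len_one_typo as usize {
            1
        } else {
            0
        }
    }
}
```

Returning a `Result` from the constructor keeps an inconsistent pair of thresholds from ever reaching the query builder.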
Co-authored-by: ad hoc <postma.marin@protonmail.com>
485: fix bug on 2 typos derivation r=Kerollmops a=MarinPostma
I found a bug while working on #473. This PR fixes it and adds the missing tests for word derivations.
Co-authored-by: ad hoc <postma.marin@protonmail.com>
469: add authorize typo setting r=Kerollmops a=MarinPostma
This PR adds support for an authorize typo setting, which makes it possible to disable typos for a whole index. Typos are enabled by default.
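A minimal sketch of the "enabled by default" semantics, assuming (hypothetically) that the flag is stored as an optional boolean where `None` means the index never configured it. Both function names are illustrative, not milli's API.

```rust
/// Reads the hypothetical stored flag: an index that never set it keeps typos enabled.
fn authorize_typos(stored: Option<bool>) -> bool {
    stored.unwrap_or(true)
}

/// Caps the typos computed for a word: when typos are disabled for the
/// whole index, every word is matched exactly.
fn allowed_typos(stored: Option<bool>, typos_for_word: u8) -> u8 {
    if authorize_typos(stored) {
        typos_for_word
    } else {
        0
    }
}
```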
Co-authored-by: ad hoc <postma.marin@protonmail.com>
480: Increase benchmarks (push) CI timeout r=Kerollmops a=Kerollmops
The benchmarks CI runs triggered on push were [canceled by GitHub](https://github.com/meilisearch/milli/actions/runs/2028844132) because they reached the default timeout of 6h. This PR raises the timeout to 72h, the same value used by the manually triggered benchmark workflow.
Co-authored-by: Kerollmops <clement@meilisearch.com>
479: Update version (v0.24.1) r=Kerollmops a=curquiza
Bumps the version from v0.23.1 to v0.24.1, since we had an issue with the versioning of the previous release.
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
475: Bump tokenizer r=Kerollmops a=irevoire
This PR bumps the tokenizer to v0.2.9, which fixes an issue we had with lindera using reqwest with OpenSSL (which was breaking our benchmarks).
Co-authored-by: Irevoire <tamo@meilisearch.com>
476: Rollback meilisearch-tokenizer version r=Kerollmops a=irevoire
Lindera often fails to download some data from Google Drive, so we can’t compile meilisearch / milli consistently.
We can’t bump to the latest version (which moved off Google Drive) either: lindera uses reqwest with OpenSSL with no way of configuring it, so our benchmarks were not able to run. The latter issue should be fixed by https://github.com/lindera-morphology/lindera/pull/164.
Co-authored-by: Irevoire <tamo@meilisearch.com>
472: Remove useless variables in proximity r=Kerollmops a=ManyTheFish
I was looking at the plane sweep algorithm for inspiration and discovered that we had useless variables that went undetected because they were only passed along through the recursive function.
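The pattern described above can be reproduced in a few lines. In this hypothetical example (not the proximity code), `depth` is only forwarded to the recursive call, so rustc counts it as used and emits no `unused_variables` warning even though it never affects the result. Recent Clippy versions include an `only_used_in_recursion` lint aimed at this case.

```rust
/// Sums a slice recursively. `depth` is read only to build the argument of
/// the recursive call, so the compiler sees it as used, yet it is useless.
fn sum_with_dead_param(values: &[u32], depth: usize) -> u32 {
    match values.split_first() {
        None => 0,
        Some((first, rest)) => first + sum_with_dead_param(rest, depth + 1),
    }
}

/// The same function with the useless parameter removed.
fn sum(values: &[u32]) -> u32 {
    match values.split_first() {
        None => 0,
        Some((first, rest)) => first + sum(rest),
    }
}
```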
Co-authored-by: ManyTheFish <many@meilisearch.com>
468: Add a new error message when the filterableAttributes are empty r=Kerollmops a=brunoocasali
Fixes https://github.com/meilisearch/meilisearch/issues/2140
Is there a good way to reduce the duplication here? Maybe with a shared function? I don't know the best, most idiomatic way to do that, so I'd appreciate any tips!
Another question concerns the duplicated call:
```rs
// filter.rs:373
FilterError::AttributeNotFilterable {
    attribute,
    filterable: filterable_fields.into_iter().collect::<Vec<_>>().join(" "),
},
```
and
```rs
// filter.rs:424
return Err(point[0].as_external_error(FilterError::AttributeNotFilterable {
    attribute: "_geo",
    filterable: filterable_fields.into_iter().collect::<Vec<_>>().join(" "),
}))?;
```
I think we could move the `filterable_fields.into_iter().collect::<Vec<_>>().join(" ")` call directly into the error handling, as is done for the sortable error. I did this in the last commit; if it is something to avoid, let me know and I can remove it :)
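A minimal sketch of moving the join into the error itself, using a hypothetical, simplified `FilterError` that stores the set of filterable fields and formats it only in its `Display` impl. It also shows a dedicated message when the set is empty. Both message texts are illustrative, not the ones in the PR, and `BTreeSet` is used only to get a deterministic field order.

```rust
use std::collections::BTreeSet;
use std::fmt;

/// Hypothetical, simplified filter error: call sites pass the raw set of
/// filterable fields instead of a pre-joined string.
#[derive(Debug)]
enum FilterError<'a> {
    AttributeNotFilterable { attribute: &'a str, filterable_fields: BTreeSet<String> },
}

impl fmt::Display for FilterError<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FilterError::AttributeNotFilterable { attribute, filterable_fields } => {
                if filterable_fields.is_empty() {
                    // Dedicated message for an index without filterable attributes.
                    write!(
                        f,
                        "Attribute `{}` is not filterable. This index does not have configured filterable attributes.",
                        attribute
                    )
                } else {
                    // The join happens once here instead of at every call site.
                    let fields = filterable_fields
                        .iter()
                        .map(String::as_str)
                        .collect::<Vec<_>>()
                        .join(" ");
                    write!(
                        f,
                        "Attribute `{}` is not filterable. Available filterable attributes are: `{}`.",
                        attribute, fields
                    )
                }
            }
        }
    }
}
```

Both call sites then shrink to `FilterError::AttributeNotFilterable { attribute, filterable_fields }`, and the empty case is handled in one place.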
Co-authored-by: Bruno Casali <brunoocasali@gmail.com>