598: Matching query terms policy r=Kerollmops a=ManyTheFish
## Summary
Implement several optional words strategy.
## Content
Replace `optional_words` boolean with an enum containing several term matching strategies:
```rust
pub enum TermsMatchingStrategy {
// remove last word first
Last,
// remove first word first
First,
// remove more frequent word first
Frequency,
// remove smallest word first
Size,
// only one of the word is mandatory
Any,
// all words are mandatory
All,
}
```
All strategies implemented during the prototype are kept, but only `Last` and `All` will be published by Meilisearch in the `v0.29.0` release.
## Related
spec: https://github.com/meilisearch/specifications/pull/173
prototype discussion: https://github.com/meilisearch/meilisearch/discussions/2639#discussioncomment-3447699
Co-authored-by: ManyTheFish <many@meilisearch.com>
610: Share heed between all sub-crates r=Kerollmops a=irevoire
# Pull Request
## What does this PR do?
Use the reexported version of heed in the benchmarks and the fuzzer
Co-authored-by: Irevoire <tamo@meilisearch.com>
2674: Add analytics on the stats routes r=ManyTheFish a=irevoire
# Pull Request
## What does this PR do?
Implements https://github.com/meilisearch/specifications/pull/169
## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Irevoire <tamo@meilisearch.com>
2678: Accept either an array of documents or a single document r=irevoire a=Kerollmops
# Pull Request
## What does this PR do?
Fixes#2671
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Co-authored-by: Clément Renault <clement@meilisearch.com>
609: Retry downloading the benchmarks datasets r=Kerollmops a=irevoire
Downloading the benchmarks datasets is failing [more and more](https://github.com/meilisearch/milli/pull/607#pullrequestreview-1076023074) often; thus, instead of fixing the issue, I thought we could retry multiple times.
Co-authored-by: Irevoire <tamo@meilisearch.com>
596: Filter operators: NOT + IN[..] r=irevoire a=loiclec
# Pull Request
## What does this PR do?
Implements the changes described in https://github.com/meilisearch/meilisearch/issues/2580
It is based on top of #556
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2677: Hide the batch_uid field from the tasks route r=Kerollmops a=Kerollmops
# Pull Request
## What does this PR do?
Fixes#2676
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Co-authored-by: Clément Renault <clement@meilisearch.com>
607: Better threshold r=Kerollmops a=irevoire
# Pull Request
## What does this PR do?
Fixes#570
This PR tries to improve the threshold used to trigger the real deletion of documents.
The deletion is now triggered in two cases;
- 10% of the total available space is used by soft deleted documents
- 90% of the total available space is used.
In this context, « total available space » means the `map_size` of lmdb.
And the size used by the soft deleted documents is actually an estimation. We can't determine precisely the size used by one document thus what we do is; take the total space used, divide it by the number of documents + soft deleted documents to estimate the size of one average document. Then multiply the size of one avg document by the number of soft deleted document.
--------
<img width="808" alt="image" src="https://user-images.githubusercontent.com/7032172/185083075-92cf379e-8ae1-4bfc-9ca6-93b54e6ab4e9.png">
Here we can see we have a ~10GB drift in the end between the space used by the soft deleted and the real space used by the documents.
Personally I don’t think that's a big issue because once the red line reach 90GB everything will be freed but now you know.
If you have an idea on how to improve this estimation I would love to hear it.
It look like the difference is linear so maybe we could simply multiply the current estimation by two?
Co-authored-by: Irevoire <tamo@meilisearch.com>