Commit Graph

2124 Commits

Author SHA1 Message Date
dependabot[bot]
b308463022
Bump actions/checkout from 2 to 3
Bumps [actions/checkout](https://github.com/actions/checkout) from 2 to 3.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v2...v3)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-09-06 16:08:51 +00:00
dependabot[bot]
5e85059a71
Bump Swatinem/rust-cache from 1.3.0 to 2.0.0
Bumps [Swatinem/rust-cache](https://github.com/Swatinem/rust-cache) from 1.3.0 to 2.0.0.
- [Release notes](https://github.com/Swatinem/rust-cache/releases)
- [Changelog](https://github.com/Swatinem/rust-cache/blob/master/CHANGELOG.md)
- [Commits](https://github.com/Swatinem/rust-cache/compare/v1.3.0...v2.0.0)

---
updated-dependencies:
- dependency-name: Swatinem/rust-cache
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-09-06 16:08:48 +00:00
bors[bot]
9e661f2cb9
Merge #623
623: Add dependabot for GHA r=Kerollmops a=curquiza

Same as we added in Meilisearch. Only runs once a month.
https://github.com/meilisearch/meilisearch/blob/main/.github/dependabot.yml

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2022-09-06 15:56:28 +00:00
Clémentine Urquizar
44192d754f
Add dependabot for GHA 2022-09-06 17:54:05 +02:00
bors[bot]
1fa851a8d0
Merge #622
622: Minor fixes in the just added update-version CI r=ManyTheFish a=curquiza

These fixes are minor and do not prevent us from using the current CI.

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2022-09-06 13:14:23 +00:00
Clémentine Urquizar
61abc61a69
Minor fixes in the just added update-version CI 2022-09-05 16:01:32 +02:00
bors[bot]
efee0e3f43
Merge #621
621: Add CI to update the Milli version r=ManyTheFish a=curquiza

Add a CI we can trigger manually to create a PR updating the Milli version.
The next step is to create a Slack bot that will trigger this CI.
In the meantime, we can trigger this CI manually in the [Actions tab](https://github.com/meilisearch/milli/actions).

The `MEILI_BOT_GH_PAT` secret has been added at the organization level and is accessible to the following repositories (so far): Meilisearch, Milli and Charabia.

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2022-09-05 08:31:48 +00:00
Clémentine Urquizar
0639b14906
Add CI to update the Milli version 2022-09-04 11:49:50 +02:00
bors[bot]
f7c352a32d
Merge #620
620: Fix word criterion r=Kerollmops a=ManyTheFish

related to https://github.com/meilisearch/meilisearch/issues/2722

- fix the word strategy bug
- update milli version to v0.33.2

Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-09-01 10:14:35 +00:00
ManyTheFish
bf750e45a1 Fix word removal issue 2022-09-01 12:10:47 +02:00
ManyTheFish
a38608fe59 Add test mixing phrased and no-phrased words 2022-09-01 12:02:10 +02:00
ManyTheFish
97a04887a3 Update version for next release (v0.33.2) in Cargo.toml 2022-09-01 11:47:23 +02:00
bors[bot]
17d020e996
Merge #618
618: Update version for next release (v0.33.1) in Cargo.toml r=Kerollmops a=curquiza

No breaking changes in this release.

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2022-08-31 10:43:45 +00:00
Clémentine Urquizar
c3363706c5
Update version for next release (v0.33.1) in Cargo.toml 2022-08-31 11:37:27 +02:00
bors[bot]
2c2f3d38cc
Merge #617
617: Accept integers as document ids again r=irevoire a=Kerollmops

This PR is related to https://github.com/meilisearch/meilisearch/issues/2723 and will fix it once this PR is merged and a new release is deployed and used in Meilisearch itself.

This PR makes the indexer try to parse the values of fields identified as numbers (i.e. `id:number`) as integers first, then as floats if that fails.
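
The integer-first, float-second order described above can be sketched as follows. This is only an illustration of the fallback idea, not the indexer's actual code: the `Number` enum and the `parse_number` function are hypothetical names introduced for this example.

```rust
/// Hypothetical representation of a parsed numeric value (not milli's type).
#[derive(Debug, PartialEq)]
pub enum Number {
    Integer(i64),
    Float(f64),
}

/// Try to parse `raw` as an integer first; only if that fails, try a float.
/// Returns `None` when the value is not numeric at all.
pub fn parse_number(raw: &str) -> Option<Number> {
    let raw = raw.trim();
    if let Ok(integer) = raw.parse::<i64>() {
        return Some(Number::Integer(integer));
    }
    raw.parse::<f64>().ok().map(Number::Float)
}
```

With this order, a value such as `42` stays an exact integer instead of being turned into `42.0`, while `4.5` still parses through the float fallback.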

Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-08-31 09:25:17 +00:00
Clément Renault
7f92116b51
Accept again integers as document ids 2022-08-31 10:56:39 +02:00
bors[bot]
0b55e7ce6a
Merge #615
615: Remove the artifacts of the past r=Kerollmops a=irevoire



Co-authored-by: Irevoire <tamo@meilisearch.com>
2022-08-23 14:22:43 +00:00
Irevoire
f6024b3269
Remove the artifacts of the past 2022-08-23 16:10:38 +02:00
bors[bot]
a79ff8a1a9
Merge #611
611: Upgrade charabia v0.6.0 r=curquiza a=ManyTheFish

# Pull Request

## What does this PR do?

- Update `log`
- Upgrade `charabia`

related to https://github.com/meilisearch/meilisearch/issues/2686


Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-08-23 10:17:29 +00:00
bors[bot]
e314423653
Merge #613
613: Update version for next release (v0.33.0) r=Kerollmops a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2022-08-23 10:01:20 +00:00
bors[bot]
d0521e493f
Merge #612
612: Remove Bors required test for Windows r=Kerollmops a=curquiza

Remove the required windows test for merging due to the issue with Lindera
https://github.com/meilisearch/milli/runs/7970141278?check_suite_focus=true

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2022-08-23 09:47:51 +00:00
Clémentine Urquizar
9ed7324995
Update version for next release (v0.33.0) 2022-08-23 11:47:48 +02:00
Clémentine Urquizar
e140227065
Remove Bors required test for Windows 2022-08-23 11:45:29 +02:00
bors[bot]
18886dc6b7
Merge #598
598: Matching query terms policy r=Kerollmops a=ManyTheFish

## Summary

Implement several optional-word strategies.

## Content

Replace `optional_words` boolean with an enum containing several term matching strategies:
```rust
pub enum TermsMatchingStrategy {
    // remove the last word first
    Last,
    // remove the first word first
    First,
    // remove the most frequent word first
    Frequency,
    // remove the smallest word first
    Size,
    // only one of the words is mandatory
    Any,
    // all words are mandatory
    All,
}
```

All strategies implemented during the prototype are kept, but only `Last` and `All` will be published by Meilisearch in the `v0.29.0` release.
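
To make the removal orders concrete, here is a minimal sketch of how a strategy could pick the next word to drop from a query. It is an assumption-laden illustration, not milli's query tree code: the `Word` struct, its `frequency` field and the `next_word_to_remove` function are hypothetical, and `Any` and `All` are simply treated as having no removal order.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TermsMatchingStrategy {
    Last,
    First,
    Frequency,
    Size,
    Any,
    All,
}

/// Hypothetical query word: its text and how many documents contain it.
pub struct Word<'a> {
    pub text: &'a str,
    pub frequency: u64,
}

/// Return the index of the word that would be removed next, if any.
pub fn next_word_to_remove(
    words: &[Word<'_>],
    strategy: TermsMatchingStrategy,
) -> Option<usize> {
    match strategy {
        // Drop from the end of the query.
        TermsMatchingStrategy::Last => words.len().checked_sub(1),
        // Drop from the start of the query.
        TermsMatchingStrategy::First => {
            if words.is_empty() { None } else { Some(0) }
        }
        // Drop the word found in the most documents.
        TermsMatchingStrategy::Frequency => words
            .iter()
            .enumerate()
            .max_by_key(|(_, word)| word.frequency)
            .map(|(index, _)| index),
        // Drop the shortest word, counted in characters.
        TermsMatchingStrategy::Size => words
            .iter()
            .enumerate()
            .min_by_key(|(_, word)| word.text.chars().count())
            .map(|(index, _)| index),
        // Not modeled as a removal order in this sketch.
        TermsMatchingStrategy::Any | TermsMatchingStrategy::All => None,
    }
}
```

For the hypothetical query `the quick ox`, where `the` is the most frequent word, `Last` and `Size` would both pick `ox`, while `First` and `Frequency` would both pick `the`.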

## Related

spec: https://github.com/meilisearch/specifications/pull/173
prototype discussion: https://github.com/meilisearch/meilisearch/discussions/2639#discussioncomment-3447699


Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-08-22 15:51:37 +00:00
ManyTheFish
5391e3842c replace optional_words by term_matching_strategy 2022-08-22 17:47:19 +02:00
ManyTheFish
f9029727e0 Fix benchmarks 2022-08-22 14:55:53 +02:00
ManyTheFish
a5b9a35c50 Activate char_map for highlighting 2022-08-22 14:39:16 +02:00
ManyTheFish
ba5ca8a362 Upgrade charabia v0.6.0 2022-08-22 14:38:00 +02:00
ManyTheFish
5943e1c3b2 Update log dependency 2022-08-22 13:55:01 +02:00
bors[bot]
b46225070f
Merge #610
610: Share heed between all sub-crates r=Kerollmops a=irevoire

# Pull Request

## What does this PR do?
Use the reexported version of heed in the benchmarks and the fuzzer

Co-authored-by: Irevoire <tamo@meilisearch.com>
2022-08-22 08:44:31 +00:00
Irevoire
e7624abe63
share heed between all sub-crates 2022-08-19 11:23:41 +02:00
ManyTheFish
993aa1321c Fix query tree building 2022-08-18 17:56:06 +02:00
ManyTheFish
bff9653050 Fix remove count 2022-08-18 17:36:30 +02:00
ManyTheFish
9640976c79 Rename TermMatchingPolicies 2022-08-18 17:36:08 +02:00
bors[bot]
60a7221827
Merge #609
609: Retry downloading the benchmarks datasets r=Kerollmops a=irevoire

Downloading the benchmarks datasets is failing [more and more](https://github.com/meilisearch/milli/pull/607#pullrequestreview-1076023074) often; thus, instead of fixing the issue, I thought we could retry multiple times.
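
A retry loop of this kind could look like the sketch below. It is not the code from this PR: the `retry` helper, its parameters and the fixed delay between attempts are assumptions made for illustration, and the download itself is left to the caller's closure.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Run `operation` until it succeeds or `max_attempts` is reached.
/// The closure receives the 1-based attempt number. The operation always
/// runs at least once, and the last error is returned if every attempt fails.
pub fn retry<T, E>(
    max_attempts: u32,
    delay: Duration,
    mut operation: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match operation(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: give the last error back to the caller.
            Err(error) if attempt >= max_attempts => return Err(error),
            // Otherwise wait a bit and try again.
            Err(_) => {
                sleep(delay);
                attempt += 1;
            }
        }
    }
}
```

A caller would wrap the download in the closure, for example `retry(5, Duration::from_secs(2), |_| download(url))`, where `download` and `url` stand in for whatever performs the actual request.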


Co-authored-by: Irevoire <tamo@meilisearch.com>
2022-08-18 11:47:09 +00:00
bors[bot]
afc10acd19
Merge #596
596: Filter operators: NOT + IN[..] r=irevoire a=loiclec

# Pull Request

## What does this PR do?
Implements the changes described in https://github.com/meilisearch/meilisearch/issues/2580
It is based on top of #556 

Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2022-08-18 11:24:32 +00:00
Loïc Lecrenier
c7a86b56ef Fix filter parser compilation error 2022-08-18 13:16:56 +02:00
Loïc Lecrenier
9b6602cba2 Avoid cloning FilterCondition in filter array parsing 2022-08-18 13:06:57 +02:00
Loïc Lecrenier
8a271223a9 Change a macro_rules to a function in filter parser 2022-08-18 13:03:55 +02:00
Loïc Lecrenier
dd34dbaca5 Add more filter parser tests 2022-08-18 11:55:01 +02:00
Loïc Lecrenier
5d74ebd5e5 Cargo fmt 2022-08-18 11:36:38 +02:00
Loïc Lecrenier
9af69c151b Limit the maximum depth of filters
This should have no impact on the user but is there to safeguard
meilisearch against malicious inputs.
2022-08-18 11:31:38 +02:00
Loïc Lecrenier
c51dcad51b Don't recompute filterable fields in evaluation of IN[] filter 2022-08-18 10:59:21 +02:00
Loïc Lecrenier
98f0da6b38 Simplify representation of nested NOT filters 2022-08-18 10:58:24 +02:00
Loïc Lecrenier
b030efdc83 Fix parsing of IN[] filter followed by whitespace + factorise its impl 2022-08-18 10:58:04 +02:00
Irevoire
84a784834e
retry downloading the benchmarks datasets 2022-08-17 19:25:05 +02:00
bors[bot]
79094bcbcf
Merge #607
607: Better threshold r=Kerollmops a=irevoire

# Pull Request

## What does this PR do?
Fixes #570 

This PR tries to improve the threshold used to trigger the real deletion of documents.
The deletion is now triggered in two cases:
- 10% of the total available space is used by soft-deleted documents.
- 90% of the total available space is used.

In this context, "total available space" means the `map_size` of LMDB.
The size used by the soft-deleted documents is an estimate. We can't determine precisely the size used by one document, so we take the total space used and divide it by the number of documents plus soft-deleted documents to estimate the size of an average document. We then multiply that average size by the number of soft-deleted documents.
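
The estimate and the two triggers described above can be written out as a short sketch. The function names and the use of "greater than or equal" comparisons are assumptions made for this example; the real implementation may differ in both.

```rust
/// Estimate the space used by soft-deleted documents:
/// average document size multiplied by the number of soft-deleted documents.
pub fn estimated_soft_deleted_size(used_size: u64, documents: u64, soft_deleted: u64) -> u64 {
    let total_documents = documents + soft_deleted;
    if total_documents == 0 {
        return 0;
    }
    let average_document_size = used_size / total_documents;
    average_document_size * soft_deleted
}

/// Decide whether the real deletion should run.
/// `map_size` is the total available space, `used_size` the space in use.
pub fn should_purge(map_size: u64, used_size: u64, documents: u64, soft_deleted: u64) -> bool {
    let soft_deleted_size = estimated_soft_deleted_size(used_size, documents, soft_deleted);
    // Case 1: soft-deleted documents use at least 10% of the available space.
    let too_many_soft_deleted = soft_deleted_size * 10 >= map_size;
    // Case 2: at least 90% of the available space is used.
    let almost_full = used_size * 10 >= map_size * 9;
    too_many_soft_deleted || almost_full
}
```

For instance, with a `map_size` of 100,000 bytes, 50,000 bytes used, 800 live documents and 200 soft-deleted ones, the average document is estimated at 50 bytes, the soft-deleted documents at 10,000 bytes, and the first condition triggers the deletion.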

--------

<img width="808" alt="image" src="https://user-images.githubusercontent.com/7032172/185083075-92cf379e-8ae1-4bfc-9ca6-93b54e6ab4e9.png">

Here we can see that, in the end, there is a ~10GB drift between the estimated space used by the soft-deleted documents and the real space used by the documents.
Personally, I don't think that's a big issue, because once the red line reaches 90GB everything will be freed, but now you know.

If you have an idea on how to improve this estimation, I would love to hear it.
It looks like the difference is linear, so maybe we could simply multiply the current estimation by two?

Co-authored-by: Irevoire <tamo@meilisearch.com>
2022-08-17 16:31:04 +00:00
Loïc Lecrenier
497f9817a2 Use snapshot testing for the filter parser 2022-08-17 17:35:01 +02:00
Irevoire
4aae07d5f5
expose the size methods 2022-08-17 17:07:38 +02:00
Irevoire
e96b852107
bump heed 2022-08-17 17:05:50 +02:00