7437 Commits

Author SHA1 Message Date
ManyTheFish
8aa808d51b Merge branch 'main' into enhance-language-detection 2023-02-20 18:14:34 +01:00
bors[bot]
b08a49a16e
Merge #3319 #3470
3319: Transparently resize indexes on MaxDatabaseSizeReached errors r=Kerollmops a=dureuill

# Pull Request

## Related issue
Related to https://github.com/meilisearch/meilisearch/discussions/3280, depends on https://github.com/meilisearch/milli/pull/760

## What does this PR do?

### User standpoint

- Meilisearch no longer fails tasks that encounter the `milli::UserError(MaxDatabaseSizeReached)` error.
- Instead, these tasks are retried after increasing the maximum size allocated to the index where the failure occurred.

### Implementation standpoint

- Add `Batch::index_uid` to get the `index_uid` of a batch of task if there is one
- `IndexMapper::create_or_open_index` now takes an additional `size` argument that allows to (re)open indexes with a size different from the base `IndexScheduler::index_size` field
- `IndexScheduler::tick` now returns a `Result<TickOutcome>` instead of a `Result<usize>`. This offers more explicit control over what the behavior should be wrt the next tick.
- Add `IndexStatus::BeingResized` that contains a handle that a thread can use to await for the resize operation to complete and the index to be available again.
- Add `IndexMapper::resize_index` to increase the size of an index.
- In `IndexScheduler::tick`, intercept task batches that failed due to `MaxDatabaseSizeReached` and resize the index that caused the error, then request a new tick that will eventually handle the still enqueued task.

## Testing the PR

The following diff can be applied to this branch to make testing the PR easier:

<details>


```diff
diff --git a/index-scheduler/src/index_mapper.rs b/index-scheduler/src/index_mapper.rs
index 553ab45a..022b2f00 100644
--- a/index-scheduler/src/index_mapper.rs
+++ b/index-scheduler/src/index_mapper.rs
`@@` -228,13 +228,15 `@@` impl IndexMapper {
 
         drop(lock);
 
+        std:🧵:sleep_ms(2000);
+
         let current_size = index.map_size()?;
         let closing_event = index.prepare_for_closing();
-        log::info!("Resizing index {} from {} to {} bytes", name, current_size, current_size * 2);
+        log::error!("Resizing index {} from {} to {} bytes", name, current_size, current_size * 2);
 
         closing_event.wait();
 
-        log::info!("Resized index {} from {} to {} bytes", name, current_size, current_size * 2);
+        log::error!("Resized index {} from {} to {} bytes", name, current_size, current_size * 2);
 
         let index_path = self.base_path.join(uuid.to_string());
         let index = self.create_or_open_index(&index_path, None, 2 * current_size)?;
`@@` -268,8 +270,10 `@@` impl IndexMapper {
             match index {
                 Some(Available(index)) => break index,
                 Some(BeingResized(ref resize_operation)) => {
+                    log::error!("waiting for resize end");
                     // Deadlock: no lock taken while doing this operation.
                     resize_operation.wait();
+                    log::error!("trying our luck again!");
                     continue;
                 }
                 Some(BeingDeleted) => return Err(Error::IndexNotFound(name.to_string())),
diff --git a/index-scheduler/src/lib.rs b/index-scheduler/src/lib.rs
index 11b17d05..242dc095 100644
--- a/index-scheduler/src/lib.rs
+++ b/index-scheduler/src/lib.rs
`@@` -908,6 +908,7 `@@` impl IndexScheduler {
     ///
     /// Returns the number of processed tasks.
     fn tick(&self) -> Result<TickOutcome> {
+        log::error!("ticking!");
         #[cfg(test)]
         {
             *self.run_loop_iteration.write().unwrap() += 1;
diff --git a/meilisearch/src/main.rs b/meilisearch/src/main.rs
index 050c825a..63f312f6 100644
--- a/meilisearch/src/main.rs
+++ b/meilisearch/src/main.rs
`@@` -25,7 +25,7 `@@` fn setup(opt: &Opt) -> anyhow::Result<()> {
 
 #[actix_web::main]
 async fn main() -> anyhow::Result<()> {
-    let (opt, config_read_from) = Opt::try_build()?;
+    let (mut opt, config_read_from) = Opt::try_build()?;
 
     setup(&opt)?;
 
`@@` -56,6 +56,8 `@@` We generated a secure master key for you (you can safely copy this token):
         _ => (),
     }
 
+    opt.max_index_size = byte_unit::Byte::from_str("1MB").unwrap();
+
     let (index_scheduler, auth_controller) = setup_meilisearch(&opt)?;
 
     #[cfg(all(not(debug_assertions), feature = "analytics"))]
```
</details>

Mainly, these debug changes do the following:

- Set the default index size to 1MiB so that index resizes are initially frequent
- Turn some logs from info to error so that they can be displayed with `--log-level ERROR` (hiding the other infos)
- Add a long sleep between the beginning and the end of the resize so that we can observe the `BeingResized` index status (otherwise it would never come up in my tests)

## Open questions

- Is the growth factor of x2 the correct solution? For a `Vec` in memory it makes sense, but here we're manipulating quantities that are potentially in the order of 500GiBs. For bigger indexes it may make more sense to add at most e.g. 100GiB on each resize operation, avoiding big steps like 500GiB -> 1TiB.

## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


3470: Autobatch addition and deletion r=irevoire a=irevoire

This PR adds the capability to meilisearch to batch document addition and deletion together.

Fix https://github.com/meilisearch/meilisearch/issues/3440

--------------

Things to check before merging;

- [x] What happens if we delete multiple time the same documents -> add a test
- [x] If a documentDeletion gets batched with a documentAddition but the index doesn't exist yet? It should not work

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
2023-02-20 15:00:19 +00:00
ManyTheFish
23f4e82b53 Add test ensuring that Meilisearch works on kanji only requests 2023-02-20 15:43:29 +01:00
Many the fish
119e6d8811
Update milli/src/search/mod.rs
Co-authored-by: Tamo <tamo@meilisearch.com>
2023-02-20 15:33:10 +01:00
bors[bot]
a8f6f108e0
Merge #3515
3515: Consider null as a valid geo field r=irevoire a=irevoire

Fix #3497
Associated spec; https://github.com/meilisearch/specifications/pull/222

Co-authored-by: Tamo <tamo@meilisearch.com>
2023-02-20 14:12:55 +00:00
bors[bot]
97b8c32e22
Merge #3514
3514: Bump version of mini-dashboard to v0.2.6 r=irevoire a=bidoubiwa

Update the version of the mini-dashboard to v0.2.6.

See [release notes](https://github.com/meilisearch/mini-dashboard/releases/tag/v0.2.6).

Co-authored-by: Charlotte Vermandel <charlottevermandel@gmail.com>
2023-02-20 13:21:00 +00:00
ManyTheFish
cb8d5f2d4b Update Charabia to 0.7.1 2023-02-20 14:00:31 +01:00
Louis Dureuil
35f6c624bc
Make sure we don't leave the in memory hashmap in an inconsistent state 2023-02-20 13:55:32 +01:00
Louis Dureuil
1116788475
Resize indexes when they're full 2023-02-20 13:55:32 +01:00
Louis Dureuil
951a5b5832
Add IndexMapper::resize_index fn 2023-02-20 13:55:32 +01:00
Louis Dureuil
1c670d7fa0
Add IndexStatus::BeingResized 2023-02-20 13:55:32 +01:00
Louis Dureuil
6cc3797aa1
IndexScheduler::tick returns a TickOutcome 2023-02-20 13:55:31 +01:00
Louis Dureuil
faf1e17a27
create_or_open_index takes a map_size argument 2023-02-20 13:55:31 +01:00
Louis Dureuil
4c519c2ab3
Add Batch::index_uid 2023-02-20 13:55:31 +01:00
Charlotte Vermandel
dd120e0e16 Bump version of mini-dashboard to v0.2.6 2023-02-20 13:45:57 +01:00
Tamo
18796d6e6a Consider null as a valid geo object 2023-02-20 13:45:51 +01:00
bors[bot]
c91bfeaf15
Merge #3467
3467: Identify builds git tagged with `prototype-...` in CLI and analytics r=curquiza a=dureuill

# Pull Request

## What does this PR do?

- Parses the last git tag to extract a prototype name if:
  - Current build uses the prototype tag (not after the tag) precisely
  - The prototype tag name respects the following conditions:
    1. starts with `prototype-`
    2. ends with a number
    3. the hyphen-separated segment right before the number is not a number (required to reject commits after the tag).
- Display the prototype name in the launch summary in the CLI
- Send the prototype name to analytics if any
- Update prototypes instructions in CONTRIBUTING.md

|`VERGEN_GIT_SEMVER_LIGHTWEIGHT` value | Prototype |
|---|---|
| `Some("prototype-geo-bounding-box-0-139-gcde89018")` | `None` (does not end with a number) |
| `Some("prototype-geo-bounding-box-0-139-89018")` | `None` (before the last segment is a number) |
| `Some("prototype-geo-bounding-box-0")` | `Some("prototype-geo-bounding-box-0")` |
| `Some("prototype-geo-bounding-box")` | `None` (does not end with a number") |
| `Some("geo-bounding-box-0")` | `None` (does not start with "prototype") |
| `None` | `None` | 

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-02-20 09:27:51 +00:00
bors[bot]
28961b2ad1
Merge #3499
3499: Use the workspace inheritance r=Kerollmops a=irevoire

Use the workspace inheritance [introduced in rust 1.64](https://blog.rust-lang.org/2022/09/22/Rust-1.64.0.html#cargo-improvements-workspace-inheritance-and-multi-target-builds).

It allows us to define the version of meilisearch once in the main `Cargo.toml` and let all the other `Cargo.toml` uses this version.

`@curquiza` I added you as a reviewer because I had to patch some CI scripts

And `@Kerollmops,` I had to bump the `cargo_toml` crates because our version was getting old and didn't support the feature yet.

Also, in another PR, I would like to unify some of our dependencies to ensure we always stay in sync between all our crates.

Co-authored-by: Tamo <tamo@meilisearch.com>
2023-02-17 09:52:29 +00:00
Tamo
895ab2906c apply review suggestions 2023-02-16 18:42:47 +01:00
Tamo
f11c7d4b62 cargo run execute meilisearch by default 2023-02-16 18:03:45 +01:00
Louis Dureuil
49e18da23e
Do not escape tag name
$() syntax is not interpreted by the Dockerfile
2023-02-16 10:53:14 +01:00
Louis Dureuil
54240db495
Add note in code so one does not forget next time 2023-02-16 10:53:14 +01:00
Louis Dureuil
e1ed4bc750
Change Dockerfile to also pass the VERGEN_GIT_SEMVER_LIGHTWEIGHT when building 2023-02-16 10:53:14 +01:00
Louis Dureuil
9bd1cfb3a3
Ignore -dirty flag 2023-02-16 10:53:14 +01:00
Louis Dureuil
a341c94871
Update contributing.md 2023-02-16 10:53:14 +01:00
Louis Dureuil
f46cf46b8c
Add prototype to analytics if any 2023-02-16 10:53:14 +01:00
Louis Dureuil
c3a30a5a91
If using a prototype, display its name at Meilisearch startup 2023-02-16 10:53:14 +01:00
bors[bot]
143e3cf948
Merge #3490
3490: Fix attributes set candidates r=curquiza a=ManyTheFish

# Pull Request

Fix attributes set candidates for v1.1.0

## details

The attribute criterion was not returning the remaining candidates when its internal algorithm was been exhausted.
We had a loss of candidates by the attribute criterion leading to the bug reported in the issue linked below.
After some investigation, it seems that it was the only criterion that had this behavior.

We are now returning the remaining candidates instead of an empty bitmap.

## Related issue

Fixes #3483
PR on milli for v1.0.1: https://github.com/meilisearch/milli/pull/777


Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-02-15 17:38:07 +00:00
Tamo
ab2adba183 update our CI scripts accordingly 2023-02-15 13:56:24 +01:00
Tamo
74d1a67a99 Use the workspace inheritance feature of rust 1.64 2023-02-15 13:51:07 +01:00
bors[bot]
91ce8a5e67
Merge #3492
3492: Bump deserr r=Kerollmops a=irevoire

Bump deserr to the latest version;
- We now use the default actix-web extractors that deserr provides (which were copy/pasted from meilisearch)
- We also use the default `JsonError` message provided by deserr instead of defining our own in meilisearch
- Finally, we get the new `did you mean?` error message. Fix #3493

Co-authored-by: Tamo <tamo@meilisearch.com>
2023-02-15 10:05:05 +00:00
bors[bot]
fd7ae1883b
Merge #3495
3495: Add tests with rust nightly in CI r=curquiza a=ztkmkoo

# Pull Request

## Related issue
Fixes #3402 

## What does this PR do?
- add ci test with rust nightly
- make test with rust stable not run on schedule event

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Kebron <ztkmkoo@gmail.com>
2023-02-15 07:53:17 +00:00
Tamo
42a3cdca66
get rids of the unwrap_any function in favor of take_cf_content 2023-02-14 20:06:31 +01:00
Tamo
a43765d454
use the pre-defined deserr extractors 2023-02-14 20:05:30 +01:00
Tamo
769576fd94
get rids of the whole error_message module since it has been integrated into the last version of deserr 2023-02-14 20:05:27 +01:00
Tamo
8fb7b1d10f
bump deserr 2023-02-14 20:04:30 +01:00
bors[bot]
d494c29768
Merge #3479
3479: Unify "Bad latitude" & "Bad longitude" errors r=irevoire a=cymruu

# Pull Request

## Related issue
Fix part of #3006

## What does this PR do?
- Moved out `BadGeoLat`, `BadGeoLng`, `BadGeoBoundingBoxTopIsBelowBottom` from `FilterError` into newly introduced error type `ParseGeoError`. 
- Renamed `BadGeo` error  to `ReservedGeo`
- Used new `ParseGeoError` type in `FilterError` and `AscDescError`

Screenshot: 
![image](https://user-images.githubusercontent.com/2981598/217927231-fe23b6a3-2ea8-4145-98af-38eb61c4ff16.png)

I ran `cargo test --package milli -- --test-threads 1` and tests passed.
`--test-threads` was set to 1 because my OS complained about too many opened files.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?


Co-authored-by: Filip Bachul <filipbachul@gmail.com>
Co-authored-by: filip <filipbachul@gmail.com>
2023-02-14 18:35:51 +00:00
Tamo
74dcfe9676
Fix a bug when you update a document that was already present in the db, deleted and then inserted again in the same transform 2023-02-14 19:09:40 +01:00
Tamo
1b1703a609
make a small optimization to merge obkvs a little bit faster 2023-02-14 18:32:41 +01:00
Tamo
fb5e4957a6
fix and test the early exit in case a grenad ends with a deletion 2023-02-14 18:23:57 +01:00
Tamo
8de3c9f737
Update milli/src/update/index_documents/transform.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-02-14 17:57:14 +01:00
Tamo
43a19d0709
document the operation enum + the grenads 2023-02-14 17:55:26 +01:00
Tamo
29d14bed90
get rids of the let/else syntax 2023-02-14 17:45:46 +01:00
bors[bot]
f3b54337f9
Merge #3174
3174: Allow wildcards at the end of index names for API Keys and Tenant tokens r=irevoire a=Kerollmops

This PR introduces the wildcards at the end of the index names when identifying indexes in the API Keys and tenant tokens. It fixes #2788 and fixes #2908. This PR is based on `@akhildevelops'` work.

Note that when a tenant token filter is chosen to restrict a search, it is always the most restrictive pattern that is chosen. If we have an index pattern _prod*_ that defines _filter1_ and _p*_ that defines _filter2_, the engine will choose _filter1_ over _filter2_ as it is defined for a most restrictive pattern, _prod*_. This restrictiveness is defined by 1. is it exact, without _*_ 2. the length of the pattern.

It is a continuation of work that has already started and should close #2869.

Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-02-14 16:12:01 +00:00
Clément Renault
7f3ae40204
Remove a useless comment regarding the index pattern error code 2023-02-14 17:09:20 +01:00
Filip Bachul
a53536836b fmt 2023-02-14 17:04:22 +01:00
Kebron
b095325bf8 Add tests with rust nightly in CI 2023-02-14 15:33:12 +00:00
Filip Bachul
d7ad39ad77 fix: clippy error 2023-02-14 00:15:35 +01:00
Filip Bachul
849de089d2 add thiserror for AscDescError 2023-02-14 00:15:35 +01:00
filip
7f25007d31 Update milli/src/asc_desc.rs
Co-authored-by: Tamo <irevoire@protonmail.ch>
2023-02-14 00:15:35 +01:00