323 Commits

Author SHA1 Message Date
Tamo
cf5145b542 Reduce the time to import a dump
With this commit, importing the task queue of a dump containing 1M tasks went from 3m36s down to 1m02s
2023-03-29 14:27:40 +02:00
Louis Dureuil
0202ff8ab4
Attempt to use default budget for faster startup 2023-02-28 10:55:43 +01:00
Louis Dureuil
71e7900c67
move index_map to file 2023-02-23 11:29:11 +01:00
Louis Dureuil
431782f3ee
Move index_mapper to mod.rs 2023-02-23 11:29:11 +01:00
Louis Dureuil
3db613ff77
Don't iterate all indexes manually 2023-02-23 11:29:09 +01:00
Louis Dureuil
5822764be9
Skip computing index budget in tests 2023-02-23 11:23:39 +01:00
Louis Dureuil
a529bf160c
Compute budget 2023-02-23 11:23:39 +01:00
Louis Dureuil
f1119f2dc2
Add dichotomic search to utils 2023-02-23 11:23:39 +01:00
Louis Dureuil
1db7d5d851
Add basic tests for index eviction and resize 2023-02-23 11:23:39 +01:00
Louis Dureuil
80b060f920
Use LRU cache 2023-02-23 11:23:39 +01:00
Louis Dureuil
fdf043580c
Add LruMap 2023-02-23 11:23:38 +01:00
Louis Dureuil
42577403d8
Authentication: Directly pass the authfilter to the index scheduler 2023-02-22 16:35:52 +01:00
bors[bot]
b08a49a16e
Merge #3319 #3470
3319: Transparently resize indexes on MaxDatabaseSizeReached errors r=Kerollmops a=dureuill

# Pull Request

## Related issue
Related to https://github.com/meilisearch/meilisearch/discussions/3280, depends on https://github.com/meilisearch/milli/pull/760

## What does this PR do?

### User standpoint

- Meilisearch no longer fails tasks that encounter the `milli::UserError(MaxDatabaseSizeReached)` error.
- Instead, these tasks are retried after increasing the maximum size allocated to the index where the failure occurred.

### Implementation standpoint

- Add `Batch::index_uid` to get the `index_uid` of a batch of tasks, if there is one
- `IndexMapper::create_or_open_index` now takes an additional `size` argument that allows (re)opening indexes with a size different from the base `IndexScheduler::index_size` field
- `IndexScheduler::tick` now returns a `Result<TickOutcome>` instead of a `Result<usize>`. This offers more explicit control over what the behavior should be with respect to the next tick.
- Add `IndexStatus::BeingResized` that contains a handle that a thread can use to wait for the resize operation to complete and for the index to be available again.
- Add `IndexMapper::resize_index` to increase the size of an index.
- In `IndexScheduler::tick`, intercept task batches that failed due to `MaxDatabaseSizeReached`, resize the index that caused the error, then request a new tick that will eventually handle the still-enqueued tasks (a sketch of this flow follows the list).
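
Below is a rough, self-contained sketch of that flow. None of these names are the real index-scheduler types or signatures (`TickOutcome`, `BatchError`, `Scheduler`, and `resize_index` here are stand-ins); it only illustrates how a failed batch becomes a resize plus a retry instead of failed tasks.

```rust
use std::collections::HashMap;

/// Toy outcome type mirroring the `TickOutcome` described above.
enum TickOutcome {
    /// A batch was handled (possibly only by resizing an index): tick again immediately.
    TickAgain(u64),
    /// Nothing left to do: wait for a new task before ticking again.
    WaitForSignal,
}

/// Toy error for a failed batch.
enum BatchError {
    /// The on-disk map of this index is full.
    MaxDatabaseSizeReached { index_uid: String },
}

struct Scheduler {
    /// Stand-in for the index mapper: per-index map size, in bytes.
    index_sizes: HashMap<String, u64>,
}

impl Scheduler {
    /// Grow an index, as `IndexMapper::resize_index` does in this PR (factor of 2).
    fn resize_index(&mut self, index_uid: &str) {
        if let Some(size) = self.index_sizes.get_mut(index_uid) {
            *size *= 2;
        }
    }

    /// The interception: a batch that hit `MaxDatabaseSizeReached` does not fail its
    /// tasks; the index is resized and another tick is requested so the still-enqueued
    /// tasks are retried.
    fn tick(&mut self, batch_result: Result<u64, BatchError>) -> TickOutcome {
        match batch_result {
            Ok(0) => TickOutcome::WaitForSignal,
            Ok(processed) => TickOutcome::TickAgain(processed),
            Err(BatchError::MaxDatabaseSizeReached { index_uid }) => {
                self.resize_index(&index_uid);
                TickOutcome::TickAgain(0)
            }
        }
    }
}

fn main() {
    let mut scheduler = Scheduler {
        index_sizes: HashMap::from([("movies".to_string(), 1 << 20)]),
    };
    let outcome = scheduler.tick(Err(BatchError::MaxDatabaseSizeReached {
        index_uid: "movies".to_string(),
    }));
    // The batch did not fail its tasks: the index doubled and another tick was requested.
    assert!(matches!(outcome, TickOutcome::TickAgain(0)));
    assert_eq!(scheduler.index_sizes["movies"], 2 << 20);
}
```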

## Testing the PR

The following diff can be applied to this branch to make testing the PR easier:

<details>


```diff
diff --git a/index-scheduler/src/index_mapper.rs b/index-scheduler/src/index_mapper.rs
index 553ab45a..022b2f00 100644
--- a/index-scheduler/src/index_mapper.rs
+++ b/index-scheduler/src/index_mapper.rs
@@ -228,13 +228,15 @@ impl IndexMapper {
 
         drop(lock);
 
+        std::thread::sleep_ms(2000);
+
         let current_size = index.map_size()?;
         let closing_event = index.prepare_for_closing();
-        log::info!("Resizing index {} from {} to {} bytes", name, current_size, current_size * 2);
+        log::error!("Resizing index {} from {} to {} bytes", name, current_size, current_size * 2);
 
         closing_event.wait();
 
-        log::info!("Resized index {} from {} to {} bytes", name, current_size, current_size * 2);
+        log::error!("Resized index {} from {} to {} bytes", name, current_size, current_size * 2);
 
         let index_path = self.base_path.join(uuid.to_string());
         let index = self.create_or_open_index(&index_path, None, 2 * current_size)?;
@@ -268,8 +270,10 @@ impl IndexMapper {
             match index {
                 Some(Available(index)) => break index,
                 Some(BeingResized(ref resize_operation)) => {
+                    log::error!("waiting for resize end");
                     // Deadlock: no lock taken while doing this operation.
                     resize_operation.wait();
+                    log::error!("trying our luck again!");
                     continue;
                 }
                 Some(BeingDeleted) => return Err(Error::IndexNotFound(name.to_string())),
diff --git a/index-scheduler/src/lib.rs b/index-scheduler/src/lib.rs
index 11b17d05..242dc095 100644
--- a/index-scheduler/src/lib.rs
+++ b/index-scheduler/src/lib.rs
@@ -908,6 +908,7 @@ impl IndexScheduler {
     ///
     /// Returns the number of processed tasks.
     fn tick(&self) -> Result<TickOutcome> {
+        log::error!("ticking!");
         #[cfg(test)]
         {
             *self.run_loop_iteration.write().unwrap() += 1;
diff --git a/meilisearch/src/main.rs b/meilisearch/src/main.rs
index 050c825a..63f312f6 100644
--- a/meilisearch/src/main.rs
+++ b/meilisearch/src/main.rs
@@ -25,7 +25,7 @@ fn setup(opt: &Opt) -> anyhow::Result<()> {
 
 #[actix_web::main]
 async fn main() -> anyhow::Result<()> {
-    let (opt, config_read_from) = Opt::try_build()?;
+    let (mut opt, config_read_from) = Opt::try_build()?;
 
     setup(&opt)?;
 
@@ -56,6 +56,8 @@ We generated a secure master key for you (you can safely copy this token):
         _ => (),
     }
 
+    opt.max_index_size = byte_unit::Byte::from_str("1MB").unwrap();
+
     let (index_scheduler, auth_controller) = setup_meilisearch(&opt)?;
 
     #[cfg(all(not(debug_assertions), feature = "analytics"))]
```
</details>

Mainly, these debug changes do the following:

- Set the default index size to 1MiB so that index resizes are initially frequent
- Turn some logs from info to error so that they can be displayed with `--log-level ERROR` (hiding the other info-level logs)
- Add a long sleep between the beginning and the end of the resize so that we can observe the `BeingResized` index status (otherwise it would never come up in my tests)

## Open questions

- Is the growth factor of 2x the correct solution? For a `Vec` in memory it makes sense, but here we're manipulating quantities that are potentially on the order of 500 GiB. For bigger indexes it may make more sense to add at most e.g. 100 GiB on each resize operation, avoiding big steps like 500 GiB -> 1 TiB (see the sketch below).
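
As a rough illustration of the trade-off, here is a sketch of the two growth policies (plain doubling, as in this PR, versus doubling capped at a fixed increment); the 100 GiB cap is only the example value from the question above, not a decided default:

```rust
const GIB: u64 = 1024 * 1024 * 1024;

/// Plain doubling, as implemented in this PR.
fn next_size_doubling(current: u64) -> u64 {
    current * 2
}

/// Doubling capped at a fixed increment: small indexes still double quickly,
/// but a 500 GiB index grows to 600 GiB instead of jumping to roughly 1 TiB.
fn next_size_capped(current: u64, max_step: u64) -> u64 {
    current + current.min(max_step)
}

fn main() {
    assert_eq!(next_size_doubling(500 * GIB), 1000 * GIB);
    assert_eq!(next_size_capped(500 * GIB, 100 * GIB), 600 * GIB);
    // Below the cap the two policies behave the same.
    assert_eq!(next_size_capped(64 * GIB, 100 * GIB), 128 * GIB);
}
```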

## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


3470: Autobatch addition and deletion r=irevoire a=irevoire

This PR adds the capability for Meilisearch to batch document additions and deletions together (a toy sketch of the intended semantics follows the checklist below).

Fix https://github.com/meilisearch/meilisearch/issues/3440

--------------

Things to check before merging:

- [x] What happens if we delete the same documents multiple times? -> add a test
- [x] What happens if a documentDeletion gets batched with a documentAddition but the index doesn't exist yet? It should not work.
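
Purely as an illustration of the intended semantics (not the actual autobatcher code; `Operation` and `apply_batch` are made-up stand-ins), the operations of a mixed batch are applied in queue order, so deleting a document added earlier in the same batch, or deleting the same document twice, behaves as expected:

```rust
use std::collections::HashMap;

/// Made-up operations; the real scheduler batches tasks, not these.
enum Operation {
    AddDocuments(Vec<(u64, String)>),
    DeleteDocuments(Vec<u64>),
}

/// Apply the operations of one batch in queue order.
fn apply_batch(index: &mut HashMap<u64, String>, batch: &[Operation]) {
    for op in batch {
        match op {
            Operation::AddDocuments(docs) => {
                for (id, body) in docs {
                    index.insert(*id, body.clone());
                }
            }
            Operation::DeleteDocuments(ids) => {
                for id in ids {
                    index.remove(id);
                }
            }
        }
    }
}

fn main() {
    let mut index = HashMap::new();
    apply_batch(
        &mut index,
        &[
            Operation::AddDocuments(vec![(1, "doc one".into()), (2, "doc two".into())]),
            Operation::DeleteDocuments(vec![1]),
            // Deleting the same document a second time is simply a no-op.
            Operation::DeleteDocuments(vec![1]),
        ],
    );
    assert_eq!(index.len(), 1);
    assert!(index.contains_key(&2));
}
```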

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
2023-02-20 15:00:19 +00:00
Louis Dureuil
35f6c624bc
Make sure we don't leave the in memory hashmap in an inconsistent state 2023-02-20 13:55:32 +01:00
Louis Dureuil
1116788475
Resize indexes when they're full 2023-02-20 13:55:32 +01:00
Louis Dureuil
951a5b5832
Add IndexMapper::resize_index fn 2023-02-20 13:55:32 +01:00
Louis Dureuil
1c670d7fa0
Add IndexStatus::BeingResized 2023-02-20 13:55:32 +01:00
Louis Dureuil
6cc3797aa1
IndexScheduler::tick returns a TickOutcome 2023-02-20 13:55:31 +01:00
Louis Dureuil
faf1e17a27
create_or_open_index takes a map_size argument 2023-02-20 13:55:31 +01:00
Louis Dureuil
4c519c2ab3
Add Batch::index_uid 2023-02-20 13:55:31 +01:00
Tamo
74d1a67a99 Use the workspace inheritance feature of rust 1.64 2023-02-15 13:51:07 +01:00
Tamo
29d14bed90
get rid of the let/else syntax 2023-02-14 17:45:46 +01:00
Clément Renault
4570d5bf3a
Merge remote-tracking branch 'origin/main' into temp-wildcard 2023-02-09 13:14:05 +01:00
Tamo
eaad84bd1d fix the test to handle the document deletion correctly 2023-02-09 11:29:13 +01:00
Tamo
ea9ac46f28
stop autobatching a deletion that lacks the index-creation right together with an addition 2023-02-08 21:24:27 +01:00
Tamo
93f130a400
fix all warnings 2023-02-08 20:57:35 +01:00
Tamo
860c993ef7
Handle the autobatching of deletion and addition in the scheduler 2023-02-08 20:53:19 +01:00
Tamo
67dda0678f
cleanup the autobatcher a little bit 2023-02-08 18:10:59 +01:00
Tamo
2db6347686
update the autobatcher to batch the addition and deletion together 2023-02-08 18:07:59 +01:00
Kerollmops
a36b1dbd70
Fix the tasks with the new patterns 2023-02-01 18:21:45 +01:00
Louis Dureuil
924d5d4c11
clippy: remove needless lifetimes 2023-01-31 10:40:48 +01:00
Tamo
a858531574 apply review comments 2023-01-25 14:51:36 +01:00
Tamo
bf94f89035 Update index-scheduler/src/lib.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-01-25 11:31:50 +01:00
Tamo
3bcff60d1c makes clippy happy 2023-01-25 11:31:48 +01:00
Tamo
c92948b143 Compute the size of the auth-controller, index-scheduler and all update files in the global stats 2023-01-25 11:25:02 +01:00
Tamo
c7b2e3be87
apply review comments 2023-01-24 17:54:43 +01:00
Tamo
ea3b269b77 reformat 2023-01-23 23:59:34 +01:00
Tamo
a4be4c49e8
Update index-scheduler/src/batch.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-01-23 23:58:03 +01:00
Tamo
7d1ebb7295
add test on the autobatcher layer 2023-01-23 20:56:12 +01:00
Tamo
767cb725a5
reimplement the batching of tasks with or without primary key in the autobatcher 2023-01-23 20:18:22 +01:00
Tamo
5672118bfa
When adding documents, trying to update the primary-key now throws an error
While updating the test suite I also noticed an issue with the indexed_documents value of failed tasks and had to update it.
I also named a bunch of snapshots that had no name, sorry 😬
2023-01-23 17:32:13 +01:00
Louis Dureuil
72e2b220ed
Fix tests 2023-01-19 15:48:20 +01:00
Tamo
e8e7070cc6
improve the error message when no task filters are specified for the cancellation or deletion of tasks 2023-01-19 12:42:08 +01:00
bors[bot]
3e5b3df487
Merge #3370 #3373 #3375
3370: make the swap indexes not found errors return an IndexNotFound error-code r=irevoire a=irevoire

Fix https://github.com/meilisearch/meilisearch/issues/3368

3373: fix a wrong error code and add tests on the document resource r=irevoire a=irevoire

Fix https://github.com/meilisearch/meilisearch/issues/3371

3375: Avoid deleting all tasks on an invalid `canceledBy` filter r=irevoire a=Kerollmops

Fixes #3369 by making sure that at least one `canceledBy` task filter parameter matches something.

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-01-18 15:21:11 +00:00
Kerollmops
e89973f1bf
Do not delete all tasks when no canceled-by matches 2023-01-18 15:50:46 +01:00
Tamo
57da80900d
make the swap indexes not found errors return an IndexNotFound error code 2023-01-18 14:16:00 +01:00
Loïc Lecrenier
2bc2e99ff3
Simplify declaration of the error codes 2023-01-11 19:08:39 +01:00
Tamo
e706628bb1 fix the error code of the swap index route 2023-01-06 14:48:25 +01:00
Tamo
50ce0409bc
Integrate deserr on the most important routes 2023-01-05 20:48:29 +01:00
Loïc Lecrenier
2d74678b51 Replace underscores with hyphens in doc link to error code 2023-01-05 10:09:02 +01:00