Commit Graph

466 Commits

Author SHA1 Message Date
Louis Dureuil
492fc086f0
cargo fmt 2023-11-12 21:53:11 +01:00
Louis Dureuil
a2d0c73b41
Save the currently updating index so that the search can access it at all times 2023-11-10 10:52:03 +01:00
Louis Dureuil
f8289cd974
Use it from delete-by-filter 2023-11-09 14:23:15 +01:00
Louis Dureuil
ef6fa10f7a
Remove IndexOperation::DocumentDeletion 2023-11-06 12:16:15 +01:00
Louis Dureuil
cbaa54cafd
Fix clippy issues 2023-11-06 11:19:31 +01:00
Clément Renault
e507ef5932
Slow the logging down 2023-11-01 13:49:32 +01:00
Clément Renault
13416ccbf7
Introduce a new meilitool to help the cloud team 2023-10-30 14:30:20 +01:00
Clément Renault
dfab6293c9
Use an LMDB database to store the external documents ids 2023-10-30 11:41:23 +01:00
Louis Dureuil
652ac3052d
use new iterator in batch 2023-10-30 11:41:22 +01:00
Louis Dureuil
c534a1b687
Stop using delete documents pipeline in batch runner 2023-10-30 11:41:22 +01:00
Louis Dureuil
cf8dad1ca0
index_scheduler.features() is no longer fallible 2023-10-23 10:38:56 +02:00
bwbonanno
dd619913da Use RwLock to never persist cli state to db 2023-10-19 12:45:57 -07:00
bwbonanno
d8c649b3cd Return recoverable error if we fail to retrieve metrics state 2023-10-18 08:28:24 -07:00
bwbonanno
12fc878640 Merge remote-tracking branch 'origin/main' into enable-metrics-http 2023-10-16 13:48:01 -07:00
bwbonanno
689ec7c7ad Make the experimental route /metrics activable via HTTP 2023-10-13 22:12:54 +00:00
Clément Renault
3655d4bdca
Move the puffin file export logic into the run function 2023-10-13 13:11:30 +02:00
Clément Renault
055ca3935b
Update index-scheduler/src/batch.rs
Co-authored-by: Tamo <tamo@meilisearch.com>
2023-10-13 13:11:30 +02:00
Kerollmops
bf8fac6676
Fix the tests 2023-10-13 13:11:30 +02:00
Kerollmops
f2a9e1ebbb
Improve the debugging experience in the puffin reports 2023-10-13 13:11:30 +02:00
Kerollmops
513e61e9a3
Remove the experimental CLI flag 2023-10-13 13:11:29 +02:00
Kerollmops
90a626bf80
Use the runtime feature to enable puffin report exporting 2023-10-13 13:11:29 +02:00
Kerollmops
0d4acf2daa
Fix the metrics product URL 2023-10-13 13:11:29 +02:00
Kerollmops
58db8d85ec
Add the exportPuffinReports option to the runtime features route 2023-10-13 13:11:29 +02:00
Clément Renault
656dadabea
Expose an experimental flag to write the puffin reports to disk 2023-10-13 13:11:09 +02:00
Tamo
34fac115d5 fix clippy 2023-09-11 17:15:57 +02:00
Tamo
9258e5b5bf Fix the stats of the documents deletion by filter
The issue was that the operation « DocumentDeletionByFilter » was not
declared as an index operation. That means the indexes stats were not
reprocessed after the application of the operation.
2023-09-11 14:04:10 +02:00
meili-bors[bot]
e4e49e63d0
Merge #3993
3993: Bringing back changes from v1.3.1 to `main` r=irevoire a=curquiza



Co-authored-by: irevoire <irevoire@users.noreply.github.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-08-10 14:30:02 +00:00
Tamo
fe819a9d80 fix the get stats method
It was not taking into account the processing tasks at all
2023-08-08 13:21:15 +02:00
ManyTheFish
b45c36cd71 Merge branch 'main' into tmp-release-v1.3.0 2023-08-01 15:05:17 +02:00
Kerollmops
eef95de30e
First iteration on exposing puffin profiling 2023-07-18 17:38:13 +02:00
Clément Renault
22762808ab
Fix the tests 2023-07-06 12:13:29 +02:00
Clément Renault
86b834c9e4
Display the total number of tasks in the tasks route 2023-07-06 10:05:18 +02:00
meili-bors[bot]
aae099e330
Merge #3851
3851: Expose lastUpdate and isIndexing in /stats endpoint r=dureuill a=gentcys

# Pull Request

## Related issue
Fixes #3843

## What does this PR do?
- expose lastUpdate in `/stats` endpoint
- expose isIndex in `stats` endpoint
- add a method `is_task_processing` in index-scheduler/src/lib.rs.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Cong Chen <cong.chen@ocrlabs.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-07-03 13:41:04 +00:00
ManyTheFish
71500a4e15 Update tests 2023-07-03 11:20:43 +02:00
Louis Dureuil
324d448236
Format let-else ❤️ 🎉 2023-07-03 10:20:28 +02:00
Cong Chen
9859e65d2f fix tests 2023-07-01 09:32:50 +08:00
Cong Chen
3bdf01bc1c Fix failed test 2023-06-30 17:39:23 +08:00
Cong Chen
a5a31667b0 fix converse result of is_task_processing() 2023-06-30 11:28:18 +08:00
Cong Chen
e3fc7112bc use RoaringBitmap::is_empty instead 2023-06-29 11:46:47 +08:00
Kerollmops
816d7ed174
Update the Vector Store product feature link 2023-06-27 12:32:42 +02:00
Louis Dureuil
13e9b4c2e5
Add dump support 2023-06-26 16:29:43 +02:00
Louis Dureuil
072d81843f
Persistently save to DB the status of experimental features 2023-06-26 16:29:43 +02:00
Cong Chen
6d4981ec25 Expose lastUpdate and isIndexing in /stats endpoint 2023-06-23 07:24:25 +08:00
meili-bors[bot]
040b5a5b6f
Merge #3842
3842: fix some typos r=dureuill a=cuishuang

# Pull Request

## Related issue
Fixes #<issue_number>

## What does this PR do?
- fix some typos

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: cui fliter <imcusg@gmail.com>
2023-06-22 18:01:10 +00:00
cui fliter
530a3e2df3 fix some typos
Signed-off-by: cui fliter <imcusg@gmail.com>
2023-06-22 21:59:00 +08:00
meili-bors[bot]
45636d315c
Merge #3670
3670: Fix addition deletion bug r=irevoire a=irevoire

The first commit of this PR is a revert of https://github.com/meilisearch/meilisearch/pull/3667. It re-enable the auto-batching of addition and deletion of tasks. No new changes have been introduced outside of `milli`. So all the changes you see on the autobatcher have actually already been reviewed.

It fixes https://github.com/meilisearch/meilisearch/issues/3440.

### What was happening?

The issue was that the `external_documents_ids` generated in the `transform` were used in a very strange way that wasn’t compatible with the deletion of documents.
Instead of doing a clear merge between the external document IDs of the DB and the one returned by the transform + writing it on disk, we were doing some weird tricks with the soft-deleted to avoid writing the fst on disk as much as possible.
The new algorithm may be a bit slower but is way more straightforward and doesn’t change depending on if the soft deletion was used or not. Here is a list of the changes introduced:
1. We now do a clear distinction between the `new_external_documents_ids` coming from the transform and only held on RAM and the `external_documents_ids` coming from the DB.
2. The `new_external_documents_ids` (coming out of the transform) are now represented as an `fst`. We don't need to struggle with the hard, soft distinction + the soft_deleted => That's easier to understand
3. When indexing documents, we merge the `external_documents_ids` coming from the DB and the `new_external_documents_ids` coming from the transform.

### Other things introduced in this  PR

Since we constantly have to write small, very specialized fuzzers for this kind of bug, we decided to push the one used to reproduce this bug.
It's not perfect, but it's easy to improve in the future.
It'll also run for as long as possible on every merge on the main branch.

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Loïc Lecrenier <loic.lecrenier@icloud.com>
2023-06-19 09:09:30 +00:00
meili-bors[bot]
c1e3cc04b0
Merge #3811
3811: Bring back changes from `release-v1.2.0` to `main` r=Kerollmops a=curquiza



Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Filip Bachul <filipbachul@gmail.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-06-06 13:10:24 +00:00
Tamo
4a3405afec
comment the stats method 2023-06-06 12:59:58 +02:00
Tamo
3cfd653db1
Apply suggestions from code review
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-06-06 11:38:41 +02:00
Tamo
2acc3ec5ee
fix the type of the document deletion by filter tasks 2023-05-30 15:18:52 +02:00
Tamo
c9b65677bf
return the on disk size actually used by meilisearch 2023-05-25 18:30:30 +02:00
Tamo
c433bdd1cd add a view for the task queue in the metrics 2023-05-25 12:58:13 +02:00
Tamo
4391cba6ca
fix the addition + deletion bug 2023-05-17 18:28:57 +02:00
Tamo
d7ddf4925e
Revert "Disable autobatching of additions and deletions"
This reverts commit a94e78ffb0.
2023-05-17 14:25:50 +02:00
Tamo
96da5130a4
fix the error code in case of not filterable attributes on the get / delete documents by filter routes 2023-05-16 13:56:18 +02:00
Clément Renault
13f870e993
Fix typos and documentation issues 2023-05-15 15:11:45 +02:00
Kerollmops
f759ec7fad
Expose a flag to enable the MDB_WRITEMAP flag 2023-05-15 11:38:43 +02:00
Kerollmops
c4a40e7110
Use the writemap flag to reduce the memory usage 2023-05-15 10:15:33 +02:00
meili-bors[bot]
a95128df6b
Merge #3550
3550: Delete documents by filter r=irevoire a=dureuill

# Prototype `prototype-delete-by-filter-0`

Usage:
A new route is available under `POST /indexes/{index_uid}/documents/delete` that allows you to delete your documents by filter.
The expected payload looks like that:
```json
{
  "filter": "doggo = bernese",
}
```

It'll then enqueue a task in your task queue that'll delete all the documents matching this filter once it's processed.
Here is an example of the associated details;
```json
  "details": {
    "deletedDocuments": 53,
    "originalFilter": "\"doggo = bernese\""
  }
```

----------


# Pull Request

## Related issue
Related to https://github.com/meilisearch/meilisearch/issues/3477

## What does this PR do?

### User standpoint

- Modifies the `/indexes/{:indexUid}/documents/delete-batch` route to accept either the existing array of documents ids, or a JSON object with a `filter` field representing a filter to apply. If that latter variant is used, any document matching the filter will be deleted.

### Implementation standpoint

- (processing time version) Adds a new BatchKind that is not autobatchable and that performs the delete by filter
- Reuse the `documentDeletion` task with a new `originalFilter` detail that replaces the `providedIds` detail.

## Example

<details>
<summary>Sample request, response and task result</summary>

Request:

```
curl \
  -X POST 'http://localhost:7700/indexes/index-10/documents/delete-batch' \
  -H 'Content-Type: application/json' \
  --data-binary '{ "filter" : "mass = 600"}'
```

Response:

```
{
  "taskUid": 3902,
  "indexUid": "index-10",
  "status": "enqueued",
  "type": "documentDeletion",
  "enqueuedAt": "2023-02-28T20:50:31.667502Z"
}
```

Task log:

```json
    {
      "uid": 3906,
      "indexUid": "index-12",
      "status": "succeeded",
      "type": "documentDeletion",
      "canceledBy": null,
      "details": {
        "deletedDocuments": 3,
        "originalFilter": "\"mass = 600\""
      },
      "error": null,
      "duration": "PT0.001819S",
      "enqueuedAt": "2023-03-07T08:57:20.11387Z",
      "startedAt": "2023-03-07T08:57:20.115895Z",
      "finishedAt": "2023-03-07T08:57:20.117714Z"
    }
```

</details>

## Draft status

- [ ] Error handling
- [ ] Analytics
- [ ] Do we want to reuse the `delete-batch` route in this way, or create a new route instead?
- [ ] Should the filter be applied at request time or when the deletion task is processed? 
  - The first commit in this PR applies the filter at request time, meaning that even if a document is modified in a way that no longer matches the filter in a later update, it will be deleted as long as the deletion task is processed after that update. 
  - The other commits in this PR apply the filter only when the asynchronous deletion task is processed, meaning that documents that match the filter at processing time are deleted even if they didn't match the filter at request time.
- [ ] If keeping the filter at request time, find a more elegant way to recover the user document ids from the internal document ids. The current way implemented in the first commit of this PR involves getting all the documents matching the filter, looking for the value of their primary key, and turning it into a string by copy-pasting routines found in milli...
- [ ] Security consideration, if any
- [ ] Fix the tests (but waiting until product questions are resolved)
- [ ] Add delete by filter specific tests



Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
2023-05-04 10:44:41 +00:00
meili-bors[bot]
da220294f6
Merge #3639
3639: Add a dedicated error variant for planned failures in index scheduler tests r=Kerollmops a=Sufflope

# Pull Request

## Related issue
Fixes #3086

## What does this PR do?
- Add a dedicated test variant in test cfg to avoid reusing a misleading existing error

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Jean-Sébastien Bour <jean-sebastien@bour.name>
2023-05-04 09:33:57 +00:00
Louis Dureuil
d8381eb790
Fix originalFilter 2023-05-04 10:07:59 +02:00
Louis Dureuil
b212aef5db
add one nanosecond to generated filter so as to generate a filter that would have matched the last task to delete 2023-05-04 09:56:48 +02:00
Louis Dureuil
52ab114f6c
Fix test on macOS: 50 tasks would result in the test consistently failing on a local macOS 2023-05-04 00:06:49 +02:00
Tamo
dcbfecf42c
make the generated filter valid 2023-05-04 00:06:49 +02:00
Tamo
9ca6f59546
Update index-scheduler/src/lib.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-05-04 00:06:49 +02:00
Tamo
aa7537a11e
make the autodeletion work with a fixed number of tasks and update the tests 2023-05-04 00:06:49 +02:00
Tamo
972bb2831c
log when meilisearch need to delete tasks 2023-05-04 00:06:49 +02:00
Tamo
f9ddd32545
implement the auto-deletion of tasks 2023-05-04 00:06:49 +02:00
Tamo
0f0cd2d929
handle the array of array form of filter in the dumps 2023-05-03 17:41:50 +02:00
Tamo
6df2ba93a9
remove one useless txn 2023-05-03 17:41:49 +02:00
Louis Dureuil
3680a6bf1e
extract impl to a function 2023-05-03 17:41:49 +02:00
Louis Dureuil
732c52093d
Processing time without autobatching implementation 2023-05-03 17:41:48 +02:00
Jean-Sébastien Bour
d09b771bce
Add a dedicated error variant for planned failures in index scheduler tests
Fixes #3086
2023-05-02 14:37:20 +02:00
Tamo
0b2200e6e7
remove the unused snapshot files 2023-04-25 17:55:27 +02:00
Kerollmops
a109802d45
Upgrade the incompatible versions of the dependencies 2023-04-24 17:50:57 +02:00
Kerollmops
47b66e49b8
Upgrade the compatible versions of the dependencies 2023-04-24 17:50:52 +02:00
bors[bot]
654a3a9e19
Merge #3688
3688: Following release v1.1.1: bring back changes into `main` r=curquiza a=curquiza

`@meilisearch/engine-team` ensure the changes we bring to `main` are the ones you want

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: dureuill <dureuill@users.noreply.github.com>
2023-04-24 11:38:23 +00:00
Louis Dureuil
fd583501d7
Use non_free_pages_size instead of real_disk_size to check task db space taken 2023-04-13 17:07:44 +02:00
bors[bot]
f9960be115
Merge #3659
3659: stops receiving tasks once the task queue is full r=Kerollmops a=irevoire

Give 20GiB to the task queue + once 50% of the task queue is used, it blocks itself and only receives task deletion requests to ensure we never get in a state where we can’t do anything.

Also, create a new error message when we reach this case:
```
Meilisearch cannot receive write operations because the size limit of the tasks database has been reached. Please delete tasks to continue performing write operations.
```

Co-authored-by: Tamo <tamo@meilisearch.com>
2023-04-13 09:11:12 +00:00
Tamo
b4fabce36d
update the error message + update the task db size to 20GiB with a limit at 50% 2023-04-12 18:54:11 +02:00
Tamo
be69ab320d
stops receiving tasks once the task queue is full 2023-04-12 18:54:11 +02:00
Louis Dureuil
a94e78ffb0
Disable autobatching of additions and deletions 2023-04-12 10:53:00 +02:00
Tamo
4d308d5237 Improve the health route by ensuring lmdb is not down
And refactorize slightly the auth controller.
2023-04-06 15:31:42 +02:00
Tamo
597d57bf1d Merge branch 'main' into bring-back-changes-v1.1.0 2023-04-05 11:32:14 +02:00
Tamo
3fb67f94f7 Reduce the time to import a dump by caching some datas
With this commit, for a dump containing 1M tasks we went form 1m02 to 6s
2023-03-29 14:44:15 +02:00
Tamo
cf5145b542 Reduce the time to import a dump
With this commit, for a dump containing 1M tasks we went from 3m36s to import the task queue down to 1m02s
2023-03-29 14:27:40 +02:00
Tamo
a2b151e877 ensure that the task queue is correctly imported
reduce the size of the snapshots file
2023-03-21 14:41:46 +01:00
bors[bot]
667bb87e35
Merge #3541
3541: Add cache on the indexes stats r=dureuill a=irevoire

Fix https://github.com/meilisearch/meilisearch/issues/3540

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-03-09 13:32:52 +00:00
Louis Dureuil
7faa9a22f6
Pass IndexStat by ref in store_stats_of 2023-03-07 14:00:54 +01:00
Louis Dureuil
76288fad72 Fix snapshots 2023-03-06 16:57:31 +01:00
Louis Dureuil
076a3d371c Eagerly compute stats as fallback to the cache.
- Refactor all around to avoid spawning indexes more times than necessary
2023-03-06 16:57:31 +01:00
Tamo
3bbf760542 update most snapshots 2023-03-06 16:57:31 +01:00
Tamo
fd5c48941a Add cache on the indexes stats 2023-03-06 16:57:31 +01:00
Tamo
e704728ee7 fix the snapshots permissions on unix system 2023-03-06 16:28:40 +01:00
Louis Dureuil
0202ff8ab4
Attempt to use default budget for faster startup 2023-02-28 10:55:43 +01:00
Louis Dureuil
71e7900c67
move index_map to file 2023-02-23 11:29:11 +01:00
Louis Dureuil
431782f3ee
Move index_mapper to mod.rs 2023-02-23 11:29:11 +01:00
Louis Dureuil
3db613ff77
Don't iterate all indexes manually 2023-02-23 11:29:09 +01:00
Louis Dureuil
5822764be9
Skip computing index budget in tests 2023-02-23 11:23:39 +01:00
Louis Dureuil
a529bf160c
Compute budget 2023-02-23 11:23:39 +01:00