3550: Delete documents by filter r=irevoire a=dureuill

# Prototype `prototype-delete-by-filter-0`

Usage:
A new route is available under `POST /indexes/{index_uid}/documents/delete` that allows you to delete your documents by filter.
The expected payload looks like that:
```json
{
  "filter": "doggo = bernese",
}
```

It'll then enqueue a task in your task queue that'll delete all the documents matching this filter once it's processed.
Here is an example of the associated details;
```json
  "details": {
    "deletedDocuments": 53,
    "originalFilter": "\"doggo = bernese\""
  }
```

----------


# Pull Request

## Related issue
Related to https://github.com/meilisearch/meilisearch/issues/3477

## What does this PR do?

### User standpoint

- Modifies the `/indexes/{:indexUid}/documents/delete-batch` route to accept either the existing array of documents ids, or a JSON object with a `filter` field representing a filter to apply. If that latter variant is used, any document matching the filter will be deleted.

### Implementation standpoint

- (processing time version) Adds a new BatchKind that is not autobatchable and that performs the delete by filter
- Reuse the `documentDeletion` task with a new `originalFilter` detail that replaces the `providedIds` detail.

## Example

<details>
<summary>Sample request, response and task result</summary>

Request:

```
curl \
  -X POST 'http://localhost:7700/indexes/index-10/documents/delete-batch' \
  -H 'Content-Type: application/json' \
  --data-binary '{ "filter" : "mass = 600"}'
```

Response:

```
{
  "taskUid": 3902,
  "indexUid": "index-10",
  "status": "enqueued",
  "type": "documentDeletion",
  "enqueuedAt": "2023-02-28T20:50:31.667502Z"
}
```

Task log:

```json
    {
      "uid": 3906,
      "indexUid": "index-12",
      "status": "succeeded",
      "type": "documentDeletion",
      "canceledBy": null,
      "details": {
        "deletedDocuments": 3,
        "originalFilter": "\"mass = 600\""
      },
      "error": null,
      "duration": "PT0.001819S",
      "enqueuedAt": "2023-03-07T08:57:20.11387Z",
      "startedAt": "2023-03-07T08:57:20.115895Z",
      "finishedAt": "2023-03-07T08:57:20.117714Z"
    }
```

</details>

## Draft status

- [ ] Error handling
- [ ] Analytics
- [ ] Do we want to reuse the `delete-batch` route in this way, or create a new route instead?
- [ ] Should the filter be applied at request time or when the deletion task is processed? 
  - The first commit in this PR applies the filter at request time, meaning that even if a document is modified in a way that no longer matches the filter in a later update, it will be deleted as long as the deletion task is processed after that update. 
  - The other commits in this PR apply the filter only when the asynchronous deletion task is processed, meaning that documents that match the filter at processing time are deleted even if they didn't match the filter at request time.
- [ ] If keeping the filter at request time, find a more elegant way to recover the user document ids from the internal document ids. The current way implemented in the first commit of this PR involves getting all the documents matching the filter, looking for the value of their primary key, and turning it into a string by copy-pasting routines found in milli...
- [ ] Security consideration, if any
- [ ] Fix the tests (but waiting until product questions are resolved)
- [ ] Add delete by filter specific tests



Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
This commit is contained in:
meili-bors[bot] 2023-05-04 10:44:41 +00:00 committed by GitHub
commit a95128df6b
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
20 changed files with 688 additions and 15 deletions

View file

@ -225,6 +225,11 @@ impl Index<'_> {
self.service.delete(url).await
}
pub async fn delete_document_by_filter(&self, body: Value) -> (Value, StatusCode) {
let url = format!("/indexes/{}/documents/delete", urlencode(self.uid.as_ref()));
self.service.post_encoded(url, body, self.encoder).await
}
pub async fn clear_all_documents(&self) -> (Value, StatusCode) {
let url = format!("/indexes/{}/documents", urlencode(self.uid.as_ref()));
self.service.delete(url).await

View file

@ -1,3 +1,4 @@
use meili_snap::{json_string, snapshot};
use serde_json::json;
use crate::common::{GetAllDocumentsOptions, Server};
@ -135,3 +136,254 @@ async fn delete_no_document_batch() {
assert_eq!(code, 200);
assert_eq!(response["results"].as_array().unwrap().len(), 3);
}
#[actix_rt::test]
async fn delete_document_by_filter() {
let server = Server::new().await;
let index = server.index("doggo");
index.update_settings_filterable_attributes(json!(["color"])).await;
index
.add_documents(
json!([
{ "id": 0, "color": "red" },
{ "id": 1, "color": "blue" },
{ "id": 2, "color": "blue" },
{ "id": 3 },
]),
Some("id"),
)
.await;
index.wait_task(1).await;
let (response, code) =
index.delete_document_by_filter(json!({ "filter": "color = blue"})).await;
snapshot!(code, @"202 Accepted");
snapshot!(json_string!(response, { ".enqueuedAt" => "[date]" }), @r###"
{
"taskUid": 2,
"indexUid": "doggo",
"status": "enqueued",
"type": "documentDeletion",
"enqueuedAt": "[date]"
}
"###);
let response = index.wait_task(2).await;
snapshot!(json_string!(response, { ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]", ".duration" => "[duration]" }), @r###"
{
"uid": 2,
"indexUid": "doggo",
"status": "succeeded",
"type": "documentDeletion",
"canceledBy": null,
"details": {
"providedIds": 0,
"deletedDocuments": 2,
"originalFilter": "\"color = blue\""
},
"error": null,
"duration": "[duration]",
"enqueuedAt": "[date]",
"startedAt": "[date]",
"finishedAt": "[date]"
}
"###);
let (documents, code) = index.get_all_documents(GetAllDocumentsOptions::default()).await;
snapshot!(code, @"200 OK");
snapshot!(json_string!(documents), @r###"
{
"results": [
{
"id": 0,
"color": "red"
},
{
"id": 3
}
],
"offset": 0,
"limit": 20,
"total": 2
}
"###);
let (response, code) =
index.delete_document_by_filter(json!({ "filter": "color NOT EXISTS"})).await;
snapshot!(code, @"202 Accepted");
snapshot!(json_string!(response, { ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]", ".duration" => "[duration]" }), @r###"
{
"taskUid": 3,
"indexUid": "doggo",
"status": "enqueued",
"type": "documentDeletion",
"enqueuedAt": "[date]"
}
"###);
let response = index.wait_task(3).await;
snapshot!(json_string!(response, { ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]", ".duration" => "[duration]" }), @r###"
{
"uid": 3,
"indexUid": "doggo",
"status": "succeeded",
"type": "documentDeletion",
"canceledBy": null,
"details": {
"providedIds": 0,
"deletedDocuments": 1,
"originalFilter": "\"color NOT EXISTS\""
},
"error": null,
"duration": "[duration]",
"enqueuedAt": "[date]",
"startedAt": "[date]",
"finishedAt": "[date]"
}
"###);
let (documents, code) = index.get_all_documents(GetAllDocumentsOptions::default()).await;
snapshot!(code, @"200 OK");
snapshot!(json_string!(documents), @r###"
{
"results": [
{
"id": 0,
"color": "red"
}
],
"offset": 0,
"limit": 20,
"total": 1
}
"###);
}
#[actix_rt::test]
async fn delete_document_by_complex_filter() {
let server = Server::new().await;
let index = server.index("doggo");
index.update_settings_filterable_attributes(json!(["color"])).await;
index
.add_documents(
json!([
{ "id": 0, "color": "red" },
{ "id": 1, "color": "blue" },
{ "id": 2, "color": "blue" },
{ "id": 3, "color": "green" },
{ "id": 4 },
]),
Some("id"),
)
.await;
index.wait_task(1).await;
let (response, code) = index
.delete_document_by_filter(
json!({ "filter": ["color != red", "color != green", "color EXISTS"] }),
)
.await;
snapshot!(code, @"202 Accepted");
snapshot!(json_string!(response, { ".enqueuedAt" => "[date]" }), @r###"
{
"taskUid": 2,
"indexUid": "doggo",
"status": "enqueued",
"type": "documentDeletion",
"enqueuedAt": "[date]"
}
"###);
let response = index.wait_task(2).await;
snapshot!(json_string!(response, { ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]", ".duration" => "[duration]" }), @r###"
{
"uid": 2,
"indexUid": "doggo",
"status": "succeeded",
"type": "documentDeletion",
"canceledBy": null,
"details": {
"providedIds": 0,
"deletedDocuments": 2,
"originalFilter": "[\"color != red\",\"color != green\",\"color EXISTS\"]"
},
"error": null,
"duration": "[duration]",
"enqueuedAt": "[date]",
"startedAt": "[date]",
"finishedAt": "[date]"
}
"###);
let (documents, code) = index.get_all_documents(GetAllDocumentsOptions::default()).await;
snapshot!(code, @"200 OK");
snapshot!(json_string!(documents), @r###"
{
"results": [
{
"id": 0,
"color": "red"
},
{
"id": 3,
"color": "green"
},
{
"id": 4
}
],
"offset": 0,
"limit": 20,
"total": 3
}
"###);
let (response, code) = index
.delete_document_by_filter(json!({ "filter": [["color = green", "color NOT EXISTS"]] }))
.await;
snapshot!(code, @"202 Accepted");
snapshot!(json_string!(response, { ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]", ".duration" => "[duration]" }), @r###"
{
"taskUid": 3,
"indexUid": "doggo",
"status": "enqueued",
"type": "documentDeletion",
"enqueuedAt": "[date]"
}
"###);
let response = index.wait_task(3).await;
snapshot!(json_string!(response, { ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]", ".duration" => "[duration]" }), @r###"
{
"uid": 3,
"indexUid": "doggo",
"status": "succeeded",
"type": "documentDeletion",
"canceledBy": null,
"details": {
"providedIds": 0,
"deletedDocuments": 4,
"originalFilter": "[[\"color = green\",\"color NOT EXISTS\"]]"
},
"error": null,
"duration": "[duration]",
"enqueuedAt": "[date]",
"startedAt": "[date]",
"finishedAt": "[date]"
}
"###);
let (documents, code) = index.get_all_documents(GetAllDocumentsOptions::default()).await;
snapshot!(code, @"200 OK");
snapshot!(json_string!(documents), @r###"
{
"results": [
{
"id": 0,
"color": "red"
}
],
"offset": 0,
"limit": 20,
"total": 1
}
"###);
}

View file

@ -418,3 +418,155 @@ async fn update_documents_csv_delimiter_with_bad_content_type() {
}
"###);
}
#[actix_rt::test]
async fn delete_document_by_filter() {
let server = Server::new().await;
let index = server.index("doggo");
// send a bad payload type
let (response, code) = index.delete_document_by_filter(json!("hello")).await;
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Invalid value type: expected an object, but found a string: `\"hello\"`",
"code": "bad_request",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#bad_request"
}
"###);
// send bad payload type
let (response, code) = index.delete_document_by_filter(json!({ "filter": true })).await;
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Invalid syntax for the filter parameter: `expected String, Array, found: true`.",
"code": "invalid_document_delete_filter",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_delete_filter"
}
"###);
// send bad filter
let (response, code) = index.delete_document_by_filter(json!({ "filter": "hello"})).await;
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Was expecting an operation `=`, `!=`, `>=`, `>`, `<=`, `<`, `IN`, `NOT IN`, `TO`, `EXISTS`, `NOT EXISTS`, `IS NULL`, `IS NOT NULL`, `IS EMPTY`, `IS NOT EMPTY`, `_geoRadius`, or `_geoBoundingBox` at `hello`.\n1:6 hello",
"code": "invalid_document_delete_filter",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_delete_filter"
}
"###);
// send empty filter
let (response, code) = index.delete_document_by_filter(json!({ "filter": ""})).await;
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Sending an empty filter is forbidden.",
"code": "invalid_document_delete_filter",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_document_delete_filter"
}
"###);
// index does not exists
let (response, code) =
index.delete_document_by_filter(json!({ "filter": "doggo = bernese"})).await;
snapshot!(code, @"202 Accepted");
let response = server.wait_task(response["taskUid"].as_u64().unwrap()).await;
snapshot!(json_string!(response, { ".duration" => "[duration]", ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]"}), @r###"
{
"uid": 0,
"indexUid": "doggo",
"status": "failed",
"type": "documentDeletion",
"canceledBy": null,
"details": {
"providedIds": 0,
"deletedDocuments": 0,
"originalFilter": "\"doggo = bernese\""
},
"error": {
"message": "Index `doggo` not found.",
"code": "index_not_found",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#index_not_found"
},
"duration": "[duration]",
"enqueuedAt": "[date]",
"startedAt": "[date]",
"finishedAt": "[date]"
}
"###);
let (response, code) = index.create(None).await;
snapshot!(code, @"202 Accepted");
server.wait_task(response["taskUid"].as_u64().unwrap()).await;
// no filterable are set
let (response, code) =
index.delete_document_by_filter(json!({ "filter": "doggo = bernese"})).await;
snapshot!(code, @"202 Accepted");
let response = server.wait_task(response["taskUid"].as_u64().unwrap()).await;
snapshot!(json_string!(response, { ".duration" => "[duration]", ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]"}), @r###"
{
"uid": 2,
"indexUid": "doggo",
"status": "failed",
"type": "documentDeletion",
"canceledBy": null,
"details": {
"providedIds": 0,
"deletedDocuments": 0,
"originalFilter": "\"doggo = bernese\""
},
"error": {
"message": "Attribute `doggo` is not filterable. This index does not have configured filterable attributes.\n1:6 doggo = bernese",
"code": "invalid_search_filter",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_filter"
},
"duration": "[duration]",
"enqueuedAt": "[date]",
"startedAt": "[date]",
"finishedAt": "[date]"
}
"###);
let (response, code) = index.update_settings_filterable_attributes(json!(["doggo"])).await;
snapshot!(code, @"202 Accepted");
server.wait_task(response["taskUid"].as_u64().unwrap()).await;
// not filterable while there is a filterable attribute
let (response, code) =
index.delete_document_by_filter(json!({ "filter": "catto = jorts"})).await;
snapshot!(code, @"202 Accepted");
let response = server.wait_task(response["taskUid"].as_u64().unwrap()).await;
snapshot!(json_string!(response, { ".duration" => "[duration]", ".enqueuedAt" => "[date]", ".startedAt" => "[date]", ".finishedAt" => "[date]"}), @r###"
{
"uid": 4,
"indexUid": "doggo",
"status": "failed",
"type": "documentDeletion",
"canceledBy": null,
"details": {
"providedIds": 0,
"deletedDocuments": 0,
"originalFilter": "\"catto = jorts\""
},
"error": {
"message": "Attribute `catto` is not filterable. Available filterable attributes are: `doggo`.\n1:6 catto = jorts",
"code": "invalid_search_filter",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_search_filter"
},
"duration": "[duration]",
"enqueuedAt": "[date]",
"startedAt": "[date]",
"finishedAt": "[date]"
}
"###);
}

View file

@ -97,7 +97,7 @@ async fn task_bad_types() {
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Invalid value in parameter `types`: `doggo` is not a valid task type. Available types are `documentAdditionOrUpdate`, `documentDeletion`, `settingsUpdate`, `indexCreation`, `indexDeletion`, `indexUpdate`, `indexSwap`, `taskCancelation`, `taskDeletion`, `dumpCreation`, `snapshotCreation`.",
"message": "Invalid value in parameter `types`: `doggo` is not a valid task type. Available types are `documentAdditionOrUpdate`, `documentDeletion`, `documentDeletionByFilter`, `settingsUpdate`, `indexCreation`, `indexDeletion`, `indexUpdate`, `indexSwap`, `taskCancelation`, `taskDeletion`, `dumpCreation`, `snapshotCreation`.",
"code": "invalid_task_types",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_task_types"
@ -108,7 +108,7 @@ async fn task_bad_types() {
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Invalid value in parameter `types`: `doggo` is not a valid task type. Available types are `documentAdditionOrUpdate`, `documentDeletion`, `settingsUpdate`, `indexCreation`, `indexDeletion`, `indexUpdate`, `indexSwap`, `taskCancelation`, `taskDeletion`, `dumpCreation`, `snapshotCreation`.",
"message": "Invalid value in parameter `types`: `doggo` is not a valid task type. Available types are `documentAdditionOrUpdate`, `documentDeletion`, `documentDeletionByFilter`, `settingsUpdate`, `indexCreation`, `indexDeletion`, `indexUpdate`, `indexSwap`, `taskCancelation`, `taskDeletion`, `dumpCreation`, `snapshotCreation`.",
"code": "invalid_task_types",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_task_types"
@ -119,7 +119,7 @@ async fn task_bad_types() {
snapshot!(code, @"400 Bad Request");
snapshot!(json_string!(response), @r###"
{
"message": "Invalid value in parameter `types`: `doggo` is not a valid task type. Available types are `documentAdditionOrUpdate`, `documentDeletion`, `settingsUpdate`, `indexCreation`, `indexDeletion`, `indexUpdate`, `indexSwap`, `taskCancelation`, `taskDeletion`, `dumpCreation`, `snapshotCreation`.",
"message": "Invalid value in parameter `types`: `doggo` is not a valid task type. Available types are `documentAdditionOrUpdate`, `documentDeletion`, `documentDeletionByFilter`, `settingsUpdate`, `indexCreation`, `indexDeletion`, `indexUpdate`, `indexSwap`, `taskCancelation`, `taskDeletion`, `dumpCreation`, `snapshotCreation`.",
"code": "invalid_task_types",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#invalid_task_types"