7170 Commits

Author SHA1 Message Date
funilrys
3e0e8164a3
fixup! Adjust + Cleanup changes. 2022-12-22 18:01:54 +01:00
funilrys
0bc4572905
Adjust + Cleanup changes.
Indeed, I missed some of the changes that were introduced by #3190.
2022-12-22 17:53:33 +01:00
funilrys
4e6c663a2e
Release unnecessary ownership. 2022-12-22 17:47:58 +01:00
funilrys
e2775c6f49
Remove unused object. 2022-12-22 17:47:58 +01:00
funilrys
c07a5932cb
Apply fmt. 2022-12-22 17:47:58 +01:00
funilrys
528a944997
Reimplement v5 date extraction.
Indeed, before this patch the implementation wasn't correct.
2022-12-22 17:47:58 +01:00
funilrys
13fb5ce974
Re-Open tasks list when needed.
Indeed, before this patch we were using the reference instead of
"reopening" the task list each time we needed to access it.
Without this patch, all other usages of the task attribute would
break.
2022-12-22 17:47:57 +01:00
funilrys
a43a0712fa
Add reader.v5.tasks.Task.updated_at.
There was no way to "quickly" get the update date.
2022-12-22 17:47:57 +01:00
funilrys
1be4619b91
Add reader.v5.tasks.Task.created_at.
There was no way to "quickly" get the creation date.
2022-12-22 17:47:57 +01:00
funilrys
cf50f85986
Add reader.v5.tasks.Task.processed_at.
There was no way to "quickly" get the processed date.
2022-12-22 17:47:57 +01:00
funilrys
61b3a29ff3
Extract the dates out of the dumpv5.
This patch possibly fixes #2986.

This patch introduces a way to fill the IndexMetadata.created_at
and IndexMetadata.updated_at keys from the tasks events.
This is done by reading the creation date of the first event
(created_at) and the creation date of the last event (updated_at).
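
As a rough illustration of the approach described above (not the actual dump reader code), the sketch below takes the creation date of the first event as `created_at` and the creation date of the last event as `updated_at`. `TaskEvent`, `index_dates`, and the use of Unix timestamps are assumptions made for the example; it also assumes the events are stored in chronological order.

```rust
/// Hypothetical stand-in for a task event read from a v5 dump.
#[derive(Debug, Clone, PartialEq)]
pub struct TaskEvent {
    /// Creation time of the event as a Unix timestamp in seconds
    /// (an assumed representation, chosen to keep the sketch std-only).
    pub created_at: i64,
}

/// Returns `(created_at, updated_at)` derived from a task's events:
/// the creation date of the first event and of the last event.
/// Returns `None` when there are no events to read dates from.
pub fn index_dates(events: &[TaskEvent]) -> Option<(i64, i64)> {
    let first = events.first()?;
    let last = events.last()?;
    Some((first.created_at, last.created_at))
}
```

With a single event, both dates are the same, which matches an index that was created and never updated.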
2022-12-22 17:47:57 +01:00
Loïc Lecrenier
32c6062e65 Optimise exactness criterion
1. Cache some results between calls to next()
2. Compute the combinations of exact words more efficiently
2022-12-22 12:28:45 +01:00
Loïc Lecrenier
f097aafa1c Add unit test for prefix handling by the proximity criterion 2022-12-22 12:08:00 +01:00
Loïc Lecrenier
777b387dc4 Avoid a prefix-related worst-case scenario in the proximity criterion 2022-12-22 12:08:00 +01:00
Loïc Lecrenier
b0f3dc2c06 Interpret synonyms as phrases 2022-12-22 12:07:51 +01:00
Louis Dureuil
66e18eae79
auth: add generate_master_key function 2022-12-22 11:55:27 +01:00
amab8901
9a39c4e40d Get date from IndexMetaData 2022-12-22 11:46:17 +01:00
amab8901
df176aaf01 Insert dump_reader.date() into create_raw_index(_) argument 2022-12-21 15:16:31 +01:00
Louis Dureuil
4b166bea2b
Add primary_key_inference test 2022-12-21 15:13:38 +01:00
Louis Dureuil
5943100754
Fix existing tests 2022-12-21 15:13:38 +01:00
Louis Dureuil
b24def3281
Add logging when inference took place.
Displays log message in the form:
```
[2022-12-21T09:19:42Z INFO  milli::update::index_documents::enrich] Primary key was not specified in index. Inferred to 'id'
```
2022-12-21 15:13:38 +01:00
Louis Dureuil
402dcd6b2f
Simplify primary key inference 2022-12-21 15:13:38 +01:00
Louis Dureuil
13c95d25aa
Remove uses of UserError::MissingPrimaryKey not related to inference 2022-12-21 15:13:36 +01:00
amab8901
0893b175dc Merge branch 'main' into 2983-forward-date-to-milli 2022-12-21 14:31:19 +01:00
amab8901
d5978d11e1 Refactor 2022-12-21 14:28:00 +01:00
bors[bot]
a8defb585b
Merge #742
742: Add a "Criterion implementation strategy" parameter to Search r=irevoire a=loiclec

Add a parameter to search requests which determines the implementation strategy of the criteria. This can be either `set-based`, `iterative`, or `dynamic` (i.e., choosing between set-based or iterative at search time). See https://github.com/meilisearch/milli/issues/755 for more context about this change.
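
A minimal sketch of how such a three-way parameter could be modelled is shown below. The type names, the `resolve` function, and in particular the rule used for `dynamic` (comparing a candidate count against a threshold) are assumptions made for illustration; the PR description does not state how the dynamic choice is made.

```rust
use std::str::FromStr;

/// Hypothetical strategy parameter accepted by a search request.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Strategy {
    SetBased,
    Iterative,
    Dynamic,
}

/// The concrete implementation picked once the search runs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Resolved {
    SetBased,
    Iterative,
}

impl FromStr for Strategy {
    type Err = String;

    /// Parses the three values named in the PR description.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "set-based" => Ok(Strategy::SetBased),
            "iterative" => Ok(Strategy::Iterative),
            "dynamic" => Ok(Strategy::Dynamic),
            other => Err(format!("unknown strategy: {other}")),
        }
    }
}

/// Resolves the strategy at search time.
/// ASSUMPTION: `Dynamic` picks the iterative implementation when there are
/// few candidates and the set-based one otherwise; the threshold is made up.
pub fn resolve(strategy: Strategy, candidate_count: usize, threshold: usize) -> Resolved {
    match strategy {
        Strategy::SetBased => Resolved::SetBased,
        Strategy::Iterative => Resolved::Iterative,
        Strategy::Dynamic => {
            if candidate_count <= threshold {
                Resolved::Iterative
            } else {
                Resolved::SetBased
            }
        }
    }
}
```

Keeping the requested strategy and the resolved one as separate types makes it impossible to run a criterion while the choice is still undecided.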


Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2022-12-21 12:18:49 +00:00
Loïc Lecrenier
339a4b0789 Make clippy happy 2022-12-21 12:49:34 +01:00
Loïc Lecrenier
904fd2f6d1 Add a search strategy option to the cli 2022-12-21 12:48:53 +01:00
Loïc Lecrenier
229405aeb9 Choose implementation strategy of criterion at runtime 2022-12-21 09:29:39 +01:00
jiangbo212
2780e365e2 test update and ndjson serde use from_slice 2022-12-21 14:31:45 +08:00
jiangbo212
bf2a401a05 serde ndjson fix 2022-12-21 11:27:15 +08:00
bors[bot]
9925309492
Merge #3263
3263: Handle most io errors instead of tagging everything as internal r=dureuill a=irevoire

Fix https://github.com/meilisearch/meilisearch/issues/2255
Fix https://github.com/meilisearch/meilisearch/issues/2785
Close https://github.com/meilisearch/milli/pull/580

- [x] Find a way to catch the `io::Error` contained in `serde_json::Error`: We can't: https://docs.rs/serde_json/latest/serde_json/struct.Error.html
- [x] Check the `grenad::Error` as well => the `grenad::Error::Io` errors are correctly converted to a `milli::Error::Io` error
- [x] Ensure the error codes mean the same thing on Windows
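
The conversion mentioned in the checklist can be pictured with the sketch below, which forwards an I/O error from a lower layer unchanged instead of flattening it into a generic internal error. `StoreError` and `EngineError` are hypothetical stand-ins, not the real `grenad::Error` or `milli::Error` types, whose variants are not shown in this PR description.

```rust
use std::io;

/// Hypothetical storage-layer error, standing in for a lower-level error type.
#[derive(Debug)]
pub enum StoreError {
    Io(io::Error),
    InvalidFormat,
}

/// Hypothetical engine-level error exposed to callers.
#[derive(Debug)]
pub enum EngineError {
    /// I/O failures keep their original `io::Error`, so the caller can
    /// still inspect the `ErrorKind` and report a precise error code.
    Io(io::Error),
    /// Everything else is reported as an internal error.
    Internal(String),
}

impl From<StoreError> for EngineError {
    fn from(err: StoreError) -> Self {
        match err {
            StoreError::Io(inner) => EngineError::Io(inner),
            other => EngineError::Internal(format!("{other:?}")),
        }
    }
}
```

With this `From` impl in place, the `?` operator performs the conversion automatically in functions that return `Result<_, EngineError>`.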

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-12-20 17:15:53 +00:00
Tamo
9e0cce5ca4
Update dump/src/error.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2022-12-20 18:08:51 +01:00
Tamo
336ea57384
Update dump/src/error.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2022-12-20 18:08:44 +01:00
Tamo
c637bfba37
convert all the document format errors due to io to io::Error 2022-12-20 17:49:38 +01:00
Tamo
3040172562
update the error message as well 2022-12-20 17:31:13 +01:00
bors[bot]
249e051cd4
Merge #750
750: Fix hard-deletion of an external id that was soft-deleted and then reimported - main r=irevoire a=loiclec

# Pull Request

## Related issue
Fixes (when merged into meilisearch) https://github.com/meilisearch/meilisearch/issues/3021

## What does this PR do?
There was a bug happening when:

1. Documents were added
2. Some of these documents were replaced using soft-deletion
3. A deletion of another non-replaced document takes place and triggers a hard-deletion
4. Documents with the same identifiers as the replaced documents are added again

Then, search results would return duplicate documents. No crash would happen at any time (this is the reason it wasn't caught by the previous fuzz test. I have updated the new one such that it also checks the result of a placeholder search request, which then finds the bug immediately).

The cause of the bug is: 

1. When a hard-deletion is triggered, we try to retrieve the external document id associated with each soft-deleted document id. 
2. Then, we take this list of external document ids and remove each of them from the `ExternalDocumentsIds` structure. 
3. However, this is not correct in case an existing (non-deleted) document shares the external id of a soft-deleted document. 
   
## Implementation of the fix
1. Before we process a permanent deletion, we update the list of soft-deleted document ids.
2. Then, the permanent deletion's job is to remove the soft-deleted documents from all data structures. Therefore, to update `ExternalDocumentsIds`, we can simply call the `delete_soft_deleted_documents_ids_from_fsts` method, which is faster and simpler.
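
The difference between the two approaches can be sketched with a toy model. This is only an illustration under simplifying assumptions: the real `ExternalDocumentsIds` structure is not a `HashMap`, and `purge_by_external_id` and `purge_by_internal_id` are hypothetical names, not milli functions.

```rust
use std::collections::{HashMap, HashSet};

/// Toy model of the mapping from external (user-facing) ids to internal ids.
pub type ExternalIds = HashMap<String, u32>;

/// Buggy approach: remove entries by the external id of each soft-deleted
/// document. If a live document reuses that external id, its entry is
/// removed as well.
pub fn purge_by_external_id(map: &mut ExternalIds, soft_deleted_external: &[&str]) {
    for ext in soft_deleted_external {
        map.remove(*ext);
    }
}

/// Fixed approach: remove only the entries whose internal id is in the
/// soft-deleted set, leaving live documents untouched.
pub fn purge_by_internal_id(map: &mut ExternalIds, soft_deleted: &HashSet<u32>) {
    map.retain(|_, internal| !soft_deleted.contains(internal));
}
```

For example, suppose document `"a"` first had internal id 0, was replaced (soft-deleting 0), and now maps to internal id 1. Purging by the external id `"a"` drops the live entry, so re-adding `"a"` later creates a second document; purging by the internal id 0 keeps `"a" -> 1` intact.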

## Correctness
A unit test was added to reproduce the bug. The new fuzz test, when adjusted to check the correctness of a placeholder search, could also instantly reproduce the bug, but now does not find any other problem.

Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2022-12-20 16:13:20 +00:00
Tamo
52aa34d984
remove an unused error handling file 2022-12-20 16:32:51 +01:00
Loïc Lecrenier
fc0e7382fe Fix hard-deletion of an external id that was soft-deleted 2022-12-20 15:33:31 +01:00
bors[bot]
2c86d42a44
Merge #3264
3264: Remove macos-latest and windows-latest usages r=curquiza a=curquiza

Related to https://github.com/meilisearch/meilisearch/issues/3109#issuecomment-1359151297

Replace `macos-latest` and `windows-latest` with specific versions: this will avoid "surprises" in the future when GitHub changes the `latest` version.
This will also allow us to let the documentation team know about the changes, since we will control the macOS/Windows versions we support.

Co-authored-by: curquiza <clementine@meilisearch.com>
2022-12-20 10:53:37 +00:00
curquiza
8ce3a34ffa Remove macos-latest and windows-latest usages 2022-12-20 11:10:09 +01:00
bors[bot]
259c04eb28
Merge #3261
3261: Use ubuntu-18.04 container instead of GitHub hosted actions r=curquiza a=curquiza

Related to (but does not fix totally) https://github.com/meilisearch/meilisearch/issues/3109 and https://github.com/meilisearch/product/discussions/547#discussioncomment-4109143

## For reviewers, what the PR changes:
- Use ubuntu-latest where compiling with ubuntu-18.04 is not needed (`update-version-cargo-toml`, `fmt`, `clippy` jobs)
- Where ubuntu-18.04 is required
  - Use `ubuntu-latest` as runner
  - Use `ubuntu:18.04` as Docker container
  - Install the required dependencies (curl and cc)
  - Use `actions-rs/toolchain@v1` instead of `hecrj/setup-rust-action@master`. It is a more stable and more widely followed alternative. Plus, it was easy to make it work with our container, contrary to the old one. Change applied in all our CIs to be more consistent.
- Remove some useless space to increase readability.

Co-authored-by: curquiza <clementine@meilisearch.com>
2022-12-20 09:28:09 +00:00
Tamo
d8fb506c92
handle most io errors instead of tagging everything as internal 2022-12-19 20:50:40 +01:00
amab8901
aa03e02fdc Apply Rustfmt 2022-12-19 19:24:56 +01:00
curquiza
7ef23addb6 Add comment to bring more context 2022-12-19 18:46:27 +01:00
bors[bot]
97fb64e40e
Merge #747
747: Soft-deletion computation no longer depends on the mapsize r=irevoire a=dureuill

# Pull Request

## Related issue

Related to https://github.com/meilisearch/meilisearch/issues/3231: After removing `--max-index-size`, the `mapsize` will always be unrelated to the actual max size the user wants for their DB, so it doesn't make sense to use these values any longer.

This implements solution 2.3 from https://github.com/meilisearch/meilisearch/issues/3231#issuecomment-1348628824

## What does this PR do?

### User-visible

- Soft-deleted documents are no longer deleted when there is less than 10% of the mapsize available or when they take more than 10% of the mapsize
- Instead, they are deleted when there are more soft-deleted documents than regular documents, or when they take more than 1GiB of disk space (estimated).

### Implementation standpoint

1. Adds a `DeletionStrategy` enum to replace the boolean `disable_soft_deletion` that we had up until now. This enum allows us to specify that we want "always hard", "always soft", or to use the dynamic soft-deletion strategy (default).
2. Uses the current strategy when deleting documents, with the new heuristics being used in the `DeletionStrategy::Dynamic` variant.
3. Updates the tests to use the appropriate DeletionStrategy whenever needed (one of `AlwaysHard` or `AlwaysSoft` depending on the test)
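
The strategy and its dynamic heuristics can be sketched roughly as follows. The variant names come from the description above; the `should_hard_delete` function, its parameters, and the constant name are assumptions made for illustration and are not the actual milli code.

```rust
/// Deletion strategy with the three modes named in this PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DeletionStrategy {
    Dynamic,
    AlwaysSoft,
    AlwaysHard,
}

impl Default for DeletionStrategy {
    /// The dynamic strategy is described as the default.
    fn default() -> Self {
        DeletionStrategy::Dynamic
    }
}

/// 1 GiB, the size limit quoted for soft-deleted documents.
const SOFT_DELETED_SIZE_LIMIT_BYTES: u64 = 1024 * 1024 * 1024;

/// Hypothetical helper deciding whether a deletion should be a hard one.
/// For `Dynamic`, soft-deleted documents are purged when they outnumber
/// regular documents or when their estimated disk usage exceeds 1 GiB.
pub fn should_hard_delete(
    strategy: DeletionStrategy,
    soft_deleted_count: u64,
    regular_count: u64,
    soft_deleted_bytes_estimate: u64,
) -> bool {
    match strategy {
        DeletionStrategy::AlwaysHard => true,
        DeletionStrategy::AlwaysSoft => false,
        DeletionStrategy::Dynamic => {
            soft_deleted_count > regular_count
                || soft_deleted_bytes_estimate > SOFT_DELETED_SIZE_LIMIT_BYTES
        }
    }
}
```

Neither condition depends on the mapsize, which is the point of the change: the decision stays meaningful once `--max-index-size` is removed.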

Note to reviewers: this PR is optimized for a commit-by-commit review.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
2022-12-19 17:46:18 +00:00
curquiza
b3fce7c366 Remove useless continue-on-error 2022-12-19 18:39:35 +01:00
curquiza
5099a40484 Use ubuntu-18.04 container in publish CIs 2022-12-19 18:35:33 +01:00
Tamo
69edbf9f6d
Update milli/src/update/delete_documents.rs 2022-12-19 18:23:50 +01:00
bors[bot]
8957251eed
Merge #751
751: Update version for the next release (v0.38.0) in Cargo.toml files r=curquiza a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one before merging.

Co-authored-by: curquiza <curquiza@users.noreply.github.com>
2022-12-19 17:02:39 +00:00