4313: Fix document formatting performances r=Kerollmops a=ManyTheFish
reduce the formatted option list to the attributes that should be formatted,
instead of all the attributes to display.
The time to compute the `format` list scales with the number of fields to format;
cumulated with `map_leaf_values` that iterates over all the nested fields, it gives a quadratic complexity:
`d*f` where `d` is the total number of fields to display and `f` is the total number of fields to format.
Co-authored-by: ManyTheFish <many@meilisearch.com>
4279: Check experimental feature on setting update query rather than in the task. r=ManyTheFish a=dureuill
Improve the UX by checking for the vector store feature and returning an error synchronously when sending a setting update, rather than in the indexing task.
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4238: Task queue webhook r=dureuill a=irevoire
# Prototype `prototype-task-queue-webhook-1`
The prototype is available through Docker by using the following command:
```bash
docker run -p 7700:7700 -v $(pwd)/meili_data:/meili_data getmeili/meilisearch:prototype-task-queue-webhook-1
```
# Pull Request
Implements the task queue webhook.
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4236
## What does this PR do?
- Provide a new cli and env var for the webhook, respectively called `--task-webhook-url` and `MEILI_TASK_WEBHOOK_URL`
- Also supports sending the requests with a custom `Authorization` header by specifying the optional `--task-webhook-authorization-header` CLI parameter or `MEILI_TASK_WEBHOOK_AUTHORIZATION_HEADER` env variable.
- Throw an error if the specified URL is invalid
- Every time a batch is processed, send all the finished tasks into the webhook with our public `TaskView` type as a JSON Line GZIPed body.
- Add one test.
## PR checklist
### Before becoming ready to review
- [x] Add a test
- [x] Compress the data we send
- [x] Chunk and stream the data we send
- [x] Remove the unwrap in the index-scheduler when sending the data fails
- [x] The analytics are missing
### Before merging
- [x] Release a prototype
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
4277: Update mini-dashboard to v0.2.12 r=curquiza a=mdubus
# Pull Request
## Related issue
Fixes#4276
## What does this PR do?
Upgrade mini-dashboard to version 0.2.12 ([see changes](https://github.com/meilisearch/mini-dashboard/releases/tag/v0.2.12))
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Morgane Dubus <30866152+mdubus@users.noreply.github.com>
- DistributionShift in Search object (to be set from model in embed?)
- Fix issue where embedder index wasn't computed at search time
- Accept as default embedder either the "default" one, or the only embedder when there is only one
4223: Update to heed 0.20 r=dureuill a=Kerollmops
This PR brings the v0.20-alpha.9 version of heed into Meilisearch 🎉 The main goal is to test it in a real environment to make the necessary changes if needed. We also want to merge it as soon as possible during the pre-release phase to ensure we catch bugs before the release.
Most of the calls to heed are the same as before, except:
- The `PolyDatabase` has been replaced with a `Database<Unspecified, Unspecified>`. We replaced the `get<T, U>()` by a `remap<T, U>().get()` calls.
- The `Database` `append(...)` method has been replaced with a `put_with_flags(PutFlags::APPEND, ...)`.
- The `RwTxn<'e, 'p>` has been simplified into a `RwTxn<'e>`.
- The `BytesEncode/Decode` traits return a `Result<_, BoxedError>` instead of an `Option<_>`.
- We no longer need to wrap and unwrap the `BEU32` integer when storing/getting them from heed.
### TODO
- [x] Create actual, simple error types instead of using strings in the codecs.
### Follow-up work
- Move the codecs into another member crate (we depend on the uuid one in the meilitool crate).
- Display the internal decoding error in the `SerializationError` internal error variant.
Co-authored-by: Clément Renault <clement@meilisearch.com>
4126: Make the experimental route /metrics activable via HTTP r=dureuill a=braddotcoffee
# Pull Request
## Related issue
Closes#4086
## What does this PR do?
- [x] Make `/metrics` available via HTTP as described in #4086
- [x] The users can still launch Meilisearch using the `--experimental-enable-metrics` flag.
- [x] If the flag `--experimental-enable-metrics` is activated, a call to the `GET /experimental-features` route right after the launch will show `"metrics": true` even if the user has not called the `PATCH /experimental-features` route yet.
- [x] Even if the --experimental-enable-metrics flag is present at launch, calling the `PATCH /experimental-features` route with `"metrics": false` disables the experimental feature.
- [x] Update the spec
- I was unable to find docs in this repository to update about the `/experimental-features` endpoint. I'll happily update if you point me in the right direction!
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Co-authored-by: bwbonanno <bradfordbonanno@gmail.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
The issue was that the operation « DocumentDeletionByFilter » was not
declared as an index operation. That means the indexes stats were not
reprocessed after the application of the operation.
3994: Fix synonyms with separators r=Kerollmops a=ManyTheFish
# Pull Request
## Related issue
Fixes#3977
## Available prototype
```
$ docker pull getmeili/meilisearch:prototype-fix-synonyms-with-separators-0
```
## What does this PR do?
- add a new test
- filter the empty synonyms after normalization
Co-authored-by: ManyTheFish <many@meilisearch.com>
3955: Update mini-dashboard to version 0.2.11 r=curquiza a=bidoubiwa
# Pull Request
## What does this PR do?
- Updates the mini-dashboard to version [0.2.11](https://github.com/meilisearch/mini-dashboard/releases/tag/v0.2.11)
## PR checklist
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Charlotte Vermandel <charlottevermandel@gmail.com>
3953: Update UTM campaign r=curquiza a=macraig
# Pull Request
## What does this PR do?
Redirect CTAs to Cloud landing page
Co-authored-by: María <maria@Marias-MacBook-Pro.local>
3942: Normalize for the search the facets values r=ManyTheFish a=Kerollmops
This PR improves and fixes the search for facet values feature. Searching for _bre_ wasn't returning facet values like _brévent_ or _brô_.
The issue was related to the fact that facets are normalized but not in the same way as the `searchableAttributes` are. We decided to normalize them further and add another intermediate database where the key is the normalized facet value, and the value is a set of the non-normalized facets. We then use these non-normalized ones to get the correct counts by fetching the associated databases.
### What's missing in this PR?
- [x] Apply the change to the whole set of `SearchForFacetValue::execute` conditions.
- [x] Factorize the code that does an intermediate normalized value fetch in a function.
- [x] Add or modify the search for facet value test.
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
3933: Stop computing the update files size r=ManyTheFish a=Kerollmops
This PR, related #3934, removes the part which computes the total size of the `data.ms/update_files` folder, which can take a lot of time when many updates must be processed.
It is not breaking API-side but is breaking on the result we will show to the user. The `databaseSize` field returned by the `/stats` endpoint will be reduced.
Co-authored-by: Kerollmops <clement@meilisearch.com>
3921: Deactivate camel case segmentation r=dureuill a=ManyTheFish
# Pull Request
This PR deactivates the camel case segmentation to retrieve the possibility to accept typos over camel-cased words
## Related issue
Fixes#3869Fixes#3818
## What does this PR do?
- deactivates camelcase segmentation
related to #3919
Co-authored-by: ManyTheFish <many@meilisearch.com>
3907: Add telemetry for define field to search on at query time r=dureuill a=ManyTheFish
Add "attributes_to_search_on" telemetry usage counter:
```json
"attributes_to_search_on": {
"total_number_of_use": 12,
},
```
This measures the number of search queries that the user uses `attributesToSearchOn` field.
related to https://github.com/meilisearch/specifications/pull/251
## reviewers:
- `@macraig` for validating the telemetry's name
- `@dureuill` for validating the code
Co-authored-by: ManyTheFish <many@meilisearch.com>
3897: Add automated tests for `/experimental-features` route r=Kerollmops a=dureuill
# Pull Request
## What does this PR do?
- Make `RuntimeTogglableFeatures` `Eq`
- Add various tests for the `/experimental-features` route
- Integration tests for the route itself
- Integration tests for the effect of enabling `scoreDetails` and `vectorStore` through this route.
- Dump integration tests
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3889: Display the total number of tasks matching a filter/query r=dureuill a=Kerollmops
This PR returns a new field on the `/tasks` routes. The `total` field exposes the total number of tasks that matches the given filter/query. It is useful to display information on a user interface and can help understand when progress is made in processing tasks, i.e., the total number of tasks on `/tasks?statuses=succeeded` will increase over time.
Fixes#3888.
- [ ] Update the specs fo the `/tasks` route.
## How have I implemented it?
I found it much easier to run two times the task filtering system. Once with the original `from` and `limit` parameters and a second time without. The second call will return the total number of tasks that match the query, not only the number of tasks on the current page.
So far, in terms of performance, there doesn't seem to be any issue. I tried different filters with something like 250k tasks. Note that there is a limit of 1M tasks in the queue.
Co-authored-by: Clément Renault <clement@meilisearch.com>
3891: Fix the way we compute the 99th percentile r=dureuill a=Kerollmops
This PR fixes how we compute the 99th percentile by avoiding using float and doing the multiplication and divisions in the correct order avoiding going out of the buffer of timings. You can see the issue on [this rust playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021).
When there are a very small number of successful requests, the number is so tiny that the 99th percentile calculus sometimes gives an index out of the buffer. In this example, the `1`/`1.0` represent the number of timings you collected (one). As you can see, the float computation gives us the index `1.0`, with is out of a vector of only one value. This makes the engine generate a `null` value.
```rust
1 * 99 / 100 = 0 // with integers
0.99_f64 * (1.0 - 1.0) + 1.0 = 1.0 // with floats
```
Co-authored-by: Clément Renault <clement@meilisearch.com>
3877: update the total_received properties of multiple events r=dureuill a=dureuill
# Pull Request
## Related issue
Fixes#3814
## What does this PR do?
-fix name of `total_received` for several events
Co-authored-by: Tamo <tamo@meilisearch.com>
3851: Expose lastUpdate and isIndexing in /stats endpoint r=dureuill a=gentcys
# Pull Request
## Related issue
Fixes#3843
## What does this PR do?
- expose lastUpdate in `/stats` endpoint
- expose isIndex in `stats` endpoint
- add a method `is_task_processing` in index-scheduler/src/lib.rs.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Cong Chen <cong.chen@ocrlabs.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3867: Add a new link to the cloud pricing page r=curquiza a=Kerollmops
This PR promotes the Cloud by adding a link to the Pricing page to the startup message!
<img width="1002" alt="Capture d’écran 2023-06-29 à 17 40 22" src="https://github.com/meilisearch/meilisearch/assets/3610253/b0528c24-fcc2-43ff-a6a1-3ed91716663b">
Co-authored-by: Clément Renault <clement@meilisearch.com>
3866: Update charabia v0.8.0 r=dureuill a=ManyTheFish
# Pull Request
Update Charabia:
- enhance Japanese segmentation
- enhance Latin Tokenization
- words containing `_` are now properly segmented into several words
- brackets `{([])}` are no more considered as context separators so word separated by brackets are now considered near together for the proximity ranking rule
- fixes#3815
- fixes#3778
- fixes [product#151](https://github.com/meilisearch/product/discussions/151)
> Important note: now the float numbers are segmented around the `.` so `3.22` is segmented as [`3`, `.`, `22`] but the middle dot isn't considered as a hard separator, which means that if we search `3.22` we find documents containing `3.22`
Co-authored-by: ManyTheFish <many@meilisearch.com>
3864: Remove `/experimental-features` verbs that weren't in the PRD r=dureuill a=dureuill
Removes:
- POST `/experimental-features`
- DELETE `/experimental-features`
keeping only:
- PATCH `/experimental-features`
- GET `/experimental-features`
The two routes that are described in the PRD.
Following `@guimachiavelli's` [question](https://github.com/meilisearch/documentation/issues/2482#issuecomment-1611845372) about the POST route.
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3861: Add "meilisearch" prefix to last metrics that were missing it r=Kerollmops a=dureuill
# Pull Request
## Related issue
Related to #3790
## What does this PR do?
- change implementation to follow the spec on metrics name
- regenerate grafana dashboard from the code
## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3834: Define searchable fields at runtime r=Kerollmops a=ManyTheFish
## Summary
This feature allows the end-user to search in one or multiple attributes using the search parameter `attributesToSearchOn`:
```json
{
"q": "Captain Marvel",
"attributesToSearchOn": ["title"]
}
```
This feature act like a filter, forcing Meilisearch to only return the documents containing the requested words in the attributes-to-search-on. Note that, with the matching strategy `last`, Meilisearch will only ensure that the first word is in the attributes-to-search-on, but, the retrieved documents will be ordered taking into account the word contained in the attributes-to-search-on.
## Trying the prototype
A dedicated docker image has been released for this feature:
#### last prototype version:
```bash
docker pull getmeili/meilisearch:prototype-define-searchable-fields-at-search-time-1
```
#### others prototype versions:
```bash
docker pull getmeili/meilisearch:prototype-define-searchable-fields-at-search-time-0
```
## Technical Detail
The attributes-to-search-on list is given to the search context, then, the search context uses the `fid_word_docids`database using only the allowed field ids instead of the global `word_docids` database. This is the same for the prefix databases.
The database cache is updated with the merged values, meaning that the union of the field-id-database values is only made if the requested key is missing from the cache.
### Relevancy limits
Almost all ranking rules behave as expected when ordering the documents.
Only `proximity` could miss-order documents if all the searched words are in the restricted attribute but a better proximity is found in an ignored attribute in a document that should be ranked lower. I put below a failing test showing it:
```rust
#[actix_rt::test]
async fn proximity_ranking_rule_order() {
let server = Server::new().await;
let index = index_with_documents(
&server,
&json!([
{
"title": "Captain super mega cool. A Marvel story",
// Perfect distance between words in an ignored attribute
"desc": "Captain Marvel",
"id": "1",
},
{
"title": "Captain America from Marvel",
"desc": "a Shazam ersatz",
"id": "2",
}]),
)
.await;
// Document 2 should appear before document 1.
index
.search(json!({"q": "Captain Marvel", "attributesToSearchOn": ["title"], "attributesToRetrieve": ["id"]}), |response, code| {
assert_eq!(code, 200, "{}", response);
assert_eq!(
response["hits"],
json!([
{"id": "2"},
{"id": "1"},
])
);
})
.await;
}
```
Fixing this would force us to create a `fid_word_pair_proximity_docids` and a `fid_word_prefix_pair_proximity_docids` databases which may multiply the keys of `word_pair_proximity_docids` and `word_prefix_pair_proximity_docids` by the number of attributes in the searchable_attributes list. If we think we should fix this test, I'll suggest doing it in another PR.
## Related
Fixes#3772
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
3745: tests: add unit test for `PayloadTooLarge` error r=curquiza a=cymruu
# Pull Request
Add a unit test for the `Payload`, which verifies that a request with a payload that is too large is rejected with the appropriate message.
This was requested in this PR https://github.com/meilisearch/meilisearch/pull/3739
## Related issue
https://github.com/meilisearch/meilisearch/pull/3739
## What does this PR do?
- Adds requested test
## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Filip Bachul <filipbachul@gmail.com>
3738: Add analytics on the get documents resource r=dureuill a=irevoire
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3737
Related spec https://github.com/meilisearch/specifications/pull/234
## What does this PR do?
Add the analytics for the following routes:
- `GET` - `/indexes/:uid/documents`
- `GET` - `/indexes/:uid/documents/:doc_id`
- `POST` - `/indexes/:uid/documents/fetch`
These analytics are aggregated between two events:
- `Documents Fetched GET`
- `Documents Fetched POST`
That shares the same payload:
Property name | Description | Example |
|---------------|-------------|---------|
| `requests.total_received` | Total number of request received in this batch | 325 |
| `per_document_id` | `false` | false |
| `per_filter` | `true` if `POST /indexes/:indexUid/documents/fetch` endpoint was used with a filter in this batch, otherwise `false` | false |
| `pagination.max_limit` | Highest value given for the `limit` parameter in this batch | 60 |
| `pagination.max_offset` | Highest value given for the `offset` parameter in this batch | 1000 |
Co-authored-by: Tamo <tamo@meilisearch.com>
3759: Invalid error code when parsing filters r=dureuill a=irevoire
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3753
## What does this PR do?
Fix the error code in case the error comes from the evaluate of the filter for the get, fetch and delete documents routes.
Co-authored-by: Tamo <tamo@meilisearch.com>
3755: Re-add final dot r=curquiza a=ManyTheFish
I removed the final dot of the error message in my last PR, this one re-adds it.
related to https://github.com/meilisearch/meilisearch/pull/3749
> Oups 😬
Co-authored-by: ManyTheFish <many@meilisearch.com>
3749: Fix back: sort error message r=ManyTheFish a=ManyTheFish
This PR reintroduces the error message modified in https://github.com/meilisearch/milli/pull/375.
However, this added double-quotes around `sort` in the message. I don't think another message contains double-quotes, so I have added a separate commit replacing the double-quotes with back-ticks, which seems more consistent with the other error messages, this last change can be reverted easily.
## Detailed changes
#### v1.2-rc0
```
The sort ranking rule must be specified in the ranking rules settings to use the sort parameter at search time.
```
#### [Reintroduce fix (previous and expected behavior)](23d1c86825)
```
You must specify where "sort" is listed in the rankingRules setting to use the sort parameter at search time
```
#### [Replace double-quotes with back-ticks (my suggestion)](4d691d071a)
```
You must specify where `sort` is listed in the rankingRules setting to use the sort parameter at search time
```
## Related
Fixes#3722
## Reviewers
- technical review: `@irevoire`
- to validate the replacement: `@macraig`
Co-authored-by: ManyTheFish <many@meilisearch.com>
3687: Allow to disable specialized tokenizations (again) r=Kerollmops a=jirutka
In PR #2773, I added the `chinese`, `hebrew`, `japanese` and `thai` feature flags to allow melisearch to be built without huge specialed tokenizations that took up 90% of the melisearch binary size. Unfortunately, due to some recent changes, this doesn't work anymore. The problem lies in excessive use of the `default` feature flag, which infects the dependency graph.
Instead of adding `default-features = false` here and there, it's easier and more future-proof to not declare `default` in `milli` and `meilisearch-types`. I've renamed it to `all-tokenizers`, which also makes it a bit clearer what it's about.
Co-authored-by: Jakub Jirutka <jakub@jirutka.cz>
In PR #2773, I added the `chinese`, `hebrew`, `japanese` and `thai`
feature flags to allow melisearch to be built without huge specialed
tokenizations that took up 90% of the melisearch binary size.
Unfortunately, due to some recent changes, this doesn't work anymore.
The problem lies in excessive use of the `default` feature flag, which
infects the dependency graph.
Instead of adding `default-features = false` here and there, it's easier
and more future-proof to not declare `default` in `milli` and
`meilisearch-types`. I've renamed it to `all-tokenizers`, which also
makes it a bit clearer what it's about.
3550: Delete documents by filter r=irevoire a=dureuill
# Prototype `prototype-delete-by-filter-0`
Usage:
A new route is available under `POST /indexes/{index_uid}/documents/delete` that allows you to delete your documents by filter.
The expected payload looks like that:
```json
{
"filter": "doggo = bernese",
}
```
It'll then enqueue a task in your task queue that'll delete all the documents matching this filter once it's processed.
Here is an example of the associated details;
```json
"details": {
"deletedDocuments": 53,
"originalFilter": "\"doggo = bernese\""
}
```
----------
# Pull Request
## Related issue
Related to https://github.com/meilisearch/meilisearch/issues/3477
## What does this PR do?
### User standpoint
- Modifies the `/indexes/{:indexUid}/documents/delete-batch` route to accept either the existing array of documents ids, or a JSON object with a `filter` field representing a filter to apply. If that latter variant is used, any document matching the filter will be deleted.
### Implementation standpoint
- (processing time version) Adds a new BatchKind that is not autobatchable and that performs the delete by filter
- Reuse the `documentDeletion` task with a new `originalFilter` detail that replaces the `providedIds` detail.
## Example
<details>
<summary>Sample request, response and task result</summary>
Request:
```
curl \
-X POST 'http://localhost:7700/indexes/index-10/documents/delete-batch' \
-H 'Content-Type: application/json' \
--data-binary '{ "filter" : "mass = 600"}'
```
Response:
```
{
"taskUid": 3902,
"indexUid": "index-10",
"status": "enqueued",
"type": "documentDeletion",
"enqueuedAt": "2023-02-28T20:50:31.667502Z"
}
```
Task log:
```json
{
"uid": 3906,
"indexUid": "index-12",
"status": "succeeded",
"type": "documentDeletion",
"canceledBy": null,
"details": {
"deletedDocuments": 3,
"originalFilter": "\"mass = 600\""
},
"error": null,
"duration": "PT0.001819S",
"enqueuedAt": "2023-03-07T08:57:20.11387Z",
"startedAt": "2023-03-07T08:57:20.115895Z",
"finishedAt": "2023-03-07T08:57:20.117714Z"
}
```
</details>
## Draft status
- [ ] Error handling
- [ ] Analytics
- [ ] Do we want to reuse the `delete-batch` route in this way, or create a new route instead?
- [ ] Should the filter be applied at request time or when the deletion task is processed?
- The first commit in this PR applies the filter at request time, meaning that even if a document is modified in a way that no longer matches the filter in a later update, it will be deleted as long as the deletion task is processed after that update.
- The other commits in this PR apply the filter only when the asynchronous deletion task is processed, meaning that documents that match the filter at processing time are deleted even if they didn't match the filter at request time.
- [ ] If keeping the filter at request time, find a more elegant way to recover the user document ids from the internal document ids. The current way implemented in the first commit of this PR involves getting all the documents matching the filter, looking for the value of their primary key, and turning it into a string by copy-pasting routines found in milli...
- [ ] Security consideration, if any
- [ ] Fix the tests (but waiting until product questions are resolved)
- [ ] Add delete by filter specific tests
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>