4311: Limit the number of values returned by the facet search r=dureuill a=Kerollmops
This PR fixes a bug where the number of values per facet returned by the `indexes/{index}/facet-search` route was not tacking the `faceting.maxValuePerFacet` setting. It also adds a test.
Co-authored-by: Clément Renault <clement@meilisearch.com>
4308: Fix hang on `/indexes` and `/stats` routes r=Kerollmops a=dureuill
# Pull Request
## Related issue
Fixes#4218
## Context
- A previous fix added a field to the `IndexScheduler` to memorize the `currently_updating_index`, so that accessing it through the search would return the handle without trying to open it. This resolved a hang on the search, but #4218 reported further hangs on the `/indexes` and `/stats` routes
- These routes were shunting the `IndexScheduler` and using internal `IndexMapper` logic to access the indexes, again trying to reopen the updating index.
## What does this PR do?
- Moves the logic relative to the `currently_updating_index` from the `IndexScheduler` to the `IndexMapper`, so that any index request to the `IndexMapper` can benefit from it.
## Test
1. Follow reproducer from #4218
2. Before this PR, notice a hang on `/stats` and `/indexes`, but not on `/indexes/<updating_index>/search`
3. After this PR, notice no hang on either of `/stats`, `/indexes` or `/indexes/<updating_index>/search`
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4303: Display default value when proximityPrecision is not set r=dureuill a=ManyTheFish
# Pull Request
## Related
Issue: #4187
Spec change requests: https://github.com/meilisearch/specifications/pull/261#discussion_r1441725272
## What does this PR do?
- Display default value when proximityPrecision is not set instead of Null
Co-authored-by: ManyTheFish <many@meilisearch.com>
4296: Fix single element search r=irevoire a=dureuill
# Pull Request
Before this PR, indexing a single vector in a single document would result in the vector not being found by the vector search.
This PR adds a test case for this condition, and resolves it by bumping arroy to a version containing the fix.
# Test case
Output of the test before and after this PR:
```diff
diff --git a/meilisearch/tests/search/hybrid.rs b/meilisearch/tests/search/hybrid.rs
index 2cd4b83e7..79819cab2 100644
--- a/meilisearch/tests/search/hybrid.rs on release-v1.6.0
+++ b/meilisearch/tests/search/hybrid.rs on fix-single-element-search
`@@` -171,5 +171,5 `@@` async fn single_document() {
.await;
snapshot!(code, `@"200` OK");
- snapshot!(response["hits"][0], `@r###"{"title":"Shazam!","desc":"a` Captain Marvel ersatz","id":"1","_vectors":{"default":[1.0,3.0]},"_rankingScore":0.0}"###);
+ snapshot!(response["hits"][0], `@r###"{"title":"Shazam!","desc":"a` Captain Marvel ersatz","id":"1","_vectors":{"default":[1.0,3.0]},"_rankingScore":1.0,"_semanticScore":1.0}"###);
}
```
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4293: Update SDK test dependencies r=curquiza a=curquiza
Replace dependabot updates
The changes are really un-impactful for the engine team velocity because is about a CI
- that does not run during release deployment
- that does not run to merge a PR
It's only a weekly scheduled CI to check the breaking we introduced in the integrations.
I updated the dependencies based on what we do on the integration CIs
For example for dart, I looked at what we have in the [Dart CI](63fd758882/.github/workflows/tests.yml (L16-L54)) and I updated our CI in this repo accordingly. I did the same for each repository. This ensures we test the same things.
Co-authored-by: curquiza <clementine@meilisearch.com>
4295: fix compilation warnings on main r=curquiza a=irevoire
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4292
## What does this PR do?
- Removed unused imports
#4294 fixes the issue for the release v1.6
Co-authored-by: Tamo <tamo@meilisearch.com>
4294: fix compilation warnings for release v1.6 r=curquiza a=irevoire
# Pull Request
## Related issue
Fixes#4292
## What does this PR do?
- Removed unused imports
#4295 fixes the issue no main
Co-authored-by: Tamo <tamo@meilisearch.com>
4279: Check experimental feature on setting update query rather than in the task. r=ManyTheFish a=dureuill
Improve the UX by checking for the vector store feature and returning an error synchronously when sending a setting update, rather than in the indexing task.
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4238: Task queue webhook r=dureuill a=irevoire
# Prototype `prototype-task-queue-webhook-1`
The prototype is available through Docker by using the following command:
```bash
docker run -p 7700:7700 -v $(pwd)/meili_data:/meili_data getmeili/meilisearch:prototype-task-queue-webhook-1
```
# Pull Request
Implements the task queue webhook.
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4236
## What does this PR do?
- Provide a new cli and env var for the webhook, respectively called `--task-webhook-url` and `MEILI_TASK_WEBHOOK_URL`
- Also supports sending the requests with a custom `Authorization` header by specifying the optional `--task-webhook-authorization-header` CLI parameter or `MEILI_TASK_WEBHOOK_AUTHORIZATION_HEADER` env variable.
- Throw an error if the specified URL is invalid
- Every time a batch is processed, send all the finished tasks into the webhook with our public `TaskView` type as a JSON Line GZIPed body.
- Add one test.
## PR checklist
### Before becoming ready to review
- [x] Add a test
- [x] Compress the data we send
- [x] Chunk and stream the data we send
- [x] Remove the unwrap in the index-scheduler when sending the data fails
- [x] The analytics are missing
### Before merging
- [x] Release a prototype
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
4277: Update mini-dashboard to v0.2.12 r=curquiza a=mdubus
# Pull Request
## Related issue
Fixes#4276
## What does this PR do?
Upgrade mini-dashboard to version 0.2.12 ([see changes](https://github.com/meilisearch/mini-dashboard/releases/tag/v0.2.12))
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Morgane Dubus <30866152+mdubus@users.noreply.github.com>
4275: Flatten settings r=dureuill a=dureuill
# Pull Request
## Related issue
Initial internal feedback seems to indicate that the current shape of the `embedders` setting is undesirable: it has too much depth.
This PR changes this by flattening the structure of the embedders to the following:
```json5
// NEW structure
"embedders": {
// still starts with the embedder name
"default": {
"source": "huggingFace", // now a string
// properties of the source are all at the same level as the source
"model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
"revision": "a9c555277f9bcf24f28fa5e092e665fc6f7c49cd",
"documentTemplate": "A product titled '{{doc.title}}'" // now a string
}
}
```
By comparison, the old structure was:
```json5
// PREVIOUS version, no longer working with this PR
"embedders": {
// still starts with the embedder name
"default": {
"source": {
"huggingFace": {
"model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
"revision": "a9c555277f9bcf24f28fa5e092e665fc6f7c49cd"
},
"documentTemplate": {
"template": "A product titled '{{doc.title}}'" // now a string
}
}
}
```
The fields that are accepted in the new version of the `embedders` setting are depending on the value of the `source` field:
```json5
// huggingFace
"embedders": {
"default": {
"source": "huggingFace",
"model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
"revision": "a9c555277f9bcf24f28fa5e092e665fc6f7c49cd",
"documentTemplate": "A product titled '{{doc.title}}'"
}
}
// openAi
"embedders": {
"default": {
"source": "openAi",
"model": "text-embedding-ada-002",
"apiKey": "open_ai_api_key",
"documentTemplate": "A product titled '{{doc.title}}'"
}
}
// userProvided
"embedders": {
"default": {
"source": "userProvided",
"dimensions": 42, // mandatory
}
}
```
## What does this PR do?
- Flatten the settings structure
- Validate the prompt earlier to return a synchronous error on setting change rather than in the failing task
- Make it an error to pass a field for the wrong source (see above for allowed fields for each source)
- Not changed: It is still an error not to pass `dimensions` to the `userProvided` embedder
- If `source` was specified in the settings, validate the setting early to return a synchronous error in case of a missing mandatory field for the userProvided source (dimensions) or a forbidden field for the specified source.
- If `source` was not specified in the settings, still validate the setting, but only at indexing time, by using the source stored in the DB.
- Resets all values if the source changes, even if the user did not reset them explicitly.
## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Change the public facing guide for using the API
- [ ] Change examples of use in the changelog
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4272: Don't pass default revision when the model is explicitly set in config r=Kerollmops a=dureuill
# Pull Request
## Related issue
Fixes#4271
## What does this PR do?
- When the `model` is explicitly set in the `embedders` setting, we reset the `revision` to `None`, such that if the user doesn't specify a revision, the head of the model repository is chosen.
- Not changed: If the user specifies a revision, it applies, like previously.
- Not changed: If the user doesn't specify a model, the default model with the default revision applies, like previously.
## Manual testing on a fresh DB
1. Enable experimental feature:
```sh
curl \
-X PATCH 'http://localhost:7700/experimental-features/' \
-H 'Content-Type: application/json' -H 'Authorization: Bearer foo' \
--data-binary '{ "vectorStore": true
}'
```
2. Send settings with a specified model but no specified revision:
```sh
curl \
-X PATCH 'http://localhost:7700/indexes/products/settings' \
-H 'Content-Type: application/json' --data-binary \
'{ "embedders": { "default": { "source": { "huggingFace": { "model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2" } }, "documentTemplate": { "template": "A product titled '{{doc.title}}'"} } } }'
```
3. Check that the task was successful:
```sh
curl 'http://localhost:7700/tasks/0'
{"uid":0,"indexUid":"products","status":"succeeded","type":"settingsUpdate","canceledBy":null,"details":{"embedders":{"default":{"source":{"huggingFace":{"model":"sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"}},"documentTemplate":{"template":"A product titled {{doc.title}}"}}}},"error":null,"duration":"PT0.001892S","enqueuedAt":"2023-12-20T09:17:01.73789Z","startedAt":"2023-12-20T09:17:01.73854Z","finishedAt":"2023-12-20T09:17:01.740432Z"}
```
4. Send documents to index:
```sh
curl 'https://localhost:7700/indexes/products/documents' -H 'Content-Type: application/json' --data-binary '{"id": 0, "title": "Best product"}'
```
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4269: Remove dependency that requires libstdc++ r=dureuill a=dureuill
Removes the dependency that caused the additional runtime dependency on libstdc++ by disabling the default features of the hf tokenizer.
## Discussion
- This removes a feature that is using a C++ dependency and is supposed to accelerate the tokenizer. As the tokenizer is likely to be a significant bottleneck for embedding texts using a HF model, this is an issue.
- We should at least rerun the movies vector indexing and check that it still works correctly and that it has a runtime in the ballpark of what it used to be.
Co-authored-by: Louis Dureuil <louis.dureuil@xinra.net>