4818: Custom headers and QoL improvements r=ManyTheFish a=dureuill
# Pull Request
## Related issue
Fixes#4734
Depends on #4815
## What does this PR do?
- Adds custom headers for rest embedders ([public usage](https://meilisearch.notion.site/v1-10-AI-search-changes-737c9d7d010d4dd685582bf5dab579e2#41354652885242c899def07e36a66d49))
- Quality of life: allow specifying `dimensions` for `ollama` embedders ([public usage](https://meilisearch.notion.site/v1-10-AI-search-changes-737c9d7d010d4dd685582bf5dab579e2#37218531431343dab3d2d3a9a1937e9d)). As for `rest` embedders, specifying `dimensions` disables the "test" embedding when the embedder is spawned.
- Improve error message again when indexing documents that don't have a vector for a user-provided vector
1. Remove the contents of the document
2. Display the docid of the first document that triggered the error
3. Indicate how many documents in that chunk suffered from the same issue for that embedder
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4815: Rest embedder api mk2 r=ManyTheFish a=dureuill
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4756
- [x] [REST API parameter names and behavior are unclear](https://github.com/meilisearch/documentation/pull/2824#issuecomment-2124073720)
- unclear names are removed. There remain only two parameters: `request`, a template of what Meilisearch's request to the embedding server should be, and `response`, a template of what the embedding server's response to Meilisearch should look like
- [x] [Bad error message or bad default value when we don't specify the `query` parameter](85d8455c11/meilisearch/tests/vector/rest.rs (L105-L140))
- The replacement for `query`, which is `request`, is now a mandatory parameter. Omitting it will result in the following error message : "`.embedders.rest`: Missing field `request` (note: this field is mandatory for source rest)", which is clear
- [x] [Bad error message when both `pathToEmbeddings` and `embeddingObject` are missing](2141cb3b69/meilisearch/tests/vector/rest.rs (L142-L178))
- These parameters no longer exist. Now, the point of extraction is given directly by the location of an `{{embedding}}` placeholder in the `response` parameter.
- [x] [Unexpected error when we don't specify both `pathToEmbeddings` and `embeddingObject` (only once should be required)](2141cb3b69/meilisearch/tests/vector/rest.rs (L180-L260))
- These parameters no longer exist. Now, the point of extraction is given directly by the location of an `{{embedding}}` placeholder in the `response` parameter.
- [x] [Should not panic when the dimensions specified do not work with the model](2141cb3b69/meilisearch/tests/vector/rest.rs (L262-L299))
- This no longer panics, instead returns "While embedding documents for embedder `rest`: runtime error: was expecting embeddings of dimension `2`, got embeddings of dimensions `3`"
- [x] [Be more flexible on the type of data that is accepted](https://github.com/meilisearch/meilisearch/issues/4757#issuecomment-2201948531)
- [x] Always accept arrays of embeddings even if `inputType` is set to `text`
- This is controlled by the repeat placeholder `"{..}"`, an array of embeddings can be configured even if the input is not in an array.
- [x] Accept arrays of result at the root level and texts/array of text at the root level.
- doable with `request: "{{text}}"` and `response: "{{embedding}}"` or `response: ["{{embedding}}"]` (see test `vector::rest::server_raw`)
## What does this PR do?
- [See public usage](https://meilisearch.notion.site/v1-10-AI-search-changes-737c9d7d010d4dd685582bf5dab579e2#8de842673ffa4a139210094a89c1ec3e)
- Add new `milli::vector::json_template` module to parse JSON templates with an injection placeholder and a repeat placeholder
- Change rest embedder to use two JSON templates
- Change ollama and openai embedders to use the new rest embedder
- Update settings
- Update and add tests
## Breaking change
> [!CAUTION]
> This PR is a breaking change to the REST embedder.
> Importing a dump containing a REST embedder configuration will fail in v1.10 with an error: "Error: unknown field `query`, expected one of `source`, `model`, `revision`, `apiKey`, `dimensions`, `documentTemplate`, `url`, `request`, `response`, `distribution` at line 1 column 752".
Upgrade procedure:
1. Remove any embedder with source "rest"
2. Create a dump
3. Import that dump in a v1.10
4. Re-add any removed embedder, using the new settings.
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: Louis Dureuil <louis.dureuil@xinra.net>
Co-authored-by: Tamo <tamo@meilisearch.com>
4822: HuggingFace: Clearer error message when a model is not supported r=Kerollmops a=dureuill
# Pull Request
## Related issue
Context: <https://github.com/meilisearch/meilisearch/discussions/4820>
## What does this PR do?
- Improve error message when a model configuration cannot be loaded and its "architectures" field does not contain "BertModel"
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4817: Update version for the next release (v1.10.0) in Cargo.toml r=curquiza a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
4812: Allow `MEILI_NO_VERGEN` env var to skip vergen r=irevoire a=dureuill
- vergen checks the state of the `.git` directory to embed commit information into the `meilisearch` binary and the `cargo xtask bench` invocations.
- This check unfortunately results in too many recompilation of the `meilisearch` binary.
- This PR allows skipping vergen when the `MEILI_NO_VERGEN` variable is present in the environment
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4805: Log the time to index a batch of task r=Kerollmops a=irevoire
This was proposed by `@qdequele` in a private conversation and I think it’s a nice addition.
Co-authored-by: Tamo <tamo@meilisearch.com>