MeiliSearch

mirror of https://github.com/meilisearch/MeiliSearch synced 2025-06-22 22:48:28 +02:00

Author	SHA1	Message	Date
Louis Dureuil	ee54d3171e	Check experimental feature at query time	2023-12-21 15:26:12 +01:00
meili-bors[bot]	a0e713c4e7	Merge #4277
4277: Update mini-dashboard to v0.2.12 r=curquiza a=mdubus
# Pull Request
## Related issue
Fixes #4276
## What does this PR do?
Upgrade mini-dashboard to version 0.2.12 ([see changes](https://github.com/meilisearch/mini-dashboard/releases/tag/v0.2.12))
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Morgane Dubus <30866152+mdubus@users.noreply.github.com>	2023-12-21 11:03:46 +00:00
meili-bors[bot]	d4cb0a885b	Merge #4275
4275: Flatten settings r=dureuill a=dureuill
# Pull Request
## Related issue
Initial internal feedback seems to indicate that the current shape of the `embedders` setting is undesirable: it has too much depth.
This PR changes this by flattening the structure of the embedders to the following:
```json5
// NEW structure
"embedders": {
  // still starts with the embedder name
  "default": {
    "source": "huggingFace", // now a string
    // properties of the source are all at the same level as the source
    "model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    "revision": "a9c555277f9bcf24f28fa5e092e665fc6f7c49cd",
    "documentTemplate": "A product titled '{{doc.title}}'" // now a string
  }
}
```
By comparison, the old structure was:
```json5
// PREVIOUS version, no longer working with this PR
"embedders": {
  // starts with the embedder name
  "default": {
    "source": {
      "huggingFace": {
        "model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
        "revision": "a9c555277f9bcf24f28fa5e092e665fc6f7c49cd"
      }
    },
    "documentTemplate": {
      "template": "A product titled '{{doc.title}}'"
    }
  }
}
```
The fields accepted in the new version of the `embedders` setting depend on the value of the `source` field:
```json5
// huggingFace
"embedders": {
  "default": {
    "source": "huggingFace",
    "model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    "revision": "a9c555277f9bcf24f28fa5e092e665fc6f7c49cd",
    "documentTemplate": "A product titled '{{doc.title}}'"
  }
}
// openAi
"embedders": {
  "default": {
    "source": "openAi",
    "model": "text-embedding-ada-002",
    "apiKey": "open_ai_api_key",
    "documentTemplate": "A product titled '{{doc.title}}'"
  }
}
// userProvided
"embedders": {
  "default": {
    "source": "userProvided",
    "dimensions": 42, // mandatory
  }
}
```
## What does this PR do?
- Flatten the settings structure
- Validate the prompt earlier to return a synchronous error on setting change rather than in the failing task
- Make it an error to pass a field for the wrong source (see above for allowed fields for each source)
- Not changed: It is still an error not to pass `dimensions` to the `userProvided` embedder
- If `source` was specified in the settings, validate the setting early to return a synchronous error in case of a missing mandatory field for the userProvided source (dimensions) or a forbidden field for the specified source.
- If `source` was not specified in the settings, still validate the setting, but only at indexing time, by using the source stored in the DB.
- Reset all values if the source changes, even if the user did not reset them explicitly.
## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Change the public facing guide for using the API
- [ ] Change examples of use in the changelog
Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2023-12-21 09:58:01 +00:00
Morgane Dubus	f52dee2b3b	Update Cargo.toml
Update mini-dashboard with v0.2.12	2023-12-21 09:53:13 +01:00
Louis Dureuil	0bf879fb88	Fix warning on rust stable	2023-12-20 17:48:09 +01:00
Louis Dureuil	6ff81de401	Fix tests	2023-12-20 17:16:46 +01:00
Louis Dureuil	2e4c9651df	Validate settings in route	2023-12-20 17:16:46 +01:00
Louis Dureuil	ec9649c922	Add function to validate settings in Meilisearch, to be used in the routes	2023-12-20 17:16:46 +01:00
Louis Dureuil	9123370e90	Validate fused settings in settings task after fusing with existing setting	2023-12-20 17:16:46 +01:00
Louis Dureuil	14b396d302	Add new errors	2023-12-20 17:16:45 +01:00
Louis Dureuil	393216bf30	Flatten embedders settings	2023-12-20 17:16:43 +01:00
Louis Dureuil	e249e4db7b	Change Setting::apply function signature	2023-12-20 17:15:24 +01:00
meili-bors[bot]	de2ca7006e	Merge #4272
4272: Don't pass default revision when the model is explicitly set in config r=Kerollmops a=dureuill
# Pull Request
## Related issue
Fixes #4271
## What does this PR do?
- When the `model` is explicitly set in the `embedders` setting, we reset the `revision` to `None`, so that if the user doesn't specify a revision, the head of the model repository is chosen.
- Not changed: If the user specifies a revision, it applies, as before.
- Not changed: If the user doesn't specify a model, the default model with the default revision applies, as before.
## Manual testing on a fresh DB
1. Enable experimental feature:
```sh
curl \
  -X PATCH 'http://localhost:7700/experimental-features/' \
  -H 'Content-Type: application/json' -H 'Authorization: Bearer foo' \
  --data-binary '{ "vectorStore": true }'
```
2. Send settings with a specified model but no specified revision:
```sh
curl \
  -X PATCH 'http://localhost:7700/indexes/products/settings' \
  -H 'Content-Type: application/json' --data-binary \
  '{ "embedders": { "default": { "source": { "huggingFace": { "model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2" } }, "documentTemplate": { "template": "A product titled '{{doc.title}}'"} } } }'
```
3. Check that the task was successful:
```sh
curl 'http://localhost:7700/tasks/0'
{"uid":0,"indexUid":"products","status":"succeeded","type":"settingsUpdate","canceledBy":null,"details":{"embedders":{"default":{"source":{"huggingFace":{"model":"sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"}},"documentTemplate":{"template":"A product titled {{doc.title}}"}}}},"error":null,"duration":"PT0.001892S","enqueuedAt":"2023-12-20T09:17:01.73789Z","startedAt":"2023-12-20T09:17:01.73854Z","finishedAt":"2023-12-20T09:17:01.740432Z"}
```
4. Send documents to index:
```sh
curl 'http://localhost:7700/indexes/products/documents' -H 'Content-Type: application/json' --data-binary '{"id": 0, "title": "Best product"}'
```
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
v1.6.0-rc.2	2023-12-20 14:27:51 +00:00
Louis Dureuil	333ce12eb2	Fixed issue where the default revision is always the one we picked for the default model	2023-12-20 10:17:49 +01:00
meili-bors[bot]	fb9db1eba6	Merge #4269
4269: Remove dependency that requires libstdc++ r=dureuill a=dureuill
Removes the dependency that caused the additional runtime dependency on libstdc++ by disabling the default features of the hf tokenizer.
## Discussion
- This removes a feature that uses a C++ dependency and is supposed to accelerate the tokenizer. As the tokenizer is likely to be a significant bottleneck when embedding texts with an HF model, this is an issue.
- We should at least rerun the movies vector indexing and check that it still works correctly and that its runtime is in the ballpark of what it used to be.
Co-authored-by: Louis Dureuil <louis.dureuil@xinra.net>
v1.6.0-rc.1	2023-12-19 12:26:48 +00:00
Louis Dureuil	b2193e612f	Revert "Add libstdc++ in Dockerfile" as it is no longer needed
This reverts commit 9df8cfc013452ecb5935d5501c96a4c465183a5d.	2023-12-18 22:17:29 +01:00
Louis Dureuil	942d49314c	Remove dependency that requires libstdc++	2023-12-18 22:17:18 +01:00
meili-bors[bot]	9a846e82bc	Merge #4268
4268: Add libstdc++ in Dockerfile r=curquiza a=sanders41
# Pull Request
## Related issue
Fixes #4267
## What does this PR do?
- Add libstdc++ in the Dockerfile
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Paul Sanders <psanders1@gmail.com>	2023-12-18 18:35:53 +00:00
Paul Sanders	9df8cfc013	Add libstdc++ in Dockerfile	2023-12-18 13:05:46 -05:00
meili-bors[bot]	248aaa6d45	Merge #4262 4262: Update version for the next release (v1.6.0) in Cargo.toml r=curquiza a=meili-bot ⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging. Co-authored-by: curquiza <curquiza@users.noreply.github.com> v1.6.0-rc.0	2023-12-18 14:00:19 +00:00
curquiza	50d6317ec0	Update version for the next release (v1.6.0) in Cargo.toml	2023-12-18 13:57:46 +00:00
meili-bors[bot]	b734bd9891	Merge #4261 4261: Set rust toolchain to 1.71.1 in dockerfile r=curquiza a=dureuill Fixes docker [CI](https://github.com/meilisearch/meilisearch/actions/workflows/publish-docker-images.yml) Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2023-12-18 12:32:26 +00:00
Louis Dureuil	9800d5a103	Set rust toolchain to 1.71.1 in dockerfile	2023-12-18 10:59:25 +01:00
meili-bors[bot]	7c4ed07617	Merge #4257
4257: Change proximity precision settings r=dureuill a=ManyTheFish
- [x] Add proximity_precision value into the analytics
- [x] Change the naming of `attributeScale` and `wordScale` into `byAttribute` and `byWord`
- [x] Remove proximityPrecision from the experimental feature
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>	2023-12-18 09:07:28 +00:00
ManyTheFish	3a99a555a2	Fix experimental features snapshots in tests	2023-12-18 10:05:51 +01:00
Many the fish	9e1b458010	Merge branch 'main' into change-proximity-precision-settings	2023-12-18 09:08:47 +01:00
meili-bors[bot]	2aede03bc2	Merge #4226
4226: Hybrid search r=dureuill a=dureuill
Allows performing hybrid search requests that combine the results of semantic and keyword search, and automatically generates embeddings.
## How to use
See [feature description](https://meilisearch.notion.site/v1-6-Hybrid-Search-Embedders-ea42c82f90cc4bc0be1eeb917c1118c8)
## Changes
- work is based on #4213
- milli::new search now takes an input universe directly, rather than computing it from a filter. This adds flexibility to require results on a subset of documents
- vector search is now a regular ranking rule (akin to sort and geosort) and reports its score as a ScoreDetail
- separate keyword search and vector search functions, vector search now respects (geo)sort ranking rules
- add automatic embedding
- add hybrid search
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>	2023-12-14 16:24:56 +00:00
ManyTheFish	e741bc1c62	Add proximity_precision value into the analytics	2023-12-14 16:48:06 +01:00
ManyTheFish	6425996e36	Change the naming of attributeScale and wordScale into byAttribute and byWord	2023-12-14 16:31:00 +01:00
Louis Dureuil	eb5cb91da2	Switch default from hf to openai	2023-12-14 16:19:46 +01:00
Louis Dureuil	87bba98bd8	Various changes
- fixed seed for arroy
- check vector dimensions as soon as it is provided to search
- don't embed whitespace	2023-12-14 16:08:42 +01:00
Louis Dureuil	217105b7da	hybrid search uses semantic ratio, error handling	2023-12-14 16:08:42 +01:00
ManyTheFish	1b7c164a55	Pass the semantic ratio to milli	2023-12-14 16:08:42 +01:00
ManyTheFish	f3f3944469	Fix error checking	2023-12-14 16:08:42 +01:00
ManyTheFish	93dcbf598d	Deserialize semantic ratio	2023-12-14 16:08:42 +01:00
ManyTheFish	ac68f33194	Add simple test	2023-12-14 16:08:42 +01:00
ManyTheFish	9991152bbe	Add TODOs	2023-12-14 16:08:42 +01:00
Louis Dureuil	a4536b1381	Small adjustments to respect the spec	2023-12-14 16:08:42 +01:00
Louis Dureuil	5b51cb04af	Remove some settings	2023-12-14 16:08:42 +01:00
Louis Dureuil	3c1a14f1cd	Add settings routes	2023-12-14 16:08:42 +01:00
Louis Dureuil	b8e4709dfa	Remove prompt strategy and fallback	2023-12-14 16:08:41 +01:00
Louis Dureuil	806e5b6899	Tests pass	2023-12-14 16:08:41 +01:00
Louis Dureuil	61bd2fb7a9	Update arroy	2023-12-14 16:08:41 +01:00
Louis Dureuil	e0cc775dc4	Various changes
- DistributionShift in Search object (to be set from model in embed?)
- Fix issue where embedder index wasn't computed at search time
- Accept as default embedder either the "default" one, or the only embedder when there is only one	2023-12-14 16:08:41 +01:00
Louis Dureuil	12940d79a9	WIP
- manual embedder
- multi embedders OK
- clippy + tests OK	2023-12-14 16:08:41 +01:00
Louis Dureuil	922a640188	WIP multi embedders fixed template bugs	2023-12-14 16:08:41 +01:00
Louis Dureuil	abbe131084	Cosmetic change	2023-12-14 16:08:41 +01:00
Louis Dureuil	d4715e0c4d	Fix same vector sort bug	2023-12-14 16:08:41 +01:00
Louis Dureuil	11e2a2c1aa	Fix geosort bug	2023-12-14 16:08:41 +01:00
Louis Dureuil	65e49b7092	Remove stuff, add distribution shift (WIP)	2023-12-14 16:08:38 +01:00

8843 Commits