Commit Graph

8832 Commits

Author SHA1 Message Date
Louis Dureuil
e249e4db7b
Change Setting::apply function signature 2023-12-20 17:15:24 +01:00
meili-bors[bot]
de2ca7006e
Merge #4272
4272: Don't pass default revision when the model is explicitly set in config r=Kerollmops a=dureuill

# Pull Request

## Related issue
Fixes #4271 

## What does this PR do?

- When the `model` is explicitly set in the `embedders` setting, we reset the `revision` to `None`, so that if the user doesn't specify a revision, the head of the model repository is chosen.
- Not changed: If the user specifies a revision, it applies, as before.
- Not changed: If the user doesn't specify a model, the default model with the default revision applies, as before.
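
The three cases above can be sketched as a small lookup. This is only an illustration of the described behavior, not the code from this PR: the function name `resolve_model_and_revision` and the two placeholder constants are assumptions made for the example, and the real default values live in the Meilisearch source.

```python
from typing import Optional, Tuple

# Hypothetical placeholders, NOT the real Meilisearch defaults.
DEFAULT_MODEL = "default-org/default-model"
DEFAULT_REVISION = "default-revision"


def resolve_model_and_revision(
    model: Optional[str] = None,
    revision: Optional[str] = None,
) -> Tuple[str, Optional[str]]:
    """Return the (model, revision) pair to load.

    A revision of None stands for "head of the model repository".
    """
    if model is None:
        # No explicit model: use the default model. An explicit revision
        # still applies; otherwise fall back to the default revision.
        chosen = revision if revision is not None else DEFAULT_REVISION
        return DEFAULT_MODEL, chosen
    # Explicit model: the default revision is never inherited. Either the
    # user's revision applies, or None selects the repository head.
    return model, revision
```

With this sketch, `resolve_model_and_revision(model="some/model")` yields a revision of `None`, which is the case fixed by this PR.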

## Manual testing on a fresh DB

1. Enable experimental feature:
```sh
curl \
  -X PATCH 'http://localhost:7700/experimental-features/' \
  -H 'Content-Type: application/json' -H 'Authorization: Bearer foo' \
  --data-binary '{ "vectorStore": true }'
```
2. Send settings with a specified model but no specified revision:
```sh
curl \
-X PATCH 'http://localhost:7700/indexes/products/settings' \
-H 'Content-Type: application/json' --data-binary \
'{ "embedders": { "default": { "source": { "huggingFace": { "model": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2" } }, "documentTemplate": { "template": "A product titled {{doc.title}}" } } } }'
```
3. Check that the task was successful:
```sh
curl 'http://localhost:7700/tasks/0'

{"uid":0,"indexUid":"products","status":"succeeded","type":"settingsUpdate","canceledBy":null,"details":{"embedders":{"default":{"source":{"huggingFace":{"model":"sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"}},"documentTemplate":{"template":"A product titled {{doc.title}}"}}}},"error":null,"duration":"PT0.001892S","enqueuedAt":"2023-12-20T09:17:01.73789Z","startedAt":"2023-12-20T09:17:01.73854Z","finishedAt":"2023-12-20T09:17:01.740432Z"}
```
4. Send documents to index:
```sh
curl 'http://localhost:7700/indexes/products/documents' -H 'Content-Type: application/json' --data-binary '{"id": 0, "title": "Best product"}'
```

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-12-20 14:27:51 +00:00
Louis Dureuil
333ce12eb2
Fixed issue where the default revision is always the one we picked for the default model 2023-12-20 10:17:49 +01:00
meili-bors[bot]
fb9db1eba6
Merge #4269
4269: Remove dependency that requires libstdc++ r=dureuill a=dureuill

Removes the dependency that caused the additional runtime dependency on libstdc++ by disabling the default features of the hf tokenizer.

## Discussion

- This removes a feature that is using a C++ dependency and is supposed to accelerate the tokenizer. As the tokenizer is likely to be a significant bottleneck for embedding texts using a HF model, this is an issue.
- We should at least rerun the movies vector indexing and check that it still works correctly and that it has a runtime in the ballpark of what it used to be.

Co-authored-by: Louis Dureuil <louis.dureuil@xinra.net>
2023-12-19 12:26:48 +00:00
Louis Dureuil
b2193e612f
Revert "Add libstdc++ in Dockerfile" as it is no longer needed
This reverts commit 9df8cfc013.
2023-12-18 22:17:29 +01:00
Louis Dureuil
942d49314c
Remove dependency that requires libstdc++ 2023-12-18 22:17:18 +01:00
meili-bors[bot]
9a846e82bc
Merge #4268
4268: Add libstdc++ in Dockerfile r=curquiza a=sanders41

# Pull Request

## Related issue
Fixes #4267

## What does this PR do?
- Add libstdc++ in the Dockerfile

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Paul Sanders <psanders1@gmail.com>
2023-12-18 18:35:53 +00:00
Paul Sanders
9df8cfc013 Add libstdc++ in Dockerfile 2023-12-18 13:05:46 -05:00
meili-bors[bot]
248aaa6d45
Merge #4262
4262: Update version for the next release (v1.6.0) in Cargo.toml r=curquiza a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.

Co-authored-by: curquiza <curquiza@users.noreply.github.com>
2023-12-18 14:00:19 +00:00
curquiza
50d6317ec0 Update version for the next release (v1.6.0) in Cargo.toml 2023-12-18 13:57:46 +00:00
meili-bors[bot]
b734bd9891
Merge #4261
4261: Set rust toolchain to 1.71.1 in dockerfile r=curquiza a=dureuill

Fixes docker [CI](https://github.com/meilisearch/meilisearch/actions/workflows/publish-docker-images.yml)

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-12-18 12:32:26 +00:00
Louis Dureuil
9800d5a103
Set rust toolchain to 1.71.1 in dockerfile 2023-12-18 10:59:25 +01:00
meili-bors[bot]
7c4ed07617
Merge #4257
4257: Change proximity precision settings r=dureuill a=ManyTheFish

- [x] Add proximity_precision value into the analytics
- [x] Change the naming of `attributeScale` and `wordScale` into `byAttribute` and `byWord`
- [x] Remove proximityPrecision from the experimental feature

Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
2023-12-18 09:07:28 +00:00
ManyTheFish
3a99a555a2 Fix experimental features snapshots in tests 2023-12-18 10:05:51 +01:00
Many the fish
9e1b458010
Merge branch 'main' into change-proximity-precision-settings 2023-12-18 09:08:47 +01:00
meili-bors[bot]
2aede03bc2
Merge #4226
4226: Hybrid search r=dureuill a=dureuill

Adds hybrid search requests, which combine the results of semantic and keyword search, and automatic generation of embeddings.

## How to use

See [feature description](https://meilisearch.notion.site/v1-6-Hybrid-Search-Embedders-ea42c82f90cc4bc0be1eeb917c1118c8)

## Changes

- work is based on #4213 
- milli::new search now takes an input universe directly, rather than computing it from a filter. This adds flexibility to require results on a subset of documents
- vector search is now a regular ranking rule (akin to sort and geosort) and reports its score as a ScoreDetail
- separate keyword search and vector search functions, vector search now respects (geo)sort ranking rules
- add automatic embedding
- add hybrid search
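
As a rough illustration of what combining keyword and semantic results can look like, the sketch below blends two per-document score maps with a single ratio. The weighted-sum formula, the function name `merge_hybrid`, and the score dictionaries are all assumptions made for this example; the PR itself may merge results differently.

```python
from typing import Dict, List, Tuple


def merge_hybrid(
    keyword_scores: Dict[int, float],
    semantic_scores: Dict[int, float],
    semantic_ratio: float,
) -> List[Tuple[int, float]]:
    """Blend two score maps (document id -> score in [0, 1]) into one ranking.

    semantic_ratio = 0.0 keeps only keyword scores, 1.0 only semantic scores.
    """
    if not 0.0 <= semantic_ratio <= 1.0:
        raise ValueError("semantic_ratio must be between 0.0 and 1.0")

    # A document missing from one side contributes a score of 0.0 there.
    doc_ids = set(keyword_scores) | set(semantic_scores)
    combined = {
        doc_id: (1.0 - semantic_ratio) * keyword_scores.get(doc_id, 0.0)
        + semantic_ratio * semantic_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    # Highest score first; ties are broken by document id for determinism.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))
```

A ratio of `0.5` weights both sides equally, so a document that only one search finds can still outrank a weak match found by both.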

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-12-14 16:24:56 +00:00
ManyTheFish
e741bc1c62 Add proximity_precision value into the analytics 2023-12-14 16:48:06 +01:00
ManyTheFish
6425996e36 Change the naming of attributeScale and wordScale into byAttribute and byWord 2023-12-14 16:31:00 +01:00
Louis Dureuil
eb5cb91da2
Switch default from hf to openai 2023-12-14 16:19:46 +01:00
Louis Dureuil
87bba98bd8
Various changes
- fixed seed for arroy
- check vector dimensions as soon as it is provided to search
- don't embed whitespace
2023-12-14 16:08:42 +01:00
Louis Dureuil
217105b7da
hybrid search uses semantic ratio, error handling 2023-12-14 16:08:42 +01:00
ManyTheFish
1b7c164a55
Pass the semantic ratio to milli 2023-12-14 16:08:42 +01:00
ManyTheFish
f3f3944469
Fix error checking 2023-12-14 16:08:42 +01:00
ManyTheFish
93dcbf598d
Deserialize semantic ratio 2023-12-14 16:08:42 +01:00
ManyTheFish
ac68f33194
Add simple test 2023-12-14 16:08:42 +01:00
ManyTheFish
9991152bbe
Add TODOs 2023-12-14 16:08:42 +01:00
Louis Dureuil
a4536b1381
Small adjustments to respect the spec 2023-12-14 16:08:42 +01:00
Louis Dureuil
5b51cb04af
Remove some settings 2023-12-14 16:08:42 +01:00
Louis Dureuil
3c1a14f1cd
Add settings routes 2023-12-14 16:08:42 +01:00
Louis Dureuil
b8e4709dfa
Remove prompt strategy and fallback 2023-12-14 16:08:41 +01:00
Louis Dureuil
806e5b6899
Tests pass 2023-12-14 16:08:41 +01:00
Louis Dureuil
61bd2fb7a9
Update arroy 2023-12-14 16:08:41 +01:00
Louis Dureuil
e0cc775dc4
Various changes
- DistributionShift in Search object (to be set from model in embed?)
- Fix issue where embedder index wasn't computed at search time
- Accept as default embedder either the "default" one, or the only embedder when there is only one
2023-12-14 16:08:41 +01:00
Louis Dureuil
12940d79a9
WIP
- manual embedder
- multi embedders OK
- clippy + tests OK
2023-12-14 16:08:41 +01:00
Louis Dureuil
922a640188
WIP multi embedders
fixed template bugs
2023-12-14 16:08:41 +01:00
Louis Dureuil
abbe131084
Cosmetic change 2023-12-14 16:08:41 +01:00
Louis Dureuil
d4715e0c4d
Fix same vector sort bug 2023-12-14 16:08:41 +01:00
Louis Dureuil
11e2a2c1aa
Fix geosort bug 2023-12-14 16:08:41 +01:00
Louis Dureuil
65e49b7092
Remove stuff, add distribution shift (WIP) 2023-12-14 16:08:38 +01:00
Louis Dureuil
e56f160032
Actually pass embedders on reindex 2023-12-14 16:07:49 +01:00
Louis Dureuil
687d92f217
prompt bifluor+ 2023-12-14 16:07:49 +01:00
Louis Dureuil
fb539f61fe
WIP 2023-12-14 16:07:49 +01:00
Louis Dureuil
cb4ebe163e
WIP 2023-12-14 16:07:49 +01:00
Louis Dureuil
dde3a04679
WIP arroy integration 2023-12-14 16:07:49 +01:00
Louis Dureuil
13c2c6c16b
Small commit to add hybrid search and autoembedding 2023-12-14 16:07:48 +01:00
Louis Dureuil
21bcf32109
Add candle and hg_hub, updating a lot of deps in the process 2023-12-14 16:07:48 +01:00
ManyTheFish
35e1981488 Remove proximityPrecision from the experimental feature 2023-12-14 15:52:42 +01:00
meili-bors[bot]
e0f712b9d3
Merge #4254
4254: Bring back v1.5.1 changes into main r=ManyTheFish a=Kerollmops

This pull request brings back changes from the _release-v1.5.1_ branch into _main_.

Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-12-14 09:41:57 +00:00
Clément Renault
56571f762a
Merge remote-tracking branch 'origin/main' into tmp-release-v1.5.1 2023-12-13 11:57:01 +01:00
Clément Renault
005800634d
Merge pull request #4249 from meilisearch/flag-limit-batch-size
Introduce parameters to limit the number of batched tasks
2023-12-13 10:32:14 +01:00