Commit Graph

865 Commits

Author SHA1 Message Date
Tamo
d7844a6e45 add a bunch of tests on the errors of the distinct at search time 2024-06-17 15:37:32 +02:00
meili-bors[bot]
e9bf4c43a4
Merge #4649
4649: Don't store the vectors in the documents database r=dureuill a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4607

## What does this PR do?
- Ensure that anything falling under `_vectors` is NOT searchable, filterable or sortable
- [x] per embedder, add a roaring bitmap of documents that provide "userProvided" embeddings
- [x] in the indexing process in extract_vector_points, set the bit corresponding to the document depending on the "userProvided" subfield in the _vectors field.
- [x] in the document DB in typed chunks, when writing the _vectors field, remove all keys corresponding to an embedder

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-06-17 12:32:03 +00:00
Tamo
a8a0854421
Update meilisearch/src/analytics/segment_analytics.rs 2024-06-17 14:30:50 +02:00
Louis Dureuil
09d9b63e1c
- test case where all vectors were generated
- update tests following changes in behavior from previous commit
2024-06-13 17:16:41 +02:00
Louis Dureuil
b9b938c902
Change retrieveVectors behavior:
- when the feature is disabled, documents are never modified
- when the feature is enabled and `retrieveVectors` is disabled, `_vectors` is removed from documents
- when the feature is enabled and `retrieveVectors` is enabled, vectors from the vectors DB are merged with `_vectors` in documents

Additionally `_vectors` is never displayed when the `displayedAttributes` list does not contain either `*` or `_vectors`

- fixed an issue where `_vectors` was not injected when all vectors in the dataset where always generated
2024-06-13 17:13:36 +02:00
Tamo
6bf07d969e add failing test 2024-06-13 15:49:42 +02:00
Louis Dureuil
3f212a8202
Update tests 2024-06-12 18:13:34 +02:00
Louis Dureuil
3bc8f81abc
user_provided => regenerate 2024-06-12 18:12:20 +02:00
Louis Dureuil
fca9fe39b3
Update test snapshots 2024-06-12 14:50:55 +02:00
meili-bors[bot]
e0eff08095
Merge #4685
4685: Fix ci tests r=dureuill a=ManyTheFish

# Pull Request
Make the all following CI succeed:
https://github.com/meilisearch/meilisearch/actions/runs/9477183091

## Related issue
Fixes #4629

## What does this PR do?
- Change the test behavior for `swedish-recomposition` feature flag
- Remove the `-v` parameter from grep

Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
2024-06-12 07:58:33 +00:00
Clément Renault
ee39309aae
Improve errors and introduce a new InvalidSearchDistinct error code 2024-06-11 16:03:39 -04:00
Clément Renault
0d31be1494
Make the distinct work at search 2024-06-11 11:39:35 -04:00
Tamo
3493093c4f add a batch of tests 2024-06-11 16:03:54 +02:00
Tamo
600e97d9dc gate the retrieveVectors parameter behind the vectors feature flag 2024-06-10 18:26:12 +02:00
Louis Dureuil
7559dfc814
Merge tag 'v1.8.2' into release-v1.9.0 2024-06-10 15:07:34 +02:00
Louis Dureuil
50f8218a5d
Asynchronously drop permits 2024-06-10 10:19:57 +02:00
ManyTheFish
57d066595b fix Tests almost all features 2024-06-06 17:24:50 +02:00
Tamo
734d1c53ad fix a panic in yaup 2024-06-06 16:31:07 +02:00
Tamo
63dded3961 implements the new analytics for the get documents routes 2024-06-06 11:39:29 +02:00
Tamo
2cdcb703d9 fix the deletion of vectors and add a test 2024-06-06 11:39:29 +02:00
Tamo
6607875f49 add the retrieveVectors parameter to the get and fetch documents route 2024-06-06 11:39:29 +02:00
Tamo
31a793d226 fix the regeneration of the embeddings in the search 2024-06-06 11:39:29 +02:00
Tamo
d85ab23b82 rename all occurences of user_defined to user_provided for consistency 2024-06-06 11:39:29 +02:00
Tamo
49fa41ce65 apply first round of review comments 2024-06-06 11:39:29 +02:00
Tamo
400cf3eb92 add api error test on the new retrieveVectors parameter 2024-06-06 11:39:29 +02:00
Tamo
376b3a19a7 makes clippy and fmt happy 2024-06-06 11:39:29 +02:00
Tamo
d92c173fdc update the new similar tests 2024-06-06 11:39:29 +02:00
Tamo
6b29676e7e update snapshots 2024-06-06 11:39:29 +02:00
Tamo
caad40964a implements the analytics 2024-06-06 11:39:29 +02:00
Tamo
cc5dca8321 fix two bug and add a dump test 2024-06-06 11:39:29 +02:00
Tamo
5d50850e12 always push the user defined vectors in arroy 2024-06-06 11:39:29 +02:00
Tamo
04f6523f3c expose a new parameter to retrieve the embedders at search time 2024-06-06 11:36:11 +02:00
meili-bors[bot]
98e062a714
Merge #4675
4675: Update actix-web 4.5.1 -> 4.6.0 r=dureuill a=dureuill

# Pull Request

- actix-web 4.5.1 -> 4.6.0
- actix-http 3.6.0 -> 3.7.0
- actix-web-static-files (commit 2d3b6160) -> 4.0.1
- tracing-actix-web 0.7.9 -> 0.7.10
- brotli 3.4.0 -> 6.0.0

## Related issue
Fixes #4625 


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-06-05 07:40:35 +00:00
Louis Dureuil
8412665957
Update actix-web 4.5.1 -> 4.6.0 2024-06-04 09:54:30 +02:00
meili-bors[bot]
fc584f1db3
Merge #4666
4666: Add a score threshold search parameter r=ManyTheFish a=dureuill

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4609

## What does this PR do?
- See [usage](https://meilisearch.notion.site/Filter-by-score-usage-224a183ce7b24ca99b6a9a8da755668a?pvs=25#95b76ded400342ba9ab3d67c734836f0) and [the known limitation](https://meilisearch.notion.site/Filter-by-score-usage-224a183ce7b24ca99b6a9a8da755668a?pvs=25#e4e32195bf0e4195b5daecdbb7a97a17)


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-06-03 08:42:44 +00:00
Louis Dureuil
2b6db6541e
Changes after review 2024-06-03 10:30:00 +02:00
meili-bors[bot]
d6bd88ce4f
Merge #4667
4667: Frequency matching strategy r=Kerollmops a=ManyTheFish

# Pull Request

## Related issue
Fixes #3773

## What does this PR do?
- add test for matching strategy
- implement frequency matching strategy

See the [PRD for more details](https://www.notion.so/meilisearch/Frequency-Matching-Strategy-0f3ba08833a442a39590a53a1505ab00).

[Public API](https://www.notion.so/meilisearch/frequency-matching-strategy-89868fb7fc584026bc56e378eb854a7f).


Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-05-30 14:53:31 +00:00
Louis Dureuil
c2fb7afe59
fmt 2024-05-30 12:06:46 +02:00
ManyTheFish
3f1a510069 Add tests and fix matching strategy 2024-05-30 12:02:42 +02:00
Louis Dureuil
41976b82b1
Tests for ranking_score_threshold 2024-05-30 11:22:26 +02:00
Louis Dureuil
c36410fcbf
Analytics for ranking score threshold 2024-05-30 11:22:12 +02:00
Louis Dureuil
7ce2691374
Add ranking score threshold to similar API 2024-05-30 11:21:31 +02:00
Louis Dureuil
c26db7878c
Expose rankingScoreThreshold in API 2024-05-30 10:32:35 +02:00
ManyTheFish
1ab88e10b9 Merge branch 'main' into merge-release-v1.8.1-in-main 2024-05-29 16:24:00 +02:00
ManyTheFish
6a4b2516aa WIP 2024-05-29 16:21:24 +02:00
Louis Dureuil
2cf3e1c80a
Temporarily ignore perform snapshot test under Windows 2024-05-29 12:42:47 +02:00
Many the fish
e1fbfde6c4
Merge branch 'main' into merge-release-v1.8.1-in-main 2024-05-29 11:31:03 +02:00
Louis Dureuil
ca006e38ec
Basic tests 2024-05-28 15:28:19 +02:00
Louis Dureuil
e26bd87780
Error tests for similar routes 2024-05-28 15:28:19 +02:00
Louis Dureuil
c01e498a63
Test server can call similar 2024-05-28 15:28:19 +02:00
Louis Dureuil
ca6cc4654b
Add similar route 2024-05-28 15:28:19 +02:00
Louis Dureuil
3bd9d2478c
Add error codes 2024-05-28 15:27:43 +02:00
Louis Dureuil
54b15059a0
Analytics changes 2024-05-28 15:27:43 +02:00
Louis Dureuil
e172e938e7
add search rules directly takes the filter rather than the searchquery 2024-05-28 15:22:25 +02:00
Louis Dureuil
02b3d82c60
filtered_universe accepts index and txn instead of SearchContext 2024-05-28 15:22:12 +02:00
Clément Renault
487431a035
Fix tests 2024-05-27 16:12:20 +02:00
Clément Renault
b6d450d484
Remove puffin experimental feature 2024-05-27 15:59:28 +02:00
Clément Renault
7f3e51349e
Remove puffin for the dependencies 2024-05-27 15:53:06 +02:00
meili-bors[bot]
19acc65ad2
Merge #4646
4646: Reduce `Transform`'s disk usage r=Kerollmops a=Kerollmops

This PR implements what is described in #4485. It reduces the number of disk writes and disk usage.

Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-05-23 16:06:50 +00:00
Clément Renault
fe17c0f52e
Construct the minimal OBKVs according to the settings diff 2024-05-23 11:23:57 +02:00
meili-bors[bot]
14bc80e3df
Merge #4633
4633: Allow to mark vectors as "userProvided" r=Kerollmops a=dureuill

# Pull Request

## Related issue
Fixes #4606 

## What does this PR do?

[See usage in PRD](https://meilisearch.notion.site/v1-9-AI-search-changes-e90d6803eca8417aa70a1ac5d0225697#deb96fb0595947bda7d4a371100326eb)

- Extends the shape of the special `_vectors` field in documents.
    - previously, the `_vectors` field had to be an object, with each field the name of a configured embedder, and each value either `null`, an embedding (array of numbers), or an array of embeddings.
    - In this PR, the value of an embedder in the `_vectors` field can additionally be an object. The object has two fields:
      1. `embeddings`: `null`, an embedding (array of numbers), or an array of embeddings.
      2. `userProvided`: a boolean indicating if the vector was provided by the user.
    - The previous form `embedder_or_array_of_embedders` is semantically equivalent to:
    ```json
    {
        "embeddings": embedder_or_array_of_embedders,
        "userProvided": true
    }
    ```
- During the indexing step, the subfields and values of the `_vectors` field that have `userProvided` set to **false** are added in the vector DB, but not in the documents DB: that means that future modifications of the documents will trigger a regeneration of that particular vector using the document template.
- This allows **importing** embeddings as a one-shot process, while still retaining the ability to regenerate embeddings on document change.
- The dump process now uses this ability: it enriches the `_vectors` fields of documents with the embeddings that were autogenerated, marking them as not `userProvided`. This allows importing the vectors from a dump without regenerating them.

### Tests

This PR adds the following tests

- Long-needed hybrid search tests of a simple hf embedder
- Dump test that imports vectors. Due to the difficulty of actually importing a dump in tests, we just read the dump and check it contains the expected content.
- Tests in the index-scheduler: this tests that documents containing the same kind of instructions as in the dump indexes as expected


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-05-23 08:17:54 +00:00
ManyTheFish
3e94a90722 Fixes 2024-05-21 13:39:46 +02:00
Tamo
7e251b43d4
Revert "Stream documents" 2024-05-20 15:09:45 +02:00
Louis Dureuil
afcd7b9f0c
Test hybrid search with hf embedder 2024-05-20 14:44:10 +02:00
Tamo
0f78703b85 add a test reproducing the bug 2024-05-20 10:58:08 +02:00
Louis Dureuil
d05d49ffd8
Fix tests 2024-05-20 10:36:18 +02:00
Tamo
273c6e8c5c uses the latest version of heed to get rid of unsafe code 2024-05-16 18:31:32 +02:00
Tamo
c85d1752dd keep the same rtxn to compute the filters on the documents and to stream the documents later on 2024-05-16 18:31:32 +02:00
Tamo
8e6ffbfc6f stream documents 2024-05-16 18:31:32 +02:00
Tamo
673b6e1dc0 fix a flaky test 2024-05-16 11:28:14 +02:00
Tamo
f2d0a59f1d when no searchable attributes are defined, makes all the weight equals to zero 2024-05-16 01:06:33 +02:00
Tamo
7ec4e2a3fb apply all style review comments 2024-05-15 15:02:26 +02:00
Tamo
9fffb8e83d make clippy happy 2024-05-14 17:36:32 +02:00
Tamo
caa6a7149a make the attribute ranking rule use the weights and fix the tests 2024-05-14 17:36:32 +02:00
Clément Renault
f33a1282f8
Bump Rustls to v0.21.12 2024-05-07 10:31:39 +02:00
meili-bors[bot]
4d5971f343
Merge #4621
4621: Bring back changes from v1.8.0 into main r=curquiza a=curquiza



Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-05-06 13:46:39 +00:00
Tamo
3698aef66b fix warning 2024-05-06 11:36:37 +02:00
Simon Detheridge
7f5ab3cef5
Use http path pattern instead of full path in metrics 2024-05-03 12:29:31 +01:00
Louis Dureuil
5a305bfdea
Remove unused struct 2024-05-02 16:14:37 +02:00
ManyTheFish
88174b8ae4 Update charabia v0.8.10 2024-04-30 14:30:23 +02:00
meili-bors[bot]
c793b6ef6d
Merge #4600
4600: Fix embedders api r=ManyTheFish a=ManyTheFish

# Pull Request

## Related issue
Fixes #4594
Fixes #4595


Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-04-25 13:16:33 +00:00
ManyTheFish
cbbfff3594 Remove debuging prints 2024-04-25 10:37:18 +02:00
ManyTheFish
7468c1cf8d Introduce WildcardSetting that are serialized as wildcards by default 2024-04-24 18:15:03 +02:00
Clément Renault
d4aeff92d0
Introduce the ThreadPoolNoAbort wrapper 2024-04-24 16:40:12 +02:00
ManyTheFish
e87cb373de Avoid intermediate serializing when displaying settings 2024-04-24 12:33:07 +02:00
Clément Renault
0c7003c5df
Introduce an atomic to catch panics in thread pools 2024-04-22 18:09:33 +02:00
meili-bors[bot]
aa0bbbb246
Merge #4578
4578: Remove useless analytics r=ManyTheFish a=irevoire

# Pull Request

## Related issue
Fixes #4577

## What does this PR do?
Remove the following analytics:
- `Health Seen`
- `Stats Seen`
- `Task Seen`
- `Version Seen`


Co-authored-by: Tamo <tamo@meilisearch.com>
2024-04-18 13:30:42 +00:00
ManyTheFish
c71b5d09ff Updatre charabia v0.8.9 2024-04-18 11:38:26 +02:00
meili-bors[bot]
2dfee2fad5
Merge #4580
4580: Update the search logs r=Kerollmops a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4579

## What does this PR do?
- Update the debug implementation of the search query and search results so it’s way smaller and doesn’t display useless information


Co-authored-by: Tamo <tamo@meilisearch.com>
2024-04-17 14:25:43 +00:00
Tamo
4a68e9f6ae reorganize the debug implementation of the search results and only dispaly the meaningful informations 2024-04-17 13:42:10 +02:00
Tamo
206887c7a2 update the SearchQuery Debug implementation so it’s smaller and gives the most important informations first 2024-04-17 12:57:19 +02:00
Tamo
2dd9dd6d0a remove the Health Seen analytic 2024-04-17 11:43:40 +02:00
Tamo
e1f27de51a remove the Stats Seen analytic 2024-04-16 18:49:41 +02:00
Tamo
abae31aee0 remove the Task Seen analytic 2024-04-16 18:48:10 +02:00
Tamo
70ce0095ea remove the Version Seen analytic 2024-04-16 18:48:03 +02:00
ManyTheFish
a1ea224da9 Fix tests 2024-04-16 17:29:34 +02:00
ManyTheFish
a489b406b4 fix test 2024-04-16 14:39:06 +02:00
Louis Dureuil
a9013ed683
Fix comment mistake
Co-authored-by: Tamo <tamo@meilisearch.com>
2024-04-04 17:21:47 +02:00
Louis Dureuil
ca499a0302
Fix test after rebase 2024-04-04 16:04:07 +02:00
Louis Dureuil
355e5282b2
Remove _semanticScore 2024-04-04 16:04:07 +02:00