Commit Graph

2464 Commits

Author SHA1 Message Date
meili-bors[bot]
3c4c46377b
Merge #4665
4665: Add missing Korean support r=ManyTheFish a=junhochoi

Some configuration is missing `korean` features and add a test case in `milli/src/search/mod.rs`.

# Pull Request

## Related issue

#3443 #3882 

## What does this PR do?
- Improvement on enabling Korean support

Inspired by the work (#3882) I tried to enable Korean features but have found some missing configurations.
This PR is add those missing configs (mostly Cargo.toml) and added one test case.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Junho Choi <jh.choi@catenoid.net>
2024-06-25 11:51:21 +00:00
Louis Dureuil
d75e0098c7
Fixes for Rust v1.79 2024-06-25 11:16:06 +02:00
Junho Choi
2e0ff56f3f Add missing Korean support
Some configuration is missing `korean` features and
add a test case in `milli/src/search/mod.rs`.
2024-06-25 12:45:21 +09:00
Tamo
1693332cab Update arroy and always build the tree that need to be built 2024-06-24 10:14:03 +02:00
meili-bors[bot]
ddd564665b
Merge #4713
4713: Speed up facet distribution r=ManyTheFish a=Kerollmops

This PR is akin to #4682, but this time, the same logic is applied to the facets. Bitmaps are not decoded, and we do an intersection on the bytes with the search candidates instead of materializing the RoaringBitmap to destroy it just after the operation.

A prospect raised some slow requests when performing facet searches, and I found out that the disk optimization intersection wasn't performed on the facets.

Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-06-24 05:23:46 +00:00
Clément Renault
9736e16a88
Make clippy happy 2024-06-20 13:02:44 +02:00
Clément Renault
6fa4da8ae7
Improve facet distribution speed in count mode 2024-06-20 12:58:51 +02:00
Clément Renault
19d7cdc20d
Improve facet distribution speed in lexico mode 2024-06-20 12:57:08 +02:00
Louis Dureuil
a04041c8f2
Only spawn the pool once 2024-06-19 16:25:33 +02:00
meili-bors[bot]
e580d6b98f
Merge #4693
4693: Introduce distinct attributes at search time r=irevoire a=Kerollmops

This PR fixes #4611.

### To Do
- [x] Remove the `distinguishableAttributes` settings (not even a commit about that).
- [x] Use the `filterableAttributes` to be able to use the `distinct` parameter at search.
- [x] Work on the errors and make tests.

Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
2024-06-18 07:45:03 +00:00
Tamo
43875e6758 fix bug around nested fields 2024-06-17 15:59:30 +02:00
meili-bors[bot]
e9bf4c43a4
Merge #4649
4649: Don't store the vectors in the documents database r=dureuill a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4607

## What does this PR do?
- Ensure that anything falling under `_vectors` is NOT searchable, filterable or sortable
- [x] per embedder, add a roaring bitmap of documents that provide "userProvided" embeddings
- [x] in the indexing process in extract_vector_points, set the bit corresponding to the document depending on the "userProvided" subfield in the _vectors field.
- [x] in the document DB in typed chunks, when writing the _vectors field, remove all keys corresponding to an embedder

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-06-17 12:32:03 +00:00
Louis Dureuil
0a8f50695e
Fixes for Rust v1.79 2024-06-13 17:47:44 +02:00
Louis Dureuil
e35ef31738
Small changes following review 2024-06-13 14:20:48 +02:00
Louis Dureuil
3bc8f81abc
user_provided => regenerate 2024-06-12 18:12:20 +02:00
Louis Dureuil
a89eea233b
Fix vectors injection 2024-06-12 17:10:19 +02:00
Louis Dureuil
f5cf01e7d1
Rework extraction to use EmbedderAction 2024-06-12 14:50:55 +02:00
Louis Dureuil
d1dd7e5d09
In transform for removed embedders, write back their user provided vectors in documents, and clear the writers 2024-06-12 14:50:55 +02:00
Louis Dureuil
d18c1f77d7
Update embedder configs with a finer granularity
- no longer clear vector DB between any two embedder changes
2024-06-12 14:50:55 +02:00
Louis Dureuil
d0b05ae691
Add EmbedderAction to settings 2024-06-12 14:50:54 +02:00
Louis Dureuil
e9bf4eb100
Reformulate ParsedVectorsDiff in terms of VectorState 2024-06-12 14:11:44 +02:00
Louis Dureuil
b368105272
Add EmbedderConfigs::into_inner 2024-06-12 14:11:44 +02:00
meili-bors[bot]
e0eff08095
Merge #4685
4685: Fix ci tests r=dureuill a=ManyTheFish

# Pull Request
Make the all following CI succeed:
https://github.com/meilisearch/meilisearch/actions/runs/9477183091

## Related issue
Fixes #4629

## What does this PR do?
- Change the test behavior for `swedish-recomposition` feature flag
- Remove the `-v` parameter from grep

Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
2024-06-12 07:58:33 +00:00
Clément Renault
39f60abd7d
Add and modify distinct tests 2024-06-11 17:53:53 -04:00
Clément Renault
1991bd03da
Distinct at search erases the distinct in the settings 2024-06-11 17:02:39 -04:00
Clément Renault
ee39309aae
Improve errors and introduce a new InvalidSearchDistinct error code 2024-06-11 16:03:39 -04:00
Clément Renault
0d31be1494
Make the distinct work at search 2024-06-11 11:39:35 -04:00
Louis Dureuil
7cef2299cf
Fix behavior when removing a document 2024-06-11 09:45:08 +02:00
ManyTheFish
57d066595b fix Tests almost all features 2024-06-06 17:24:50 +02:00
Clément Renault
75b2e02cd2
Log more stuff around filtering 2024-06-06 11:00:07 -04:00
Clément Renault
52d0d35b39
Revert "Reduce the universe while exploring the facet tree" because it's slower this way
This reverts commit 14026115f21409535772ede0ee4273f37848dd61.
2024-06-06 09:17:51 -04:00
Clément Renault
5432776132
Reduce the universe while exploring the facet tree 2024-06-06 09:17:51 -04:00
Clément Renault
66470b27e6
Use the MultiOps trait for IN operations 2024-06-06 09:17:51 -04:00
Clément Renault
0a9bd398c7
Improve the NOT operator to use the universe when possible 2024-06-06 09:17:51 -04:00
Clément Renault
7967e93c16
Skip evaluating when a universe is empty, nothing can be found 2024-06-06 09:17:51 -04:00
Clément Renault
a6f3a01c6a
Expose the universe to do efficient intersections on deserialization 2024-06-06 09:17:51 -04:00
Clément Renault
4ca4a3f954
Make the CboRoaringBitmapCodec support intersection on deserialization 2024-06-06 09:17:51 -04:00
Clément Renault
e4a69c5ac3
Introduce the FacetGroupLazyValue type 2024-06-06 09:17:50 -04:00
Clément Renault
531e3d7d6a
MultiOps trait for OR operations 2024-06-06 09:17:50 -04:00
Tamo
2cdcb703d9 fix the deletion of vectors and add a test 2024-06-06 11:39:29 +02:00
Tamo
31a793d226 fix the regeneration of the embeddings in the search 2024-06-06 11:39:29 +02:00
Tamo
d85ab23b82 rename all occurences of user_defined to user_provided for consistency 2024-06-06 11:39:29 +02:00
Tamo
b7349910d9 implements mor review comments 2024-06-06 11:39:29 +02:00
Tamo
376b3a19a7 makes clippy and fmt happy 2024-06-06 11:39:29 +02:00
Tamo
b867829ef1 remove useless dbg 2024-06-06 11:39:29 +02:00
Tamo
5d50850e12 always push the user defined vectors in arroy 2024-06-06 11:39:29 +02:00
Tamo
a73ccc78a6 forward the embedding config to the extractors 2024-06-06 11:39:28 +02:00
Tamo
9eb6f522ea wraps the index embedding config in a struct 2024-06-06 11:37:30 +02:00
Tamo
04f6523f3c expose a new parameter to retrieve the embedders at search time 2024-06-06 11:36:11 +02:00
Tamo
84e498299b Remove the vectors from the documents database 2024-06-06 11:36:11 +02:00
Tamo
7a84697570 never store the _vectors as searchable or faceted fields 2024-06-06 11:36:11 +02:00
Tamo
4148fbbe85 provide a method to get all the nested fields ids from a name 2024-06-06 11:36:11 +02:00
ManyTheFish
2e50c6ec81 Update Charabia 2024-06-06 10:18:43 +02:00
ManyTheFish
30293883e0 Fix condition mistake 2024-06-05 17:30:07 +02:00
ManyTheFish
b833be46b9 Avoid running proximity when only the exact attributes changes 2024-06-05 17:30:07 +02:00
ManyTheFish
0a4118329e Put only_additional_fields to None if the difference gives an empty result. 2024-06-05 17:30:07 +02:00
ManyTheFish
261e92d7e6 Skip iterating over documents when the faceted field list doesn't change 2024-06-05 17:30:07 +02:00
ManyTheFish
5cd08979b1 iterate over the faceted fields instead of over the whole document 2024-06-05 17:30:07 +02:00
Clément Renault
a998b881f6 Cache a lot of operations to know if a field must be indexed 2024-06-05 17:30:07 +02:00
Clément Renault
b81953a65d Add a span for the prepare_for_documents_reindexing 2024-06-05 17:30:07 +02:00
Clément Renault
091bb157f1 Add a span for the settings diff creation 2024-06-05 17:30:07 +02:00
Clément Renault
1b639ce44b Reduce the number of complex calls to settings diff functions 2024-06-05 17:30:07 +02:00
Clément Renault
87cf8a3c94 Introduce a new way to determine the operations to perform on the fields 2024-06-05 17:30:07 +02:00
Clément Renault
0f578348f1 Introduce a dedicated function to write proximity entries in database 2024-06-05 17:30:07 +02:00
Clément Renault
fad4675abe Give the settings diff to the write_typed_chunk_into_index function 2024-06-05 17:30:07 +02:00
Clément Renault
1ab03c4ede Fix an issue with settings diff and * in the searchable attributes 2024-06-05 17:30:07 +02:00
Clément Renault
0c6e4b2f00 Introducing a new into_del_add_obkv_conditional_operation function 2024-06-05 17:30:07 +02:00
Clément Renault
42b3f52ef9 Introduce the SettingDiff only_additional_fields method 2024-06-05 17:30:07 +02:00
meili-bors[bot]
93f5defedc
Merge #4656
4656: Adding a new `searchableAttribute` no longer re-index all the attributes r=ManyTheFish a=Kerollmops

Fixes #4492.

## To Do
 - [x] Do not call the `InnerSettingsDiff::only_additional_fields` function too many times
 - [ ] Add tests

Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-06-05 14:51:14 +00:00
ManyTheFish
33241a6b12 Fix condition mistake 2024-06-05 16:00:24 +02:00
ManyTheFish
ff87b4db26 Avoid running proximity when only the exact attributes changes 2024-06-05 12:48:44 +02:00
ManyTheFish
ba9fadc8f1 Put only_additional_fields to None if the difference gives an empty result. 2024-06-05 10:51:16 +02:00
ManyTheFish
d29d4f88da Skip iterating over documents when the faceted field list doesn't change 2024-06-04 15:31:24 +02:00
ManyTheFish
17c5ceeb9d iterate over the faceted fields instead of over the whole document 2024-06-04 14:04:20 +02:00
meili-bors[bot]
fc584f1db3
Merge #4666
4666: Add a score threshold search parameter r=ManyTheFish a=dureuill

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4609

## What does this PR do?
- See [usage](https://meilisearch.notion.site/Filter-by-score-usage-224a183ce7b24ca99b6a9a8da755668a?pvs=25#95b76ded400342ba9ab3d67c734836f0) and [the known limitation](https://meilisearch.notion.site/Filter-by-score-usage-224a183ce7b24ca99b6a9a8da755668a?pvs=25#e4e32195bf0e4195b5daecdbb7a97a17)


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-06-03 08:42:44 +00:00
Louis Dureuil
2b6db6541e
Changes after review 2024-06-03 10:30:00 +02:00
meili-bors[bot]
d6bd88ce4f
Merge #4667
4667: Frequency matching strategy r=Kerollmops a=ManyTheFish

# Pull Request

## Related issue
Fixes #3773

## What does this PR do?
- add test for matching strategy
- implement frequency matching strategy

See the [PRD for more details](https://www.notion.so/meilisearch/Frequency-Matching-Strategy-0f3ba08833a442a39590a53a1505ab00).

[Public API](https://www.notion.so/meilisearch/frequency-matching-strategy-89868fb7fc584026bc56e378eb854a7f).


Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-05-30 14:53:31 +00:00
Clément Renault
b9a0ff0dd6
Cache a lot of operations to know if a field must be indexed 2024-05-30 16:18:23 +02:00
Clément Renault
75496af985
Add a span for the prepare_for_documents_reindexing 2024-05-30 12:14:22 +02:00
Clément Renault
0e9eb9eedb
Add a span for the settings diff creation 2024-05-30 12:08:27 +02:00
ManyTheFish
3f1a510069 Add tests and fix matching strategy 2024-05-30 12:02:42 +02:00
Clément Renault
3a78e988da
Reduce the number of complex calls to settings diff functions 2024-05-30 11:23:07 +02:00
Clément Renault
d9e5074189
Introduce a new way to determine the operations to perform on the fields 2024-05-30 11:23:07 +02:00
Clément Renault
bc210bdc00
Introduce a dedicated function to write proximity entries in database 2024-05-30 11:23:06 +02:00
Clément Renault
4bf83f701c
Give the settings diff to the write_typed_chunk_into_index function 2024-05-30 11:23:06 +02:00
Clément Renault
db3887929f
Fix an issue with settings diff and * in the searchable attributes 2024-05-30 11:22:50 +02:00
Clément Renault
9af103a88e
Introducing a new into_del_add_obkv_conditional_operation function 2024-05-30 11:22:49 +02:00
Clément Renault
99211eb375
Introduce the SettingDiff only_additional_fields method 2024-05-30 11:22:49 +02:00
Louis Dureuil
4f03b0cf5b
Add ranking score threshold to similar 2024-05-30 11:20:50 +02:00
Louis Dureuil
c26db7878c
Expose rankingScoreThreshold in API 2024-05-30 10:32:35 +02:00
ManyTheFish
1ab88e10b9 Merge branch 'main' into merge-release-v1.8.1-in-main 2024-05-29 16:24:00 +02:00
Louis Dureuil
aac1d769a7
Add ranking_score_threshold to milli 2024-05-29 14:17:09 +02:00
ManyTheFish
abdc4afcca Implement Frequency matching strategy 2024-05-29 13:59:08 +02:00
Many the fish
e1fbfde6c4
Merge branch 'main' into merge-release-v1.8.1-in-main 2024-05-29 11:31:03 +02:00
ManyTheFish
27b75ec648 merge main into v1.8.1 2024-05-29 11:26:07 +02:00
Louis Dureuil
ca6cc4654b
Add similar route 2024-05-28 15:28:19 +02:00
Louis Dureuil
d35278320e
Add support functions for accessing arroy writers and readers 2024-05-28 15:27:43 +02:00
Louis Dureuil
02b3d82c60
filtered_universe accepts index and txn instead of SearchContext 2024-05-28 15:22:12 +02:00
Louis Dureuil
fd2c95999d
Change validate_document_id to public and remove extra layer of result 2024-05-28 15:21:19 +02:00
Clément Renault
dc949ab46a
Remove puffin usage 2024-05-27 15:59:14 +02:00
Clément Renault
7f3e51349e
Remove puffin for the dependencies 2024-05-27 15:53:06 +02:00
meili-bors[bot]
19acc65ad2
Merge #4646
4646: Reduce `Transform`'s disk usage r=Kerollmops a=Kerollmops

This PR implements what is described in #4485. It reduces the number of disk writes and disk usage.

Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-05-23 16:06:50 +00:00
Clément Renault
fe17c0f52e
Construct the minimal OBKVs according to the settings diff 2024-05-23 11:23:57 +02:00
Clément Renault
bc5663e673
FieldIdsMap no longer useful thanks to #4631 2024-05-22 16:06:15 +02:00
Louis Dureuil
8a941c0241
Smaller review changes 2024-05-22 14:44:42 +02:00
Louis Dureuil
3412e7fbcf
"[]" is deserialized as 0 embedding rather than 1 embedding of dim 0 2024-05-22 12:25:21 +02:00
Louis Dureuil
16037e2169
Don't remove embedders that are not in the config from the document DB 2024-05-22 12:24:51 +02:00
Louis Dureuil
8f7c8ca7f0
Remove now unused error variant 2024-05-22 12:23:43 +02:00
Clément Renault
500ddc76b5
Make the flattened sorter optional 2024-05-21 16:16:36 +02:00
Clément Renault
943f8dba0c
Make clippy happy 2024-05-21 14:58:41 +02:00
Clément Renault
1aa8ed9ef7
Make the original sorter optional 2024-05-21 14:53:26 +02:00
ManyTheFish
f762307838 Fix clippy 2024-05-21 13:44:20 +02:00
ManyTheFish
3e94a90722 Fixes 2024-05-21 13:39:46 +02:00
Louis Dureuil
b17cb56dee
Test array of vectors 2024-05-20 14:44:10 +02:00
ManyTheFish
fc7e817221 Index geo points based on the settings differences 2024-05-20 12:27:26 +02:00
Louis Dureuil
d05d49ffd8
Fix tests 2024-05-20 10:36:18 +02:00
Louis Dureuil
0462ebbe58
Don't write an empty _vectors field 2024-05-20 10:36:18 +02:00
Louis Dureuil
2f7a8a4efb
Don't write vectors that weren't autogenerated in document DB 2024-05-20 10:36:18 +02:00
Louis Dureuil
52d9cb6e5a
Refactor vector indexing
- use the parsed_vectors module
- only parse `_vectors` once per document, instead of once per embedder per document
2024-05-20 10:36:17 +02:00
Louis Dureuil
261de888b7
Add function to get the embeddings of a document in an index 2024-05-20 10:36:17 +02:00
Louis Dureuil
98c811247e
Add parsed vectors module 2024-05-20 10:25:59 +02:00
Tamo
273c6e8c5c uses the latest version of heed to get rid of unsafe code 2024-05-16 18:31:32 +02:00
Tamo
897d25780e update milli to latest version 2024-05-16 18:31:32 +02:00
Tamo
f2d0a59f1d when no searchable attributes are defined, makes all the weight equals to zero 2024-05-16 01:06:33 +02:00
Tamo
c78a2fa4f5 rename method and variable around the attributes to search on feature 2024-05-15 18:04:42 +02:00
Tamo
5542f1d9f1 get back to what we were doingb efore in the DB cache and with the restricted field id 2024-05-15 18:00:39 +02:00
Tamo
ad4d8502b3 stops storing the whole fieldids weights map when no searchable are defined 2024-05-15 17:16:10 +02:00
Tamo
7ec4e2a3fb apply all style review comments 2024-05-15 15:02:26 +02:00
Tamo
9fffb8e83d make clippy happy 2024-05-14 17:36:32 +02:00
Tamo
caa6a7149a make the attribute ranking rule use the weights and fix the tests 2024-05-14 17:36:32 +02:00
Tamo
a0082c4df9 add a failing test on the attribute ranking rule 2024-05-14 17:00:02 +02:00
Tamo
b0afe0972e stop updating the fields ids map when fields are only swapped 2024-05-14 17:00:02 +02:00
Tamo
9ecde41853 add a test on the current behaviour 2024-05-14 17:00:02 +02:00
Tamo
685f452fb2 Fix the indexing of the searchable 2024-05-14 17:00:02 +02:00
Tamo
4e4a1ddff7 gate a test behind the required feature 2024-05-14 17:00:02 +02:00
Tamo
c22460045c Stops returning an option in the internal searchable fields 2024-05-14 17:00:02 +02:00
Clément Renault
ac4bc143c4
Bump ureq to v2.9.7 2024-05-07 10:39:38 +02:00
meili-bors[bot]
4d5971f343
Merge #4621
4621: Bring back changes from v1.8.0 into main r=curquiza a=curquiza



Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-05-06 13:46:39 +00:00
Louis Dureuil
f4dd73ec8c
Destructure EmbedderOptions so we don't miss some options 2024-05-02 15:39:36 +02:00
ManyTheFish
88174b8ae4 Update charabia v0.8.10 2024-04-30 14:30:23 +02:00
meili-bors[bot]
ebca29f3de
Merge #4597
4597: Fix embeddings settings update r=ManyTheFish a=ManyTheFish

# Pull Request
- add some conditions reducing the work done when changing the settings
- add some benchmarks on embedders

## Related issue
Fixes #4585


Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-04-25 16:37:28 +00:00
meili-bors[bot]
c793b6ef6d
Merge #4600
4600: Fix embedders api r=ManyTheFish a=ManyTheFish

# Pull Request

## Related issue
Fixes #4594
Fixes #4595


Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-04-25 13:16:33 +00:00
Clément Renault
d4aeff92d0
Introduce the ThreadPoolNoAbort wrapper 2024-04-24 16:40:12 +02:00
ManyTheFish
9b76501875 Display set API key for Ollama embedder 2024-04-24 12:33:07 +02:00
Clément Renault
b3173d0423
Remove useless dots in the error messages 2024-04-22 18:09:33 +02:00
Clément Renault
96cc5319c8
Introduce a new internal error type to categorize panics 2024-04-22 18:09:33 +02:00
Clément Renault
0c7003c5df
Introduce an atomic to catch panics in thread pools 2024-04-22 18:09:33 +02:00
ManyTheFish
a1aa999026 Add conditions reducing wrok 2024-04-22 14:18:35 +02:00
ManyTheFish
c71b5d09ff Updatre charabia v0.8.9 2024-04-18 11:38:26 +02:00
writegr
ab43a8a949 chore: fix some typos in comments
Signed-off-by: writegr <wellweek@outlook.com>
2024-04-18 14:12:52 +08:00