Tamo
1693332cab
Update arroy and always build the trees that need to be built
2024-06-24 10:14:03 +02:00
meili-bors[bot]
ddd564665b
Merge #4713
...
4713: Speed up facet distribution r=ManyTheFish a=Kerollmops
This PR is akin to #4682, but this time the same logic is applied to the facets. Bitmaps are not decoded: we intersect the serialized bytes with the search candidates instead of materializing the RoaringBitmap only to drop it right after the operation.
A prospect reported slow requests when performing facet searches, and I found out that the on-disk intersection optimization wasn't applied to the facets.
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-06-24 05:23:46 +00:00
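The idea described in #4713 (intersecting against serialized bytes instead of materializing a bitmap first) can be illustrated with a minimal sketch. It assumes a simplified, hypothetical layout in which document ids are stored as consecutive little-endian `u32` values; the real serialized bitmap format is more involved, and the function name `intersection_len_on_bytes` is invented for this example.

```rust
use std::collections::BTreeSet;

/// Counts how many ids stored in `bytes` are also present in `candidates`,
/// decoding each id on the fly instead of building an intermediate set.
///
/// Assumption: `bytes` is a sequence of little-endian u32 values. This is a
/// simplified stand-in for the real serialized bitmap format.
fn intersection_len_on_bytes(bytes: &[u8], candidates: &BTreeSet<u32>) -> usize {
    bytes
        .chunks_exact(4)
        .map(|chunk| u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]))
        .filter(|id| candidates.contains(id))
        .count()
}

fn main() {
    // Serialize a few ids the way the hypothetical layout expects them.
    let stored: Vec<u8> = [1u32, 4, 7, 9]
        .iter()
        .flat_map(|id| id.to_le_bytes())
        .collect();
    let candidates: BTreeSet<u32> = [4u32, 5, 9, 10].iter().copied().collect();

    println!("{}", intersection_len_on_bytes(&stored, &candidates));
}
```

No temporary collection of the stored ids is allocated: each id is decoded, tested against the candidates, and discarded.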
Clément Renault
9736e16a88
Make clippy happy
2024-06-20 13:02:44 +02:00
Clément Renault
6fa4da8ae7
Improve facet distribution speed in count mode
2024-06-20 12:58:51 +02:00
Clément Renault
19d7cdc20d
Improve facet distribution speed in lexico mode
2024-06-20 12:57:08 +02:00
Louis Dureuil
a04041c8f2
Only spawn the pool once
2024-06-19 16:25:33 +02:00
meili-bors[bot]
e580d6b98f
Merge #4693
...
4693: Introduce distinct attributes at search time r=irevoire a=Kerollmops
This PR fixes #4611.
### To Do
- [x] Remove the `distinguishableAttributes` settings (not even a commit about that).
- [x] Use the `filterableAttributes` to be able to use the `distinct` parameter at search.
- [x] Work on the errors and make tests.
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
2024-06-18 07:45:03 +00:00
Tamo
43875e6758
fix bug around nested fields
2024-06-17 15:59:30 +02:00
meili-bors[bot]
e9bf4c43a4
Merge #4649
...
4649: Don't store the vectors in the documents database r=dureuill a=irevoire
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4607
## What does this PR do?
- Ensure that anything falling under `_vectors` is NOT searchable, filterable or sortable
- [x] per embedder, add a roaring bitmap of documents that provide "userProvided" embeddings
- [x] in the indexing process in extract_vector_points, set the bit corresponding to the document depending on the "userProvided" subfield in the _vectors field.
- [x] in the document DB in typed chunks, when writing the _vectors field, remove all keys corresponding to an embedder
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-06-17 12:32:03 +00:00
Louis Dureuil
0a8f50695e
Fixes for Rust v1.79
2024-06-13 17:47:44 +02:00
Louis Dureuil
e35ef31738
Small changes following review
2024-06-13 14:20:48 +02:00
Louis Dureuil
3bc8f81abc
user_provided => regenerate
2024-06-12 18:12:20 +02:00
Louis Dureuil
a89eea233b
Fix vectors injection
2024-06-12 17:10:19 +02:00
Louis Dureuil
f5cf01e7d1
Rework extraction to use EmbedderAction
2024-06-12 14:50:55 +02:00
Louis Dureuil
d1dd7e5d09
In transform for removed embedders, write back their user provided vectors in documents, and clear the writers
2024-06-12 14:50:55 +02:00
Louis Dureuil
d18c1f77d7
Update embedder configs with a finer granularity
...
- no longer clear vector DB between any two embedder changes
2024-06-12 14:50:55 +02:00
Louis Dureuil
d0b05ae691
Add EmbedderAction to settings
2024-06-12 14:50:54 +02:00
Louis Dureuil
e9bf4eb100
Reformulate ParsedVectorsDiff in terms of VectorState
2024-06-12 14:11:44 +02:00
Louis Dureuil
b368105272
Add EmbedderConfigs::into_inner
2024-06-12 14:11:44 +02:00
meili-bors[bot]
e0eff08095
Merge #4685
...
4685: Fix ci tests r=dureuill a=ManyTheFish
# Pull Request
Make all of the following CI succeed:
https://github.com/meilisearch/meilisearch/actions/runs/9477183091
## Related issue
Fixes #4629
## What does this PR do?
- Change the test behavior for `swedish-recomposition` feature flag
- Remove the `-v` parameter from grep
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
2024-06-12 07:58:33 +00:00
Clément Renault
39f60abd7d
Add and modify distinct tests
2024-06-11 17:53:53 -04:00
Clément Renault
1991bd03da
Distinct at search erases the distinct in the settings
2024-06-11 17:02:39 -04:00
Clément Renault
ee39309aae
Improve errors and introduce a new InvalidSearchDistinct error code
2024-06-11 16:03:39 -04:00
Clément Renault
0d31be1494
Make the distinct work at search
2024-06-11 11:39:35 -04:00
Louis Dureuil
7cef2299cf
Fix behavior when removing a document
2024-06-11 09:45:08 +02:00
ManyTheFish
57d066595b
fix Tests almost all features
2024-06-06 17:24:50 +02:00
Clément Renault
75b2e02cd2
Log more stuff around filtering
2024-06-06 11:00:07 -04:00
Clément Renault
52d0d35b39
Revert "Reduce the universe while exploring the facet tree" because it's slower this way
...
This reverts commit 14026115f21409535772ede0ee4273f37848dd61.
2024-06-06 09:17:51 -04:00
Clément Renault
5432776132
Reduce the universe while exploring the facet tree
2024-06-06 09:17:51 -04:00
Clément Renault
66470b27e6
Use the MultiOps trait for IN operations
2024-06-06 09:17:51 -04:00
Clément Renault
0a9bd398c7
Improve the NOT operator to use the universe when possible
2024-06-06 09:17:51 -04:00
Clément Renault
7967e93c16
Skip evaluating when a universe is empty, nothing can be found
2024-06-06 09:17:51 -04:00
Clément Renault
a6f3a01c6a
Expose the universe to do efficient intersections on deserialization
2024-06-06 09:17:51 -04:00
Clément Renault
4ca4a3f954
Make the CboRoaringBitmapCodec support intersection on deserialization
2024-06-06 09:17:51 -04:00
Clément Renault
e4a69c5ac3
Introduce the FacetGroupLazyValue type
2024-06-06 09:17:50 -04:00
Clément Renault
531e3d7d6a
MultiOps trait for OR operations
2024-06-06 09:17:50 -04:00
Tamo
2cdcb703d9
fix the deletion of vectors and add a test
2024-06-06 11:39:29 +02:00
Tamo
31a793d226
fix the regeneration of the embeddings in the search
2024-06-06 11:39:29 +02:00
Tamo
d85ab23b82
rename all occurrences of user_defined to user_provided for consistency
2024-06-06 11:39:29 +02:00
Tamo
b7349910d9
implements more review comments
2024-06-06 11:39:29 +02:00
Tamo
376b3a19a7
makes clippy and fmt happy
2024-06-06 11:39:29 +02:00
Tamo
b867829ef1
remove useless dbg
2024-06-06 11:39:29 +02:00
Tamo
5d50850e12
always push the user defined vectors in arroy
2024-06-06 11:39:29 +02:00
Tamo
a73ccc78a6
forward the embedding config to the extractors
2024-06-06 11:39:28 +02:00
Tamo
9eb6f522ea
wraps the index embedding config in a struct
2024-06-06 11:37:30 +02:00
Tamo
04f6523f3c
expose a new parameter to retrieve the embedders at search time
2024-06-06 11:36:11 +02:00
Tamo
84e498299b
Remove the vectors from the documents database
2024-06-06 11:36:11 +02:00
Tamo
7a84697570
never store the _vectors as searchable or faceted fields
2024-06-06 11:36:11 +02:00
Tamo
4148fbbe85
provide a method to get all the nested fields ids from a name
2024-06-06 11:36:11 +02:00
ManyTheFish
2e50c6ec81
Update Charabia
2024-06-06 10:18:43 +02:00
ManyTheFish
30293883e0
Fix condition mistake
2024-06-05 17:30:07 +02:00
ManyTheFish
b833be46b9
Avoid running proximity when only the exact attributes change
2024-06-05 17:30:07 +02:00
ManyTheFish
0a4118329e
Put only_additional_fields to None if the difference gives an empty result.
2024-06-05 17:30:07 +02:00
ManyTheFish
261e92d7e6
Skip iterating over documents when the faceted field list doesn't change
2024-06-05 17:30:07 +02:00
ManyTheFish
5cd08979b1
iterate over the faceted fields instead of over the whole document
2024-06-05 17:30:07 +02:00
Clément Renault
a998b881f6
Cache a lot of operations to know if a field must be indexed
2024-06-05 17:30:07 +02:00
Clément Renault
b81953a65d
Add a span for the prepare_for_documents_reindexing
2024-06-05 17:30:07 +02:00
Clément Renault
091bb157f1
Add a span for the settings diff creation
2024-06-05 17:30:07 +02:00
Clément Renault
1b639ce44b
Reduce the number of complex calls to settings diff functions
2024-06-05 17:30:07 +02:00
Clément Renault
87cf8a3c94
Introduce a new way to determine the operations to perform on the fields
2024-06-05 17:30:07 +02:00
Clément Renault
0f578348f1
Introduce a dedicated function to write proximity entries in database
2024-06-05 17:30:07 +02:00
Clément Renault
fad4675abe
Give the settings diff to the write_typed_chunk_into_index function
2024-06-05 17:30:07 +02:00
Clément Renault
1ab03c4ede
Fix an issue with settings diff and * in the searchable attributes
2024-06-05 17:30:07 +02:00
Clément Renault
0c6e4b2f00
Introducing a new into_del_add_obkv_conditional_operation function
2024-06-05 17:30:07 +02:00
Clément Renault
42b3f52ef9
Introduce the SettingDiff only_additional_fields method
2024-06-05 17:30:07 +02:00
meili-bors[bot]
93f5defedc
Merge #4656
...
4656: Adding a new `searchableAttribute` no longer re-index all the attributes r=ManyTheFish a=Kerollmops
Fixes #4492.
## To Do
- [x] Do not call the `InnerSettingsDiff::only_additional_fields` function too many times
- [ ] Add tests
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-06-05 14:51:14 +00:00
ManyTheFish
33241a6b12
Fix condition mistake
2024-06-05 16:00:24 +02:00
ManyTheFish
ff87b4db26
Avoid running proximity when only the exact attributes change
2024-06-05 12:48:44 +02:00
ManyTheFish
ba9fadc8f1
Put only_additional_fields to None if the difference gives an empty result.
2024-06-05 10:51:16 +02:00
ManyTheFish
d29d4f88da
Skip iterating over documents when the faceted field list doesn't change
2024-06-04 15:31:24 +02:00
ManyTheFish
17c5ceeb9d
iterate over the faceted fields instead of over the whole document
2024-06-04 14:04:20 +02:00
meili-bors[bot]
fc584f1db3
Merge #4666
...
4666: Add a score threshold search parameter r=ManyTheFish a=dureuill
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4609
## What does this PR do?
- See [usage](https://meilisearch.notion.site/Filter-by-score-usage-224a183ce7b24ca99b6a9a8da755668a?pvs=25#95b76ded400342ba9ab3d67c734836f0) and [the known limitation](https://meilisearch.notion.site/Filter-by-score-usage-224a183ce7b24ca99b6a9a8da755668a?pvs=25#e4e32195bf0e4195b5daecdbb7a97a17)
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-06-03 08:42:44 +00:00
Louis Dureuil
2b6db6541e
Changes after review
2024-06-03 10:30:00 +02:00
meili-bors[bot]
d6bd88ce4f
Merge #4667
...
4667: Frequency matching strategy r=Kerollmops a=ManyTheFish
# Pull Request
## Related issue
Fixes #3773
## What does this PR do?
- add test for matching strategy
- implement frequency matching strategy
See the [PRD for more details](https://www.notion.so/meilisearch/Frequency-Matching-Strategy-0f3ba08833a442a39590a53a1505ab00).
[Public API](https://www.notion.so/meilisearch/frequency-matching-strategy-89868fb7fc584026bc56e378eb854a7f).
Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-05-30 14:53:31 +00:00
Clément Renault
b9a0ff0dd6
Cache a lot of operations to know if a field must be indexed
2024-05-30 16:18:23 +02:00
Clément Renault
75496af985
Add a span for the prepare_for_documents_reindexing
2024-05-30 12:14:22 +02:00
Clément Renault
0e9eb9eedb
Add a span for the settings diff creation
2024-05-30 12:08:27 +02:00
ManyTheFish
3f1a510069
Add tests and fix matching strategy
2024-05-30 12:02:42 +02:00
Clément Renault
3a78e988da
Reduce the number of complex calls to settings diff functions
2024-05-30 11:23:07 +02:00
Clément Renault
d9e5074189
Introduce a new way to determine the operations to perform on the fields
2024-05-30 11:23:07 +02:00
Clément Renault
bc210bdc00
Introduce a dedicated function to write proximity entries in database
2024-05-30 11:23:06 +02:00
Clément Renault
4bf83f701c
Give the settings diff to the write_typed_chunk_into_index function
2024-05-30 11:23:06 +02:00
Clément Renault
db3887929f
Fix an issue with settings diff and * in the searchable attributes
2024-05-30 11:22:50 +02:00
Clément Renault
9af103a88e
Introducing a new into_del_add_obkv_conditional_operation function
2024-05-30 11:22:49 +02:00
Clément Renault
99211eb375
Introduce the SettingDiff only_additional_fields method
2024-05-30 11:22:49 +02:00
Louis Dureuil
4f03b0cf5b
Add ranking score threshold to similar
2024-05-30 11:20:50 +02:00
Louis Dureuil
c26db7878c
Expose rankingScoreThreshold in API
2024-05-30 10:32:35 +02:00
ManyTheFish
1ab88e10b9
Merge branch 'main' into merge-release-v1.8.1-in-main
2024-05-29 16:24:00 +02:00
Louis Dureuil
aac1d769a7
Add ranking_score_threshold to milli
2024-05-29 14:17:09 +02:00
ManyTheFish
abdc4afcca
Implement Frequency matching strategy
2024-05-29 13:59:08 +02:00
Many the fish
e1fbfde6c4
Merge branch 'main' into merge-release-v1.8.1-in-main
2024-05-29 11:31:03 +02:00
ManyTheFish
27b75ec648
merge main into v1.8.1
2024-05-29 11:26:07 +02:00
Louis Dureuil
ca6cc4654b
Add similar route
2024-05-28 15:28:19 +02:00
Louis Dureuil
d35278320e
Add support functions for accessing arroy writers and readers
2024-05-28 15:27:43 +02:00
Louis Dureuil
02b3d82c60
filtered_universe accepts index and txn instead of SearchContext
2024-05-28 15:22:12 +02:00
Louis Dureuil
fd2c95999d
Change validate_document_id to public and remove extra layer of result
2024-05-28 15:21:19 +02:00
Clément Renault
dc949ab46a
Remove puffin usage
2024-05-27 15:59:14 +02:00
Clément Renault
7f3e51349e
Remove puffin for the dependencies
2024-05-27 15:53:06 +02:00
meili-bors[bot]
19acc65ad2
Merge #4646
...
4646: Reduce `Transform`'s disk usage r=Kerollmops a=Kerollmops
This PR implements what is described in #4485. It reduces the number of disk writes and disk usage.
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-05-23 16:06:50 +00:00
Clément Renault
fe17c0f52e
Construct the minimal OBKVs according to the settings diff
2024-05-23 11:23:57 +02:00
Clément Renault
bc5663e673
FieldIdsMap no longer useful thanks to #4631
2024-05-22 16:06:15 +02:00
Louis Dureuil
8a941c0241
Smaller review changes
2024-05-22 14:44:42 +02:00
Louis Dureuil
3412e7fbcf
"[]" is deserialized as 0 embedding rather than 1 embedding of dim 0
2024-05-22 12:25:21 +02:00
Louis Dureuil
16037e2169
Don't remove embedders that are not in the config from the document DB
2024-05-22 12:24:51 +02:00
Louis Dureuil
8f7c8ca7f0
Remove now unused error variant
2024-05-22 12:23:43 +02:00
Clément Renault
500ddc76b5
Make the flattened sorter optional
2024-05-21 16:16:36 +02:00
Clément Renault
943f8dba0c
Make clippy happy
2024-05-21 14:58:41 +02:00
Clément Renault
1aa8ed9ef7
Make the original sorter optional
2024-05-21 14:53:26 +02:00
ManyTheFish
f762307838
Fix clippy
2024-05-21 13:44:20 +02:00
ManyTheFish
3e94a90722
Fixes
2024-05-21 13:39:46 +02:00
Louis Dureuil
b17cb56dee
Test array of vectors
2024-05-20 14:44:10 +02:00
ManyTheFish
fc7e817221
Index geo points based on the settings differences
2024-05-20 12:27:26 +02:00
Louis Dureuil
d05d49ffd8
Fix tests
2024-05-20 10:36:18 +02:00
Louis Dureuil
0462ebbe58
Don't write an empty _vectors field
2024-05-20 10:36:18 +02:00
Louis Dureuil
2f7a8a4efb
Don't write vectors that weren't autogenerated in document DB
2024-05-20 10:36:18 +02:00
Louis Dureuil
52d9cb6e5a
Refactor vector indexing
...
- use the parsed_vectors module
- only parse `_vectors` once per document, instead of once per embedder per document
2024-05-20 10:36:17 +02:00
Louis Dureuil
261de888b7
Add function to get the embeddings of a document in an index
2024-05-20 10:36:17 +02:00
Louis Dureuil
98c811247e
Add parsed vectors module
2024-05-20 10:25:59 +02:00
Tamo
273c6e8c5c
uses the latest version of heed to get rid of unsafe code
2024-05-16 18:31:32 +02:00
Tamo
897d25780e
update milli to latest version
2024-05-16 18:31:32 +02:00
Tamo
f2d0a59f1d
when no searchable attributes are defined, make all the weights equal to zero
2024-05-16 01:06:33 +02:00
Tamo
c78a2fa4f5
rename method and variable around the attributes to search on feature
2024-05-15 18:04:42 +02:00
Tamo
5542f1d9f1
get back to what we were doing before in the DB cache and with the restricted field id
2024-05-15 18:00:39 +02:00
Tamo
ad4d8502b3
stops storing the whole fieldids weights map when no searchable attributes are defined
2024-05-15 17:16:10 +02:00
Tamo
7ec4e2a3fb
apply all style review comments
2024-05-15 15:02:26 +02:00
Tamo
9fffb8e83d
make clippy happy
2024-05-14 17:36:32 +02:00
Tamo
caa6a7149a
make the attribute ranking rule use the weights and fix the tests
2024-05-14 17:36:32 +02:00
Tamo
a0082c4df9
add a failing test on the attribute ranking rule
2024-05-14 17:00:02 +02:00
Tamo
b0afe0972e
stop updating the fields ids map when fields are only swapped
2024-05-14 17:00:02 +02:00
Tamo
9ecde41853
add a test on the current behaviour
2024-05-14 17:00:02 +02:00
Tamo
685f452fb2
Fix the indexing of the searchable
2024-05-14 17:00:02 +02:00
Tamo
4e4a1ddff7
gate a test behind the required feature
2024-05-14 17:00:02 +02:00
Tamo
c22460045c
Stops returning an option in the internal searchable fields
2024-05-14 17:00:02 +02:00
Clément Renault
ac4bc143c4
Bump ureq to v2.9.7
2024-05-07 10:39:38 +02:00
meili-bors[bot]
4d5971f343
Merge #4621
...
4621: Bring back changes from v1.8.0 into main r=curquiza a=curquiza
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-05-06 13:46:39 +00:00
Louis Dureuil
f4dd73ec8c
Destructure EmbedderOptions so we don't miss some options
2024-05-02 15:39:36 +02:00
ManyTheFish
88174b8ae4
Update charabia v0.8.10
2024-04-30 14:30:23 +02:00
meili-bors[bot]
ebca29f3de
Merge #4597
...
4597: Fix embeddings settings update r=ManyTheFish a=ManyTheFish
# Pull Request
- add some conditions reducing the work done when changing the settings
- add some benchmarks on embedders
## Related issue
Fixes #4585
Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-04-25 16:37:28 +00:00
meili-bors[bot]
c793b6ef6d
Merge #4600
...
4600: Fix embedders api r=ManyTheFish a=ManyTheFish
# Pull Request
## Related issue
Fixes #4594
Fixes #4595
Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-04-25 13:16:33 +00:00
Clément Renault
d4aeff92d0
Introduce the ThreadPoolNoAbort wrapper
2024-04-24 16:40:12 +02:00
ManyTheFish
9b76501875
Display set API key for Ollama embedder
2024-04-24 12:33:07 +02:00
Clément Renault
b3173d0423
Remove useless dots in the error messages
2024-04-22 18:09:33 +02:00
Clément Renault
96cc5319c8
Introduce a new internal error type to categorize panics
2024-04-22 18:09:33 +02:00
Clément Renault
0c7003c5df
Introduce an atomic to catch panics in thread pools
2024-04-22 18:09:33 +02:00
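The commits above mention an atomic used to catch panics in thread pools. A minimal sketch of that general pattern follows, assuming plain `std::thread` workers rather than the pool the project actually uses; `Job` and `run_jobs_no_abort` are hypothetical names made up for this illustration.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// A unit of work that can be sent to another thread (hypothetical alias).
type Job = Box<dyn FnOnce() + Send + 'static>;

/// Runs every job on its own thread. If any job panics, the panic is caught,
/// a shared atomic flag is raised, and the caller gets an `Err` instead of
/// the whole process aborting.
fn run_jobs_no_abort(jobs: Vec<Job>) -> Result<(), String> {
    let panicked = Arc::new(AtomicBool::new(false));

    let handles: Vec<_> = jobs
        .into_iter()
        .map(|job| {
            let panicked = Arc::clone(&panicked);
            thread::spawn(move || {
                // Catch the panic inside the worker and record it in the flag.
                if catch_unwind(AssertUnwindSafe(move || job())).is_err() {
                    panicked.store(true, Ordering::SeqCst);
                }
            })
        })
        .collect();

    for handle in handles {
        // The worker already caught its own panic, so join is not expected to fail.
        let _ = handle.join();
    }

    if panicked.load(Ordering::SeqCst) {
        Err("a job panicked".to_string())
    } else {
        Ok(())
    }
}

fn main() {
    let mut jobs: Vec<Job> = Vec::new();
    jobs.push(Box::new(|| {}));
    jobs.push(Box::new(|| {
        panic!("boom");
    }));
    println!("{:?}", run_jobs_no_abort(jobs));
}
```

Turning the panic into an error value lets the caller report it as an internal error rather than losing the process.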
ManyTheFish
a1aa999026
Add conditions reducing work
2024-04-22 14:18:35 +02:00
ManyTheFish
c71b5d09ff
Update charabia v0.8.9
2024-04-18 11:38:26 +02:00
writegr
ab43a8a949
chore: fix some typos in comments
...
Signed-off-by: writegr <wellweek@outlook.com>
2024-04-18 14:12:52 +08:00
meili-bors[bot]
4a8459b799
Merge #4576
...
4576: increase the default search time budget from 150ms to 1.5s r=ManyTheFish a=irevoire
# Pull Request
## Related issue
Fixes #4575
## What does this PR do?
- increase the default search time budget from 150ms to 1.5s
Co-authored-by: Tamo <tamo@meilisearch.com>
2024-04-17 16:04:47 +00:00
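To make the notion of a search time budget from #4576 concrete, here is a minimal sketch of a budget check between units of work. Only the 1.5 s default comes from the PR description; the `TimeBudget` type, its methods, and `run_with_budget` are illustrative names that may not match the codebase.

```rust
use std::time::{Duration, Instant};

/// Default budget taken from the PR description: 1.5 s (previously 150 ms).
const DEFAULT_BUDGET: Duration = Duration::from_millis(1500);

/// Tracks how long a search has been running (illustrative type).
struct TimeBudget {
    started_at: Instant,
    budget: Duration,
}

impl TimeBudget {
    fn new(budget: Duration) -> Self {
        Self { started_at: Instant::now(), budget }
    }

    /// Returns true once the elapsed time has reached the budget.
    fn exceeded(&self) -> bool {
        self.started_at.elapsed() >= self.budget
    }
}

/// Runs up to `steps` units of work, checking the budget before each one.
/// Returns how many steps completed before the budget ran out.
fn run_with_budget<F: FnMut(usize)>(steps: usize, budget: &TimeBudget, mut step: F) -> usize {
    let mut done = 0;
    for i in 0..steps {
        if budget.exceeded() {
            break;
        }
        step(i);
        done += 1;
    }
    done
}

fn main() {
    let budget = TimeBudget::new(DEFAULT_BUDGET);
    let done = run_with_budget(3, &budget, |i| println!("step {i}"));
    println!("completed {done} steps");
}
```

A search that runs out of budget in this sketch simply stops early and reports how far it got, rather than failing.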
Clément Renault
c923adf222
Fix facet distribution for alpha on facet numbers
2024-04-17 16:31:16 +02:00
ManyTheFish
df29ba709a
Make some cleaning in Arcs
2024-04-17 12:33:25 +02:00