ManyTheFish
5b20e625f3
fix merge
2023-11-02 15:31:37 +01:00
ManyTheFish
bc51d6157a
Fix transform reindexing path
2023-11-02 15:26:20 +01:00
ManyTheFish
1b4ff991c0
update typed chunks
2023-11-02 15:26:20 +01:00
ManyTheFish
4b64c33aa2
update vector extractor
2023-11-02 15:26:20 +01:00
ManyTheFish
12323d610e
Change the original document sorter key from the internal docid to a concatenation of the internal and the external docid
2023-11-02 15:26:20 +01:00
Clément Renault
4d864f0702
Always sort internal Sorter entries in parallel
2023-11-02 14:47:43 +01:00
Clément Renault
c71b1d33ae
Sort entries using rayon in the transform sorters
2023-11-01 11:07:16 +01:00
Clément Renault
0fc446c62f
Add more timing logs to the Transform
2023-11-01 11:07:16 +01:00
Louis Dureuil
b1d1355b69
remove tests on soft-deleted
2023-10-31 16:36:27 +01:00
Louis Dureuil
f19332466e
Extract field value as values instead of Option<Value>
2023-10-31 16:36:27 +01:00
Louis Dureuil
da0503ef80
Fix document count
2023-10-31 16:36:27 +01:00
Louis Dureuil
b40253bf18
update snapshots
2023-10-31 10:30:48 +01:00
Louis Dureuil
d8bf3f3fc2
Remove unused snapshots
2023-10-31 10:12:49 +01:00
Louis Dureuil
9d59e8011a
fix some tests
2023-10-31 10:08:36 +01:00
Louis Dureuil
4e91707a06
Rename test
2023-10-31 09:41:17 +01:00
Louis Dureuil
de10f20732
Fix field distribution again
2023-10-30 17:47:22 +01:00
Louis Dureuil
be395c7944
Change order of arguments to tokenizer_builder
2023-10-30 16:26:29 +01:00
Louis Dureuil
9fedd8101a
Fix tests
2023-10-30 15:11:07 +01:00
Louis Dureuil
54d07a8da3
Update field distribution taking into account both deletions and additions
2023-10-30 14:47:51 +01:00
Louis Dureuil
58690dfb19
Fix tests compilation after changes to ExternalDocumentsIds API
2023-10-30 13:34:07 +01:00
Louis Dureuil
abf424ebfc
Remove unused FromIterator
2023-10-30 11:41:56 +01:00
Clément Renault
dfab6293c9
Use an LMDB database to store the external documents ids
2023-10-30 11:41:23 +01:00
Louis Dureuil
fdf3f7f627
Fix facet distribution test
2023-10-30 11:41:23 +01:00
Louis Dureuil
6260cff65f
Actually delete documents from DB when the merge function says so
2023-10-30 11:41:22 +01:00
Louis Dureuil
8e0d9c9a5e
Recover delete_documents tests that were too eagerly deleted
2023-10-30 11:41:22 +01:00
Louis Dureuil
a35988550c
Fix some snapshots
2023-10-30 11:41:22 +01:00
Louis Dureuil
e78281785c
Actually execute the transform even if there are only documents to delete
2023-10-30 11:41:22 +01:00
Louis Dureuil
290e773d23
remove more warnings and fix some tests
2023-10-30 11:41:22 +01:00
Louis Dureuil
113527f466
Remove soft-deleted related methods from Index
2023-10-30 11:41:22 +01:00
Louis Dureuil
c534a1b687
Stop using delete documents pipeline in batch runner
2023-10-30 11:41:22 +01:00
Louis Dureuil
2263dff02b
Stop using removed delete pipelines almost everywhere
2023-10-30 11:41:22 +01:00
ManyTheFish
762b0b47e6
Use deladd merging function in chunks mergers
2023-10-30 11:40:20 +01:00
Louis Dureuil
01d5eedf2f
Remove some warnings
2023-10-30 11:40:20 +01:00
Louis Dureuil
85f42fbc03
Handle external to internal id mapping from TypedChunk::Documents
2023-10-30 11:40:20 +01:00
Louis Dureuil
c6b3c18c85
WIP: Comment out document deletion in other pipelines than update
...
TODO: fix calls to DELETE route
2023-10-30 11:40:20 +01:00
Louis Dureuil
946c762d28
WIP: reset documents in TypedChunk::Documents
2023-10-30 11:40:20 +01:00
Louis Dureuil
cda6ca1ee6
Remove TypedChunk::NewDocumentIds
2023-10-30 11:40:18 +01:00
Louis Dureuil
696fcf4d18
Fix document insertion into LMDB
2023-10-30 11:39:31 +01:00
ManyTheFish
476e4d3dbe
Use value buffer instead of the initial value when writing the final result in the sorter
2023-10-30 11:39:31 +01:00
Clément Renault
576fa9c6da
Remove useless comment
2023-10-30 11:39:31 +01:00
Kerollmops
77dcbff6b2
Remove and Insert the DelAdd geo points
2023-10-30 11:39:31 +01:00
Kerollmops
544440c363
Ignore geo fields when the Del and Add content is the same
2023-10-30 11:39:31 +01:00
Clément Renault
a3dae4db9b
Extract the geo fields DelAdd and generate a new DelAdd obkv with it
2023-10-30 11:39:31 +01:00
ManyTheFish
ba90a5ec0e
update extract fid word count docids
2023-10-30 11:39:31 +01:00
Louis Dureuil
59f88c14b3
Simplify facet update after removing Index::faceted_documents_ids
2023-10-30 11:39:29 +01:00
Louis Dureuil
14832cb324
Remove Index::faceted_documents_ids
2023-10-30 11:37:32 +01:00
Clément Renault
560e8f5613
Introduce the CboRoaringBitmapCodec merge_deladd_into and use it
2023-10-30 11:34:55 +01:00
Clément Renault
2d3f15f82c
Introduce a function to only serialize the Add side of a DelAdd obkv
2023-10-30 11:34:55 +01:00
Clément Renault
40186bf403
Rename FieldIdWordCountDocids correctly
2023-10-30 11:34:50 +01:00
ManyTheFish
87e3d27878
update extract word pair proximity to support deladd obkvs
2023-10-30 11:34:02 +01:00
ManyTheFish
6bcf8b4f8c
update extract word position docids
2023-10-30 11:34:02 +01:00
ManyTheFish
46aa75abdb
update extract word docids
2023-10-30 11:34:02 +01:00
ManyTheFish
2597bbd107
Make the script language docids map take a tuple of roaring bitmaps expressing the deletions and the additions
2023-10-30 11:34:00 +01:00
Clément Renault
e2bc054604
Update extract_facet_string_docids to support deladd obkvs
2023-10-30 11:32:36 +01:00
Clément Renault
fcd3a1434d
Update extract_facet_number_docids to support deladd obkvs
2023-10-30 11:31:04 +01:00
Clément Renault
a82dee21e0
Rename docid_fid into fid_docid
2023-10-30 11:31:02 +01:00
Clément Renault
bc45c1206d
Implement all the facet extraction paths and simplify them
2023-10-30 11:29:08 +01:00
Clément Renault
6ae4100f07
Generate the DelAdd for is_null, is_empty, and exists
2023-10-30 11:29:08 +01:00
Clément Renault
0c47defeee
Work on fid docid facet values rewrite
2023-10-30 11:29:06 +01:00
ManyTheFish
313b16bec2
Support diff indexing on extract_docid_word_positions
2023-10-30 11:24:19 +01:00
ManyTheFish
1dd97578a8
Make the transform struct return diff-based documents obkvs
2023-10-30 11:22:07 +01:00
ManyTheFish
f5ef69293b
deactivate prefix dbs
2023-10-30 11:22:07 +01:00
ManyTheFish
1c5705c164
clean PR warnings
2023-10-30 11:22:05 +01:00
ManyTheFish
66c2c82a18
Split wpp in several sorters
2023-10-30 11:15:02 +01:00
ManyTheFish
28a8d0ccda
Fix word pair proximity
2023-10-30 11:15:02 +01:00
ManyTheFish
96be85396d
Use a vecDeque in wpp database
2023-10-30 11:15:02 +01:00
ManyTheFish
df9e5c8651
Generalize usage of CboRoaringBitmap codec to ease the use
2023-10-30 11:15:02 +01:00
ManyTheFish
b541d48847
Add buffer to the obkv writer
2023-10-30 11:15:02 +01:00
ManyTheFish
8ccf32d1a0
Compute word_fid_docids before word_docids and exact_word_docids
2023-10-30 11:15:02 +01:00
ManyTheFish
db1ca21231
add puffin in sorter into reader function
2023-10-30 11:15:00 +01:00
ManyTheFish
11ea5acff9
Fix
2023-10-30 11:13:10 +01:00
ManyTheFish
8d77736a67
Fix fid_word_docids
2023-10-30 11:13:10 +01:00
ManyTheFish
748b333161
Add useful debug assert before key insertion in database
2023-10-30 11:13:10 +01:00
ManyTheFish
17b647dfe5
Wip
2023-10-30 11:13:08 +01:00
Tamo
c0f2724c2d
get rid of the newly introduced error code in favor of an io::Error
2023-10-10 15:12:23 +02:00
Tamo
d772073dfa
use a bufreader every time there is a grenad<file>
2023-10-10 15:00:30 +02:00
meili-bors[bot]
487d493f49
Merge #4043
...
4043: Bring back hotfixes from v1.3.3 into v1.4.0 r=Kerollmops a=curquiza
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: curquiza <clementine@meilisearch.com>
2023-09-11 12:27:34 +00:00
meili-bors[bot]
256cf33bca
Merge #4039
...
4039: Fix multiple vectors dimensions r=ManyTheFish a=Kerollmops
This PR fixes #4035, making it possible to provide multiple vectors in documents. This is fixed by extracting the vectors from the non-flattened version of the documents.
Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-09-07 09:25:58 +00:00
Kerollmops
679c0b0f97
Extract the vectors from the non-flattened version of the documents
2023-09-06 12:26:00 +02:00
Kerollmops
e02d0064bd
Add a test case scenario
2023-09-06 12:26:00 +02:00
ManyTheFish
66aa6d5871
Ignore tokens with empty normalized value during indexing process
2023-09-05 15:44:14 +02:00
meili-bors[bot]
ccf3ba3f32
Merge #4019
...
4019: Bringing back changes from `v1.3.2` onto `main` r=irevoire a=Kerollmops
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: irevoire <irevoire@users.noreply.github.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-08-28 12:14:11 +00:00
Kerollmops
c53841e166
Accept the null JSON value as the value of _vectors
2023-08-14 16:03:55 +02:00
meili-bors[bot]
e4e49e63d0
Merge #3993
...
3993: Bringing back changes from v1.3.1 to `main` r=irevoire a=curquiza
Co-authored-by: irevoire <irevoire@users.noreply.github.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-08-10 14:30:02 +00:00
ManyTheFish
5a7c1bde84
Fix clippy
2023-08-10 11:27:56 +02:00
ManyTheFish
6b2d671be7
Fix PR comments
2023-08-10 10:44:07 +02:00
Many the fish
43c13faeda
Update milli/src/update/index_documents/extract/extract_docid_word_positions.rs
...
Co-authored-by: Tamo <tamo@meilisearch.com>
2023-08-10 10:05:03 +02:00
meili-bors[bot]
44c1900f36
Merge #3986
...
3986: Fix geo bounding box with strings r=ManyTheFish a=irevoire
# Pull Request
When sending a document with one geo field of type string (e.g. `{ "_geo": { "lat": 12, "lng": "13" }}`), the geo bounding box would exclude this document.
This PR fixes this issue by automatically parsing the string value when we're working on a geo field.
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3973
## What does this PR do?
- Automatically parse the facet value if and only if we're working on a geo field.
- Make insta work with snapshots in loops or closures executed multiple times (you may need to update your CLI if it panics after this PR: `cargo install cargo-insta`).
- Add one integration test in milli and in meilisearch to ensure it works forever.
- Add three snapshots for the dump that had mysteriously disappeared.
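
The parsing step described in this PR can be pictured with a minimal sketch. It assumes a simplified, hypothetical `GeoValue` type and a hypothetical `parse_geo_coordinate` helper; neither name comes from the Meilisearch code base, and the real change works on the engine's own JSON and facet types.

```rust
/// Hypothetical, simplified stand-in for a JSON value found under `_geo`.
#[derive(Debug, Clone, PartialEq)]
enum GeoValue {
    Number(f64),
    Text(String),
}

/// Returns the coordinate as a float, accepting both `12` and `"13"`.
/// Strings that are not finite numbers yield `None`.
fn parse_geo_coordinate(value: &GeoValue) -> Option<f64> {
    let parsed = match value {
        // Numbers are used as they are.
        GeoValue::Number(n) => Some(*n),
        // Strings are parsed, which is the case the PR description covers.
        GeoValue::Text(s) => s.trim().parse::<f64>().ok(),
    };
    // Reject NaN and infinities, which `str::parse::<f64>` accepts.
    parsed.filter(|n| n.is_finite())
}
```

With this sketch, both fields of `{ "lat": 12, "lng": "13" }` resolve to floats, so a bounding-box comparison sees the same kind of value on each side.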
Co-authored-by: Tamo <tamo@meilisearch.com>
2023-08-09 07:58:15 +00:00
ManyTheFish
35758db9ec
Truncate the normalized long facets used in search for facet value
2023-08-08 16:38:30 +02:00
Tamo
9d061cec26
automatically parse the filterable attribute to float if it's a geo field
2023-08-08 16:28:07 +02:00
ManyTheFish
4a21fecf67
Merge branch 'main' into settings-customizing-tokenization
2023-08-08 16:08:16 +02:00
ManyTheFish
b45c36cd71
Merge branch 'main' into tmp-release-v1.3.0
2023-08-01 15:05:17 +02:00
meili-bors[bot]
be72be7c0d
Merge #3942
...
3942: Normalize for the search the facets values r=ManyTheFish a=Kerollmops
This PR improves and fixes the search for facet values feature. Searching for _bre_ wasn't returning facet values like _brévent_ or _brô_.
The issue was related to the fact that facets are normalized but not in the same way as the `searchableAttributes` are. We decided to normalize them further and add another intermediate database where the key is the normalized facet value, and the value is a set of the non-normalized facets. We then use these non-normalized ones to get the correct counts by fetching the associated databases.
### What's missing in this PR?
- [x] Apply the change to the whole set of `SearchForFacetValue::execute` conditions.
- [x] Factorize the code that does an intermediate normalized value fetch in a function.
- [x] Add or modify the search for facet value test.
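
The intermediate database described above (normalized facet value as key, set of non-normalized facet values as value) can be sketched in memory as follows. This is only an illustration under stated assumptions: `normalize` is a hypothetical toy normalizer that lowercases and strips a few accents, `NormalizedFacets` is a hypothetical name, and a `BTreeMap` stands in for the on-disk database.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical toy normalizer: lowercases and strips a handful of accents.
/// (Assumption: the real normalization is richer than this.)
fn normalize(value: &str) -> String {
    value
        .chars()
        .flat_map(char::to_lowercase)
        .map(|c| match c {
            'é' | 'è' | 'ê' => 'e',
            'ô' | 'ö' => 'o',
            other => other,
        })
        .collect()
}

/// Maps each normalized facet value to the original values that produce it.
#[derive(Default)]
struct NormalizedFacets {
    by_normalized: BTreeMap<String, BTreeSet<String>>,
}

impl NormalizedFacets {
    /// Registers an original (non-normalized) facet value.
    fn insert(&mut self, original: &str) {
        self.by_normalized
            .entry(normalize(original))
            .or_default()
            .insert(original.to_string());
    }

    /// Returns the original facet values whose normalized form starts with
    /// the normalized query, in key order.
    fn search_prefix(&self, query: &str) -> Vec<String> {
        let prefix = normalize(query);
        self.by_normalized
            .range(prefix.clone()..)
            .take_while(|(key, _)| key.starts_with(&prefix))
            .flat_map(|(_, originals)| originals.iter().cloned())
            .collect()
    }
}
```

The original values returned by `search_prefix` are what would then be used to fetch counts from the associated facet databases; that lookup is left out of the sketch.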
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-07-25 14:37:17 +00:00
Kerollmops
29ab54b259
Replace the hnsw crate by the instant-distance one
2023-07-25 12:37:35 +02:00
ManyTheFish
9c485f8563
Make the search and the indexing work
2023-07-24 18:35:20 +02:00
Clément Renault
df528b41d8
Normalize for the search the facets values
2023-07-20 17:57:07 +02:00
Kerollmops
eef95de30e
First iteration on exposing puffin profiling
2023-07-18 17:38:13 +02:00
Louis Dureuil
324d448236
Format let-else ❤️ 🎉
2023-07-03 10:20:28 +02:00
ManyTheFish
a82c49ab08
Update test
2023-06-29 15:56:36 +02:00
ManyTheFish
84845de9ef
Update Charabia
2023-06-29 15:56:32 +02:00