Commit Graph

58 Commits

Author SHA1 Message Date
Clément Renault
2d3f15f82c
Introduce a function to only serialize the Add side of a DelAdd obkv 2023-10-30 11:34:55 +01:00
Clément Renault
40186bf403
Rename FieldIdWordCountDocids correctly 2023-10-30 11:34:50 +01:00
ManyTheFish
2597bbd107
Make the script language docids map take a tuple of roaring bitmaps expressing the deletions and the additions 2023-10-30 11:34:00 +01:00
ManyTheFish
313b16bec2
Support diff indexing on extract_docid_word_positions 2023-10-30 11:24:19 +01:00
ManyTheFish
1c5705c164
clean PR warnings 2023-10-30 11:22:05 +01:00
ManyTheFish
df9e5c8651
Generalize usage of CboRoaringBitmap codec to ease the use 2023-10-30 11:15:02 +01:00
ManyTheFish
748b333161
Add useful debug assert before key insertion in database 2023-10-30 11:13:10 +01:00
ManyTheFish
17b647dfe5
Wip 2023-10-30 11:13:08 +01:00
Tamo
d772073dfa use a bufreader every time there is a grenad<file> 2023-10-10 15:00:30 +02:00
ManyTheFish
b45c36cd71 Merge branch 'main' into tmp-release-v1.3.0 2023-08-01 15:05:17 +02:00
Kerollmops
29ab54b259
Replace the hnsw crate by the instant-distance one 2023-07-25 12:37:35 +02:00
Kerollmops
eef95de30e
First iteration on exposing puffin profiling 2023-07-18 17:38:13 +02:00
Clément Renault
30741d17fa
Change the TODO message 2023-06-27 12:32:43 +02:00
Kerollmops
3e3c743392
Make Rustfmt happy 2023-06-27 12:32:41 +02:00
Kerollmops
ab9f2269aa
Normalize the vectors during indexation and search 2023-06-27 12:32:41 +02:00
Kerollmops
321ec5f3fa
Accept multiple vectors per document using the _vectors field 2023-06-27 12:32:40 +02:00
Kerollmops
a7e0f0de89
Introduce a new error message for invalid vector dimensions 2023-06-27 12:32:40 +02:00
Kerollmops
c79e82c62a
Move back to the hnsw crate
This reverts commit 7a4b6c065482f988b01298642f4c18775503f92f.
2023-06-27 12:32:39 +02:00
Kerollmops
aca305bb77
Log more to make sure we insert vectors in the hgg data-structure 2023-06-27 12:32:38 +02:00
Kerollmops
268a9ef416
Move to the hgg crate 2023-06-27 12:32:38 +02:00
Clément Renault
4571e512d2
Store the vectors in an HNSW in LMDB 2023-06-27 12:32:38 +02:00
Clément Renault
7ac2f1489d
Extract the vectors from the documents 2023-06-27 12:32:37 +02:00
Loïc Lecrenier
8628a0c856 Remove docid_word_positions_db + fix deletion bug
That would happen when a word was deleted from all exact attributes
but not all regular attributes.
2023-06-07 10:52:50 +02:00
Louis Dureuil
90bc230820
Merge remote-tracking branch 'origin/main' into search-refactor
Conflicts | resolution
----------|-----------
Cargo.lock | added mimalloc
Cargo.toml | took origin/main version
milli/src/search/criteria/exactness.rs | deleted after checking it was only clippy changes
milli/src/search/query_tree.rs | deleted after checking it was only clippy changes
2023-05-03 12:19:06 +02:00
Loïc Lecrenier
130d2061bd Fix indexing of word_position_docid and fid 2023-04-06 17:50:39 +02:00
Clément Renault
ea016d97af
Implementing an IS EMPTY filter 2023-03-15 14:12:34 +01:00
Clément Renault
43ff236df8
Write the NULL facet values in the database 2023-03-08 16:49:53 +01:00
ManyTheFish
bbecab8948 fix clippy 2023-02-21 10:18:44 +01:00
f3r10
fd60a39f1c Format code 2023-01-31 11:28:05 +01:00
f3r10
d97fb6117e Extract and index data 2023-01-31 11:28:05 +01:00
Loïc Lecrenier
b1ab09196c Remove outdated TODOs 2022-10-26 13:47:04 +02:00
Loïc Lecrenier
bee3c23b45 Add comparison benchmark between bulk and incremental facet indexing 2022-10-26 13:47:04 +02:00
Loïc Lecrenier
9b55e582cd Add FacetsUpdate type that wraps incremental and bulk indexing methods 2022-10-26 13:47:04 +02:00
Loïc Lecrenier
61252248fb Fix some facet indexing bugs 2022-10-26 13:47:04 +02:00
Loïc Lecrenier
85824ee203 Try to make facet indexing incremental 2022-10-26 13:47:04 +02:00
Loïc Lecrenier
bd2c0e1ab6 Remove unused code 2022-10-26 13:46:14 +02:00
Loïc Lecrenier
c3f49f766d Prepare refactor of facets database
2022-10-26 13:46:14 +02:00
Ewan Higgs
beb987d3d1 Fixing piles of clippy errors.
Most of these are calling clone when the struct supports Copy.

Many are using & and &mut on `self` when the function they are called
from already has an immutable or mutable borrow so this isn't needed.

I tried to stay away from actual changes or places where I'd have to
name fresh variables.
2022-10-13 22:02:54 +02:00
Loïc Lecrenier
1506683705 Avoid using too much memory when indexing facet-exists-docids 2022-07-19 14:42:35 +02:00
Loïc Lecrenier
aed8c69bcb Refactor indexation of the "facet-id-exists-docids" database
The idea is to directly create a sorted and merged list of bitmaps
in the form of a BTreeMap<FieldId, RoaringBitmap> instead of creating
a grenad::Reader where the keys are field_id and the values are docids.

Then we send that BTreeMap to the thing that handles TypedChunks, which
inserts its content into the database.
2022-07-19 10:07:33 +02:00
Loïc Lecrenier
453d593ce8 Add a database containing the docids where each field exists 2022-07-19 10:07:33 +02:00
ad hoc
b799f3326b
rename merge_nothing to merge_ignore_values 2022-04-05 18:44:35 +02:00
ad hoc
0a77be4ec0
introduce exact_word_docids db 2022-04-04 20:54:02 +02:00
Clément Renault
ff8d7a810d
Change the behavior of the as_cloneable_grenad by taking a ref 2022-02-16 15:40:08 +01:00
Clément Renault
f367cc2e75
Finally bump grenad to v0.4.1 2022-02-16 15:28:48 +01:00
Tamo
98a365aaae
store the geopoint in three dimensions 2021-12-14 12:21:24 +01:00
many
3296bb243c
Simplify word level position DB into a word position DB 2021-10-05 12:15:02 +02:00
Irevoire
a84f3a8b31
Apply suggestions from code review
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-09 15:09:35 +02:00
Tamo
bd4c248292
improve the error handling in general and introduce the concept of reserved keywords 2021-09-08 18:24:09 +02:00
Irevoire
ea2f2ecf96
create a new database containing all the documents that were geo-faceted 2021-09-08 17:51:08 +02:00