Loïc Lecrenier
d76d0cb1bf
Merge branch 'main' into word-pair-proximity-docids-refactor
2022-10-24 15:23:00 +02:00
Loïc Lecrenier
a7de4f5b85
Don't add swapped word pairs to the word_pair_proximity_docids db
2022-10-18 10:37:34 +02:00
Loïc Lecrenier
bdeb47305e
Change encoding of word_pair_proximity DB to (proximity, word1, word2)
...
Same for word_prefix_pair_proximity
2022-10-18 10:37:34 +02:00
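Note on the key layout: a minimal sketch of what a (proximity, word1, word2) key could look like as bytes. The layout here (one proximity byte, then word1, a zero separator byte, then word2) and the names `encode_key` and `decode_key` are assumptions for illustration, not taken from the commit itself.

```rust
// Hypothetical layout: [proximity: u8][word1 bytes][0][word2 bytes].
// Assumes words never contain a zero byte.
fn encode_key(proximity: u8, word1: &str, word2: &str) -> Vec<u8> {
    let mut key = Vec::with_capacity(1 + word1.len() + 1 + word2.len());
    key.push(proximity);
    key.extend_from_slice(word1.as_bytes());
    key.push(0); // separator between the two words
    key.extend_from_slice(word2.as_bytes());
    key
}

// Splits a key back into its three parts, or returns None if it is malformed.
fn decode_key(bytes: &[u8]) -> Option<(u8, &str, &str)> {
    let (&proximity, rest) = bytes.split_first()?;
    let sep = rest.iter().position(|&b| b == 0)?;
    let word1 = std::str::from_utf8(&rest[..sep]).ok()?;
    let word2 = std::str::from_utf8(&rest[sep + 1..]).ok()?;
    Some((proximity, word1, word2))
}

fn main() {
    let key = encode_key(2, "hello", "world");
    assert_eq!(decode_key(&key), Some((2, "hello", "world")));

    // With the proximity byte first, keys sort by proximity before words.
    assert!(encode_key(1, "z", "z") < encode_key(2, "a", "a"));
}
```

With the proximity as the leading byte, a byte-ordered key-value store groups all pairs of a given proximity together, which is one plausible motivation for this ordering.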
Ewan Higgs
beb987d3d1
Fixing piles of clippy errors.
...
Most of these are calling clone when the struct supports Copy.
Many are using & and &mut on `self` when the function they are called
from already has an immutable or mutable borrow so this isn't needed.
I tried to stay away from actual changes or places where I'd have to
name fresh variables.
2022-10-13 22:02:54 +02:00
msvaljek
762e320c35
Add proximity calculation for the same word
2022-10-07 12:59:12 +02:00
vishalsodani
00c02d00f3
Add missing logging timer to extractors
2022-09-30 22:17:06 +05:30
Loïc Lecrenier
3794962330
Use an unstable algorithm for grenad::Sorter when possible
2022-09-13 14:49:53 +02:00
Kerollmops
fe3973a51c
Make sure that long words are correctly skipped
2022-09-07 15:03:32 +02:00
Loïc Lecrenier
306593144d
Refactor word prefix pair proximity indexation
2022-08-17 11:59:00 +02:00
Loïc Lecrenier
07003704a8
Merge branch 'filter/field-exist'
2022-07-21 14:51:41 +02:00
Loïc Lecrenier
1506683705
Avoid using too much memory when indexing facet-exists-docids
2022-07-19 14:42:35 +02:00
Loïc Lecrenier
aed8c69bcb
Refactor indexation of the "facet-id-exists-docids" database
...
The idea is to directly create a sorted and merged list of bitmaps
in the form of a BTreeMap<FieldId, RoaringBitmap> instead of creating
a grenad::Reader where the keys are field_id and the values are docids.
Then we send that BTreeMap to the thing that handles TypedChunks, which
inserts its content into the database.
2022-07-19 10:07:33 +02:00
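Note on the approach described above: a minimal sketch of building a sorted, merged map from field ids to document ids in memory. To stay dependency-free it uses `BTreeSet<u32>` as a stand-in for `RoaringBitmap`; the function name `merge_field_existence` and the type aliases are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Illustrative aliases; the real types live in the indexed crate.
type FieldId = u16;
type DocumentId = u32;

// Collects (field id, document id) pairs into one ordered map.
// BTreeSet stands in for a RoaringBitmap in this sketch.
fn merge_field_existence(
    pairs: impl IntoIterator<Item = (FieldId, DocumentId)>,
) -> BTreeMap<FieldId, BTreeSet<DocumentId>> {
    let mut map = BTreeMap::new();
    for (field_id, docid) in pairs {
        map.entry(field_id).or_insert_with(BTreeSet::new).insert(docid);
    }
    map
}

fn main() {
    let map = merge_field_existence(vec![(2, 10), (1, 7), (2, 3), (1, 7)]);

    // Keys come out sorted and duplicate docids are merged.
    for (field_id, docids) in &map {
        println!("field {field_id}: {docids:?}");
    }
    assert_eq!(map.len(), 2);
}
```

Because a `BTreeMap` iterates in key order, its content can be written to the database sequentially without a separate sort-and-merge pass.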
Loïc Lecrenier
80b962b4f4
Run cargo fmt
2022-07-19 10:07:33 +02:00
Loïc Lecrenier
30bd4db0fc
Simplify indexing task for facet_exists_docids database
2022-07-19 10:07:33 +02:00
Loïc Lecrenier
392472f4bb
Apply suggestions from code review
...
Co-authored-by: Tamo <tamo@meilisearch.com>
2022-07-19 10:07:33 +02:00
Loïc Lecrenier
453d593ce8
Add a database containing the docids where each field exists
2022-07-19 10:07:33 +02:00
Kerollmops
2eec290424
Check the validity of the latitude and longitude numbers
2022-07-12 15:14:06 +02:00
Kerollmops
d1a4da9812
Generate a real UUIDv4 when ids are auto-generated
2022-07-12 15:14:06 +02:00
Kerollmops
fcfc4caf8c
Move the Object type in the lib.rs file and use it everywhere
2022-07-12 14:55:51 +02:00
Kerollmops
0146175fe6
Introduce the validate_documents_batch function
2022-07-12 14:55:51 +02:00
ManyTheFish
86ac8568e6
Use Charabia in milli
2022-06-02 16:59:11 +02:00
Tamo
0af399a6d7
fix the mixed dataset geosearch indexing bug
2022-05-16 17:37:45 +02:00
Tamo
c55368ddd4
apply code suggestion
...
Co-authored-by: Kerollmops <kero@meilisearch.com>
2022-05-04 14:11:03 +02:00
Tamo
3cb1f6d0a1
improve geosearch error messages
2022-05-02 19:20:47 +02:00
Irevoire
4f3ce6d9cd
nested fields
2022-04-07 16:58:46 +02:00
ad hoc
201fea0fda
limit extract_word_docids memory usage
2022-04-05 14:14:15 +02:00
ad hoc
b85cd4983e
remove field_id_from_position
2022-04-05 09:50:34 +02:00
ad hoc
b7694c34f5
remove println
2022-04-04 21:00:07 +02:00
ad hoc
6cabd47c32
fix typo in comment
2022-04-04 20:59:20 +02:00
ad hoc
6b2c2509b2
fix bug in exact search
2022-04-04 20:54:03 +02:00
ad hoc
8d46a5b0b5
extract exact word docids
2022-04-04 20:54:02 +02:00
ad hoc
0a77be4ec0
introduce exact_word_docids db
2022-04-04 20:54:02 +02:00
ad hoc
5f9f82757d
refactor spawn_extraction_task
2022-04-04 20:54:02 +02:00
Clément Renault
ff8d7a810d
Change the behavior of the as_cloneable_grenad by taking a ref
2022-02-16 15:40:08 +01:00
Clément Renault
f367cc2e75
Finally bump grenad to v0.4.1
2022-02-16 15:28:48 +01:00
many
8970246bc4
Sort positions before iterating over them during word pair proximity extraction
2021-11-22 18:16:54 +01:00
many
c5a6075484
Make max_position_per_attributes changeable
2021-10-12 10:10:50 +02:00
many
360c5ff3df
Remove limit of 1000 position per attribute
...
Instead of using an arbitrary limit we encode the absolute position in a u32
using one strong u16 for the field id and a weak u16 for the relative position in the attribute.
2021-10-12 10:10:50 +02:00
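Note on the encoding described above: a minimal sketch of packing a field id and a relative position into one u32, reading "strong u16" as the high 16 bits and "weak u16" as the low 16 bits. That reading and the names `pack_position` and `unpack_position` are assumptions for illustration.

```rust
// Packs the field id into the high 16 bits and the relative
// position into the low 16 bits of a single u32.
fn pack_position(field_id: u16, relative: u16) -> u32 {
    ((field_id as u32) << 16) | (relative as u32)
}

// Recovers (field id, relative position) from a packed position.
fn unpack_position(absolute: u32) -> (u16, u16) {
    ((absolute >> 16) as u16, (absolute & 0xFFFF) as u16)
}

fn main() {
    let absolute = pack_position(3, 42);
    assert_eq!(unpack_position(absolute), (3, 42));

    // Positions of a lower field id always sort before those of a higher one.
    assert!(pack_position(1, u16::MAX) < pack_position(2, 0));
}
```

Under this scheme each attribute can hold up to 65,536 positions, and sorting packed values groups positions by field first.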
many
3296bb243c
Simplify word level position DB into a word position DB
2021-10-05 12:15:02 +02:00
Irevoire
a84f3a8b31
Apply suggestions from code review
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-09 15:09:35 +02:00
Tamo
bad8ea47d5
edit the last two TODO comments
2021-09-08 18:24:09 +02:00
Tamo
bd4c248292
improve the error handling in general and introduce the concept of reserved keywords
2021-09-08 18:24:09 +02:00
Tamo
f73273d71c
only call the extractor if needed
2021-09-08 17:51:08 +02:00
Irevoire
a21c854790
handle errors
2021-09-08 17:51:07 +02:00
Irevoire
70ab2c37c5
remove multiple bugs
2021-09-08 17:51:07 +02:00
Irevoire
b4b6ba6d82
rename all the ’long’ into ’lng’ like written in the specification
2021-09-08 17:51:07 +02:00
Irevoire
44d6b6ae9e
Index the geo points
2021-09-08 17:51:07 +02:00
many
e54280fbfc
Skip empty normalized words
2021-09-08 15:25:23 +02:00
many
db0c681bae
Fix PR comments
2021-09-02 15:17:52 +02:00
many
4860fd4529
Ignore empty facet values
2021-09-01 16:48:40 +02:00
many
b3a22f31f6
Fix memory consumption in word pair proximity extractor
2021-09-01 16:48:40 +02:00
many
8f702828ca
Ignore errors coming from crossbeam channel senders
2021-09-01 16:48:40 +02:00
many
e09eec37bc
Handle distance addition with hard separators
2021-09-01 16:48:40 +02:00
many
fc7cc770d4
Add logging timers
2021-09-01 16:48:40 +02:00
many
a2f59a28f7
Remove unwrap sending errors in channel
2021-09-01 16:48:40 +02:00
many
2d1727697d
Take stop words into account
2021-09-01 16:48:40 +02:00
many
823da19745
Fix test and use progress callback
2021-09-01 16:48:39 +02:00
many
1d314328f0
Plug new indexer
2021-09-01 16:48:36 +02:00