Commit Graph

299 Commits

Author SHA1 Message Date
Kerollmops
8c86348119
Indexing the facet strings levels 2021-07-21 16:59:38 +02:00
Kerollmops
757b2b502a
Remove the FacetValueStringCodec 2021-07-21 16:59:38 +02:00
Kerollmops
9f8095c069
Make sure that we don't keep a reference on the LMDB key when using put_current 2021-07-21 10:35:35 +02:00
Kerollmops
a9553af635
Add a test to check that we can index more than 256 fields 2021-07-06 11:58:03 +02:00
Kerollmops
838ed1cd32
Use an u16 field id instead of one byte 2021-07-06 11:58:03 +02:00
bors[bot]
b4dcdbf00d
Merge #269 #271
269: Fix bug when inserting previously deleted documents r=Kerollmops a=Kerollmops

This PR fixes #268.

The issue was in the `ExternalDocumentsIds` implementation in the specific case that an external document id was in the soft map marked as deleted.

The bug came from a wrong assumption on my side about how FST unions return the `IndexedValue`s. I thought the values in the returned array were in the same order as the FSTs given to the `OpBuilder`, but in fact [the `IndexedValue`'s `index` field indicates which FST each value comes from](https://docs.rs/fst/0.4.7/fst/map/struct.IndexedValue.html).
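
The wrong assumption described above can be illustrated with a minimal, std-only sketch. It is not the code from this PR: the local `IndexedValue` struct only mirrors the shape of `fst::map::IndexedValue`, and the `HARD`, `SOFT` and `DELETED` constants and both lookup functions are hypothetical names chosen for the illustration.

```rust
// Local stand-in for `fst::map::IndexedValue`, so the sketch needs no crate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct IndexedValue {
    index: usize, // which input FST produced this value
    value: u64,
}

// Assumed positions of the two maps given to the union (hypothetical).
const HARD: usize = 0;
const SOFT: usize = 1;
// Hypothetical tombstone marking a deleted external id in the soft map.
const DELETED: u64 = u64::MAX;

// Mistaken lookup: treats the position in the slice as the FST position.
fn soft_value_by_position(values: &[IndexedValue]) -> Option<u64> {
    values.get(SOFT).map(|iv| iv.value)
}

// Lookup that reads the `index` field to find the soft map's value.
fn soft_value_by_index(values: &[IndexedValue]) -> Option<u64> {
    values.iter().find(|iv| iv.index == SOFT).map(|iv| iv.value)
}

// Value from the hard map, if the key exists there.
fn hard_value_by_index(values: &[IndexedValue]) -> Option<u64> {
    values.iter().find(|iv| iv.index == HARD).map(|iv| iv.value)
}
```

For a key that exists only in the soft map, a union yields a single value whose `index` is `SOFT`. With the definitions above, `soft_value_by_position` misses that entry because nothing sits at position 1, while `soft_value_by_index` finds the tombstone.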

271: Remove the roaring operation functions warnings r=Kerollmops a=Kerollmops

In this PR we are just replacing the usages of the roaring operations function by the new operators. This removes a lot of warnings.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-06-30 12:34:55 +00:00
Kerollmops
32b7bd366f
Remove the roaring operation functions warnings 2021-06-30 14:12:56 +02:00
Kerollmops
c92ef54466
Add a test for when we insert a previously deleted document 2021-06-30 14:00:01 +02:00
Clément Renault
bdc5599b73
Bump heed to use the git repo with v0.12.0 2021-06-28 18:26:20 +02:00
Clément Renault
0013236e5d
Fix the LMDB and heed invalid interactions.
It is undefined behavior to keep a reference to the database while
modifying it. We were keeping references into the database and also
feeding the heed put_current methods with keys referenced inside
the database itself.

https://github.com/Kerollmops/heed/pull/108
2021-06-28 16:19:02 +02:00
Kerollmops
9e5f9a8a10
Add a test for the words level positions generation bug 2021-06-28 16:08:31 +02:00
Kerollmops
4fc8f06791
Rename faceted_fields into filterable_fields 2021-06-23 17:26:54 +02:00
Kerollmops
c31cadb54f
Do not consider the searchable field as filterable 2021-06-23 17:26:54 +02:00
bors[bot]
5b6adc6d96
Merge #245
245: Warn for when a key is too large for LMDB r=Kerollmops a=Kerollmops

Closes #191, and resolves #140.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-06-22 12:10:52 +00:00
Kerollmops
51dbb2e06d
Warn for when a key is too large for LMDB 2021-06-22 11:51:36 +02:00
Kerollmops
0cca2ea24f
Return a MissingDocumentId when a document doesn't have one 2021-06-22 11:22:33 +02:00
Kerollmops
481b0bf277
Warn for when a facet key is too large for LMDB 2021-06-22 10:57:46 +02:00
Clémentine Urquizar
daef43f504
Rename FieldsDistribution into FieldDistribution 2021-06-21 15:57:41 +02:00
Tamo
d08cfda796
convert the field_distribution to a BTreeMap and avoid counting twice the same documents 2021-06-17 18:31:54 +02:00
Tamo
969adaefdf
rename fields_distribution in field_distribution 2021-06-17 15:16:20 +02:00
Tamo
9716fb3b36
format the whole project 2021-06-16 18:33:33 +02:00
many
ce0315a10f
Close write transaction in test 2021-06-16 11:03:37 +02:00
Kerollmops
713acc408b
Introduce the primary key to the Settings builder structure 2021-06-16 11:03:36 +02:00
Kerollmops
a7d6930905
Replace the panicking expect by tracked Errors 2021-06-15 11:51:32 +02:00
Kerollmops
28c004aa2c
Prefer using constant for the database names 2021-06-15 11:13:04 +02:00
Kerollmops
312c2d1d8e
Use the Error enum everywhere in the project 2021-06-14 16:58:38 +02:00
Kerollmops
d2b1ecc885
Remove a lot of serialization unreachable errors 2021-06-14 16:48:51 +02:00
Kerollmops
65b1d09d55
Move the obkv merging functions into the merge_function module 2021-06-14 16:48:51 +02:00
Kerollmops
ab727e428b
Remove the docid_word_positions_merge method that must never be called 2021-06-14 16:48:51 +02:00
Kerollmops
93a8633f18
Remove the documents_merge method that must never be called 2021-06-14 16:48:51 +02:00
Kerollmops
cfc7314bd1
Prefer using an explicit merge function name 2021-06-14 16:48:50 +02:00
Kerollmops
93978ec38a
Serializing a RoaringBitmap into a Vec cannot fail 2021-06-14 16:48:50 +02:00
Kerollmops
ff9414a6ba
Use the out of the compute_primary_key_pair function 2021-06-14 16:48:50 +02:00
Kerollmops
0bf4f3f48a
Modify a test to check that criteria additions change the fields ids map 2021-06-08 18:14:34 +02:00
Kerollmops
82df524e09
Make sure that we register the field when setting criteria 2021-06-08 18:14:33 +02:00
Kerollmops
133ab98260
Use the index primary key when deleting documents 2021-06-08 17:33:29 +02:00
bors[bot]
39ed133f9f
Merge #193
193: Fix primary key behavior r=Kerollmops a=MarinPostma

this pr:
- Adds early returns on empty document additions, so that no error message is returned when no documents are added and no primary key is set.
- Changes the primary key inference logic to match that of legacy meilisearch.

close #194 

Co-authored-by: Marin Postma <postma.marin@protonmail.com>
Co-authored-by: marin postma <postma.marin@protonmail.com>
2021-06-03 10:24:21 +00:00
marin postma
57898d8a90
fix silent deserialize error 2021-06-03 10:42:55 +02:00
Kerollmops
3c304c89d4
Make sure that we generate the faceted database when required 2021-06-02 16:24:58 +02:00
Kerollmops
b0c0490e85
Make sure that we can add an Asc/Desc field without it being filterable 2021-06-02 16:24:58 +02:00
Kerollmops
6476827d3a
Fix the indexer to be sure that distinct and Asc/Desc are also faceted 2021-06-02 16:24:58 +02:00
Kerollmops
2a3f9b32ff
Rename the faceted fields into filterable fields 2021-06-02 16:24:57 +02:00
Many
ab2cf69e8d
Update milli/src/update/delete_documents.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-06-01 17:04:10 +02:00
Many
8e6d1ff0dc
Update milli/src/update/index_documents/store.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-06-01 17:04:02 +02:00
many
4ddf008be2
add field id word count database 2021-05-31 16:27:28 +02:00
Kerollmops
1c0a5cd136
Resolve code modification suggestions 2021-05-31 15:22:50 +02:00
Clément Renault
3a4a150ef0
Fix the tests and remaining warnings 2021-05-25 11:31:06 +02:00
Clément Renault
bd7b285bae
Split the update side to use the number and the strings facet databases 2021-05-25 11:30:00 +02:00
Clément Renault
837c1041c7
Clear and delete the documents from the facet database 2021-05-25 11:28:36 +02:00
Marin Postma
eeb0c70ea2
meilisearch compatible primary key inference 2021-05-06 22:42:32 +02:00
Marin Postma
313c362461
early return on empty document addition 2021-05-06 18:14:16 +02:00
Alexey Shekhirin
f8d0f5265f
fix(update): fields distribution after documents merge 2021-05-04 22:12:20 +03:00
Alexey Shekhirin
d81c0e8bba
feat(update): disable autogenerate_docids by default 2021-04-30 21:41:34 +03:00
Marin Postma
e8e32e0ba1
make document addition number visible 2021-04-29 20:05:07 +02:00
Kerollmops
e65bad16cc
Compute the words prefixes at the end of an update 2021-04-27 14:39:52 +02:00
Clément Renault
0ad9499b93
Fix an indexing bug in the words level positions 2021-04-27 14:35:43 +02:00
Clément Renault
7aa5753ed2
Make the attribute positions range bounds to be fixed 2021-04-27 14:35:43 +02:00
Kerollmops
89ee2cf576
Introduce the TreeLevel struct 2021-04-27 14:25:35 +02:00
Kerollmops
bd1a371c62
Compute the WordsLevelPositions only once 2021-04-27 14:25:34 +02:00
Kerollmops
8bd4f5d93e
Compute the biggest values of the words_level_positions_docids 2021-04-27 14:25:34 +02:00
Kerollmops
f713828406
Implement the clear and delete documents for the word-level-positions database 2021-04-27 14:25:34 +02:00
Kerollmops
3069bf4f4a
Fix and improve the words-level-positions computation 2021-04-27 14:25:34 +02:00
Kerollmops
3a25137ee4
Expose and use the WordsLevelPositions update 2021-04-27 14:25:34 +02:00
Kerollmops
c765f277a3
Introduce the WordsLevelPositions update 2021-04-27 14:25:34 +02:00
Kerollmops
9242f2f1d4
Store the first word positions levels 2021-04-27 14:25:34 +02:00
Kerollmops
b0a417f342
Introduce the word_level_position_docids Index database 2021-04-27 14:25:34 +02:00
Kerollmops
c9b2d3ae1a
Warn instead of returning an error when a conversion fails 2021-04-20 10:23:31 +02:00
Kerollmops
51767725b2
Simplify integer and float functions trait bounds 2021-04-20 10:23:31 +02:00
Alexey Shekhirin
33860bc3b7
test(update, settings): set & reset synonyms
fixes after review

more fixes after review
2021-04-18 11:24:17 +03:00
Alexey Shekhirin
e39aabbfe6
feat(search, update): synonyms 2021-04-18 11:24:17 +03:00
Marin Postma
45c45e11dd
implement distinct attribute
distinct can return error

facet distinct on numbers

return distinct error

review fixes

make get_facet_value more generic

fixes
2021-04-15 16:25:55 +02:00
tamo
dcb00b2e54
test a new implementation of the stop_words 2021-04-12 18:35:33 +02:00
Alexey Shekhirin
84c1dda39d
test(http): setting enum serialize/deserialize 2021-04-08 17:03:40 +03:00
Alexey Shekhirin
dc636d190d
refactor(http, update): introduce setting enum 2021-04-08 17:03:40 +03:00
Alexey Shekhirin
2658c5c545
feat(index): update fields distribution in clear & delete operations
fixes after review

bump the version of the tokenizer

implement a first version of the stop_words

The front must provide a BTreeSet containing the stop words
The stop_words are set at None if an empty Set is provided
add the stop-words in the http-ui interface

Use maplit in the test
and remove all the useless drop(rtxn) at the end of all tests

Integrate the stop_words in the querytree

remove the stop_words from the querytree except if it was a prefix or a typo

more fixes after review
2021-04-01 19:12:35 +03:00
Alexey Shekhirin
27c7ab6e00
feat(index): store fields distribution in index 2021-04-01 18:35:19 +03:00
tamo
a2f46029c7
implement a first version of the stop_words
The front must provide a BTreeSet containing the stop words
The stop_words are set at None if an empty Set is provided
add the stop-words in the http-ui interface

Use maplit in the test
and remove all the useless drop(rtxn) at the end of all tests
2021-04-01 13:57:55 +02:00
Alexey Shekhirin
9205b640a4
feat(index): introduce fields_ids_distribution 2021-03-31 18:44:47 +03:00
mpostma
615fe095e1
update index updated at on index writes 2021-03-15 14:05:47 +01:00
Kerollmops
f51eb46c69
Use the RoaringBitmapLenCodec to retrieve the count of documents 2021-03-09 10:25:39 +01:00
Clément Renault
e5bb96bc3b
Fix the searchable settings test 2021-03-06 12:48:41 +01:00
Kerollmops
07784c8990
Tune the words prefixes threshold to compute for 1/1000 instead 2021-03-03 15:51:28 +01:00
Kerollmops
f376c6a728
Make sure we retrieve the docid word positions 2021-03-03 15:45:03 +01:00
many
246286f0eb
take hard separator into account 2021-03-03 15:45:03 +01:00
mpostma
e08b6b3ec7
add primary key to fields_id_map when not present 2021-03-01 16:10:16 +01:00
Clément Renault
c318373b88
Expose the WordsPrefixes update on the UpdateBuilder 2021-02-21 12:15:35 +01:00
Kerollmops
a4a48be923
Run the words prefixes update inside of the indexing documents update 2021-02-17 11:22:26 +01:00
Kerollmops
616ed8f73c
Clean up the word prefix pair proximities when deleting documents 2021-02-17 11:22:26 +01:00
Clément Renault
ea37fd821d
Clean up the words prefixes when deleting documents and words 2021-02-17 11:22:25 +01:00
Clément Renault
62eee9c69e
Introduce the sorter_into_lmdb_database helper function 2021-02-17 11:12:39 +01:00
Clément Renault
b5b89990eb
Compute and write the word prefix pair proximities database 2021-02-17 11:12:38 +01:00
Kerollmops
9b03b0a1b2
Introduce the word prefix pair proximity docids database 2021-02-17 11:12:38 +01:00
Clément Renault
f365de636f
Compute and write the word-prefix-docids database 2021-02-17 11:12:38 +01:00
Clément Renault
ee5a60e1c5
Clear the words prefixes when clearing an index 2021-02-17 10:45:17 +01:00
Clément Renault
b3a21d5a50
Introduce the getters and setters for the words prefixes FST 2021-02-17 10:45:17 +01:00
Clément Renault
89ce4e74fe
Do not change the primary key type when we serialize documents 2021-02-15 21:24:36 +01:00
Clément Renault
69acdd437e
Deserialize documents ids into JSON Values on deletion 2021-02-15 21:24:36 +01:00
Clément Renault
b3776598d8
Add a test to check deletion of documents with number as primary key 2021-02-15 21:24:35 +01:00
Clément Renault
e8639517da
Change the project to become a workspace with milli as a default-member 2021-02-12 16:15:09 +01:00