Commit Graph

1035 Commits

Author SHA1 Message Date
many
e54280fbfc
Skip empty normalized words 2021-09-08 15:25:23 +02:00
many
d18ee58ab9
Check if key are not empty in validator 2021-09-08 15:25:23 +02:00
many
9961b78b06
Drop sorter before creating a new one 2021-09-08 13:30:26 +02:00
many
741a4444a9
Remove log in chunk generator 2021-09-02 16:57:46 +02:00
many
7f7fafb857
Make document_chunk_size settable from update builder 2021-09-02 15:25:39 +02:00
many
db0c681bae
Fix Pr comments 2021-09-02 15:17:52 +02:00
many
4860fd4529
Ignore empty facet values 2021-09-01 16:48:40 +02:00
many
b3a22f31f6
Fix memory consuption in word pair proximity extractor 2021-09-01 16:48:40 +02:00
many
9452fabfb2
Optimize cbo roaring bitmaps merge 2021-09-01 16:48:40 +02:00
many
8f702828ca
Ignore errors comming from crossbeam channel senders 2021-09-01 16:48:40 +02:00
many
e09eec37bc
Handle distance addition with hard separators 2021-09-01 16:48:40 +02:00
many
fc7cc770d4
Add logging timers 2021-09-01 16:48:40 +02:00
many
a2f59a28f7
Remove unwrap sending errors in channel 2021-09-01 16:48:40 +02:00
many
5c962c03dd
Fix and optimize word_prefix_pair_proximity_docids database 2021-09-01 16:48:40 +02:00
many
2d1727697d
Take stop word in account 2021-09-01 16:48:40 +02:00
many
823da19745
Fix test and use progress callback 2021-09-01 16:48:39 +02:00
many
1d314328f0
Plug new indexer 2021-09-01 16:48:36 +02:00
bors[bot]
d6bba0663a
Merge #334
334: Wrap long values into BStr for warn logs r=Kerollmops a=shekhirin

Resolves https://github.com/meilisearch/milli/issues/263

Co-authored-by: Alexey Shekhirin <a.shekhirin@gmail.com>
2021-08-31 17:38:54 +00:00
Alexey Shekhirin
0b02eb456c
chore(update): wrap long values into BStr for warn logs 2021-08-31 20:28:16 +03:00
Kerollmops
f230ae6fd5
Introduce the reset_sortable_fields Settings method 2021-08-25 17:44:16 +02:00
Clément Renault
89d0758713
Revert "Revert "Sort at query time"" 2021-08-24 11:55:16 +02:00
Clément Renault
c084f7f731
Fix the facet string docids filterable deletion bug 2021-08-23 10:50:39 +02:00
Clémentine Urquizar
922f9fd4d5
Revert "Sort at query time" 2021-08-20 18:09:17 +02:00
Kerollmops
71602e0f1b
Add the sortable fields into the settings and in the index 2021-08-18 15:04:07 +02:00
Kerollmops
5b88df508e
Use the new Asc/Desc syntax everywhere 2021-08-17 14:15:22 +02:00
bors[bot]
89b9b61840
Merge #300
300: Fix prefix level position docids database r=curquiza a=ManyTheFish

The prefix search was inverted when we generated the DB.
Instead of searching if word had a prefix in prefix fst,
we were searching if the word was a prefix of a prefix contained in the prefix fst.
The indexer, now, iterate over prefix contained in the fst
and search them by prefix in the word-level-position-docids database,
aggregating matches in a sorter.

Fix #299

Co-authored-by: many <maxime@meilisearch.com>
2021-08-04 16:52:09 +00:00
many
cdeb07f0fd
Fix prefix level position docids database
The prefix search was inverted when we generated the DB.
Instead of searching if word had a prefix in prefix fst,
we were searching if the word was a prefix of a prefix contained in the prefix fst.
The indexer, now, iterate over prefix contained in the fst
and search them by prefix in the word-level-position-docids database,
aggregating matches in a sorter.

Fix #299
2021-08-04 14:11:49 +02:00
bors[bot]
200e98c211
Merge #293
293: Make sure that the relevancy is not impacted by other settings r=Kerollmops a=Kerollmops

Fix https://github.com/meilisearch/meilisearch/issues/1505.

fix https://github.com/meilisearch/MeiliSearch/issues/1529

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-07-27 16:04:52 +00:00
Kerollmops
dc2b63abdf
Introduce an empty FilterCondition variant to support unknown fields 2021-07-27 16:34:04 +02:00
Kerollmops
7aa6cc9b04
Do not insert fields in the map when changing the settings 2021-07-22 18:40:12 +02:00
bors[bot]
ee3a49cfba
Merge #291
291: Fix a bug about zero bytes in the inputs r=irevoire a=Kerollmops

Ok, good news, after a little session of debugging with `@irevoire` we found out that the bug seems to be related to zeroes in the input update. The engine wasn't designed to accept those. The chosen solution is to update the tokenizer to remove those zeroes. We are waiting on https://github.com/meilisearch/tokenizer/pull/52 to be merged and a new version to be released.

It is not an undefined behavior, I repeat: it is a "normal" bug 🎉 👏

----

This PR tries to fix a bug where we use LMDB in the wrong way, leading to panic due to an undefined behavior on the Rust side. I thought [we fixed it in a previous PR](https://github.com/meilisearch/milli/pull/264) but we found out that _a similar_ bug was still present. `@bb` found a way to trigger this bug and helped us find the origin of it.

As I don't have a minimal reproducible example of this bug I bet on the unsafe `put_current` calls when we index new documents as the bug was trigger after a big indexation on a clean database, thus not triggering a deletion update. I only replaced the unsafe `put_current` with two safe calls to `get`/`put`.

I hope it helps and fixes the bug, only `@bb` can help us check that. I am not even sure how I can create a custom Docker image and expose it for testing purposes.

<details>
  <summary>The backtrace leading us to a panic in grenad.</summary>

```
meilisearch_1    | thread 'tokio-runtime-worker' panicked at 'assertion failed: key > &last_key', /root/.cargo/git/checkouts/grenad-e2cb77f65d31bb02/3adcb26/src/block_builder.rs:38:17
meilisearch_1    | stack backtrace:
meilisearch_1    |    0: rust_begin_unwind
meilisearch_1    |              at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:493:5
meilisearch_1    |    1: core::panicking::panic_fmt
meilisearch_1    |              at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/core/src/panicking.rs:92:14
meilisearch_1    |    2: core::panicking::panic
meilisearch_1    |              at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/core/src/panicking.rs:50:5
meilisearch_1    |    3: grenad::block_builder::BlockBuilder::insert
meilisearch_1    |              at ./root/.cargo/git/checkouts/grenad-e2cb77f65d31bb02/3adcb26/src/block_builder.rs:38:17
meilisearch_1    |    4: grenad::writer::Writer<W>::insert
meilisearch_1    |              at ./root/.cargo/git/checkouts/grenad-e2cb77f65d31bb02/3adcb26/src/writer.rs:92:12
meilisearch_1    |    5: milli::update::words_level_positions::write_level_entry
meilisearch_1    |              at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/words_level_positions.rs:262:5
meilisearch_1    |    6: milli::update::words_level_positions::compute_positions_levels
meilisearch_1    |              at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/words_level_positions.rs:211:13
meilisearch_1    |    7: milli::update::words_level_positions::WordsLevelPositions::execute
meilisearch_1    |              at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/words_level_positions.rs:65:23
meilisearch_1    |    8: milli::update::index_documents::IndexDocuments::execute_raw
meilisearch_1    |              at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/index_documents/mod.rs:831:9
meilisearch_1    |    9: milli::update::index_documents::IndexDocuments::execute
meilisearch_1    |              at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/index_documents/mod.rs:372:9
meilisearch_1    |   10: meilisearch_http::index::updates::<impl meilisearch_http::index::Index>::update_documents_txn
meilisearch_1    |              at ./meilisearch/meilisearch-http/src/index/updates.rs:225:30
meilisearch_1    |   11: meilisearch_http::index::updates::<impl meilisearch_http::index::Index>::update_documents
meilisearch_1    |              at ./meilisearch/meilisearch-http/src/index/updates.rs:183:22
meilisearch_1    |   12: meilisearch_http::index::update_handler::UpdateHandler::handle_update
meilisearch_1    |              at ./meilisearch/meilisearch-http/src/index/update_handler.rs:75:18
meilisearch_1    |   13: meilisearch_http::index_controller::index_actor::actor::IndexActor<S>::handle_update::{{closure}}::{{closure}}
meilisearch_1    |              at ./meilisearch/meilisearch-http/src/index_controller/index_actor/actor.rs:174:35
meilisearch_1    |   14: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/blocking/task.rs:42:21
meilisearch_1    |   15: tokio::runtime::task::core::CoreStage<T>::poll::{{closure}}
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/core.rs:243:17
meilisearch_1    |   16: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/loom/std/unsafe_cell.rs:14:9
meilisearch_1    |   17: tokio::runtime::task::core::CoreStage<T>::poll
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/core.rs:233:13
meilisearch_1    |   18: tokio::runtime::task::harness::poll_future::{{closure}}
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:427:23
meilisearch_1    |   19: <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
meilisearch_1    |              at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panic.rs:344:9
meilisearch_1    |   20: std::panicking::try::do_call
meilisearch_1    |              at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:379:40
meilisearch_1    |   21: std::panicking::try
meilisearch_1    |              at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:343:19
meilisearch_1    |   22: std::panic::catch_unwind
meilisearch_1    |              at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panic.rs:431:14
meilisearch_1    |   23: tokio::runtime::task::harness::poll_future
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:414:19
meilisearch_1    |   24: tokio::runtime::task::harness::Harness<T,S>::poll_inner
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:89:9
meilisearch_1    |   25: tokio::runtime::task::harness::Harness<T,S>::poll
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:59:15
meilisearch_1    |   26: tokio::runtime::task::raw::RawTask::poll
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/raw.rs:66:18
meilisearch_1    |   27: tokio::runtime::task::Notified<S>::run
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/mod.rs:171:9
meilisearch_1    |   28: tokio::runtime::blocking::pool::Inner::run
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/blocking/pool.rs:265:17
meilisearch_1    |   29: tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}}
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/blocking/pool.rs:245:17
meilisearch_1    | note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
```

</details>

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-07-22 16:14:35 +00:00
Kerollmops
92c0a2cdc1
Add a test that triggers a panic when indexing zeroes 2021-07-22 17:14:44 +02:00
Kerollmops
aa02a7fdd8
Add a test to check that we indeed impact the relevancy 2021-07-22 17:04:38 +02:00
Clément Renault
0227254a65
Return the original string values for the inverted facet index database 2021-07-21 16:59:39 +02:00
Kerollmops
03a01166ba
Display the original facet string value from the linear facet database 2021-07-21 16:59:39 +02:00
Clément Renault
5676b204dd
Fix the facet string levels codecs 2021-07-21 16:59:38 +02:00
Kerollmops
8c86348119
Indexing the facet strings levels 2021-07-21 16:59:38 +02:00
Kerollmops
757b2b502a
Remove the FacetValueStringCodec 2021-07-21 16:59:38 +02:00
Kerollmops
9f8095c069
Make sure that we don't keep a reference on the LMDB key when using put_current 2021-07-21 10:35:35 +02:00
Kerollmops
a9553af635
Add a test to check that we can index more that 256 fields 2021-07-06 11:58:03 +02:00
Kerollmops
838ed1cd32
Use an u16 field id instead of one byte 2021-07-06 11:58:03 +02:00
bors[bot]
b4dcdbf00d
Merge #269 #271
269: Fix bug when inserting previously deleted documents r=Kerollmops a=Kerollmops

This PR fixes #268.

The issue was in the `ExternalDocumentsIds` implementation in the specific case that an external document id was in the soft map marked as deleted.

The bug was due to a wrong assumption on my side about how the FST unions were returning the `IndexedValue`s, I thought the values returned in an array were in the same order as the FSTs given to the `OpBuilder` but in fact, [the `IndexedValue`'s `index` field was here to indicate from which FST the values were coming from](https://docs.rs/fst/0.4.7/fst/map/struct.IndexedValue.html).

271: Remove the roaring operation functions warnings r=Kerollmops a=Kerollmops

In this PR we are just replacing the usages of the roaring operations function by the new operators. This removes a lot of warnings.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-06-30 12:34:55 +00:00
Kerollmops
32b7bd366f
Remove the roaring operation functions warnings 2021-06-30 14:12:56 +02:00
Kerollmops
c92ef54466
Add a test for when we insert a previously deleted document 2021-06-30 14:00:01 +02:00
Clément Renault
bdc5599b73
Bump heed to use the git repo with v0.12.0 2021-06-28 18:26:20 +02:00
Clément Renault
0013236e5d
Fix the LMDB and heed invalid interactions.
It is undefined behavior to keep a reference to the database while
modifying it, we were keeping references in the database and also
feeding the heed put_current methods with keys referenced inside
the database itself.

https://github.com/Kerollmops/heed/pull/108
2021-06-28 16:19:02 +02:00
Kerollmops
9e5f9a8a10
Add a test for the words level positions generation bug 2021-06-28 16:08:31 +02:00
Kerollmops
4fc8f06791
Rename faceted_fields into filterable_fields 2021-06-23 17:26:54 +02:00
Kerollmops
c31cadb54f
Do not consider the searchable field as filterable 2021-06-23 17:26:54 +02:00
bors[bot]
5b6adc6d96
Merge #245
245: Warn for when a key is too large for LMDB r=Kerollmops a=Kerollmops

Closes #191, and resolves #140.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-06-22 12:10:52 +00:00
Kerollmops
51dbb2e06d
Warn for when a key is too large for LMDB 2021-06-22 11:51:36 +02:00
Kerollmops
0cca2ea24f
Return a MissingDocumentId when a document doesn't have one 2021-06-22 11:22:33 +02:00
Kerollmops
481b0bf277
Warn for when a facet key is too large for LMDB 2021-06-22 10:57:46 +02:00
Clémentine Urquizar
daef43f504
Rename FieldsDistribution into FieldDistribution 2021-06-21 15:57:41 +02:00
Tamo
d08cfda796
convert the field_distribution to a BTreeMap and avoid counting twice the same documents 2021-06-17 18:31:54 +02:00
Tamo
969adaefdf
rename fields_distribution in field_distribution 2021-06-17 15:16:20 +02:00
Tamo
9716fb3b36
format the whole project 2021-06-16 18:33:33 +02:00
many
ce0315a10f
Close write transaction in test 2021-06-16 11:03:37 +02:00
Kerollmops
713acc408b
Introduce the primary key to the Settings builder structure 2021-06-16 11:03:36 +02:00
Kerollmops
a7d6930905
Replace the panicking expect by tracked Errors 2021-06-15 11:51:32 +02:00
Kerollmops
28c004aa2c
Prefer using constant for the database names 2021-06-15 11:13:04 +02:00
Kerollmops
312c2d1d8e
Use the Error enum everywhere in the project 2021-06-14 16:58:38 +02:00
Kerollmops
d2b1ecc885
Remove a lot of serialization unreachable errors 2021-06-14 16:48:51 +02:00
Kerollmops
65b1d09d55
Move the obkv merging functions into the merge_function module 2021-06-14 16:48:51 +02:00
Kerollmops
ab727e428b
Remove the docid_word_positions_merge method that must never be called 2021-06-14 16:48:51 +02:00
Kerollmops
93a8633f18
Remove the documents_merge method that must never be called 2021-06-14 16:48:51 +02:00
Kerollmops
cfc7314bd1
Prefer using an explicit merge function name 2021-06-14 16:48:50 +02:00
Kerollmops
93978ec38a
Serializing a RoaringBitmap into a Vec cannot fail 2021-06-14 16:48:50 +02:00
Kerollmops
ff9414a6ba
Use the out of the compute_primary_key_pair function 2021-06-14 16:48:50 +02:00
Kerollmops
0bf4f3f48a
Modify a test to check that criteria additions change the fields ids map 2021-06-08 18:14:34 +02:00
Kerollmops
82df524e09
Make sure that we register the field when setting criteria 2021-06-08 18:14:33 +02:00
Kerollmops
133ab98260
Use the index primary key when deleting documents 2021-06-08 17:33:29 +02:00
bors[bot]
39ed133f9f
Merge #193
193: Fix primary key behavior r=Kerollmops a=MarinPostma

this pr:
- Adds early returns on empty document additions, avoiding error messages to be returned when adding no documents and no primary key was set.
- Changes the primary key inference logic to match that of legacy meilisearch.

close #194 

Co-authored-by: Marin Postma <postma.marin@protonmail.com>
Co-authored-by: marin postma <postma.marin@protonmail.com>
2021-06-03 10:24:21 +00:00
marin postma
57898d8a90
fix silent deserialize error 2021-06-03 10:42:55 +02:00
Kerollmops
3c304c89d4
Make sure that we generate the faceted database when required 2021-06-02 16:24:58 +02:00
Kerollmops
b0c0490e85
Make sure that we can add a Asc/Desc field without it being filterable 2021-06-02 16:24:58 +02:00
Kerollmops
6476827d3a
Fix the indexer to be sure that distinct and Asc/Desc are also faceted 2021-06-02 16:24:58 +02:00
Kerollmops
2a3f9b32ff
Rename the faceted fields into filterable fields 2021-06-02 16:24:57 +02:00
Many
ab2cf69e8d
Update milli/src/update/delete_documents.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-06-01 17:04:10 +02:00
Many
8e6d1ff0dc
Update milli/src/update/index_documents/store.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-06-01 17:04:02 +02:00
many
4ddf008be2
add field id word count database 2021-05-31 16:27:28 +02:00
Kerollmops
1c0a5cd136
Resolve code modification suggestions 2021-05-31 15:22:50 +02:00
Clément Renault
3a4a150ef0
Fix the tests and remaining warnings 2021-05-25 11:31:06 +02:00
Clément Renault
bd7b285bae
Split the update side to use the number and the strings facet databases 2021-05-25 11:30:00 +02:00
Clément Renault
837c1041c7
Clear and delete the documents from the facet database 2021-05-25 11:28:36 +02:00
Marin Postma
eeb0c70ea2
meilisearch compatible primary key inference 2021-05-06 22:42:32 +02:00
Marin Postma
313c362461
early return on empty document addition 2021-05-06 18:14:16 +02:00
Alexey Shekhirin
f8d0f5265f
fix(update): fields distribution after documents merge 2021-05-04 22:12:20 +03:00
Alexey Shekhirin
d81c0e8bba
feat(update): disable autogenerate_docids by default 2021-04-30 21:41:34 +03:00
Marin Postma
e8e32e0ba1
make document addition number visible 2021-04-29 20:05:07 +02:00
Kerollmops
e65bad16cc
Compute the words prefixes at the end of an update 2021-04-27 14:39:52 +02:00
Clément Renault
0ad9499b93
Fix an indexing bug in the words level positions 2021-04-27 14:35:43 +02:00
Clément Renault
7aa5753ed2
Make the attribute positions range bounds to be fixed 2021-04-27 14:35:43 +02:00
Kerollmops
89ee2cf576
Introduce the TreeLevel struct 2021-04-27 14:25:35 +02:00
Kerollmops
bd1a371c62
Compute the WordsLevelPositions only once 2021-04-27 14:25:34 +02:00
Kerollmops
8bd4f5d93e
Compute the biggest values of the words_level_positions_docids 2021-04-27 14:25:34 +02:00
Kerollmops
f713828406
Implement the clear and delete documents for the word-level-positions database 2021-04-27 14:25:34 +02:00
Kerollmops
3069bf4f4a
Fix and improve the words-level-positions computation 2021-04-27 14:25:34 +02:00
Kerollmops
3a25137ee4
Expose and use the WordsLevelPositions update 2021-04-27 14:25:34 +02:00
Kerollmops
c765f277a3
Introduce the WordsLevelPositions update 2021-04-27 14:25:34 +02:00
Kerollmops
9242f2f1d4
Store the first word positions levels 2021-04-27 14:25:34 +02:00
Kerollmops
b0a417f342
Introduce the word_level_position_docids Index database 2021-04-27 14:25:34 +02:00
Kerollmops
c9b2d3ae1a
Warn instead of returning an error when a conversion fails 2021-04-20 10:23:31 +02:00
Kerollmops
51767725b2
Simplify integer and float functions trait bounds 2021-04-20 10:23:31 +02:00
Alexey Shekhirin
33860bc3b7
test(update, settings): set & reset synonyms
fixes after review

more fixes after review
2021-04-18 11:24:17 +03:00
Alexey Shekhirin
e39aabbfe6
feat(search, update): synonyms 2021-04-18 11:24:17 +03:00
Marin Postma
45c45e11dd
implement distinct attribute
distinct can return error

facet distinct on numbers

return distinct error

review fixes

make get_facet_value more generic

fixes
2021-04-15 16:25:55 +02:00
tamo
dcb00b2e54
test a new implementation of the stop_words 2021-04-12 18:35:33 +02:00
Alexey Shekhirin
84c1dda39d
test(http): setting enum serialize/deserialize 2021-04-08 17:03:40 +03:00
Alexey Shekhirin
dc636d190d
refactor(http, update): introduce setting enum 2021-04-08 17:03:40 +03:00
Alexey Shekhirin
2658c5c545
feat(index): update fields distribution in clear & delete operations
fixes after review

bump the version of the tokenizer

implement a first version of the stop_words

The front must provide a BTreeSet containing the stop words
The stop_words are set at None if an empty Set is provided
add the stop-words in the http-ui interface

Use maplit in the test
and remove all the useless drop(rtxn) at the end of all tests

Integrate the stop_words in the querytree

remove the stop_words from the querytree except if it was a prefix or a typo

more fixes after review
2021-04-01 19:12:35 +03:00
Alexey Shekhirin
27c7ab6e00
feat(index): store fields distribution in index 2021-04-01 18:35:19 +03:00
tamo
a2f46029c7
implement a first version of the stop_words
The front must provide a BTreeSet containing the stop words
The stop_words are set at None if an empty Set is provided
add the stop-words in the http-ui interface

Use maplit in the test
and remove all the useless drop(rtxn) at the end of all tests
2021-04-01 13:57:55 +02:00
Alexey Shekhirin
9205b640a4 feat(index): introduce fields_ids_distribution 2021-03-31 18:44:47 +03:00
mpostma
615fe095e1
update index updated at on index writes 2021-03-15 14:05:47 +01:00
Kerollmops
f51eb46c69
Use the RoaringBitmapLenCodec to retrieve the count of documents 2021-03-09 10:25:39 +01:00
Clément Renault
e5bb96bc3b
Fix the searchable settings test 2021-03-06 12:48:41 +01:00
Kerollmops
07784c8990
Tune the words prefixes threshold to compute for 1/1000 instead 2021-03-03 15:51:28 +01:00
Kerollmops
f376c6a728
Make sure we retrieve the docid word positions 2021-03-03 15:45:03 +01:00
many
246286f0eb
take hard separator into account 2021-03-03 15:45:03 +01:00
mpostma
e08b6b3ec7
add primary key to fields_id_map when not present 2021-03-01 16:10:16 +01:00
Clément Renault
c318373b88
Expose the WordsPrefixes update on the UpdateBuilder 2021-02-21 12:15:35 +01:00
Kerollmops
a4a48be923
Run the words prefixes update inside of the indexing documents update 2021-02-17 11:22:26 +01:00
Kerollmops
616ed8f73c
Clean up the word prefix pair proximities when deleting documents 2021-02-17 11:22:26 +01:00
Clément Renault
ea37fd821d
Clean up the words prefixes when deleting documents and words 2021-02-17 11:22:25 +01:00
Clément Renault
62eee9c69e
Introduce the sorter_into_lmdb_database helper function 2021-02-17 11:12:39 +01:00
Clément Renault
b5b89990eb
Compute and write the word prefix pair proximities database 2021-02-17 11:12:38 +01:00
Kerollmops
9b03b0a1b2
Introduce the word prefix pair proximity docids database 2021-02-17 11:12:38 +01:00
Clément Renault
f365de636f
Compute and write the word-prefix-docids database 2021-02-17 11:12:38 +01:00
Clément Renault
ee5a60e1c5
Clear the words prefixes when clearing an index 2021-02-17 10:45:17 +01:00
Clément Renault
b3a21d5a50
Introduce the getters and setters for the words prefixes FST 2021-02-17 10:45:17 +01:00
Clément Renault
89ce4e74fe
Do not change the primary key type when we serialize documents 2021-02-15 21:24:36 +01:00
Clément Renault
69acdd437e
Deserialize documents ids into JSON Values on deletion 2021-02-15 21:24:36 +01:00
Clément Renault
b3776598d8
Add a test to check deletion of documents with number as primary key 2021-02-15 21:24:35 +01:00
Clément Renault
e8639517da
Change the project to become a workspace with milli as a default-member 2021-02-12 16:15:09 +01:00