Commit Graph

577 Commits

Author SHA1 Message Date
Kerollmops
1c604de158
Introduce the highest_iter private method on the FacetStringIter struct 2021-08-17 10:41:11 +02:00
Kerollmops
64df159057
Introduce the new_reducing constructor on the FacetStringIter struct 2021-08-17 10:35:06 +02:00
Kerollmops
01a4052828
Move the FacetStringIter creation logic into a private new method 2021-08-17 10:29:43 +02:00
bors[bot]
51581d14f8
Merge #307
307: Update version for the next release (v0.10.0) r=Kerollmops a=curquiza

Replaces https://github.com/meilisearch/milli/pull/304

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-08-16 10:33:53 +00:00
Clémentine Urquizar
fcc520e49a
Update version for the next release (v0.10.0) 2021-08-16 12:00:28 +02:00
many
7dbefae1e3
Make facet string iterator non reducing 2021-08-12 17:23:39 +02:00
many
8fdf860c17
Remove max values by facet limit for facet distribution 2021-08-12 11:29:20 +02:00
bors[bot]
2102e0da6b
Merge #302
302: Update milli to v0.9.0 r=curquiza a=curquiza

Updating the minor and not patch since #300 seems to be breaking: it involves a re-indexation to get the fix, so it involves an additional step from the users, not only downloading the latest version.

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-08-05 08:38:15 +00:00
bors[bot]
89b9b61840
Merge #300
300: Fix prefix level position docids database r=curquiza a=ManyTheFish

The prefix search was inverted when we generated the DB.
Instead of checking whether a word had a prefix in the prefix fst,
we were checking whether the word was a prefix of a prefix contained in the prefix fst.
The indexer now iterates over the prefixes contained in the fst
and searches for them by prefix in the word-level-position-docids database,
aggregating matches in a sorter.
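
A minimal sketch of the corrected lookup direction, for illustration only. It assumes a sorted `BTreeMap` as a stand-in for the word-level-position-docids database and a plain slice as a stand-in for the prefix fst; `prefix_docids` and its types are hypothetical names, not milli's API.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// For every prefix, collect the document ids of all the words that start with it.
/// `word_docids` stands in for the database, `prefixes` for the prefix fst.
fn prefix_docids(
    prefixes: &[&str],
    word_docids: &BTreeMap<String, BTreeSet<u32>>,
) -> BTreeMap<String, BTreeSet<u32>> {
    let mut out = BTreeMap::new();
    for &prefix in prefixes {
        let mut merged = BTreeSet::new();
        // Keys are sorted, so the words starting with `prefix` are contiguous
        // and begin at the first key greater than or equal to `prefix`.
        for (word, docids) in word_docids.range(prefix.to_string()..) {
            if !word.starts_with(prefix) {
                break;
            }
            // Aggregate the matches, as the sorter would.
            merged.extend(docids.iter().copied());
        }
        if !merged.is_empty() {
            out.insert(prefix.to_string(), merged);
        }
    }
    out
}
```

The loop goes from each prefix to the words it matches, which is the direction the fix describes, rather than from each word to the prefixes.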

Fix #299

Co-authored-by: many <maxime@meilisearch.com>
2021-08-04 16:52:09 +00:00
Clémentine Urquizar
7f26c75610
Update milli to v0.9.0 2021-08-04 16:04:55 +02:00
many
cdeb07f0fd
Fix prefix level position docids database
The prefix search was inverted when we generated the DB.
Instead of checking whether a word had a prefix in the prefix fst,
we were checking whether the word was a prefix of a prefix contained in the prefix fst.
The indexer now iterates over the prefixes contained in the fst
and searches for them by prefix in the word-level-position-docids database,
aggregating matches in a sorter.

Fix #299
2021-08-04 14:11:49 +02:00
bors[bot]
1290edd58a
Merge #297
297: Bump milli to v0.8.1 r=curquiza a=Kerollmops



Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-07-29 14:19:41 +00:00
Kerollmops
341c244965
Bump milli to v0.8.1 2021-07-29 15:56:36 +02:00
Kerollmops
90514e03d1
Fix invalid faceted documents ids buffer size 2021-07-29 15:49:23 +02:00
bors[bot]
200e98c211
Merge #293
293: Make sure that the relevancy is not impacted by other settings r=Kerollmops a=Kerollmops

Fix https://github.com/meilisearch/meilisearch/issues/1505.

fix https://github.com/meilisearch/MeiliSearch/issues/1529

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-07-27 16:04:52 +00:00
Clémentine Urquizar
6a141694da
Update version for the next release (v0.8.0) 2021-07-27 16:38:42 +02:00
Kerollmops
dc2b63abdf
Introduce an empty FilterCondition variant to support unknown fields 2021-07-27 16:34:04 +02:00
Kerollmops
b12738cfe9
Use the right DB prefixes to store the faceted fields 2021-07-22 19:18:22 +02:00
Kerollmops
7aa6cc9b04
Do not insert fields in the map when changing the settings 2021-07-22 18:40:12 +02:00
bors[bot]
ee3a49cfba
Merge #291
291: Fix a bug about zero bytes in the inputs r=irevoire a=Kerollmops

Ok, good news, after a little session of debugging with `@irevoire` we found out that the bug seems to be related to zeroes in the input update. The engine wasn't designed to accept those. The chosen solution is to update the tokenizer to remove those zeroes. We are waiting on https://github.com/meilisearch/tokenizer/pull/52 to be merged and a new version to be released.

It is not an undefined behavior, I repeat: it is a "normal" bug 🎉 👏

----

This PR tries to fix a bug where we use LMDB in the wrong way, leading to panic due to an undefined behavior on the Rust side. I thought [we fixed it in a previous PR](https://github.com/meilisearch/milli/pull/264) but we found out that _a similar_ bug was still present. `@bb` found a way to trigger this bug and helped us find the origin of it.

As I don't have a minimal reproducible example of this bug, I bet on the unsafe `put_current` calls made when we index new documents, as the bug was triggered after a big indexation on a clean database, thus not triggering a deletion update. I only replaced the unsafe `put_current` with two safe calls to `get`/`put`.

I hope it helps and fixes the bug, only `@bb` can help us check that. I am not even sure how I can create a custom Docker image and expose it for testing purposes.

<details>
  <summary>The backtrace leading us to a panic in grenad.</summary>

```
meilisearch_1    | thread 'tokio-runtime-worker' panicked at 'assertion failed: key > &last_key', /root/.cargo/git/checkouts/grenad-e2cb77f65d31bb02/3adcb26/src/block_builder.rs:38:17
meilisearch_1    | stack backtrace:
meilisearch_1    |    0: rust_begin_unwind
meilisearch_1    |              at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:493:5
meilisearch_1    |    1: core::panicking::panic_fmt
meilisearch_1    |              at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/core/src/panicking.rs:92:14
meilisearch_1    |    2: core::panicking::panic
meilisearch_1    |              at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/core/src/panicking.rs:50:5
meilisearch_1    |    3: grenad::block_builder::BlockBuilder::insert
meilisearch_1    |              at ./root/.cargo/git/checkouts/grenad-e2cb77f65d31bb02/3adcb26/src/block_builder.rs:38:17
meilisearch_1    |    4: grenad::writer::Writer<W>::insert
meilisearch_1    |              at ./root/.cargo/git/checkouts/grenad-e2cb77f65d31bb02/3adcb26/src/writer.rs:92:12
meilisearch_1    |    5: milli::update::words_level_positions::write_level_entry
meilisearch_1    |              at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/words_level_positions.rs:262:5
meilisearch_1    |    6: milli::update::words_level_positions::compute_positions_levels
meilisearch_1    |              at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/words_level_positions.rs:211:13
meilisearch_1    |    7: milli::update::words_level_positions::WordsLevelPositions::execute
meilisearch_1    |              at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/words_level_positions.rs:65:23
meilisearch_1    |    8: milli::update::index_documents::IndexDocuments::execute_raw
meilisearch_1    |              at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/index_documents/mod.rs:831:9
meilisearch_1    |    9: milli::update::index_documents::IndexDocuments::execute
meilisearch_1    |              at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/index_documents/mod.rs:372:9
meilisearch_1    |   10: meilisearch_http::index::updates::<impl meilisearch_http::index::Index>::update_documents_txn
meilisearch_1    |              at ./meilisearch/meilisearch-http/src/index/updates.rs:225:30
meilisearch_1    |   11: meilisearch_http::index::updates::<impl meilisearch_http::index::Index>::update_documents
meilisearch_1    |              at ./meilisearch/meilisearch-http/src/index/updates.rs:183:22
meilisearch_1    |   12: meilisearch_http::index::update_handler::UpdateHandler::handle_update
meilisearch_1    |              at ./meilisearch/meilisearch-http/src/index/update_handler.rs:75:18
meilisearch_1    |   13: meilisearch_http::index_controller::index_actor::actor::IndexActor<S>::handle_update::{{closure}}::{{closure}}
meilisearch_1    |              at ./meilisearch/meilisearch-http/src/index_controller/index_actor/actor.rs:174:35
meilisearch_1    |   14: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/blocking/task.rs:42:21
meilisearch_1    |   15: tokio::runtime::task::core::CoreStage<T>::poll::{{closure}}
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/core.rs:243:17
meilisearch_1    |   16: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/loom/std/unsafe_cell.rs:14:9
meilisearch_1    |   17: tokio::runtime::task::core::CoreStage<T>::poll
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/core.rs:233:13
meilisearch_1    |   18: tokio::runtime::task::harness::poll_future::{{closure}}
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:427:23
meilisearch_1    |   19: <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
meilisearch_1    |              at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panic.rs:344:9
meilisearch_1    |   20: std::panicking::try::do_call
meilisearch_1    |              at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:379:40
meilisearch_1    |   21: std::panicking::try
meilisearch_1    |              at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:343:19
meilisearch_1    |   22: std::panic::catch_unwind
meilisearch_1    |              at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panic.rs:431:14
meilisearch_1    |   23: tokio::runtime::task::harness::poll_future
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:414:19
meilisearch_1    |   24: tokio::runtime::task::harness::Harness<T,S>::poll_inner
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:89:9
meilisearch_1    |   25: tokio::runtime::task::harness::Harness<T,S>::poll
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:59:15
meilisearch_1    |   26: tokio::runtime::task::raw::RawTask::poll
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/raw.rs:66:18
meilisearch_1    |   27: tokio::runtime::task::Notified<S>::run
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/mod.rs:171:9
meilisearch_1    |   28: tokio::runtime::blocking::pool::Inner::run
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/blocking/pool.rs:265:17
meilisearch_1    |   29: tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}}
meilisearch_1    |              at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/blocking/pool.rs:245:17
meilisearch_1    | note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
```

</details>

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-07-22 16:14:35 +00:00
Kerollmops
0353fbb5df
Bump the tokenizer version to v0.2.4 2021-07-22 17:14:45 +02:00
Kerollmops
92c0a2cdc1
Add a test that triggers a panic when indexing zeroes 2021-07-22 17:14:44 +02:00
Kerollmops
aa02a7fdd8
Add a test to check that we indeed impact the relevancy 2021-07-22 17:04:38 +02:00
Clément Renault
0227254a65
Return the original string values for the inverted facet index database 2021-07-21 16:59:39 +02:00
Kerollmops
03a01166ba
Display the original facet string value from the linear facet database 2021-07-21 16:59:39 +02:00
Clément Renault
d23c250ad5
Fix a bound error in the facet string range construction 2021-07-21 16:59:39 +02:00
Clément Renault
081278dfd6
Use the facet string levels when computing the facet distribution 2021-07-21 16:59:39 +02:00
Clément Renault
5676b204dd
Fix the facet string levels codecs 2021-07-21 16:59:38 +02:00
Kerollmops
8c86348119
Indexing the facet strings levels 2021-07-21 16:59:38 +02:00
Kerollmops
a7ae552ba7
Fix the FacetStringLevelZeroRange range when unbounded 2021-07-21 16:59:38 +02:00
Kerollmops
757b2b502a
Remove the FacetValueStringCodec 2021-07-21 16:59:38 +02:00
Kerollmops
adfd4da24c
Introduce the FacetStringIter iterator 2021-07-21 16:59:38 +02:00
Kerollmops
a79661c6dc
Introduce a lot of facet string helper iterators 2021-07-21 16:59:38 +02:00
Kerollmops
851f979039
Describe the way we want to group the facet strings 2021-07-21 16:59:38 +02:00
Kerollmops
f858f64b1f
Move the facet number iterators into their own module 2021-07-21 16:59:37 +02:00
Kerollmops
9f8095c069
Make sure that we don't keep a reference on the LMDB key when using put_current 2021-07-21 10:35:35 +02:00
Kerollmops
a9553af635
Add a test to check that we can index more than 256 fields 2021-07-06 11:58:03 +02:00
Kerollmops
838ed1cd32
Use an u16 field id instead of one byte 2021-07-06 11:58:03 +02:00
Kerollmops
91c5d0c042
Use the AlwaysFreePages flag when opening an index 2021-07-05 16:36:13 +02:00
Kerollmops
a6b4069172
Bump to v0.7.2 2021-07-05 10:54:53 +02:00
many
9f62149b94
Fix matching length in matching_words 2021-07-01 19:03:28 +02:00
Clémentine Urquizar
3c149d8a43
Update tokenizer version to v0.2.3 2021-06-30 18:41:35 +02:00
bors[bot]
b4dcdbf00d
Merge #269 #271
269: Fix bug when inserting previously deleted documents r=Kerollmops a=Kerollmops

This PR fixes #268.

The issue was in the `ExternalDocumentsIds` implementation in the specific case that an external document id was in the soft map marked as deleted.

The bug was due to a wrong assumption on my side about how the FST unions return the `IndexedValue`s. I thought the values returned in an array were in the same order as the FSTs given to the `OpBuilder`, but in fact [the `IndexedValue`'s `index` field is there to indicate which FST each value comes from](https://docs.rs/fst/0.4.7/fst/map/struct.IndexedValue.html).
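
To illustrate the pitfall, here is a small std-only sketch. It does not use the `fst` crate: the `IndexedValue` struct below only mirrors the shape of fst's type, and `union` and `value_from` are hypothetical helpers. The point is that a value's position in the returned slice need not match the position of its source map, so the `index` field has to be consulted.

```rust
use std::collections::BTreeMap;

/// Stand-in for fst's `IndexedValue`: `index` identifies the source map.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct IndexedValue {
    index: usize,
    value: u64,
}

/// Union of several maps: each key is associated with one entry per map containing it.
fn union(maps: &[BTreeMap<&str, u64>]) -> BTreeMap<String, Vec<IndexedValue>> {
    let mut out: BTreeMap<String, Vec<IndexedValue>> = BTreeMap::new();
    for (index, map) in maps.iter().enumerate() {
        for (key, &value) in map {
            out.entry(key.to_string())
                .or_default()
                .push(IndexedValue { index, value });
        }
    }
    out
}

/// Look a value up by its source map, not by its position in the slice.
fn value_from(values: &[IndexedValue], source: usize) -> Option<u64> {
    values.iter().find(|iv| iv.index == source).map(|iv| iv.value)
}
```

With a hard map at position 0 and a soft map at position 1, a key that exists only in the soft map yields a one-element slice whose first entry has `index == 1`, so reading `values[0]` as "the hard map's value" would be wrong.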

271: Remove the roaring operation functions warnings r=Kerollmops a=Kerollmops

In this PR we are just replacing the usages of the roaring operation functions by the new operators. This removes a lot of warnings.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-06-30 12:34:55 +00:00
Kerollmops
32b7bd366f
Remove the roaring operation functions warnings 2021-06-30 14:12:56 +02:00
Kerollmops
c92ef54466
Add a test for when we insert a previously deleted document 2021-06-30 14:00:01 +02:00
Kerollmops
28782ff99d
Fix ExternalDocumentsIds struct when inserting previously deleted ids 2021-06-30 14:00:01 +02:00
Clémentine Urquizar
b489515f4d
Update milli version to v0.7.1 2021-06-30 13:52:46 +02:00
Kerollmops
54889813ce
Implement some debug functions on the ExternalDocumentsIds struct 2021-06-30 11:29:41 +02:00
Kerollmops
4bce66d5ff
Make the Index::delete_* method private 2021-06-30 10:07:31 +02:00
Irevoire
6044b80362
Update milli/src/search/matching_words.rs
Co-authored-by: Clément Renault <renault.cle@gmail.com>
2021-06-30 00:35:26 +02:00
Tamo
be75e738b1
add more tests 2021-06-29 16:24:58 +02:00
Tamo
56fceb1928
re-implement the Damerau-Levenshtein used for the highlighting 2021-06-29 15:36:03 +02:00
Clément Renault
80c6aaf1fd
Bump milli to 0.7.0 2021-06-28 18:31:56 +02:00
Clément Renault
bdc5599b73
Bump heed to use the git repo with v0.12.0 2021-06-28 18:26:20 +02:00
Clément Renault
0013236e5d
Fix the LMDB and heed invalid interactions.
It is undefined behavior to keep a reference to the database while
modifying it, we were keeping references in the database and also
feeding the heed put_current methods with keys referenced inside
the database itself.

https://github.com/Kerollmops/heed/pull/108
2021-06-28 16:19:02 +02:00
Kerollmops
9e5f9a8a10
Add a test for the words level positions generation bug 2021-06-28 16:08:31 +02:00
Kerollmops
98285b4b18
Bump milli to 0.6.0 2021-06-23 17:30:26 +02:00
Kerollmops
4fc8f06791
Rename faceted_fields into filterable_fields 2021-06-23 17:26:54 +02:00
Kerollmops
c31cadb54f
Do not consider the searchable field as filterable 2021-06-23 17:26:54 +02:00
bors[bot]
2ab24c4f49
Merge #256
256: Update version for the next release (v0.5.1) r=Kerollmops a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-06-23 12:29:57 +00:00
Clémentine Urquizar
9885fb4159
Update version for the next release (v0.5.1) 2021-06-23 14:05:20 +02:00
Kerollmops
a6218a20ae
Introduce a new InvalidFacetsDistribution user error 2021-06-23 13:56:19 +02:00
Kerollmops
2364777838
Return an error for when a field distribution cannot be done 2021-06-23 11:50:49 +02:00
Kerollmops
aeaac743ff
Replace an if let some by a match 2021-06-23 11:33:30 +02:00
Tamo
8d2a0b43ff
run the formatter on the whole project a second time 2021-06-22 15:36:22 +02:00
Tamo
3d90b03d7b
fix the limit
There was no check on the limit and thus, if a user specified a very large number, this line could cause a panic.
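
The commit message does not show the affected line, so the following is only a hypothetical sketch of this kind of guard: clamp the user-supplied `limit` (and `offset`) to the number of available candidates before using them. `take_page` is an illustrative name, not a milli function.

```rust
/// Return at most `limit` candidates starting at `offset`, without ever
/// indexing past the end or overflowing on a huge `limit`.
fn take_page(candidates: &[u32], offset: usize, limit: usize) -> &[u32] {
    let start = offset.min(candidates.len());
    // `saturating_add` avoids an overflow when `limit` is close to `usize::MAX`.
    let end = start.saturating_add(limit).min(candidates.len());
    &candidates[start..end]
}
```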
2021-06-22 14:52:13 +02:00
bors[bot]
5b6adc6d96
Merge #245
245: Warn for when a key is too large for LMDB r=Kerollmops a=Kerollmops

Closes #191, and resolves #140.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-06-22 12:10:52 +00:00
Kerollmops
51dbb2e06d
Warn for when a key is too large for LMDB 2021-06-22 11:51:36 +02:00
Kerollmops
aecbd14761
Improve the error message for InvalidDocumentId 2021-06-22 11:31:58 +02:00
Kerollmops
0cca2ea24f
Return a MissingDocumentId when a document doesn't have one 2021-06-22 11:22:33 +02:00
Kerollmops
481b0bf277
Warn for when a facet key is too large for LMDB 2021-06-22 10:57:46 +02:00
bors[bot]
b073fd49ea
Merge #244
244: Update version for the next release (v0.5.0) r=Kerollmops a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-06-21 14:27:10 +00:00
Clémentine Urquizar
320670f8fe
Update version for the next release (v0.5.0) 2021-06-21 15:59:17 +02:00
Clémentine Urquizar
daef43f504
Rename FieldsDistribution into FieldDistribution 2021-06-21 15:57:41 +02:00
Clémentine Urquizar
35fcc351a0
Update version for the next release (v0.4.2) 2021-06-20 17:37:24 +02:00
bors[bot]
5b19dd23d9
Merge #240
240: Field distribution r=Kerollmops a=irevoire

closes #199
closes #198 


Co-authored-by: Tamo <tamo@meilisearch.com>
2021-06-19 10:14:25 +00:00
Tamo
d08cfda796
convert the field_distribution to a BTreeMap and avoid counting twice the same documents 2021-06-17 18:31:54 +02:00
bors[bot]
a9e552ab18
Merge #238
238: Integration tests on filters and distinct r=Kerollmops a=ManyTheFish

Fix #216 
Fix #120 

Co-authored-by: many <maxime@meilisearch.com>
2021-06-17 15:00:51 +00:00
many
6cb1102bdb
Fix PR comments 2021-06-17 15:19:03 +02:00
Tamo
969adaefdf
rename fields_distribution in field_distribution 2021-06-17 15:16:20 +02:00
Kerollmops
ccd6f13793
Update version to the next release (0.4.1) 2021-06-17 15:01:20 +02:00
many
f496cd320d
Add distinct integration tests 2021-06-17 14:33:18 +02:00
many
9f4184208e
Add test on filters 2021-06-17 13:56:09 +02:00
marin
70bee7d405
re-export remaining error types
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-06-17 11:49:03 +02:00
marin postma
abbebad669
change sub errors visibility 2021-06-17 11:44:01 +02:00
Tamo
9716fb3b36
format the whole project 2021-06-16 18:33:33 +02:00
Clémentine Urquizar
f5ff3e8e19
Update version for the next release (v0.4.0) 2021-06-16 14:01:05 +02:00
many
ce0315a10f
Close write transaction in test 2021-06-16 11:03:37 +02:00
Kerollmops
7ac441e473
Fix small typos 2021-06-16 11:03:37 +02:00
Kerollmops
adf0c389c5
Rename FilterParsing into InvalidFilter 2021-06-16 11:03:36 +02:00
Kerollmops
8cfe3e1ec0
Rename DatabaseSizeReached into MaxDatabaseSizeReached 2021-06-16 11:03:36 +02:00
Kerollmops
4eda438f6f
Add a new Error for when a user uses a non-filtered attribute in a filter 2021-06-16 11:03:36 +02:00
Kerollmops
713acc408b
Introduce the primary key to the Settings builder structure 2021-06-16 11:03:36 +02:00
Kerollmops
a7d6930905
Replace the panicking expect by tracked Errors 2021-06-15 11:51:32 +02:00
Kerollmops
f0e804afd5
Rename the FieldIdMapMissingEntry from_db_name field into process 2021-06-15 11:13:04 +02:00
Kerollmops
28c004aa2c
Prefer using constant for the database names 2021-06-15 11:13:04 +02:00
Kerollmops
312c2d1d8e
Use the Error enum everywhere in the project 2021-06-14 16:58:38 +02:00
Kerollmops
ca78cb5aca
Introduce more variants to the error module enums 2021-06-14 16:58:38 +02:00
Kerollmops
456541e921
Implement the Display trait on the Error type 2021-06-14 16:48:51 +02:00
Kerollmops
44c353fafd
Introduce some way to construct an Error 2021-06-14 16:48:51 +02:00
Kerollmops
23fcf7920e
Introduce a basic version of the InternalError struct 2021-06-14 16:48:51 +02:00
Kerollmops
d2b1ecc885
Remove a lot of serialization unreachable errors 2021-06-14 16:48:51 +02:00
Kerollmops
65b1d09d55
Move the obkv merging functions into the merge_function module 2021-06-14 16:48:51 +02:00
Kerollmops
ab727e428b
Remove the docid_word_positions_merge method that must never be called 2021-06-14 16:48:51 +02:00
Kerollmops
93a8633f18
Remove the documents_merge method that must never be called 2021-06-14 16:48:51 +02:00
Kerollmops
cfc7314bd1
Prefer using an explicit merge function name 2021-06-14 16:48:50 +02:00
Kerollmops
93978ec38a
Serializing a RoaringBitmap into a Vec cannot fail 2021-06-14 16:48:50 +02:00
Kerollmops
ff9414a6ba
Use the out of the compute_primary_key_pair function 2021-06-14 16:48:50 +02:00
Many
f4cab080a6
Update milli/src/search/query_tree.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-06-10 11:30:51 +02:00
Many
36715f571c
Update milli/src/search/criteria/proximity.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-06-10 11:30:33 +02:00
many
e923a3ed6a
Replace Consecutive by Phrase in query tree
Replace Consecutive by Phrase in the query tree in order to remove theoretical bugs
due to the Consecutive enum type.
2021-06-10 11:16:16 +02:00
Clémentine Urquizar
dc64e139b9
Update version for the next release (v0.3.1) 2021-06-09 14:39:21 +02:00
bors[bot]
afb4133bd2
Merge #212 #222 #223
212: Introduce integration test on criteria r=Kerollmops a=ManyTheFish

- add pre-ranked dataset
- test each criterion 1 by 1
- test all criteria in several order

222: Move the `UpdateStore` into the http-ui crate r=Kerollmops a=Kerollmops

We no longer need to have the `UpdateStore` inside of the milli crate, as it is the job of the caller to stack the updates and sequentially give them to milli.

223: Update dataset links r=Kerollmops a=curquiza



Co-authored-by: many <maxime@meilisearch.com>
Co-authored-by: Many <legendre.maxime.isn@gmail.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-06-09 08:47:19 +00:00
bors[bot]
6faa87302c
Merge #220
220: Make hard separators split phrase query r=Kerollmops a=ManyTheFish

hard separators will now split a phrase query as two sequential phrases (double-quoted strings):

the query `"Radioactive (Imagine Dragons)"` would be considered equivalent to `"Radioactive" "Imagine Dragons"`, which has the small disadvantage of not keeping the order of the two (or more) separate phrases.
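
The splitting can be sketched as follows. This is an illustration under assumptions: the set of hard separators below is a guess for the example (the real set lives in the tokenizer), and `split_phrase` is a hypothetical helper rather than the query tree code.

```rust
/// Split the content of a phrase query on hard separators,
/// returning one phrase per non-empty segment.
fn split_phrase(phrase: &str) -> Vec<String> {
    // Assumed separator set, for illustration only.
    const HARD_SEPARATORS: &[char] = &['(', ')', '.', ',', ';', '!', '?'];
    phrase
        .split(|c: char| HARD_SEPARATORS.contains(&c))
        // Normalize the inner whitespace of each segment.
        .map(|part| part.split_whitespace().collect::<Vec<_>>().join(" "))
        .filter(|part| !part.is_empty())
        .collect()
}
```

Each returned string would then be treated as its own phrase, which is why the relative order of the phrases is not enforced.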

Fix #208

Co-authored-by: many <maxime@meilisearch.com>
Co-authored-by: Many <legendre.maxime.isn@gmail.com>
2021-06-09 08:22:58 +00:00
Many
f4ff30e99d
Update milli/tests/search/mod.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-06-09 10:12:24 +02:00
Many
ab696f6a23
Update milli/tests/search/query_criteria.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-06-09 10:12:17 +02:00
Kerollmops
0bf4f3f48a
Modify a test to check that criteria additions change the fields ids map 2021-06-08 18:14:34 +02:00
Kerollmops
82df524e09
Make sure that we register the field when setting criteria 2021-06-08 18:14:33 +02:00
Kerollmops
103dddba2f
Move the UpdateStore into the http-ui crate 2021-06-08 17:59:51 +02:00
Many
faf148d297
Update milli/src/search/query_tree.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-06-08 17:52:37 +02:00
Kerollmops
133ab98260
Use the index primary key when deleting documents 2021-06-08 17:33:29 +02:00
many
b489d699ce
Make hard separators split phrase query
hard separators will now split a phrase query into several double-quoted phrases

Fix #208
2021-06-08 17:29:38 +02:00
Many
afb09c914d
Update milli/tests/search/query_criteria.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-06-08 16:53:56 +02:00
many
b64cd2a3e3
Resolve PR comments 2021-06-08 14:14:34 +02:00
many
1fcc5f73ac
Factorize tests using macro_rules 2021-06-08 12:33:02 +02:00
many
10882bcbce
Introduce integration test on criteria 2021-06-03 14:44:53 +02:00
bors[bot]
a32236c80c
Merge #211
211: Update Cargo.toml for next release v0.3.0 r=Kerollmops a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-06-03 10:42:52 +00:00
Clémentine Urquizar
3b2b3aeea9
Update Cargo.toml for next release v0.3.0 2021-06-03 12:24:27 +02:00
bors[bot]
39ed133f9f
Merge #193
193: Fix primary key behavior r=Kerollmops a=MarinPostma

this pr:
- Adds early returns on empty document additions, avoiding error messages to be returned when adding no documents and no primary key was set.
- Changes the primary key inference logic to match that of legacy meilisearch.

close #194 

Co-authored-by: Marin Postma <postma.marin@protonmail.com>
Co-authored-by: marin postma <postma.marin@protonmail.com>
2021-06-03 10:24:21 +00:00
marin postma
57898d8a90
fix silent deserialize error 2021-06-03 10:42:55 +02:00
bors[bot]
834504aec0
Merge #204
204: Decorrelate Distinct, Asc/Desc, Filterable fields from the faceted fields r=Kerollmops a=Kerollmops

This PR decorrelates the fields that need to be stored in facet databases (big inverted indexes for fast access) from the filterable fields. The previously named faceted fields are now named filterable fields, and the faceted fields are now the union of the distinct attribute, all the Asc/Desc criteria, and the filterable fields.
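
As a rough sketch of that relationship, assuming the faceted set is computed as the union of the filterable fields, the distinct attribute and the fields used by Asc/Desc criteria (the function name and signature here are hypothetical):

```rust
use std::collections::BTreeSet;

/// Fields that must be stored in the facet databases.
fn faceted_fields(
    filterable: &BTreeSet<String>,
    distinct: Option<&str>,
    asc_desc: &[&str],
) -> BTreeSet<String> {
    let mut faceted = filterable.clone();
    // The distinct attribute, when set, must be faceted too.
    faceted.extend(distinct.map(str::to_string));
    // So must every field used by an Asc/Desc criterion.
    faceted.extend(asc_desc.iter().map(|field| field.to_string()));
    faceted
}
```

A field can therefore be faceted without being filterable: filtering is still checked against the filterable set only.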

I added two tests to make sure that the engine was correctly generating the faceted databases when a distinct attribute or an Asc/Desc criterion was added, and one to make sure that it was impossible to filter on a non-filterable field even if it was a faceted one.

Note that the `AttributesForFacetting` has also been renamed into `FilterableAttributes`. But it will be the Transplant's job to do that on the API, this change is only visible to the milli's library users.

- Related to https://github.com/meilisearch/transplant/issues/187.
- Fixes #161 by returning the documents that don't have the Asc/Desc field at the end of the bucket.
- Fixes #168.
- Fixes #152.

Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Marin Postma <postma.marin@protonmail.com>
Co-authored-by: many <maxime@meilisearch.com>
2021-06-02 15:43:39 +00:00
many
26a9974667
Make asc/desc criterion return resting documents
Fix #161.2
2021-06-02 17:41:48 +02:00
Kerollmops
3c304c89d4
Make sure that we generate the faceted database when required 2021-06-02 16:24:58 +02:00
Kerollmops
b0c0490e85
Make sure that we can add a Asc/Desc field without it being filterable 2021-06-02 16:24:58 +02:00
Kerollmops
3b1cd4c4b4
Rename the FacetCondition into FilterCondition 2021-06-02 16:24:58 +02:00
Kerollmops
c2afdbb1fb
Move and comment some internal facet_condition helper functions 2021-06-02 16:24:58 +02:00
Kerollmops
6476827d3a
Fix the indexer to be sure that distinct and Asc/Desc are also faceted 2021-06-02 16:24:58 +02:00
Marin Postma
1e366dae3e
remove useless lifetime on Distinct Trait 2021-06-02 16:24:58 +02:00
Kerollmops
187c713de5
Remove the MapDistinct struct as now distinct attributes are faceted 2021-06-02 16:24:57 +02:00
Kerollmops
ff440c1d9d
Introduce the faceted fields method to retrieve those that need faceting 2021-06-02 16:24:57 +02:00
Kerollmops
2a3f9b32ff
Rename the faceted fields into filterable fields 2021-06-02 16:24:57 +02:00
tamo
06c414a753
move the benchmarks to another crate so we can download the datasets automatically without adding overhead to the build of milli 2021-06-02 11:11:50 +02:00
tamo
3c84075d2d
uses an env variable to find the datasets 2021-06-02 11:05:07 +02:00
tamo
4969abeaab
update the facets for the benchmarks 2021-06-02 11:05:07 +02:00
tamo
e5dfde88fd
fix the facets conditions 2021-06-02 11:05:07 +02:00
tamo
7c7fba4e57
remove the time limitation to let criterion do what it wants 2021-06-02 11:05:07 +02:00
tamo
5d5d115608
reformat all the files 2021-06-02 11:05:07 +02:00
tamo
7086009f93
improve the base search 2021-06-02 11:05:07 +02:00
tamo
d0b44c380f
add benchmarks on a wiki dataset 2021-06-02 11:05:07 +02:00
tamo
beae843766
add a missing space 2021-06-02 11:05:07 +02:00
tamo
5132a106a1
refactorize everything related to the songs dataset in a songs benchmark file 2021-06-02 11:05:07 +02:00
tamo
136efd6b53
fix the benches 2021-06-02 11:05:07 +02:00
tamo
4b78ef31b6
add the configuration of the searchable fields and displayed fields and a default configuration for the songs 2021-06-02 11:05:07 +02:00
tamo
ea0c6d8c40
add a bunch of queries and start the introduction of the filters and the new dataset 2021-06-02 11:05:07 +02:00
tamo
3def42abd8
merge all the criterion only benchmarks in one file 2021-06-02 11:05:07 +02:00
tamo
a2bff68c1a
remove the optional words for the typo criterion 2021-06-02 11:05:07 +02:00
tamo
aee49bb3cd
add the proximity criterion 2021-06-02 11:05:07 +02:00
tamo
49e4cc3daf
add the words criterion to the bench 2021-06-02 11:05:07 +02:00
tamo
15cce89a45
update the README with instructions to download the dataset 2021-06-02 11:05:07 +02:00
tamo
e425f70ef9
let criterion decide how many iterations it wants to do in 10s 2021-06-02 11:05:07 +02:00
tamo
4fdbfd6048
push a first version of the benchmark for the typo 2021-06-02 11:05:07 +02:00
bors[bot]
270da98c46
Merge #202
202: Add field id word count docids database r=Kerollmops a=LegendreM

This PR introduces a new database, `field_id_word_count_docids`, that maps the number of words in an attribute to a list of document ids. This relation is limited to attributes that contain fewer than 11 words.
This database is used by the exactness criterion to know if a document has an attribute that contains exactly the query without any additional word.
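
A minimal in-memory sketch of such a mapping, for illustration only. The key layout, the integer types and the whitespace-based word count are assumptions of this example, not the on-disk format or the tokenizer's behavior.

```rust
use std::collections::{BTreeMap, BTreeSet};

type FieldId = u16; // assumed width, for the example
type DocId = u32;

/// Attributes with more words than this are not recorded (fewer than 11 words).
const MAX_COUNTED_WORDS: usize = 10;

/// Maps (field id, word count) to the documents whose field has that many words.
#[derive(Default)]
struct FieldIdWordCountDocids(BTreeMap<(FieldId, u8), BTreeSet<DocId>>);

impl FieldIdWordCountDocids {
    fn index(&mut self, docid: DocId, field: FieldId, text: &str) {
        // Simplification: words are counted by splitting on whitespace.
        let count = text.split_whitespace().count();
        if count <= MAX_COUNTED_WORDS {
            self.0.entry((field, count as u8)).or_default().insert(docid);
        }
    }

    /// Documents whose `field` contains exactly `word_count` words.
    fn docids(&self, field: FieldId, word_count: u8) -> BTreeSet<DocId> {
        self.0.get(&(field, word_count)).cloned().unwrap_or_default()
    }
}
```

An exactness check for a two-word query could then intersect the documents matching both words with `docids(field, 2)` to keep only attributes that contain nothing but the query.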

Fix #165 
Fix #196
Related to [specifications:#36](https://github.com/meilisearch/specifications/pull/36)

Co-authored-by: many <maxime@meilisearch.com>
Co-authored-by: Many <legendre.maxime.isn@gmail.com>
2021-06-01 16:09:48 +00:00
many
e857ca4d7d
Fix PR comments 2021-06-01 18:06:46 +02:00
Many
ab2cf69e8d
Update milli/src/update/delete_documents.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-06-01 17:04:10 +02:00
Many
8e6d1ff0dc
Update milli/src/update/index_documents/store.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-06-01 17:04:02 +02:00
bors[bot]
7d36d664a7
Merge #203
203: Make the MatchingWords return the number of matching bytes r=Kerollmops a=LegendreM

Make the MatchingWords return the number of matching bytes using a custom Levenshtein algorithm.

Fix #138

Co-authored-by: many <maxime@meilisearch.com>
2021-06-01 12:00:33 +00:00
many
225ae6fd25
Resolve PR comments 2021-06-01 11:53:09 +02:00
Marin Postma
984dc7c1ed
rewrite roaring codec without byteorder. 2021-05-31 22:15:39 +02:00
Marin Postma
1373637da1
optimize roaring codec 2021-05-31 22:15:35 +02:00
many
1df68d342a
Make the MatchingWords return the number of matching bytes 2021-05-31 18:22:29 +02:00
many
c701f8bf36
Use field id word count database in exactness criterion 2021-05-31 16:27:28 +02:00
many
4ddf008be2
add field id word count database 2021-05-31 16:27:28 +02:00
bors[bot]
2f5e61bacb
Merge #184
184: Transfer numbers and strings facets into the appropriate facet databases r=Kerollmops a=Kerollmops

This pull request is related to https://github.com/meilisearch/milli/issues/152 and changes the layout of the facet values: numbers and strings are now in dedicated databases, and the user no longer needs to define the type of the fields. No conversion between the two types is done anymore: numbers (floats and integers converted to f64) go to the facet float database and strings go to the strings facet database.

There is one related issue that I found regarding CSVs: the values in a CSV are always considered to be strings. [meilisearch/specifications#28](d916b57d74/text/0028-indexing-csv.md) fixes this issue by allowing the user to define the field types using `:` in the "CSV Formatting Rules" section.

All previous tests on facets have been modified to pass again and I have also done hand-driven tests with the 115m songs dataset. Everything seems to be good!

Fixes #192.

Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-05-31 13:32:58 +00:00
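The routing rule described in #184 (numbers to one facet database, strings to another, with no conversion between the two) can be sketched as follows. The `Value` enum, the `FacetDatabases` struct, and its two fields are hypothetical stand-ins for illustration. They do not reflect milli's actual storage layout or key encoding.

```rust
use std::collections::BTreeMap;

type DocumentId = u32;

/// Simplified field value, a stand-in for a JSON value.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Integer(i64),
    Float(f64),
    Text(String),
}

/// Two separate facet stores, one per value type (hypothetical layout).
#[derive(Default)]
struct FacetDatabases {
    /// Numbers are stored as (docid, f64) pairs because f64 is not `Ord`.
    numbers: Vec<(DocumentId, f64)>,
    /// Strings map each facet value to the documents that carry it.
    strings: BTreeMap<String, Vec<DocumentId>>,
}

impl FacetDatabases {
    /// Routes a value by its own type; the caller declares no field type.
    fn insert(&mut self, docid: DocumentId, value: Value) {
        match value {
            // Integers and floats are both converted to f64.
            Value::Integer(i) => self.numbers.push((docid, i as f64)),
            Value::Float(f) => self.numbers.push((docid, f)),
            // Strings are never parsed as numbers, even if they look numeric.
            Value::Text(s) => self.strings.entry(s).or_default().push(docid),
        }
    }
}
```

In this sketch the string `"42"` lands in the strings store while the integer `42` lands in the numbers store as `42.0`, which mirrors the "no conversion" rule stated in the PR description.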
Kerollmops
1c0a5cd136
Resolve code modification suggestions 2021-05-31 15:22:50 +02:00
many
a5e98cf46d
Fix plane sweep algorithm 2021-05-25 18:21:55 +02:00
Clément Renault
3a4a150ef0
Fix the tests and remaining warnings 2021-05-25 11:31:06 +02:00
Clément Renault
02c655ff1a
Refine the facet distribution to use both databases 2021-05-25 11:30:00 +02:00
Clément Renault
79efded841
Refine the FacetCondition from_array constructor 2021-05-25 11:30:00 +02:00
Clément Renault
f7efde11d9
Refine the facet condition to use both facet databases 2021-05-25 11:30:00 +02:00
Clément Renault
e62b89a2ed
Make the facet distinct work with the new split facets 2021-05-25 11:30:00 +02:00
Clément Renault
bd7b285bae
Split the update side to use the number and the strings facet databases 2021-05-25 11:30:00 +02:00
Clément Renault
038e03a4e4
Use both facet databases in the FacetIter type 2021-05-25 11:30:00 +02:00
Clément Renault
597144b0b9
Use both number and string facet databases in the distinct system 2021-05-25 11:29:59 +02:00
Clément Renault
837c1041c7
Clear and delete the documents from the facet database 2021-05-25 11:28:36 +02:00
Clément Renault
a56c46b6f1
Explode the string and f64 facet databases into two 2021-05-25 11:28:36 +02:00
Clément Renault
df7a32e3d0
Move the creation date initialization into a function 2021-05-25 11:28:35 +02:00
many
a3944a7083
Introduce a filtered_candidates field 2021-05-11 11:37:40 +02:00
many
efba662ca6
Fix clippy warnings in criteria 2021-05-10 10:27:18 +02:00
many
e923d51b8f
Make bucket candidates optionals 2021-05-10 10:27:04 +02:00
Marin Postma
eeb0c70ea2
meilisearch compatible primary key inference 2021-05-06 22:42:32 +02:00
Marin Postma
313c362461
early return on empty document addition 2021-05-06 18:14:16 +02:00
Many
44b6843de7
Fix pull request reviews
Update milli/src/fields_ids_map.rs
Update milli/src/search/criteria/exactness.rs
Update milli/src/search/criteria/mod.rs
2021-05-06 14:31:03 +02:00
many
c1ce4e4ca9
Introduce mocked ExactAttribute step in exactness criterion 2021-05-06 14:28:31 +02:00
many
a3f8686fbf
Introduce exactness criterion 2021-05-06 14:28:30 +02:00
bors[bot]
25f75d4d03
Merge #189
189: Update version for the next release (v0.2.1) r=Kerollmops a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-05-05 15:28:56 +00:00
Clémentine Urquizar
1e11578ef0
Update version for the next release (v0.2.1) 2021-05-05 14:57:34 +02:00
Alexey Shekhirin
f8d0f5265f
fix(update): fields distribution after documents merge 2021-05-04 22:12:20 +03:00
tamo
d61566787e
provide an iterator over all the documents in a milli index 2021-05-04 11:23:51 +02:00
Clémentine Urquizar
a8680887d8
Upgrade Milli version (v0.2.0) 2021-05-03 14:50:47 +02:00
Clémentine Urquizar
34e02aba42
Upgrade Tokenizer version (v0.2.2) 2021-05-03 10:55:55 +02:00
Alexey Shekhirin
d81c0e8bba
feat(update): disable autogenerate_docids by default 2021-04-30 21:41:34 +03:00
Marin Postma
e8e32e0ba1
make document addition number visible 2021-04-29 20:05:07 +02:00
many
ee09e50e7f
Remove excluded documents in criteria iterations
- pass excluded documents to criteria to remove them in higher levels of the bucket-sort
- merge already returned documents with excluded documents to avoid duplicates

Related to #125 and #112
Fix #170
2021-04-29 12:09:38 +02:00
many
31607bf9cd
Add a threshold on proximity when choosing between linear/set algorithm 2021-04-28 14:57:22 +02:00
many
3b7e6afb55
Refactor and add documentation 2021-04-28 13:53:27 +02:00
Many
0add4d735c
Update milli/src/search/criteria/attribute.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:40:34 +02:00
Many
3794ffc952
Update milli/src/search/criteria/attribute.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:39:23 +02:00
Many
329bd4a1bb
Update milli/src/search/criteria/attribute.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:39:03 +02:00
Many
3b1358b62f
Update milli/src/search/criteria/attribute.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:32:19 +02:00
Many
c862b1bc6b
Update milli/src/search/criteria/attribute.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:32:10 +02:00
Many
e92d137676
Update milli/src/search/criteria/attribute.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:31:42 +02:00
Many
b3d6c6a9a0
Update milli/src/search/criteria/attribute.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:31:13 +02:00
Many
498c2b298c
Update milli/src/search/criteria/attribute.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:30:02 +02:00
Many
0e4e6dfada
Update milli/src/search/criteria/proximity.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:29:52 +02:00
Many
47d780b8ce
Update milli/src/search/criteria/mod.rs
Co-authored-by: Irevoire <tamo@meilisearch.com>
2021-04-27 14:39:53 +02:00
Many
0daa0e170a
Fix PR comments
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 14:39:53 +02:00
many
0d7d3ce802
Update roaring package 2021-04-27 14:39:53 +02:00
many
71740805a7
Fix forgotten typo tests 2021-04-27 14:39:53 +02:00
many
e77291a6f3
Optimize Attribute criterion on big requests 2021-04-27 14:39:53 +02:00
many
716c8e22b0
Add style and comments 2021-04-27 14:39:52 +02:00
many
f853790016
Use the LCM of the first 10 numbers to compute attribute rank 2021-04-27 14:39:52 +02:00
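For reference, the least common multiple of the integers 1 through 10 can be computed as below. The commit title does not say how the value enters the rank computation, so this sketch only shows the constant itself, and the function names are illustrative. One property that may explain the choice: the result is divisible by every integer from 1 to 10, so dividing it by any such count stays in integer arithmetic.

```rust
/// Greatest common divisor, by Euclid's algorithm.
fn gcd(a: u64, b: u64) -> u64 {
    if b == 0 {
        a
    } else {
        gcd(b, a % b)
    }
}

/// Least common multiple; dividing before multiplying limits overflow.
fn lcm(a: u64, b: u64) -> u64 {
    a / gcd(a, b) * b
}

/// LCM of all integers in 1..=n (hypothetical helper name).
fn lcm_of_first(n: u64) -> u64 {
    (1..=n).fold(1, lcm)
}
```

`lcm_of_first(10)` evaluates to 2520.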
many
2b036449be
Fix the return of equal candidates in different pages 2021-04-27 14:39:52 +02:00
many
0efa011e09
Make a small code clean-up 2021-04-27 14:39:52 +02:00
many
17c8c6f945
Make set algorithm return None when nothing can be returned 2021-04-27 14:39:52 +02:00
many
b3e2280bb9
Debug attribute criterion
* debug folding when initializing iterators
2021-04-27 14:39:52 +02:00
many
1eee0029a8
Make attribute criterion typo/prefix tolerant 2021-04-27 14:39:52 +02:00
many
59f58c15f7
Implement attribute criterion
* Implement WordLevelIterator
* Implement QueryLevelIterator
* Implement set algorithm based on iterators

Not tested + Some TODO to fix
2021-04-27 14:39:52 +02:00
Clément Renault
361193099f
Reduce the amount of branches when query tree flattened 2021-04-27 14:39:52 +02:00
Kerollmops
e65bad16cc
Compute the words prefixes at the end of an update 2021-04-27 14:39:52 +02:00
many
ab92c814c3
Fix attributes score 2021-04-27 14:35:43 +02:00
Clément Renault
0ad9499b93
Fix an indexing bug in the words level positions 2021-04-27 14:35:43 +02:00
Clément Renault
7aa5753ed2
Make the attribute positions range bounds to be fixed 2021-04-27 14:35:43 +02:00
Clément Renault
658f316511
Introduce the Initial Criterion 2021-04-27 14:35:43 +02:00
Kerollmops
89ee2cf576
Introduce the TreeLevel struct 2021-04-27 14:25:35 +02:00
Kerollmops
bd1a371c62
Compute the WordsLevelPositions only once 2021-04-27 14:25:34 +02:00
Kerollmops
8bd4f5d93e
Compute the biggest values of the words_level_positions_docids 2021-04-27 14:25:34 +02:00
Kerollmops
f713828406
Implement the clear and delete documents for the word-level-positions database 2021-04-27 14:25:34 +02:00
Kerollmops
3069bf4f4a
Fix and improve the words-level-positions computation 2021-04-27 14:25:34 +02:00
Kerollmops
3a25137ee4
Expose and use the WordsLevelPositions update 2021-04-27 14:25:34 +02:00
Kerollmops
c765f277a3
Introduce the WordsLevelPositions update 2021-04-27 14:25:34 +02:00
Kerollmops
9242f2f1d4
Store the first word positions levels 2021-04-27 14:25:34 +02:00
Kerollmops
b0a417f342
Introduce the word_level_position_docids Index database 2021-04-27 14:25:34 +02:00
many
75e7b1e3da
Implement test Context methods 2021-04-27 14:25:34 +02:00
many
4ff67ec2ee
Implement attribute criterion for small amounts of candidates 2021-04-27 14:25:34 +02:00
Kerollmops
0f4c0beffd
Introduce the Attribute criterion 2021-04-27 14:25:34 +02:00
tamo
f8dee1b402
[makes clippy happy] search/criteria/proximity.rs 2021-04-21 12:36:45 +02:00
Alexey Shekhirin
6fa00c61d2
feat(search): support words_limit 2021-04-20 12:22:04 +03:00
Kerollmops
c9b2d3ae1a
Warn instead of returning an error when a conversion fails 2021-04-20 10:23:31 +02:00
Kerollmops
2aeef09316
Remove debug logs while iterating through the facet levels 2021-04-20 10:23:31 +02:00
Kerollmops
51767725b2
Simplify integer and float functions trait bounds 2021-04-20 10:23:31 +02:00