1. Handle keys with variable length correctly
This fixes https://github.com/meilisearch/meilisearch/issues/3042 and
is easily reproducible with the updated fuzz tests, which now generate
keys with variable lengths.
2. Prevent adding facets to the database if their encoded value does
not satisfy `valid_lmdb_key`.
This fixes an indexing failure when a document had a filterable
attribute containing a value whose length is higher than ~500 bytes.
668: Fix many Clippy errors part 2 r=ManyTheFish a=ehiggs
This brings us a step closer to enforcing clippy on each build.
# Pull Request
## Related issue
This does not fix any issue outright, but it is a second round of fixes for clippy after https://github.com/meilisearch/milli/pull/665. This should contribute to fixing https://github.com/meilisearch/milli/pull/659.
## What does this PR do?
Satisfies many issues for clippy. The complaints are mostly:
* Passing reference where a variable is already a reference.
* Using clone where a struct already implements `Copy`
* Using `ok_or_else` when it is a closure that returns a value instead of using the closure to call function (hence we use `ok_or`)
* Unambiguous lifetimes don't need names, so we can just use `'_`
* Using `return` when it is not needed as we are on the last expression of a function.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Ewan Higgs <ewan.higgs@gmail.com>
e.g. add one facet value incrementally with a group_size = X and then
add another one with group_size = Y
It is not actually possible to do so with the public API of milli,
but I wanted to make sure the algorithm worked well in those cases
anyway.
The bugs were found by fuzzing the code with fuzzcheck, which I've added
to milli as a conditional dev-dependency. But it can be removed later.
616: Introduce an indexation abortion function when indexing documents r=Kerollmops a=Kerollmops
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
Most of these are calling clone when the struct supports Copy.
Many are using & and &mut on `self` when the function they are called
from already has an immutable or mutable borrow so this isn't needed.
I tried to stay away from actual changes or places where I'd have to
name fresh variables.
635: Use an unstable algorithm for `grenad::Sorter` when possible r=Kerollmops a=loiclec
# Pull Request
## What does this PR do?
Use an unstable algorithm to sort the internal vector used by `grenad::Sorter` whenever possible to speed up indexing.
In practice, every time the merge function creates a `RoaringBitmap`, we use an unstable sort. For every other merge function, such as `keep_first`, `keep_last`, etc., a stable sort is used.
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
587: Word prefix pair proximity docids indexation refactor r=Kerollmops a=loiclec
# Pull Request
## What does this PR do?
Refactor the code of `WordPrefixPairProximityDocIds` to make it much faster, fix a bug, and add a unit test.
## Why is it faster?
Because we avoid using a sorter to insert the (`word1`, `prefix`, `proximity`) keys and their associated bitmaps, and thus we don't have to sort a potentially very big set of data. I have also added a couple of other optimisations:
1. reusing allocations
2. using a prefix trie instead of an array of prefixes to get all the prefixes of a word
3. inserting directly into the database instead of putting the data in an intermediary grenad when possible. Also avoid checking for pre-existing values in the database when we know for certain that they do not exist.
## What bug was fixed?
When reindexing, the `new_prefix_fst_words` prefixes may look like:
```
["ant", "axo", "bor"]
```
which we group by first letter:
```
[["ant", "axo"], ["bor"]]
```
Later in the code, if we have the word2 "axolotl", we try to find which subarray of prefixes contains its prefixes. This check is done with `word2.starts_with(subarray_prefixes[0])`, but `"axolotl".starts_with("ant")` is false, and thus we wrongly think that there are no prefixes in `new_prefix_fst_words` that are prefixes of `axolotl`.
## StrStrU8Codec
I had to change the encoding of `StrStrU8Codec` to make the second string null-terminated as well. I don't think this should be a problem, but I may have missed some nuances about the impacts of this change.
## Requests when reviewing this PR
I have explained what the code does in the module documentation of `word_pair_proximity_prefix_docids`. It would be nice if someone could read it and give their opinion on whether it is a clear explanation or not.
I also have a couple questions regarding the code itself:
- Should we clean up and factor out the `PrefixTrieNode` code to try and make broader use of it outside this module? For now, the prefixes undergo a few transformations: from FST, to array, to prefix trie. It seems like it could be simplified.
- I wrote a function called `write_into_lmdb_database_without_merging`. (1) Are we okay with such a function existing? (2) Should it be in `grenad_helpers` instead?
## Benchmark Results
We reduce the time it takes to index about 8% in most cases, but it varies between -3% and -20%.
```
group indexing_main_ce90fc62 indexing_word-prefix-pair-proximity-docids-refactor_cbad2023
----- ---------------------- ------------------------------------------------------------
indexing/-geo-delete-facetedNumber-facetedGeo-searchable- 1.00 1893.0±233.03µs ? ?/sec 1.01 1921.2±260.79µs ? ?/sec
indexing/-movies-delete-facetedString-facetedNumber-searchable- 1.05 9.4±3.51ms ? ?/sec 1.00 9.0±2.14ms ? ?/sec
indexing/-movies-delete-facetedString-facetedNumber-searchable-nested- 1.22 18.3±11.42ms ? ?/sec 1.00 15.0±5.79ms ? ?/sec
indexing/-songs-delete-facetedString-facetedNumber-searchable- 1.00 41.4±4.20ms ? ?/sec 1.28 53.0±13.97ms ? ?/sec
indexing/-wiki-delete-searchable- 1.00 285.6±18.12ms ? ?/sec 1.03 293.1±16.09ms ? ?/sec
indexing/Indexing geo_point 1.03 60.8±0.45s ? ?/sec 1.00 58.8±0.68s ? ?/sec
indexing/Indexing movies in three batches 1.14 16.5±0.30s ? ?/sec 1.00 14.5±0.24s ? ?/sec
indexing/Indexing movies with default settings 1.11 13.7±0.07s ? ?/sec 1.00 12.3±0.28s ? ?/sec
indexing/Indexing nested movies with default settings 1.10 10.6±0.11s ? ?/sec 1.00 9.6±0.15s ? ?/sec
indexing/Indexing nested movies without any facets 1.11 9.4±0.15s ? ?/sec 1.00 8.5±0.10s ? ?/sec
indexing/Indexing songs in three batches with default settings 1.18 66.2±0.39s ? ?/sec 1.00 56.0±0.67s ? ?/sec
indexing/Indexing songs with default settings 1.07 58.7±1.26s ? ?/sec 1.00 54.7±1.71s ? ?/sec
indexing/Indexing songs without any facets 1.08 53.1±0.88s ? ?/sec 1.00 49.3±1.43s ? ?/sec
indexing/Indexing songs without faceted numbers 1.08 57.7±1.33s ? ?/sec 1.00 53.3±0.98s ? ?/sec
indexing/Indexing wiki 1.06 1051.1±21.46s ? ?/sec 1.00 989.6±24.55s ? ?/sec
indexing/Indexing wiki in three batches 1.20 1184.8±8.93s ? ?/sec 1.00 989.7±7.06s ? ?/sec
indexing/Reindexing geo_point 1.04 67.5±0.75s ? ?/sec 1.00 64.9±0.32s ? ?/sec
indexing/Reindexing movies with default settings 1.12 13.9±0.17s ? ?/sec 1.00 12.4±0.13s ? ?/sec
indexing/Reindexing songs with default settings 1.05 60.6±0.84s ? ?/sec 1.00 57.5±0.99s ? ?/sec
indexing/Reindexing wiki 1.07 1725.0±17.92s ? ?/sec 1.00 1611.4±9.90s ? ?/sec
```
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
608: Fix soft deleted documents r=ManyTheFish a=ManyTheFish
When we replaced or updated some documents, the indexing was skipping the replaced documents.
Related to https://github.com/meilisearch/meilisearch/issues/2672
Co-authored-by: ManyTheFish <many@meilisearch.com>
New full snapshot:
---
source: milli/src/update/word_prefix_pair_proximity_docids.rs
---
5 a 1 [101, ]
5 a 2 [101, ]
5 am 1 [101, ]
5 b 4 [101, ]
5 be 4 [101, ]
am a 3 [101, ]
amazing a 1 [100, ]
amazing a 2 [100, ]
amazing a 3 [100, ]
amazing an 1 [100, ]
amazing an 2 [100, ]
amazing b 2 [100, ]
amazing be 2 [100, ]
an a 1 [100, ]
an a 2 [100, 202, ]
an am 1 [100, ]
an an 2 [100, ]
an b 3 [100, ]
an be 3 [100, ]
and a 2 [100, ]
and a 3 [100, ]
and a 4 [100, ]
and am 2 [100, ]
and an 3 [100, ]
and b 1 [100, ]
and be 1 [100, ]
at a 1 [100, 202, ]
at a 2 [100, 101, ]
at a 3 [100, ]
at am 2 [100, 101, ]
at an 1 [100, 202, ]
at an 3 [100, ]
at b 3 [101, ]
at b 4 [100, ]
at be 3 [101, ]
at be 4 [100, ]
beautiful a 2 [100, ]
beautiful a 3 [100, ]
beautiful a 4 [100, ]
beautiful am 3 [100, ]
beautiful an 2 [100, ]
beautiful an 4 [100, ]
bell a 2 [101, ]
bell a 4 [101, ]
bell am 4 [101, ]
extraordinary a 2 [202, ]
extraordinary a 3 [202, ]
extraordinary an 2 [202, ]
house a 3 [100, 202, ]
house a 4 [100, 202, ]
house am 4 [100, ]
house an 3 [100, 202, ]
house b 2 [100, ]
house be 2 [100, ]
rings a 1 [101, ]
rings a 3 [101, ]
rings am 3 [101, ]
rings b 2 [101, ]
rings be 2 [101, ]
the a 3 [101, ]
the b 1 [101, ]
the be 1 [101, ]
New snapshot (yes, it's wrong as well, it will get fixed later):
---
source: milli/src/update/word_prefix_pair_proximity_docids.rs
---
5 a 1 [101, ]
5 a 2 [101, ]
5 am 1 [101, ]
5 b 4 [101, ]
5 be 4 [101, ]
am a 3 [101, ]
amazing a 1 [100, ]
amazing a 2 [100, ]
amazing a 3 [100, ]
amazing an 1 [100, ]
amazing an 2 [100, ]
amazing b 2 [100, ]
amazing be 2 [100, ]
an a 1 [100, ]
an a 2 [100, 202, ]
an am 1 [100, ]
an b 3 [100, ]
an be 3 [100, ]
and a 2 [100, ]
and a 3 [100, ]
and a 4 [100, ]
and b 1 [100, ]
and be 1 [100, ]
d\0 0 [100, 202, ]
an an 2 [100, ]
and am 2 [100, ]
and an 3 [100, ]
at a 2 [100, 101, ]
at a 3 [100, ]
at am 2 [100, 101, ]
at an 1 [100, 202, ]
at an 3 [100, ]
at b 3 [101, ]
at b 4 [100, ]
at be 3 [101, ]
at be 4 [100, ]
beautiful a 2 [100, ]
beautiful a 3 [100, ]
beautiful a 4 [100, ]
beautiful am 3 [100, ]
beautiful an 2 [100, ]
beautiful an 4 [100, ]
bell a 2 [101, ]
bell a 4 [101, ]
bell am 4 [101, ]
extraordinary a 2 [202, ]
extraordinary a 3 [202, ]
extraordinary an 2 [202, ]
house a 4 [100, 202, ]
house a 4 [100, ]
house am 4 [100, ]
house an 3 [100, 202, ]
house b 2 [100, ]
house be 2 [100, ]
rings a 1 [101, ]
rings a 3 [101, ]
rings am 3 [101, ]
rings b 2 [101, ]
rings be 2 [101, ]
the a 3 [101, ]
the b 1 [101, ]
the be 1 [101, ]
556: Add EXISTS filter r=loiclec a=loiclec
## What does this PR do?
Fixes issue [#2484](https://github.com/meilisearch/meilisearch/issues/2484) in the meilisearch repo.
It creates a `field EXISTS` filter which selects all documents containing the `field` key.
For example, with the following documents:
```json
[{
"id": 0,
"colour": []
},
{
"id": 1,
"colour": ["blue", "green"]
},
{
"id": 2,
"colour": 145238
},
{
"id": 3,
"colour": null
},
{
"id": 4,
"colour": {
"green": []
}
},
{
"id": 5,
"colour": {}
},
{
"id": 6
}]
```
Then the filter `colour EXISTS` selects the ids `[0, 1, 2, 3, 4, 5]`. The filter `colour NOT EXISTS` selects `[6]`.
## Details
There is a new database named `facet-id-exists-docids`. Its keys are field ids and its values are bitmaps of all the document ids where the corresponding field exists.
To create this database, the indexing part of milli had to be adapted. The implementation there is basically copy/pasted from the code handling the `facet-id-f64-docids` database, with appropriate modifications in place.
There was an issue involving the flattening of documents during (re)indexing. Previously, the following JSON:
```json
{
"id": 0,
"colour": [],
"size": {}
}
```
would be flattened to:
```json
{
"id": 0
}
```
prior to being given to the extraction pipeline.
This transformation would lose the information that is needed to populate the `facet-id-exists-docids` database. Therefore, I have also changed the implementation of the `flatten-serde-json` crate. Now, as it traverses the Json, it keeps track of which key was encountered. Then, at the end, if a previously encountered key is not present in the flattened object, it adds that key to the object with an empty array as value. For example:
```json
{
"id": 0,
"colour": {
"green": [],
"blue": 1
},
"size": {}
}
```
becomes
```json
{
"id": 0,
"colour": [],
"colour.green": [],
"colour.blue": 1,
"size": []
}
```
Co-authored-by: Kerollmops <clement@meilisearch.com>
The idea is to directly create a sorted and merged list of bitmaps
in the form of a BTreeMap<FieldId, RoaringBitmap> instead of creating
a grenad::Reader where the keys are field_id and the values are docids.
Then we send that BTreeMap to the thing that handles TypedChunks, which
inserts its content into the database.
When a document deletion occurs, instead of deleting the document we mark it as deleted
in the new “soft deleted” bitmap. It is then removed from the search, and all the other
endpoints.
523: Improve geosearch error messages r=irevoire a=irevoire
Improve the geosearch error messages (#488).
And try to parse the string as specified in https://github.com/meilisearch/meilisearch/issues/2354
Co-authored-by: Tamo <tamo@meilisearch.com>
520: fix mistake in Settings initialization r=irevoire a=MarinPostma
fix settings not being correctly initialized and add a test to make sure that they are in the future.
fix https://github.com/meilisearch/meilisearch/issues/2358
Co-authored-by: ad hoc <postma.marin@protonmail.com>
505: normalize exact words r=curquiza a=MarinPostma
Normalize the exact words, as specified in the specification.
Co-authored-by: ad hoc <postma.marin@protonmail.com>
We need to store all the external id (primary key) in a hashmap
associated to their internal id during.
The smartstring remove heap allocation / memory usage and should
improve the cache locality.
467: optimize prefix database r=Kerollmops a=MarinPostma
This pr introduces two optimizations that greatly improve the speed of computing prefix databases.
- The time that it takes to create the prefix FST has been divided by 5 by inverting the way we iterated over the words FST.
- We unconditionally and needlessly checked for documents to remove in `word_prefix_pair`, which caused an iteration over the whole database.
Co-authored-by: ad hoc <postma.marin@protonmail.com>
436: Speed up the word prefix databases computation time r=Kerollmops a=Kerollmops
This PR depends on the fixes done in #431 and must be merged after it.
In this PR we will bring the `WordPrefixPairProximityDocids`, `WordPrefixDocids` and, `WordPrefixPositionDocids` update structures to a new era, a better era, where computing the word prefix pair proximities costs much fewer CPU cycles, an era where this update structure can use the, previously computed, set of new word docids from the newly indexed batch of documents.
---
The `WordPrefixPairProximityDocids` is an update structure, which means that it is an object that we feed with some parameters and which modifies the LMDB database of an index when asked for. This structure specifically computes the list of word prefix pair proximities, which correspond to a list of pairs of words associated with a proximity (the distance between both words) where the second word is not a word but a prefix e.g. `s`, `se`, `a`. This word prefix pair proximity is associated with the list of documents ids which contains the pair of words and prefix at the given proximity.
The origin of the performances issue that this struct brings is related to the fact that it starts its job from the beginning, it clears the LMDB database before rewriting everything from scratch, using the other LMDB databases to achieve that. I hope you understand that this is absolutely not an optimized way of doing things.
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
returned metaimprove document addition returned metaimprove document
addition returned metaimprove document addition returned metaimprove
document addition returned metaimprove document addition returned
metaimprove document addition returned meta
390: Add helper methods on the settings r=Kerollmops a=irevoire
This would be a good addition to look at the content of a setting without consuming it.
It’s useful for analytics.
Co-authored-by: Irevoire <tamo@meilisearch.com>
384: Replace memmap with memmap2 r=Kerollmops a=palfrey
[memmap is unmaintained](https://rustsec.org/advisories/RUSTSEC-2020-0077.html) and needs replacing. memmap2 is a drop-in replacement fork that's well maintained. Note that the version numbers got reset on fork, hence the lower values.
Co-authored-by: Tom Parker-Shemilt <palfrey@tevp.net>
388: fix primary key inference r=MarinPostma a=MarinPostma
The primary key is was infered from a hashtable index of the field. For this reason the order in which the fields were interated upon was not deterministic, and the primary key was chosed ffrom the first field containing "id".
This fix sorts the the index by field_id when infering the primary key.
Co-authored-by: mpostma <postma.marin@protonmail.com>
Instead of using an arbitrary limit we encode the absolute position in a u32
using one strong u16 for the field id and a weak u16 for the relative position in the attribute.
322: Geosearch r=ManyTheFish a=irevoire
This PR introduces [basic geo-search functionalities](https://github.com/meilisearch/specifications/pull/59), it makes the engine able to index, filter and, sort by geo-point. We decided to use [the rstar library](https://docs.rs/rstar) and to save the points in [an RTree](https://docs.rs/rstar/0.9.1/rstar/struct.RTree.html) that we de/serialize in the index database [by using serde](https://serde.rs/) with [bincode](https://docs.rs/bincode). This is not an efficient way to query this tree as it will consume a lot of CPU and memory when a search is made, but at least it is an easy first way to do so.
### What we will have to do on the indexing part:
- [x] Index the `_geo` fields from the documents.
- [x] Create a new module with an extractor in the `extract` module that takes the `obkv_documents` and retrieves the latitude and longitude coordinates, outputting them in a `grenad::Reader` for further process.
- [x] Call the extractor in the `extract::extract_documents_data` function and send the result to the `TypedChunk` module.
- [x] Get the `grenad::Reader` in the `typed_chunk::write_typed_chunk_into_index` function and store all the points in the `rtree`
- [x] Delete the documents from the `RTree` when deleting documents from the database. All this can be done in the `delete_documents.rs` file by getting the data structure and removing the points from it, inserting it back after the modification.
- [x] Clearing the `RTree` entirely when we clear the documents from the database, everything happens in the `clear_documents.rs` file.
- [x] save a Roaring bitmap of all documents containing the `_geo` field
### What we will have to do on the query part:
- [x] Filter the documents at a certain distance around a point, this is done by [collecting the documents from the searched point](https://docs.rs/rstar/0.9.1/rstar/struct.RTree.html#method.nearest_neighbor_iter) while they are in range.
- [x] We must introduce new `geoLowerThan` and `geoGreaterThan` variants to the `Operator` filter enum.
- [x] Implement the `negative` method on both variants where the `geoGreaterThan` variant is implemented by executing the `geoLowerThan` and removing the results found from the whole list of geo faceted documents.
- [x] Add the `_geoRadius` function in the pest parser.
- [x] Introduce a `_geo` ascending ranking function that takes a point in parameter, ~~this function must keep the iterator on the `RTree` and make it peekable~~ This was not possible for now, we had to collect the whole iterator. Only the documents that are part of the candidates must be sent too!
- [x] This ascending ranking rule will only be active if the search is set up with the `_geoPoint` parameter that indicates the center point of the ascending ranking rule.
-----------
- On Meilisearch part: We must introduce a new concept, returning the documents with a new `_geoDistance` field when it passed by the `_geo` ranking rule, this has never been done before. We could maybe just do it afterward when the documents have been retrieved from the database, computing the distance from the `_geoPoint` and all of the documents to be returned.
Co-authored-by: Irevoire <tamo@meilisearch.com>
Co-authored-by: cvermand <33010418+bidoubiwa@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
300: Fix prefix level position docids database r=curquiza a=ManyTheFish
The prefix search was inverted when we generated the DB.
Instead of searching if word had a prefix in prefix fst,
we were searching if the word was a prefix of a prefix contained in the prefix fst.
The indexer, now, iterate over prefix contained in the fst
and search them by prefix in the word-level-position-docids database,
aggregating matches in a sorter.
Fix#299
Co-authored-by: many <maxime@meilisearch.com>
The prefix search was inverted when we generated the DB.
Instead of searching if word had a prefix in prefix fst,
we were searching if the word was a prefix of a prefix contained in the prefix fst.
The indexer, now, iterate over prefix contained in the fst
and search them by prefix in the word-level-position-docids database,
aggregating matches in a sorter.
Fix#299
291: Fix a bug about zero bytes in the inputs r=irevoire a=Kerollmops
Ok, good news, after a little session of debugging with `@irevoire` we found out that the bug seems to be related to zeroes in the input update. The engine wasn't designed to accept those. The chosen solution is to update the tokenizer to remove those zeroes. We are waiting on https://github.com/meilisearch/tokenizer/pull/52 to be merged and a new version to be released.
It is not an undefined behavior, I repeat: it is a "normal" bug 🎉👏
----
This PR tries to fix a bug where we use LMDB in the wrong way, leading to panic due to an undefined behavior on the Rust side. I thought [we fixed it in a previous PR](https://github.com/meilisearch/milli/pull/264) but we found out that _a similar_ bug was still present. `@bb` found a way to trigger this bug and helped us find the origin of it.
As I don't have a minimal reproducible example of this bug I bet on the unsafe `put_current` calls when we index new documents as the bug was trigger after a big indexation on a clean database, thus not triggering a deletion update. I only replaced the unsafe `put_current` with two safe calls to `get`/`put`.
I hope it helps and fixes the bug, only `@bb` can help us check that. I am not even sure how I can create a custom Docker image and expose it for testing purposes.
<details>
<summary>The backtrace leading us to a panic in grenad.</summary>
```
meilisearch_1 | thread 'tokio-runtime-worker' panicked at 'assertion failed: key > &last_key', /root/.cargo/git/checkouts/grenad-e2cb77f65d31bb02/3adcb26/src/block_builder.rs:38:17
meilisearch_1 | stack backtrace:
meilisearch_1 | 0: rust_begin_unwind
meilisearch_1 | at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:493:5
meilisearch_1 | 1: core::panicking::panic_fmt
meilisearch_1 | at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/core/src/panicking.rs:92:14
meilisearch_1 | 2: core::panicking::panic
meilisearch_1 | at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/core/src/panicking.rs:50:5
meilisearch_1 | 3: grenad::block_builder::BlockBuilder::insert
meilisearch_1 | at ./root/.cargo/git/checkouts/grenad-e2cb77f65d31bb02/3adcb26/src/block_builder.rs:38:17
meilisearch_1 | 4: grenad::writer::Writer<W>::insert
meilisearch_1 | at ./root/.cargo/git/checkouts/grenad-e2cb77f65d31bb02/3adcb26/src/writer.rs:92:12
meilisearch_1 | 5: milli::update::words_level_positions::write_level_entry
meilisearch_1 | at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/words_level_positions.rs:262:5
meilisearch_1 | 6: milli::update::words_level_positions::compute_positions_levels
meilisearch_1 | at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/words_level_positions.rs:211:13
meilisearch_1 | 7: milli::update::words_level_positions::WordsLevelPositions::execute
meilisearch_1 | at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/words_level_positions.rs:65:23
meilisearch_1 | 8: milli::update::index_documents::IndexDocuments::execute_raw
meilisearch_1 | at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/index_documents/mod.rs:831:9
meilisearch_1 | 9: milli::update::index_documents::IndexDocuments::execute
meilisearch_1 | at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/index_documents/mod.rs:372:9
meilisearch_1 | 10: meilisearch_http::index::updates::<impl meilisearch_http::index::Index>::update_documents_txn
meilisearch_1 | at ./meilisearch/meilisearch-http/src/index/updates.rs:225:30
meilisearch_1 | 11: meilisearch_http::index::updates::<impl meilisearch_http::index::Index>::update_documents
meilisearch_1 | at ./meilisearch/meilisearch-http/src/index/updates.rs:183:22
meilisearch_1 | 12: meilisearch_http::index::update_handler::UpdateHandler::handle_update
meilisearch_1 | at ./meilisearch/meilisearch-http/src/index/update_handler.rs:75:18
meilisearch_1 | 13: meilisearch_http::index_controller::index_actor::actor::IndexActor<S>::handle_update::{{closure}}::{{closure}}
meilisearch_1 | at ./meilisearch/meilisearch-http/src/index_controller/index_actor/actor.rs:174:35
meilisearch_1 | 14: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
meilisearch_1 | at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/blocking/task.rs:42:21
meilisearch_1 | 15: tokio::runtime::task::core::CoreStage<T>::poll::{{closure}}
meilisearch_1 | at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/core.rs:243:17
meilisearch_1 | 16: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
meilisearch_1 | at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/loom/std/unsafe_cell.rs:14:9
meilisearch_1 | 17: tokio::runtime::task::core::CoreStage<T>::poll
meilisearch_1 | at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/core.rs:233:13
meilisearch_1 | 18: tokio::runtime::task::harness::poll_future::{{closure}}
meilisearch_1 | at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:427:23
meilisearch_1 | 19: <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
meilisearch_1 | at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panic.rs:344:9
meilisearch_1 | 20: std::panicking::try::do_call
meilisearch_1 | at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:379:40
meilisearch_1 | 21: std::panicking::try
meilisearch_1 | at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:343:19
meilisearch_1 | 22: std::panic::catch_unwind
meilisearch_1 | at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panic.rs:431:14
meilisearch_1 | 23: tokio::runtime::task::harness::poll_future
meilisearch_1 | at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:414:19
meilisearch_1 | 24: tokio::runtime::task::harness::Harness<T,S>::poll_inner
meilisearch_1 | at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:89:9
meilisearch_1 | 25: tokio::runtime::task::harness::Harness<T,S>::poll
meilisearch_1 | at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:59:15
meilisearch_1 | 26: tokio::runtime::task::raw::RawTask::poll
meilisearch_1 | at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/raw.rs:66:18
meilisearch_1 | 27: tokio::runtime::task::Notified<S>::run
meilisearch_1 | at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/mod.rs:171:9
meilisearch_1 | 28: tokio::runtime::blocking::pool::Inner::run
meilisearch_1 | at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/blocking/pool.rs:265:17
meilisearch_1 | 29: tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}}
meilisearch_1 | at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/blocking/pool.rs:245:17
meilisearch_1 | note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
```
</details>
Co-authored-by: Kerollmops <clement@meilisearch.com>
269: Fix bug when inserting previously deleted documents r=Kerollmops a=Kerollmops
This PR fixes#268.
The issue was in the `ExternalDocumentsIds` implementation in the specific case that an external document id was in the soft map marked as deleted.
The bug was due to a wrong assumption on my side about how the FST unions were returning the `IndexedValue`s, I thought the values returned in an array were in the same order as the FSTs given to the `OpBuilder` but in fact, [the `IndexedValue`'s `index` field was here to indicate from which FST the values were coming from](https://docs.rs/fst/0.4.7/fst/map/struct.IndexedValue.html).
271: Remove the roaring operation functions warnings r=Kerollmops a=Kerollmops
In this PR we are just replacing the usages of the roaring operations function by the new operators. This removes a lot of warnings.
Co-authored-by: Kerollmops <clement@meilisearch.com>
It is undefined behavior to keep a reference to the database while
modifying it, we were keeping references in the database and also
feeding the heed put_current methods with keys referenced inside
the database itself.
https://github.com/Kerollmops/heed/pull/108
245: Warn for when a key is too large for LMDB r=Kerollmops a=Kerollmops
Closes#191, and resolves#140.
Co-authored-by: Kerollmops <clement@meilisearch.com>
193: Fix primary key behavior r=Kerollmops a=MarinPostma
this pr:
- Adds early returns on empty document additions, avoiding error messages to be returned when adding no documents and no primary key was set.
- Changes the primary key inference logic to match that of legacy meilisearch.
close#194
Co-authored-by: Marin Postma <postma.marin@protonmail.com>
Co-authored-by: marin postma <postma.marin@protonmail.com>
fixes after review
bump the version of the tokenizer
implement a first version of the stop_words
The front must provide a BTreeSet containing the stop words
The stop_words are set at None if an empty Set is provided
add the stop-words in the http-ui interface
Use maplit in the test
and remove all the useless drop(rtxn) at the end of all tests
Integrate the stop_words in the querytree
remove the stop_words from the querytree except if it was a prefix or a typo
more fixes after review
The front must provide a BTreeSet containing the stop words
The stop_words are set at None if an empty Set is provided
add the stop-words in the http-ui interface
Use maplit in the test
and remove all the useless drop(rtxn) at the end of all tests