291: Fix a bug about zero bytes in the inputs r=irevoire a=Kerollmops
Ok, good news, after a little session of debugging with `@irevoire` we found out that the bug seems to be related to zeroes in the input update. The engine wasn't designed to accept those. The chosen solution is to update the tokenizer to remove those zeroes. We are waiting on https://github.com/meilisearch/tokenizer/pull/52 to be merged and a new version to be released.
It is not an undefined behavior, I repeat: it is a "normal" bug 🎉👏
----
This PR tries to fix a bug where we use LMDB in the wrong way, leading to panic due to an undefined behavior on the Rust side. I thought [we fixed it in a previous PR](https://github.com/meilisearch/milli/pull/264) but we found out that _a similar_ bug was still present. `@bb` found a way to trigger this bug and helped us find the origin of it.
As I don't have a minimal reproducible example of this bug I bet on the unsafe `put_current` calls when we index new documents as the bug was trigger after a big indexation on a clean database, thus not triggering a deletion update. I only replaced the unsafe `put_current` with two safe calls to `get`/`put`.
I hope it helps and fixes the bug, only `@bb` can help us check that. I am not even sure how I can create a custom Docker image and expose it for testing purposes.
<details>
<summary>The backtrace leading us to a panic in grenad.</summary>
```
meilisearch_1 | thread 'tokio-runtime-worker' panicked at 'assertion failed: key > &last_key', /root/.cargo/git/checkouts/grenad-e2cb77f65d31bb02/3adcb26/src/block_builder.rs:38:17
meilisearch_1 | stack backtrace:
meilisearch_1 | 0: rust_begin_unwind
meilisearch_1 | at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:493:5
meilisearch_1 | 1: core::panicking::panic_fmt
meilisearch_1 | at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/core/src/panicking.rs:92:14
meilisearch_1 | 2: core::panicking::panic
meilisearch_1 | at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/core/src/panicking.rs:50:5
meilisearch_1 | 3: grenad::block_builder::BlockBuilder::insert
meilisearch_1 | at ./root/.cargo/git/checkouts/grenad-e2cb77f65d31bb02/3adcb26/src/block_builder.rs:38:17
meilisearch_1 | 4: grenad::writer::Writer<W>::insert
meilisearch_1 | at ./root/.cargo/git/checkouts/grenad-e2cb77f65d31bb02/3adcb26/src/writer.rs:92:12
meilisearch_1 | 5: milli::update::words_level_positions::write_level_entry
meilisearch_1 | at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/words_level_positions.rs:262:5
meilisearch_1 | 6: milli::update::words_level_positions::compute_positions_levels
meilisearch_1 | at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/words_level_positions.rs:211:13
meilisearch_1 | 7: milli::update::words_level_positions::WordsLevelPositions::execute
meilisearch_1 | at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/words_level_positions.rs:65:23
meilisearch_1 | 8: milli::update::index_documents::IndexDocuments::execute_raw
meilisearch_1 | at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/index_documents/mod.rs:831:9
meilisearch_1 | 9: milli::update::index_documents::IndexDocuments::execute
meilisearch_1 | at ./root/.cargo/git/checkouts/milli-00376cd5db949a15/007fec2/milli/src/update/index_documents/mod.rs:372:9
meilisearch_1 | 10: meilisearch_http::index::updates::<impl meilisearch_http::index::Index>::update_documents_txn
meilisearch_1 | at ./meilisearch/meilisearch-http/src/index/updates.rs:225:30
meilisearch_1 | 11: meilisearch_http::index::updates::<impl meilisearch_http::index::Index>::update_documents
meilisearch_1 | at ./meilisearch/meilisearch-http/src/index/updates.rs:183:22
meilisearch_1 | 12: meilisearch_http::index::update_handler::UpdateHandler::handle_update
meilisearch_1 | at ./meilisearch/meilisearch-http/src/index/update_handler.rs:75:18
meilisearch_1 | 13: meilisearch_http::index_controller::index_actor::actor::IndexActor<S>::handle_update::{{closure}}::{{closure}}
meilisearch_1 | at ./meilisearch/meilisearch-http/src/index_controller/index_actor/actor.rs:174:35
meilisearch_1 | 14: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
meilisearch_1 | at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/blocking/task.rs:42:21
meilisearch_1 | 15: tokio::runtime::task::core::CoreStage<T>::poll::{{closure}}
meilisearch_1 | at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/core.rs:243:17
meilisearch_1 | 16: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
meilisearch_1 | at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/loom/std/unsafe_cell.rs:14:9
meilisearch_1 | 17: tokio::runtime::task::core::CoreStage<T>::poll
meilisearch_1 | at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/core.rs:233:13
meilisearch_1 | 18: tokio::runtime::task::harness::poll_future::{{closure}}
meilisearch_1 | at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:427:23
meilisearch_1 | 19: <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
meilisearch_1 | at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panic.rs:344:9
meilisearch_1 | 20: std::panicking::try::do_call
meilisearch_1 | at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:379:40
meilisearch_1 | 21: std::panicking::try
meilisearch_1 | at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:343:19
meilisearch_1 | 22: std::panic::catch_unwind
meilisearch_1 | at ./rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panic.rs:431:14
meilisearch_1 | 23: tokio::runtime::task::harness::poll_future
meilisearch_1 | at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:414:19
meilisearch_1 | 24: tokio::runtime::task::harness::Harness<T,S>::poll_inner
meilisearch_1 | at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:89:9
meilisearch_1 | 25: tokio::runtime::task::harness::Harness<T,S>::poll
meilisearch_1 | at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/harness.rs:59:15
meilisearch_1 | 26: tokio::runtime::task::raw::RawTask::poll
meilisearch_1 | at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/raw.rs:66:18
meilisearch_1 | 27: tokio::runtime::task::Notified<S>::run
meilisearch_1 | at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/task/mod.rs:171:9
meilisearch_1 | 28: tokio::runtime::blocking::pool::Inner::run
meilisearch_1 | at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/blocking/pool.rs:265:17
meilisearch_1 | 29: tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}}
meilisearch_1 | at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.7.1/src/runtime/blocking/pool.rs:245:17
meilisearch_1 | note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
```
</details>
Co-authored-by: Kerollmops <clement@meilisearch.com>
254: Improve the facet string distribution speed r=Kerollmops a=Kerollmops
This pull request creates a data structure similar to the one we use for the faceted numbers, a tetratomic decision tree but this time for the facet strings. This PR also changes the facet distribution behavior by returning one of the original facet values, fixes#260.
This data structure defines bucket-like structures where documents ids are stored under their facet value and helps the search decide if it wants to move to a lower level under a given bucket or not, depending on if the current bucket contains interesting documents or not. The whole format, algorithm, and previous attempts are explained in the [`facet_string.rs` file](ec1cfdd42b/milli/src/search/facet/facet_string.rs).
Note that this data structure **could** be used to sort by string lexicographically, that hypothetically possible. We need more testing, in terms of performance and quality, as we will sort on lowercased versions of the facet values.
- [x] Implement a faster and more precise way to fetch the facet distribution.
- [x] Store and return the original facet string value. We currently return the lowercased version.
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
1521: Sentry was never sending anything r=Kerollmops a=irevoire
@Kerollmops noticed that we had no log of this release in sentry, and it look like I badly tested my code after ignoring the “No space left on device” errors.
Now it should be fixed.
Co-authored-by: Tamo <tamo@meilisearch.com>
290: Add a $HOME to the CI r=Kerollmops a=irevoire
This should fix this issue:
https://github.com/meilisearch/milli/runs/3104228432?check_suite_focus=true
I think a real fix would be to fix the configuration of our github runner but I don't know how to do it.
@curquiza could probably help us on that once she's back from vacation 😄
Co-authored-by: Tamo <tamo@meilisearch.com>
1498: Show the filterable and not the faceted attributes in the settings r=Kerollmops a=Kerollmops
Fixes#1497
Co-authored-by: Clément Renault <clement@meilisearch.com>
1484: Add MeiliSearch version to issue template r=irevoire a=bidoubiwa
It is relevant to know the version of MeiliSearch before any other additional information that might be important to know.
We could also reduce the number of required information asked to the user. I would like to suggest the following:
Instead of the section of `Desktop` and `Smartphone` I would just improve the last section
```
**Additional context**
Additional information that may be relevant to the issue.
[e.g. architecture, device, OS, browser]
```
By applying this, the template final look will be the following:
-----
**Describe the bug**
A clear and concise description of what the bug is.
**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error
**Expected behavior**
A clear and concise description of what you expected to happen.
**Screenshots**
If applicable, add screenshots to help explain your problem.
**MeiliSearch version:** [e.g. v0.20.0]
**Additional context**
Additional information that may be relevant to the issue.
[e.g. architecture, device, OS, browser]
Co-authored-by: Charlotte Vermandel <charlottevermandel@gmail.com>
287: Add benchmarks for indexing r=Kerollmops a=irevoire
closes#274
I don't really know how much time this will take on our bench machine. I'm afraid the wiki dataset will take a really long time to bench (it takes 1h30 on my computer).
If you are ok with it, I would like to merge this first PR since it introduces a first set of benchmarks and see how much time it takes in reality on our setup.
Co-authored-by: Tamo <tamo@meilisearch.com>
285: Support documents with at most 65536 fields r=Kerollmops a=Kerollmops
Fixes#248.
In this PR I updated the `obkv` crate, it now supports arbitrary key length and therefore I was able to use an `u16` to represent the fields instead of a single byte. It was impressively easy to update the whole codebase 🍡🍔
Co-authored-by: Kerollmops <clement@meilisearch.com>
1481: fix bug in index deletion r=Kerollmops a=MarinPostma
this bug was caused by a heed iterator entry being deleted while still holding a reference to it.
close#1333
Co-authored-by: mpostma <postma.marin@protonmail.com>
283: Use the AlwaysFreePages flag when opening an index r=irevoire a=Kerollmops
We introduced a new flag in our fork of LMDB, this `AlwaysFreePages` flag forces LMDB to always free the single pages it uses before writing to the disk instead of keeping them in a linked list.
Declaring this flag reduces the memory print (leak) we have on memory after indexing a lot of documents.
Fixes#279.
Co-authored-by: Kerollmops <clement@meilisearch.com>