MeiliSearch/milli
bors[bot] 941af58239
Merge #561
561: Enriched documents batch reader r=curquiza a=Kerollmops

~This PR is based on #555 and must be rebased on main after it has been merged to ease the review.~
This PR contains the work in #555 and can be merged on main as soon as reviewed and approved.

- [x] Create an `EnrichedDocumentsBatchReader` that contains the external documents id.
- [x] Extract the primary key name and make it accessible in the `EnrichedDocumentsBatchReader`.
- [x] Use the external id from the `EnrichedDocumentsBatchReader` in the `Transform::read_documents`.
- [x] Remove the `update_primary_key` from the _transform.rs_ file.
- [x] Really generate the auto-generated documents ids.
- [x] Insert the (auto-generated) document ids in the document while processing it in `Transform::read_documents`.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-07-21 07:08:50 +00:00
fuzz Move the Object type in the lib.rs file and use it everywhere 2022-07-12 14:55:51 +02:00
src Merge #561 2022-07-21 07:08:50 +00:00
tests Fix the indexation tests 2022-07-12 14:55:51 +02:00
Cargo.toml Update grenad to 0.4.2 2022-07-12 14:52:55 +02:00
README.md update the readme + dependencies 2022-01-12 18:30:11 +01:00

Milli

Fuzzing milli

Currently, you can only fuzz indexing. To execute the fuzzer, run:

cargo +nightly fuzz run indexing

To execute the fuzzer on multiple threads, you can also run:

cargo +nightly fuzz run -j4 indexing

Since the fuzzer creates a lot of temporary files to let milli index its documents, I would also recommend executing it on a ramdisk. Here is how to set up a ramdisk on Linux:

sudo mount -t tmpfs none path/to/your/ramdisk

Then set the TMPDIR environment variable so that the fuzzer creates its files there:

export TMPDIR=path/to/your/ramdisk
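
The ramdisk steps above can be combined into one small wrapper. This is only a sketch under assumptions: the names RAMDISK, JOBS, DRY_RUN, run, and setup_and_fuzz are hypothetical and not part of milli, and the default mount point is an arbitrary choice. By default it only prints the commands it would execute; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Hypothetical wrapper: mount a tmpfs ramdisk, then run the indexing fuzzer
# with TMPDIR pointing at it. None of these names come from milli itself.
RAMDISK="${RAMDISK:-/tmp/milli-fuzz-ramdisk}"  # assumed mount point
JOBS="${JOBS:-4}"                              # value passed to -j
DRY_RUN="${DRY_RUN:-1}"                        # 1 = print only, 0 = execute

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_and_fuzz() {
    # Create the mount point and mount a tmpfs on it (needs root).
    run mkdir -p "$RAMDISK"
    run sudo mount -t tmpfs none "$RAMDISK"
    # Set TMPDIR only for the fuzzer process, so the calling shell is untouched.
    run env TMPDIR="$RAMDISK" cargo +nightly fuzz run "-j$JOBS" indexing
}

setup_and_fuzz
```

Passing TMPDIR through env keeps the setting scoped to the fuzzer, whereas the export shown above changes it for the whole shell session. When you are done, the ramdisk can be released with sudo umount followed by the mount point.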