bors[bot] 8efac33b53
Merge #467
467: optimize prefix database r=Kerollmops a=MarinPostma

This pr introduces two optimizations that greatly improve the speed of computing prefix databases.

- The time that it takes to create the prefix FST has been divided by 5 by inverting the way we iterated over the words FST.
- We unconditionally and needlessly checked for documents to remove in  `word_prefix_pair`, which caused an iteration over the whole database.

Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-03-15 16:14:35 +00:00
..
2022-03-01 19:45:29 +01:00
2022-03-15 16:14:35 +00:00
2022-01-19 12:40:20 +01:00
2022-03-15 11:17:44 +01:00
2022-01-12 18:30:11 +01:00

Milli

Fuzzing milli

Currently you can only fuzz the indexation. To execute the fuzzer run:

cargo +nightly fuzz run indexing

To execute the fuzzer on multiple thread you can also run:

cargo +nightly fuzz run -j4 indexing

Since the fuzzer is going to create a lot of temporary file to let milli index its documents I would also recommand to execute it on a ramdisk. Here is how to setup a ramdisk on linux:

sudo mount -t tmpfs none path/to/your/ramdisk

And then set the TMPDIR environment variable to make the fuzzer create its file in it:

export TMPDIR=path/to/your/ramdisk