mirror of
https://github.com/meilisearch/MeiliSearch
synced 2025-07-04 20:37:15 +02:00
move the benchmarks to another crate so we can download the datasets automatically without adding overhead to the build of milli
This commit is contained in:
parent
3c84075d2d
commit
06c414a753
10 changed files with 154 additions and 55 deletions
30
benchmarks/README.md
Normal file
30
benchmarks/README.md
Normal file
|
@ -0,0 +1,30 @@
|
|||
Benchmarks
|
||||
==========
|
||||
|
||||
For our benchmark we are using a small subset of the dataset `songs.csv`. It was generated with this command:
|
||||
```
|
||||
xsv sample --seed 42 1000000 songs.csv -o smol-songs.csv
|
||||
```
|
||||
You can download it [here](https://meili-datasets.s3.fr-par.scw.cloud/benchmarks/smol-songs.csv.gz)
|
||||
And the original `songs.csv` dataset is available [here](https://meili-datasets.s3.fr-par.scw.cloud/songs.csv.gz).
|
||||
|
||||
We also use a subset of `wikipedia-articles.csv` that was generated with the following command:
|
||||
```
|
||||
xsv sample --seed 42 500000 wikipedia-articles.csv -o smol-wikipedia-articles.csv
|
||||
```
|
||||
You can download the original [here](https://meili-datasets.s3.fr-par.scw.cloud/wikipedia-articles.csv.gz) and the subset [here](https://meili-datasets.s3.fr-par.scw.cloud/benchmarks/smol-wikipedia-articles.csv.gz).
|
||||
|
||||
-----
|
||||
|
||||
- To run all the benchmarks we recommand using `cargo bench`, this should takes around ~4h
|
||||
- You can also run the benchmarks on the `songs` dataset with `cargo bench --bench songs`, it should takes around 1h
|
||||
- And on the `wiki` dataset with `cargo bench --bench wiki`, it should takes around 3h
|
||||
|
||||
By default the benchmarks will be downloaded and uncompressed automatically in the target directory.
|
||||
If you don't want to download the datasets everytime you updates something on the code you can specify a custom directory with the env variable `MILLI_BENCH_DATASETS_PATH`:
|
||||
```
|
||||
mkdir ~/datasets
|
||||
MILLI_BENCH_DATASETS_PATH=~/datasets cargo bench --bench songs # the two datasets are downloaded
|
||||
touch build.rs
|
||||
MILLI_BENCH_DATASETS_PATH=~/datasets cargo bench --bench songs # the code is compiled again but the datasets are not downloaded
|
||||
```
|
Loading…
Add table
Add a link
Reference in a new issue