Commit Graph

234 Commits

Author SHA1 Message Date
tamo
06c414a753
move the benchmarks to another crate so we can download the datasets automatically without adding overhead to the build of milli 2021-06-02 11:11:50 +02:00
tamo
3c84075d2d
uses an env variable to find the datasets 2021-06-02 11:05:07 +02:00
tamo
4969abeaab
update the facets for the benchmarks 2021-06-02 11:05:07 +02:00
tamo
e5dfde88fd
fix the facets conditions 2021-06-02 11:05:07 +02:00
tamo
7c7fba4e57
remove the time limitation to let criterion do what it wants 2021-06-02 11:05:07 +02:00
tamo
5d5d115608
reformat all the files 2021-06-02 11:05:07 +02:00
tamo
7086009f93
improve the base search 2021-06-02 11:05:07 +02:00
tamo
d0b44c380f
add benchmarks on a wiki dataset 2021-06-02 11:05:07 +02:00
tamo
beae843766
add a missing space 2021-06-02 11:05:07 +02:00
tamo
5132a106a1
refactorize everything related to the songs dataset in a songs benchmark file 2021-06-02 11:05:07 +02:00
tamo
136efd6b53
fix the benches 2021-06-02 11:05:07 +02:00
tamo
4b78ef31b6
add the configuration of the searchable fields and displayed fields and a default configuration for the songs 2021-06-02 11:05:07 +02:00
tamo
ea0c6d8c40
add a bunch of queries and start the introduction of the filters and the new dataset 2021-06-02 11:05:07 +02:00
tamo
3def42abd8
merge all the criterion only benchmarks in one file 2021-06-02 11:05:07 +02:00
tamo
a2bff68c1a
remove the optional words for the typo criterion 2021-06-02 11:05:07 +02:00
tamo
aee49bb3cd
add the proximity criterion 2021-06-02 11:05:07 +02:00
tamo
49e4cc3daf
add the words criterion to the bench 2021-06-02 11:05:07 +02:00
tamo
15cce89a45
update the README with instructions to download the dataset 2021-06-02 11:05:07 +02:00
tamo
e425f70ef9
let criterion decide how much iteration it wants to do in 10s 2021-06-02 11:05:07 +02:00
tamo
4fdbfd6048
push a first version of the benchmark for the typo 2021-06-02 11:05:07 +02:00
bors[bot]
270da98c46
Merge #202
202: Add field id word count docids database r=Kerollmops a=LegendreM

This PR introduces a new database, `field_id_word_count_docids`, that maps the number of words in an attribute to a list of document ids. This relation is limited to attributes that contain fewer than 11 words.
This database is used by the exactness criterion to determine whether a document has an attribute that contains exactly the query, with no additional words.
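
The mapping described above can be pictured with a small in-memory sketch. It is only an illustration under stated assumptions: the type aliases, the `FieldIdWordCountDocids` struct, and the `exact_attribute_candidates` helper are hypothetical names, and standard-library sets stand in for whatever on-disk storage and bitmap types milli actually uses.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical aliases; the real types in milli may differ.
type FieldId = u16;
type DocumentId = u32;

// Attributes with more words than this are not recorded ("fewer than 11 words").
const MAX_WORD_COUNT: usize = 10;

/// In-memory stand-in for the `field_id_word_count_docids` database.
#[derive(Default)]
struct FieldIdWordCountDocids {
    inner: BTreeMap<(FieldId, u8), BTreeSet<DocumentId>>,
}

impl FieldIdWordCountDocids {
    /// Records that `docid` has `word_count` words in the attribute `field_id`.
    /// Attributes above the limit are silently skipped.
    fn insert(&mut self, field_id: FieldId, word_count: usize, docid: DocumentId) {
        if word_count > MAX_WORD_COUNT {
            return;
        }
        self.inner
            .entry((field_id, word_count as u8))
            .or_default()
            .insert(docid);
    }

    /// Documents whose attribute `field_id` contains exactly `word_count` words.
    fn docids(&self, field_id: FieldId, word_count: usize) -> BTreeSet<DocumentId> {
        if word_count > MAX_WORD_COUNT {
            return BTreeSet::new();
        }
        self.inner
            .get(&(field_id, word_count as u8))
            .cloned()
            .unwrap_or_default()
    }
}

/// Narrows `candidates` (documents assumed to match every query word in
/// `field_id`) to those whose attribute has no additional word.
fn exact_attribute_candidates(
    db: &FieldIdWordCountDocids,
    field_id: FieldId,
    query_word_count: usize,
    candidates: &BTreeSet<DocumentId>,
) -> BTreeSet<DocumentId> {
    db.docids(field_id, query_word_count)
        .intersection(candidates)
        .copied()
        .collect()
}
```

With documents 1 and 2 holding two-word and three-word attributes, a two-word query that matches both keeps only document 1 after the intersection.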

Fix #165 
Fix #196
Related to [specifications:#36](https://github.com/meilisearch/specifications/pull/36)

Co-authored-by: many <maxime@meilisearch.com>
Co-authored-by: Many <legendre.maxime.isn@gmail.com>
2021-06-01 16:09:48 +00:00
many
e857ca4d7d
Fix PR comments 2021-06-01 18:06:46 +02:00
Many
ab2cf69e8d
Update milli/src/update/delete_documents.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-06-01 17:04:10 +02:00
Many
8e6d1ff0dc
Update milli/src/update/index_documents/store.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-06-01 17:04:02 +02:00
bors[bot]
7d36d664a7
Merge #203
203: Make the MatchingWords return the number of matching bytes r=Kerollmops a=LegendreM

Make the MatchingWords return the number of matching bytes using a custom Levenshtein algorithm.
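
The PR description does not spell out the custom algorithm, so the following is only one possible sketch of the idea, not the code from this PR. It assumes a hypothetical `matching_bytes` function that runs a standard Levenshtein dynamic program between the query and every prefix of the document word, then returns the byte length of the closest prefix.

```rust
/// Returns the byte length of the prefix of `word` that is closest to `query`
/// in Levenshtein distance, or `None` if no prefix is within `max_typos`.
/// On ties, the longest prefix wins.
fn matching_bytes(query: &str, word: &str, max_typos: usize) -> Option<usize> {
    let q: Vec<char> = query.chars().collect();

    // prev[i] = distance between the first i chars of `query`
    // and the prefix of `word` processed so far (initially empty).
    let mut prev: Vec<usize> = (0..=q.len()).collect();

    // (best distance, byte length of the prefix that achieves it)
    let mut best = (prev[q.len()], 0usize);

    for (j, (byte_idx, wc)) in word.char_indices().enumerate() {
        let mut cur = vec![j + 1; q.len() + 1];
        for i in 1..=q.len() {
            let cost = if q[i - 1] == wc { 0 } else { 1 };
            cur[i] = (prev[i - 1] + cost) // substitution or match
                .min(prev[i] + 1) // extra char in `word`
                .min(cur[i - 1] + 1); // extra char in `query`
        }
        // Byte offset just past the current char, so multi-byte chars count fully.
        let end = byte_idx + wc.len_utf8();
        if cur[q.len()] <= best.0 {
            best = (cur[q.len()], end);
        }
        prev = cur;
    }

    if best.0 <= max_typos {
        Some(best.1)
    } else {
        None
    }
}
```

Working in bytes rather than characters matters for highlighting: a match against `été` covers 5 bytes even though the word has 3 characters.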

Fix #138

Co-authored-by: many <maxime@meilisearch.com>
2021-06-01 12:00:33 +00:00
many
225ae6fd25
Resolve PR comments 2021-06-01 11:53:09 +02:00
Marin Postma
984dc7c1ed
rewrite roaring codec without byteorder. 2021-05-31 22:15:39 +02:00
Marin Postma
1373637da1
optimize roaring codec 2021-05-31 22:15:35 +02:00
many
1df68d342a
Make the MatchingWords return the number of matching bytes 2021-05-31 18:22:29 +02:00
many
c701f8bf36
Use field id word count database in exactness criterion 2021-05-31 16:27:28 +02:00
many
4ddf008be2
add field id word count database 2021-05-31 16:27:28 +02:00
bors[bot]
2f5e61bacb
Merge #184
184: Transfer numbers and strings facets into the appropriate facet databases r=Kerollmops a=Kerollmops

This pull request is related to https://github.com/meilisearch/milli/issues/152 and changes the layout of the facet values: numbers and strings are now stored in dedicated databases, and the user no longer needs to define the type of the fields. No conversion between the two types is done anymore: numbers (floats and integers, converted to f64) go to the facet float database and strings go to the facet string database.
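
As a rough illustration of the routing rule described above, here is a minimal sketch. The `Value` enum, the `FacetDatabases` struct, and the type aliases are hypothetical stand-ins invented for this example; they are not milli's actual types or storage layout.

```rust
// Hypothetical aliases; the real types in milli may differ.
type FieldId = u16;
type DocumentId = u32;

/// A simplified stand-in for a document field value.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Integer(i64),
    Float(f64),
    Text(String),
}

/// In-memory stand-ins for the two dedicated facet databases.
#[derive(Debug, Default)]
struct FacetDatabases {
    numbers: Vec<(FieldId, f64, DocumentId)>,
    strings: Vec<(FieldId, String, DocumentId)>,
}

impl FacetDatabases {
    /// Routes a value by its own type. No user-declared field type is
    /// consulted, and a string is never parsed into a number or the reverse.
    fn index(&mut self, field_id: FieldId, docid: DocumentId, value: &Value) {
        match value {
            // Integers are converted to f64 and stored with the floats.
            Value::Integer(i) => self.numbers.push((field_id, *i as f64, docid)),
            Value::Float(f) => self.numbers.push((field_id, *f, docid)),
            Value::Text(s) => self.strings.push((field_id, s.clone(), docid)),
        }
    }
}
```

Under this rule the number `42` and the string `"42"` in the same field end up in different databases.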

There is one related issue that I found regarding CSVs: the values in a CSV are always considered to be strings. [meilisearch/specifications#28](d916b57d74/text/0028-indexing-csv.md) fixes this issue by allowing the user to define the field types using `:` (see its "CSV Formatting Rules" section).

All previous tests on facets have been modified to pass again and I have also done hand-driven tests with the 115m songs dataset. Everything seems to be good!

Fixes #192.

Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-05-31 13:32:58 +00:00
Kerollmops
1c0a5cd136
Resolve code modification suggestions 2021-05-31 15:22:50 +02:00
many
a5e98cf46d
Fix plane sweep algorithm 2021-05-25 18:21:55 +02:00
Clément Renault
3a4a150ef0
Fix the tests and remaining warnings 2021-05-25 11:31:06 +02:00
Clément Renault
02c655ff1a
Refine the facet distribution to use both databases 2021-05-25 11:30:00 +02:00
Clément Renault
79efded841
Refine the FacetCondition from_array constructor 2021-05-25 11:30:00 +02:00
Clément Renault
f7efde11d9
Refine the facet condition to use both facet databases 2021-05-25 11:30:00 +02:00
Clément Renault
e62b89a2ed
Make the facet distinct work with the new split facets 2021-05-25 11:30:00 +02:00
Clément Renault
bd7b285bae
Split the update side to use the number and the strings facet databases 2021-05-25 11:30:00 +02:00
Clément Renault
038e03a4e4
Use both facet databases in the FacetIter type 2021-05-25 11:30:00 +02:00
Clément Renault
597144b0b9
Use both number and string facet databases in the distinct system 2021-05-25 11:29:59 +02:00
Clément Renault
837c1041c7
Clear and delete the documents from the facet database 2021-05-25 11:28:36 +02:00
Clément Renault
a56c46b6f1
Explode the string and f64 facet databases into two 2021-05-25 11:28:36 +02:00
Clément Renault
df7a32e3d0
Move the creation date initialization into a function 2021-05-25 11:28:35 +02:00
many
a3944a7083
Introduce a filtered_candidates field 2021-05-11 11:37:40 +02:00
many
efba662ca6
Fix clippy warnings in criteria 2021-05-10 10:27:18 +02:00
many
e923d51b8f
Make bucket candidates optionals 2021-05-10 10:27:04 +02:00
Many
44b6843de7
Fix pull request reviews
Update milli/src/fields_ids_map.rs
Update milli/src/search/criteria/exactness.rs
Update milli/src/search/criteria/mod.rs
2021-05-06 14:31:03 +02:00
many
c1ce4e4ca9
Introduce mocked ExactAttribute step in exactness criterion 2021-05-06 14:28:31 +02:00