Commit Graph

997 Commits

Author SHA1 Message Date
Clémentine Urquizar
d138b3c704
Update version 2022-04-25 18:43:46 +02:00
Tamo
fa6f495662
fix the indexing fuzzer 2022-04-25 18:32:06 +02:00
bors[bot]
8010eca9c7
Merge #505
505: normalize exact words r=curquiza a=MarinPostma

Normalize the exact words, as specified in the specification.


Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-04-25 09:35:32 +00:00
ad hoc
2e0089d5ff
normalize exact words 2022-04-21 15:38:40 +02:00
ad hoc
3a2451fcba
add test normalize exact words 2022-04-21 13:52:09 +02:00
Clément Renault
eb5830aa40
Add a test to make sure that long words are handled 2022-04-21 13:45:28 +02:00
ad hoc
8b14090927
fix min-word-len-for-typo not reset properly 2022-04-19 15:20:16 +02:00
bors[bot]
ea4bb9402f
Merge #483
483: Enhance matching words r=Kerollmops a=ManyTheFish

# Summary

Enhance milli word-matcher making it handle match computing and cropping.

# Implementation

## Computing best matches for cropping

Before we were considering that the first match of the attribute was the best one, this was accurate when only one word was searched but was missing the target when more than one word was searched.

Now we are searching for the best matches interval to crop around, the chosen interval is the one:
1) that have the highest count of unique matches
> for example, if we have a query `split the world`, then the interval `the split the split the` has 5 matches but only 2 unique matches (1 for `split` and 1 for `the`) where the interval `split of the world` has 3 matches and 3 unique matches. So the interval `split of the world` is considered better.
2) that have the minimum distance between matches
> for example, if we have a query `split the world`, then the interval `split of the world` has a distance of 3 (2 between `split` and `the`, and 1 between `the` and `world`) where the interval `split the world` has a distance of 2. So the interval `split the world` is considered better.
3) that have the highest count of ordered matches
> for example, if we have a query `split the world`, then the interval `the world split` has 2 ordered words where the interval `split the world` has 3. So the interval `split the world` is considered better.

## Cropping around the best matches interval

Before we were cropping around the interval without checking the context.

Now we are cropping around words in the same context as matching words.
This means that we will keep words that are farther from the matching words but are in the same phrase, than words that are nearer but separated by a dot.

> For instance, for the matching word `Split` the text:
`Natalie risk her future. Split The World is a book written by Emily Henry. I never read it.`
will be cropped like:
`…. Split The World is a book written by Emily Henry. …`
and  not like:
`Natalie risk her future. Split The World is a book …`


Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-04-19 11:42:32 +00:00
ManyTheFish
f1115e274f Use Copy impl of FormatOption instead of clonning 2022-04-19 10:35:50 +02:00
Clémentine Urquizar
8d630a6f62
Update version for the next release (v0.26.1) 2022-04-14 11:44:06 +02:00
Tamo
00f78d6b5a
Apply code suggestions
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-04-14 11:14:08 +02:00
Tamo
399fba16bb
only flatten an object if it's nested 2022-04-14 11:14:08 +02:00
Tamo
ee64f4a936
Use smartstring to store the external id in our hashmap
We need to store all the external id (primary key) in a hashmap
associated to their internal id during.
The smartstring remove heap allocation / memory usage and should
improve the cache locality.
2022-04-13 21:22:07 +02:00
ad hoc
dda28d7415
exclude excluded canditates from search result candidates 2022-04-13 12:10:35 +02:00
ad hoc
cd83014fff
add test for disctinct nb hits 2022-04-13 12:10:35 +02:00
ad hoc
bbb6728d2f
add distinct attributes to cli 2022-04-13 12:10:35 +02:00
ManyTheFish
5809d3ae0d Add first benchmarks on formatting 2022-04-12 16:31:58 +02:00
ManyTheFish
827cedcd15 Add format option structure 2022-04-12 13:42:14 +02:00
ManyTheFish
011f8210ed Make compute_matches more rust idiomatic 2022-04-12 10:19:02 +02:00
ManyTheFish
a16de5de84 Symplify format and remove intermediate function 2022-04-08 11:20:41 +02:00
ManyTheFish
a769e09dfa Make token_crop_bounds more rust idiomatic 2022-04-07 20:15:14 +02:00
bors[bot]
9ac2fd1c37
Merge #487
487: Update version (v0.26.0) r=Kerollmops a=curquiza

breaking because of #458 

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2022-04-07 17:10:24 +00:00
Tamo
bab898ce86
move the flatten-serde-json crate inside of milli 2022-04-07 18:20:44 +02:00
ManyTheFish
c8ed1675a7 Add some documentation 2022-04-07 17:32:13 +02:00
ManyTheFish
b1905dfa24 Make split_best_frequency returns references instead of owned data 2022-04-07 17:05:44 +02:00
Tamo
ab458d8840
fix tests after rebase 2022-04-07 17:00:00 +02:00
Irevoire
4f3ce6d9cd
nested fields 2022-04-07 16:58:46 +02:00
Clémentine Urquizar
ee1d627803
Update version (v0.26.0) 2022-04-07 15:56:10 +02:00
bors[bot]
4ae7aea3b2
Merge #486
486: Update version (v0.25.0) r=curquiza a=curquiza

v0.25.0 will be released once #478 is merged

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2022-04-06 11:40:41 +00:00
ad hoc
b799f3326b
rename merge_nothing to merge_ignore_values 2022-04-05 18:44:35 +02:00
ManyTheFish
fa7d3a37c0 Make some cleaning and add comments 2022-04-05 17:48:56 +02:00
ManyTheFish
3bb1e35ada Fix match count 2022-04-05 17:48:45 +02:00
ManyTheFish
56e0edd621 Put crop markers direclty around words 2022-04-05 17:41:32 +02:00
ManyTheFish
a93cd8c61c Fix prefix highlight with special chars 2022-04-05 17:41:32 +02:00
ManyTheFish
b3f0f39106 Make some cleaning 2022-04-05 17:41:32 +02:00
ManyTheFish
6dc345bc53 Test and Fix prefix highlight 2022-04-05 17:41:32 +02:00
ManyTheFish
bd30ee97b8 Keep separators at start of the croped string 2022-04-05 17:41:32 +02:00
ManyTheFish
29c5f76d7f Use new matcher in http-ui 2022-04-05 17:41:32 +02:00
ManyTheFish
734d0899d3 Publish Matcher 2022-04-05 17:41:32 +02:00
ManyTheFish
4428cb5909 Add some tests and fix some corner cases 2022-04-05 17:41:32 +02:00
ManyTheFish
844f546a8b Add matches algorithm V1 2022-04-05 17:41:32 +02:00
ManyTheFish
3be1790803 Add crop algorithm with naive match algorithm 2022-04-05 17:41:32 +02:00
ManyTheFish
d96e72e5dc Create formater with some tests 2022-04-05 17:41:32 +02:00
ad hoc
201fea0fda
limit extract_word_docids memory usage 2022-04-05 14:14:15 +02:00
ad hoc
5cfd3d8407
add exact attributes documentation 2022-04-05 14:10:22 +02:00
Clémentine Urquizar
9eec44dd98
Update version (v0.25.0) 2022-04-05 12:06:42 +02:00
ad hoc
b85cd4983e
remove field_id_from_position 2022-04-05 09:50:34 +02:00
ad hoc
ab185a59b5
fix infos 2022-04-05 09:46:56 +02:00
ad hoc
59e41d98e3
add comments to integration test 2022-04-04 21:17:06 +02:00
ad hoc
1810927dbd
rephrase exact_attributes doc 2022-04-04 21:04:49 +02:00
ad hoc
b7694c34f5
remove println 2022-04-04 21:00:07 +02:00
ad hoc
6cabd47c32
fix typo in comment 2022-04-04 20:59:20 +02:00
ad hoc
c8d3a09af8
add integration test for disabel typo on attributes 2022-04-04 20:54:03 +02:00
ad hoc
6b2c2509b2
fix bug in exact search 2022-04-04 20:54:03 +02:00
ad hoc
56b4f5dce2
add exact prefix to query_docids 2022-04-04 20:54:03 +02:00
ad hoc
21ae4143b1
add exact_word_prefix to Context 2022-04-04 20:54:03 +02:00
ad hoc
e8f06f6c06
extract exact_word_prefix_docids 2022-04-04 20:54:03 +02:00
ad hoc
6dd2e4ffbd
introduce exact_word_prefix database in index 2022-04-04 20:54:03 +02:00
ad hoc
ba0bb29cd8
refactor WordPrefixDocids to take dbs instead of indexes 2022-04-04 20:54:02 +02:00
ad hoc
c4c6e35352
query exact_word_docids in resolve_query_tree 2022-04-04 20:54:02 +02:00
ad hoc
8d46a5b0b5
extract exact word docids 2022-04-04 20:54:02 +02:00
ad hoc
5451c64d5d
increase criteria asc desc test map size 2022-04-04 20:54:02 +02:00
ad hoc
0a77be4ec0
introduce exact_word_docids db 2022-04-04 20:54:02 +02:00
ad hoc
5f9f82757d
refactor spawn_extraction_task 2022-04-04 20:54:02 +02:00
ad hoc
f82d4b36eb
introduce exact attribute setting 2022-04-04 20:54:02 +02:00
ad hoc
c882d8daf0
add test for exact words 2022-04-04 20:54:01 +02:00
ad hoc
7e9d56a9e7
disable typos on exact words 2022-04-04 20:54:01 +02:00
ad hoc
3e67d8818c
fix typo in test comment 2022-04-04 20:34:23 +02:00
ad hoc
284d8a24e0
add intergration test for disabled typon on word 2022-04-04 20:15:51 +02:00
ad hoc
30a2711bac
rename serde module to serde_impl module
needed because of issues with rustfmt
2022-04-04 20:10:55 +02:00
ad hoc
0fd55db21c
fmt 2022-04-04 20:10:55 +02:00
ad hoc
559e46be5e
fix bad rebase bug 2022-04-04 20:10:55 +02:00
ad hoc
8b1e5d9c6d
add test for exact words 2022-04-04 20:10:55 +02:00
ad hoc
774fa8f065
disable typos on exact words 2022-04-04 20:10:55 +02:00
ad hoc
9bbffb8fee
add exact words setting 2022-04-04 20:10:54 +02:00
ad hoc
853b4a520f
fmt 2022-04-04 10:41:46 +02:00
ad hoc
2cb71dff4a
add typo integration tests 2022-04-04 10:41:46 +02:00
ad hoc
1941072bb2
implement Copy on Setting 2022-04-04 10:41:46 +02:00
ad hoc
fdaf45aab2
replace hardcoded value with constant in TestContext 2022-04-04 10:41:46 +02:00
ad hoc
950a740bd4
refactor typos for readability 2022-04-04 10:41:46 +02:00
ad hoc
66020cd923
rename min_word_len* to use plain letter numbers 2022-04-04 10:41:46 +02:00
ad hoc
4c4b336ecb
rename min word len for typo error 2022-04-01 11:17:03 +02:00
ad hoc
286dd7b2e4
rename min_word_len_2_typo 2022-04-01 11:17:03 +02:00
ad hoc
55af85db3c
add tests for min_word_len_for_typo 2022-04-01 11:17:02 +02:00
ad hoc
9102de5500
fix error message 2022-04-01 11:17:02 +02:00
ad hoc
a1a3a49bc9
dynamic minimum word len for typos in query tree builder 2022-04-01 11:17:02 +02:00
ad hoc
5a24e60572
introduce word len for typo setting 2022-04-01 11:17:02 +02:00
ad hoc
9fe40df960
add word derivations tests 2022-04-01 11:05:18 +02:00
ad hoc
d5ddc6b080
fix 2 typos word derivation bug 2022-04-01 10:51:22 +02:00
ad hoc
3e34981d9b
add test for authorize_typos in update 2022-03-31 14:12:00 +02:00
ad hoc
6ef3bb9d83
fmt 2022-03-31 14:06:23 +02:00
ad hoc
f782fe2062
add authorize_typo_test 2022-03-31 10:08:39 +02:00
ad hoc
c4653347fd
add authorize typo setting 2022-03-31 10:05:44 +02:00
Clémentine Urquizar
ddf78a735b
Update version (v0.24.1) 2022-03-24 16:39:45 +01:00
Irevoire
86dd88698d
bump tokenizer 2022-03-23 14:25:58 +01:00
Irevoire
5dc464b9a7
rollback meilisearch-tokenizer version 2022-03-21 17:29:10 +01:00
bors[bot]
90276d9a2d
Merge #472
472: Remove useless variables in proximity r=Kerollmops a=ManyTheFish

Was passing by plane sweep algorithm to find some inspiration, and I discover that we have useless variables that were not detected because of the recursive function.

Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-03-16 15:33:11 +00:00
ManyTheFish
49d59d88c2 Remove useless variables in proximity 2022-03-16 16:12:52 +01:00
Bruno Casali
adc71742c8 Move string concat to the struct instead of in the calling 2022-03-16 10:26:12 -03:00
Bruno Casali
4822fe1beb Add a better error message when the filterable attrs are empty
Fixes https://github.com/meilisearch/meilisearch/issues/2140
2022-03-15 18:13:59 -03:00
bors[bot]
f04ab67083
Merge #466
466: Bump version to 0.23.1 r=curquiza a=Kerollmops

This PR bumps the crate versions to 0.23.1. Nothing seems to be breaking in the next release.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-03-15 17:19:05 +00:00
bors[bot]
ad4c982c68
Merge #439
439: Optimize typo criterion r=Kerollmops a=MarinPostma

This pr implements a couple of optimization for the typo criterion:

- clamp max typo on concatenated query words to 1: By considering that a concatenated query word is a typo, we clamp the max number of typos allowed o it to 1. This is useful because we noticed that concatenated query words often introduced words with 2 typos in queries that otherwise didn't allow for 2 typo words.

- Make typos on the first letter count for 2. This change is a big performance gain: by considering the typos on the first letter to count as 2 typos, we drastically restrict the search space for 1 typo, and if we reach 2 typos, the search space is reduced as well, as we only consider: (2 typos ∩ correct first letter) ∪ (wrong first letter ∩ 1 typo) instead of 2 typos anywhere in the word.

## benches
```
group                                                                                                    main                                   typo
-----                                                                                                    ----                                   ----
smol-songs.csv: asc + default/Notstandskomitee                                                           2.51      5.8±0.01ms        ? ?/sec    1.00      2.3±0.01ms        ? ?/sec
smol-songs.csv: asc + default/charles                                                                    2.48      3.0±0.01ms        ? ?/sec    1.00   1190.9±1.29µs        ? ?/sec
smol-songs.csv: asc + default/charles mingus                                                             5.56     10.8±0.01ms        ? ?/sec    1.00   1935.3±1.00µs        ? ?/sec
smol-songs.csv: asc + default/david                                                                      1.65      3.9±0.00ms        ? ?/sec    1.00      2.4±0.01ms        ? ?/sec
smol-songs.csv: asc + default/david bowie                                                                3.34     12.5±0.02ms        ? ?/sec    1.00      3.7±0.00ms        ? ?/sec
smol-songs.csv: asc + default/john                                                                       1.00   1849.7±3.74µs        ? ?/sec    1.01   1875.1±4.65µs        ? ?/sec
smol-songs.csv: asc + default/marcus miller                                                              4.32     15.7±0.01ms        ? ?/sec    1.00      3.6±0.01ms        ? ?/sec
smol-songs.csv: asc + default/michael jackson                                                            3.31     12.5±0.01ms        ? ?/sec    1.00      3.8±0.00ms        ? ?/sec
smol-songs.csv: asc + default/tamo                                                                       1.05    565.4±0.86µs        ? ?/sec    1.00    539.3±1.22µs        ? ?/sec
smol-songs.csv: asc + default/thelonious monk                                                            3.49     11.5±0.01ms        ? ?/sec    1.00      3.3±0.00ms        ? ?/sec
smol-songs.csv: asc/Notstandskomitee                                                                     2.59      5.6±0.02ms        ? ?/sec    1.00      2.2±0.01ms        ? ?/sec
smol-songs.csv: asc/charles                                                                              6.05      2.1±0.00ms        ? ?/sec    1.00    347.8±0.60µs        ? ?/sec
smol-songs.csv: asc/charles mingus                                                                       14.46     9.4±0.01ms        ? ?/sec    1.00    649.2±0.97µs        ? ?/sec
smol-songs.csv: asc/david                                                                                3.87      2.4±0.00ms        ? ?/sec    1.00    618.2±0.69µs        ? ?/sec
smol-songs.csv: asc/david bowie                                                                          10.14     9.8±0.01ms        ? ?/sec    1.00    970.8±1.55µs        ? ?/sec
smol-songs.csv: asc/john                                                                                 1.00    546.5±1.10µs        ? ?/sec    1.00    547.1±2.11µs        ? ?/sec
smol-songs.csv: asc/marcus miller                                                                        11.45    10.4±0.06ms        ? ?/sec    1.00    907.9±1.37µs        ? ?/sec
smol-songs.csv: asc/michael jackson                                                                      10.56     9.7±0.01ms        ? ?/sec    1.00    919.6±1.03µs        ? ?/sec
smol-songs.csv: asc/tamo                                                                                 1.03     43.3±0.18µs        ? ?/sec    1.00     42.2±0.23µs        ? ?/sec
smol-songs.csv: asc/thelonious monk                                                                      4.16     10.7±0.02ms        ? ?/sec    1.00      2.6±0.00ms        ? ?/sec
smol-songs.csv: basic filter: <=/Notstandskomitee                                                        1.00     95.7±0.20µs        ? ?/sec    1.15   109.6±10.40µs        ? ?/sec
smol-songs.csv: basic filter: <=/charles                                                                 1.00     27.8±0.15µs        ? ?/sec    1.01     27.9±0.18µs        ? ?/sec
smol-songs.csv: basic filter: <=/charles mingus                                                          1.72    119.2±0.67µs        ? ?/sec    1.00     69.1±0.13µs        ? ?/sec
smol-songs.csv: basic filter: <=/david                                                                   1.00     22.3±0.33µs        ? ?/sec    1.05     23.4±0.19µs        ? ?/sec
smol-songs.csv: basic filter: <=/david bowie                                                             1.59     86.9±0.79µs        ? ?/sec    1.00     54.5±0.31µs        ? ?/sec
smol-songs.csv: basic filter: <=/john                                                                    1.00     17.9±0.06µs        ? ?/sec    1.06     18.9±0.15µs        ? ?/sec
smol-songs.csv: basic filter: <=/marcus miller                                                           1.65    102.7±1.63µs        ? ?/sec    1.00     62.3±0.18µs        ? ?/sec
smol-songs.csv: basic filter: <=/michael jackson                                                         1.76    128.2±1.85µs        ? ?/sec    1.00     72.9±0.19µs        ? ?/sec
smol-songs.csv: basic filter: <=/tamo                                                                    1.00     17.9±0.13µs        ? ?/sec    1.05     18.7±0.20µs        ? ?/sec
smol-songs.csv: basic filter: <=/thelonious monk                                                         1.53    157.5±2.38µs        ? ?/sec    1.00    102.8±0.88µs        ? ?/sec
smol-songs.csv: basic filter: TO/Notstandskomitee                                                        1.00    100.9±4.36µs        ? ?/sec    1.04    105.0±8.25µs        ? ?/sec
smol-songs.csv: basic filter: TO/charles                                                                 1.00     28.4±0.36µs        ? ?/sec    1.03     29.4±0.33µs        ? ?/sec
smol-songs.csv: basic filter: TO/charles mingus                                                          1.71    118.1±1.08µs        ? ?/sec    1.00     68.9±0.26µs        ? ?/sec
smol-songs.csv: basic filter: TO/david                                                                   1.00     24.0±0.26µs        ? ?/sec    1.03     24.6±0.43µs        ? ?/sec
smol-songs.csv: basic filter: TO/david bowie                                                             1.72     95.2±0.30µs        ? ?/sec    1.00     55.2±0.14µs        ? ?/sec
smol-songs.csv: basic filter: TO/john                                                                    1.00     18.8±0.09µs        ? ?/sec    1.06     19.8±0.17µs        ? ?/sec
smol-songs.csv: basic filter: TO/marcus miller                                                           1.61    102.4±1.65µs        ? ?/sec    1.00     63.4±0.24µs        ? ?/sec
smol-songs.csv: basic filter: TO/michael jackson                                                         1.77    132.1±1.41µs        ? ?/sec    1.00     74.5±0.59µs        ? ?/sec
smol-songs.csv: basic filter: TO/tamo                                                                    1.00     18.2±0.14µs        ? ?/sec    1.05     19.2±0.46µs        ? ?/sec
smol-songs.csv: basic filter: TO/thelonious monk                                                         1.49    150.8±1.92µs        ? ?/sec    1.00    101.3±0.44µs        ? ?/sec
smol-songs.csv: basic placeholder/                                                                       1.00     27.3±0.07µs        ? ?/sec    1.03     28.0±0.05µs        ? ?/sec
smol-songs.csv: basic with quote/"Notstandskomitee"                                                      1.00    122.4±0.17µs        ? ?/sec    1.03    125.6±0.16µs        ? ?/sec
smol-songs.csv: basic with quote/"charles"                                                               1.00     88.8±0.30µs        ? ?/sec    1.00     88.4±0.15µs        ? ?/sec
smol-songs.csv: basic with quote/"charles" "mingus"                                                      1.00    685.2±0.74µs        ? ?/sec    1.01    689.4±6.07µs        ? ?/sec
smol-songs.csv: basic with quote/"david"                                                                 1.00    161.6±0.42µs        ? ?/sec    1.01    162.6±0.17µs        ? ?/sec
smol-songs.csv: basic with quote/"david" "bowie"                                                         1.00    731.7±0.73µs        ? ?/sec    1.02    743.1±0.77µs        ? ?/sec
smol-songs.csv: basic with quote/"john"                                                                  1.00    267.1±0.33µs        ? ?/sec    1.01    270.9±0.33µs        ? ?/sec
smol-songs.csv: basic with quote/"marcus" "miller"                                                       1.00    138.7±0.31µs        ? ?/sec    1.02    140.9±0.13µs        ? ?/sec
smol-songs.csv: basic with quote/"michael" "jackson"                                                     1.01    841.4±0.72µs        ? ?/sec    1.00    833.8±0.92µs        ? ?/sec
smol-songs.csv: basic with quote/"tamo"                                                                  1.01    189.2±0.26µs        ? ?/sec    1.00    188.2±0.71µs        ? ?/sec
smol-songs.csv: basic with quote/"thelonious" "monk"                                                     1.00   1100.5±1.36µs        ? ?/sec    1.01   1111.7±2.17µs        ? ?/sec
smol-songs.csv: basic without quote/Notstandskomitee                                                     3.40      7.9±0.02ms        ? ?/sec    1.00      2.3±0.02ms        ? ?/sec
smol-songs.csv: basic without quote/charles                                                              2.57    494.4±0.89µs        ? ?/sec    1.00    192.5±0.18µs        ? ?/sec
smol-songs.csv: basic without quote/charles mingus                                                       1.29      2.8±0.02ms        ? ?/sec    1.00      2.1±0.01ms        ? ?/sec
smol-songs.csv: basic without quote/david                                                                1.95    623.8±0.90µs        ? ?/sec    1.00    319.2±1.22µs        ? ?/sec
smol-songs.csv: basic without quote/david bowie                                                          1.12      5.9±0.00ms        ? ?/sec    1.00      5.2±0.00ms        ? ?/sec
smol-songs.csv: basic without quote/john                                                                 1.24   1340.9±2.25µs        ? ?/sec    1.00   1084.7±7.76µs        ? ?/sec
smol-songs.csv: basic without quote/marcus miller                                                        7.97     14.6±0.01ms        ? ?/sec    1.00   1826.0±6.84µs        ? ?/sec
smol-songs.csv: basic without quote/michael jackson                                                      1.19      3.9±0.00ms        ? ?/sec    1.00      3.3±0.00ms        ? ?/sec
smol-songs.csv: basic without quote/tamo                                                                 1.65    737.7±3.58µs        ? ?/sec    1.00    446.7±0.51µs        ? ?/sec
smol-songs.csv: basic without quote/thelonious monk                                                      1.16      4.5±0.02ms        ? ?/sec    1.00      3.9±0.04ms        ? ?/sec
smol-songs.csv: big filter/Notstandskomitee                                                              3.27      7.6±0.02ms        ? ?/sec    1.00      2.3±0.01ms        ? ?/sec
smol-songs.csv: big filter/charles                                                                       8.26   1957.5±1.37µs        ? ?/sec    1.00    236.8±0.34µs        ? ?/sec
smol-songs.csv: big filter/charles mingus                                                                18.49    11.2±0.06ms        ? ?/sec    1.00    607.7±3.03µs        ? ?/sec
smol-songs.csv: big filter/david                                                                         3.78      2.4±0.00ms        ? ?/sec    1.00    622.8±0.80µs        ? ?/sec
smol-songs.csv: big filter/david bowie                                                                   9.00     12.0±0.01ms        ? ?/sec    1.00   1336.0±3.17µs        ? ?/sec
smol-songs.csv: big filter/john                                                                          1.00    554.2±0.95µs        ? ?/sec    1.01    560.4±0.79µs        ? ?/sec
smol-songs.csv: big filter/marcus miller                                                                 18.09    12.0±0.01ms        ? ?/sec    1.00    664.7±0.60µs        ? ?/sec
smol-songs.csv: big filter/michael jackson                                                               8.43     12.0±0.01ms        ? ?/sec    1.00   1421.6±1.37µs        ? ?/sec
smol-songs.csv: big filter/tamo                                                                          1.00     86.3±0.14µs        ? ?/sec    1.01     87.3±0.21µs        ? ?/sec
smol-songs.csv: big filter/thelonious monk                                                               5.55     14.3±0.02ms        ? ?/sec    1.00      2.6±0.01ms        ? ?/sec
smol-songs.csv: desc + default/Notstandskomitee                                                          2.52      5.8±0.01ms        ? ?/sec    1.00      2.3±0.01ms        ? ?/sec
smol-songs.csv: desc + default/charles                                                                   3.04      2.7±0.01ms        ? ?/sec    1.00    893.4±1.08µs        ? ?/sec
smol-songs.csv: desc + default/charles mingus                                                            6.77     10.3±0.01ms        ? ?/sec    1.00   1520.8±1.90µs        ? ?/sec
smol-songs.csv: desc + default/david                                                                     1.39      5.7±0.00ms        ? ?/sec    1.00      4.1±0.00ms        ? ?/sec
smol-songs.csv: desc + default/david bowie                                                               2.34     15.8±0.02ms        ? ?/sec    1.00      6.7±0.01ms        ? ?/sec
smol-songs.csv: desc + default/john                                                                      1.00      2.5±0.00ms        ? ?/sec    1.02      2.6±0.01ms        ? ?/sec
smol-songs.csv: desc + default/marcus miller                                                             5.06     14.5±0.02ms        ? ?/sec    1.00      2.9±0.01ms        ? ?/sec
smol-songs.csv: desc + default/michael jackson                                                           2.64     14.1±0.05ms        ? ?/sec    1.00      5.4±0.00ms        ? ?/sec
smol-songs.csv: desc + default/tamo                                                                      1.00    567.0±0.65µs        ? ?/sec    1.00    565.7±0.97µs        ? ?/sec
smol-songs.csv: desc + default/thelonious monk                                                           3.55     11.6±0.02ms        ? ?/sec    1.00      3.3±0.00ms        ? ?/sec
smol-songs.csv: desc/Notstandskomitee                                                                    2.58      5.6±0.02ms        ? ?/sec    1.00      2.2±0.02ms        ? ?/sec
smol-songs.csv: desc/charles                                                                             6.04      2.1±0.00ms        ? ?/sec    1.00    348.1±0.57µs        ? ?/sec
smol-songs.csv: desc/charles mingus                                                                      14.51     9.4±0.01ms        ? ?/sec    1.00    646.7±0.99µs        ? ?/sec
smol-songs.csv: desc/david                                                                               3.86      2.4±0.00ms        ? ?/sec    1.00    620.7±2.46µs        ? ?/sec
smol-songs.csv: desc/david bowie                                                                         10.10     9.8±0.01ms        ? ?/sec    1.00    973.9±3.31µs        ? ?/sec
smol-songs.csv: desc/john                                                                                1.00    545.5±0.78µs        ? ?/sec    1.00    547.2±0.48µs        ? ?/sec
smol-songs.csv: desc/marcus miller                                                                       11.39    10.3±0.01ms        ? ?/sec    1.00    903.7±0.95µs        ? ?/sec
smol-songs.csv: desc/michael jackson                                                                     10.51     9.7±0.01ms        ? ?/sec    1.00    924.7±2.02µs        ? ?/sec
smol-songs.csv: desc/tamo                                                                                1.01     43.2±0.33µs        ? ?/sec    1.00     42.6±0.35µs        ? ?/sec
smol-songs.csv: desc/thelonious monk                                                                     4.19     10.8±0.03ms        ? ?/sec    1.00      2.6±0.00ms        ? ?/sec
smol-songs.csv: prefix search/a                                                                          1.00   1008.7±1.00µs        ? ?/sec    1.00   1005.5±0.91µs        ? ?/sec
smol-songs.csv: prefix search/b                                                                          1.00    885.0±0.70µs        ? ?/sec    1.01    890.6±1.11µs        ? ?/sec
smol-songs.csv: prefix search/i                                                                          1.00   1051.8±1.25µs        ? ?/sec    1.00   1056.6±4.12µs        ? ?/sec
smol-songs.csv: prefix search/s                                                                          1.00    724.7±1.77µs        ? ?/sec    1.00    721.6±0.59µs        ? ?/sec
smol-songs.csv: prefix search/x                                                                          1.01    212.4±0.21µs        ? ?/sec    1.00    210.9±0.38µs        ? ?/sec
smol-songs.csv: proximity/7000 Danses Un Jour Dans Notre Vie                                             18.55    48.5±0.09ms        ? ?/sec    1.00      2.6±0.03ms        ? ?/sec
smol-songs.csv: proximity/The Disneyland Sing-Along Chorus                                               8.41     56.7±0.45ms        ? ?/sec    1.00      6.7±0.05ms        ? ?/sec
smol-songs.csv: proximity/Under Great Northern Lights                                                    15.74    38.9±0.14ms        ? ?/sec    1.00      2.5±0.00ms        ? ?/sec
smol-songs.csv: proximity/black saint sinner lady                                                        11.82    40.1±0.13ms        ? ?/sec    1.00      3.4±0.02ms        ? ?/sec
smol-songs.csv: proximity/les dangeureuses 1960                                                          6.90     26.1±0.13ms        ? ?/sec    1.00      3.8±0.04ms        ? ?/sec
smol-songs.csv: typo/Arethla Franklin                                                                    14.93     5.8±0.01ms        ? ?/sec    1.00    390.1±1.89µs        ? ?/sec
smol-songs.csv: typo/Disnaylande                                                                         3.18      7.3±0.01ms        ? ?/sec    1.00      2.3±0.00ms        ? ?/sec
smol-songs.csv: typo/dire straights                                                                      5.55     15.2±0.02ms        ? ?/sec    1.00      2.7±0.00ms        ? ?/sec
smol-songs.csv: typo/fear of the duck                                                                    28.03    20.0±0.03ms        ? ?/sec    1.00    713.3±1.54µs        ? ?/sec
smol-songs.csv: typo/indochie                                                                            19.25  1851.4±2.38µs        ? ?/sec    1.00     96.2±0.13µs        ? ?/sec
smol-songs.csv: typo/indochien                                                                           14.66  1887.7±3.18µs        ? ?/sec    1.00    128.8±0.18µs        ? ?/sec
smol-songs.csv: typo/klub des loopers                                                                    37.73    18.0±0.02ms        ? ?/sec    1.00    476.7±0.73µs        ? ?/sec
smol-songs.csv: typo/michel depech                                                                       10.17     5.8±0.01ms        ? ?/sec    1.00    565.8±1.16µs        ? ?/sec
smol-songs.csv: typo/mongus                                                                              15.33  1897.4±3.44µs        ? ?/sec    1.00    123.8±0.13µs        ? ?/sec
smol-songs.csv: typo/stromal                                                                             14.63  1859.3±2.40µs        ? ?/sec    1.00    127.1±0.29µs        ? ?/sec
smol-songs.csv: typo/the white striper                                                                   10.83     9.4±0.01ms        ? ?/sec    1.00    866.0±0.98µs        ? ?/sec
smol-songs.csv: typo/thelonius monk                                                                      14.40     3.8±0.00ms        ? ?/sec    1.00    261.5±1.30µs        ? ?/sec
smol-songs.csv: words/7000 Danses / Le Baiser / je me trompe de mots                                     5.54     70.8±0.09ms        ? ?/sec    1.00     12.8±0.03ms        ? ?/sec
smol-songs.csv: words/Bring Your Daughter To The Slaughter but now this is not part of the title         3.48    119.8±0.14ms        ? ?/sec    1.00     34.4±0.04ms        ? ?/sec
smol-songs.csv: words/The Disneyland Children's Sing-Alone song                                          8.98     71.9±0.12ms        ? ?/sec    1.00      8.0±0.01ms        ? ?/sec
smol-songs.csv: words/les liaisons dangeureuses 1793                                                     11.88    37.4±0.07ms        ? ?/sec    1.00      3.1±0.01ms        ? ?/sec
smol-songs.csv: words/seven nation mummy                                                                 22.86    23.4±0.04ms        ? ?/sec    1.00   1024.8±1.57µs        ? ?/sec
smol-songs.csv: words/the black saint and the sinner lady and the good doggo                             2.76    124.4±0.15ms        ? ?/sec    1.00     45.1±0.09ms        ? ?/sec
smol-songs.csv: words/whathavenotnsuchforth and a good amount of words to pop to match the first one     2.52    107.0±0.23ms        ? ?/sec    1.00     42.4±0.66ms        ? ?/sec

group                                                                                    main-wiki                              typo-wiki
-----                                                                                    ---------                              ---------
smol-wiki-articles.csv: basic placeholder/                                               1.02     13.7±0.02µs        ? ?/sec    1.00     13.4±0.03µs        ? ?/sec
smol-wiki-articles.csv: basic with quote/"film"                                          1.02    409.8±0.67µs        ? ?/sec    1.00    402.6±0.48µs        ? ?/sec
smol-wiki-articles.csv: basic with quote/"france"                                        1.00    325.9±0.91µs        ? ?/sec    1.00    326.4±0.49µs        ? ?/sec
smol-wiki-articles.csv: basic with quote/"japan"                                         1.00    218.4±0.26µs        ? ?/sec    1.01    220.5±0.20µs        ? ?/sec
smol-wiki-articles.csv: basic with quote/"machine"                                       1.00    143.0±0.12µs        ? ?/sec    1.04    148.8±0.21µs        ? ?/sec
smol-wiki-articles.csv: basic with quote/"miles" "davis"                                 1.00     11.7±0.06ms        ? ?/sec    1.00     11.8±0.01ms        ? ?/sec
smol-wiki-articles.csv: basic with quote/"mingus"                                        1.00      4.4±0.03ms        ? ?/sec    1.00      4.4±0.00ms        ? ?/sec
smol-wiki-articles.csv: basic with quote/"rock" "and" "roll"                             1.00     43.5±0.08ms        ? ?/sec    1.01     43.8±0.06ms        ? ?/sec
smol-wiki-articles.csv: basic with quote/"spain"                                         1.00    137.3±0.35µs        ? ?/sec    1.05    144.4±0.23µs        ? ?/sec
smol-wiki-articles.csv: basic without quote/film                                         1.00    125.3±0.30µs        ? ?/sec    1.06    133.1±0.37µs        ? ?/sec
smol-wiki-articles.csv: basic without quote/france                                       1.21   1782.6±1.65µs        ? ?/sec    1.00   1477.0±1.39µs        ? ?/sec
smol-wiki-articles.csv: basic without quote/japan                                        1.28   1363.9±0.80µs        ? ?/sec    1.00   1064.3±1.79µs        ? ?/sec
smol-wiki-articles.csv: basic without quote/machine                                      1.73    760.3±0.81µs        ? ?/sec    1.00    439.6±0.75µs        ? ?/sec
smol-wiki-articles.csv: basic without quote/miles davis                                  1.03     17.0±0.03ms        ? ?/sec    1.00     16.5±0.02ms        ? ?/sec
smol-wiki-articles.csv: basic without quote/mingus                                       1.07      5.3±0.01ms        ? ?/sec    1.00      5.0±0.00ms        ? ?/sec
smol-wiki-articles.csv: basic without quote/rock and roll                                1.01     63.9±0.18ms        ? ?/sec    1.00     63.0±0.07ms        ? ?/sec
smol-wiki-articles.csv: basic without quote/spain                                        2.07    667.4±0.93µs        ? ?/sec    1.00    322.8±0.29µs        ? ?/sec
smol-wiki-articles.csv: prefix search/c                                                  1.00    343.1±0.47µs        ? ?/sec    1.00    344.0±0.34µs        ? ?/sec
smol-wiki-articles.csv: prefix search/g                                                  1.00    374.4±3.42µs        ? ?/sec    1.00    374.1±0.44µs        ? ?/sec
smol-wiki-articles.csv: prefix search/j                                                  1.00    359.9±0.31µs        ? ?/sec    1.00    361.2±0.79µs        ? ?/sec
smol-wiki-articles.csv: prefix search/q                                                  1.01    102.0±0.12µs        ? ?/sec    1.00    101.4±0.32µs        ? ?/sec
smol-wiki-articles.csv: prefix search/t                                                  1.00    536.7±1.39µs        ? ?/sec    1.00    534.3±0.84µs        ? ?/sec
smol-wiki-articles.csv: prefix search/x                                                  1.00    400.9±1.00µs        ? ?/sec    1.00    399.5±0.45µs        ? ?/sec
smol-wiki-articles.csv: proximity/april paris                                            3.86     14.4±0.01ms        ? ?/sec    1.00      3.7±0.01ms        ? ?/sec
smol-wiki-articles.csv: proximity/diesel engine                                          12.98    10.4±0.01ms        ? ?/sec    1.00    803.5±1.13µs        ? ?/sec
smol-wiki-articles.csv: proximity/herald sings                                           1.00     12.7±0.06ms        ? ?/sec    5.29     67.1±0.09ms        ? ?/sec
smol-wiki-articles.csv: proximity/tea two                                                6.48   1452.1±2.78µs        ? ?/sec    1.00    224.1±0.38µs        ? ?/sec
smol-wiki-articles.csv: typo/Disnaylande                                                 3.89      8.5±0.01ms        ? ?/sec    1.00      2.2±0.01ms        ? ?/sec
smol-wiki-articles.csv: typo/aritmetric                                                  3.78     10.3±0.01ms        ? ?/sec    1.00      2.7±0.00ms        ? ?/sec
smol-wiki-articles.csv: typo/linax                                                       8.91   1426.7±0.97µs        ? ?/sec    1.00    160.1±0.18µs        ? ?/sec
smol-wiki-articles.csv: typo/migrosoft                                                   7.48   1417.3±5.84µs        ? ?/sec    1.00    189.5±0.88µs        ? ?/sec
smol-wiki-articles.csv: typo/nympalidea                                                  3.96      7.2±0.01ms        ? ?/sec    1.00   1810.1±2.03µs        ? ?/sec
smol-wiki-articles.csv: typo/phytogropher                                                3.71      7.2±0.01ms        ? ?/sec    1.00   1934.3±6.51µs        ? ?/sec
smol-wiki-articles.csv: typo/sisan                                                       6.44   1497.2±1.38µs        ? ?/sec    1.00    232.7±0.94µs        ? ?/sec
smol-wiki-articles.csv: typo/the fronce                                                  6.92      2.9±0.00ms        ? ?/sec    1.00    418.0±1.76µs        ? ?/sec
smol-wiki-articles.csv: words/Abraham machin                                             16.63    10.8±0.01ms        ? ?/sec    1.00    649.7±1.08µs        ? ?/sec
smol-wiki-articles.csv: words/Idaho Bellevue pizza                                       27.15    25.6±0.03ms        ? ?/sec    1.00    944.2±5.07µs        ? ?/sec
smol-wiki-articles.csv: words/Kameya Tokujirō mingus monk                                26.87    40.7±0.05ms        ? ?/sec    1.00   1515.3±2.73µs        ? ?/sec
smol-wiki-articles.csv: words/Ulrich Hensel meilisearch milli                            11.99    48.8±0.10ms        ? ?/sec    1.00      4.1±0.02ms        ? ?/sec
smol-wiki-articles.csv: words/the black saint and the sinner lady and the good doggo     4.90    110.0±0.15ms        ? ?/sec    1.00     22.4±0.03ms        ? ?/sec

```

Co-authored-by: mpostma <postma.marin@protonmail.com>
Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-03-15 16:43:36 +00:00
ad hoc
3f24555c3d
custom fst automatons 2022-03-15 17:38:35 +01:00
ad hoc
628c835a22
fix tests 2022-03-15 17:38:34 +01:00
bors[bot]
8efac33b53
Merge #467
467: optimize prefix database r=Kerollmops a=MarinPostma

This pr introduces two optimizations that greatly improve the speed of computing prefix databases.

- The time that it takes to create the prefix FST has been divided by 5 by inverting the way we iterated over the words FST.
- We unconditionally and needlessly checked for documents to remove in  `word_prefix_pair`, which caused an iteration over the whole database.

Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-03-15 16:14:35 +00:00
ad hoc
d127c57f2d
review edits 2022-03-15 17:12:48 +01:00
ad hoc
d633ac5b9d
optimize word prefix pair 2022-03-15 16:37:22 +01:00
ad hoc
d68fe2b3c7
optimize word prefix fst 2022-03-15 16:36:48 +01:00
Kerollmops
08a06b49f0
Bump version to 0.23.1 2022-03-15 15:50:28 +01:00
Clément Renault
0c5f4ed7de
Apply suggestions
Co-authored-by: Many <many@meilisearch.com>
2022-03-15 14:18:29 +01:00
Kerollmops
21ec334dcc
Fix the compilation error of the dependency versions 2022-03-15 11:17:45 +01:00
Kerollmops
63682c2c9a
Upgrade the dependencies 2022-03-15 11:17:44 +01:00
Kerollmops
288a879411
Remove three useless dependencies 2022-03-15 11:17:44 +01:00
psvnl sai kumar
5e08fac729 fixes for rustfmt pass 2022-03-14 19:22:41 +05:30
psvnl sai kumar
92e2e09434 exporting heed to avoid having different versions of Heed in Meilisearch 2022-03-14 01:01:58 +05:30
Kerollmops
1ae13c1374
Avoid iterating on big databases when useless 2022-03-09 15:43:54 +01:00
Bruno Casali
66c6d5e1ef Add a new error message when the valid_fields is empty
> "Attribute `{}` is not sortable. This index doesn't have configured sortable attributes."
> "Attribute `{}` is not sortable. Available sortable attributes are: `{}`."

coexist in the error handling
2022-03-05 10:38:18 -03:00
Clémentine Urquizar
d9ed9de2b0
Update heed link in cargo toml 2022-03-01 19:45:29 +01:00
Kerollmops
d5b8b5a2f8
Replace the ugly unwraps by clean if let Somes 2022-02-28 16:31:33 +01:00
Kerollmops
8d26f3040c
Remove a useless grenad file merging 2022-02-28 16:31:33 +01:00
Clément Renault
04b1bbf932
Reintroduce appending sorted entries when possible 2022-02-24 14:50:45 +01:00
bors[bot]
25123af3b8
Merge #436
436: Speed up the word prefix databases computation time r=Kerollmops a=Kerollmops

This PR depends on the fixes done in #431 and must be merged after it.

In this PR we will bring the `WordPrefixPairProximityDocids`, `WordPrefixDocids` and, `WordPrefixPositionDocids` update structures to a new era, a better era, where computing the word prefix pair proximities costs much fewer CPU cycles, an era where this update structure can use the, previously computed, set of new word docids from the newly indexed batch of documents.

---

The `WordPrefixPairProximityDocids` is an update structure, which means that it is an object that we feed with some parameters and which modifies the LMDB database of an index when asked for. This structure specifically computes the list of word prefix pair proximities, which correspond to a list of pairs of words associated with a proximity (the distance between both words) where the second word is not a word but a prefix e.g. `s`, `se`, `a`. This word prefix pair proximity is associated with the list of documents ids which contains the pair of words and prefix at the given proximity.

The origin of the performances issue that this struct brings is related to the fact that it starts its job from the beginning, it clears the LMDB database before rewriting everything from scratch, using the other LMDB databases to achieve that. I hope you understand that this is absolutely not an optimized way of doing things.

Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-02-16 15:41:14 +00:00
Clément Renault
ff8d7a810d
Change the behavior of the as_cloneable_grenad by taking a ref 2022-02-16 15:40:08 +01:00
Clément Renault
f367cc2e75
Finally bump grenad to v0.4.1 2022-02-16 15:28:48 +01:00
Irevoire
0defeb268c
bump milli 2022-02-16 13:27:41 +01:00
Irevoire
48542ac8fd
get rid of chrono in favor of time 2022-02-15 11:41:55 +01:00
Clémentine Urquizar
d03b3ceb58
Update version for the next release (v0.22.1) 2022-02-07 18:39:29 +01:00
bors[bot]
5d58cb7449
Merge #442
442: fix phrase search r=curquiza a=MarinPostma

Run the exact match search on 7 words windows instead of only two. This makes false positive very very unlikely, and impossible on phrase query that are less than seven words.


Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-02-07 16:18:20 +00:00
ad hoc
bd2262ceea
allow null values in csv 2022-02-03 16:03:01 +01:00
ad hoc
13de251047
rewrite word pair distance gathering 2022-02-03 15:57:20 +01:00
Many
d59bcea749 Revert "Revert "Change chunk size to 4MiB to fit more the end user usage"" 2022-02-02 17:01:13 +01:00
mpostma
7541ab99cd
review changes 2022-02-02 12:59:01 +01:00
mpostma
d0aabde502
optimize 2 typos case 2022-02-02 12:56:09 +01:00
mpostma
55e6cb9c7b
typos on first letter counts as 2 2022-02-02 12:56:09 +01:00
mpostma
642c01d0dc
set max typos on ngram to 1 2022-02-02 12:56:08 +01:00
ad hoc
d852dc0d2b
fix phrase search 2022-02-01 20:21:33 +01:00
Kerollmops
fb79c32430
Compute the new, common and, deleted prefix words fst once 2022-01-27 11:00:18 +01:00
Clément Renault
51d1e64b23
Remove, now useless, the WriteMethod enum 2022-01-27 10:08:35 +01:00
Clément Renault
e9c02173cf
Rework the WordsPrefixPositionDocids update to compute a subset of the database 2022-01-27 10:08:35 +01:00
Clément Renault
dbba5fd461
Create a function to simplify the word prefix pair proximity docids compute 2022-01-27 10:08:35 +01:00
Clément Renault
e760e02737
Fix the computation of the newly added and common prefix pair proximity words 2022-01-27 10:08:35 +01:00
Clément Renault
d59e559317
Fix the computation of the newly added and common prefix words 2022-01-27 10:08:34 +01:00
Clément Renault
2ec8542105
Rework the WordPrefixDocids update to compute a subset of the database 2022-01-27 10:08:34 +01:00
Clément Renault
28692f65be
Rework the WordPrefixDocids update to compute a subset of the database 2022-01-27 10:08:34 +01:00
Clément Renault
5404bc02dd
Move the fst_stream_into_hashset method in the helper methods 2022-01-27 10:06:00 +01:00
Clément Renault
c90fa95f93
Only compute the word prefix pairs on the created word pair proximities 2022-01-27 10:06:00 +01:00
Clément Renault
822f67e9ad
Bring the newly created word pair proximity docids 2022-01-27 10:06:00 +01:00
Clément Renault
d28f18658e
Retrieve the previous version of the words prefixes FST 2022-01-27 10:05:59 +01:00
bors[bot]
38d23546a5
Merge #431
431: Fix and improve word prefix pair proximity r=ManyTheFish a=Kerollmops

This PR first fixes the algorithm we used to select and compute the word prefix pair proximity database. The previous version was skipping nearly all of the prefixes. The issue is that this fix made this method to take more time and we were trying to reduce the time spent in it.

With `@ManyTheFish` we found out that we could skip some of the work we were doing by:
 - discarding the prefixes that were shorter than a specific threshold (default: 2).
 - discarding the word prefix pairs with proximity bigger than a specific threshold (default: 4).
 - remove the unused threshold that was specifying a minimum amount of word docids to merge.

We will take more time to do some more optimization, like stop clearing and recomputing from scratch the database, we will compute the subsets of keys to create, keep and merge. This change is a little bit more complex than what this PR does.

I keep this PR as a draft as I want to further test the real gain if it is enough or not if it is valid or not. I advise reviewers to review commit by commit to see the changes bit by bit, reviewing the whole PR can be hard.

Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-01-27 07:04:56 +00:00
Clément Renault
f9b214f34e
Apply suggestions from code review
Co-authored-by: Many <legendre.maxime.isn@gmail.com>
2022-01-26 11:28:11 +01:00
bors[bot]
e1cc025cbd
Merge #440
440: fix(fuzzer): fix the fuzzer after #430 r=Kerollmops a=irevoire



Co-authored-by: Tamo <tamo@meilisearch.com>
2022-01-25 16:33:57 +00:00
Clément Renault
f04cd19886
Introduce a max prefix length parameter to the word prefix pair proximity update 2022-01-25 17:04:23 +01:00
Clément Renault
1514dfa1b7
Introduce a max proximity parameter to the word prefix pair proximity update 2022-01-25 17:04:23 +01:00
Clément Renault
23ea3ad738
Remove the useless threshold when computing the word prefix pair proximity 2022-01-25 17:04:23 +01:00
Clément Renault
e3c34684c6
Fix a bug where we were skipping most of the prefix pairs 2022-01-25 17:04:23 +01:00
Tamo
fb51d511be
fix(fuzzer): fix the fuzzer after #430 2022-01-25 12:08:47 +01:00
bors[bot]
9f2ff71581
Merge #434
434: bump milli to v0.22.0 r=curquiza a=irevoire

This is breaking because of this PR:
98a365aaae

Should we do a special branch to only release the [patch](https://github.com/meilisearch/milli/pull/433) for https://github.com/meilisearch/MeiliSearch/issues/2082 (which is non-breaking)?

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-01-24 17:31:20 +00:00
bors[bot]
fd177b63f8
Merge #423
423: Remove an unused file r=irevoire a=irevoire

This empty file is not included anywhere

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-01-19 14:18:05 +00:00
Marin Postma
0c84a40298 document batch support
reusable transform

rework update api

add indexer config

fix tests

review changes

Co-authored-by: Clément Renault <clement@meilisearch.com>

fmt
2022-01-19 12:40:20 +01:00
Tamo
01968d7ca7
ensure we get no documents and no error when filtering on an empty db 2022-01-18 11:40:30 +01:00
Tamo
367f403693
bump milli 2022-01-17 16:41:34 +01:00
bors[bot]
8f4499090b
Merge #433
433: fix(filter): Fix two bugs. r=Kerollmops a=irevoire

- Stop lowercasing the field when looking in the field id map
- When a field id does not exist it means there is currently zero
  documents containing this field thus we return an empty RoaringBitmap
  instead of throwing an internal error

Will fix https://github.com/meilisearch/MeiliSearch/issues/2082 once meilisearch is released

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-01-17 14:06:53 +00:00
bors[bot]
4c516c00da
Merge #426
426: Fix search highlight for non-unicode chars r=ManyTheFish a=Samyak2

# Pull Request

## What does this PR do?
Fixes https://github.com/meilisearch/MeiliSearch/issues/1480
<!-- Please link the issue you're trying to fix with this PR, if none then please create an issue first. -->

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

## Changes

The `matching_bytes` function takes a `&Token` now and:
- gets the number of bytes to highlight (unchanged).
- uses `Token.num_graphemes_from_bytes` to get the number of grapheme clusters to highlight.

In essence, the `matching_bytes` function now returns the number of matching grapheme clusters instead of bytes.

Added proper highlighting in the HTTP UI:
- requires dependency on `unicode-segmentation` to extract grapheme clusters from tokens
- `<mark>` tag is put around only the matched part
    - before this change, the entire word was highlighted even if only a part of it matched

## Questions

Since `matching_bytes` does not return number of bytes but grapheme clusters, should it be renamed to something like `matching_chars` or `matching_graphemes`? Will this break the API?

Thank you very much `@ManyTheFish` for helping 😄 

Co-authored-by: Samyak S Sarnayak <samyak201@gmail.com>
2022-01-17 13:39:00 +00:00
Tamo
d1ac40ea14
fix(filter): Fix two bugs.
- Stop lowercasing the field when looking in the field id map
- When a field id does not exist it means there is currently zero
  documents containing this field thus we returns an empty RoaringBitmap
  instead of throwing an internal error
2022-01-17 13:51:46 +01:00
Samyak S Sarnayak
2d7607734e
Run cargo fmt on matching_words.rs 2022-01-17 13:04:33 +05:30
Samyak S Sarnayak
5ab505be33
Fix highlight by replacing num_graphemes_from_bytes
num_graphemes_from_bytes has been renamed in the tokenizer to
num_chars_from_bytes.

Highlight now works correctly!
2022-01-17 13:02:55 +05:30
Samyak S Sarnayak
c10f58b7bd
Update tokenizer to v0.2.7 2022-01-17 13:02:00 +05:30
Samyak S Sarnayak
e752bd06f7
Fix matching_words tests to compile successfully
The tests still fail due to a bug in https://github.com/meilisearch/tokenizer/pull/59
2022-01-17 11:37:45 +05:30
Samyak S Sarnayak
30247d70cd
Fix search highlight for non-unicode chars
The `matching_bytes` function takes a `&Token` now and:
- gets the number of bytes to highlight (unchanged).
- uses `Token.num_graphemes_from_bytes` to get the number of grapheme
  clusters to highlight.

In essence, the `matching_bytes` function returns the number of matching
grapheme clusters instead of bytes. Should this function be renamed
then?

Added proper highlighting in the HTTP UI:
- requires dependency on `unicode-segmentation` to extract grapheme
  clusters from tokens
- `<mark>` tag is put around only the matched part
    - before this change, the entire word was highlighted even if only a
      part of it matched
2022-01-17 11:37:44 +05:30
Tamo
0605c0ac68
apply review comments 2022-01-13 18:51:08 +01:00
Tamo
b22c80106f
add some settings to the fuzzed milli and use the published version of arbitrary json 2022-01-13 15:35:24 +01:00
Tamo
c94952e25d
update the readme + dependencies 2022-01-12 18:30:11 +01:00
Tamo
e1053989c0
add a fuzzer on milli 2022-01-12 17:57:54 +01:00
Tamo
98a365aaae
store the geopoint in three dimensions 2021-12-14 12:21:24 +01:00
Tamo
d671d6f0f1
remove an unused file 2021-12-13 19:27:34 +01:00
Clément Renault
25faef67d0
Remove the database setup in the filter_depth test 2021-12-09 11:57:53 +01:00
Clément Renault
65519bc04b
Test that empty filters return a None 2021-12-09 11:57:53 +01:00
Clément Renault
ef59762d8e
Prefer returning None instead of the Empty Filter state 2021-12-09 11:57:52 +01:00
Clément Renault
ee856a7a46
Limit the max filter depth to 2000 2021-12-07 17:36:45 +01:00
Clément Renault
32bd9f091f
Detect the filters that are too deep and return an error 2021-12-07 17:20:11 +01:00
Clément Renault
90f49eab6d
Check the filter max depth limit and reject the invalid ones 2021-12-07 16:32:48 +01:00
many
1b3923b5ce
Update all packages to 0.21.0 2021-11-29 12:17:59 +01:00
many
8970246bc4
Sort positions before iterating over them during word pair proximity extraction 2021-11-22 18:16:54 +01:00
Marin Postma
6e977dd8e8 change visibility of DocumentDeletionResult 2021-11-22 15:44:44 +01:00
many
35f9499638
Export tokenizer from milli 2021-11-18 16:57:12 +01:00
many
64ef5869d7
Update tokenizer v0.2.6 2021-11-18 16:56:05 +01:00
Marin Postma
6eb47ab792 remove update_id in UpdateBuilder 2021-11-16 13:07:04 +01:00
Marin Postma
09b4281cff improve document addition returned metaimprove document addition
returned metaimprove document addition returned metaimprove document
addition returned metaimprove document addition returned metaimprove
document addition returned metaimprove document addition returned
metaimprove document addition returned meta
2021-11-10 14:08:36 +01:00
Marin Postma
721fc294be improve document deletion returned meta
returns both the remaining number of documents and the number of deleted
documents.
2021-11-10 14:08:18 +01:00
Tamo
f28600031d
Rename the filter_parser crate into filter-parser
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-11-09 16:41:10 +01:00
Irevoire
0ea0146e04
implement deref &str on the tokens 2021-11-09 11:34:10 +01:00
Tamo
7483c7513a
fix the filterable fields 2021-11-07 01:52:19 +01:00
Tamo
e5af3ac65c
rename the filter_condition.rs to filter.rs 2021-11-06 16:37:55 +01:00
Tamo
6831c23449
merge with main 2021-11-06 16:34:30 +01:00
Tamo
b249989bef
fix most of the tests 2021-11-06 01:32:12 +01:00
Tamo
27a6a26b4b
makes the parse function part of the filter_parser 2021-11-05 10:46:54 +01:00
Tamo
76d961cc77
implements the last errors 2021-11-04 17:42:06 +01:00
Tamo
8234f9fdf3
recreate most filter error except for the geosearch 2021-11-04 17:24:55 +01:00
Tamo
07a5ffb04c
update http-ui 2021-11-04 15:52:22 +01:00
Tamo
a58bc5bebb
update milli with the new parser_filter 2021-11-04 15:02:36 +01:00
many
743ed9f57f
Bump milli version 2021-11-04 14:04:21 +01:00
many
7b3bac46a0
Change Attribute and Ranking rules errors 2021-11-04 13:19:32 +01:00
many
702589104d
Update version for the next release (v0.20.1) 2021-11-03 14:20:01 +01:00
many
0c0038488c
Change last error messages 2021-11-03 11:24:06 +01:00
Tamo
76a2adb7c3
re-enable the tests in the parser and start the creation of an error type 2021-11-02 17:35:17 +01:00
bors[bot]
5a6d22d4ec
Merge #407
407: Update version for the next release (v0.20.0) r=curquiza a=curquiza

Breaking because of #405 and #406 

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-10-28 13:43:48 +00:00
bors[bot]
08ae47e475
Merge #405
405: Change some error messages r=ManyTheFish a=ManyTheFish



Co-authored-by: many <maxime@meilisearch.com>
2021-10-28 13:35:55 +00:00
Clémentine Urquizar
056ff13c4d
Update version for the next release (v0.20.0) 2021-10-28 14:52:57 +02:00
many
9f1e0d2a49
Refine asc/desc error messages 2021-10-28 14:47:17 +02:00
many
ed6db19681
Fix PR comments 2021-10-28 11:18:32 +02:00
marin postma
183d3dada7
return document count from builder 2021-10-28 10:33:04 +02:00
many
2be755ce75
Lower error check, already check in meilisearch 2021-10-27 19:50:41 +02:00
many
3599df77f0
Change some error messages 2021-10-27 19:33:01 +02:00
bors[bot]
d7943fe225
Merge #402
402: Optimize document transform r=MarinPostma a=MarinPostma

This pr optimizes the transform of documents additions in the obkv format. Instead on accepting any serializable objects, we instead treat json and CSV specifically:
- For json, we build a serde `Visitor`, that transform the json straight into obkv without intermediate representation.
- For csv, we directly write the lines in the obkv, applying other optimization as well.

Co-authored-by: marin postma <postma.marin@protonmail.com>
2021-10-26 09:55:28 +00:00
marin postma
baddd80069
implement review suggestions 2021-10-25 18:29:12 +02:00
marin postma
f9445c1d90
return float parsing error context in csv 2021-10-25 17:27:10 +02:00
bors[bot]
15c29cdd9b
Merge #401
401: Update version for the next release (v0.19.0) r=curquiza a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-10-25 12:49:53 +00:00
Clémentine Urquizar
208903ddde
Revert "Replacing pest with nom " 2021-10-25 11:58:00 +02:00
Clémentine Urquizar
679fe18b17
Update version for the next release (v0.19.0) 2021-10-25 11:52:17 +02:00
marin postma
3fcccc31b5
add document builder example 2021-10-25 10:26:43 +02:00
marin postma
430e9b13d3
add csv builder tests 2021-10-25 10:26:43 +02:00
marin postma
53c79e85f2
document errors 2021-10-25 10:26:43 +02:00
marin postma
2e62925a6e
fix tests 2021-10-25 10:26:42 +02:00
marin postma
0f86d6b28f
implement csv serialization 2021-10-25 10:26:42 +02:00
marin postma
8d70b01714
optimize document deserialization 2021-10-25 10:26:42 +02:00
Tamo
1327807caa
add some error messages 2021-10-22 19:00:33 +02:00
Tamo
c8d03046bf
add a check on the fid in the geosearch 2021-10-22 18:08:18 +02:00
Tamo
3942b3732f
re-implement the geosearch 2021-10-22 18:03:39 +02:00
Tamo
7cd9109e2f
lowercase value extracted from Token 2021-10-22 17:50:15 +02:00
Tamo
e25ca9776f
start updating the exposed function to makes other modules happy 2021-10-22 17:23:22 +02:00
Tamo
6c9165b6a8
provide a helper to parse the token but to not handle the errors 2021-10-22 16:52:13 +02:00
Tamo
efb2f8b325
convert the errors 2021-10-22 16:38:35 +02:00
Tamo
c27870e765
integrate a first version without any error handling 2021-10-22 14:33:18 +02:00
Tamo
01dedde1c9
update some names and move some parser out of the lib.rs 2021-10-22 01:59:38 +02:00
Tamo
c634d43ac5
add a simple test on the filters with an integer 2021-10-21 17:10:27 +02:00
Tamo
6c15f50899
rewrite the parser logic 2021-10-21 16:45:42 +02:00
Tamo
e1d81342cf
add test on the or and and operator 2021-10-21 13:01:25 +02:00
Tamo
423baac08b
fix the tests 2021-10-21 12:45:40 +02:00
Tamo
36281a653f
write all the simple tests 2021-10-21 12:40:11 +02:00
Clémentine Urquizar
f8fe9316c0
Update version for the next release (v0.18.1) 2021-10-21 11:56:14 +02:00
Tamo
661bc21af5
Fix the filter parser
And add a bunch of tests on the filter::from_array
2021-10-21 11:45:03 +02:00
Clémentine Urquizar
2209acbfe2
Update version for the next release (v0.18.2) 2021-10-18 13:45:48 +02:00
bors[bot]
59cc59e93e
Merge #358
358: Replacing pest with nom  r=Kerollmops a=CNLHC



Co-authored-by: 刘瀚骋 <cn_lhc@qq.com>
2021-10-16 20:44:38 +00:00
刘瀚骋
7666e4f34a follow the suggestions 2021-10-14 21:37:59 +08:00
刘瀚骋
2ea2f7570c use nightly cargo to format the code 2021-10-14 16:46:13 +08:00
刘瀚骋
e750465e15 check logic for geolocation. 2021-10-14 16:12:00 +08:00
bors[bot]
aa5e099718
Merge #390
390: Add helper methods on the settings r=Kerollmops a=irevoire

This would be a good addition to look at the content of a setting without consuming it.
It’s useful for analytics.

Co-authored-by: Irevoire <tamo@meilisearch.com>
2021-10-13 20:36:30 +00:00
bors[bot]
c7db4176f3
Merge #384
384: Replace memmap with memmap2 r=Kerollmops a=palfrey

[memmap is unmaintained](https://rustsec.org/advisories/RUSTSEC-2020-0077.html) and needs replacing. memmap2 is a drop-in replacement fork that's well maintained. Note that the version numbers got reset on fork, hence the lower values.

Co-authored-by: Tom Parker-Shemilt <palfrey@tevp.net>
2021-10-13 13:47:23 +00:00
Irevoire
a3e7c468cd
add helper methods on the settings 2021-10-13 13:05:07 +02:00
刘瀚骋
cd359cd96e WIP: extract the error trait bound to new trait. 2021-10-13 18:04:15 +08:00