Loïc Lecrenier
6cc91824c1
Remove unused heed codec files
2022-10-26 13:46:14 +02:00
Loïc Lecrenier
5a904cf29d
Reintroduce facet distribution functionality
2022-10-26 13:46:14 +02:00
Loïc Lecrenier
b8a1caad5e
Add range search and incremental indexing algorithm
2022-10-26 13:46:14 +02:00
Loïc Lecrenier
63ef0aba18
Start porting facet distribution and sort to new database structure
2022-10-26 13:46:14 +02:00
Loïc Lecrenier
7913d6365c
Update Facets indexing to be compatible with new database structure
2022-10-26 13:46:14 +02:00
Loïc Lecrenier
c3f49f766d
Prepare refactor of facets database
2022-10-26 13:46:14 +02:00
bors[bot]
004c09a8e2
Merge #669
...
669: Add method to create a new Index with specific creation dates r=irevoire a=loiclec
This functionality is needed to implement the import of dumps correctly.
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2022-10-25 12:44:43 +00:00
Loïc Lecrenier
36bd66281d
Add method to create a new Index with specific creation dates
2022-10-25 14:37:56 +02:00
bors[bot]
d11a6e187f
Merge #639
...
639: Reduce the size of the word_pair_proximity database r=loiclec a=loiclec
# Pull Request
## What does this PR do?
Fixes #634
Now, the value corresponding to the key `prox word1 word2` in the `word_pair_proximity_docids` database contains the ids of the documents in which:
- `word1` is followed by `word2`
- the minimum number of words between `word1` and `word2` is `prox-1`
Before this PR, the `word_pair_proximity_docids` database had keys with the format `word1 word2 prox` and the value contained the ids of the documents in which either:
- `word1` is followed by `word2` after a minimum of `prox-1` words in between them
- `word2` is followed by `word1` after a minimum of `prox-2` words
As a consequence of this change, calls such as:
```
let docids = word_pair_proximity_docids.get(rtxn, (word1, word2, prox));
```
have to be replaced with:
```
let docids1 = word_pair_proximity_docids.get(rtxn, (prox, word1, word2));
let docids2 = word_pair_proximity_docids.get(rtxn, (prox-1, word2, word1));
let docids = docids1 | docids2;
```
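The lookup change can be illustrated with a minimal sketch. It assumes a hypothetical in-memory map (`PairDb`) standing in for the real database, and it assumes that a proximity of 0 is never stored, so the swapped lookup is skipped when `prox` is 1. The helper names `get` and `either_order` are illustrative only and are not part of milli.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical in-memory stand-in for `word_pair_proximity_docids`.
/// Keys follow the new `(prox, word1, word2)` layout.
type PairDb = BTreeMap<(u8, String, String), BTreeSet<u32>>;

/// Look up a single directed key; a missing key means no documents.
fn get(db: &PairDb, prox: u8, w1: &str, w2: &str) -> BTreeSet<u32> {
    db.get(&(prox, w1.to_string(), w2.to_string()))
        .cloned()
        .unwrap_or_default()
}

/// Documents matching the pair in either order, mirroring the replacement
/// snippet: `(prox, w1, w2) | (prox - 1, w2, w1)`.
fn either_order(db: &PairDb, prox: u8, w1: &str, w2: &str) -> BTreeSet<u32> {
    let mut docids = get(db, prox, w1, w2);
    // Assumption: proximity 0 is never stored, so there is nothing to
    // look up for the swapped pair when `prox` is 1.
    if prox > 1 {
        docids.extend(get(db, prox - 1, w2, w1));
    }
    docids
}
```

Callers that relied on the old symmetric keys would take the union of both directed lookups, as `either_order` does here.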
## Phrase search
The PR also fixes two bugs in the `resolve_phrase` function. The first bug is that a phrase containing the same word twice would always return zero documents (e.g. `"dog eats dog"`).
The second bug occurs with a phrase such as `"fox is smarter than a dog"` and the document with the text:
```
fox or dog? a fox is smarter than a dog
```
In that case, the phrase search would not return the document because:
* we only have the key `fox dog 2` in `word_pair_proximity_docids`
* but the implementation of `resolve_phrase` looks for `fox dog 5`, which returns 0 documents
### New implementation of `resolve_phrase`
Given the phrase:
```
fox is smarter than a dog
```
We select the document ids corresponding to all of the following keys in `word_pair_proximity_docids`:
- `1 fox is`
- `1 is smarter`
- `1 smarter than`
- (etc.)
- `1 fox smarter` OR `2 fox smarter`
- `1 is than` OR `2 is than`
- ...
- `1 than dog` OR `2 than dog`
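The key selection listed above can be sketched as follows. This is only an illustration under stated assumptions: the function name `phrase_keys`, the `Alternatives` type, and the `max_dist` parameter are hypothetical, and the window of 2 words is inferred from the list above rather than taken from the actual implementation. Each inner vector is a set of keys joined by OR; a document would have to satisfy every inner vector.

```rust
/// One requirement of the phrase: a document must match at least one key.
type Alternatives = Vec<(u8, String, String)>;

/// Enumerate the `(prox, word1, word2)` keys for a phrase.
/// Words that are `dist` positions apart may be stored with any
/// proximity from 1 to `dist`, hence the alternatives.
fn phrase_keys(words: &[&str], max_dist: usize) -> Vec<Alternatives> {
    let mut requirements = Vec::new();
    for dist in 1..=max_dist {
        // Pairs (i, i + dist) that fit inside the phrase.
        for i in 0..words.len().saturating_sub(dist) {
            let (w1, w2) = (words[i], words[i + dist]);
            let max_prox = dist as u8;
            let alternatives: Alternatives = (1..=max_prox)
                .map(|prox| (prox, w1.to_string(), w2.to_string()))
                .collect();
            requirements.push(alternatives);
        }
    }
    requirements
}
```

For the six-word phrase above and `max_dist = 2`, this yields five adjacent-pair requirements followed by four requirements with two alternatives each.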
## Benchmark Results
Indexing:
```
group indexing_main_d94339a8 indexing_word-pair-proximity-docids-refactor_2983dd8e
----- ---------------------- -----------------------------------------------------
indexing/-geo-delete-facetedNumber-facetedGeo-searchable- 1.19 40.7±11.28ms ? ?/sec 1.00 34.3±4.16ms ? ?/sec
indexing/-movies-delete-facetedString-facetedNumber-searchable- 1.62 11.3±3.77ms ? ?/sec 1.00 7.0±1.56ms ? ?/sec
indexing/-movies-delete-facetedString-facetedNumber-searchable-nested- 1.00 12.5±2.62ms ? ?/sec 1.07 13.4±4.24ms ? ?/sec
indexing/-songs-delete-facetedString-facetedNumber-searchable- 1.26 50.2±12.63ms ? ?/sec 1.00 39.8±20.25ms ? ?/sec
indexing/-wiki-delete-searchable- 1.83 269.1±16.11ms ? ?/sec 1.00 146.8±6.12ms ? ?/sec
indexing/Indexing geo_point 1.00 47.2±0.46s ? ?/sec 1.00 47.3±0.56s ? ?/sec
indexing/Indexing movies in three batches 1.42 12.7±0.13s ? ?/sec 1.00 9.0±0.07s ? ?/sec
indexing/Indexing movies with default settings 1.40 10.2±0.07s ? ?/sec 1.00 7.3±0.06s ? ?/sec
indexing/Indexing nested movies with default settings 1.22 7.8±0.11s ? ?/sec 1.00 6.4±0.13s ? ?/sec
indexing/Indexing nested movies without any facets 1.24 7.3±0.07s ? ?/sec 1.00 5.9±0.06s ? ?/sec
indexing/Indexing songs in three batches with default settings 1.14 47.6±0.67s ? ?/sec 1.00 41.8±0.63s ? ?/sec
indexing/Indexing songs with default settings 1.13 44.1±0.74s ? ?/sec 1.00 38.9±0.76s ? ?/sec
indexing/Indexing songs without any facets 1.19 42.0±0.66s ? ?/sec 1.00 35.2±0.48s ? ?/sec
indexing/Indexing songs without faceted numbers 1.20 44.3±1.40s ? ?/sec 1.00 37.0±0.48s ? ?/sec
indexing/Indexing wiki 1.39 862.9±9.95s ? ?/sec 1.00 622.6±27.11s ? ?/sec
indexing/Indexing wiki in three batches 1.40 934.4±5.97s ? ?/sec 1.00 665.7±4.72s ? ?/sec
indexing/Reindexing geo_point 1.01 15.9±0.39s ? ?/sec 1.00 15.7±0.28s ? ?/sec
indexing/Reindexing movies with default settings 1.15 288.8±25.03ms ? ?/sec 1.00 250.4±2.23ms ? ?/sec
indexing/Reindexing songs with default settings 1.01 4.1±0.06s ? ?/sec 1.00 4.1±0.03s ? ?/sec
indexing/Reindexing wiki 1.41 1484.7±20.59s ? ?/sec 1.00 1052.0±19.89s ? ?/sec
```
Search Wiki:
<details>
<pre>
group search_wiki_main_d94339a8 search_wiki_word-pair-proximity-docids-refactor_2983dd8e
----- ------------------------- --------------------------------------------------------
smol-wiki-articles.csv: basic placeholder/ 1.02 25.8±0.21µs ? ?/sec 1.00 25.4±0.19µs ? ?/sec
smol-wiki-articles.csv: basic with quote/"film" 1.00 441.7±2.57µs ? ?/sec 1.00 442.3±2.41µs ? ?/sec
smol-wiki-articles.csv: basic with quote/"france" 1.00 357.0±2.63µs ? ?/sec 1.00 358.3±2.65µs ? ?/sec
smol-wiki-articles.csv: basic with quote/"japan" 1.00 239.4±2.24µs ? ?/sec 1.00 240.2±1.82µs ? ?/sec
smol-wiki-articles.csv: basic with quote/"machine" 1.00 180.3±2.40µs ? ?/sec 1.00 180.0±1.08µs ? ?/sec
smol-wiki-articles.csv: basic with quote/"miles" "davis" 1.00 9.1±0.03ms ? ?/sec 1.03 9.3±0.04ms ? ?/sec
smol-wiki-articles.csv: basic with quote/"mingus" 1.00 3.6±0.01ms ? ?/sec 1.03 3.7±0.02ms ? ?/sec
smol-wiki-articles.csv: basic with quote/"rock" "and" "roll" 1.00 34.0±0.11ms ? ?/sec 1.03 35.1±0.13ms ? ?/sec
smol-wiki-articles.csv: basic with quote/"spain" 1.00 162.0±0.88µs ? ?/sec 1.00 161.9±0.98µs ? ?/sec
smol-wiki-articles.csv: basic without quote/film 1.01 164.4±1.46µs ? ?/sec 1.00 163.1±1.58µs ? ?/sec
smol-wiki-articles.csv: basic without quote/france 1.00 1698.3±7.37µs ? ?/sec 1.00 1697.7±11.53µs ? ?/sec
smol-wiki-articles.csv: basic without quote/japan 1.00 1154.0±23.61µs ? ?/sec 1.00 1150.7±9.27µs ? ?/sec
smol-wiki-articles.csv: basic without quote/machine 1.00 524.6±3.45µs ? ?/sec 1.01 528.1±4.56µs ? ?/sec
smol-wiki-articles.csv: basic without quote/miles davis 1.00 13.5±0.05ms ? ?/sec 1.02 13.8±0.05ms ? ?/sec
smol-wiki-articles.csv: basic without quote/mingus 1.00 4.1±0.02ms ? ?/sec 1.03 4.2±0.01ms ? ?/sec
smol-wiki-articles.csv: basic without quote/rock and roll 1.00 49.0±0.19ms ? ?/sec 1.03 50.4±0.22ms ? ?/sec
smol-wiki-articles.csv: basic without quote/spain 1.00 412.2±3.35µs ? ?/sec 1.00 412.9±2.81µs ? ?/sec
smol-wiki-articles.csv: prefix search/c 1.00 383.9±2.53µs ? ?/sec 1.00 383.4±2.44µs ? ?/sec
smol-wiki-articles.csv: prefix search/g 1.00 433.4±2.53µs ? ?/sec 1.00 432.8±2.52µs ? ?/sec
smol-wiki-articles.csv: prefix search/j 1.00 424.3±2.05µs ? ?/sec 1.00 424.0±2.15µs ? ?/sec
smol-wiki-articles.csv: prefix search/q 1.00 154.0±1.93µs ? ?/sec 1.00 153.5±1.04µs ? ?/sec
smol-wiki-articles.csv: prefix search/t 1.04 658.5±91.93µs ? ?/sec 1.00 631.4±3.89µs ? ?/sec
smol-wiki-articles.csv: prefix search/x 1.00 446.2±2.09µs ? ?/sec 1.00 445.6±3.13µs ? ?/sec
smol-wiki-articles.csv: proximity/april paris 1.02 3.4±0.39ms ? ?/sec 1.00 3.3±0.01ms ? ?/sec
smol-wiki-articles.csv: proximity/diesel engine 1.00 1022.1±17.52µs ? ?/sec 1.00 1017.7±8.16µs ? ?/sec
smol-wiki-articles.csv: proximity/herald sings 1.01 1872.5±97.70µs ? ?/sec 1.00 1862.2±8.57µs ? ?/sec
smol-wiki-articles.csv: proximity/tea two 1.00 295.2±34.91µs ? ?/sec 1.00 296.6±4.08µs ? ?/sec
smol-wiki-articles.csv: typo/Disnaylande 1.00 3.4±0.51ms ? ?/sec 1.04 3.5±0.01ms ? ?/sec
smol-wiki-articles.csv: typo/aritmetric 1.00 3.6±0.01ms ? ?/sec 1.00 3.7±0.01ms ? ?/sec
smol-wiki-articles.csv: typo/linax 1.00 167.5±1.28µs ? ?/sec 1.00 167.1±2.65µs ? ?/sec
smol-wiki-articles.csv: typo/migrosoft 1.01 217.9±1.84µs ? ?/sec 1.00 216.2±1.61µs ? ?/sec
smol-wiki-articles.csv: typo/nympalidea 1.00 2.9±0.01ms ? ?/sec 1.10 3.1±0.01ms ? ?/sec
smol-wiki-articles.csv: typo/phytogropher 1.00 3.0±0.23ms ? ?/sec 1.08 3.3±0.01ms ? ?/sec
smol-wiki-articles.csv: typo/sisan 1.00 234.6±1.38µs ? ?/sec 1.01 235.8±1.67µs ? ?/sec
smol-wiki-articles.csv: typo/the fronce 1.00 104.4±0.84µs ? ?/sec 1.00 103.9±0.81µs ? ?/sec
smol-wiki-articles.csv: words/Abraham machin 1.02 675.5±4.74µs ? ?/sec 1.00 662.1±5.13µs ? ?/sec
smol-wiki-articles.csv: words/Idaho Bellevue pizza 1.02 1004.5±11.07µs ? ?/sec 1.00 989.5±13.08µs ? ?/sec
smol-wiki-articles.csv: words/Kameya Tokujirō mingus monk 1.00 1650.8±10.92µs ? ?/sec 1.00 1643.2±10.77µs ? ?/sec
smol-wiki-articles.csv: words/Ulrich Hensel meilisearch milli 1.00 5.4±0.03ms ? ?/sec 1.00 5.4±0.02ms ? ?/sec
smol-wiki-articles.csv: words/the black saint and the sinner lady and the good doggo 1.00 32.9±0.10ms ? ?/sec 1.00 32.8±0.10ms ? ?/sec
</pre>
</details>
Search songs:
<details>
<pre>
group search_songs_main_d94339a8 search_songs_word-pair-proximity-docids-refactor_2983dd8e
----- -------------------------- ---------------------------------------------------------
smol-songs.csv: asc + default/Notstandskomitee 1.00 3.0±0.01ms ? ?/sec 1.01 3.0±0.04ms ? ?/sec
smol-songs.csv: asc + default/charles 1.00 2.2±0.01ms ? ?/sec 1.01 2.2±0.01ms ? ?/sec
smol-songs.csv: asc + default/charles mingus 1.00 3.1±0.01ms ? ?/sec 1.01 3.1±0.01ms ? ?/sec
smol-songs.csv: asc + default/david 1.00 2.9±0.01ms ? ?/sec 1.00 2.9±0.01ms ? ?/sec
smol-songs.csv: asc + default/david bowie 1.00 4.5±0.02ms ? ?/sec 1.00 4.5±0.02ms ? ?/sec
smol-songs.csv: asc + default/john 1.00 3.1±0.01ms ? ?/sec 1.01 3.2±0.01ms ? ?/sec
smol-songs.csv: asc + default/marcus miller 1.00 5.0±0.02ms ? ?/sec 1.00 5.0±0.02ms ? ?/sec
smol-songs.csv: asc + default/michael jackson 1.00 4.7±0.02ms ? ?/sec 1.00 4.7±0.02ms ? ?/sec
smol-songs.csv: asc + default/tamo 1.00 1463.4±12.17µs ? ?/sec 1.01 1481.5±8.83µs ? ?/sec
smol-songs.csv: asc + default/thelonious monk 1.00 4.4±0.01ms ? ?/sec 1.00 4.4±0.02ms ? ?/sec
smol-songs.csv: asc/Notstandskomitee 1.01 2.6±0.01ms ? ?/sec 1.00 2.6±0.01ms ? ?/sec
smol-songs.csv: asc/charles 1.00 473.6±3.70µs ? ?/sec 1.01 476.8±22.17µs ? ?/sec
smol-songs.csv: asc/charles mingus 1.01 780.1±3.90µs ? ?/sec 1.00 773.6±4.60µs ? ?/sec
smol-songs.csv: asc/david 1.00 757.6±4.50µs ? ?/sec 1.00 760.7±5.20µs ? ?/sec
smol-songs.csv: asc/david bowie 1.00 1131.2±8.68µs ? ?/sec 1.00 1130.7±8.36µs ? ?/sec
smol-songs.csv: asc/john 1.00 668.9±6.48µs ? ?/sec 1.00 669.9±2.78µs ? ?/sec
smol-songs.csv: asc/marcus miller 1.00 959.8±7.10µs ? ?/sec 1.00 958.9±4.72µs ? ?/sec
smol-songs.csv: asc/michael jackson 1.01 1076.7±16.73µs ? ?/sec 1.00 1070.8±7.34µs ? ?/sec
smol-songs.csv: asc/tamo 1.00 70.4±0.55µs ? ?/sec 1.00 70.5±0.51µs ? ?/sec
smol-songs.csv: asc/thelonious monk 1.01 2.9±0.01ms ? ?/sec 1.00 2.9±0.01ms ? ?/sec
smol-songs.csv: basic filter: <=/Notstandskomitee 1.00 162.0±0.91µs ? ?/sec 1.01 163.6±1.72µs ? ?/sec
smol-songs.csv: basic filter: <=/charles 1.00 38.3±0.24µs ? ?/sec 1.01 38.7±0.31µs ? ?/sec
smol-songs.csv: basic filter: <=/charles mingus 1.01 85.3±0.44µs ? ?/sec 1.00 84.6±0.47µs ? ?/sec
smol-songs.csv: basic filter: <=/david 1.01 32.4±0.25µs ? ?/sec 1.00 32.1±0.24µs ? ?/sec
smol-songs.csv: basic filter: <=/david bowie 1.00 68.6±0.99µs ? ?/sec 1.01 68.9±0.88µs ? ?/sec
smol-songs.csv: basic filter: <=/john 1.04 26.1±0.37µs ? ?/sec 1.00 25.1±0.22µs ? ?/sec
smol-songs.csv: basic filter: <=/marcus miller 1.00 76.7±0.39µs ? ?/sec 1.01 77.3±0.61µs ? ?/sec
smol-songs.csv: basic filter: <=/michael jackson 1.00 95.5±0.66µs ? ?/sec 1.01 96.3±0.79µs ? ?/sec
smol-songs.csv: basic filter: <=/tamo 1.03 26.2±0.36µs ? ?/sec 1.00 25.3±0.23µs ? ?/sec
smol-songs.csv: basic filter: <=/thelonious monk 1.00 140.7±1.36µs ? ?/sec 1.01 142.7±0.88µs ? ?/sec
smol-songs.csv: basic filter: TO/Notstandskomitee 1.00 165.4±1.25µs ? ?/sec 1.00 165.7±1.72µs ? ?/sec
smol-songs.csv: basic filter: TO/charles 1.01 40.6±0.57µs ? ?/sec 1.00 40.1±0.54µs ? ?/sec
smol-songs.csv: basic filter: TO/charles mingus 1.01 87.1±0.80µs ? ?/sec 1.00 86.3±0.61µs ? ?/sec
smol-songs.csv: basic filter: TO/david 1.02 34.5±0.26µs ? ?/sec 1.00 33.7±0.24µs ? ?/sec
smol-songs.csv: basic filter: TO/david bowie 1.00 70.6±0.38µs ? ?/sec 1.00 70.6±0.68µs ? ?/sec
smol-songs.csv: basic filter: TO/john 1.02 27.5±0.77µs ? ?/sec 1.00 26.9±0.21µs ? ?/sec
smol-songs.csv: basic filter: TO/marcus miller 1.01 79.8±0.76µs ? ?/sec 1.00 79.3±1.27µs ? ?/sec
smol-songs.csv: basic filter: TO/michael jackson 1.00 98.3±0.54µs ? ?/sec 1.00 98.0±0.88µs ? ?/sec
smol-songs.csv: basic filter: TO/tamo 1.03 27.9±0.23µs ? ?/sec 1.00 27.1±0.32µs ? ?/sec
smol-songs.csv: basic filter: TO/thelonious monk 1.00 142.5±1.36µs ? ?/sec 1.02 145.2±0.98µs ? ?/sec
smol-songs.csv: basic placeholder/ 1.00 49.4±0.34µs ? ?/sec 1.00 49.3±0.45µs ? ?/sec
smol-songs.csv: basic with quote/"Notstandskomitee" 1.00 190.5±1.60µs ? ?/sec 1.01 191.8±2.10µs ? ?/sec
smol-songs.csv: basic with quote/"charles" 1.00 165.0±1.13µs ? ?/sec 1.01 166.0±1.39µs ? ?/sec
smol-songs.csv: basic with quote/"charles" "mingus" 1.00 1149.4±15.78µs ? ?/sec 1.02 1171.1±9.95µs ? ?/sec
smol-songs.csv: basic with quote/"david" 1.00 236.5±1.61µs ? ?/sec 1.00 236.9±1.73µs ? ?/sec
smol-songs.csv: basic with quote/"david" "bowie" 1.00 1384.8±9.02µs ? ?/sec 1.01 1393.8±11.39µs ? ?/sec
smol-songs.csv: basic with quote/"john" 1.00 358.3±4.85µs ? ?/sec 1.00 358.9±1.75µs ? ?/sec
smol-songs.csv: basic with quote/"marcus" "miller" 1.00 281.4±1.79µs ? ?/sec 1.01 285.6±3.24µs ? ?/sec
smol-songs.csv: basic with quote/"michael" "jackson" 1.00 1328.4±8.01µs ? ?/sec 1.00 1334.6±8.00µs ? ?/sec
smol-songs.csv: basic with quote/"tamo" 1.00 528.7±3.72µs ? ?/sec 1.01 533.4±5.31µs ? ?/sec
smol-songs.csv: basic with quote/"thelonious" "monk" 1.00 1223.0±7.24µs ? ?/sec 1.02 1245.7±12.04µs ? ?/sec
smol-songs.csv: basic without quote/Notstandskomitee 1.00 2.8±0.01ms ? ?/sec 1.00 2.8±0.01ms ? ?/sec
smol-songs.csv: basic without quote/charles 1.00 273.3±2.06µs ? ?/sec 1.01 275.9±1.76µs ? ?/sec
smol-songs.csv: basic without quote/charles mingus 1.00 2.3±0.01ms ? ?/sec 1.02 2.4±0.01ms ? ?/sec
smol-songs.csv: basic without quote/david 1.00 434.3±3.86µs ? ?/sec 1.01 436.7±2.47µs ? ?/sec
smol-songs.csv: basic without quote/david bowie 1.00 5.6±0.02ms ? ?/sec 1.01 5.7±0.02ms ? ?/sec
smol-songs.csv: basic without quote/john 1.00 1322.5±9.98µs ? ?/sec 1.00 1321.2±17.40µs ? ?/sec
smol-songs.csv: basic without quote/marcus miller 1.02 2.4±0.02ms ? ?/sec 1.00 2.4±0.01ms ? ?/sec
smol-songs.csv: basic without quote/michael jackson 1.00 3.8±0.02ms ? ?/sec 1.01 3.9±0.01ms ? ?/sec
smol-songs.csv: basic without quote/tamo 1.00 809.0±4.01µs ? ?/sec 1.01 819.0±6.22µs ? ?/sec
smol-songs.csv: basic without quote/thelonious monk 1.00 3.8±0.02ms ? ?/sec 1.02 3.9±0.02ms ? ?/sec
smol-songs.csv: big filter/Notstandskomitee 1.00 2.7±0.01ms ? ?/sec 1.01 2.8±0.01ms ? ?/sec
smol-songs.csv: big filter/charles 1.00 266.5±1.34µs ? ?/sec 1.01 270.1±8.17µs ? ?/sec
smol-songs.csv: big filter/charles mingus 1.00 651.0±5.40µs ? ?/sec 1.00 651.0±2.73µs ? ?/sec
smol-songs.csv: big filter/david 1.00 1018.1±11.16µs ? ?/sec 1.00 1022.3±8.94µs ? ?/sec
smol-songs.csv: big filter/david bowie 1.00 1912.2±11.13µs ? ?/sec 1.00 1919.8±8.30µs ? ?/sec
smol-songs.csv: big filter/john 1.00 867.2±6.66µs ? ?/sec 1.01 873.3±3.44µs ? ?/sec
smol-songs.csv: big filter/marcus miller 1.00 717.7±2.86µs ? ?/sec 1.01 721.5±3.89µs ? ?/sec
smol-songs.csv: big filter/michael jackson 1.00 1668.4±16.76µs ? ?/sec 1.00 1667.9±10.11µs ? ?/sec
smol-songs.csv: big filter/tamo 1.01 136.7±0.88µs ? ?/sec 1.00 135.5±1.22µs ? ?/sec
smol-songs.csv: big filter/thelonious monk 1.03 3.1±0.02ms ? ?/sec 1.00 3.0±0.01ms ? ?/sec
smol-songs.csv: desc + default/Notstandskomitee 1.00 3.0±0.01ms ? ?/sec 1.00 3.0±0.01ms ? ?/sec
smol-songs.csv: desc + default/charles 1.00 1599.5±13.07µs ? ?/sec 1.01 1622.9±22.43µs ? ?/sec
smol-songs.csv: desc + default/charles mingus 1.00 2.3±0.01ms ? ?/sec 1.01 2.4±0.03ms ? ?/sec
smol-songs.csv: desc + default/david 1.00 5.7±0.02ms ? ?/sec 1.00 5.7±0.02ms ? ?/sec
smol-songs.csv: desc + default/david bowie 1.00 9.0±0.04ms ? ?/sec 1.00 9.0±0.03ms ? ?/sec
smol-songs.csv: desc + default/john 1.00 4.5±0.01ms ? ?/sec 1.00 4.5±0.02ms ? ?/sec
smol-songs.csv: desc + default/marcus miller 1.00 3.9±0.01ms ? ?/sec 1.00 3.9±0.02ms ? ?/sec
smol-songs.csv: desc + default/michael jackson 1.00 6.6±0.03ms ? ?/sec 1.00 6.6±0.03ms ? ?/sec
smol-songs.csv: desc + default/tamo 1.00 1472.4±10.38µs ? ?/sec 1.01 1484.2±8.07µs ? ?/sec
smol-songs.csv: desc + default/thelonious monk 1.00 4.4±0.02ms ? ?/sec 1.00 4.4±0.05ms ? ?/sec
smol-songs.csv: desc/Notstandskomitee 1.01 2.6±0.01ms ? ?/sec 1.00 2.6±0.01ms ? ?/sec
smol-songs.csv: desc/charles 1.00 475.9±3.38µs ? ?/sec 1.00 475.9±2.64µs ? ?/sec
smol-songs.csv: desc/charles mingus 1.00 775.3±4.30µs ? ?/sec 1.00 778.9±3.52µs ? ?/sec
smol-songs.csv: desc/david 1.00 757.9±4.10µs ? ?/sec 1.01 763.4±3.27µs ? ?/sec
smol-songs.csv: desc/david bowie 1.00 1129.0±11.87µs ? ?/sec 1.01 1135.1±8.86µs ? ?/sec
smol-songs.csv: desc/john 1.00 670.2±4.38µs ? ?/sec 1.00 670.2±3.46µs ? ?/sec
smol-songs.csv: desc/marcus miller 1.00 961.2±4.47µs ? ?/sec 1.00 961.9±4.03µs ? ?/sec
smol-songs.csv: desc/michael jackson 1.00 1076.5±6.61µs ? ?/sec 1.00 1077.9±7.11µs ? ?/sec
smol-songs.csv: desc/tamo 1.00 70.6±0.57µs ? ?/sec 1.01 71.3±0.48µs ? ?/sec
smol-songs.csv: desc/thelonious monk 1.01 2.9±0.01ms ? ?/sec 1.00 2.9±0.01ms ? ?/sec
smol-songs.csv: prefix search/a 1.00 1236.2±9.43µs ? ?/sec 1.00 1232.0±12.07µs ? ?/sec
smol-songs.csv: prefix search/b 1.00 1090.8±9.89µs ? ?/sec 1.00 1090.8±9.43µs ? ?/sec
smol-songs.csv: prefix search/i 1.00 1333.9±8.28µs ? ?/sec 1.00 1334.2±11.21µs ? ?/sec
smol-songs.csv: prefix search/s 1.00 810.5±3.69µs ? ?/sec 1.00 806.6±3.50µs ? ?/sec
smol-songs.csv: prefix search/x 1.00 290.5±1.88µs ? ?/sec 1.00 291.0±1.85µs ? ?/sec
smol-songs.csv: proximity/7000 Danses Un Jour Dans Notre Vie 1.00 4.7±0.02ms ? ?/sec 1.00 4.7±0.02ms ? ?/sec
smol-songs.csv: proximity/The Disneyland Sing-Along Chorus 1.01 5.6±0.02ms ? ?/sec 1.00 5.6±0.03ms ? ?/sec
smol-songs.csv: proximity/Under Great Northern Lights 1.00 2.5±0.01ms ? ?/sec 1.00 2.5±0.01ms ? ?/sec
smol-songs.csv: proximity/black saint sinner lady 1.00 4.8±0.02ms ? ?/sec 1.00 4.8±0.02ms ? ?/sec
smol-songs.csv: proximity/les dangeureuses 1960 1.00 3.2±0.01ms ? ?/sec 1.01 3.2±0.01ms ? ?/sec
smol-songs.csv: typo/Arethla Franklin 1.00 388.7±5.16µs ? ?/sec 1.00 390.0±2.11µs ? ?/sec
smol-songs.csv: typo/Disnaylande 1.01 2.6±0.01ms ? ?/sec 1.00 2.6±0.01ms ? ?/sec
smol-songs.csv: typo/dire straights 1.00 125.9±1.22µs ? ?/sec 1.00 126.0±0.71µs ? ?/sec
smol-songs.csv: typo/fear of the duck 1.00 373.7±4.25µs ? ?/sec 1.01 375.7±14.17µs ? ?/sec
smol-songs.csv: typo/indochie 1.00 103.6±0.94µs ? ?/sec 1.00 103.4±0.74µs ? ?/sec
smol-songs.csv: typo/indochien 1.00 155.6±1.14µs ? ?/sec 1.01 157.5±1.75µs ? ?/sec
smol-songs.csv: typo/klub des loopers 1.00 160.6±2.98µs ? ?/sec 1.01 161.7±1.96µs ? ?/sec
smol-songs.csv: typo/michel depech 1.00 79.4±0.54µs ? ?/sec 1.01 79.9±0.60µs ? ?/sec
smol-songs.csv: typo/mongus 1.00 126.7±1.85µs ? ?/sec 1.00 126.1±0.74µs ? ?/sec
smol-songs.csv: typo/stromal 1.01 132.9±0.99µs ? ?/sec 1.00 131.9±1.09µs ? ?/sec
smol-songs.csv: typo/the white striper 1.00 287.8±2.88µs ? ?/sec 1.00 286.5±1.91µs ? ?/sec
smol-songs.csv: typo/thelonius monk 1.00 304.2±1.49µs ? ?/sec 1.01 306.5±1.50µs ? ?/sec
smol-songs.csv: words/7000 Danses / Le Baiser / je me trompe de mots 1.01 20.9±0.08ms ? ?/sec 1.00 20.7±0.07ms ? ?/sec
smol-songs.csv: words/Bring Your Daughter To The Slaughter but now this is not part of the title 1.00 48.9±0.13ms ? ?/sec 1.00 48.9±0.11ms ? ?/sec
smol-songs.csv: words/The Disneyland Children's Sing-Alone song 1.01 13.9±0.06ms ? ?/sec 1.00 13.8±0.07ms ? ?/sec
smol-songs.csv: words/les liaisons dangeureuses 1793 1.01 3.7±0.01ms ? ?/sec 1.00 3.6±0.02ms ? ?/sec
smol-songs.csv: words/seven nation mummy 1.00 1054.2±14.49µs ? ?/sec 1.00 1056.6±10.53µs ? ?/sec
smol-songs.csv: words/the black saint and the sinner lady and the good doggo 1.00 58.2±0.29ms ? ?/sec 1.00 57.9±0.21ms ? ?/sec
smol-songs.csv: words/whathavenotnsuchforth and a good amount of words to pop to match the first one 1.00 66.1±0.21ms ? ?/sec 1.00 66.0±0.24ms ? ?/sec
</pre>
</details>
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2022-10-25 10:42:04 +00:00
Loïc Lecrenier
9a569d73d1
Minor code style change
2022-10-24 15:30:43 +02:00
Loïc Lecrenier
be302fd250
Remove outdated workaround for duplicate words in phrase search
2022-10-24 15:27:06 +02:00
Loïc Lecrenier
d76d0cb1bf
Merge branch 'main' into word-pair-proximity-docids-refactor
2022-10-24 15:23:00 +02:00
bors[bot]
2bf867982a
Merge #667
...
667: Update version for the next release (v0.34.0) in Cargo.toml files r=curquiza a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one before merging.
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
2022-10-24 10:19:04 +00:00
curquiza
f3874d58b9
Update version for the next release (v0.34.0) in Cargo.toml files
2022-10-24 10:13:25 +00:00
Loïc Lecrenier
a983129613
Apply suggestions from code review
2022-10-20 09:49:37 +02:00
bors[bot]
f11a4087da
Merge #665
...
665: Fixing piles of clippy errors. r=ManyTheFish a=ehiggs
## Related issue
No issue fixed. Simply cleaning up some code for clippy on the march towards a clean build when #659 is merged.
## What does this PR do?
Most of these are calling clone when the struct supports Copy.
Many are using & and &mut on `self` when the function they are called from already has an immutable or mutable borrow so this isn't needed.
I tried to stay away from actual changes or places where I'd have to name fresh variables.
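As a small illustration of the two kinds of fix described above, here is a sketch with hypothetical types (`DocId`, `Seen`) that do not come from the milli code base. The flagged forms are shown in comments, with the lints they are assumed to trigger (`clippy::clone_on_copy` and `clippy::needless_borrow`).

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct DocId(u32);

struct Seen {
    ids: Vec<DocId>,
}

impl Seen {
    fn count(&self) -> usize {
        self.ids.len()
    }

    fn last(&self) -> Option<DocId> {
        // Flagged form: `self.ids.last().map(|id| id.clone())`.
        // `DocId` is `Copy`, so the value can simply be copied.
        self.ids.last().copied()
    }

    fn is_empty(&self) -> bool {
        // Flagged form: `Seen::count(&self) == 0`.
        // `self` is already a `&Seen` here, so no extra borrow is needed.
        Seen::count(self) == 0
    }
}
```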
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Co-authored-by: Ewan Higgs <ewan.higgs@gmail.com>
2022-10-20 07:19:46 +00:00
Loïc Lecrenier
176ffd23f5
Fix compile error after rebasing wppd-refactor
2022-10-18 10:40:26 +02:00
Loïc Lecrenier
ab2f6f3aa4
Refine some details in word_prefix_pair_proximity indexing code
2022-10-18 10:37:34 +02:00
Loïc Lecrenier
e6e76fbefe
Improve performance of resolve_phrase at the cost of some relevancy
2022-10-18 10:37:34 +02:00
Loïc Lecrenier
178d00f93a
Cargo fmt
2022-10-18 10:37:34 +02:00
Loïc Lecrenier
830a7c0c7a
Use resolve_phrase function for exactness criteria as well
2022-10-18 10:37:34 +02:00
Loïc Lecrenier
18d578dfc4
Adjust some algorithms using DBs of word pair proximities
2022-10-18 10:37:34 +02:00
Loïc Lecrenier
072b576514
Fix proximity value in keys of prefix_word_pair_proximity_docids
2022-10-18 10:37:34 +02:00
Loïc Lecrenier
6c3a5d69e1
Update snapshots
2022-10-18 10:37:34 +02:00
Loïc Lecrenier
a7de4f5b85
Don't add swapped word pairs to the word_pair_proximity_docids db
2022-10-18 10:37:34 +02:00
Loïc Lecrenier
264a04922d
Add prefix_word_pair_proximity database
...
Similar to the word_prefix_pair_proximity one but instead the keys are:
(proximity, prefix, word2)
2022-10-18 10:37:34 +02:00
Loïc Lecrenier
1dbbd8694f
Rename StrStrU8Codec to U8StrStrCodec and reorder its fields
2022-10-18 10:37:34 +02:00
Loïc Lecrenier
bdeb47305e
Change encoding of word_pair_proximity DB to (proximity, word1, word2)
...
Same for word_prefix_pair_proximity
2022-10-18 10:37:34 +02:00
bors[bot]
19b2326f3d
Merge #586
...
586: Add settings to force milli to exhaustively compute the total number of hits r=Kerollmops a=ManyTheFish
Add a new setting `exhaustive_number_hits` to `Search`. It forces the `Initial` criterion to exhaustively compute the `bucket_candidates`, which allows end users to implement finite pagination.
Related to https://github.com/meilisearch/meilisearch/pull/2601
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
2022-10-17 16:24:35 +00:00
Many the fish
81919a35a2
Update milli/src/search/criteria/initial.rs
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-10-17 18:23:20 +02:00
Many the fish
516e838eb4
Update milli/src/search/criteria/initial.rs
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-10-17 18:23:15 +02:00
ManyTheFish
6f55e7844c
Add some code comments
2022-10-17 14:41:57 +02:00
ManyTheFish
cf203b7fde
Take filter in account when computing the pages candidates
2022-10-17 14:13:44 +02:00
ManyTheFish
d71bc1e69f
Compute an exact count when using distinct
2022-10-17 14:13:44 +02:00
ManyTheFish
a396806343
Add settings to force milli to exhaustively compute the total number of hits
2022-10-17 14:13:44 +02:00
bors[bot]
fad0de4581
Merge #655
...
655: Upgrade all dependencies r=Kerollmops a=loiclec
Upgrade all dependencies to their latest versions.
Partly fixes https://github.com/meilisearch/meilisearch/issues/2822
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2022-10-17 11:19:46 +00:00
Loïc Lecrenier
c2ca259f48
Update cli to latest indicatif crate version
2022-10-17 13:05:56 +02:00
Loïc Lecrenier
4c481a8947
Upgrade all dependencies
2022-10-17 13:05:56 +02:00
Ewan Higgs
beb987d3d1
Fixing piles of clippy errors.
...
Most of these are calling clone when the struct supports Copy.
Many are using & and &mut on `self` when the function they are called
from already has an immutable or mutable borrow so this isn't needed.
I tried to stay away from actual changes or places where I'd have to
name fresh variables.
2022-10-13 22:02:54 +02:00
bors[bot]
95e45e1c2c
Merge #663
...
663: Fix CONTRIBUTING.md step to make the project work r=Kerollmops a=curquiza
Following this discussion: https://github.com/meilisearch/milli/issues/76#issuecomment-1277459125
Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
2022-10-13 11:47:34 +00:00
Clémentine Urquizar - curqui
59fe1e8efa
Update CONTRIBUTING.md
2022-10-13 13:46:18 +02:00
bors[bot]
f30979d021
Merge #662
...
662: Enhance word splitting strategy r=ManyTheFish a=akki1306
# Pull Request
## Related issue
Fixes #648
## What does this PR do?
- Change `split_best_frequency` (55d889522b/milli/src/search/query_tree.rs, L282-L301) to use the frequency of word pairs at a proximity of 1 instead of the frequency of the individual words. The split whose word pair has the highest frequency is selected.
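A minimal sketch of the strategy, assuming a hypothetical `pair_frequency` callback that returns how many documents contain the two parts next to each other (proximity 1). The function name `best_split` and its signature are illustrative and are not milli's actual API.

```rust
/// Try every split point of `word` and keep the split whose (left, right)
/// pair is the most frequent according to `pair_frequency`.
/// Returns `None` when no split is found in any document.
fn best_split<'a>(
    word: &'a str,
    pair_frequency: impl Fn(&str, &str) -> u64,
) -> Option<(&'a str, &'a str)> {
    let mut best: Option<(u64, &'a str, &'a str)> = None;
    // Skip index 0 so that the left part is never empty;
    // `char_indices` keeps every split on a character boundary.
    for (i, _) in word.char_indices().skip(1) {
        let (left, right) = word.split_at(i);
        let freq = pair_frequency(left, right);
        if freq > 0 && best.map_or(true, |(b, _, _)| freq > b) {
            best = Some((freq, left, right));
        }
    }
    best.map(|(_, left, right)| (left, right))
}
```

Scoring the pair rather than each half separately avoids picking a split made of two common words that never appear side by side.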
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Akshay Kulkarni <akshayk.gj@gmail.com>
2022-10-13 08:14:22 +00:00
Akshay Kulkarni
85f3028317
remove underscore and introduce back word_documents_count
2022-10-13 13:21:59 +05:30
Akshay Kulkarni
8195fc6141
revert removal of word_documents_count method
2022-10-13 13:14:27 +05:30
Akshay Kulkarni
32f825d442
move default implementation of word_pair_frequency to TestContext
2022-10-13 12:57:50 +05:30
Akshay Kulkarni
ff8b2d4422
formatting
2022-10-13 12:44:08 +05:30
Akshay Kulkarni
6cb8b46900
use word_pair_frequency and remove word_documents_count
2022-10-13 12:43:11 +05:30
Akshay Kulkarni
8c9245149e
format file
2022-10-12 15:27:56 +05:30
bors[bot]
2000f7958d
Merge #604
...
604: Speed up debug builds r=Kerollmops a=loiclec
Note: this draft PR is based on https://github.com/meilisearch/milli/pull/601, for no particular reason.
## What does this PR do?
Make a series of changes with the goal of speeding up debug builds:
1. Add an `all_languages` feature which compiles charabia with its `default` features activated.
The `all_languages` feature is activated by default. But running:
```
cargo build --no-default-features
```
on `milli` is now much faster.
2. Reduce the debug optimisation level from 3 to 0, except for a few critical dependencies.
3. Compile the build dependencies quicker as well. Previously, all build dependencies were compiled with `opt-level = 3`. Now, only the critical build dependencies are compiled with optimisations.
4. Reduce the amount of code generated by the `documents!` macro
5. Make the "progress update" closure provided to indexing functions a trait object instead of a generic parameter. This avoids monomorphising the indexing code multiple times needlessly.
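Point 5 can be illustrated with a short sketch. The function names `index_generic` and `index_dyn` and the `usize` progress value are hypothetical; the real indexing functions have different signatures. The generic version is compiled once per closure type passed to it, while the trait-object version is compiled once and calls the closure through dynamic dispatch.

```rust
/// Generic version: a separate copy is generated for each closure type `F`.
fn index_generic<F: Fn(usize)>(docs: &[&str], progress: F) -> usize {
    for (i, _doc) in docs.iter().enumerate() {
        // Report how many documents have been processed so far.
        progress(i + 1);
    }
    docs.len()
}

/// Trait-object version: a single compiled copy, whatever closure is passed.
fn index_dyn(docs: &[&str], progress: &dyn Fn(usize)) -> usize {
    for (i, _doc) in docs.iter().enumerate() {
        progress(i + 1);
    }
    docs.len()
}
```

The dynamic call adds a small indirection per progress update, which is usually negligible next to the indexing work itself.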
## Results
Initial build times on my computer before and after these changes:
| | cargo check | cargo check --no-default-features | cargo test | cargo test --lib | cargo test --no-default-features | cargo test --lib --no-default-features |
|--------|-------------|-----------------------------------|------------|------------------|----------------------------------|----------------------------------------|
| before | 1m05s | 1m05s | 2m06s | 1m47s | 2m06s | 1m47s |
| after | 28.9s | 13.1s | 40s | 38s | 23s | 21s |
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2022-10-12 08:54:48 +00:00
Akshay Kulkarni
63e79a9039
update comment
2022-10-12 13:36:48 +05:30