Alexey Shekhirin
f8d0f5265f
fix(update): fields distribution after documents merge
2021-05-04 22:12:20 +03:00
tamo
d61566787e
provide an iterator over all the documents in a milli index
2021-05-04 11:23:51 +02:00
Clémentine Urquizar
a8680887d8
Upgrade Milli version (v0.2.0)
2021-05-03 14:50:47 +02:00
Clémentine Urquizar
34e02aba42
Upgrade Tokenizer version (v0.2.2)
2021-05-03 10:55:55 +02:00
Alexey Shekhirin
d81c0e8bba
feat(update): disable autogenerate_docids by default
2021-04-30 21:41:34 +03:00
Marin Postma
e8e32e0ba1
make document addition number visible
2021-04-29 20:05:07 +02:00
many
ee09e50e7f
Remove excluded documents in criteria iterations
...
- pass excluded documents to criteria to remove them in higher levels of the bucket-sort
- merge already returned documents with excluded documents to avoid duplicates
Related to #125 and #112
Fix #170
2021-04-29 12:09:38 +02:00
many
31607bf9cd
Add a threshold on proximity when choosing between linear/set algorithm
2021-04-28 14:57:22 +02:00
many
3b7e6afb55
Refactor and add documentation
2021-04-28 13:53:27 +02:00
Many
0add4d735c
Update milli/src/search/criteria/attribute.rs
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:40:34 +02:00
Many
3794ffc952
Update milli/src/search/criteria/attribute.rs
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:39:23 +02:00
Many
329bd4a1bb
Update milli/src/search/criteria/attribute.rs
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:39:03 +02:00
Many
3b1358b62f
Update milli/src/search/criteria/attribute.rs
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:32:19 +02:00
Many
c862b1bc6b
Update milli/src/search/criteria/attribute.rs
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:32:10 +02:00
Many
e92d137676
Update milli/src/search/criteria/attribute.rs
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:31:42 +02:00
Many
b3d6c6a9a0
Update milli/src/search/criteria/attribute.rs
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:31:13 +02:00
Many
498c2b298c
Update milli/src/search/criteria/attribute.rs
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:30:02 +02:00
Many
0e4e6dfada
Update milli/src/search/criteria/proximity.rs
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 17:29:52 +02:00
Many
47d780b8ce
Update milli/src/search/criteria/mod.rs
...
Co-authored-by: Irevoire <tamo@meilisearch.com>
2021-04-27 14:39:53 +02:00
Many
0daa0e170a
Fix PR comments
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-04-27 14:39:53 +02:00
many
0d7d3ce802
Update roaring package
2021-04-27 14:39:53 +02:00
many
71740805a7
Fix forgotten typo tests
2021-04-27 14:39:53 +02:00
many
e77291a6f3
Optimize Attribute criterion on big requests
2021-04-27 14:39:53 +02:00
many
716c8e22b0
Add style and comments
2021-04-27 14:39:52 +02:00
many
f853790016
Use the LCM of 10 first numbers to compute attribute rank
2021-04-27 14:39:52 +02:00
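Note on f853790016 above: the subject line does not say how the constant feeds into the attribute rank, so the following is only a minimal sketch of the arithmetic it names. The least common multiple of the integers 1 through 10 is 2520, which every integer from 1 to 10 divides exactly. The helper names `gcd`, `lcm` and `lcm_of_first` are hypothetical and are not taken from milli.

```rust
/// Greatest common divisor (Euclid's algorithm).
fn gcd(a: u64, b: u64) -> u64 {
    if b == 0 {
        a
    } else {
        gcd(b, a % b)
    }
}

/// Least common multiple of two positive integers.
/// Dividing before multiplying keeps the intermediate value small.
fn lcm(a: u64, b: u64) -> u64 {
    a / gcd(a, b) * b
}

/// Hypothetical helper: LCM of the integers 1..=n.
fn lcm_of_first(n: u64) -> u64 {
    (1..=n).fold(1, lcm)
}
```

With these definitions, `lcm_of_first(10)` evaluates to 2520.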
many
2b036449be
Fix the return of equal candidates in different pages
2021-04-27 14:39:52 +02:00
many
0efa011e09
Make a small code clean-up
2021-04-27 14:39:52 +02:00
many
17c8c6f945
Make set algorithm return None when nothing can be returned
2021-04-27 14:39:52 +02:00
many
b3e2280bb9
Debug attribute criterion
...
* debug folding when initializing iterators
2021-04-27 14:39:52 +02:00
many
1eee0029a8
Make attribute criterion typo/prefix tolerant
2021-04-27 14:39:52 +02:00
many
59f58c15f7
Implement attribute criterion
...
* Implement WordLevelIterator
* Implement QueryLevelIterator
* Implement set algorithm based on iterators
Not tested + Some TODO to fix
2021-04-27 14:39:52 +02:00
Clément Renault
361193099f
Reduce the amount of branches when query tree flattened
2021-04-27 14:39:52 +02:00
Kerollmops
e65bad16cc
Compute the words prefixes at the end of an update
2021-04-27 14:39:52 +02:00
many
ab92c814c3
Fix attributes score
2021-04-27 14:35:43 +02:00
Clément Renault
0ad9499b93
Fix an indexing bug in the words level positions
2021-04-27 14:35:43 +02:00
Clément Renault
7aa5753ed2
Make the attribute positions range bounds fixed
2021-04-27 14:35:43 +02:00
Clément Renault
658f316511
Introduce the Initial Criterion
2021-04-27 14:35:43 +02:00
Kerollmops
89ee2cf576
Introduce the TreeLevel struct
2021-04-27 14:25:35 +02:00
Kerollmops
bd1a371c62
Compute the WordsLevelPositions only once
2021-04-27 14:25:34 +02:00
Kerollmops
8bd4f5d93e
Compute the biggest values of the words_level_positions_docids
2021-04-27 14:25:34 +02:00
Kerollmops
f713828406
Implement the clear and delete documents for the word-level-positions database
2021-04-27 14:25:34 +02:00
Kerollmops
3069bf4f4a
Fix and improve the words-level-positions computation
2021-04-27 14:25:34 +02:00
Kerollmops
3a25137ee4
Expose and use the WordsLevelPositions update
2021-04-27 14:25:34 +02:00
Kerollmops
c765f277a3
Introduce the WordsLevelPositions update
2021-04-27 14:25:34 +02:00
Kerollmops
9242f2f1d4
Store the first word positions levels
2021-04-27 14:25:34 +02:00
Kerollmops
b0a417f342
Introduce the word_level_position_docids Index database
2021-04-27 14:25:34 +02:00
many
75e7b1e3da
Implement test Context methods
2021-04-27 14:25:34 +02:00
many
4ff67ec2ee
Implement attribute criterion for small amounts of candidates
2021-04-27 14:25:34 +02:00
Kerollmops
0f4c0beffd
Introduce the Attribute criterion
2021-04-27 14:25:34 +02:00
tamo
f8dee1b402
[makes clippy happy] search/criteria/proximity.rs
2021-04-21 12:36:45 +02:00
Alexey Shekhirin
6fa00c61d2
feat(search): support words_limit
2021-04-20 12:22:04 +03:00
Kerollmops
c9b2d3ae1a
Warn instead of returning an error when a conversion fails
2021-04-20 10:23:31 +02:00
Kerollmops
2aeef09316
Remove debug logs while iterating through the facet levels
2021-04-20 10:23:31 +02:00
Kerollmops
51767725b2
Simplify integer and float functions trait bounds
2021-04-20 10:23:31 +02:00
Kerollmops
efbfa81fa7
Merge the Float and Integer enum variant into the Number one
2021-04-20 10:23:30 +02:00
Clémentine Urquizar
127d3d028e
Update version for the next release (v0.1.1)
2021-04-19 14:48:13 +02:00
Alexey Shekhirin
33860bc3b7
test(update, settings): set & reset synonyms
...
fixes after review
more fixes after review
2021-04-18 11:24:17 +03:00
Alexey Shekhirin
e39aabbfe6
feat(search, update): synonyms
2021-04-18 11:24:17 +03:00
Marin Postma
9c4660d3d6
add tests
2021-04-15 16:25:56 +02:00
Marin Postma
75464a1baa
review fixes
2021-04-15 16:25:56 +02:00
Marin Postma
2f73fa55ae
add documentation
2021-04-15 16:25:55 +02:00
Marin Postma
45c45e11dd
implement distinct attribute
...
distinct can return error
facet distinct on numbers
return distinct error
review fixes
make get_facet_value more generic
fixes
2021-04-15 16:25:55 +02:00
Clémentine Urquizar
2c5c79d68e
Update Tokenizer version to v0.2.1
2021-04-14 18:54:04 +02:00
tamo
dcb00b2e54
test a new implementation of the stop_words
2021-04-12 18:35:33 +02:00
tamo
da036dcc3e
Revert "Integrate the stop_words in the querytree"
...
This reverts commit 12fb509d8470e6d0c3a424756c9838a1efe306d2.
We revert this commit because it causes bug #150.
The initial algorithm we implemented for the stop_words was:
1. remove the stop_words from the dataset
2. keep the stop_words in the query to see if we can generate new words by
integrating typos or if the word was a prefix
=> This was causing the bug since, in the case of “The hobbit”, we were
**always** looking for something starting with “t he” or “th e”
instead of ignoring the word completely.
For now we are going to fix the bug by completely ignoring the
stop_words in the query.
This could cause another problem where someone mistypes a normal word and
ends up typing a stop_word.
For example, imagine someone searching for the song “Won't he do it”.
If that person misplaces one space and writes “Won' the do it”, then we
will lose a part of the request.
One fix would be to update our query tree to something like this:
---------------------
OR
  OR
    TOLERANT hobbit # the first option is to ignore the stop_word
    AND
      CONSECUTIVE   # the second option is to do as we are doing
        EXACT t     # currently
        EXACT he
      TOLERANT hobbit
---------------------
This would drastically increase the size of our query tree on requests
with a lot of stop_words. For example, think of “The Lord Of The Rings”.
For now, however, we decided to ignore this problem and consider
that doing so does not reduce the relevancy of the search too much
while it improves performance.
2021-04-12 18:35:33 +02:00
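Note on da036dcc3e above: the interim fix described in the message (ignore stop_words in the query entirely) can be illustrated with a minimal sketch. It assumes a whitespace-split query and a caller-provided set of lowercase stop words; `strip_stop_words` is a hypothetical helper, not milli's query-tree code.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: drop every query word that is a stop word.
/// The comparison is case-insensitive; `stop_words` is assumed to be lowercase.
fn strip_stop_words<'a>(query: &'a str, stop_words: &BTreeSet<&str>) -> Vec<&'a str> {
    query
        .split_whitespace()
        // Keep only the words that are not in the stop-word set.
        .filter(|word| !stop_words.contains(word.to_lowercase().as_str()))
        .collect()
}
```

With the stop words `the` and `of`, the query “The hobbit” is reduced to `["hobbit"]`. The mistyped query “Won' the do it” is reduced to `["Won'", "do", "it"]`, which shows the trade-off the message accepts: the word the user meant (“he”) is lost along with the accidental stop word.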
Alexey Shekhirin
84c1dda39d
test(http): setting enum serialize/deserialize
2021-04-08 17:03:40 +03:00
Alexey Shekhirin
dc636d190d
refactor(http, update): introduce setting enum
2021-04-08 17:03:40 +03:00
tamo
0a4bde1f2f
update the default ordering of the criterion
2021-04-01 19:45:31 +02:00
Alexey Shekhirin
2658c5c545
feat(index): update fields distribution in clear & delete operations
...
fixes after review
bump the version of the tokenizer
implement a first version of the stop_words
The front must provide a BTreeSet containing the stop words
The stop_words are set to None if an empty Set is provided
add the stop-words in the http-ui interface
Use maplit in the test
and remove all the useless drop(rtxn) at the end of all tests
Integrate the stop_words in the querytree
remove the stop_words from the querytree except if it was a prefix or a typo
more fixes after review
2021-04-01 19:12:35 +03:00
Alexey Shekhirin
27c7ab6e00
feat(index): store fields distribution in index
2021-04-01 18:35:19 +03:00
tamo
12fb509d84
Integrate the stop_words in the querytree
...
remove the stop_words from the querytree except if it was a prefix or a typo
2021-04-01 13:57:55 +02:00
tamo
a2f46029c7
implement a first version of the stop_words
...
The front must provide a BTreeSet containing the stop words
The stop_words are set to None if an empty Set is provided
add the stop-words in the http-ui interface
Use maplit in the test
and remove all the useless drop(rtxn) at the end of all tests
2021-04-01 13:57:55 +02:00
tamo
62a8f1d707
bump the version of the tokenizer
2021-04-01 13:49:22 +02:00
Alexey Shekhirin
9205b640a4
feat(index): introduce fields_ids_distribution
2021-03-31 18:44:47 +03:00
Alexey Shekhirin
2cb32edaa9
fix(criterion): compile asc/desc regex only once
...
use once_cell instead of lazy_static
reorder imports
2021-03-30 16:07:14 +03:00
Alexey Shekhirin
1e3f05db8f
use fixed number of candidates as a threshold
2021-03-30 11:57:10 +03:00
Alexey Shekhirin
a776ec9718
fix division
2021-03-29 19:16:58 +03:00
Alexey Shekhirin
522e79f2e0
feat(search, criteria): introduce a percentage threshold to the asc/desc
2021-03-29 19:08:31 +03:00
tamo
73dcdb27f6
select a specific release of the tokenizer instead of using the latest git commit
2021-03-25 15:00:18 +01:00
mpostma
9c27183876
fix broken offset
2021-03-15 20:23:50 +01:00
mpostma
f0210453a6
add updated at on put primary key
2021-03-15 14:05:48 +01:00
mpostma
615fe095e1
update index updated at on index writes
2021-03-15 14:05:47 +01:00
mpostma
80d0f9c49d
methods to update index time metadata
2021-03-15 14:05:47 +01:00
Kerollmops
d48008339e
Introduce two new optional_words and authorize_typos Search options
2021-03-10 11:16:30 +01:00
Kerollmops
54b97ed8e1
Update the fetcher comments
2021-03-10 10:56:26 +01:00
Kerollmops
d301859bbd
Introduce a special word_derivations function for Proximity
2021-03-10 10:42:53 +01:00
Kerollmops
facfb4b615
Fix the bucket candidates
2021-03-10 10:42:53 +01:00
Kerollmops
42fd7dea78
Remove the useless typo cache
2021-03-10 10:42:53 +01:00
many
62a70c300d
Optimize words criterion
2021-03-10 10:42:53 +01:00
Kerollmops
f51eb46c69
Use the RoaringBitmapLenCodec to retrieve the count of documents
2021-03-09 10:25:39 +01:00
Kerollmops
d781a6164a
Rewrite some code with idiomatic Rust
2021-03-08 16:27:52 +01:00
Clément Renault
b18ec00a7a
Add a logging_timer macro to the criterion next methods
2021-03-08 16:12:06 +01:00
Kerollmops
82a0f678fb
Introduce a cache on the docid_word_positions database method
2021-03-08 16:12:03 +01:00
Clément Renault
5fcaedb880
Introduce a WordDerivationsCache struct
2021-03-08 16:00:53 +01:00
many
2606c92ef9
use plane sweep in proximity criterion
2021-03-08 15:58:39 +01:00
many
ae47bb3594
Introduce plane_sweep function in proximity criterion
2021-03-08 15:58:38 +01:00
Kerollmops
636a9df177
Temporarily fix the tinytemplate doc hidden issue
2021-03-08 15:57:45 +01:00
Clément Renault
3c76b3548d
Rework the Asc/Desc criteria to be facet iterator based
2021-03-08 13:32:25 +01:00
Clément Renault
a58d2b6137
Print the Asc/Desc criterion field name in the debug prints
2021-03-08 13:32:25 +01:00
mpostma
e3095be85c
Remove Debug use in Display impl
2021-03-08 12:09:09 +01:00