458 Commits

Author SHA1 Message Date
Alexey Shekhirin
2658c5c545
feat(index): update fields distribution in clear & delete operations
fixes after review

bump the version of the tokenizer

implement a first version of the stop_words

The front must provide a BTreeSet containing the stop words
The stop_words are set at None if an empty Set is provided
add the stop-words in the http-ui interface

Use maplit in the test
and remove all the useless drop(rtxn) at the end of all tests

Integrate the stop_words in the querytree

remove the stop_words from the querytree except if it was a prefix or a typo

more fixes after review
2021-04-01 19:12:35 +03:00
Alexey Shekhirin
27c7ab6e00
feat(index): store fields distribution in index 2021-04-01 18:35:19 +03:00
tamo
12fb509d84
Integrate the stop_words in the querytree
remove the stop_words from the querytree except if it was a prefix or a typo
2021-04-01 13:57:55 +02:00
tamo
a2f46029c7
implement a first version of the stop_words
The front must provide a BTreeSet containing the stop words
The stop_words are set at None if an empty Set is provided
add the stop-words in the http-ui interface

Use maplit in the test
and remove all the useless drop(rtxn) at the end of all tests
2021-04-01 13:57:55 +02:00
Alexey Shekhirin
9205b640a4 feat(index): introduce fields_ids_distribution 2021-03-31 18:44:47 +03:00
Alexey Shekhirin
2cb32edaa9 fix(criterion): compile asc/desc regex only once
use once_cell instead of lazy_static

reorder imports
2021-03-30 16:07:14 +03:00
Alexey Shekhirin
1e3f05db8f use fixed number of candidates as a threshold 2021-03-30 11:57:10 +03:00
Alexey Shekhirin
a776ec9718 fix division 2021-03-29 19:16:58 +03:00
Alexey Shekhirin
522e79f2e0 feat(search, criteria): introduce a percentage threshold to the asc/desc 2021-03-29 19:08:31 +03:00
mpostma
9c27183876
fix broken offset 2021-03-15 20:23:50 +01:00
mpostma
f0210453a6
add updated at on put primary key 2021-03-15 14:05:48 +01:00
mpostma
615fe095e1
update index updated at on index writes 2021-03-15 14:05:47 +01:00
mpostma
80d0f9c49d
methods to update index time metadata 2021-03-15 14:05:47 +01:00
Kerollmops
d48008339e
Introduce two new optional_words and authorize_typos Search options 2021-03-10 11:16:30 +01:00
Kerollmops
54b97ed8e1
Update the fetcher comments 2021-03-10 10:56:26 +01:00
Kerollmops
d301859bbd
Introduce a special word_derivations function for Proximity 2021-03-10 10:42:53 +01:00
Kerollmops
facfb4b615
Fix the bucket candidates 2021-03-10 10:42:53 +01:00
Kerollmops
42fd7dea78
Remove the useless typo cache 2021-03-10 10:42:53 +01:00
many
62a70c300d
Optimize words criterion 2021-03-10 10:42:53 +01:00
Kerollmops
f51eb46c69
Use the RoaringBitmapLenCodec to retrieve the count of documents 2021-03-09 10:25:39 +01:00
Kerollmops
d781a6164a
Rewrite some code with idiomatic Rust 2021-03-08 16:27:52 +01:00
Clément Renault
b18ec00a7a
Add a logging_timer macro to the criterion next methods 2021-03-08 16:12:06 +01:00
Kerollmops
82a0f678fb
Introduce a cache on the docid_word_positions database method 2021-03-08 16:12:03 +01:00
Clément Renault
5fcaedb880
Introduce a WordDerivationsCache struct 2021-03-08 16:00:53 +01:00
many
2606c92ef9
use plane sweep in proximity criterion 2021-03-08 15:58:39 +01:00
many
ae47bb3594
Introduce plane_sweep function in proximity criterion 2021-03-08 15:58:38 +01:00
Clément Renault
3c76b3548d
Rework the Asc/Desc criteria to be facet iterator based 2021-03-08 13:32:25 +01:00
Clément Renault
a58d2b6137
Print the Asc/Desc criterion field name in the debug prints 2021-03-08 13:32:25 +01:00
mpostma
e3095be85c
Remove Debug use in Display impl 2021-03-08 12:09:09 +01:00
mpostma
9e1eb25232
implement display for criterion
Update milli/src/criterion.rs

Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-03-08 11:00:30 +01:00
Clément Renault
e5bb96bc3b
Fix the searchable settings test 2021-03-06 12:48:41 +01:00
Kerollmops
9b6b35d9b7
Clean up some comments 2021-03-03 18:19:10 +01:00
Kerollmops
2cc4a467a6
Change the criterion output that cannot fail 2021-03-03 18:18:33 +01:00
Kerollmops
1fc25148da
Remove useless where clauses for the criteria 2021-03-03 18:09:19 +01:00
Kerollmops
07784c8990
Tune the words prefixes threshold to compute for 1/1000 instead 2021-03-03 15:51:28 +01:00
Kerollmops
f376c6a728
Make sure we retrieve the docid word positions 2021-03-03 15:45:03 +01:00
Kerollmops
5c5e51095c
Fix the Asc/Desc criteria to always return the QueryTree when available 2021-03-03 15:45:03 +01:00
many
cdaa96df63
optimize proximity criterion 2021-03-03 15:45:03 +01:00
many
246286f0eb
take hard separator into account 2021-03-03 15:45:03 +01:00
Kerollmops
6bf6b40495
Remove unused files 2021-03-03 15:45:03 +01:00
Kerollmops
f118d7e067
build criteria from settings 2021-03-03 15:45:03 +01:00
Kerollmops
025835c5b2
Fix the criteria to avoid always returning a placeholder 2021-03-03 15:45:03 +01:00
Kerollmops
36c1f93ceb
Do a union of the bucket candidates 2021-03-03 15:45:03 +01:00
many
b0e0c5eba0
remove option of bucket_candidates 2021-03-03 15:45:03 +01:00
Kerollmops
daf126a638
Introduce the final Fetcher criterion 2021-03-03 15:45:03 +01:00
many
7ac09d7b7c
remove option of bucket_candidates 2021-03-03 15:45:03 +01:00
Kerollmops
5af63c74e0
Speed-up the MatchingWords highlighting struct 2021-03-03 15:45:03 +01:00
Kerollmops
4510bbccca
Add a lot of debug 2021-03-03 15:43:44 +01:00
Kerollmops
ae4a237e58
Fix the maximum_proximity function 2021-03-03 15:43:44 +01:00
Kerollmops
9bc9b36645
Introduce the Proximity criterion 2021-03-03 15:43:44 +01:00
Kerollmops
22b84fe543
Use the words criterion in the search module 2021-03-03 15:43:44 +01:00
many
3d731cc861
remove option on bucket_candidates 2021-03-03 15:43:44 +01:00
Clément Renault
14f9f85c4b
Introduce the AscDesc criterion 2021-03-03 15:43:44 +01:00
many
b5b7ec0162
implement initial state for words criterion 2021-03-03 15:43:44 +01:00
Kerollmops
3415812b06
Improve the intersection speed in the words criterion 2021-03-03 15:43:43 +01:00
Clément Renault
ef381e17bb
Compute the candidates for each sub query tree 2021-03-03 15:43:43 +01:00
Kerollmops
e174ccbd8e
Use the words criterion in the search module 2021-03-03 15:43:43 +01:00
Clément Renault
1e47f9b3ff
Introduce the Words criterion 2021-03-03 15:43:43 +01:00
many
2d068bd45b
implement Context trait for criteria 2021-03-03 15:43:43 +01:00
many
d92ad5640a
remove option on bucket_candidates 2021-03-03 15:43:43 +01:00
many
64688b3786
fix query tree builder 2021-03-03 15:43:43 +01:00
many
fb7e6df790
add tests on typo criterion 2021-03-03 15:43:43 +01:00
Kerollmops
c5a32fd4fa
Fix the typo criterion 2021-03-03 15:43:42 +01:00
many
a273c46559
clean warnings 2021-03-03 15:43:42 +01:00
many
9e093d5ff3
add cache on alterate_query_tree function 2021-03-03 15:43:42 +01:00
many
41fc51ebcf
optimize alterate_query_tree when number_typos is zero 2021-03-03 15:43:42 +01:00
many
4da6e1ea9c
add cache in typo criterion 2021-03-03 15:43:42 +01:00
Kerollmops
67c71130df
Reduce the number of calls to alterate_query_tree 2021-03-03 15:43:42 +01:00
many
9ccaea2afc
simplify criterion context 2021-03-03 15:43:42 +01:00
Clément Renault
fea9ffc46a
Use the bucket candidates in the search module 2021-03-03 15:43:42 +01:00
Clément Renault
229130ed25
Correctly compute the bucket candidates for the Typo criterion 2021-03-03 15:43:42 +01:00
Clément Renault
5344abc008
Introduce the CriterionResult return type 2021-03-03 15:43:41 +01:00
many
86bcecf840
change variable's name from distance to proximity 2021-03-03 15:43:41 +01:00
many
4128bdc859
reduce match possibilities in docids fetchers 2021-03-03 15:43:41 +01:00
many
907482c8ac
clean docids fetchers 2021-03-03 15:43:41 +01:00
many
774a255f2e
use prefix cache in criteria 2021-03-03 15:43:41 +01:00
many
98e69e63d2
implement Context trait for criteria 2021-03-03 15:43:41 +01:00
Clément Renault
f091f370d0
Use the Typo criteria in the search module 2021-03-03 15:43:41 +01:00
Clément Renault
ad20d72a39
Introduce the Typo criterion 2021-03-03 15:43:41 +01:00
Clément Renault
f0ddea821c
Introduce the Typo criterion 2021-03-03 15:43:41 +01:00
many
73286dc8bf
Introduce the query tree data structure 2021-03-03 15:43:40 +01:00
Kerollmops
240b02e175
Remove unused Operation constructors 2021-03-03 13:40:19 +01:00
many
a463ae821e
Add methods optional_words and authorize_typos on the query tree 2021-03-03 13:40:19 +01:00
Kerollmops
6d135beb21
Introduce the maximum_proximity helper function 2021-03-03 13:40:18 +01:00
Kerollmops
6008f528d0
Introduce the maximum_typo helper function 2021-03-03 13:40:18 +01:00
Kerollmops
1dc857a4b2
Fix the query tree optional word generation with phrases 2021-03-03 13:40:18 +01:00
Kerollmops
4f19749252
Introduce the word_documents_count method on the Context trait 2021-03-03 13:40:18 +01:00
Kerollmops
79a143b32f
Introduce the query tree data structure 2021-03-03 13:40:18 +01:00
mpostma
e08b6b3ec7
add primary key to fields_id_map when not present 2021-03-01 16:10:16 +01:00
Clément Renault
c318373b88
Expose the WordsPrefixes update on the UpdateBuilder 2021-02-21 12:15:35 +01:00
Kerollmops
c2ffcc4bd1
Return a heed error from the word_documents_count method 2021-02-18 14:59:37 +01:00
Kerollmops
2f561c77f5
Introduce the word documents count method on the index 2021-02-18 14:35:14 +01:00
Kerollmops
8d710c5130
Introduce heed codecs to retrieve the length of roaring bitmaps 2021-02-18 14:30:47 +01:00
Kerollmops
fcfb39c5de
Move the RoaringBitmap related codecs into a module 2021-02-18 13:56:28 +01:00
Kerollmops
a4a48be923
Run the words prefixes update inside of the indexing documents update 2021-02-17 11:22:26 +01:00
Kerollmops
616ed8f73c
Clean up the word prefix pair proximities when deleting documents 2021-02-17 11:22:26 +01:00
Clément Renault
ea37fd821d
Clean up the words prefixes when deleting documents and words 2021-02-17 11:22:25 +01:00
Clément Renault
62eee9c69e
Introduce the sorter_into_lmdb_database helper function 2021-02-17 11:12:39 +01:00
Clément Renault
b5b89990eb
Compute and write the word prefix pair proximities database 2021-02-17 11:12:38 +01:00
Kerollmops
9b03b0a1b2
Introduce the word prefix pair proximity docids database 2021-02-17 11:12:38 +01:00
Clément Renault
f365de636f
Compute and write the word-prefix-docids database 2021-02-17 11:12:38 +01:00
Clément Renault
ee5a60e1c5
Clear the words prefixes when clearing an index 2021-02-17 10:45:17 +01:00
Clément Renault
b3a21d5a50
Introduce the getters and setters for the words prefixes FST 2021-02-17 10:45:17 +01:00
Clément Renault
89ce4e74fe
Do not change the primary key type when we serialize documents 2021-02-15 21:24:36 +01:00
Clément Renault
69acdd437e
Deserialize documents ids into JSON Values on deletion 2021-02-15 21:24:36 +01:00
Clément Renault
b3776598d8
Add a test to check deletion of documents with number as primary key 2021-02-15 21:24:35 +01:00
Clément Renault
fecf3d6fc1
Move the command lines helpers into different crates 2021-02-14 18:55:15 +01:00
Clément Renault
e8639517da
Change the project to become a workspace with milli as a default-member 2021-02-12 16:15:09 +01:00