Alexey Shekhirin
2658c5c545
feat(index): update fields distribution in clear & delete operations
fixes after review
bump the version of the tokenizer
implement a first version of the stop_words
The front end must provide a BTreeSet containing the stop words
The stop_words are set to None if an empty set is provided
add the stop-words in the http-ui interface
Use maplit in the test
and remove all the useless drop(rtxn) calls at the end of the tests
Integrate the stop_words into the query tree
remove the stop_words from the query tree unless the word is a prefix or a typo
more fixes after review
2021-04-01 19:12:35 +03:00
Alexey Shekhirin
27c7ab6e00
feat(index): store fields distribution in index
2021-04-01 18:35:19 +03:00
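The fields distribution commits (27c7ab6e00 and 2658c5c545) maintain a per-field count stored in the index, which must also be updated when documents are deleted or the index is cleared. A minimal sketch of that bookkeeping follows. It assumes the distribution counts, for each field name, the documents that contain it. The Document and FieldsDistribution types and the add_documents, delete_documents and clear functions are hypothetical illustrations, not milli's API.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical document for this sketch: field name -> raw value.
type Document = HashMap<String, String>;

/// Field name -> number of documents that contain the field.
type FieldsDistribution = BTreeMap<String, u64>;

/// Count one occurrence per document for every field it contains.
fn add_documents(distribution: &mut FieldsDistribution, docs: &[Document]) {
    for doc in docs {
        for field in doc.keys() {
            *distribution.entry(field.clone()).or_insert(0) += 1;
        }
    }
}

/// Decrement counts on deletion and drop fields no document contains anymore.
fn delete_documents(distribution: &mut FieldsDistribution, docs: &[Document]) {
    for doc in docs {
        for field in doc.keys() {
            let now_empty = match distribution.get_mut(field) {
                Some(count) => {
                    *count = count.saturating_sub(1);
                    *count == 0
                }
                None => false,
            };
            if now_empty {
                distribution.remove(field);
            }
        }
    }
}

/// Clearing the index leaves no documents, so the distribution becomes empty.
fn clear(distribution: &mut FieldsDistribution) {
    distribution.clear();
}
```

Using an ordered map keeps the output stable when the distribution is displayed.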
tamo
12fb509d84
Integrate the stop_words into the query tree
remove the stop_words from the query tree unless the word is a prefix or a typo
2021-04-01 13:57:55 +02:00
tamo
a2f46029c7
implement a first version of the stop_words
The front end must provide a BTreeSet containing the stop words
The stop_words are set to None if an empty set is provided
add the stop-words in the http-ui interface
Use maplit in the test
and remove all the useless drop(rtxn) calls at the end of the tests
2021-04-01 13:57:55 +02:00
tamo
62a8f1d707
bump the version of the tokenizer
2021-04-01 13:49:22 +02:00
Alexey Shekhirin
9205b640a4
feat(index): introduce fields_ids_distribution
2021-03-31 18:44:47 +03:00
Alexey Shekhirin
2cb32edaa9
fix(criterion): compile asc/desc regex only once
use once_cell instead of lazy_static
reorder imports
2021-03-30 16:07:14 +03:00
Alexey Shekhirin
1e3f05db8f
use a fixed number of candidates as a threshold
2021-03-30 11:57:10 +03:00
Alexey Shekhirin
a776ec9718
fix division
2021-03-29 19:16:58 +03:00
Alexey Shekhirin
522e79f2e0
feat(search, criteria): introduce a percentage threshold to the asc/desc
2021-03-29 19:08:31 +03:00
tamo
73dcdb27f6
select a specific release of the tokenizer instead of using the latest git commit
2021-03-25 15:00:18 +01:00
mpostma
9c27183876
fix broken offset
2021-03-15 20:23:50 +01:00
mpostma
f0210453a6
add updated at on put primary key
2021-03-15 14:05:48 +01:00
mpostma
615fe095e1
update index updated at on index writes
2021-03-15 14:05:47 +01:00
mpostma
80d0f9c49d
methods to update index time metadata
2021-03-15 14:05:47 +01:00
Kerollmops
d48008339e
Introduce two new optional_words and authorize_typos Search options
2021-03-10 11:16:30 +01:00
Kerollmops
54b97ed8e1
Update the fetcher comments
2021-03-10 10:56:26 +01:00
Kerollmops
d301859bbd
Introduce a special word_derivations function for Proximity
2021-03-10 10:42:53 +01:00
Kerollmops
facfb4b615
Fix the bucket candidates
2021-03-10 10:42:53 +01:00
Kerollmops
42fd7dea78
Remove the useless typo cache
2021-03-10 10:42:53 +01:00
many
62a70c300d
Optimize words criterion
2021-03-10 10:42:53 +01:00
Kerollmops
f51eb46c69
Use the RoaringBitmapLenCodec to retrieve the count of documents
2021-03-09 10:25:39 +01:00
Kerollmops
d781a6164a
Rewrite some code with idiomatic Rust
2021-03-08 16:27:52 +01:00
Clément Renault
b18ec00a7a
Add a logging_timer macro to the criterion next methods
2021-03-08 16:12:06 +01:00
Kerollmops
82a0f678fb
Introduce a cache on the docid_word_positions database method
2021-03-08 16:12:03 +01:00
Clément Renault
5fcaedb880
Introduce a WordDerivationsCache struct
2021-03-08 16:00:53 +01:00
many
2606c92ef9
use plane sweep in proximity criterion
2021-03-08 15:58:39 +01:00
many
ae47bb3594
Introduce plane_sweep function in proximity criterion
2021-03-08 15:58:38 +01:00
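A plane sweep over sorted word positions can find the tightest window that contains every query word, which is the core idea behind scoring proximity. The sketch below is the classic smallest-covering-window sweep using a min-heap. It is not milli's plane_sweep: it ignores word order, phrase handling and any proximity cap, and the smallest_window name is hypothetical.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Returns the smallest (start, end) window that contains at least one position
/// from every list, or None if there are no lists or any list is empty.
/// Each inner slice must be sorted in ascending order.
fn smallest_window(positions: &[&[u32]]) -> Option<(u32, u32)> {
    if positions.is_empty() || positions.iter().any(|p| p.is_empty()) {
        return None;
    }
    // Min-heap of (position, list index, index within that list).
    let mut heap = BinaryHeap::new();
    let mut current_max: u32 = 0;
    for (list, p) in positions.iter().enumerate() {
        heap.push(Reverse((p[0], list, 0usize)));
        current_max = current_max.max(p[0]);
    }
    let mut best: Option<(u32, u32)> = None;
    // Sweep left to right: the current window is [heap minimum, current_max].
    while let Some(Reverse((min, list, idx))) = heap.pop() {
        let better = match best {
            Some((start, end)) => current_max - min < end - start,
            None => true,
        };
        if better {
            best = Some((min, current_max));
        }
        // Advance the list that owns the minimum. Once it is exhausted, every
        // later window would lack a position from it, so the sweep stops.
        match positions[list].get(idx + 1) {
            Some(&next) => {
                current_max = current_max.max(next);
                heap.push(Reverse((next, list, idx + 1)));
            }
            None => break,
        }
    }
    best
}
```

For word positions [0, 10], [5, 11] and [12], the sweep narrows the window from (0, 12) to (5, 12) and finally to (10, 12).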
Kerollmops
636a9df177
Temporarily fix the tinytemplate doc hidden issue
2021-03-08 15:57:45 +01:00
Clément Renault
3c76b3548d
Rework the Asc/Desc criteria to be facet iterator based
2021-03-08 13:32:25 +01:00
Clément Renault
a58d2b6137
Print the Asc/Desc criterion field name in the debug prints
2021-03-08 13:32:25 +01:00
mpostma
e3095be85c
Remove Debug use in Display impl
2021-03-08 12:09:09 +01:00
mpostma
9e1eb25232
implement display for criterion
Update milli/src/criterion.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-03-08 11:00:30 +01:00
Clément Renault
e5bb96bc3b
Fix the searchable settings test
2021-03-06 12:48:41 +01:00
Kerollmops
9b6b35d9b7
Clean up some comments
2021-03-03 18:19:10 +01:00
Kerollmops
2cc4a467a6
Change the criterion output that cannot fail
2021-03-03 18:18:33 +01:00
Kerollmops
1fc25148da
Remove useless where clauses for the criteria
2021-03-03 18:09:19 +01:00
Kerollmops
07784c8990
Tune the words prefixes threshold to compute for 1/1000 instead
2021-03-03 15:51:28 +01:00
Kerollmops
f376c6a728
Make sure we retrieve the docid word positions
2021-03-03 15:45:03 +01:00
Kerollmops
5c5e51095c
Fix the Asc/Desc criteria to always return the QueryTree when available
2021-03-03 15:45:03 +01:00
many
cdaa96df63
optimize proximity criterion
2021-03-03 15:45:03 +01:00
many
246286f0eb
take hard separator into account
2021-03-03 15:45:03 +01:00
Kerollmops
6bf6b40495
Remove unused files
2021-03-03 15:45:03 +01:00
Kerollmops
f118d7e067
build criteria from settings
2021-03-03 15:45:03 +01:00
Kerollmops
025835c5b2
Fix the criteria to avoid always returning a placeholder
2021-03-03 15:45:03 +01:00
Kerollmops
36c1f93ceb
Do a union of the bucket candidates
2021-03-03 15:45:03 +01:00
many
b0e0c5eba0
remove option of bucket_candidates
2021-03-03 15:45:03 +01:00
Kerollmops
daf126a638
Introduce the final Fetcher criterion
2021-03-03 15:45:03 +01:00
many
7ac09d7b7c
remove option of bucket_candidates
2021-03-03 15:45:03 +01:00
Kerollmops
5af63c74e0
Speed-up the MatchingWords highlighting struct
2021-03-03 15:45:03 +01:00
Kerollmops
4510bbccca
Add a lot of debug
2021-03-03 15:43:44 +01:00
Kerollmops
ae4a237e58
Fix the maximum_proximity function
2021-03-03 15:43:44 +01:00
Kerollmops
9bc9b36645
Introduce the Proximity criterion
2021-03-03 15:43:44 +01:00
Kerollmops
22b84fe543
Use the words criterion in the search module
2021-03-03 15:43:44 +01:00
many
3d731cc861
remove option on bucket_candidates
2021-03-03 15:43:44 +01:00
Clément Renault
14f9f85c4b
Introduce the AscDesc criterion
2021-03-03 15:43:44 +01:00
many
b5b7ec0162
implement initial state for words criterion
2021-03-03 15:43:44 +01:00
Kerollmops
3415812b06
Improve the intersection speed in the words criterion
2021-03-03 15:43:43 +01:00
Clément Renault
ef381e17bb
Compute the candidates for each sub query tree
2021-03-03 15:43:43 +01:00
Kerollmops
e174ccbd8e
Use the words criterion in the search module
2021-03-03 15:43:43 +01:00
Clément Renault
1e47f9b3ff
Introduce the Words criterion
2021-03-03 15:43:43 +01:00
many
2d068bd45b
implement Context trait for criteria
2021-03-03 15:43:43 +01:00
many
d92ad5640a
remove option on bucket_candidates
2021-03-03 15:43:43 +01:00
many
64688b3786
fix query tree builder
2021-03-03 15:43:43 +01:00
many
fb7e6df790
add tests on typo criterion
2021-03-03 15:43:43 +01:00
Kerollmops
c5a32fd4fa
Fix the typo criterion
2021-03-03 15:43:42 +01:00
many
a273c46559
clean warnings
2021-03-03 15:43:42 +01:00
many
9e093d5ff3
add cache on alterate_query_tree function
2021-03-03 15:43:42 +01:00
many
41fc51ebcf
optimize alterate_query_tree when number_typos is zero
2021-03-03 15:43:42 +01:00
many
4da6e1ea9c
add cache in typo criterion
2021-03-03 15:43:42 +01:00
Kerollmops
67c71130df
Reduce the number of calls to alterate_query_tree
2021-03-03 15:43:42 +01:00
many
9ccaea2afc
simplify criterion context
2021-03-03 15:43:42 +01:00
Clément Renault
fea9ffc46a
Use the bucket candidates in the search module
2021-03-03 15:43:42 +01:00
Clément Renault
229130ed25
Correctly compute the bucket candidates for the Typo criterion
2021-03-03 15:43:42 +01:00
Clément Renault
5344abc008
Introduce the CriterionResult return type
2021-03-03 15:43:41 +01:00
many
86bcecf840
change variable's name from distance to proximity
2021-03-03 15:43:41 +01:00
many
4128bdc859
reduce match possibilities in docids fetchers
2021-03-03 15:43:41 +01:00
many
907482c8ac
clean docids fetchers
2021-03-03 15:43:41 +01:00
many
774a255f2e
use prefix cache in criteria
2021-03-03 15:43:41 +01:00
many
98e69e63d2
implement Context trait for criteria
2021-03-03 15:43:41 +01:00
Clément Renault
f091f370d0
Use the Typo criteria in the search module
2021-03-03 15:43:41 +01:00
Clément Renault
ad20d72a39
Introduce the Typo criterion
2021-03-03 15:43:41 +01:00
Clément Renault
f0ddea821c
Introduce the Typo criterion
2021-03-03 15:43:41 +01:00
many
73286dc8bf
Introduce the query tree data structure
2021-03-03 15:43:40 +01:00
Kerollmops
240b02e175
Remove unused Operation constructors
2021-03-03 13:40:19 +01:00
many
a463ae821e
Add methods optional_words and authorize_typos on the query tree
2021-03-03 13:40:19 +01:00
Kerollmops
6d135beb21
Introduce the maximum_proximity helper function
2021-03-03 13:40:18 +01:00
Kerollmops
6008f528d0
Introduce the maximum_typo helper function
2021-03-03 13:40:18 +01:00
Kerollmops
1dc857a4b2
Fix the query tree optional word generation with phrases
2021-03-03 13:40:18 +01:00
Kerollmops
4f19749252
Introduce the word_documents_count method on the Context trait
2021-03-03 13:40:18 +01:00
Kerollmops
79a143b32f
Introduce the query tree data structure
2021-03-03 13:40:18 +01:00
mpostma
e08b6b3ec7
add primary key to fields_id_map when not present
2021-03-01 16:10:16 +01:00
Clément Renault
c318373b88
Expose the WordsPrefixes update on the UpdateBuilder
2021-02-21 12:15:35 +01:00
Kerollmops
519b1cb5c9
Update dependencies
2021-02-21 10:26:04 +01:00
Kerollmops
c2ffcc4bd1
Return a heed error from the word_documents_count method
2021-02-18 14:59:37 +01:00
Kerollmops
2f561c77f5
Introduce the word documents count method on the index
2021-02-18 14:35:14 +01:00
Kerollmops
8d710c5130
Introduce heed codecs to retrieve the length of roaring bitmaps
2021-02-18 14:30:47 +01:00
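Retrieving only the length of a roaring bitmap can avoid decoding its containers, because the serialized form starts with a descriptive header that stores each container's cardinality. The sketch below reads that header. It assumes the standard Roaring portable serialization format (cookies 12346 and 12347) and is not milli's RoaringBitmapLenCodec. The serialized_roaring_len, read_u16 and read_u32 names are hypothetical.

```rust
/// Cookie used when the bitmap has no run containers (followed by a u32 container count).
const SERIAL_COOKIE_NO_RUNCONTAINER: u32 = 12346;
/// Cookie used when run containers may be present (container count - 1 in the high 16 bits).
const SERIAL_COOKIE: u16 = 12347;

fn read_u16(bytes: &[u8], at: usize) -> Option<u16> {
    bytes.get(at..at + 2).map(|b| u16::from_le_bytes([b[0], b[1]]))
}

fn read_u32(bytes: &[u8], at: usize) -> Option<u32> {
    bytes.get(at..at + 4).map(|b| u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Returns the number of integers in a serialized roaring bitmap by reading
/// only its descriptive header, or None if the bytes are not a valid header.
fn serialized_roaring_len(bytes: &[u8]) -> Option<u64> {
    let cookie = read_u32(bytes, 0)?;
    let (containers, header_start) = if cookie == SERIAL_COOKIE_NO_RUNCONTAINER {
        (read_u32(bytes, 4)? as usize, 8)
    } else if (cookie & 0xFFFF) as u16 == SERIAL_COOKIE {
        let n = (cookie >> 16) as usize + 1;
        // Skip the run-container bitset: one bit per container, rounded up to bytes.
        (n, 4 + (n + 7) / 8)
    } else {
        return None;
    };
    let mut len: u64 = 0;
    for i in 0..containers {
        // Each header entry is (key: u16, cardinality - 1: u16).
        let card_minus_one = read_u16(bytes, header_start + i * 4 + 2)?;
        len += u64::from(card_minus_one) + 1;
    }
    Some(len)
}
```

Only 4 bytes per container are read, regardless of how many integers each container holds.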
Kerollmops
fcfb39c5de
Move the RoaringBitmap related codecs into a module
2021-02-18 13:56:28 +01:00
Kerollmops
a4a48be923
Run the words prefixes update inside of the indexing documents update
2021-02-17 11:22:26 +01:00
Kerollmops
616ed8f73c
Clean up the word prefix pair proximities when deleting documents
2021-02-17 11:22:26 +01:00
Clément Renault
ea37fd821d
Clean up the words prefixes when deleting documents and words
2021-02-17 11:22:25 +01:00
Clément Renault
62eee9c69e
Introduce the sorter_into_lmdb_database helper function
2021-02-17 11:12:39 +01:00
Clément Renault
b5b89990eb
Compute and write the word prefix pair proximities database
2021-02-17 11:12:38 +01:00
Kerollmops
9b03b0a1b2
Introduce the word prefix pair proximity docids database
2021-02-17 11:12:38 +01:00
Clément Renault
f365de636f
Compute and write the word-prefix-docids database
2021-02-17 11:12:38 +01:00
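A word-prefix-docids database maps each selected prefix to the union of the document ids of every word that starts with it, so prefix queries need not enumerate words at search time. A minimal in-memory sketch, assuming a sorted word-to-docids map and an already chosen list of prefixes. BTreeSet stands in for RoaringBitmap here, and word_prefix_docids is a hypothetical name, not milli's API.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Stand-in for a RoaringBitmap of document ids.
type DocIds = BTreeSet<u32>;

/// For each prefix, unions the docids of every indexed word that starts with it.
fn word_prefix_docids(
    word_docids: &BTreeMap<String, DocIds>,
    prefixes: &[&str],
) -> BTreeMap<String, DocIds> {
    let mut result = BTreeMap::new();
    for &prefix in prefixes {
        let mut docids = DocIds::new();
        // Words are sorted, so all words sharing the prefix form a contiguous
        // range that starts at the prefix itself.
        for (word, ids) in word_docids.range(prefix.to_string()..) {
            if !word.starts_with(prefix) {
                break;
            }
            docids.extend(ids.iter().copied());
        }
        result.insert(prefix.to_string(), docids);
    }
    result
}
```

Iterating a sorted range stops at the first non-matching word instead of scanning the whole word list.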
Clément Renault
ee5a60e1c5
Clear the words prefixes when clearing an index
2021-02-17 10:45:17 +01:00
Clément Renault
b3a21d5a50
Introduce the getters and setters for the words prefixes FST
2021-02-17 10:45:17 +01:00
Clément Renault
89ce4e74fe
Do not change the primary key type when we serialize documents
2021-02-15 21:24:36 +01:00
Clément Renault
69acdd437e
Deserialize documents ids into JSON Values on deletion
2021-02-15 21:24:36 +01:00
Clément Renault
b3776598d8
Add a test to check deletion of documents with number as primary key
2021-02-15 21:24:35 +01:00
Clément Renault
fecf3d6fc1
Move the command lines helpers into different crates
2021-02-14 18:55:15 +01:00
Clément Renault
d8f3421608
Update the dependencies and remove the unused ones
2021-02-14 18:32:46 +01:00
Clément Renault
e8639517da
Change the project to become a workspace with milli as a default-member
2021-02-12 16:15:09 +01:00