Marin Postma
75464a1baa
review fixes
2021-04-15 16:25:56 +02:00
Marin Postma
2f73fa55ae
add documentation
2021-04-15 16:25:55 +02:00
Marin Postma
45c45e11dd
implement distinct attribute
...
distinct can return error
facet distinct on numbers
return distinct error
review fixes
make get_facet_value more generic
fixes
2021-04-15 16:25:55 +02:00
tamo
dcb00b2e54
test a new implementation of the stop_words
2021-04-12 18:35:33 +02:00
tamo
da036dcc3e
Revert "Integrate the stop_words in the querytree"
...
This reverts commit 12fb509d84.
We revert this commit because it is causing bug #150.
The initial algorithm we implemented for the stop_words was:
1. remove the stop_words from the dataset
2. keep the stop_words in the query to see if we can generate new words by
integrating typos or if the word was a prefix
=> This was causing the bug since, in the case of “The hobbit”, we were
**always** looking for something starting with “t he” or “th e”
instead of ignoring the word completely.
For now we are going to fix the bug by completely ignoring the
stop_words in the query.
This could cause another problem where someone mistypes a normal word and
ends up typing a stop_word.
For example, imagine someone searching for the song “Won't he do it”.
If that person misplaces one space and writes “Won' the do it”, then we
will lose a part of the request.
One fix would be to update our query tree to something like this:
---------------------
OR
  OR
    TOLERANT hobbit # the first option is to ignore the stop_word
    AND
      CONSECUTIVE   # the second option is to do as we are doing
        EXACT t     # currently
        EXACT he
      TOLERANT hobbit
---------------------
This would drastically increase the size of our query tree on requests
with a lot of stop_words. For example, think of “The Lord Of The Rings”.
For now, however, we decided to ignore this problem and consider
that doing so does not reduce the relevancy of the search too much,
while it improves performance.
2021-04-12 18:35:33 +02:00
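As an illustration of the behaviour described in the revert message above, here is a minimal sketch of "completely ignoring the stop_words in the query". It is not milli's actual code: `strip_stop_words` is a hypothetical helper, and splitting on whitespace with lowercase matching is an assumption made only to keep the example short.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper (not part of milli): keep only the query words
/// that are not stop words. Matching is done on the lowercased word.
fn strip_stop_words<'a>(query: &'a str, stop_words: &BTreeSet<String>) -> Vec<&'a str> {
    query
        .split_whitespace()
        .filter(|word| !stop_words.contains(&word.to_lowercase()))
        .collect()
}

fn main() {
    // Assumed stop word list, for the example only.
    let stop_words: BTreeSet<String> =
        ["the", "of"].iter().map(|s| s.to_string()).collect();

    // The stop word is ignored completely instead of being tried as a prefix or typo.
    println!("{:?}", strip_stop_words("The hobbit", &stop_words));

    // The trade-off from the commit message: a misplaced space turns
    // "Won't he" into "Won' the", and the accidental "the" is dropped.
    println!("{:?}", strip_stop_words("Won' the do it", &stop_words));
}
```

The second call shows the accepted cost: the mistyped query keeps only `Won'`, `do` and `it`, so part of the request is lost.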
tamo
12fb509d84
Integrate the stop_words in the querytree
...
remove the stop_words from the querytree except if it was a prefix or a typo
2021-04-01 13:57:55 +02:00
tamo
a2f46029c7
implement a first version of the stop_words
...
The front must provide a BTreeSet containing the stop words
The stop_words are set at None if an empty Set is provided
add the stop-words in the http-ui interface
Use maplit in the test
and remove all the useless drop(rtxn) at the end of all tests
2021-04-01 13:57:55 +02:00
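The rule stated in the commit above (a `BTreeSet` of stop words is provided, and an empty set is stored as `None`) can be sketched as follows. `normalize_stop_words` is a hypothetical name chosen for illustration; the real settings code in milli may be shaped differently.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper (not milli's API): an empty set means
/// "no stop words configured", so it is stored as `None`.
fn normalize_stop_words(words: BTreeSet<String>) -> Option<BTreeSet<String>> {
    if words.is_empty() {
        None
    } else {
        Some(words)
    }
}

fn main() {
    // An empty set is normalized to None.
    let empty: BTreeSet<String> = BTreeSet::new();
    println!("{:?}", normalize_stop_words(empty));

    // A non-empty set is kept as-is.
    let some: BTreeSet<String> = ["the", "a"].iter().map(|s| s.to_string()).collect();
    println!("{:?}", normalize_stop_words(some));
}
```

Storing `None` rather than an empty set lets later code skip stop word handling with a single `Option` check.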
Alexey Shekhirin
1e3f05db8f
use fixed number of candidates as a threshold
2021-03-30 11:57:10 +03:00
Alexey Shekhirin
a776ec9718
fix division
2021-03-29 19:16:58 +03:00
Alexey Shekhirin
522e79f2e0
feat(search, criteria): introduce a percentage threshold to the asc/desc
2021-03-29 19:08:31 +03:00
mpostma
9c27183876
fix broken offset
2021-03-15 20:23:50 +01:00
Kerollmops
d48008339e
Introduce two new optional_words and authorize_typos Search options
2021-03-10 11:16:30 +01:00
Kerollmops
54b97ed8e1
Update the fetcher comments
2021-03-10 10:56:26 +01:00
Kerollmops
d301859bbd
Introduce a special word_derivations function for Proximity
2021-03-10 10:42:53 +01:00
Kerollmops
facfb4b615
Fix the bucket candidates
2021-03-10 10:42:53 +01:00
Kerollmops
42fd7dea78
Remove the useless typo cache
2021-03-10 10:42:53 +01:00
many
62a70c300d
Optimize words criterion
2021-03-10 10:42:53 +01:00
Kerollmops
d781a6164a
Rewrite some code with idiomatic Rust
2021-03-08 16:27:52 +01:00
Clément Renault
b18ec00a7a
Add a logging_timer macro to the criterion next methods
2021-03-08 16:12:06 +01:00
Kerollmops
82a0f678fb
Introduce a cache on the docid_word_positions database method
2021-03-08 16:12:03 +01:00
Clément Renault
5fcaedb880
Introduce a WordDerivationsCache struct
2021-03-08 16:00:53 +01:00
many
2606c92ef9
use plane sweep in proximity criterion
2021-03-08 15:58:39 +01:00
many
ae47bb3594
Introduce plane_sweep function in proximity criterion
2021-03-08 15:58:38 +01:00
Clément Renault
3c76b3548d
Rework the Asc/Desc criteria to be facet iterator based
2021-03-08 13:32:25 +01:00
Clément Renault
a58d2b6137
Print the Asc/Desc criterion field name in the debug prints
2021-03-08 13:32:25 +01:00
Kerollmops
9b6b35d9b7
Clean up some comments
2021-03-03 18:19:10 +01:00
Kerollmops
2cc4a467a6
Change the criterion output that cannot fail
2021-03-03 18:18:33 +01:00
Kerollmops
1fc25148da
Remove useless where clauses for the criteria
2021-03-03 18:09:19 +01:00
Kerollmops
5c5e51095c
Fix the Asc/Desc criteria to always return the QueryTree when available
2021-03-03 15:45:03 +01:00
many
cdaa96df63
optimize proximity criterion
2021-03-03 15:45:03 +01:00
Kerollmops
f118d7e067
build criteria from settings
2021-03-03 15:45:03 +01:00
Kerollmops
025835c5b2
Fix the criteria to avoid always returning a placeholder
2021-03-03 15:45:03 +01:00
Kerollmops
36c1f93ceb
Do a union of the bucket candidates
2021-03-03 15:45:03 +01:00
many
b0e0c5eba0
remove option of bucket_candidates
2021-03-03 15:45:03 +01:00
Kerollmops
daf126a638
Introduce the final Fetcher criterion
2021-03-03 15:45:03 +01:00
many
7ac09d7b7c
remove option of bucket_candidates
2021-03-03 15:45:03 +01:00
Kerollmops
5af63c74e0
Speed-up the MatchingWords highlighting struct
2021-03-03 15:45:03 +01:00
Kerollmops
4510bbccca
Add a lot of debug
2021-03-03 15:43:44 +01:00
Kerollmops
ae4a237e58
Fix the maximum_proximity function
2021-03-03 15:43:44 +01:00
Kerollmops
9bc9b36645
Introduce the Proximity criterion
2021-03-03 15:43:44 +01:00
Kerollmops
22b84fe543
Use the words criterion in the search module
2021-03-03 15:43:44 +01:00
many
3d731cc861
remove option on bucket_candidates
2021-03-03 15:43:44 +01:00
Clément Renault
14f9f85c4b
Introduce the AscDesc criterion
2021-03-03 15:43:44 +01:00
many
b5b7ec0162
implement initial state for words criterion
2021-03-03 15:43:44 +01:00
Kerollmops
3415812b06
Improve the intersection speed in the words criterion
2021-03-03 15:43:43 +01:00
Clément Renault
ef381e17bb
Compute the candidates for each sub query tree
2021-03-03 15:43:43 +01:00
Kerollmops
e174ccbd8e
Use the words criterion in the search module
2021-03-03 15:43:43 +01:00
Clément Renault
1e47f9b3ff
Introduce the Words criterion
2021-03-03 15:43:43 +01:00
many
2d068bd45b
implement Context trait for criteria
2021-03-03 15:43:43 +01:00
many
d92ad5640a
remove option on bucket_candidates
2021-03-03 15:43:43 +01:00
many
64688b3786
fix query tree builder
2021-03-03 15:43:43 +01:00
many
fb7e6df790
add tests on typo criterion
2021-03-03 15:43:43 +01:00
Kerollmops
c5a32fd4fa
Fix the typo criterion
2021-03-03 15:43:42 +01:00
many
a273c46559
clean warnings
2021-03-03 15:43:42 +01:00
many
9e093d5ff3
add cache on alterate_query_tree function
2021-03-03 15:43:42 +01:00
many
41fc51ebcf
optimize alterate_query_tree when number_typos is zero
2021-03-03 15:43:42 +01:00
many
4da6e1ea9c
add cache in typo criterion
2021-03-03 15:43:42 +01:00
Kerollmops
67c71130df
Reduce the number of calls to alterate_query_tree
2021-03-03 15:43:42 +01:00
many
9ccaea2afc
simplify criterion context
2021-03-03 15:43:42 +01:00
Clément Renault
fea9ffc46a
Use the bucket candidates in the search module
2021-03-03 15:43:42 +01:00
Clément Renault
229130ed25
Correctly compute the bucket candidates for the Typo criterion
2021-03-03 15:43:42 +01:00
Clément Renault
5344abc008
Introduce the CriterionResult return type
2021-03-03 15:43:41 +01:00
many
86bcecf840
change variable's name from distance to proximity
2021-03-03 15:43:41 +01:00
many
4128bdc859
reduce match possibilities in docids fetchers
2021-03-03 15:43:41 +01:00
many
907482c8ac
clean docids fetchers
2021-03-03 15:43:41 +01:00
many
774a255f2e
use prefix cache in criteria
2021-03-03 15:43:41 +01:00
many
98e69e63d2
implement Context trait for criteria
2021-03-03 15:43:41 +01:00
Clément Renault
f091f370d0
Use the Typo criteria in the search module
2021-03-03 15:43:41 +01:00
Clément Renault
ad20d72a39
Introduce the Typo criterion
2021-03-03 15:43:41 +01:00
Clément Renault
f0ddea821c
Introduce the Typo criterion
2021-03-03 15:43:41 +01:00
many
73286dc8bf
Introduce the query tree data structure
2021-03-03 15:43:40 +01:00
Kerollmops
240b02e175
Remove unused Operation constructors
2021-03-03 13:40:19 +01:00
many
a463ae821e
Add methods optional_words and authorize_typos on the query tree
2021-03-03 13:40:19 +01:00
Kerollmops
6d135beb21
Introduce the maximum_proximity helper function
2021-03-03 13:40:18 +01:00
Kerollmops
6008f528d0
Introduce the maximum_typo helper function
2021-03-03 13:40:18 +01:00
Kerollmops
1dc857a4b2
Fix the query tree optional word generation with phrases
2021-03-03 13:40:18 +01:00
Kerollmops
4f19749252
Introduce the word_documents_count method on the Context trait
2021-03-03 13:40:18 +01:00
Kerollmops
79a143b32f
Introduce the query tree data structure
2021-03-03 13:40:18 +01:00
Clément Renault
e8639517da
Change the project to become a workspace with milli as a default-member
2021-02-12 16:15:09 +01:00