Commit Graph

82 Commits

Author SHA1 Message Date
Alexey Shekhirin
33860bc3b7
test(update, settings): set & reset synonyms
fixes after review

more fixes after review
2021-04-18 11:24:17 +03:00
Alexey Shekhirin
e39aabbfe6
feat(search, update): synonyms 2021-04-18 11:24:17 +03:00
Marin Postma
9c4660d3d6
add tests 2021-04-15 16:25:56 +02:00
Marin Postma
75464a1baa
review fixes 2021-04-15 16:25:56 +02:00
Marin Postma
2f73fa55ae
add documentation 2021-04-15 16:25:55 +02:00
Marin Postma
45c45e11dd
implement distinct attribute
distinct can return error

facet distinct on numbers

return distinct error

review fixes

make get_facet_value more generic

fixes
2021-04-15 16:25:55 +02:00
tamo
dcb00b2e54
test a new implementation of the stop_words 2021-04-12 18:35:33 +02:00
tamo
da036dcc3e
Revert "Integrate the stop_words in the querytree"
This reverts commit 12fb509d84.
We revert this commit because it's causing the bug #150.
The initial algorithm we implemented for the stop_words was:

1. remove the stop_words from the dataset
2. keep the stop_words in the query to see if we can generate new words by
   integrating typos or if the word was a prefix
=> This was causing the bug since, in the case of “The hobbit”, we were
   **always** looking for something starting with “t he” or “th e”
   instead of ignoring the word completely.

For now we are going to fix the bug by completely ignoring the
stop_words in the query.
This could cause another problem where someone mistypes a normal word and
ends up typing a stop_word.

For example, imagine someone searching for the song “Won't he do it”.
If that person misplaces one space and writes “Won' the do it”, then we
will lose a part of the request.

One fix would be to update our query tree to something like this:

---------------------
OR
  OR
    TOLERANT hobbit # the first option is to ignore the stop_word
    AND
      CONSECUTIVE   # the second option is to do as we are doing
        EXACT t     # currently
        EXACT he
      TOLERANT hobbit
---------------------

This would drastically increase the size of our query tree on requests
with a lot of stop_words. For example, think of “The Lord Of The Rings”.

For now, however, we decided to ignore this problem and consider
that doing so doesn't reduce the relevancy of the search too much,
while it improves performance.
2021-04-12 18:35:33 +02:00
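A minimal sketch of the fix described in the revert above, in which stop words are dropped from the query before any query tree is built. This is not the project's actual implementation: `strip_stop_words` is a hypothetical helper, and splitting on whitespace and lowercasing each word are assumptions made for illustration only.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper (not part of the repository): returns the query
/// words that are NOT stop words, preserving their original order.
///
/// Assumptions: words are separated by whitespace, and the stop-word
/// set stores lowercase entries.
fn strip_stop_words<'a>(query: &'a str, stop_words: &BTreeSet<String>) -> Vec<&'a str> {
    query
        .split_whitespace()
        // Compare case-insensitively by lowercasing each query word.
        .filter(|word| !stop_words.contains(&word.to_lowercase()))
        .collect()
}
```

With a stop-word set containing "the" and "of", the query “The hobbit” reduces to `["hobbit"]`. The mistyped query “Won' the do it” reduces to `["Won'", "do", "it"]`, which is the lost part of the request that the commit message warns about.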
tamo
12fb509d84
Integrate the stop_words in the querytree
remove the stop_words from the querytree except if it was a prefix or a typo
2021-04-01 13:57:55 +02:00
tamo
a2f46029c7
implement a first version of the stop_words
The front must provide a BTreeSet containing the stop words
The stop_words are set to None if an empty Set is provided
add the stop-words in the http-ui interface

Use maplit in the test
and remove all the useless drop(rtxn) at the end of all tests
2021-04-01 13:57:55 +02:00
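The rule stated in the commit above (an empty set means the stop words are set to None) can be sketched as follows. The function name `normalize_stop_words` and its signature are hypothetical and chosen for illustration; the repository's settings API may look different.

```rust
use std::collections::BTreeSet;

/// Hypothetical normalisation step (not the repository's API):
/// an empty set of stop words is stored as `None`, anything else
/// is kept as `Some(set)`.
fn normalize_stop_words(stop_words: BTreeSet<String>) -> Option<BTreeSet<String>> {
    if stop_words.is_empty() {
        None
    } else {
        Some(stop_words)
    }
}
```

Mapping the empty set to `None` lets the rest of the code distinguish "no stop words configured" from "some stop words configured" with a single `Option` check.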
Alexey Shekhirin
1e3f05db8f
use fixed number of candidates as a threshold 2021-03-30 11:57:10 +03:00
Alexey Shekhirin
a776ec9718
fix division 2021-03-29 19:16:58 +03:00
Alexey Shekhirin
522e79f2e0
feat(search, criteria): introduce a percentage threshold to the asc/desc 2021-03-29 19:08:31 +03:00
mpostma
9c27183876
fix broken offset 2021-03-15 20:23:50 +01:00
Kerollmops
d48008339e
Introduce two new optional_words and authorize_typos Search options 2021-03-10 11:16:30 +01:00
Kerollmops
54b97ed8e1
Update the fetcher comments 2021-03-10 10:56:26 +01:00
Kerollmops
d301859bbd
Introduce a special word_derivations function for Proximity 2021-03-10 10:42:53 +01:00
Kerollmops
facfb4b615
Fix the bucket candidates 2021-03-10 10:42:53 +01:00
Kerollmops
42fd7dea78
Remove the useless typo cache 2021-03-10 10:42:53 +01:00
many
62a70c300d
Optimize words criterion 2021-03-10 10:42:53 +01:00
Kerollmops
d781a6164a
Rewrite some code with idiomatic Rust 2021-03-08 16:27:52 +01:00
Clément Renault
b18ec00a7a
Add a logging_timer macro to the criterion next methods 2021-03-08 16:12:06 +01:00
Kerollmops
82a0f678fb
Introduce a cache on the docid_word_positions database method 2021-03-08 16:12:03 +01:00
Clément Renault
5fcaedb880
Introduce a WordDerivationsCache struct 2021-03-08 16:00:53 +01:00
many
2606c92ef9
use plane sweep in proximity criterion 2021-03-08 15:58:39 +01:00
many
ae47bb3594
Introduce plane_sweep function in proximity criterion 2021-03-08 15:58:38 +01:00
Clément Renault
3c76b3548d
Rework the Asc/Desc criteria to be facet iterator based 2021-03-08 13:32:25 +01:00
Clément Renault
a58d2b6137
Print the Asc/Desc criterion field name in the debug prints 2021-03-08 13:32:25 +01:00
Kerollmops
9b6b35d9b7
Clean up some comments 2021-03-03 18:19:10 +01:00
Kerollmops
2cc4a467a6
Change the criterion output that cannot fail 2021-03-03 18:18:33 +01:00
Kerollmops
1fc25148da
Remove useless where clauses for the criteria 2021-03-03 18:09:19 +01:00
Kerollmops
5c5e51095c
Fix the Asc/Desc criteria to always return the QueryTree when available 2021-03-03 15:45:03 +01:00
many
cdaa96df63
optimize proximity criterion 2021-03-03 15:45:03 +01:00
Kerollmops
f118d7e067
build criteria from settings 2021-03-03 15:45:03 +01:00
Kerollmops
025835c5b2
Fix the criteria to avoid always returning a placeholder 2021-03-03 15:45:03 +01:00
Kerollmops
36c1f93ceb
Do a union of the bucket candidates 2021-03-03 15:45:03 +01:00
many
b0e0c5eba0
remove option of bucket_candidates 2021-03-03 15:45:03 +01:00
Kerollmops
daf126a638
Introduce the final Fetcher criterion 2021-03-03 15:45:03 +01:00
many
7ac09d7b7c
remove option of bucket_candidates 2021-03-03 15:45:03 +01:00
Kerollmops
5af63c74e0
Speed-up the MatchingWords highlighting struct 2021-03-03 15:45:03 +01:00
Kerollmops
4510bbccca
Add a lot of debug 2021-03-03 15:43:44 +01:00
Kerollmops
ae4a237e58
Fix the maximum_proximity function 2021-03-03 15:43:44 +01:00
Kerollmops
9bc9b36645
Introduce the Proximity criterion 2021-03-03 15:43:44 +01:00
Kerollmops
22b84fe543
Use the words criterion in the search module 2021-03-03 15:43:44 +01:00
many
3d731cc861
remove option on bucket_candidates 2021-03-03 15:43:44 +01:00
Clément Renault
14f9f85c4b
Introduce the AscDesc criterion 2021-03-03 15:43:44 +01:00
many
b5b7ec0162
implement initial state for words criterion 2021-03-03 15:43:44 +01:00
Kerollmops
3415812b06
Improve the intersection speed in the words criterion 2021-03-03 15:43:43 +01:00
Clément Renault
ef381e17bb
Compute the candidates for each sub query tree 2021-03-03 15:43:43 +01:00
Kerollmops
e174ccbd8e
Use the words criterion in the search module 2021-03-03 15:43:43 +01:00