Louis Dureuil
701d44bd91
Store the scores for each bucket
...
Remove optimization where ranking rules are not executed on buckets of a single document
when the score needs to be computed
2023-06-22 12:39:14 +02:00
Louis Dureuil
c621a250a7
Score for graph based ranking rules
...
Count phrases in matchingWords and maxMatchingWords
2023-06-22 12:39:14 +02:00
Louis Dureuil
8939e85f60
Add rank_to_score for graph based ranking rules
2023-06-22 12:39:14 +02:00
Louis Dureuil
fa41d2489e
Score for sort
2023-06-22 12:39:14 +02:00
Louis Dureuil
59c5b992c2
Score for geosort
2023-06-22 12:39:14 +02:00
Louis Dureuil
2ea8194c18
Score for exact_attributes
2023-06-22 12:39:14 +02:00
Louis Dureuil
421df64602
RankingRuleOutput now contains a Score
2023-06-22 12:39:14 +02:00
Louis Dureuil
f050634b1e
add virtual conditions to fid and position to always have the max cost
2023-06-20 10:07:18 +02:00
Louis Dureuil
becf1f066a
Change how the cost of removing words is computed
2023-06-20 09:45:43 +02:00
Louis Dureuil
701d299369
Remove out-of-date comment
2023-06-20 09:45:42 +02:00
Louis Dureuil
a20e4d447c
Position now takes into account the distance to the position of the word in the query
...
it used to be based on the distance to the position 0
2023-06-20 09:45:42 +02:00
Louis Dureuil
af57c3c577
Proximity costs 0 for documents that are perfectly matching
2023-06-20 09:45:42 +02:00
Louis Dureuil
0c40ef6911
Fix sort id
2023-06-20 09:45:42 +02:00
Loïc Lecrenier
2da86b31a6
Remove comments and add documentation
2023-06-14 12:39:42 +02:00
Louis Dureuil
a2a3b8c973
Fix offset difference between query and indexing for hard separators
2023-06-08 12:07:12 +02:00
Louis Dureuil
1dfc4038ab
Add test that fails before PR and passes now
2023-05-29 11:58:26 +02:00
Louis Dureuil
73198179f1
Consistently use wrapping add to avoid overflow in debug when query starts with a separator
2023-05-29 11:54:12 +02:00
meili-bors[bot]
2e49d6aec1
Merge #3768
...
3768: Fix bugs in graph-based ranking rules + make `words` a graph-based ranking rule r=dureuill a=loiclec
This PR contains three changes:
## 1. Don't call the `words` ranking rule if the term matching strategy is `All`
This is because the purpose of `words` is only to remove nodes from the query graph. It would never do any useful work when the matching strategy was `All`. Remember that the universe was already computed before by computing all the docids corresponding to the "maximally reduced" query graph, which, in the case of `All`, is equal to the original graph.
## 2. The `words` ranking rule is replaced by a graph-based ranking rule.
This is for three reasons:
1. **performance**: graph-based ranking rules benefit from a lot of optimisations by default, which ensures that they are never too slow. The previous implementation of `words` could call `compute_query_graph_docids` many times if some words had to be removed from the query, which would be quite expensive. I was especially worried about its performance in cases where it is placed right after the `sort` ranking rule. Furthermore, `compute_query_graph_docids` would clone a lot of bitmaps many times unnecessarily.
2. **consistency**: every other ranking rule (except `sort`) is graph-based. It makes sense to implement `words` like that as well. It will automatically benefit from all the features, optimisations, and bug fixes that all the other ranking rules get.
3. **surfacing bugs**: as the first ranking rule to be called (most of the time), I'd like `words` to behave the same as the other ranking rules so that we can quickly detect bugs in our graph algorithms. This actually already happened, which is why this PR also contains a bug fix.
## 3. Fix the `update_all_costs_before_nodes` function
It is a bit difficult to explain what was wrong, but I'll try. The bug happened when we had graphs like:
<img width="730" alt="Screenshot 2023-05-16 at 10 58 57" src="https://github.com/meilisearch/meilisearch/assets/6040237/40db1a68-d852-4e89-99d5-0d65757242a7">
and we gave the node `is` as argument.
Then, we'd walk backwards from the node breadth-first. We'd update the costs of:
1. `sun`
2. `thesun`
3. `start`
4. `the`
which is an incorrect order. The correct order is:
1. `sun`
2. `thesun`
3. `the`
4. `start`
That is, we can only update the cost of a node when all of its successors have either already been visited or were not affected by the update to the node passed as argument. To solve this bug, I factored out the graph-traversal logic into a `traverse_breadth_first_backward` function.
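The ordering constraint described above can be sketched as follows. This is only an illustration, not the actual `traverse_breadth_first_backward` implementation: the function name `visit_ancestors_in_dependency_order`, the string-keyed adjacency map, and the exact shape of the example graph (the screenshot is not reproduced here) are all assumptions. The sketch also assumes the graph is acyclic.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

type Graph = HashMap<&'static str, Vec<&'static str>>;

/// Hypothetical helper: walks backward from `from` and returns its ancestors
/// in an order where a node only appears once all of its affected successors
/// (those that can reach `from`, or `from` itself) have been handled.
fn visit_ancestors_in_dependency_order(successors: &Graph, from: &'static str) -> Vec<&'static str> {
    // Invert the edges so that we can walk the graph backward.
    let mut predecessors: Graph = HashMap::new();
    for (&node, succs) in successors {
        for &succ in succs {
            predecessors.entry(succ).or_default().push(node);
        }
    }
    // HashMap iteration order is unspecified, so sort for a deterministic result.
    for preds in predecessors.values_mut() {
        preds.sort();
    }

    // The affected set is `from` plus every node that can reach it.
    let mut affected: HashSet<&'static str> = HashSet::new();
    let mut stack = vec![from];
    while let Some(node) = stack.pop() {
        if affected.insert(node) {
            stack.extend(predecessors.get(node).into_iter().flatten().copied());
        }
    }

    // For each affected node, count the affected successors still to be handled.
    let mut pending: HashMap<&'static str, usize> = HashMap::new();
    for &node in &affected {
        let count = successors
            .get(node)
            .map_or(0, |succs| succs.iter().filter(|succ| affected.contains(**succ)).count());
        pending.insert(node, count);
    }

    // Breadth-first walk, but a predecessor is only enqueued when its count hits zero.
    let mut order = Vec::new();
    let mut queue = VecDeque::from([from]);
    while let Some(node) = queue.pop_front() {
        if node != from {
            order.push(node);
        }
        for &pred in predecessors.get(node).into_iter().flatten() {
            let count = pending.get_mut(pred).expect("predecessors of affected nodes are affected");
            *count -= 1;
            if *count == 0 {
                queue.push_back(pred);
            }
        }
    }
    order
}

/// Assumed shape of the example graph: `start -> the -> sun -> is` and `start -> thesun -> is`.
fn example_graph() -> Graph {
    HashMap::from([
        ("start", vec!["the", "thesun"]),
        ("the", vec!["sun"]),
        ("thesun", vec!["is"]),
        ("sun", vec!["is"]),
        ("is", vec![]),
    ])
}

fn main() {
    let order = visit_ancestors_in_dependency_order(&example_graph(), "is");
    println!("{order:?}"); // prints ["sun", "thesun", "the", "start"]
}
```

In this sketch, `start` is only visited after both `the` and `thesun`, because it keeps a count of two pending successors. A cost update applied in this order always reads successor costs that are already final.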
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-05-23 13:28:08 +00:00
Louis Dureuil
51043f78f0
Remove trailing whitespace
2023-05-23 15:27:25 +02:00
Louis Dureuil
a490a11325
Add explanatory comment on the way we're recomputing costs
2023-05-23 15:24:24 +02:00
Loïc Lecrenier
ec8f685d84
Fix bug in cheapest path algorithm
2023-05-16 17:01:30 +02:00
Loïc Lecrenier
5758268866
Don't compute split_words for phrases
2023-05-16 17:01:18 +02:00
Loïc Lecrenier
3e19702de6
Update snapshot tests
2023-05-16 12:22:46 +02:00
Loïc Lecrenier
f6524a6858
Adjust costs of edges in position ranking rule
...
To ensure good performance
2023-05-16 11:28:56 +02:00
meili-bors[bot]
65ad8cce36
Merge #3741
...
3741: Add ngram support to the highlighter r=ManyTheFish a=loiclec
This PR fixes a bug introduced by the search refactor, where ngrams were not highlighted.
The solution was to add the ngrams to the vector of `LocatedQueryTerm` that is given to the `MatchingWords` structure.
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2023-05-16 09:03:31 +00:00
Loïc Lecrenier
a37da36766
Implement words
as a graph-based ranking rule and fix some bugs
2023-05-16 10:42:11 +02:00
Loïc Lecrenier
85d96d35a8
Highlight ngram matches as well
2023-05-16 10:39:36 +02:00
Loïc Lecrenier
4d352a21ac
Compute split words derivations of terms that don't accept typos
2023-05-10 13:31:19 +02:00
Loïc Lecrenier
3625389057
Highlight ngram matches as well
2023-05-08 15:35:41 +02:00
meili-bors[bot]
eace6df91b
Merge #3726
...
3726: Fix prefix highlighting r=loiclec a=ManyTheFish
Prefix queries were not properly highlighted. This PR now highlights only the start of a word when it matches a prefix.
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2023-05-08 07:46:46 +00:00
Loïc Lecrenier
83ab8cf4e5
Remove dbg!(..) expression in highlighter tests
2023-05-08 09:45:23 +02:00
ManyTheFish
cd2573fcc3
Fix prefix highlighting
2023-05-04 16:53:50 +02:00
Jakub Jirutka
13f1277637
Allow to disable specialized tokenizations (again)
...
In PR #2773, I added the `chinese`, `hebrew`, `japanese` and `thai`
feature flags to allow meilisearch to be built without the huge specialized
tokenizations that took up 90% of the meilisearch binary size.
Unfortunately, due to some recent changes, this doesn't work anymore.
The problem lies in excessive use of the `default` feature flag, which
infects the dependency graph.
Instead of adding `default-features = false` here and there, it's easier
and more future-proof to not declare `default` in `milli` and
`meilisearch-types`. I've renamed it to `all-tokenizers`, which also
makes it a bit clearer what it's about.
2023-05-04 15:45:40 +02:00
Louis Dureuil
f8f190cd40
Update exactness tests following charabia camelCase tokenization
2023-05-03 14:45:09 +02:00
Louis Dureuil
1aaf24ccbf
Cargo fmt
2023-05-03 12:21:58 +02:00
Louis Dureuil
342c4ff85d
geosort: Remove rtree unwrap
2023-05-03 09:52:16 +02:00
Tamo
c85392ce40
make the descending geosort fast
2023-05-03 09:13:12 +02:00
Tamo
8875d24a48
deserialize the rtree only when it's needed, and keep it in memory once it has been deserialized
2023-05-03 09:13:12 +02:00
Tamo
c470b67fa2
revamp the test to use execute_iterative_and_rtree_returns_the_same
2023-05-03 09:13:12 +02:00
Louis Dureuil
b60840ebff
Remove self.iterating from words
2023-05-02 18:54:23 +02:00
Louis Dureuil
fdc1763838
Use MultiOps for resolve_query_graph
2023-05-02 18:54:09 +02:00
Louis Dureuil
75819bc940
Remove too many arguments on resolve_maximally_reduced_query_graph
2023-05-02 18:53:40 +02:00
Louis Dureuil
7b8cc25625
rename located_query_terms_from_string -> located_query_terms_from_tokens
2023-05-02 18:53:01 +02:00
Loïc Lecrenier
aa63091752
Fix bug in exact_attribute
2023-05-02 10:48:32 +02:00
Loïc Lecrenier
1b514517f5
Fix bug in computation of query term at a position
2023-05-02 10:48:32 +02:00
Loïc Lecrenier
11f814821d
Minor cleanup
2023-05-02 10:48:32 +02:00
Loïc Lecrenier
30fb1153cc
Speed up graph based ranking rule when a lot of different costs exist
2023-05-02 09:59:42 +02:00
Loïc Lecrenier
3b2c8b9f25
Improve performance of position rr
2023-05-02 09:59:42 +02:00
Loïc Lecrenier
2a7f9adf78
Build query graph more correctly from paths
...
Update snapshots
2023-05-02 09:59:42 +02:00
Loïc Lecrenier
608ceea440
Fix bug in position rr
2023-05-02 09:59:42 +02:00
Loïc Lecrenier
79001b9c97
Improve performance of the cheapest path finder algorithm
2023-05-02 09:59:42 +02:00
Loïc Lecrenier
59b12fca87
Fix errors, clippy warnings, and add review comments
2023-04-29 11:48:11 +02:00
Loïc Lecrenier
48f5bb1693
Implements the geo-sort ranking rule
2023-04-29 11:02:16 +02:00
Loïc Lecrenier
bc4efca611
Add more tests for the attribute ranking rule
2023-04-29 10:56:48 +02:00
Loïc Lecrenier
899baa0ea5
Update forgotten snapshot from previous commit
2023-04-27 13:43:04 +02:00
Loïc Lecrenier
374095d42c
Add tests for stop words and fix a couple of bugs
2023-04-27 13:30:09 +02:00
Louis Dureuil
b41a6cbd7a
Check sort criteria also in placeholder search
2023-04-26 16:28:17 +02:00
Louis Dureuil
c8af572697
Add tests for exact words and exact attributes
2023-04-26 16:13:01 +02:00
Loïc Lecrenier
b448aca49c
Add more tests for exactness rr
2023-04-26 11:04:18 +02:00
Loïc Lecrenier
55bad07c16
Fix bug in exact_attribute rr implementation
2023-04-26 10:40:05 +02:00
Loïc Lecrenier
3421125a55
Prevent the exactness
ranking rule from removing random words
...
Make it strictly follow the term matching strategy
2023-04-26 09:09:19 +02:00
Loïc Lecrenier
d3a94e8b25
Fix bugs and add tests to exactness ranking rule
2023-04-25 16:49:08 +02:00
Loïc Lecrenier
8f2e971879
Add tests for "exactness" rr, make correct universe computation
2023-04-24 16:57:34 +02:00
Loïc Lecrenier
d1fdbb63da
Make all search tests pass, fix distinctAttribute bug
2023-04-24 12:12:08 +02:00
Loïc Lecrenier
84d9c731f8
Fix bug in encoding of word_position_docids and word_fid_docids
2023-04-24 09:59:30 +02:00
Loïc Lecrenier
bd9aba4d77
Add "position" part of the attribute ranking rule
2023-04-13 10:46:09 +02:00
Loïc Lecrenier
8edad8291b
Add logger to attribute rr, fix a bug
2023-04-13 10:25:00 +02:00
Kerollmops
d9cebff61c
Add a simple test to check that attributes are ranking correctly
2023-04-13 08:27:09 +02:00
Loïc Lecrenier
30f7bd03f6
Fix compiler warning/errors caused by previous merge
2023-04-13 08:27:09 +02:00
Kerollmops
df0d9bb878
Introduce the attribute ranking rule in the list of ranking rules
2023-04-13 08:27:09 +02:00
Kerollmops
5230ddb3ea
Resolve the attribute ranking rule conditions
2023-04-13 08:27:09 +02:00
Kerollmops
d6a7c28e4d
Implement the attribute ranking rule edge computation
2023-04-13 08:27:09 +02:00
Kerollmops
e55efc419e
Introduce a new cache for the words fids
2023-04-13 08:27:09 +02:00
Loïc Lecrenier
644e136aee
Merge branch 'search-refactor-typo-attributes' into search-refactor
2023-04-13 08:26:56 +02:00
Louis Dureuil
38b7b31beb
Decide to use prefix DB if the word is not an ngram
2023-04-12 16:45:38 +02:00
Louis Dureuil
7a01f20df7
Use word_prefix_docids, make get_word_prefix_docids private
2023-04-12 16:45:38 +02:00
Louis Dureuil
c20c38a7fa
Add SearchContext::word_prefix_docids() method
2023-04-12 16:44:43 +02:00
Louis Dureuil
5ab46324c4
Everyone uses the SearchContext::word_docids instead of get_db_word_docids
...
make get_db_word_docids private
2023-04-12 16:44:43 +02:00
Louis Dureuil
325f17488a
Add SearchContext::word_docids() method
2023-04-12 16:37:05 +02:00
Louis Dureuil
e7ff987c46
Update call sites
2023-04-12 16:36:38 +02:00
Louis Dureuil
244003e36f
Refactor DB cache to return Roaring Bitmaps directly instead of byte slices
2023-04-12 16:35:48 +02:00
Loïc Lecrenier
1f813a6f3b
Simplify implementation of the detailed (=visual) logger
2023-04-12 16:32:53 +02:00
Loïc Lecrenier
96183e804a
Simplify the logger
2023-04-12 16:32:53 +02:00
Loïc Lecrenier
7ab48ed8c7
Matching words fixes
2023-04-12 16:21:43 +02:00
Loïc Lecrenier
e7bb8c940f
Merge branch 'search-refactor-highlighter' into search-refactor-highlighter-merged
2023-04-11 12:22:34 +02:00
Loïc Lecrenier
d0e9d65025
Fix distinct attribute bugs
2023-04-07 11:09:01 +02:00
Loïc Lecrenier
a81165f0d8
Merge remote-tracking branch 'origin/main' into search-refactor
2023-04-07 10:15:55 +02:00
Loïc Lecrenier
d6585eb10b
Avoid splitting ngrams into their original component words
2023-04-07 10:13:49 +02:00
Loïc Lecrenier
f7d90ad19f
Merge remote-tracking branch 'origin/search-refactor-tests-doc' into search-refactor
2023-04-07 10:13:18 +02:00
Louis Dureuil
31630c85d0
exactness graph rr: Add important TODO/FIXME after review
2023-04-06 17:50:39 +02:00
Louis Dureuil
ab09dc0167
exact_attributes: Add TODOs and additional check after review
2023-04-06 17:50:39 +02:00
Louis Dureuil
618c54915d
exact_attribute: dedup nodes after sorting them
2023-04-06 17:50:39 +02:00
Louis Dureuil
90a6c01495
Use correct codec in proximity
2023-04-06 17:50:39 +02:00
Louis Dureuil
e58426109a
Fix panics and issues in exactness graph ranking rule
2023-04-06 17:50:39 +02:00
Louis Dureuil
f513cf930a
Exact attribute with state
2023-04-06 17:50:39 +02:00
Louis Dureuil
8a13ed7e3f
Add exactness ranking rules
2023-04-06 17:50:39 +02:00
Louis Dureuil
1b8e4d0301
Add ExactTerm and helper method
2023-04-06 17:50:39 +02:00
Louis Dureuil
996619b22a
Increase position by 8 on hard separator when building query terms
2023-04-06 17:50:39 +02:00
Louis Dureuil
2c9822a337
Rename is_multiple_words
to is_ngram
and zero_typo
to exact
2023-04-06 17:50:39 +02:00
Louis Dureuil
7276deee0a
Add new db caches
2023-04-06 17:50:39 +02:00
ManyTheFish
f7e7f438f8
Patch prefix match
2023-04-06 17:22:31 +02:00
ManyTheFish
ba8dcc2d78
Fix clippy
2023-04-06 15:50:47 +02:00
Loïc Lecrenier
7ca91ebb71
Merge branch 'search-refactor-exactness' into search-refactor-tests-doc
2023-04-06 15:16:35 +02:00
ManyTheFish
47f6a3ad3d
Take into account that a logger needs the search context
2023-04-06 15:02:23 +02:00
ManyTheFish
ae17c62e24
Remove warnings
2023-04-06 14:07:18 +02:00
ManyTheFish
9c5f64769a
Integrate the new Highlighter in the search
2023-04-06 13:58:56 +02:00
ManyTheFish
ebe23b04c9
Make the matcher consume the search context
2023-04-06 12:28:28 +02:00
ManyTheFish
13b7c826c1
add new highlighter
2023-04-06 12:15:37 +02:00
Louis Dureuil
d1ddaa223d
Use correct codec in proximity
2023-04-05 18:14:00 +02:00
Louis Dureuil
f7ecea142e
Fix panics and issues in exactness graph ranking rule
2023-04-05 18:13:46 +02:00
Louis Dureuil
337e75b0e4
Exact attribute with state
2023-04-05 18:12:46 +02:00
Loïc Lecrenier
b5691802a3
Add new tests and fix construction of query graph from paths
2023-04-05 16:31:10 +02:00
Loïc Lecrenier
6e50f23896
Add more search tests
2023-04-05 13:33:23 +02:00
Loïc Lecrenier
4c8a0179ba
Add more search tests
2023-04-05 11:30:49 +02:00
Loïc Lecrenier
c69cbec64a
Add more search tests
2023-04-05 11:20:04 +02:00
Loïc Lecrenier
ce328c329d
Move bucket sort function to its own module and fix a bug
2023-04-04 18:03:08 +02:00
Loïc Lecrenier
959e4607bb
Add more search tests
2023-04-04 18:02:46 +02:00
Louis Dureuil
4b4ffb8ec9
Add exactness ranking rules
2023-04-04 17:12:07 +02:00
Louis Dureuil
3951fe22ab
Add ExactTerm and helper method
2023-04-04 17:09:32 +02:00
Louis Dureuil
4d5bc9df4c
Increase position by 8 on hard separator when building query terms
2023-04-04 17:07:26 +02:00
Louis Dureuil
ec2f8e8040
Rename is_multiple_words
to is_ngram
and zero_typo
to exact
2023-04-04 17:06:07 +02:00
Louis Dureuil
406b8bd248
Add new db caches
2023-04-04 17:04:46 +02:00
Loïc Lecrenier
62b9c6fbee
Add search tests
2023-04-04 16:18:22 +02:00
Loïc Lecrenier
b439d36807
Split query_term module into multiple submodules
2023-04-04 15:38:30 +02:00
Loïc Lecrenier
faceb661e3
Add note that a part of the code needs fixing
2023-04-04 15:02:01 +02:00
Loïc Lecrenier
4129d657e2
Simplify query_term module a bit
2023-04-04 15:01:42 +02:00
Loïc Lecrenier
3f13608002
Fix computation of ngram derivations
2023-04-03 15:27:49 +02:00
Loïc Lecrenier
4708d9b016
Fix compiler warnings/errors
2023-04-03 10:09:27 +02:00
Clément Renault
0d2e7bcc13
Implement the previous way for the exhaustive distinct candidates
2023-04-03 10:08:10 +02:00
Loïc Lecrenier
55fbfb6124
Merge branch 'search-refactor-located-query-terms' into search-refactor
2023-04-03 10:04:36 +02:00
Loïc Lecrenier
58fe260c72
Allow removing all the terms from a query if it contains a phrase
2023-04-03 09:18:02 +02:00
Loïc Lecrenier
24e5f6f7a9
Don't remove phrases with "last" term matching strategy
2023-04-03 09:17:33 +02:00
Louis Dureuil
9b87c36200
Limit the number of derivations for a single word.
2023-03-31 09:19:18 +02:00
Loïc Lecrenier
12b26cd54e
Don't remove phrases from the query with term matching strategy Last
2023-03-30 14:54:08 +02:00
Loïc Lecrenier
061b1e6d7c
Tiny refactor of query graph remove_nodes method
2023-03-30 14:49:25 +02:00
Loïc Lecrenier
0d6e8b5c31
Fix phrase search bug when the phrase has only one word
2023-03-30 14:48:12 +02:00
Loïc Lecrenier
d48cdc67a0
Fix term matching strategy bugs
2023-03-30 14:01:52 +02:00
Loïc Lecrenier
35c16ad047
Use new term matching strategy logic in words ranking rule
2023-03-30 13:15:43 +02:00
Loïc Lecrenier
2997d1f186
Use new term matching strategy logic in resolve_maximally_reduced_...
2023-03-30 13:12:51 +02:00
Loïc Lecrenier
2a5997fb20
Avoid expensive assert! in bucket sort function
2023-03-30 13:07:17 +02:00
Loïc Lecrenier
ee8a9e0bad
Remove outdated sentence in documentation
2023-03-30 12:22:24 +02:00
Loïc Lecrenier
3b0737a092
Fix detailed logger
2023-03-30 12:20:44 +02:00
Loïc Lecrenier
fdd02105ac
Graph-based ranking rule + term matching strategy support
2023-03-30 12:19:21 +02:00
Loïc Lecrenier
aa9592455c
Refactor the paths_of_cost algorithm
...
Support conditions that require certain nodes to be skipped
2023-03-30 12:11:11 +02:00
Loïc Lecrenier
01e24dd630
Rewrite proximity ranking rule
2023-03-30 11:59:06 +02:00
Loïc Lecrenier
ae6bb1ce17
Update the ConditionDocidsCache after change to RankingRuleGraphTrait
2023-03-30 11:41:20 +02:00
Loïc Lecrenier
5fd28620cd
Build ranking rule graph correctly after changes to trait definition
2023-03-30 11:32:55 +02:00
Loïc Lecrenier
728710d63a
Update typo ranking rule to use new query term structure
2023-03-30 11:32:19 +02:00
Loïc Lecrenier
fa81381865
Update the trait requirements of ranking-rule graphs
2023-03-30 11:19:45 +02:00
Loïc Lecrenier
b96a682f16
Update resolve_graph module to work with lazy query terms
2023-03-30 11:10:38 +02:00