MeiliSearch/milli/src/search/new
meili-bors[bot] 2e49d6aec1
Merge #3768
3768: Fix bugs in graph-based ranking rules + make `words` a graph-based ranking rule r=dureuill a=loiclec

This PR contains three changes:

## 1. Don't call the `words` ranking rule if the term matching strategy is `All`

This is because the only purpose of `words` is to remove nodes from the query graph, so it would never do any useful work when the matching strategy is `All`. Remember that the universe has already been computed from the docids of the "maximally reduced" query graph which, in the case of `All`, is the original graph itself.
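A minimal sketch of the rule in point 1, assuming the configured ranking rules are plain strings and the strategy is a two-variant enum. `TermsMatchingStrategy` and `effective_ranking_rules` below are hypothetical stand-ins named after the strategies in this PR, not milli's actual types or functions.

```rust
/// Hypothetical stand-in for the terms matching strategy discussed above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum TermsMatchingStrategy {
    /// Words may be removed from the end of the query until documents match.
    Last,
    /// Every query word must match, so no word is ever removed.
    All,
}

/// Hypothetical helper: returns the ranking rules that will actually run.
/// With `All`, the maximally reduced query graph is the original graph,
/// so `words` has no node to remove and is skipped.
fn effective_ranking_rules<'a>(
    configured: &[&'a str],
    strategy: TermsMatchingStrategy,
) -> Vec<&'a str> {
    configured
        .iter()
        .copied()
        .filter(|&rule| !(rule == "words" && strategy == TermsMatchingStrategy::All))
        .collect()
}
```

With `Last`, the configured list is returned unchanged; with `All`, only `words` is dropped and the relative order of the remaining rules is preserved.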

## 2. Replace the `words` ranking rule with a graph-based ranking rule

This is for three reasons:

1. **performance**: graph-based ranking rules benefit from many optimisations by default, which keeps them from ever becoming too slow. The previous implementation of `words` could call `compute_query_graph_docids` many times if some words had to be removed from the query, which was quite expensive. I was especially worried about its performance when it is placed right after the `sort` ranking rule. Furthermore, `compute_query_graph_docids` needlessly cloned many bitmaps, many times over.

2. **consistency**: every other ranking rule (except `sort`) is graph-based, so it makes sense to implement `words` the same way. It will then automatically benefit from all the features, optimisations, and bug fixes that the other ranking rules get.

3. **surfacing bugs**: as the first ranking rule to be called (most of the time), I'd like `words` to behave the same as the other ranking rules so that we can quickly detect bugs in our graph algorithms. This actually already happened, which is why this PR also contains a bug fix.
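To make the cost argument in point 1 concrete, here is a simplified, non-graph model of the buckets `words` produces with the `Last` strategy. It is an illustration under stated assumptions, not milli's implementation: docids are `BTreeSet<u32>` instead of roaring bitmaps, each query word is assumed to map to a known docid set, and `words_buckets` is a hypothetical helper. Each reduced query's docids are derived once, incrementally, instead of being recomputed from scratch for every removed word.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified model of `words` with the `Last` strategy:
/// bucket k holds the documents matching the query once its last k words
/// are removed, minus the documents already returned in earlier buckets.
fn words_buckets(word_docids: &[BTreeSet<u32>]) -> Vec<BTreeSet<u32>> {
    // prefixes[i] = documents containing words 0..=i, each computed once
    // from the previous prefix rather than from scratch.
    let mut prefixes: Vec<BTreeSet<u32>> = Vec::with_capacity(word_docids.len());
    for docids in word_docids {
        let next: BTreeSet<u32> = match prefixes.last() {
            Some(prev) => prev.intersection(docids).copied().collect(),
            None => docids.clone(),
        };
        prefixes.push(next);
    }

    // Walk from the full query down to the maximally reduced one (first word only).
    let mut returned: BTreeSet<u32> = BTreeSet::new();
    let mut buckets = Vec::with_capacity(prefixes.len());
    for prefix in prefixes.iter().rev() {
        let bucket: BTreeSet<u32> = prefix.difference(&returned).copied().collect();
        returned.extend(bucket.iter().copied());
        buckets.push(bucket);
    }
    buckets
}
```

With word docids `{1, 2, 3, 4}`, `{1, 2, 3}` and `{1}`, the buckets are `{1}`, `{2, 3}` and `{4}`: documents matching more of the leading words come first.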

## 3. Fix the `update_all_costs_before_nodes` function

It is a bit difficult to explain what was wrong, but I'll try. The bug happened when we had graphs like:
<img width="730" alt="Screenshot 2023-05-16 at 10 58 57" src="https://github.com/meilisearch/meilisearch/assets/6040237/40db1a68-d852-4e89-99d5-0d65757242a7">
and the node `is` was given as the argument.

We would then walk backward from that node, breadth-first, and update the costs of:
1. `sun`
2. `thesun`
3. `start`
4. `the`

which is an incorrect order. The correct order is:

1. `sun`
2. `thesun`
3. `the`
4. `start`

That is, we can only update the cost of a node once each of its successors has either already been visited or was not affected by the update to the node passed as the argument. In the incorrect order, the cost of `start` is updated while the cost of its successor `the` is still stale. To fix this bug, I factored out the graph-traversal logic into a `traverse_breadth_first_backward` function.
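The ordering constraint can be sketched as a backward traversal that visits a node only once all of its affected successors (those that can reach the starting node) have been visited. This is a hypothetical re-implementation on a plain adjacency-list DAG, not milli's `traverse_breadth_first_backward`; the `Dag` type, its methods, and the edge list assumed for the screenshot's graph (`start -> the -> sun -> is`, `start -> thesun -> is`, `is -> end`) are all assumptions.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical DAG with nodes numbered 0..n, standing in for the query graph.
struct Dag {
    successors: Vec<Vec<usize>>,
    predecessors: Vec<Vec<usize>>,
}

impl Dag {
    fn new(n: usize, edges: &[(usize, usize)]) -> Self {
        let mut successors = vec![Vec::new(); n];
        let mut predecessors = vec![Vec::new(); n];
        for &(from, to) in edges {
            successors[from].push(to);
            predecessors[to].push(from);
        }
        Dag { successors, predecessors }
    }

    /// Every node that can reach `node`, plus `node` itself: the nodes whose
    /// cost may change when the cost of `node` changes.
    fn affected_by(&self, node: usize) -> HashSet<usize> {
        let mut affected = HashSet::from([node]);
        let mut stack = vec![node];
        while let Some(current) = stack.pop() {
            for &pred in &self.predecessors[current] {
                if affected.insert(pred) {
                    stack.push(pred);
                }
            }
        }
        affected
    }

    /// Walks backward breadth-first from `origin`, calling `visit` on a node
    /// only once each of its successors is either visited or unaffected.
    fn traverse_backward(&self, origin: usize, mut visit: impl FnMut(usize)) {
        let affected = self.affected_by(origin);
        // `origin` was already updated by the caller, so it counts as visited.
        let mut visited = HashSet::from([origin]);
        let mut enqueued = HashSet::from([origin]);
        let mut queue = VecDeque::from([origin]);
        while let Some(node) = queue.pop_front() {
            if node != origin {
                visit(node);
                visited.insert(node);
            }
            for &pred in &self.predecessors[node] {
                let ready = self.successors[pred]
                    .iter()
                    .all(|s| visited.contains(s) || !affected.contains(s));
                // Enqueue each node at most once, and only when it is ready.
                if ready && enqueued.insert(pred) {
                    queue.push_back(pred);
                }
            }
        }
    }
}

/// Builds the assumed example graph and returns the visiting order from `is`.
fn example_order() -> Vec<&'static str> {
    let names = ["start", "the", "sun", "thesun", "is", "end"];
    let edges = [(0, 1), (0, 3), (1, 2), (2, 4), (3, 4), (4, 5)];
    let dag = Dag::new(names.len(), &edges);
    let mut order = Vec::new();
    dag.traverse_backward(4, |node| order.push(names[node]));
    order
}
```

`example_order()` yields `sun`, `thesun`, `the`, `start`, which is the correct order listed above. When `thesun` is visited, `start` is not yet ready because its successor `the` is affected but unvisited. `end` is never visited because it cannot reach `is`.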


Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-05-23 13:28:08 +00:00
| Name | Last commit | Date |
|---|---|---|
| logger | Don't compute split_words for phrases | 2023-05-16 17:01:18 +02:00 |
| matches | Remove dbg!(..) expression in highlighter tests | 2023-05-08 09:45:23 +02:00 |
| query_term | Don't compute split_words for phrases | 2023-05-16 17:01:18 +02:00 |
| ranking_rule_graph | Merge #3768 | 2023-05-23 13:28:08 +00:00 |
| tests | Update snapshot tests | 2023-05-16 12:22:46 +02:00 |
| bucket_sort.rs | Improve performance of the cheapest path finder algorithm | 2023-05-02 09:59:42 +02:00 |
| db_cache.rs | Fix bug in encoding of word_position_docids and word_fid_docids | 2023-04-24 09:59:30 +02:00 |
| distinct.rs | Fix distinct attribute bugs | 2023-04-07 11:09:01 +02:00 |
| exact_attribute.rs | Fix bug in exact_attribute | 2023-05-02 10:48:32 +02:00 |
| geo_sort.rs | geosort: Remove rtree unwrap | 2023-05-03 09:52:16 +02:00 |
| graph_based_ranking_rule.rs | Implement words as a graph-based ranking rule and fix some bugs | 2023-05-16 10:42:11 +02:00 |
| interner.rs | Add new tests and fix construction of query graph from paths | 2023-04-05 16:31:10 +02:00 |
| limits.rs | Limit the number of derivations for a single word. | 2023-03-31 09:19:18 +02:00 |
| mod.rs | Implement words as a graph-based ranking rule and fix some bugs | 2023-05-16 10:42:11 +02:00 |
| query_graph.rs | Highlight ngram matches as well | 2023-05-16 10:39:36 +02:00 |
| ranking_rules.rs | Move bucket sort function to its own module and fix a bug | 2023-04-04 18:03:08 +02:00 |
| resolve_query_graph.rs | Use MultiOps for resolve_query_graph | 2023-05-02 18:54:09 +02:00 |
| small_bitmap.rs | SmallBitmap: Consistently panic on incoherent universe lengths | 2023-03-29 08:45:38 +02:00 |
| sort.rs | Initialize query time ranking rule for query search | 2023-03-28 12:40:52 +02:00 |