MeiliSearch/milli/src/search/new/ranking_rule_graph
meili-bors[bot] 2e49d6aec1
Merge #3768
3768: Fix bugs in graph-based ranking rules + make `words` a graph-based ranking rule r=dureuill a=loiclec

This PR contains three changes:

## 1. Don't call the `words` ranking rule if the term matching strategy is `All`

This is because the purpose of `words` is only to remove nodes from the query graph. It would never do any useful work when the matching strategy was `All`. Remember that the universe was already computed before by computing all the docids corresponding to the "maximally reduced" query graph, which, in the case of `All`, is equal to the original graph.

## 2. The `words` ranking rule is replaced by a graph-based ranking rule. 

This is for three reasons:

1. **performance**: graph-based ranking rules benefit from a lot of optimisations by default, which ensures that they are never too slow. The previous implementation of `words` could call `compute_query_graph_docids` many times if some words had to be removed from the query, which would be quite expensive. I was especially worried about its performance in cases where it is placed right after the `sort` ranking rule. Furthermore, `compute_query_graph_docids` would clone a lot of bitmaps many times unnecessarily.

2. **consistency**: every other ranking rule (except `sort`) is graph-based. It makes sense to implement `words` like that as well. It will automatically benefit from all the features, optimisations, and bug fixes that all the other ranking rules get.

3. **surfacing bugs**: as the first ranking rule to be called (most of the time), I'd like `words` to behave the same as the other ranking rules so that we can quickly detect bugs in our graph algorithms. This actually already happened, which is why this PR also contains a bug fix.

## 3. Fix the `update_all_costs_before_nodes` function

It is a bit difficult to explain what was wrong, but I'll try. The bug happened when we had graphs like:
<img width="730" alt="Screenshot 2023-05-16 at 10 58 57" src="https://github.com/meilisearch/meilisearch/assets/6040237/40db1a68-d852-4e89-99d5-0d65757242a7">
and we gave the node `is` as argument.

Then, we'd walk backwards from the node breadth-first. We'd update the costs of:
1. `sun`
2. `thesun`
3. `start`
4. `the`

which is an incorrect order. The correct order is:

1. `sun`
2. `thesun`
3. `the`
4. `start`

That is, we can only update the cost of a node when all of its successors have either already been visited or were not affected by the update to the node passed as argument. To solve this bug, I factored out the graph-traversal logic into a `traverse_breadth_first_backward` function.


Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-05-23 13:28:08 +00:00
..
exactness Prevent the exactness ranking rule from removing random words 2023-04-26 09:09:19 +02:00
fid Add "position" part of the attribute ranking rule 2023-04-13 10:46:09 +02:00
position Adjust costs of edges in position ranking rule 2023-05-16 11:28:56 +02:00
proximity Make all search tests pass, fix distinctAttribute bug 2023-04-24 12:12:08 +02:00
typo Don't compute split_words for phrases 2023-05-16 17:01:18 +02:00
words Implement words as a graph-based ranking rule and fix some bugs 2023-05-16 10:42:11 +02:00
build.rs Cargo fmt 2023-05-03 12:21:58 +02:00
cheapest_paths.rs Remove trailing whitespace 2023-05-23 15:27:25 +02:00
condition_docids_cache.rs Update the ConditionDocidsCache after change to RankingRuleGraphTrait 2023-03-30 11:41:20 +02:00
dead_ends_cache.rs Improve performance of the cheapest path finder algorithm 2023-05-02 09:59:42 +02:00
mod.rs Implement words as a graph-based ranking rule and fix some bugs 2023-05-16 10:42:11 +02:00