mirror of
https://github.com/meilisearch/MeiliSearch
synced 2024-11-26 06:44:27 +01:00
Add comments suggesting performance improvements
This commit is contained in:
parent
862714a18b
commit
00bad8c716
@ -185,6 +185,43 @@ impl<'ctx, G: RankingRuleGraphTrait> RankingRule<'ctx, QueryGraph> for GraphBase
|
|||||||
if universe.is_empty() {
|
if universe.is_empty() {
|
||||||
return Ok(ControlFlow::Break(()));
|
return Ok(ControlFlow::Break(()));
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/* TODO: there are a couple ways to improve the speed of path computation.
|
||||||
|
|
||||||
|
1. Since the `visit_paths_of_cost` method uses a depth-first-search, we know that
|
||||||
|
consecutive calls to this closure have a high chance of giving paths sharing
|
||||||
|
some prefix. It would be good to reuse `subpath_docids` and `visited_conditions`
|
||||||
|
to find out what this common prefix is, to avoid recomputing it. In a way, doing
|
||||||
|
this serves as the dual of the DeadEndsCache: it takes advantage of our knowledge that
|
||||||
|
some paths *aren't* deadends. There is however a subtlety in that the universe might
|
||||||
|
have changed between the two consecutive calls. This is why we should subtract the docids
|
||||||
|
of the previous path (if successful) to the `subpath_docids`, at the same time as we do
|
||||||
|
it for the universe.
|
||||||
|
|
||||||
|
2. We perform way too many intersections with the universe. For the first visited path,
|
||||||
|
the operation we do is essentially:
|
||||||
|
universe & (c1 & universe) & (c2 & universe) & (c3 & universe) & etc.
|
||||||
|
This is a good idea *only if the universe is very small*. But if the universe is (almost)
|
||||||
|
a superset of each condition, then these intersections serve no purpose and slow down the search.
|
||||||
|
Maybe in the future we have a `deserialize_within_universe` method, which would speed up
|
||||||
|
these intersections. But for now, we have to be careful.
|
||||||
|
|
||||||
|
3. We could know in advance how many paths of a certain cost exist, and only update the
|
||||||
|
DeadEndsCache if (m)any remaining paths exist. There is a subtlety here because
|
||||||
|
on the next call of `next_bucket`, we will want an updated and correct DeadEndsCache.
|
||||||
|
We need to think about that. We could also avoid putting forbidden edges in this cache
|
||||||
|
if we know, somehow, that we won't visit this edge again.
|
||||||
|
|
||||||
|
4. Finally, but that will be a long term difficult project. We should compute the path *lazily*.
|
||||||
|
That is, when we do `path_docids &= condition`. We shouldn't *actually* perform the intersection,
|
||||||
|
but simply register that operation. It's only when we ask if the path_docids is empty that
|
||||||
|
**the minimum amount of work to determine whether the path is empty** is carried out. In practice,
|
||||||
|
that means performing a MultiOps on each container, in order or not, until any resulting container
|
||||||
|
is found to be non-empty. (In fact, when we ask `is_empty`, we should probably find the container
|
||||||
|
that has the highest chance of being non-empty and compute that one first).
|
||||||
|
|
||||||
|
*/
|
||||||
|
|
||||||
// Accumulate the path for logging purposes only
|
// Accumulate the path for logging purposes only
|
||||||
considered_paths.push(path.to_vec());
|
considered_paths.push(path.to_vec());
|
||||||
|
|
||||||
|
@ -53,7 +53,7 @@ impl QueryTermDocIdsCache {
|
|||||||
return Ok(&self.terms[&term_interned]);
|
return Ok(&self.terms[&term_interned]);
|
||||||
};
|
};
|
||||||
let mut docids = RoaringBitmap::new();
|
let mut docids = RoaringBitmap::new();
|
||||||
|
// TODO: use a MultiOps?
|
||||||
let term = term_interner.get(term_interned);
|
let term = term_interner.get(term_interned);
|
||||||
for word in term.all_single_words_except_prefix_db() {
|
for word in term.all_single_words_except_prefix_db() {
|
||||||
if let Some(word_docids) = db_cache.get_word_docids(index, txn, word_interner, word)? {
|
if let Some(word_docids) = db_cache.get_word_docids(index, txn, word_interner, word)? {
|
||||||
|
Loading…
Reference in New Issue
Block a user