Work in progress: Bad Typo detection

I have an issue where "speakers" is split into "speaker" and "s". When I compute the distances for the Typo criterion, it takes "s" into account and puts a distance of zero in bucket 0 (the "speakers" bucket), so any document matching "s" without typos is reported among the best results. I need to make sure "s" is ignored when its associated part "speaker" does not exist in the document or is not in the place it should be ("speaker" immediately followed by "s"). It is painful to think that this will add much computation time to the Typo criterion, as in the previous algorithm, where I computed the real query/word indexes and removed the invalid ones before sending the documents to the bucket sort.
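
A minimal sketch of the check described above, not the code in this commit. The `WordMatch` type, its fields, and both function names are hypothetical and much simpler than the real structures in meilisearch-core. It also assumes a word is split into exactly two parts and that both halves are validated, so a lone "speaker" is dropped as well as a lone "s". The scan is quadratic in the number of matches, so it only illustrates the rule and does nothing about the computation-time concern.

```rust
/// One match of a query word inside a document (hypothetical, simplified).
#[derive(Debug, Clone, PartialEq)]
pub struct WordMatch {
    /// Index of the original query word ("speakers" would be 0).
    pub query_index: usize,
    /// `None` for the unsplit word, `Some(0)` for the first half ("speaker"),
    /// `Some(1)` for the second half ("s").
    pub split_part: Option<usize>,
    /// Position of the matched word inside the document.
    pub word_index: usize,
    /// Typo distance of this match.
    pub distance: u8,
}

/// Returns true if `matches` contains the given split part of the same
/// query word at exactly `word_index`.
fn has_sibling(matches: &[WordMatch], m: &WordMatch, part: usize, word_index: usize) -> bool {
    matches.iter().any(|other| {
        other.query_index == m.query_index
            && other.split_part == Some(part)
            && other.word_index == word_index
    })
}

/// Keeps unsplit matches, and keeps a split half only when the other half
/// sits right next to it in the document ("speaker" directly followed by "s").
pub fn remove_orphan_split_parts(matches: &[WordMatch]) -> Vec<WordMatch> {
    matches
        .iter()
        .filter(|m| match m.split_part {
            None => true,
            // First half: the second half must be the next word.
            Some(0) => has_sibling(matches, m, 1, m.word_index + 1),
            // Second half: the first half must be the previous word.
            Some(1) => m
                .word_index
                .checked_sub(1)
                .map_or(false, |prev| has_sibling(matches, m, 0, prev)),
            // This sketch only handles two-part splits.
            Some(_) => false,
        })
        .cloned()
        .collect()
}
```

With this filter applied before the bucket sort, a document that contains only a stray "s" would no longer contribute a zero distance to the "speakers" bucket.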
2025-07-03 20:07:09 +02:00 · 2019-12-06 19:15:19 +01:00
commit 0f698d6bd9
parent 4e91b31b1f
4 changed files with 111 additions and 29 deletions
--- a/meilisearch-core/src/automaton/dfa.rs
+++ b/meilisearch-core/src/automaton/dfa.rs
@@ -46,3 +46,8 @@ pub fn build_prefix_dfa(query: &str) -> DFA {
 pub fn build_dfa(query: &str) -> DFA {
    build_dfa_with_setting(query, PrefixSetting::NoPrefix)
 }
+
+pub fn build_exact_dfa(query: &str) -> DFA {
+    let builder = LEVDIST0.get_or_init(|| LevBuilder::new(0, true));
+    builder.build_dfa(query)
+}
--- a/meilisearch-core/src/automaton/mod.rs
+++ b/meilisearch-core/src/automaton/mod.rs
@@ -13,7 +13,7 @@ use crate::database::MainT;
 use crate::error::MResult;
 use crate::store;

-pub use self::dfa::{build_dfa, build_prefix_dfa};
+pub use self::dfa::{build_dfa, build_prefix_dfa, build_exact_dfa};
 pub use self::query_enhancer::QueryEnhancer;
 pub use self::query_enhancer::QueryEnhancerBuilder;