Commit Graph

12 Commits

Author SHA1 Message Date
Clément Renault
d93e35cace
Introduce ContextMut and Context structs 2019-12-13 14:38:24 +01:00
Clément Renault
86ee0cbd6e
Introduce bucket_sort_with_distinct function 2019-12-13 14:38:24 +01:00
Clément Renault
248ccfc0d8
Update the criteria to the new ones 2019-12-13 14:38:24 +01:00
Clément Renault
ea148575cf
Remove the raw_query functions 2019-12-13 14:38:23 +01:00
Clément Renault
efc2be0b7b
Bump the sdset dependency to 0.3.6 2019-12-13 14:38:23 +01:00
Clément Renault
8d71112dcb
Rewrite the phrase query postings lists
This simplified the multiword_rewrite_matches function a little bit.
2019-12-13 14:38:23 +01:00
Clément Renault
dd03a6256a
Debug pre filtered number of documents 2019-12-13 14:38:23 +01:00
Clément Renault
9c03bb3428
First probably working phrase query doc filtering 2019-12-13 14:38:23 +01:00
Clément Renault
22b19c0d93
Fix the processed distance algorithm 2019-12-13 14:38:22 +01:00
Clément Renault
0f698d6bd9
Work in progress: Bad Typo detection
I have an issue where "speakers" is split into "speaker" and "s".
When I compute the distances for the Typo criterion,
it takes "s" into account and puts a distance of zero in bucket 0
(the "speakers" bucket), so it reports any document matching "s"
without typos as a best result.

I need to make sure to ignore "s" when its associated part "speaker"
does not exist in the document or is not in the place
it should be ("speaker" followed by "s").

It is hard to think that this will add so much computation time to
the Typo criterion, as in the previous algorithm, where I computed
the real query/word indexes and removed the invalid ones
before sending the documents to the bucket sort.
2019-12-13 14:38:22 +01:00
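The rule described in the commit message above (ignore "s" unless "speaker" sits directly before it) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from this commit: the `Match` struct, its fields, and the `retain_valid_split_matches` function are hypothetical names invented for this example.

```rust
/// One match of a query word inside a document.
/// Hypothetical shape, assumed for this sketch only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Match {
    /// Index of the query word (or split part) that matched.
    query_index: usize,
    /// Position of the matched word inside the document.
    word_index: usize,
    /// Number of typos for this match.
    distance: u8,
}

/// Keeps a match of the split pair `(left, right)` only when both parts are
/// adjacent in the document, e.g. "speaker" immediately followed by "s".
/// Matches of any other query word are left untouched.
fn retain_valid_split_matches(matches: &[Match], left: usize, right: usize) -> Vec<Match> {
    // True when `query_index` matched at exactly `word_index`.
    let has = |query_index: usize, word_index: usize| {
        matches
            .iter()
            .any(|m| m.query_index == query_index && m.word_index == word_index)
    };

    matches
        .iter()
        .copied()
        .filter(|m| {
            if m.query_index == left {
                // The left part needs the right part just after it.
                has(right, m.word_index + 1)
            } else if m.query_index == right {
                // The right part needs the left part just before it.
                m.word_index > 0 && has(left, m.word_index - 1)
            } else {
                true
            }
        })
        .collect()
}

fn main() {
    // Document words: "s speaker s". Query parts: 0 = "speaker", 1 = "s".
    let matches = [
        Match { query_index: 1, word_index: 0, distance: 0 }, // lone "s"
        Match { query_index: 0, word_index: 1, distance: 0 }, // "speaker"
        Match { query_index: 1, word_index: 2, distance: 0 }, // "s" right after it
    ];

    let kept = retain_valid_split_matches(&matches, 0, 1);
    let best_distance = kept.iter().map(|m| m.distance).min();
    println!("kept = {:?}, best distance = {:?}", kept, best_distance);
}
```

This quadratic scan is kept simple on purpose. The extra pass over the matches before the bucket sort is the kind of added computation the commit message is concerned about.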
Clément Renault
4e91b31b1f
Make the Typo and Words work with synonyms 2019-12-13 14:38:22 +01:00
Clément Renault
902625601a
Work in progress: It seems like we support synonyms, split and concat words 2019-12-13 14:38:22 +01:00