Commit Graph

52 Commits

Author SHA1 Message Date
qdequele
a5b0e468ee
fix for review 2020-02-11 15:28:00 +01:00
qdequele
21d122a870
rewrite indexed_pos -> field_id for highlights 2020-02-11 15:27:54 +01:00
Clément Renault
789e05304c
Replace prints by debug logs 2020-01-21 11:05:34 +01:00
Clément Renault
5465e401bb
Catch query tree related errors 2020-01-17 10:41:27 +01:00
Clément Renault
96139da0d2
Reintroduce the distinct search system 2020-01-16 15:55:55 +01:00
Clément Renault
74fa9ee4df
Introduce a better highlighting system 2020-01-16 14:56:16 +01:00
Clément Renault
00336c5154
Reintroduce a basic highlight display 2020-01-16 14:24:45 +01:00
Clément Renault
3912d1ec4b
Improve query parsing and interpretation 2020-01-16 14:11:17 +01:00
Clément Renault
54dacb362d
Use different algorithms for different documents ratios 2020-01-14 17:51:08 +01:00
Clément Renault
6edb460bea
Try with an exponential search 2020-01-14 16:52:24 +01:00
Clément Renault
40dab80dfa
Change the way we filter the documents 2020-01-14 14:18:01 +01:00
Clément Renault
681711fced
Fix query ids to be usize 2020-01-14 13:12:42 +01:00
Clément Renault
21c1473e0c
Introduce the distance data 2020-01-14 11:38:04 +01:00
Clément Renault
8acbdcbbad
wip: Make the new query tree work with the criteria 2020-01-13 14:36:06 +01:00
Clément Renault
da8abebfa2
Introduce the query words mapping along with the query tree 2020-01-13 13:29:47 +01:00
Clément Renault
4f7a7ea0bb
Faster intersection group by 2020-01-09 16:30:03 +01:00
Clément Renault
d6c9ba8f08
Store the postings lists 2020-01-09 15:04:53 +01:00
Clément Renault
81c573ec92
Add the raw document IDs to the postings lists 2020-01-08 15:30:43 +01:00
Clément Renault
9420edadf4
Introduce the Postings type to decorrelate the DocumentIds 2020-01-08 14:48:23 +01:00
Clément Renault
d724a7659e
Introduce a query tree context struct 2020-01-08 13:37:22 +01:00
Clément Renault
07937ed6d7
Use the prefix caches 2020-01-08 13:14:07 +01:00
Clément Renault
13ca30c4d8
WIP: Made the query tree traversing support prefix search 2020-01-08 12:02:58 +01:00
Clément Renault
fbcec2975d
wip: Impl a basic tree traversing 2020-01-07 18:24:13 +01:00
Clément Renault
6e1f4af833
wip: Create a tree from query but need to show synonyms 2020-01-07 18:24:13 +01:00
Clément Renault
856c5c4214
Fix group offset computing 2019-12-31 14:24:10 +01:00
Clément Renault
670e80c151
Use the cached postings lists in the query system 2019-12-31 13:32:36 +01:00
Clément Renault
eed07c724f
Add more logging for postings lists fetching by word 2019-12-31 13:32:36 +01:00
Clément Renault
99d35fb940
Introduce a first version of a number of candidates reducer
It works by ignoring the postings lists associated with documents that the previous words did not return.
2019-12-31 13:32:36 +01:00
Clément Renault
58836d89aa
Rename the PrefixCache into PrefixDocumentsCache 2019-12-30 15:42:09 +01:00
Clément Renault
1a5a104f13
Display proximity evaluation number of calls 2019-12-30 15:42:09 +01:00
Clément Renault
064cfa4755
Add more debug, where are those 100ms 2019-12-30 15:42:08 +01:00
Clément Renault
ed6172aa94
Add a time measurement of the criterion loop 2019-12-30 15:42:08 +01:00
Clément Renault
1e1f0fcaf5
Introduce a basic cache system for first letters 2019-12-30 15:42:08 +01:00
Clément Renault
d21352a109
Change the time measurement of the FST 2019-12-30 15:42:08 +01:00
Clément Renault
4be11f961b
Use an ugly trick to avoid cloning the FST 2019-12-30 15:42:07 +01:00
Clément Renault
1163f390b3
Restrict FST search to the first letter of the word 2019-12-30 15:42:07 +01:00
Clément Renault
04bb49989f
Add more debug timings 2019-12-20 14:18:48 +01:00
Clément Renault
40c0b14d1c
Reintroduce searchable attributes and reordering 2019-12-13 14:38:25 +01:00
Clément Renault
a4dd033ccf
Rename raw_matches into bare_matches 2019-12-13 14:38:25 +01:00
Clément Renault
746e6e170c
Make the test pass again 2019-12-13 14:38:24 +01:00
Clément Renault
d93e35cace
Introduce ContextMut and Context structs 2019-12-13 14:38:24 +01:00
Clément Renault
86ee0cbd6e
Introduce bucket_sort_with_distinct function 2019-12-13 14:38:24 +01:00
Clément Renault
248ccfc0d8
Update the criteria to the new ones 2019-12-13 14:38:24 +01:00
Clément Renault
ea148575cf
Remove the raw_query functions 2019-12-13 14:38:23 +01:00
Clément Renault
efc2be0b7b
Bump the sdset dependency to 0.3.6 2019-12-13 14:38:23 +01:00
Clément Renault
8d71112dcb
Rewrite the phrase query postings lists
This simplified the multiword_rewrite_matches function a little bit.
2019-12-13 14:38:23 +01:00
Clément Renault
dd03a6256a
Debug pre filtered number of documents 2019-12-13 14:38:23 +01:00
Clément Renault
9c03bb3428
First probably working phrase query doc filtering 2019-12-13 14:38:23 +01:00
Clément Renault
22b19c0d93
Fix the processed distance algorithm 2019-12-13 14:38:22 +01:00
Clément Renault
0f698d6bd9
Work in progress: Bad Typo detection
I have an issue where "speakers" is split into "speaker" and "s",
when I compute the distances for the Typo criterion,
it takes "s" into account and puts a distance of zero in bucket 0
(the "speakers" bucket), therefore it reports any document matching "s"
without typos as best results.

I need to make sure to ignore "s" when its associated part "speaker"
doesn't even exist in the document and is not in the place
it should be ("speaker" followed by "s").

It is hard to think that this will add much computation time to
the Typo criterion, as in the previous algorithm where I computed
the real query/word indexes and removed the invalid ones
before sending the documents to the bucket sort.
2019-12-13 14:38:22 +01:00
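
The adjacency check described in the "Bad Typo detection" message above can be sketched as follows. This is a minimal illustration, not the code from the commit: the function name adjacent_split_matches and the representation of a document as a slice of words are assumptions made for the example. It accepts a split pair such as "speaker" + "s" only where both parts appear next to each other in the document, so a lone "s" is not counted as a zero-typo match.

```rust
/// Hypothetical helper (not from the repository): returns the positions at
/// which the split pair (`left`, `right`) occurs as two adjacent words in
/// the document. A document that contains `right` alone yields no position.
fn adjacent_split_matches(doc_words: &[&str], left: &str, right: &str) -> Vec<usize> {
    doc_words
        .windows(2) // look at every pair of neighbouring words
        .enumerate()
        .filter(|(_, pair)| pair[0] == left && pair[1] == right)
        .map(|(i, _)| i) // keep the position of the left part
        .collect()
}
```

Under these assumptions, a document made of the words "cheap", "speaker", "s", "here" yields one valid position for the pair, while a document made of "john", "s", "car" yields none and would therefore not be placed in the "speakers" bucket.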