Commit Graph

210 Commits

Author SHA1 Message Date
Clément Renault 6e1f4af833 wip: Create a tree from query but need to show synonyms 2020-01-07 18:24:13 +01:00
Clément Renault 856c5c4214 Fix group offset computing 2019-12-31 14:24:10 +01:00
Clément Renault 670e80c151 Use the cached postings lists in the query system 2019-12-31 13:32:36 +01:00
Clément Renault eed07c724f Add more logging for postings lists fetching by word 2019-12-31 13:32:36 +01:00
Clément Renault 99d35fb940 Introduce a first version of a number of candidates reducer
It works by ignoring the postings lists associated with documents that the previous words did not return
2019-12-31 13:32:36 +01:00
Clément Renault 106b886873
Cache the prefix postings lists 2019-12-30 18:01:32 +01:00
Clément Renault 928876b553
Introduce the postings lists caching stores
Currently not used
2019-12-30 18:01:27 +01:00
Clément Renault 58836d89aa
Rename the PrefixCache into PrefixDocumentsCache 2019-12-30 15:42:09 +01:00
Clément Renault 1a5a104f13
Display proximity evaluation number of calls 2019-12-30 15:42:09 +01:00
Clément Renault 064cfa4755
Add more debug, where are those 100ms 2019-12-30 15:42:08 +01:00
Clément Renault ed6172aa94
Add a time measurement of the criterion loop 2019-12-30 15:42:08 +01:00
Clément Renault 8c140f6bcd
Increase the disk usage limit 2019-12-30 15:42:08 +01:00
Clément Renault 1e1f0fcaf5
Introduce a basic cache system for first letters 2019-12-30 15:42:08 +01:00
Clément Renault d21352a109
Change the time measurement of the FST 2019-12-30 15:42:08 +01:00
Clément Renault 4be11f961b
Use an ugly trick to avoid cloning the FST 2019-12-30 15:42:07 +01:00
Clément Renault 1163f390b3
Restrict FST search to the first letter of the word 2019-12-30 15:42:07 +01:00
Clément Renault 691e2a3c1d
Fix a blocking channel, appearing like a deadlock 2019-12-30 15:28:28 +01:00
Clément Renault 04bb49989f
Add more debug timings 2019-12-20 14:18:48 +01:00
Clément Renault d12ff15ee3
Set the indexes info in the create_index function 2019-12-19 10:38:56 +01:00
Clément Renault 40c0b14d1c
Reintroduce searchable attributes and reordering 2019-12-13 14:38:25 +01:00
Clément Renault a4dd033ccf
Rename raw_matches into bare_matches 2019-12-13 14:38:25 +01:00
Clément Renault 48e8778881
Clean up the modules declarations 2019-12-13 14:38:25 +01:00
Clément Renault 4be23efe66
Remove the AttrCount type
Could probably be reintroduced later
2019-12-13 14:38:25 +01:00
Clément Renault 7d67750865
Reintroduce exactness for one word document field 2019-12-13 14:38:25 +01:00
Clément Renault 746e6e170c
Make the test pass again 2019-12-13 14:38:24 +01:00
Clément Renault d93e35cace
Introduce ContextMut and Context structs 2019-12-13 14:38:24 +01:00
Clément Renault d75339a271
Prefer summing the attribute 2019-12-13 14:38:24 +01:00
Clément Renault 86ee0cbd6e
Introduce bucket_sort_with_distinct function 2019-12-13 14:38:24 +01:00
Clément Renault 248ccfc0d8
Update the criteria to the new ones 2019-12-13 14:38:24 +01:00
Clément Renault ea148575cf
Remove the raw_query functions 2019-12-13 14:38:23 +01:00
Clément Renault efc2be0b7b
Bump the sdset dependency to 0.3.6 2019-12-13 14:38:23 +01:00
Clément Renault 8d71112dcb
Rewrite the phrase query postings lists
This simplified the multiword_rewrite_matches function a little bit.
2019-12-13 14:38:23 +01:00
Clément Renault dd03a6256a
Debug pre filtered number of documents 2019-12-13 14:38:23 +01:00
Clément Renault 9c03bb3428
First probably working phrase query doc filtering 2019-12-13 14:38:23 +01:00
Clément Renault 22b19c0d93
Fix the processed distance algorithm 2019-12-13 14:38:22 +01:00
Clément Renault 0f698d6bd9
Work in progress: Bad Typo detection
I have an issue where "speakers" is split into "speaker" and "s",
when I compute the distances for the Typo criterion,
it takes "s" into account and puts a distance of zero in the bucket 0
(the "speakers" bucket), therefore it reports any document matching "s"
without typos as best results.

I need to make sure to ignore "s" when its associated part "speaker"
doesn't even exist in the document and is not in the place
it should be ("speaker" followed by "s").

It is hard to accept that this will add much computation time to
the Typo criterion, as in the previous algorithm, where I computed
the real query/word indexes and removed the invalid ones
before sending the documents to the bucket sort.
2019-12-13 14:38:22 +01:00
Clément Renault 4e91b31b1f
Make the Typo and Words work with synonyms 2019-12-13 14:38:22 +01:00
Clément Renault f87c67fcad
Improve the QueryEnhancer by doing a single lookup 2019-12-13 14:38:22 +01:00
Clément Renault 902625601a
Work in progress: It seems like we support synonyms, split and concat words 2019-12-13 14:38:22 +01:00
Clément Renault d17d4dc5ec
Add more debug infos 2019-12-13 14:38:21 +01:00
Clément Renault ef6a4db182
Before improving fields AttrCount
Removing the fields_count fetching halved the search time; we should look at lazily pulling them from the criteria that need them

ugly-test: Make the fields_count fetching lazy

Just before running the exactness criterion
2019-12-13 14:38:21 +01:00
Clément Renault 11f3d7782d
Introduce the AttrCount type 2019-12-13 14:38:21 +01:00
Clément Renault 951f0bcb10
squash-me: Improve benchmarks naming 2019-12-13 14:17:40 +01:00
Clément Renault d8ba405baf
Add some criterion benchmarks to help measure improvements 2019-12-13 14:17:40 +01:00
Quentin de Quelen 3a4130f344 Allow to index files with null or boolean 2019-12-12 19:25:05 +01:00
Quentin de Quelen 88b3c05155 Stop words; Do not reindex all documents if there are no documents 2019-12-12 15:31:39 +01:00
Quentin de Quelen a4f26e8e48 Rewrite the synonym endpoint 2019-12-12 12:47:02 +01:00
Clément Renault dc1849d291
Bump heed to 0.6.1 2019-12-07 11:49:45 +01:00
Clément Renault 29fd54dcfa
Allow users to send csv files from stdin in examples 2019-12-05 12:23:56 +01:00
Thomas Payet 51636402c2 Add debian package in CI 2019-12-04 18:02:30 +01:00
Clément Renault 4f87465f18
Bump meilisearch crates to v0.8.4 2019-12-03 17:22:45 +01:00
qdequele 773a51e7d0
Rename 'update_type' to 'type' on EnqueuedUpdateResult 2019-11-29 15:09:48 +01:00
qdequele 7923752513
Serialize updates results to camelCase 2019-11-29 15:05:54 +01:00
Clément Renault 30cb60f679
Bump meilisearch crates to v0.8.3 2019-11-29 14:06:17 +01:00
qdequele 3a90233a3d
Add status failed on UpdateStatus 2019-11-28 18:41:11 +01:00
Clément Renault 9a2b4d08e1
Bump meilisearch crates to v0.8.2 2019-11-28 17:15:13 +01:00
Clément Renault 1def56ea11
Change the update loop to be more explicit on index clear 2019-11-27 13:43:28 +01:00
Clément Renault f6fb31c531
Bump meilisearch crates to v0.8.1 2019-11-27 11:47:27 +01:00
Clément Renault d08b76a323
Separate the update and main databases
We used the heed typed transaction to make it safe (https://github.com/Kerollmops/heed/pull/27).
2019-11-27 11:29:06 +01:00
Clément Renault 7cc096e0a2
Rename MeiliDB into MeiliSearch 2019-11-26 11:12:30 +01:00