Clément Renault
29d021ad4d
Fixes the stop words and words fst generation
2020-04-06 18:53:02 +02:00
sgummaluri
e5a336a042
Fix for 'First update does not appear before being processed' #542
2020-04-04 23:18:43 +05:30
Clément Renault
38c43759bb
Update most of the dependencies
2020-04-02 18:36:04 +02:00
Pedro Paulo de Amorim
9950fffb6f
Simplify imports of std::fs and std::io, remove unneeded space, remove UpdateState
2020-04-02 11:02:19 +01:00
Pedro Paulo de Amorim
f5d57c9dce
Replace the toml reader with the JSON settings reader, directly parse the data to SettingsUpdate, Update CHANGELOG
2020-04-02 11:01:56 +01:00
Pedro Paulo de Amorim
1b47a10e89
Add support for seq values
2020-04-01 12:59:40 +01:00
Pedro Paulo de Amorim
690b8e0dd0
Replace .toString with String::new()
2020-03-31 14:01:44 +01:00
Pedro Paulo de Amorim
bc6d86c8ce
serialize_unit returns an empty string
2020-03-31 13:51:12 +01:00
Clément Renault
69aee870da
Make the engine index booleans
...
The engine will see the values like text "true" and "false"
2020-03-31 10:39:58 +02:00
Clément Renault
c18e907f96
Construct a Set using the from_dirty method
...
This commit fixes #566 by ensuring that the slice of matches is
ordered and deduplicated.
2020-03-30 20:56:30 +02:00
mposmta
a6dcd7a421
fixes tests
...
fixes tests impacted by signature change of query
2020-03-25 15:17:20 +01:00
mposmta
fd65cf9dcb
populates exhaustive number of hits
2020-03-25 12:44:38 +01:00
Clément Renault
3ca8db2cc1
Bump the workspace crates to 0.9.0
2020-03-19 11:56:23 +01:00
Clément Renault
f6972ec682
Bump the workspace crates to 0.9.0-rc.1
2020-03-16 16:58:20 +01:00
Quentin de Quelen
2d82f1b655
ranking fields should be stored and indexed by default; fix #521
2020-03-16 16:19:23 +01:00
qdequele
ef3bcd65ab
fix comments from review
2020-03-10 15:59:11 +01:00
qdequele
179969a9e2
fix tests + fmt
2020-03-10 11:29:56 +01:00
qdequele
c984d8d5a5
rename identifier into primaryKey; fix #514
2020-03-09 18:45:29 +01:00
qdequele
8ffa80883a
remove the unused function
2020-03-09 18:45:29 +01:00
qdequele
86c3482cbd
review the internal schema to allow creating a schema without an identifier; fix #513
2020-03-09 18:45:20 +01:00
qdequele
6016f2e941
change wording of custom ranking rules dsc -> desc; #490
2020-03-06 10:15:19 +01:00
Clément Renault
5e31d28759
Fix the inference of the documents searchable fields
2020-03-03 20:54:17 +01:00
qdequele
a2f0f95337
use distinct on search
2020-03-02 16:19:41 +01:00
qdequele
250aeaa86c
stop reindexing by chunk during complete reindexing
2020-02-28 11:49:12 +01:00
qdequele
47009615ee
rename words_position to wordsPosition; fix #483
2020-02-27 16:24:49 +01:00
qdequele
dda08d60d2
cargo fmt
2020-02-27 14:33:57 +01:00
qdequele
f182afc50b
update tests
2020-02-27 11:30:23 +01:00
qdequele
bb5d931f16
rename criterions on settings route; fix #480
2020-02-27 11:30:22 +01:00
qdequele
3c74e71d4f
show default ranking rules if the user resets them; fix #476
2020-02-27 11:30:17 +01:00
qdequele
79e07fa852
reset value of searchable and displayed attributes; fix #473
2020-02-27 11:04:39 +01:00
qdequele
2eb6f81c58
rename ranking_distinct to distinct_attribute; fix #474
2020-02-27 11:04:39 +01:00
qdequele
a067a1b16b
replace index_new_fields with accept_new_fields; fix #475
2020-02-27 11:04:38 +01:00
Clément Renault
96248d9bfa
Change the exactness criterion in the tests
2020-02-25 14:24:15 +01:00
Clément Renault
9d167c08f4
Rename the Exact criterion into Exactness
2020-02-25 14:16:55 +01:00
qdequele
2d7a1bfce0
fix un-rankable fields errors; fix #463
2020-02-14 10:34:33 +01:00
qdequele
4986adc186
move identifier from settings to index; fix #470
2020-02-12 17:00:14 +01:00
qdequele
dc9ca2ebc9
fixes for review
2020-02-12 16:51:14 +01:00
qdequele
559c2f8907
Add stop words on query
2020-02-11 15:28:00 +01:00
Quentin de Quelen
dc6907e748
rebase from master
2020-02-11 15:28:00 +01:00
qdequele
a5b0e468ee
fix for review
2020-02-11 15:28:00 +01:00
qdequele
50a9825a0f
fix some use cases on settings
2020-02-11 15:27:59 +01:00
Quentin de Quelen
585bba43a0
set new attributes indexed if needed
2020-02-11 15:27:58 +01:00
qdequele
9c0497c419
change the way settings are shown in updates
2020-02-11 15:27:58 +01:00
qdequele
f77f38dfa0
fix update system
2020-02-11 15:27:57 +01:00
qdequele
58fe87067b
finish settings
2020-02-11 15:27:57 +01:00
Quentin de Quelen
dbba310770
squash me
2020-02-11 15:27:57 +01:00
Quentin de Quelen
6deb481589
definitely remove attributes_ranked on settings; auto create it with ranking_rules
2020-02-11 15:27:57 +01:00
Quentin de Quelen
036977bfe4
add the possibility to totally clear the schema
2020-02-11 15:27:57 +01:00
Quentin de Quelen
7a6f583b1f
fix issue on ranking rules
2020-02-11 15:27:56 +01:00
Quentin de Quelen
e078eafb1f
clean unused functions
2020-02-11 15:27:56 +01:00
Quentin de Quelen
6f534540a6
fix error on stop words fst
2020-02-11 15:27:56 +01:00
qdequele
38d57d213f
expose api for new settings
2020-02-11 15:27:56 +01:00
qdequele
ae0a11e422
fix schema & fix tests
2020-02-11 15:27:55 +01:00
qdequele
a35eb16a2a
store the schema after each document updates
2020-02-11 15:27:54 +01:00
qdequele
4f0ead625b
adapt meilisearch-http to the new schemaless option
2020-02-11 15:27:54 +01:00
qdequele
21d122a870
rewrite indexed_pos -> field_id for highlights
2020-02-11 15:27:54 +01:00
qdequele
130fb74928
introduce a new schemaless way
2020-02-11 15:27:54 +01:00
qdequele
bbe1845f66
squash-me
2020-02-11 15:27:54 +01:00
qdequele
2ee90a891c
introduce a new settings update system
2020-02-11 15:27:54 +01:00
qdequele
110adcae85
Remove the schema; fix #422
2020-02-11 15:27:53 +01:00
qdequele
91c6539baf
Rewrite the stop-words endpoint; fix #417
2020-02-11 15:27:53 +01:00
qdequele
f0590d3301
Change documents routes; fix #416
2020-02-11 15:27:53 +01:00
Clément Renault
7c0d8f073b
Support compaction with multi database
2020-01-24 17:38:14 +01:00
Clément Renault
a2bc689b92
Fix the tests a little bit
2020-01-22 18:12:56 +01:00
Clément Renault
a9adbda2cd
Make the engine support non-exact multi-words synonyms
2020-01-22 18:11:58 +01:00
Clément Renault
0b9fe2c072
Introduce the new Query Tree creation supporting more operations
2020-01-22 17:46:46 +01:00
Clément Renault
789e05304c
Replace prints by debug logs
2020-01-21 11:05:34 +01:00
Clément Renault
7604387701
Clean up the dependencies
2020-01-21 11:04:25 +01:00
Clément Renault
daffcaf4c6
Make the docids OR operation method conditional
2020-01-19 12:29:06 +01:00
Clément Renault
ff1ec599e0
Try a better version of sdset
2020-01-19 12:01:24 +01:00
Clément Renault
e44d498c94
Display more debug info for prefix tolerant fetches
2020-01-19 11:07:32 +01:00
Clément Renault
c334d6b7fe
Avoid sorting sorted sequences, prefer using set operations
2020-01-19 10:58:01 +01:00
Clément Renault
5465e401bb
Catch query tree related errors
2020-01-17 10:41:27 +01:00
Clément Renault
9cc3c56c9c
Fix the prefix system
2020-01-16 18:41:27 +01:00
Clément Renault
d7a7560220
Use a union instead of a sort for prefix fetching
2020-01-16 17:09:27 +01:00
Clément Renault
70a529d197
Reduce the number of args of update functions
2020-01-16 16:29:50 +01:00
Clément Renault
be31a14326
Make the clear all operation clear caches
2020-01-16 16:19:04 +01:00
Clément Renault
96139da0d2
Reintroduce the distinct search system
2020-01-16 15:55:55 +01:00
Clément Renault
74fa9ee4df
Introduce a better highlighting system
2020-01-16 14:56:16 +01:00
Clément Renault
00336c5154
Reintroduce a basic highlight display
2020-01-16 14:24:45 +01:00
Clément Renault
3912d1ec4b
Improve query parsing and interpretation
2020-01-16 14:11:17 +01:00
Clément Renault
70d4f47f37
Differentiate short words as prefix or exact matches
2020-01-16 12:01:51 +01:00
Clément Renault
9809ded23d
Implement synonym fetching
2020-01-16 11:38:23 +01:00
Clément Renault
5f9a3546e0
Use a union instead of a sort for OR ops
2020-01-15 15:14:24 +01:00
Clément Renault
db625a08f7
Update lock file
2020-01-15 12:25:14 +01:00
Clément Renault
44fec1b6c9
Cache prefixes of a length of 2
2020-01-14 18:17:52 +01:00
Clément Renault
54dacb362d
Use different algorithms for different documents ratios
2020-01-14 17:51:08 +01:00
Clément Renault
6edb460bea
Try with an exponential search
2020-01-14 16:52:24 +01:00
Clément Renault
40dab80dfa
Change the way we filter the documents
2020-01-14 14:18:01 +01:00
Clément Renault
681711fced
Fix query ids to be usize
2020-01-14 13:12:42 +01:00
Clément Renault
21c1473e0c
Introduce the distance data
2020-01-14 11:38:04 +01:00
Clément Renault
8acbdcbbad
wip: Make the new query tree work with the criteria
2020-01-13 14:36:06 +01:00
Clément Renault
da8abebfa2
Introduce the query words mapping along with the query tree
2020-01-13 13:29:47 +01:00
Clément Renault
4f7a7ea0bb
Faster intersection group by
2020-01-09 16:30:03 +01:00
Clément Renault
d6c9ba8f08
Store the postings lists
2020-01-09 15:04:53 +01:00
Clément Renault
ec8916bf54
Change the debug outputs
2020-01-09 12:05:39 +01:00
Clément Renault
81c573ec92
Add the raw document IDs to the postings lists
2020-01-08 15:30:43 +01:00
Clément Renault
9420edadf4
Introduce the Postings type to decorrelate the DocumentIds
2020-01-08 14:48:23 +01:00
Clément Renault
d724a7659e
Introduce a query tree context struct
2020-01-08 13:37:22 +01:00
Clément Renault
887c212b49
Add more logs about the docids construction
2020-01-08 13:22:42 +01:00
Clément Renault
07937ed6d7
Use the prefix caches
2020-01-08 13:14:07 +01:00
Clément Renault
a262c67ec3
limit the search in the FST
2020-01-08 13:06:12 +01:00
Clément Renault
13ca30c4d8
WIP: Made the query tree traversing support prefix search
2020-01-08 12:02:58 +01:00
Clément Renault
fbcec2975d
wip: Impl a basic tree traversing
2020-01-07 18:24:13 +01:00
Clément Renault
6e1f4af833
wip: Create a tree from query but need to show synonyms
2020-01-07 18:24:13 +01:00
Clément Renault
856c5c4214
Fix group offset computing
2019-12-31 14:24:10 +01:00
Clément Renault
670e80c151
Use the cached postings lists in the query system
2019-12-31 13:32:36 +01:00
Clément Renault
eed07c724f
Add more logging for postings lists fetching by word
2019-12-31 13:32:36 +01:00
Clément Renault
99d35fb940
Introduce a first version of a number of candidates reducer
...
It works by ignoring the postings lists associated with documents that the previous words did not return
2019-12-31 13:32:36 +01:00
Clément Renault
106b886873
Cache the prefix postings lists
2019-12-30 18:01:32 +01:00
Clément Renault
928876b553
Introduce the postings lists caching stores
...
Currently not used
2019-12-30 18:01:27 +01:00
Clément Renault
58836d89aa
Rename the PrefixCache into PrefixDocumentsCache
2019-12-30 15:42:09 +01:00
Clément Renault
1a5a104f13
Display proximity evaluation number of calls
2019-12-30 15:42:09 +01:00
Clément Renault
064cfa4755
Add more debug, where are those 100ms
2019-12-30 15:42:08 +01:00
Clément Renault
ed6172aa94
Add a time measurement of the criterion loop
2019-12-30 15:42:08 +01:00
Clément Renault
8c140f6bcd
Increase the disk usage limit
2019-12-30 15:42:08 +01:00
Clément Renault
1e1f0fcaf5
Introduce a basic cache system for first letters
2019-12-30 15:42:08 +01:00
Clément Renault
d21352a109
Change the time measurement of the FST
2019-12-30 15:42:08 +01:00
Clément Renault
4be11f961b
Use an ugly trick to avoid cloning the FST
2019-12-30 15:42:07 +01:00
Clément Renault
1163f390b3
Restrict FST search to the first letter of the word
2019-12-30 15:42:07 +01:00
Clément Renault
691e2a3c1d
Fix a blocking channel, appearing like a deadlock
2019-12-30 15:28:28 +01:00
Clément Renault
04bb49989f
Add more debug timings
2019-12-20 14:18:48 +01:00
Clément Renault
d12ff15ee3
Set the indexes info in the create_index function
2019-12-19 10:38:56 +01:00
Clément Renault
40c0b14d1c
Reintroduce searchable attributes and reordering
2019-12-13 14:38:25 +01:00
Clément Renault
a4dd033ccf
Rename raw_matches into bare_matches
2019-12-13 14:38:25 +01:00
Clément Renault
48e8778881
Clean up the modules declarations
2019-12-13 14:38:25 +01:00
Clément Renault
4be23efe66
Remove the AttrCount type
...
Could probably be reintroduced later
2019-12-13 14:38:25 +01:00
Clément Renault
7d67750865
Reintroduce exactness for one word document field
2019-12-13 14:38:25 +01:00
Clément Renault
746e6e170c
Make the test pass again
2019-12-13 14:38:24 +01:00
Clément Renault
d93e35cace
Introduce ContextMut and Context structs
2019-12-13 14:38:24 +01:00
Clément Renault
d75339a271
Prefer summing the attribute
2019-12-13 14:38:24 +01:00
Clément Renault
86ee0cbd6e
Introduce bucket_sort_with_distinct function
2019-12-13 14:38:24 +01:00
Clément Renault
248ccfc0d8
Update the criteria to the new ones
2019-12-13 14:38:24 +01:00
Clément Renault
ea148575cf
Remove the raw_query functions
2019-12-13 14:38:23 +01:00
Clément Renault
efc2be0b7b
Bump the sdset dependency to 0.3.6
2019-12-13 14:38:23 +01:00
Clément Renault
8d71112dcb
Rewrite the phrase query postings lists
...
This simplified the multiword_rewrite_matches function a little bit.
2019-12-13 14:38:23 +01:00
Clément Renault
dd03a6256a
Debug pre filtered number of documents
2019-12-13 14:38:23 +01:00
Clément Renault
9c03bb3428
First probably working phrase query doc filtering
2019-12-13 14:38:23 +01:00
Clément Renault
22b19c0d93
Fix the processed distance algorithm
2019-12-13 14:38:22 +01:00
Clément Renault
0f698d6bd9
Work in progress: Bad Typo detection
...
I have an issue where "speakers" is split into "speaker" and "s".
When I compute the distances for the Typo criterion,
it takes "s" into account and puts a distance of zero in bucket 0
(the "speakers" bucket), so it reports any document matching "s"
without typos as a best result.
I need to make sure to ignore "s" when its associated part "speaker"
doesn't even exist in the document and is not in the place
it should be ("speaker" followed by "s").
It is hard to think that this will add much computation time to
the Typo criterion, like in the previous algorithm where I computed
the real query/word indexes and removed the invalid ones
before sending the documents to the bucket sort.
2019-12-13 14:38:22 +01:00
Clément Renault
4e91b31b1f
Make the Typo and Words work with synonyms
2019-12-13 14:38:22 +01:00
Clément Renault
f87c67fcad
Improve the QueryEnhancer by doing a single lookup
2019-12-13 14:38:22 +01:00
Clément Renault
902625601a
Work in progress: It seems like we support synonyms, split and concat words
2019-12-13 14:38:22 +01:00
Clément Renault
d17d4dc5ec
Add more debug infos
2019-12-13 14:38:21 +01:00
Clément Renault
ef6a4db182
Before improving fields AttrCount
...
Removing the fields_count fetching cut the search time in half; we should look at lazily pulling them from the criteria that need them
ugly-test: Make the fields_count fetching lazy
Just before running the exactness criterion
2019-12-13 14:38:21 +01:00
Clément Renault
11f3d7782d
Introduce the AttrCount type
2019-12-13 14:38:21 +01:00
Clément Renault
951f0bcb10
squash-me: Improve benchmarks naming
2019-12-13 14:17:40 +01:00
Clément Renault
d8ba405baf
Add some criterion benchmarks to help measure improvements
2019-12-13 14:17:40 +01:00
Quentin de Quelen
3a4130f344
Allow indexing files with null or boolean values
2019-12-12 19:25:05 +01:00
Quentin de Quelen
88b3c05155
Stop words; Do not reindex all documents if there are no documents
2019-12-12 15:31:39 +01:00