Commit Graph

7474 Commits

Author SHA1 Message Date
bors[bot]
270da98c46
Merge #202
202: Add field id word count docids database r=Kerollmops a=LegendreM

This PR introduces a new database, `field_id_word_count_docids`, that maps the number of words in an attribute to a list of document ids. This relation is limited to attributes that contain fewer than 11 words.
This database is used by the exactness criterion to know whether a document has an attribute that contains exactly the query, without any additional words.
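
As a rough illustration of the mapping described above, here is a minimal in-memory sketch. It assumes whitespace tokenization and uses standard-library collections; the type names, the `MAX_COUNTED_WORDS` constant, and both method names are hypothetical and are not taken from milli's actual implementation.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical aliases for this sketch; the real types may differ.
type FieldId = u16;
type DocumentId = u32;

// Assumption: "fewer than 11 words" means attributes of 1 to 10 words are recorded.
const MAX_COUNTED_WORDS: usize = 10;

/// In-memory stand-in for a `(field id, word count) -> document ids` database.
#[derive(Default)]
struct FieldIdWordCountDocids {
    map: BTreeMap<(FieldId, u8), BTreeSet<DocumentId>>,
}

impl FieldIdWordCountDocids {
    /// Records the word count of one attribute of one document.
    /// Empty attributes and attributes above the limit are skipped.
    fn index_attribute(&mut self, field_id: FieldId, docid: DocumentId, text: &str) {
        // Assumption: words are separated by whitespace.
        let count = text.split_whitespace().count();
        if (1..=MAX_COUNTED_WORDS).contains(&count) {
            self.map
                .entry((field_id, count as u8))
                .or_default()
                .insert(docid);
        }
    }

    /// Returns the documents whose attribute `field_id` has exactly `word_count` words.
    fn docids_with_word_count(&self, field_id: FieldId, word_count: usize) -> BTreeSet<DocumentId> {
        if word_count == 0 || word_count > MAX_COUNTED_WORDS {
            return BTreeSet::new();
        }
        self.map
            .get(&(field_id, word_count as u8))
            .cloned()
            .unwrap_or_default()
    }
}
```

With such a structure, an exactness check for a three-word query could intersect the documents matching every query word with `docids_with_word_count(field_id, 3)`, which keeps only documents whose attribute has no additional word.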

Fix #165 
Fix #196
Related to [specifications:#36](https://github.com/meilisearch/specifications/pull/36)

Co-authored-by: many <maxime@meilisearch.com>
Co-authored-by: Many <legendre.maxime.isn@gmail.com>
2021-06-01 16:09:48 +00:00
many
e857ca4d7d
Fix PR comments 2021-06-01 18:06:46 +02:00
Many
ab2cf69e8d
Update milli/src/update/delete_documents.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-06-01 17:04:10 +02:00
Many
8e6d1ff0dc
Update milli/src/update/index_documents/store.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-06-01 17:04:02 +02:00
bors[bot]
168fe0aa28
Merge #206
206: Fix http-ui r=Kerollmops a=irevoire

I just noticed that `http-ui` was not compiling on `main`.
I'm not sure this is the best fix, but it works 👀

Co-authored-by: Tamo <irevoire@hotmail.fr>
2021-06-01 14:31:32 +00:00
Tamo
608c5bad24
fix http-ui 2021-06-01 16:24:46 +02:00
bors[bot]
7d36d664a7
Merge #203
203: Make the MatchingWords return the number of matching bytes r=Kerollmops a=LegendreM

Make the MatchingWords return the number of matching bytes using a custom Levenshtein algorithm.

Fix #138

Co-authored-by: many <maxime@meilisearch.com>
2021-06-01 12:00:33 +00:00
many
225ae6fd25
Resolve PR comments 2021-06-01 11:53:09 +02:00
bors[bot]
3a7c1f2469
Merge #191
191: dumps v2 r=irevoire a=MarinPostma



Co-authored-by: Marin Postma <postma.marin@protonmail.com>
Co-authored-by: marin <postma.marin@protonmail.com>
2021-06-01 09:46:31 +00:00
marin
df6ba0e824
Apply suggestions from code review
Co-authored-by: Irevoire <tamo@meilisearch.com>
2021-06-01 11:18:37 +02:00
bors[bot]
2f9f6a1f21
Merge #169
169: Optimize roaring codec r=Kerollmops a=MarinPostma

Optimize the `BoRoaringBitmapCodec` by preventing it from emitting useless errors that caused allocations. On my flamegraph, the byte_decode function went from 4.13% to 1.70% (of the transplant graph).

This may not be the greatest optimization ever, but hey, this was low-hanging fruit.
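
To make the idea concrete, here is a minimal sketch of decoding native-endian `u32` values from a byte slice without building an error value on the failure path. It is not the actual `BoRoaringBitmapCodec` code: the function names are hypothetical, and a plain `Vec<u32>` stands in for the bitmap.

```rust
use std::mem::size_of;

/// Decodes a slice of native-endian bytes into `u32` values.
/// Returns `None` on malformed input instead of constructing an allocated error.
fn decode_u32s(bytes: &[u8]) -> Option<Vec<u32>> {
    // The input must be a whole number of 4-byte integers.
    if bytes.len() % size_of::<u32>() != 0 {
        return None;
    }
    let mut values = Vec::with_capacity(bytes.len() / size_of::<u32>());
    for chunk in bytes.chunks_exact(size_of::<u32>()) {
        // Each chunk is exactly 4 bytes long, so indexing cannot panic.
        values.push(u32::from_ne_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]));
    }
    Some(values)
}

/// Encodes `u32` values as native-endian bytes (the inverse of `decode_u32s`).
fn encode_u32s(values: &[u32]) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(values.len() * size_of::<u32>());
    for value in values {
        bytes.extend_from_slice(&value.to_ne_bytes());
    }
    bytes
}
```

Returning `Option` keeps the failure path free of heap allocation, which is the kind of saving the description above refers to; whether the real codec uses this exact shape is an assumption.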

before:
![image](https://user-images.githubusercontent.com/28804882/116241125-17018880-a754-11eb-9f9d-a67418d100e1.png)
after:
![image](https://user-images.githubusercontent.com/28804882/116241167-21bc1d80-a754-11eb-9afc-d9d72727477c.png)



Co-authored-by: Marin Postma <postma.marin@protonmail.com>
2021-06-01 06:30:25 +00:00
Marin Postma
984dc7c1ed
rewrite roaring codec without byteorder. 2021-05-31 22:15:39 +02:00
Marin Postma
1373637da1
optimize roaring codec 2021-05-31 22:15:35 +02:00
Marin Postma
6609f9e3be
review edits 2021-05-31 18:41:37 +02:00
many
1df68d342a
Make the MatchingWords return the number of matching bytes 2021-05-31 18:22:29 +02:00
many
b8e6db0feb
Add database in infos crate 2021-05-31 16:29:27 +02:00
many
c701f8bf36
Use field id word count database in exactness criterion 2021-05-31 16:27:28 +02:00
many
4ddf008be2
add field id word count database 2021-05-31 16:27:28 +02:00
Marin Postma
1c4f0b2ccf
clippy, fmt & tests 2021-05-31 16:03:39 +02:00
Marin Postma
10fc870684
improve dump info reports 2021-05-31 15:49:04 +02:00
bors[bot]
2f5e61bacb
Merge #184
184: Transfer numbers and strings facets into the appropriate facet databases r=Kerollmops a=Kerollmops

This pull request is related to https://github.com/meilisearch/milli/issues/152 and changes the layout of the facet values: numbers and strings are now stored in dedicated databases, and the user no longer needs to define the type of the fields. No conversion between the two types is done anymore: numbers (floats and integers, converted to f64) go to the facet float database, and strings go to the facet string database.

There is one related issue that I found regarding CSVs: the values in a CSV are always considered to be strings. [meilisearch/specifications#28](d916b57d74/text/0028-indexing-csv.md) fixes this issue by allowing the user to define the field types using `:` in the "CSV Formatting Rules" section.
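
The routing rule described above can be sketched as follows. This is only an illustration under assumptions: `FacetValue`, `FacetDatabases`, and `insert` are hypothetical names, and plain vectors stand in for the real facet databases.

```rust
// Hypothetical aliases for this sketch; the real types may differ.
type FieldId = u16;
type DocumentId = u32;

/// A facet value as it could appear in a document (hypothetical type).
enum FacetValue {
    Integer(i64),
    Float(f64),
    Text(String),
}

/// Two separate stores: one for numbers (as f64) and one for strings.
#[derive(Default)]
struct FacetDatabases {
    numbers: Vec<(FieldId, f64, DocumentId)>,
    strings: Vec<(FieldId, String, DocumentId)>,
}

impl FacetDatabases {
    /// Routes a value by its own type; no field type declaration is needed
    /// and no conversion between strings and numbers is attempted.
    fn insert(&mut self, field_id: FieldId, docid: DocumentId, value: FacetValue) {
        match value {
            // Integers are converted to f64 and stored with the floats.
            FacetValue::Integer(i) => self.numbers.push((field_id, i as f64, docid)),
            FacetValue::Float(f) => self.numbers.push((field_id, f, docid)),
            // Strings stay strings, even when they look numeric.
            FacetValue::Text(s) => self.strings.push((field_id, s, docid)),
        }
    }
}
```

In this sketch a string such as `"42"` lands in the string store rather than the number store, which matches the statement that no conversion between the two types is performed.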

All previous tests on facets have been modified to pass again and I have also done hand-driven tests with the 115m songs dataset. Everything seems to be good!

Fixes #192.

Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-05-31 13:32:58 +00:00
Kerollmops
1c0a5cd136
Resolve code modification suggestions 2021-05-31 15:22:50 +02:00
tamo
dffbaca63b
bump sentry version 2021-05-31 13:59:31 +02:00
Marin Postma
b3c8f0e1f6
fix empty index error 2021-05-31 10:58:51 +02:00
Marin Postma
bc5a5e37ea
fix dump v1 2021-05-31 10:42:31 +02:00
Marin Postma
33c6c4f0ee
add timestamps to dump info 2021-05-30 15:55:17 +02:00
Marin Postma
39c16c0fe4
fix dump import 2021-05-30 12:35:17 +02:00
Marin Postma
1cb64caae4
dump content is now only uuid 2021-05-29 00:08:17 +02:00
Marin Postma
b258f4f394
fix dump import 2021-05-27 14:30:20 +02:00
Marin Postma
c47369839b
dump meta 2021-05-27 10:51:19 +02:00
Marin Postma
b924e897f1
load index dump 2021-05-27 10:27:47 +02:00
Marin Postma
e818c33fec
implement load uuid_resolver 2021-05-26 20:42:09 +02:00
bors[bot]
76b9178b16
Merge #200
200: Fix plane sweep algorithm r=Kerollmops a=LegendreM

Fix the plane sweep algorithm after creating some tests on proximity.

Co-authored-by: many <maxime@meilisearch.com>
2021-05-26 11:36:24 +00:00
many
a5e98cf46d
Fix plane sweep algorithm 2021-05-25 18:21:55 +02:00
Marin Postma
9278a6fe59
integrate in dump actor 2021-05-25 18:14:11 +02:00
Marin Postma
3593ebb8aa
dump updates 2021-05-25 16:44:58 +02:00
Marin Postma
464639aa0f
update actor error improvements 2021-05-25 16:44:58 +02:00
Marin Postma
4acbe8e473
implement index dump 2021-05-25 16:44:58 +02:00
Marin Postma
7ad553670f
index error handling 2021-05-25 16:44:58 +02:00
Marin Postma
2185fb8367
dump uuid resolver 2021-05-25 16:44:54 +02:00
marin
cbcf50960f
Merge pull request #192 from meilisearch/dumps-tasks
Dumps tasks
2021-05-25 15:49:15 +02:00
tamo
89846d1656
improve panic message 2021-05-25 15:47:57 +02:00
tamo
e5175f5dc1
merge 2021-05-25 15:24:39 +02:00
tamo
1a6dcec83a
crash when the actor has no inbox 2021-05-25 15:23:13 +02:00
Irevoire
fe260f1330
Update meilisearch-http/src/index_controller/dump_actor/actor.rs
Co-authored-by: marin <postma.marin@protonmail.com>
2021-05-25 15:13:47 +02:00
Kerollmops
5012cc3a32
Fix the http-ui crate to support split facet databases 2021-05-25 11:31:06 +02:00
Kerollmops
28bd9e183e
Fix the infos crate to support split facet databases 2021-05-25 11:31:06 +02:00
Clément Renault
3a4a150ef0
Fix the tests and remaining warnings 2021-05-25 11:31:06 +02:00
Clément Renault
02c655ff1a
Refine the facet distribution to use both databases 2021-05-25 11:30:00 +02:00
Clément Renault
79efded841
Refine the FacetCondition from_array constructor 2021-05-25 11:30:00 +02:00