302 Commits

Author SHA1 Message Date
mpostma
55e6cb9c7b
typos on first letter count as 2 2022-02-02 12:56:09 +01:00
mpostma
642c01d0dc
set max typos on ngram to 1 2022-02-02 12:56:08 +01:00
Marin Postma
0c84a40298
document batch support
reusable transform

rework update api

add indexer config

fix tests

review changes

Co-authored-by: Clément Renault <clement@meilisearch.com>

fmt
2022-01-19 12:40:20 +01:00
Tamo
01968d7ca7
ensure we get no documents and no error when filtering on an empty db 2022-01-18 11:40:30 +01:00
bors[bot]
8f4499090b
Merge #433
433: fix(filter): Fix two bugs. r=Kerollmops a=irevoire

- Stop lowercasing the field when looking in the field id map
- When a field id does not exist, it means there are currently zero
  documents containing this field, so we return an empty RoaringBitmap
  instead of throwing an internal error

Will fix https://github.com/meilisearch/MeiliSearch/issues/2082 once meilisearch is released

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-01-17 14:06:53 +00:00
Tamo
d1ac40ea14
fix(filter): Fix two bugs.
- Stop lowercasing the field when looking in the field id map
- When a field id does not exist, it means there are currently zero
  documents containing this field, so we return an empty RoaringBitmap
  instead of throwing an internal error
2022-01-17 13:51:46 +01:00
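
The two fixes described in this commit can be illustrated with a minimal sketch. It is not milli's actual code: `Index`, `docs_with_field`, and the `HashMap`/`BTreeSet` types are hypothetical stand-ins for the real field id map and `RoaringBitmap`, chosen so the example needs only the standard library.

```rust
use std::collections::{BTreeSet, HashMap};

// Hypothetical stand-ins: a HashMap plays the role of the field id map and a
// BTreeSet<u32> plays the role of a RoaringBitmap of document ids.
type FieldId = u16;
type DocIds = BTreeSet<u32>;

struct Index {
    fields_ids: HashMap<String, FieldId>,
    docs_by_field: HashMap<FieldId, DocIds>,
}

impl Index {
    /// Returns the ids of the documents that contain `field`.
    fn docs_with_field(&self, field: &str) -> DocIds {
        // Look the name up exactly as written: no lowercasing.
        match self.fields_ids.get(field) {
            Some(fid) => self.docs_by_field.get(fid).cloned().unwrap_or_default(),
            // Unknown field: no document contains it, so the answer is an
            // empty set rather than an internal error.
            None => DocIds::new(),
        }
    }
}
```

Under these assumptions, filtering on a field that no document contains, or on an empty database, yields an empty result set and no error.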
Samyak S Sarnayak
2d7607734e
Run cargo fmt on matching_words.rs 2022-01-17 13:04:33 +05:30
Samyak S Sarnayak
5ab505be33
Fix highlight by replacing num_graphemes_from_bytes
num_graphemes_from_bytes has been renamed in the tokenizer to
num_chars_from_bytes.

Highlight now works correctly!
2022-01-17 13:02:55 +05:30
Samyak S Sarnayak
e752bd06f7
Fix matching_words tests to compile successfully
The tests still fail due to a bug in https://github.com/meilisearch/tokenizer/pull/59
2022-01-17 11:37:45 +05:30
Samyak S Sarnayak
30247d70cd
Fix search highlight for non-unicode chars
The `matching_bytes` function takes a `&Token` now and:
- gets the number of bytes to highlight (unchanged).
- uses `Token.num_graphemes_from_bytes` to get the number of grapheme
  clusters to highlight.

In essence, the `matching_bytes` function returns the number of matching
grapheme clusters instead of bytes. Should this function be renamed
then?

Added proper highlighting in the HTTP UI:
- requires dependency on `unicode-segmentation` to extract grapheme
  clusters from tokens
- `<mark>` tag is put around only the matched part
    - before this change, the entire word was highlighted even if only a
      part of it matched
2022-01-17 11:37:44 +05:30
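
As a rough illustration of the byte-to-character conversion this commit describes, the sketch below wraps only the matched prefix of a word in a `<mark>` tag. It makes two assumptions: it counts `char`s from the standard library instead of grapheme clusters from `unicode-segmentation`, and both `num_chars_from_bytes` (a free function here, not the tokenizer's method) and `highlight_prefix` are hypothetical helpers.

```rust
/// Counts the `char`s of `word` that start before byte offset `num_bytes`.
fn num_chars_from_bytes(word: &str, num_bytes: usize) -> usize {
    word.char_indices()
        .take_while(|(idx, _)| *idx < num_bytes)
        .count()
}

/// Wraps only the matched prefix of `word` in a `<mark>` tag,
/// leaving the rest of the word unhighlighted.
fn highlight_prefix(word: &str, matched_bytes: usize) -> String {
    let n = num_chars_from_bytes(word, matched_bytes);
    let matched: String = word.chars().take(n).collect();
    let rest: String = word.chars().skip(n).collect();
    if matched.is_empty() {
        rest
    } else {
        format!("<mark>{}</mark>{}", matched, rest)
    }
}
```

Working in characters rather than bytes keeps the tag boundary from landing inside a multi-byte character such as `é`.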
Tamo
98a365aaae
store the geopoint in three dimensions 2021-12-14 12:21:24 +01:00
Clément Renault
25faef67d0
Remove the database setup in the filter_depth test 2021-12-09 11:57:53 +01:00
Clément Renault
65519bc04b
Test that empty filters return a None 2021-12-09 11:57:53 +01:00
Clément Renault
ef59762d8e
Prefer returning None instead of the Empty Filter state 2021-12-09 11:57:52 +01:00
Clément Renault
ee856a7a46
Limit the max filter depth to 2000 2021-12-07 17:36:45 +01:00
Clément Renault
32bd9f091f
Detect the filters that are too deep and return an error 2021-12-07 17:20:11 +01:00
Clément Renault
90f49eab6d
Check the filter max depth limit and reject the invalid ones 2021-12-07 16:32:48 +01:00
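
The depth check introduced by these three commits could look roughly like the sketch below. Only the limit of 2000 comes from the log; the `Filter` enum, the `FilterTooDeep` error, and `check_depth` are hypothetical names, and the real parser types may differ.

```rust
/// Maximum nesting depth, taken from the commit message above.
const MAX_FILTER_DEPTH: usize = 2000;

/// Hypothetical, simplified filter tree.
enum Filter {
    Condition(String),
    And(Box<Filter>, Box<Filter>),
    Or(Box<Filter>, Box<Filter>),
}

/// Hypothetical error returned when a filter nests too deeply.
#[derive(Debug, PartialEq)]
struct FilterTooDeep;

impl Filter {
    /// Fails as soon as the nesting exceeds `limit` levels, so the
    /// recursion itself never descends more than `limit` frames.
    fn check_depth(&self, limit: usize) -> Result<(), FilterTooDeep> {
        if limit == 0 {
            return Err(FilterTooDeep);
        }
        match self {
            Filter::Condition(_) => Ok(()),
            Filter::And(lhs, rhs) | Filter::Or(lhs, rhs) => {
                lhs.check_depth(limit - 1)?;
                rhs.check_depth(limit - 1)
            }
        }
    }
}
```

A caller would run `filter.check_depth(MAX_FILTER_DEPTH)` before evaluating the filter and reject it on `Err`. Bailing out early bounds the stack usage of the check even for adversarially deep input.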
Marin Postma
6eb47ab792
remove update_id in UpdateBuilder 2021-11-16 13:07:04 +01:00
Irevoire
0ea0146e04
implement deref &str on the tokens 2021-11-09 11:34:10 +01:00
Tamo
7483c7513a
fix the filterable fields 2021-11-07 01:52:19 +01:00
Tamo
e5af3ac65c
rename the filter_condition.rs to filter.rs 2021-11-06 16:37:55 +01:00
Tamo
6831c23449
merge with main 2021-11-06 16:34:30 +01:00
Tamo
b249989bef
fix most of the tests 2021-11-06 01:32:12 +01:00
Tamo
27a6a26b4b
make the parse function part of the filter_parser 2021-11-05 10:46:54 +01:00
Tamo
76d961cc77
implements the last errors 2021-11-04 17:42:06 +01:00
Tamo
8234f9fdf3
recreate most filter error except for the geosearch 2021-11-04 17:24:55 +01:00
Tamo
07a5ffb04c
update http-ui 2021-11-04 15:52:22 +01:00
Tamo
a58bc5bebb
update milli with the new parser_filter 2021-11-04 15:02:36 +01:00
Tamo
76a2adb7c3
re-enable the tests in the parser and start the creation of an error type 2021-11-02 17:35:17 +01:00
many
ed6db19681
Fix PR comments 2021-10-28 11:18:32 +02:00
many
2be755ce75
Lower error check, already checked in meilisearch 2021-10-27 19:50:41 +02:00
many
3599df77f0
Change some error messages 2021-10-27 19:33:01 +02:00
bors[bot]
d7943fe225
Merge #402
402: Optimize document transform r=MarinPostma a=MarinPostma

This PR optimizes the transform of document additions into the obkv format. Instead of accepting any serializable object, we treat JSON and CSV specifically:
- For JSON, we build a serde `Visitor` that transforms the JSON straight into obkv without an intermediate representation.
- For CSV, we write the lines directly into the obkv, applying other optimizations as well.

Co-authored-by: marin postma <postma.marin@protonmail.com>
2021-10-26 09:55:28 +00:00
Clémentine Urquizar
208903ddde
Revert "Replacing pest with nom " 2021-10-25 11:58:00 +02:00
marin postma
2e62925a6e
fix tests 2021-10-25 10:26:42 +02:00
marin postma
8d70b01714
optimize document deserialization 2021-10-25 10:26:42 +02:00
Tamo
1327807caa
add some error messages 2021-10-22 19:00:33 +02:00
Tamo
c8d03046bf
add a check on the fid in the geosearch 2021-10-22 18:08:18 +02:00
Tamo
3942b3732f
re-implement the geosearch 2021-10-22 18:03:39 +02:00
Tamo
7cd9109e2f
lowercase value extracted from Token 2021-10-22 17:50:15 +02:00
Tamo
e25ca9776f
start updating the exposed function to make other modules happy 2021-10-22 17:23:22 +02:00
Tamo
6c9165b6a8
provide a helper to parse the token without handling the errors 2021-10-22 16:52:13 +02:00
Tamo
efb2f8b325
convert the errors 2021-10-22 16:38:35 +02:00
Tamo
c27870e765
integrate a first version without any error handling 2021-10-22 14:33:18 +02:00
Tamo
01dedde1c9
update some names and move some parser out of the lib.rs 2021-10-22 01:59:38 +02:00
Tamo
c634d43ac5
add a simple test on the filters with an integer 2021-10-21 17:10:27 +02:00
Tamo
6c15f50899
rewrite the parser logic 2021-10-21 16:45:42 +02:00
Tamo
e1d81342cf
add test on the `or` and `and` operators 2021-10-21 13:01:25 +02:00
Tamo
423baac08b
fix the tests 2021-10-21 12:45:40 +02:00
Tamo
36281a653f
write all the simple tests 2021-10-21 12:40:11 +02:00