Commit Graph

173 Commits

Author SHA1 Message Date
bors[bot] 414b3fae89
Merge #3571
3571: Introduce two filters to select documents with `null` and empty fields r=irevoire a=Kerollmops

# Pull Request

## Related issue
This PR implements the `X IS NULL`, `X IS NOT NULL`, `X IS EMPTY`, `X IS NOT EMPTY` filters that [this comment](https://github.com/meilisearch/product/discussions/539#discussioncomment-5115884) is describing in a very detailed manner.

## What does this PR do?

### `IS NULL` and `IS NOT NULL`

This PR will be exposed as a prototype for now. Below is the copy/pasted version of a spec that defines this filter.

- `IS NULL` matches fields that `EXISTS` AND `= IS NULL`
- `IS NOT NULL` matches fields that `NOT EXISTS` OR `!= IS NULL`

1. `{"name": "A", "price": null}`
2. `{"name": "A", "price": 10}`
3. `{"name": "A"}`

`price IS NULL` would match 1
`price IS NOT NULL` or `NOT price IS NULL` would match 2,3
`price EXISTS` would match 1, 2
`price NOT EXISTS` or `NOT price EXISTS` would match 3

common query : `(price EXISTS) AND (price IS NOT NULL)` would match 2

### `IS EMPTY` and `IS NOT EMPTY`

- `IS EMPTY` matches Array `[]`, Object `{}`, or String `""` fields that `EXISTS` and are empty
- `IS NOT EMPTY` matches fields that `NOT EXISTS` OR are not empty.

1. `{"name": "A", "tags": null}`
2. `{"name": "A", "tags": [null]}`
3. `{"name": "A", "tags": []}`
4. `{"name": "A", "tags": ["hello","world"]}`
5. `{"name": "A", "tags": [""]}`
6. `{"name": "A"}`
7. `{"name": "A", "tags": {}}`
8. `{"name": "A", "tags": {"t1":"v1"}}`
9. `{"name": "A", "tags": {"t1":""}}`
10. `{"name": "A", "tags": ""}`

`tags IS EMPTY` would match 3,7,10
`tags IS NOT EMPTY` or `NOT tags IS EMPTY` would match 1,2,4,5,6,8,9
`tags IS NULL` would match 1
`tags IS NOT NULL` or `NOT tags IS NULL` would match 2,3,4,5,6,7,8,9,10
`tags EXISTS` would match 1,2,3,4,5,7,8,9,10
`tags NOT EXISTS` or `NOT tags EXISTS` would match 6

common query : `(tags EXISTS) AND (tags IS NOT NULL) AND (tags IS NOT EMPTY)` would match 2,4,5,8,9

## What should the reviewer do?

- Check that I tested the filters
- Check that I deleted the ids of the documents when deleting documents


Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-04-27 13:14:00 +00:00
Clément Renault cfd1b2cc97
Fix the clippy warnings 2023-04-25 16:40:32 +02:00
Clément Renault 64571c8288
Improve the testing of the filters 2023-03-15 14:57:17 +01:00
Clément Renault df48ac8803
Add one more test for the NULL operator 2023-03-09 13:53:37 +01:00
ManyTheFish 8aa808d51b Merge branch 'main' into enhance-language-detection 2023-02-20 18:14:34 +01:00
Tamo 74dcfe9676
Fix a bug when you update a document that was already present in the db, deleted and then inserted again in the same transform 2023-02-14 19:09:40 +01:00
Tamo 93db755d57
add a test to ensure we handle correctly a deletion of multiple time the same document 2023-02-08 21:03:34 +01:00
Tamo 421a9cf05e
provide a new method on the transform to remove documents 2023-02-08 16:06:09 +01:00
Kerollmops fbec48f56e
Merge remote-tracking branch 'milli/main' into bring-v1-changes 2023-02-06 16:48:10 +01:00
f3r10 7681be5367 Format code 2023-01-31 11:28:05 +01:00
f3r10 50bc156257 Fix tests 2023-01-31 11:28:05 +01:00
f3r10 a27f329e3a Add tests for checking that detected script and language associated with document(s) were stored during indexing 2023-01-31 11:28:05 +01:00
Philipp Ahlner f5ca421227
Superfluous test removed 2023-01-19 15:39:21 +01:00
Clément Renault 1b78231e18
Make clippy happy 2023-01-17 18:25:54 +01:00
bors[bot] 6a10e85707
Merge #736
736: Update charabia r=curquiza a=ManyTheFish

Update Charabia to the last version.

> We are now Romanizing Chinese characters into Pinyin.
> Note that we keep the accent because they are in fact never typed directly by the end-user, moreover, changing an accent leads to a different Chinese character, and I don't have sufficient knowledge to forecast the impact of removing accents in this context.

Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-01-03 15:44:41 +00:00
Louis Dureuil 4b166bea2b
Add primary_key_inference test 2022-12-21 15:13:38 +01:00
Louis Dureuil 5943100754
Fix existing tests 2022-12-21 15:13:38 +01:00
Louis Dureuil fc7618d49b
Add DeletionStrategy 2022-12-19 09:49:58 +01:00
ManyTheFish 7f88c4ff2f Fix #1714 test 2022-12-15 18:22:28 +01:00
Loïc Lecrenier be3b00350c Apply review suggestions: naming and documentation 2022-12-13 10:15:22 +01:00
Loïc Lecrenier e3ee553dcc Remove soft deleted ids from ExternalDocumentIds during document import
If the document import replaces a document using hard deletion
2022-12-12 14:16:09 +01:00
Loïc Lecrenier f2cf981641 Add more tests and allow disabling of soft-deletion outside of tests
Also allow disabling soft-deletion in the IndexDocumentsConfig
2022-12-05 10:51:01 +01:00
Loïc Lecrenier 990a861241 Add test for indexing a document with a long facet value 2022-11-17 11:29:42 +01:00
unvalley c7322f704c Fix cargo clippy errors
Dont apply clippy for tests for now

Fix clippy warnings of filter-parser package

parent 8352febd646ec4bcf56a44161e5c4dce0e55111f
author unvalley <38400669+unvalley@users.noreply.github.com> 1666325847 +0900
committer unvalley <kirohi.code@gmail.com> 1666791316 +0900

Update .github/workflows/rust.yml

Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>

Allow clippy lint too_many_argments

Allow clippy lint needless_collect

Allow clippy lint too_many_arguments and type_complexity

Fix for clippy warnings comparison_chains

Fix for clippy warnings vec_init_then_push

Allow clippy lint should_implement_trait

Allow clippy lint drop_non_drop

Fix lifetime clipy warnings in filter-paprser

Execute cargo fmt

Fix clippy remaining warnings

Fix clippy remaining warnings again and allow lint on each place
2022-10-27 01:04:23 +09:00
unvalley 811f156031 Execute cargo clippy --fix 2022-10-27 01:00:00 +09:00
Loïc Lecrenier 2741756248 Merge remote-tracking branch 'origin/main' into facet-levels-refactor 2022-10-26 14:03:23 +02:00
Loïc Lecrenier b7f2428961 Fix formatting and warning after rebasing from main 2022-10-26 13:49:33 +02:00
Loïc Lecrenier 27454e9828 Document and refine facet indexing algorithms 2022-10-26 13:47:04 +02:00
Loïc Lecrenier 9026867d17 Give same interface to bulk and incremental facet indexing types
+ cargo fmt, oops, sorry for the bad history :(
2022-10-26 13:47:04 +02:00
Loïc Lecrenier 61252248fb Fix some facet indexing bugs 2022-10-26 13:47:04 +02:00
Loïc Lecrenier 85824ee203 Try to make facet indexing incremental 2022-10-26 13:47:04 +02:00
Loïc Lecrenier e8a156d682 Reorganise facets database indexing code 2022-10-26 13:46:46 +02:00
Loïc Lecrenier 7913d6365c Update Facets indexing to be compatible with new database structure 2022-10-26 13:46:14 +02:00
bors[bot] c8f16530d5
Merge #616
616: Introduce an indexation abortion function when indexing documents r=Kerollmops a=Kerollmops



Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-10-26 11:41:18 +00:00
Loïc Lecrenier d76d0cb1bf Merge branch 'main' into word-pair-proximity-docids-refactor 2022-10-24 15:23:00 +02:00
Loïc Lecrenier a983129613 Apply suggestions from code review 2022-10-20 09:49:37 +02:00
Loïc Lecrenier 264a04922d Add prefix_word_pair_proximity database
Similar to the word_prefix_pair_proximity one but instead the keys are:
(proximity, prefix, word2)
2022-10-18 10:37:34 +02:00
Kerollmops 6603437cb1
Introduce an indexation abortion function when indexing documents 2022-10-17 17:28:03 +02:00
Ewan Higgs beb987d3d1 Fixing piles of clippy errors.
Most of these are calling clone when the struct supports Copy.

Many are using & and &mut on `self` when the function they are called
from already has an immutable or mutable borrow so this isn't needed.

I tried to stay away from actual changes or places where I'd have to
name fresh variables.
2022-10-13 22:02:54 +02:00
bors[bot] 15d478cf4d
Merge #635
635: Use an unstable algorithm for `grenad::Sorter` when possible r=Kerollmops a=loiclec

# Pull Request
## What does this PR do?

Use an unstable algorithm to sort the internal vector used by `grenad::Sorter` whenever possible to speed up indexing.

In practice, every time the merge function creates a `RoaringBitmap`, we use an unstable sort. For every other merge function, such as `keep_first`, `keep_last`, etc., a stable sort is used.


Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2022-09-14 12:00:52 +00:00
Loïc Lecrenier 3794962330 Use an unstable algorithm for grenad::Sorter when possible 2022-09-13 14:49:53 +02:00
Kerollmops d4d7c9d577
We avoid skipping errors in the indexing pipeline 2022-09-13 14:03:00 +02:00
Kerollmops c83c3cd796
Add a test to make sure that long words are correctly skipped 2022-09-07 14:12:36 +02:00
ManyTheFish 5391e3842c replace optional_words by term_matching_strategy 2022-08-22 17:47:19 +02:00
ManyTheFish 9640976c79 Rename TermMatchingPolicies 2022-08-18 17:36:08 +02:00
Irevoire e96b852107
bump heed 2022-08-17 17:05:50 +02:00
ManyTheFish e9e2349ce6 Fix typo in comment 2022-08-17 15:09:48 +02:00
ManyTheFish 2668f841d1 Fix update indexing 2022-08-17 15:03:37 +02:00
ManyTheFish 7384650d85 Update test to showcase the bug 2022-08-17 15:03:08 +02:00
Loïc Lecrenier 58cb1c1bda Simplify unit tests in facet/filter.rs 2022-08-04 12:03:44 +02:00