Commit Graph

22 Commits

Author SHA1 Message Date
bors[bot] 414b3fae89
Merge #3571
3571: Introduce two filters to select documents with `null` and empty fields r=irevoire a=Kerollmops

# Pull Request

## Related issue
This PR implements the `X IS NULL`, `X IS NOT NULL`, `X IS EMPTY`, `X IS NOT EMPTY` filters that [this comment](https://github.com/meilisearch/product/discussions/539#discussioncomment-5115884) is describing in a very detailed manner.

## What does this PR do?

### `IS NULL` and `IS NOT NULL`

This PR will be exposed as a prototype for now. Below is the copy/pasted version of a spec that defines this filter.

- `IS NULL` matches fields that `EXISTS` AND `= IS NULL`
- `IS NOT NULL` matches fields that `NOT EXISTS` OR `!= IS NULL`

1. `{"name": "A", "price": null}`
2. `{"name": "A", "price": 10}`
3. `{"name": "A"}`

`price IS NULL` would match 1
`price IS NOT NULL` or `NOT price IS NULL` would match 2,3
`price EXISTS` would match 1, 2
`price NOT EXISTS` or `NOT price EXISTS` would match 3

common query : `(price EXISTS) AND (price IS NOT NULL)` would match 2

### `IS EMPTY` and `IS NOT EMPTY`

- `IS EMPTY` matches Array `[]`, Object `{}`, or String `""` fields that `EXISTS` and are empty
- `IS NOT EMPTY` matches fields that `NOT EXISTS` OR are not empty.

1. `{"name": "A", "tags": null}`
2. `{"name": "A", "tags": [null]}`
3. `{"name": "A", "tags": []}`
4. `{"name": "A", "tags": ["hello","world"]}`
5. `{"name": "A", "tags": [""]}`
6. `{"name": "A"}`
7. `{"name": "A", "tags": {}}`
8. `{"name": "A", "tags": {"t1":"v1"}}`
9. `{"name": "A", "tags": {"t1":""}}`
10. `{"name": "A", "tags": ""}`

`tags IS EMPTY` would match 3,7,10
`tags IS NOT EMPTY` or `NOT tags IS EMPTY` would match 1,2,4,5,6,8,9
`tags IS NULL` would match 1
`tags IS NOT NULL` or `NOT tags IS NULL` would match 2,3,4,5,6,7,8,9,10
`tags EXISTS` would match 1,2,3,4,5,7,8,9,10
`tags NOT EXISTS` or `NOT tags EXISTS` would match 6

common query : `(tags EXISTS) AND (tags IS NOT NULL) AND (tags IS NOT EMPTY)` would match 2,4,5,8,9

## What should the reviewer do?

- Check that I tested the filters
- Check that I deleted the ids of the documents when deleting documents


Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-04-27 13:14:00 +00:00
ManyTheFish efea1e5837 Fix facet normalization 2023-03-29 12:02:24 +02:00
Clément Renault ea016d97af
Implementing an IS EMPTY filter 2023-03-15 14:12:34 +01:00
Clément Renault 0ad53784e7
Create a new struct to reduce the type complexity 2023-03-09 13:21:21 +01:00
Clément Renault e106b16148
Fix a typo in a variable
Co-authored-by: Louis Dureuil <louis@meilisearch.com>

aaa
2023-03-09 13:08:02 +01:00
Clément Renault 7dc04747fd
Make clippy happy 2023-03-08 17:37:08 +01:00
Clément Renault 19ab4d1a15
Classify the NULL fields values in the facet extractor 2023-03-08 16:49:31 +01:00
ManyTheFish d1fc42b53a Use compatibility decomposition normalizer in facets 2023-01-18 15:02:13 +01:00
Loïc Lecrenier 8d0ace2d64 Avoid creating a MatchingWord for words that exceed the length limit 2022-11-28 10:20:13 +01:00
Loïc Lecrenier ac3baafbe8 Truncate facet values that are too long before indexing them 2022-11-17 11:29:42 +01:00
Ewan Higgs 6b2fe94192 Fixes for clippy bringing us down to 18 remaining issues.
This brings us a step closer to enforcing clippy on each build.
2022-10-25 20:49:02 +02:00
Loïc Lecrenier 3794962330 Use an unstable algorithm for grenad::Sorter when possible 2022-09-13 14:49:53 +02:00
Loïc Lecrenier 1506683705 Avoid using too much memory when indexing facet-exists-docids 2022-07-19 14:42:35 +02:00
Loïc Lecrenier aed8c69bcb Refactor indexation of the "facet-id-exists-docids" database
The idea is to directly create a sorted and merged list of bitmaps
in the form of a BTreeMap<FieldId, RoaringBitmap> instead of creating
a grenad::Reader where the keys are field_id and the values are docids.

Then we send that BTreeMap to the thing that handles TypedChunks, which
inserts its content into the database.
2022-07-19 10:07:33 +02:00
Loïc Lecrenier 80b962b4f4 Run cargo fmt 2022-07-19 10:07:33 +02:00
Loïc Lecrenier 30bd4db0fc Simplify indexing task for facet_exists_docids database 2022-07-19 10:07:33 +02:00
Loïc Lecrenier 453d593ce8 Add a database containing the docids where each field exists 2022-07-19 10:07:33 +02:00
Clément Renault f367cc2e75
Finally bump grenad to v0.4.1 2022-02-16 15:28:48 +01:00
many db0c681bae
Fix Pr comments 2021-09-02 15:17:52 +02:00
many 4860fd4529
Ignore empty facet values 2021-09-01 16:48:40 +02:00
many fc7cc770d4
Add logging timers 2021-09-01 16:48:40 +02:00
many 1d314328f0
Plug new indexer 2021-09-01 16:48:36 +02:00