Clément Renault
7ac2f1489d
Extract the vectors from the documents
2023-06-27 12:32:37 +02:00
Clément Renault
34349faeae
Create a new _vector extractor
2023-06-27 12:32:37 +02:00
Louis Dureuil
9f37b61666
DB BREAKING: raise limit of word count from 10 to 30.
2023-06-08 12:07:12 +02:00
Louis Dureuil
c15c076da9
DB BREAKING: Count the number of words in field_id_word_count_docids
2023-06-08 12:07:11 +02:00
Loïc Lecrenier
8628a0c856
Remove docid_word_positions_db + fix deletion bug
...
That would happen when a word was deleted from all exact attributes
but not all regular attributes.
2023-06-07 10:52:50 +02:00
Louis Dureuil
90bc230820
Merge remote-tracking branch 'origin/main' into search-refactor
...
Conflicts | resolution
----------|-----------
Cargo.lock | added mimalloc
Cargo.toml | took origin/main version
milli/src/search/criteria/exactness.rs | deleted after checking it was only clippy changes
milli/src/search/query_tree.rs | deleted after checking it was only clippy changes
2023-05-03 12:19:06 +02:00
bors[bot]
414b3fae89
Merge #3571
...
3571: Introduce two filters to select documents with `null` and empty fields r=irevoire a=Kerollmops
# Pull Request
## Related issue
This PR implements the `X IS NULL`, `X IS NOT NULL`, `X IS EMPTY`, `X IS NOT EMPTY` filters that [this comment](https://github.com/meilisearch/product/discussions/539#discussioncomment-5115884 ) is describing in a very detailed manner.
## What does this PR do?
### `IS NULL` and `IS NOT NULL`
This PR will be exposed as a prototype for now. Below is the copy/pasted version of a spec that defines this filter.
- `IS NULL` matches fields that `EXISTS` AND `= IS NULL`
- `IS NOT NULL` matches fields that `NOT EXISTS` OR `!= IS NULL`
1. `{"name": "A", "price": null}`
2. `{"name": "A", "price": 10}`
3. `{"name": "A"}`
`price IS NULL` would match 1
`price IS NOT NULL` or `NOT price IS NULL` would match 2,3
`price EXISTS` would match 1, 2
`price NOT EXISTS` or `NOT price EXISTS` would match 3
common query : `(price EXISTS) AND (price IS NOT NULL)` would match 2
### `IS EMPTY` and `IS NOT EMPTY`
- `IS EMPTY` matches Array `[]`, Object `{}`, or String `""` fields that `EXISTS` and are empty
- `IS NOT EMPTY` matches fields that `NOT EXISTS` OR are not empty.
1. `{"name": "A", "tags": null}`
2. `{"name": "A", "tags": [null]}`
3. `{"name": "A", "tags": []}`
4. `{"name": "A", "tags": ["hello","world"]}`
5. `{"name": "A", "tags": [""]}`
6. `{"name": "A"}`
7. `{"name": "A", "tags": {}}`
8. `{"name": "A", "tags": {"t1":"v1"}}`
9. `{"name": "A", "tags": {"t1":""}}`
10. `{"name": "A", "tags": ""}`
`tags IS EMPTY` would match 3,7,10
`tags IS NOT EMPTY` or `NOT tags IS EMPTY` would match 1,2,4,5,6,8,9
`tags IS NULL` would match 1
`tags IS NOT NULL` or `NOT tags IS NULL` would match 2,3,4,5,6,7,8,9,10
`tags EXISTS` would match 1,2,3,4,5,7,8,9,10
`tags NOT EXISTS` or `NOT tags EXISTS` would match 6
common query : `(tags EXISTS) AND (tags IS NOT NULL) AND (tags IS NOT EMPTY)` would match 2,4,5,8,9
## What should the reviewer do?
- Check that I tested the filters
- Check that I deleted the ids of the documents when deleting documents
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-04-27 13:14:00 +00:00
Loïc Lecrenier
84d9c731f8
Fix bug in encoding of word_position_docids and word_fid_docids
2023-04-24 09:59:30 +02:00
Loïc Lecrenier
a81165f0d8
Merge remote-tracking branch 'origin/main' into search-refactor
2023-04-07 10:15:55 +02:00
Loïc Lecrenier
130d2061bd
Fix indexing of word_position_docid and fid
2023-04-06 17:50:39 +02:00
Louis Dureuil
66ddee4390
Fix word_position_docids indexing
2023-04-06 17:50:39 +02:00
Louis Dureuil
e58426109a
Fix panics and issues in exactness graph ranking rule
2023-04-06 17:50:39 +02:00
Louis Dureuil
996619b22a
Increase position by 8 on hard separator when building query terms
2023-04-06 17:50:39 +02:00
ManyTheFish
efea1e5837
Fix facet normalization
2023-03-29 12:02:24 +02:00
Loïc Lecrenier
9b2653427d
Split position DB into fid and relative position DB
2023-03-23 09:22:01 +01:00
Clément Renault
1a9c58a7ab
Fix a bug with the new flattening rules
2023-03-15 16:56:44 +01:00
Clément Renault
ea016d97af
Implementing an IS EMPTY filter
2023-03-15 14:12:34 +01:00
ManyTheFish
2f8eb4f54a
last PR fixes
2023-03-09 15:34:36 +01:00
Clément Renault
0ad53784e7
Create a new struct to reduce the type complexity
2023-03-09 13:21:21 +01:00
Clément Renault
e106b16148
Fix a typo in a variable
...
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
aaa
2023-03-09 13:08:02 +01:00
ManyTheFish
5deea631ea
fix clippy too many arguments
2023-03-09 11:19:13 +01:00
ManyTheFish
b4b859ec8c
Fix typos
2023-03-09 10:58:35 +01:00
Clément Renault
7dc04747fd
Make clippy happy
2023-03-08 17:37:08 +01:00
Clément Renault
43ff236df8
Write the NULL facet values in the database
2023-03-08 16:49:53 +01:00
Clément Renault
19ab4d1a15
Classify the NULL fields values in the facet extractor
2023-03-08 16:49:31 +01:00
ManyTheFish
24c0775c67
Change indexing threshold
2023-03-08 12:36:04 +01:00
ManyTheFish
3092cf0448
Fix clippy errors
2023-03-08 10:53:42 +01:00
ManyTheFish
da48506f15
Rerun extraction when language detection might have failed
2023-03-07 18:35:26 +01:00
ManyTheFish
bbecab8948
fix clippy
2023-02-21 10:18:44 +01:00
ManyTheFish
8aa808d51b
Merge branch 'main' into enhance-language-detection
2023-02-20 18:14:34 +01:00
Tamo
18796d6e6a
Consider null as a valid geo object
2023-02-20 13:45:51 +01:00
f3r10
d8207356f4
Skip script,language insertion if language is undetected
2023-01-31 11:28:05 +01:00
f3r10
fd60a39f1c
Format code
2023-01-31 11:28:05 +01:00
f3r10
d97fb6117e
Extract and index data
2023-01-31 11:28:05 +01:00
ManyTheFish
d1fc42b53a
Use compatibility decomposition normalizer in facets
2023-01-18 15:02:13 +01:00
Loïc Lecrenier
8d0ace2d64
Avoid creating a MatchingWord for words that exceed the length limit
2022-11-28 10:20:13 +01:00
Loïc Lecrenier
ac3baafbe8
Truncate facet values that are too long before indexing them
2022-11-17 11:29:42 +01:00
Loïc Lecrenier
d95d02cb8a
Fix Facet Indexing bugs
...
1. Handle keys with variable length correctly
This fixes https://github.com/meilisearch/meilisearch/issues/3042 and
is easily reproducible with the updated fuzz tests, which now generate
keys with variable lengths.
2. Prevent adding facets to the database if their encoded value does
not satisfy `valid_lmdb_key`.
This fixes an indexing failure when a document had a filterable
attribute containing a value whose length is higher than ~500 bytes.
2022-11-17 11:29:42 +01:00
unvalley
70465aa5ce
Execute cargo fmt
2022-11-04 08:59:58 +09:00
unvalley
3009981d31
Fix clippy errors
...
Add clippy job
Add clippy job to CI
2022-11-04 08:58:14 +09:00
unvalley
c7322f704c
Fix cargo clippy errors
...
Dont apply clippy for tests for now
Fix clippy warnings of filter-parser package
parent 8352febd646ec4bcf56a44161e5c4dce0e55111f
author unvalley <38400669+unvalley@users.noreply.github.com> 1666325847 +0900
committer unvalley <kirohi.code@gmail.com> 1666791316 +0900
Update .github/workflows/rust.yml
Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
Allow clippy lint too_many_argments
Allow clippy lint needless_collect
Allow clippy lint too_many_arguments and type_complexity
Fix for clippy warnings comparison_chains
Fix for clippy warnings vec_init_then_push
Allow clippy lint should_implement_trait
Allow clippy lint drop_non_drop
Fix lifetime clipy warnings in filter-paprser
Execute cargo fmt
Fix clippy remaining warnings
Fix clippy remaining warnings again and allow lint on each place
2022-10-27 01:04:23 +09:00
Loïc Lecrenier
54c0cf93fe
Merge remote-tracking branch 'origin/main' into facet-levels-refactor
2022-10-26 15:13:34 +02:00
Loïc Lecrenier
a034a1e628
Move StrRefCodec and ByteSliceRefCodec to their own files
2022-10-26 13:47:46 +02:00
Loïc Lecrenier
51961e1064
Polish some details
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
b1ab09196c
Remove outdated TODOs
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
9026867d17
Give same interface to bulk and incremental facet indexing types
...
+ cargo fmt, oops, sorry for the bad history :(
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
485a72306d
Refactor facet-related codecs
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
afdf87f6f7
Fix bugs in asc/desc criterion and facet indexing
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
a7201ece04
cargo fmt
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
61252248fb
Fix some facet indexing bugs
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
85824ee203
Try to make facet indexing incremental
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
39a4a0a362
Reintroduce filter range search and facet extractors
2022-10-26 13:46:14 +02:00
Loïc Lecrenier
c3f49f766d
Prepare refactor of facets database
...
Prepare refactor of facets database
2022-10-26 13:46:14 +02:00
Ewan Higgs
6b2fe94192
Fixes for clippy bringing us down to 18 remaining issues.
...
This brings us a step closer to enforcing clippy on each build.
2022-10-25 20:49:02 +02:00
Loïc Lecrenier
d76d0cb1bf
Merge branch 'main' into word-pair-proximity-docids-refactor
2022-10-24 15:23:00 +02:00
Loïc Lecrenier
a7de4f5b85
Don't add swapped word pairs to the word_pair_proximity_docids db
2022-10-18 10:37:34 +02:00
Loïc Lecrenier
bdeb47305e
Change encoding of word_pair_proximity DB to (proximity, word1, word2)
...
Same for word_prefix_pair_proximity
2022-10-18 10:37:34 +02:00
Ewan Higgs
beb987d3d1
Fixing piles of clippy errors.
...
Most of these are calling clone when the struct supports Copy.
Many are using & and &mut on `self` when the function they are called
from already has an immutable or mutable borrow so this isn't needed.
I tried to stay away from actual changes or places where I'd have to
name fresh variables.
2022-10-13 22:02:54 +02:00
msvaljek
762e320c35
Add proximity calculation for the same word
2022-10-07 12:59:12 +02:00
vishalsodani
00c02d00f3
Add missing logging timer to extractors
2022-09-30 22:17:06 +05:30
Loïc Lecrenier
3794962330
Use an unstable algorithm for grenad::Sorter when possible
2022-09-13 14:49:53 +02:00
Kerollmops
fe3973a51c
Make sure that long words are correctly skipped
2022-09-07 15:03:32 +02:00
Loïc Lecrenier
306593144d
Refactor word prefix pair proximity indexation
2022-08-17 11:59:00 +02:00
Loïc Lecrenier
07003704a8
Merge branch 'filter/field-exist'
2022-07-21 14:51:41 +02:00
Loïc Lecrenier
1506683705
Avoid using too much memory when indexing facet-exists-docids
2022-07-19 14:42:35 +02:00
Loïc Lecrenier
aed8c69bcb
Refactor indexation of the "facet-id-exists-docids" database
...
The idea is to directly create a sorted and merged list of bitmaps
in the form of a BTreeMap<FieldId, RoaringBitmap> instead of creating
a grenad::Reader where the keys are field_id and the values are docids.
Then we send that BTreeMap to the thing that handles TypedChunks, which
inserts its content into the database.
2022-07-19 10:07:33 +02:00
Loïc Lecrenier
80b962b4f4
Run cargo fmt
2022-07-19 10:07:33 +02:00
Loïc Lecrenier
30bd4db0fc
Simplify indexing task for facet_exists_docids database
2022-07-19 10:07:33 +02:00
Loïc Lecrenier
392472f4bb
Apply suggestions from code review
...
Co-authored-by: Tamo <tamo@meilisearch.com>
2022-07-19 10:07:33 +02:00
Loïc Lecrenier
453d593ce8
Add a database containing the docids where each field exists
2022-07-19 10:07:33 +02:00
Kerollmops
2eec290424
Check the validity of the latitute and longitude numbers
2022-07-12 15:14:06 +02:00
Kerollmops
d1a4da9812
Generate a real UUIDv4 when ids are auto-generated
2022-07-12 15:14:06 +02:00
Kerollmops
fcfc4caf8c
Move the Object type in the lib.rs file and use it everywhere
2022-07-12 14:55:51 +02:00
Kerollmops
0146175fe6
Introduce the validate_documents_batch function
2022-07-12 14:55:51 +02:00
ManyTheFish
86ac8568e6
Use Charabia in milli
2022-06-02 16:59:11 +02:00
Tamo
0af399a6d7
fix the mixed dataset geosearch indexing bug
2022-05-16 17:37:45 +02:00
Tamo
c55368ddd4
apply code suggestion
...
Co-authored-by: Kerollmops <kero@meilisearch.com>
2022-05-04 14:11:03 +02:00
Tamo
3cb1f6d0a1
improve geosearch error messages
2022-05-02 19:20:47 +02:00
Irevoire
4f3ce6d9cd
nested fields
2022-04-07 16:58:46 +02:00
ad hoc
201fea0fda
limit extract_word_docids memory usage
2022-04-05 14:14:15 +02:00
ad hoc
b85cd4983e
remove field_id_from_position
2022-04-05 09:50:34 +02:00
ad hoc
b7694c34f5
remove println
2022-04-04 21:00:07 +02:00
ad hoc
6cabd47c32
fix typo in comment
2022-04-04 20:59:20 +02:00
ad hoc
6b2c2509b2
fix bug in exact search
2022-04-04 20:54:03 +02:00
ad hoc
8d46a5b0b5
extract exact word docids
2022-04-04 20:54:02 +02:00
ad hoc
0a77be4ec0
introduce exact_word_docids db
2022-04-04 20:54:02 +02:00
ad hoc
5f9f82757d
refactor spawn_extraction_task
2022-04-04 20:54:02 +02:00
Clément Renault
ff8d7a810d
Change the behavior of the as_cloneable_grenad by taking a ref
2022-02-16 15:40:08 +01:00
Clément Renault
f367cc2e75
Finally bump grenad to v0.4.1
2022-02-16 15:28:48 +01:00
many
8970246bc4
Sort positions before iterating over them during word pair proximity extraction
2021-11-22 18:16:54 +01:00
many
c5a6075484
Make max_position_per_attributes changable
2021-10-12 10:10:50 +02:00
many
360c5ff3df
Remove limit of 1000 position per attribute
...
Instead of using an arbitrary limit we encode the absolute position in a u32
using one strong u16 for the field id and a weak u16 for the relative position in the attribute.
2021-10-12 10:10:50 +02:00
many
3296bb243c
Simplify word level position DB into a word position DB
2021-10-05 12:15:02 +02:00
Irevoire
a84f3a8b31
Apply suggestions from code review
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-09 15:09:35 +02:00
Tamo
bad8ea47d5
edit the two lasts TODO comments
2021-09-08 18:24:09 +02:00
Tamo
bd4c248292
improve the error handling in general and introduce the concept of reserved keywords
2021-09-08 18:24:09 +02:00
Tamo
f73273d71c
only call the extractor if needed
2021-09-08 17:51:08 +02:00
Irevoire
a21c854790
handle errors
2021-09-08 17:51:07 +02:00
Irevoire
70ab2c37c5
remove multiple bugs
2021-09-08 17:51:07 +02:00
Irevoire
b4b6ba6d82
rename all the ’long’ into ’lng’ like written in the specification
2021-09-08 17:51:07 +02:00
Irevoire
44d6b6ae9e
Index the geo points
2021-09-08 17:51:07 +02:00
many
e54280fbfc
Skip empty normalized words
2021-09-08 15:25:23 +02:00
many
db0c681bae
Fix Pr comments
2021-09-02 15:17:52 +02:00
many
4860fd4529
Ignore empty facet values
2021-09-01 16:48:40 +02:00
many
b3a22f31f6
Fix memory consuption in word pair proximity extractor
2021-09-01 16:48:40 +02:00
many
8f702828ca
Ignore errors comming from crossbeam channel senders
2021-09-01 16:48:40 +02:00
many
e09eec37bc
Handle distance addition with hard separators
2021-09-01 16:48:40 +02:00
many
fc7cc770d4
Add logging timers
2021-09-01 16:48:40 +02:00
many
a2f59a28f7
Remove unwrap sending errors in channel
2021-09-01 16:48:40 +02:00
many
2d1727697d
Take stop word in account
2021-09-01 16:48:40 +02:00
many
823da19745
Fix test and use progress callback
2021-09-01 16:48:39 +02:00
many
1d314328f0
Plug new indexer
2021-09-01 16:48:36 +02:00