Commit Graph

197 Commits

Author SHA1 Message Date
Clément Renault
6ae4100f07
Generate the DelAdd for is_null, is_empty, and exists 2023-10-30 11:29:08 +01:00
Clément Renault
0c47defeee
Work on fid docid facet values rewrite 2023-10-30 11:29:06 +01:00
ManyTheFish
313b16bec2
Support diff indexing on extract_docid_word_positions 2023-10-30 11:24:19 +01:00
ManyTheFish
1c5705c164
clean PR warnings 2023-10-30 11:22:05 +01:00
ManyTheFish
66c2c82a18
Split wpp in several sorters 2023-10-30 11:15:02 +01:00
ManyTheFish
28a8d0ccda
Fix word pair proximity 2023-10-30 11:15:02 +01:00
ManyTheFish
96be85396d
Use a vecDeque in wpp database 2023-10-30 11:15:02 +01:00
ManyTheFish
df9e5c8651
Generalize usage of CboRoaringBitmap codec to ease the use 2023-10-30 11:15:02 +01:00
ManyTheFish
b541d48847
Add buffer to the obkv writter 2023-10-30 11:15:02 +01:00
ManyTheFish
8ccf32d1a0
Compute word_fid_docids before word_docids and exact_word_docids 2023-10-30 11:15:02 +01:00
ManyTheFish
11ea5acff9
Fix 2023-10-30 11:13:10 +01:00
ManyTheFish
8d77736a67
Fix fid_word_docids 2023-10-30 11:13:10 +01:00
ManyTheFish
17b647dfe5
Wip 2023-10-30 11:13:08 +01:00
Tamo
d772073dfa use a bufreader everytime there is a grenad<file> 2023-10-10 15:00:30 +02:00
meili-bors[bot]
487d493f49
Merge #4043
4043: Bring back hotfixes from v1.3.3 into v1.4.0 r=Kerollmops a=curquiza



Co-authored-by: curquiza <curquiza@users.noreply.github.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: curquiza <clementine@meilisearch.com>
2023-09-11 12:27:34 +00:00
meili-bors[bot]
256cf33bca
Merge #4039
4039: Fix multiple vectors dimensions r=ManyTheFish a=Kerollmops

This PR fixes #4035, making providing multiple vectors in documents possible. This is fixed by extracting the vectors from the non-flattened version of the documents.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-09-07 09:25:58 +00:00
Kerollmops
679c0b0f97
Extract the vectors from the non-flattened version of the documents 2023-09-06 12:26:00 +02:00
ManyTheFish
66aa6d5871 Ignore tokens with empty normalized value during indexing process 2023-09-05 15:44:14 +02:00
meili-bors[bot]
ccf3ba3f32
Merge #4019
4019: Bringing back changes from `v1.3.2` onto `main` r=irevoire a=Kerollmops



Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: irevoire <irevoire@users.noreply.github.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-08-28 12:14:11 +00:00
Kerollmops
c53841e166
Accept the null JSON value as the value of _vectors 2023-08-14 16:03:55 +02:00
meili-bors[bot]
e4e49e63d0
Merge #3993
3993: Bringing back changes from v1.3.1 to `main` r=irevoire a=curquiza



Co-authored-by: irevoire <irevoire@users.noreply.github.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-08-10 14:30:02 +00:00
ManyTheFish
6b2d671be7 Fix PR comments 2023-08-10 10:44:07 +02:00
Many the fish
43c13faeda
Update milli/src/update/index_documents/extract/extract_docid_word_positions.rs
Co-authored-by: Tamo <tamo@meilisearch.com>
2023-08-10 10:05:03 +02:00
meili-bors[bot]
44c1900f36
Merge #3986
3986: Fix geo bounding box with strings r=ManyTheFish a=irevoire

# Pull Request

When sending a document with one geofield of type string (i.e.: `{ "_geo": { "lat": 12, "lng": "13" }}`), the geobounding box would exclude this document.

This PR fixes this issue by automatically parsing the string value in case we're working on a geofield.

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3973

## What does this PR do?
- Automatically parse the facet value iif we're working on a geofield.
- Make insta works with snapshots in loops or closure executed multiple times. (you may need to update your cli if it panics after this PR: `cargo install cargo-insta`).
- Add one integration test in milli and in meilisearch to ensure it works forever.
- Add three snapshots for the dump that mysteriously disappeared I don't know how


Co-authored-by: Tamo <tamo@meilisearch.com>
2023-08-09 07:58:15 +00:00
ManyTheFish
35758db9ec Truncate the the normalized long facets used in search for facet value 2023-08-08 16:38:30 +02:00
Tamo
9d061cec26 automatically parse the filterable attribute to float if it's a geo field 2023-08-08 16:28:07 +02:00
ManyTheFish
9c485f8563 Make the search and the indexing work 2023-07-24 18:35:20 +02:00
Kerollmops
eef95de30e
First iteration on exposing puffin profiling 2023-07-18 17:38:13 +02:00
ManyTheFish
84845de9ef Update Charabia 2023-06-29 15:56:32 +02:00
Clément Renault
63bfe1cee2
Ignore when there are too many vectors 2023-06-27 12:32:43 +02:00
Kerollmops
ff3664431f
Make rustfmt happy 2023-06-27 12:32:42 +02:00
Kerollmops
531748c536
Return a user error when the _vectors type is invalid 2023-06-27 12:32:41 +02:00
Kerollmops
7aa1275337
Display the _semanticSimilarity even if the _vectors field is not displayed 2023-06-27 12:32:41 +02:00
Kerollmops
ab9f2269aa
Normalize the vectors during indexation and search 2023-06-27 12:32:41 +02:00
Kerollmops
321ec5f3fa
Accept multiple vectors by documents using the _vectors field 2023-06-27 12:32:40 +02:00
Clément Renault
7ac2f1489d
Extract the vectors from the documents 2023-06-27 12:32:37 +02:00
Clément Renault
34349faeae
Create a new _vector extractor 2023-06-27 12:32:37 +02:00
Louis Dureuil
9f37b61666
DB BREAKING: raise limit of word count from 10 to 30. 2023-06-08 12:07:12 +02:00
Louis Dureuil
c15c076da9
DB BREAKING: Count the number of words in field_id_word_count_docids 2023-06-08 12:07:11 +02:00
Loïc Lecrenier
8628a0c856 Remove docid_word_positions_db + fix deletion bug
That would happen when a word was deleted from all exact attributes
but not all regular attributes.
2023-06-07 10:52:50 +02:00
Louis Dureuil
90bc230820
Merge remote-tracking branch 'origin/main' into search-refactor
Conflicts | resolution
----------|-----------
Cargo.lock | added mimalloc
Cargo.toml |  took origin/main version
milli/src/search/criteria/exactness.rs | deleted after checking it was only clippy changes
milli/src/search/query_tree.rs | deleted after checking it was only clippy changes
2023-05-03 12:19:06 +02:00
bors[bot]
414b3fae89
Merge #3571
3571: Introduce two filters to select documents with `null` and empty fields r=irevoire a=Kerollmops

# Pull Request

## Related issue
This PR implements the `X IS NULL`, `X IS NOT NULL`, `X IS EMPTY`, `X IS NOT EMPTY` filters that [this comment](https://github.com/meilisearch/product/discussions/539#discussioncomment-5115884) is describing in a very detailed manner.

## What does this PR do?

### `IS NULL` and `IS NOT NULL`

This PR will be exposed as a prototype for now. Below is the copy/pasted version of a spec that defines this filter.

- `IS NULL` matches fields that `EXISTS` AND `= IS NULL`
- `IS NOT NULL` matches fields that `NOT EXISTS` OR `!= IS NULL`

1. `{"name": "A", "price": null}`
2. `{"name": "A", "price": 10}`
3. `{"name": "A"}`

`price IS NULL` would match 1
`price IS NOT NULL` or `NOT price IS NULL` would match 2,3
`price EXISTS` would match 1, 2
`price NOT EXISTS` or `NOT price EXISTS` would match 3

common query : `(price EXISTS) AND (price IS NOT NULL)` would match 2

### `IS EMPTY` and `IS NOT EMPTY`

- `IS EMPTY` matches Array `[]`, Object `{}`, or String `""` fields that `EXISTS` and are empty
- `IS NOT EMPTY` matches fields that `NOT EXISTS` OR are not empty.

1. `{"name": "A", "tags": null}`
2. `{"name": "A", "tags": [null]}`
3. `{"name": "A", "tags": []}`
4. `{"name": "A", "tags": ["hello","world"]}`
5. `{"name": "A", "tags": [""]}`
6. `{"name": "A"}`
7. `{"name": "A", "tags": {}}`
8. `{"name": "A", "tags": {"t1":"v1"}}`
9. `{"name": "A", "tags": {"t1":""}}`
10. `{"name": "A", "tags": ""}`

`tags IS EMPTY` would match 3,7,10
`tags IS NOT EMPTY` or `NOT tags IS EMPTY` would match 1,2,4,5,6,8,9
`tags IS NULL` would match 1
`tags IS NOT NULL` or `NOT tags IS NULL` would match 2,3,4,5,6,7,8,9,10
`tags EXISTS` would match 1,2,3,4,5,7,8,9,10
`tags NOT EXISTS` or `NOT tags EXISTS` would match 6

common query : `(tags EXISTS) AND (tags IS NOT NULL) AND (tags IS NOT EMPTY)` would match 2,4,5,8,9

## What should the reviewer do?

- Check that I tested the filters
- Check that I deleted the ids of the documents when deleting documents


Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-04-27 13:14:00 +00:00
Loïc Lecrenier
84d9c731f8 Fix bug in encoding of word_position_docids and word_fid_docids 2023-04-24 09:59:30 +02:00
Loïc Lecrenier
a81165f0d8 Merge remote-tracking branch 'origin/main' into search-refactor 2023-04-07 10:15:55 +02:00
Loïc Lecrenier
130d2061bd Fix indexing of word_position_docid and fid 2023-04-06 17:50:39 +02:00
Louis Dureuil
66ddee4390 Fix word_position_docids indexing 2023-04-06 17:50:39 +02:00
Louis Dureuil
e58426109a Fix panics and issues in exactness graph ranking rule 2023-04-06 17:50:39 +02:00
Louis Dureuil
996619b22a Increase position by 8 on hard separator when building query terms 2023-04-06 17:50:39 +02:00
ManyTheFish
efea1e5837 Fix facet normalization 2023-03-29 12:02:24 +02:00
Loïc Lecrenier
9b2653427d Split position DB into fid and relative position DB 2023-03-23 09:22:01 +01:00
Clément Renault
1a9c58a7ab
Fix a bug with the new flattening rules 2023-03-15 16:56:44 +01:00
Clément Renault
ea016d97af
Implementing an IS EMPTY filter 2023-03-15 14:12:34 +01:00
ManyTheFish
2f8eb4f54a last PR fixes 2023-03-09 15:34:36 +01:00
Clément Renault
0ad53784e7
Create a new struct to reduce the type complexity 2023-03-09 13:21:21 +01:00
Clément Renault
e106b16148
Fix a typo in a variable
Co-authored-by: Louis Dureuil <louis@meilisearch.com>

aaa
2023-03-09 13:08:02 +01:00
ManyTheFish
5deea631ea fix clippy too many arguments 2023-03-09 11:19:13 +01:00
ManyTheFish
b4b859ec8c Fix typos 2023-03-09 10:58:35 +01:00
Clément Renault
7dc04747fd
Make clippy happy 2023-03-08 17:37:08 +01:00
Clément Renault
43ff236df8
Write the NULL facet values in the database 2023-03-08 16:49:53 +01:00
Clément Renault
19ab4d1a15
Classify the NULL fields values in the facet extractor 2023-03-08 16:49:31 +01:00
ManyTheFish
24c0775c67 Change indexing threshold 2023-03-08 12:36:04 +01:00
ManyTheFish
3092cf0448 Fix clippy errors 2023-03-08 10:53:42 +01:00
ManyTheFish
da48506f15 Rerun extraction when language detection might have failed 2023-03-07 18:35:26 +01:00
ManyTheFish
bbecab8948 fix clippy 2023-02-21 10:18:44 +01:00
ManyTheFish
8aa808d51b Merge branch 'main' into enhance-language-detection 2023-02-20 18:14:34 +01:00
Tamo
18796d6e6a Consider null as a valid geo object 2023-02-20 13:45:51 +01:00
f3r10
d8207356f4 Skip script,language insertion if language is undetected 2023-01-31 11:28:05 +01:00
f3r10
fd60a39f1c Format code 2023-01-31 11:28:05 +01:00
f3r10
d97fb6117e Extract and index data 2023-01-31 11:28:05 +01:00
ManyTheFish
d1fc42b53a Use compatibility decomposition normalizer in facets 2023-01-18 15:02:13 +01:00
Loïc Lecrenier
8d0ace2d64 Avoid creating a MatchingWord for words that exceed the length limit 2022-11-28 10:20:13 +01:00
Loïc Lecrenier
ac3baafbe8 Truncate facet values that are too long before indexing them 2022-11-17 11:29:42 +01:00
Loïc Lecrenier
d95d02cb8a Fix Facet Indexing bugs
1. Handle keys with variable length correctly

This fixes https://github.com/meilisearch/meilisearch/issues/3042 and
is easily reproducible with the updated fuzz tests, which now generate
keys with variable lengths.

2. Prevent adding facets to the database if their encoded value does
not satisfy `valid_lmdb_key`.

This fixes an indexing failure when a document had a filterable
attribute containing a value whose length is higher than ~500 bytes.
2022-11-17 11:29:42 +01:00
unvalley
70465aa5ce Execute cargo fmt 2022-11-04 08:59:58 +09:00
unvalley
3009981d31 Fix clippy errors
Add clippy job

Add clippy job to CI
2022-11-04 08:58:14 +09:00
unvalley
c7322f704c Fix cargo clippy errors
Dont apply clippy for tests for now

Fix clippy warnings of filter-parser package

parent 8352febd646ec4bcf56a44161e5c4dce0e55111f
author unvalley <38400669+unvalley@users.noreply.github.com> 1666325847 +0900
committer unvalley <kirohi.code@gmail.com> 1666791316 +0900

Update .github/workflows/rust.yml

Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>

Allow clippy lint too_many_argments

Allow clippy lint needless_collect

Allow clippy lint too_many_arguments and type_complexity

Fix for clippy warnings comparison_chains

Fix for clippy warnings vec_init_then_push

Allow clippy lint should_implement_trait

Allow clippy lint drop_non_drop

Fix lifetime clipy warnings in filter-paprser

Execute cargo fmt

Fix clippy remaining warnings

Fix clippy remaining warnings again and allow lint on each place
2022-10-27 01:04:23 +09:00
Loïc Lecrenier
54c0cf93fe Merge remote-tracking branch 'origin/main' into facet-levels-refactor 2022-10-26 15:13:34 +02:00
Loïc Lecrenier
a034a1e628 Move StrRefCodec and ByteSliceRefCodec to their own files 2022-10-26 13:47:46 +02:00
Loïc Lecrenier
51961e1064 Polish some details 2022-10-26 13:47:04 +02:00
Loïc Lecrenier
b1ab09196c Remove outdated TODOs 2022-10-26 13:47:04 +02:00
Loïc Lecrenier
9026867d17 Give same interface to bulk and incremental facet indexing types
+ cargo fmt, oops, sorry for the bad history :(
2022-10-26 13:47:04 +02:00
Loïc Lecrenier
485a72306d Refactor facet-related codecs 2022-10-26 13:47:04 +02:00
Loïc Lecrenier
afdf87f6f7 Fix bugs in asc/desc criterion and facet indexing 2022-10-26 13:47:04 +02:00
Loïc Lecrenier
a7201ece04 cargo fmt 2022-10-26 13:47:04 +02:00
Loïc Lecrenier
61252248fb Fix some facet indexing bugs 2022-10-26 13:47:04 +02:00
Loïc Lecrenier
85824ee203 Try to make facet indexing incremental 2022-10-26 13:47:04 +02:00
Loïc Lecrenier
39a4a0a362 Reintroduce filter range search and facet extractors 2022-10-26 13:46:14 +02:00
Loïc Lecrenier
c3f49f766d Prepare refactor of facets database
Prepare refactor of facets database
2022-10-26 13:46:14 +02:00
Ewan Higgs
6b2fe94192 Fixes for clippy bringing us down to 18 remaining issues.
This brings us a step closer to enforcing clippy on each build.
2022-10-25 20:49:02 +02:00
Loïc Lecrenier
d76d0cb1bf Merge branch 'main' into word-pair-proximity-docids-refactor 2022-10-24 15:23:00 +02:00
Loïc Lecrenier
a7de4f5b85 Don't add swapped word pairs to the word_pair_proximity_docids db 2022-10-18 10:37:34 +02:00
Loïc Lecrenier
bdeb47305e Change encoding of word_pair_proximity DB to (proximity, word1, word2)
Same for word_prefix_pair_proximity
2022-10-18 10:37:34 +02:00
Ewan Higgs
beb987d3d1 Fixing piles of clippy errors.
Most of these are calling clone when the struct supports Copy.

Many are using & and &mut on `self` when the function they are called
from already has an immutable or mutable borrow so this isn't needed.

I tried to stay away from actual changes or places where I'd have to
name fresh variables.
2022-10-13 22:02:54 +02:00
msvaljek
762e320c35
Add proximity calculation for the same word 2022-10-07 12:59:12 +02:00
vishalsodani
00c02d00f3 Add missing logging timer to extractors 2022-09-30 22:17:06 +05:30
Loïc Lecrenier
3794962330 Use an unstable algorithm for grenad::Sorter when possible 2022-09-13 14:49:53 +02:00
Kerollmops
fe3973a51c
Make sure that long words are correctly skipped 2022-09-07 15:03:32 +02:00
Loïc Lecrenier
306593144d Refactor word prefix pair proximity indexation 2022-08-17 11:59:00 +02:00
Loïc Lecrenier
07003704a8 Merge branch 'filter/field-exist' 2022-07-21 14:51:41 +02:00
Loïc Lecrenier
1506683705 Avoid using too much memory when indexing facet-exists-docids 2022-07-19 14:42:35 +02:00