Commit Graph

168 Commits

Author SHA1 Message Date
Kerollmops
2cf747cb89
Fix the tests 2023-06-27 12:32:40 +02:00
Kerollmops
23eaaf1001
Change the name of the distance module 2023-06-27 12:32:39 +02:00
Kerollmops
436a10bef4
Replace the euclidean with a dot product 2023-06-27 12:32:39 +02:00
Kerollmops
8debf6fe81
Use a basic euclidean distance function 2023-06-27 12:32:39 +02:00
Kerollmops
c79e82c62a
Move back to the hnsw crate
This reverts commit 7a4b6c065482f988b01298642f4c18775503f92f.
2023-06-27 12:32:39 +02:00
Kerollmops
268a9ef416
Move to the hgg crate 2023-06-27 12:32:38 +02:00
Clément Renault
4571e512d2
Store the vectors in an HNSW in LMDB 2023-06-27 12:32:38 +02:00
Louis Dureuil
da833eb095
Expose the scores and detailed scores in the API 2023-06-22 12:39:14 +02:00
Louis Dureuil
e0c4682758
Fix tests 2023-06-14 13:30:52 +02:00
Loïc Lecrenier
8628a0c856 Remove docid_word_positions_db + fix deletion bug
That would happen when a word was deleted from all exact attributes
but not all regular attributes.
2023-06-07 10:52:50 +02:00
Kerollmops
f759ec7fad
Expose a flag to enable the MDB_WRITEMAP flag 2023-05-15 11:38:43 +02:00
Kerollmops
c4a40e7110
Use the writemap flag to reduce the memory usage 2023-05-15 10:15:33 +02:00
Louis Dureuil
a35d3fc708
Add Index::iter_documents 2023-05-04 15:31:54 +02:00
Louis Dureuil
3a408e8287
Increase map size for tests following charabia camelCase tokenization 2023-05-03 14:44:48 +02:00
Louis Dureuil
d3e5b10e23
fix nb of dbs 2023-05-03 14:11:20 +02:00
Louis Dureuil
90bc230820
Merge remote-tracking branch 'origin/main' into search-refactor
Conflicts | resolution
----------|-----------
Cargo.lock | added mimalloc
Cargo.toml |  took origin/main version
milli/src/search/criteria/exactness.rs | deleted after checking it was only clippy changes
milli/src/search/query_tree.rs | deleted after checking it was only clippy changes
2023-05-03 12:19:06 +02:00
bors[bot]
414b3fae89
Merge #3571
3571: Introduce two filters to select documents with `null` and empty fields r=irevoire a=Kerollmops

# Pull Request

## Related issue
This PR implements the `X IS NULL`, `X IS NOT NULL`, `X IS EMPTY`, `X IS NOT EMPTY` filters that [this comment](https://github.com/meilisearch/product/discussions/539#discussioncomment-5115884) is describing in a very detailed manner.

## What does this PR do?

### `IS NULL` and `IS NOT NULL`

This PR will be exposed as a prototype for now. Below is the copy/pasted version of a spec that defines this filter.

- `IS NULL` matches fields that `EXISTS` AND `= IS NULL`
- `IS NOT NULL` matches fields that `NOT EXISTS` OR `!= IS NULL`

1. `{"name": "A", "price": null}`
2. `{"name": "A", "price": 10}`
3. `{"name": "A"}`

`price IS NULL` would match 1
`price IS NOT NULL` or `NOT price IS NULL` would match 2,3
`price EXISTS` would match 1, 2
`price NOT EXISTS` or `NOT price EXISTS` would match 3

common query : `(price EXISTS) AND (price IS NOT NULL)` would match 2

### `IS EMPTY` and `IS NOT EMPTY`

- `IS EMPTY` matches Array `[]`, Object `{}`, or String `""` fields that `EXISTS` and are empty
- `IS NOT EMPTY` matches fields that `NOT EXISTS` OR are not empty.

1. `{"name": "A", "tags": null}`
2. `{"name": "A", "tags": [null]}`
3. `{"name": "A", "tags": []}`
4. `{"name": "A", "tags": ["hello","world"]}`
5. `{"name": "A", "tags": [""]}`
6. `{"name": "A"}`
7. `{"name": "A", "tags": {}}`
8. `{"name": "A", "tags": {"t1":"v1"}}`
9. `{"name": "A", "tags": {"t1":""}}`
10. `{"name": "A", "tags": ""}`

`tags IS EMPTY` would match 3,7,10
`tags IS NOT EMPTY` or `NOT tags IS EMPTY` would match 1,2,4,5,6,8,9
`tags IS NULL` would match 1
`tags IS NOT NULL` or `NOT tags IS NULL` would match 2,3,4,5,6,7,8,9,10
`tags EXISTS` would match 1,2,3,4,5,7,8,9,10
`tags NOT EXISTS` or `NOT tags EXISTS` would match 6

common query : `(tags EXISTS) AND (tags IS NOT NULL) AND (tags IS NOT EMPTY)` would match 2,4,5,8,9

## What should the reviewer do?

- Check that I tested the filters
- Check that I deleted the ids of the documents when deleting documents


Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-04-27 13:14:00 +00:00
Loïc Lecrenier
84d9c731f8 Fix bug in encoding of word_position_docids and word_fid_docids 2023-04-24 09:59:30 +02:00
Loïc Lecrenier
a81165f0d8 Merge remote-tracking branch 'origin/main' into search-refactor 2023-04-07 10:15:55 +02:00
Tamo
a50b058557 update the geoBoundingBox feature
Now instead of using the (top_left, bottom_right) corners of the bounding box it s using the (top_right, bottom_left) corners.
2023-03-28 18:26:18 +02:00
Loïc Lecrenier
9b2653427d Split position DB into fid and relative position DB 2023-03-23 09:22:01 +01:00
Clément Renault
ea016d97af
Implementing an IS EMPTY filter 2023-03-15 14:12:34 +01:00
ManyTheFish
b4b859ec8c Fix typos 2023-03-09 10:58:35 +01:00
Clément Renault
7c0cd7172d
Introduce the NULL and NOT value NULL operator 2023-03-08 17:14:34 +01:00
Clément Renault
9287858997
Introduce a new facet_id_is_null_docids database in the index 2023-03-08 16:14:00 +01:00
ManyTheFish
37d4551e8e Add a threshold filtering the Languages allowed to be detected at search time 2023-03-07 19:38:01 +01:00
ManyTheFish
8aa808d51b Merge branch 'main' into enhance-language-detection 2023-02-20 18:14:34 +01:00
bors[bot]
c88c3637b4
Merge #3461
3461: Bring v1 changes into main r=curquiza a=Kerollmops

Also bring back changes in milli (the remote repository) into main done during the pre-release

Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Philipp Ahlner <philipp@ahlner.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-02-07 11:27:27 +00:00
Tamo
42114325cd
Apply suggestions from code review
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-02-06 18:07:00 +01:00
Tamo
7a38fe624f
throw an error if the top left corner is found below the bottom right corner 2023-02-06 17:50:47 +01:00
Tamo
1b005f697d
update the syntax of the geoboundingbox filter to uses brackets instead of parens around lat and lng 2023-02-06 16:50:27 +01:00
Kerollmops
fbec48f56e
Merge remote-tracking branch 'milli/main' into bring-v1-changes 2023-02-06 16:48:10 +01:00
Tamo
fcb09ccc3d
add tests on the geoBoundingBox 2023-02-02 18:19:56 +01:00
ManyTheFish
0bc1a18f52 Use Languages list detected during indexing at search time 2023-02-01 18:57:43 +01:00
ManyTheFish
064158e4e2 Update test 2023-02-01 15:34:01 +01:00
f3r10
fd60a39f1c Format code 2023-01-31 11:28:05 +01:00
f3r10
34d04f3d3f Filter from script_language_docids database soft deleted documents 2023-01-31 11:28:05 +01:00
f3r10
a27f329e3a Add tests for checking that detected script and language associated with document(s) were stored during indexing 2023-01-31 11:28:05 +01:00
f3r10
c45d1e3610 Create a new database on index and add a specialized codec for it 2023-01-31 11:28:05 +01:00
Louis Dureuil
20f05efb3c
clippy: needless_lifetimes 2023-01-31 11:12:59 +01:00
Louis Dureuil
3296cf7ae6
clippy: remove needless lifetimes 2023-01-31 09:32:40 +01:00
Tamo
de3c4f1986 throw an error on unknown fields specified in the _geo field 2023-01-24 12:23:24 +01:00
Philipp Ahlner
a2cd7214f0
Fixes error message when lat/lng are unparseable 2023-01-19 10:10:26 +01:00
Philipp Ahlner
497187083b
Add test for bug #3007: Wrong error message
Adds a test for #3007: Wrong error message when lat and lng are
unparseable
2023-01-18 13:24:26 +01:00
Clément Renault
1b78231e18
Make clippy happy 2023-01-17 18:25:54 +01:00
Louis Dureuil
00746b32c0
Add Index::map_size 2023-01-10 11:16:51 +01:00
Loïc Lecrenier
fc0e7382fe Fix hard-deletion of an external id that was soft-deleted 2022-12-20 15:33:31 +01:00
Louis Dureuil
ad9937c755
Fix tests after adding DeletionStrategy 2022-12-19 10:07:17 +01:00
Loïc Lecrenier
e3ee553dcc Remove soft deleted ids from ExternalDocumentIds during document import
If the document import replaces a document using hard deletion
2022-12-12 14:16:09 +01:00
Loïc Lecrenier
bebd050961 Add new test for bug 3021 2022-12-08 19:19:40 +01:00