620 Commits

Author SHA1 Message Date
Tamo
3942b3732f
re-implement the geosearch 2021-10-22 18:03:39 +02:00
Tamo
7cd9109e2f
lowercase value extracted from Token 2021-10-22 17:50:15 +02:00
Tamo
e25ca9776f
start updating the exposed function to make other modules happy 2021-10-22 17:23:22 +02:00
Tamo
6c9165b6a8
provide a helper to parse the token but not handle the errors 2021-10-22 16:52:13 +02:00
Tamo
efb2f8b325
convert the errors 2021-10-22 16:38:35 +02:00
Tamo
c27870e765
integrate a first version without any error handling 2021-10-22 14:33:18 +02:00
Tamo
01dedde1c9
update some names and move some parser out of the lib.rs 2021-10-22 01:59:38 +02:00
Tamo
c634d43ac5
add a simple test on the filters with an integer 2021-10-21 17:10:27 +02:00
Tamo
6c15f50899
rewrite the parser logic 2021-10-21 16:45:42 +02:00
Tamo
e1d81342cf
add test on the or and and operator 2021-10-21 13:01:25 +02:00
Tamo
423baac08b
fix the tests 2021-10-21 12:45:40 +02:00
Tamo
36281a653f
write all the simple tests 2021-10-21 12:40:11 +02:00
Clémentine Urquizar
f8fe9316c0
Update version for the next release (v0.18.1) 2021-10-21 11:56:14 +02:00
Tamo
661bc21af5
Fix the filter parser
And add a bunch of tests on the filter::from_array
2021-10-21 11:45:03 +02:00
Clémentine Urquizar
2209acbfe2
Update version for the next release (v0.18.2) 2021-10-18 13:45:48 +02:00
bors[bot]
59cc59e93e
Merge #358
358: Replacing pest with nom  r=Kerollmops a=CNLHC



Co-authored-by: 刘瀚骋 <cn_lhc@qq.com>
2021-10-16 20:44:38 +00:00
刘瀚骋
7666e4f34a follow the suggestions 2021-10-14 21:37:59 +08:00
刘瀚骋
2ea2f7570c use nightly cargo to format the code 2021-10-14 16:46:13 +08:00
刘瀚骋
e750465e15 check logic for geolocation. 2021-10-14 16:12:00 +08:00
bors[bot]
aa5e099718
Merge #390
390: Add helper methods on the settings r=Kerollmops a=irevoire

This would be a good addition to look at the content of a setting without consuming it.
It’s useful for analytics.

Co-authored-by: Irevoire <tamo@meilisearch.com>
2021-10-13 20:36:30 +00:00
bors[bot]
c7db4176f3
Merge #384
384: Replace memmap with memmap2 r=Kerollmops a=palfrey

[memmap is unmaintained](https://rustsec.org/advisories/RUSTSEC-2020-0077.html) and needs replacing. memmap2 is a drop-in replacement fork that's well maintained. Note that the version numbers got reset on fork, hence the lower values.

Co-authored-by: Tom Parker-Shemilt <palfrey@tevp.net>
2021-10-13 13:47:23 +00:00
Irevoire
a3e7c468cd
add helper methods on the settings 2021-10-13 13:05:07 +02:00
刘瀚骋
cd359cd96e WIP: extract the error trait bound to new trait. 2021-10-13 18:04:15 +08:00
刘瀚骋
5de5dd80a3 WIP: remove '_nom' suffix/redundant error enum/... 2021-10-13 11:06:15 +08:00
刘瀚骋
2c65781d91 format 2021-10-12 22:20:22 +08:00
bors[bot]
6e3b869e6a
Merge #388
388: fix primary key inference r=MarinPostma a=MarinPostma

The primary key was inferred from a hashtable index of the fields. For this reason, the order in which the fields were iterated upon was not deterministic, and the primary key was chosen from the first field containing "id".

This fix sorts the index by field_id when inferring the primary key.
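
The fix described above can be sketched as follows. This is a minimal illustration, not the code from the PR: the `FieldId` alias, the `infer_primary_key` name, and the case-insensitive "contains `id`" test are assumptions made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the field id type.
type FieldId = u16;

/// Picks the first field whose name contains "id", visiting the fields
/// in ascending field id order so that the result is deterministic even
/// though the backing map has no stable iteration order.
fn infer_primary_key(fields: &HashMap<FieldId, String>) -> Option<&str> {
    let mut sorted: Vec<(&FieldId, &String)> = fields.iter().collect();
    // Sorting by field id removes the dependency on hash iteration order.
    sorted.sort_by_key(|(id, _)| **id);
    sorted
        .into_iter()
        .map(|(_, name)| name.as_str())
        .find(|name| name.to_lowercase().contains("id"))
}
```

With fields `{2: "uid", 0: "title", 1: "id"}`, the function always returns `"id"`, whatever order the hashtable yields its entries in.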


Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-10-12 09:25:16 +00:00
mpostma
86ead92ed5 infer primary key on sorted fields 2021-10-12 11:15:11 +02:00
mpostma
9a266a531b test correct primary key inference 2021-10-12 11:08:53 +02:00
many
c5a6075484
Make max_position_per_attributes changeable 2021-10-12 10:10:50 +02:00
many
360c5ff3df
Remove limit of 1000 position per attribute
Instead of using an arbitrary limit we encode the absolute position in a u32
using one strong u16 for the field id and a weak u16 for the relative position in the attribute.
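
A minimal sketch of this encoding, assuming the field id occupies the high 16 bits and the relative position the low 16 bits. The type aliases and function names are illustrative, not necessarily those used in the commit.

```rust
type FieldId = u16;
type RelativePosition = u16;
type Position = u32;

/// Packs the field id into the high ("strong") 16 bits and the position
/// inside the attribute into the low ("weak") 16 bits.
fn absolute_from_relative_position(field_id: FieldId, relative: RelativePosition) -> Position {
    ((field_id as u32) << 16) | (relative as u32)
}

/// Splits an absolute position back into its field id and relative position.
fn relative_from_absolute_position(absolute: Position) -> (FieldId, RelativePosition) {
    ((absolute >> 16) as u16, (absolute & 0xffff) as u16)
}
```

Because the field id sits in the high bits, sorting absolute positions orders them first by field and then by position within the field.
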
2021-10-12 10:10:50 +02:00
刘瀚骋
d323e35001 add a test case 2021-10-12 13:30:40 +08:00
刘瀚骋
70f576d5d3 error handling 2021-10-12 13:30:40 +08:00
刘瀚骋
28f9be8d7c support syntax 2021-10-12 13:30:40 +08:00
刘瀚骋
469d92c569 tweak error handling 2021-10-12 13:30:40 +08:00
刘瀚骋
7a90a101ee reorganize parser logic 2021-10-12 13:30:40 +08:00
刘瀚骋
f7796edc7e remove everything about pest 2021-10-12 13:30:40 +08:00
刘瀚骋
ac1df9d9d7 fix typo and remove pest 2021-10-12 13:30:40 +08:00
刘瀚骋
50ad750ec1 enhance error handling 2021-10-12 13:30:40 +08:00
刘瀚骋
8748df2ca4 draft without error handling 2021-10-12 13:30:40 +08:00
bors[bot]
07fb6d64e5
Merge #386
386: fix obkv document r=curquiza a=MarinPostma

When serializing a document, the serializer resolved the field_id of the current field and immediately added it to the obkv document under construction. The issue with that is that obkv expects the fields to be inserted in order, and when a document with out of order fields was added, obkv failed to insert the field.

The current fix first resolves each field_id, and adds all the fields to a temporary `BTreeMap`, until `end` is called on the map serializer, where all the fields are added to the obkv at once, and in order.
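
The buffering approach can be sketched as below. `OrderedWriter` is a hypothetical stand-in that only mimics the "keys must be inserted in order" constraint described above; it is not the obkv API, and `serialize_in_order` is an illustrative name.

```rust
use std::collections::BTreeMap;

type FieldId = u16;

/// Hypothetical writer that rejects out-of-order keys, as described above.
struct OrderedWriter {
    entries: Vec<(FieldId, Vec<u8>)>,
}

impl OrderedWriter {
    fn new() -> Self {
        Self { entries: Vec::new() }
    }

    /// Fails when `key` is not strictly greater than the last inserted key.
    fn insert(&mut self, key: FieldId, value: Vec<u8>) -> Result<(), String> {
        if let Some((last, _)) = self.entries.last() {
            if key <= *last {
                return Err(format!("key {} inserted after key {}", key, last));
            }
        }
        self.entries.push((key, value));
        Ok(())
    }
}

/// Buffers the fields in a `BTreeMap`, then writes them all at once.
/// Iterating a `BTreeMap` yields keys in ascending order, so the writer
/// never sees an out-of-order field.
fn serialize_in_order(fields: Vec<(FieldId, Vec<u8>)>) -> Result<OrderedWriter, String> {
    let buffer: BTreeMap<FieldId, Vec<u8>> = fields.into_iter().collect();
    let mut writer = OrderedWriter::new();
    for (key, value) in buffer {
        writer.insert(key, value)?;
    }
    Ok(writer)
}
```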


Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-10-11 13:45:04 +00:00
Clémentine Urquizar
dd56e82dba
Update version for the next release (v0.17.2) 2021-10-11 15:20:35 +02:00
mpostma
99889a0ed0 add obkv document serialization test 2021-10-11 15:13:17 +02:00
mpostma
799f3d43c8 fix serialization to obkv format 2021-10-11 15:04:47 +02:00
Tom Parker-Shemilt
2dfe24f067 memmap -> memmap2 2021-10-10 22:47:12 +01:00
Irevoire
b65aa7b5ac
Apply suggestions from code review
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-10-07 17:51:52 +02:00
Tamo
11dfe38761
Update the check on the latitude and longitude
Latitudes are not supposed to go beyond 90 degrees or below -90.
The same goes for longitude with 180 or -180.

This was badly implemented in the filters, and was not implemented for the AscDesc rules.
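
A minimal sketch of such a range check. The function name and the `String` error type are assumptions for illustration; the actual error types used in the filters and the AscDesc rules are not shown here.

```rust
/// Returns an error when the latitude is outside [-90, 90] or the
/// longitude is outside [-180, 180]. NaN is rejected as well, because
/// a range never contains NaN.
fn validate_geo_point(lat: f64, lng: f64) -> Result<(), String> {
    if !(-90.0..=90.0).contains(&lat) {
        return Err(format!("latitude {} must be between -90 and 90", lat));
    }
    if !(-180.0..=180.0).contains(&lng) {
        return Err(format!("longitude {} must be between -180 and 180", lng));
    }
    Ok(())
}
```

Sharing one helper like this between the filter parser and the sort rules would keep both checks consistent.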
2021-10-07 16:10:43 +02:00
many
085bc6440c
Apply PR comments 2021-10-06 11:12:26 +02:00
many
1bd15d849b
Reduce candidates threshold 2021-10-05 18:52:14 +02:00
many
ea4bd29d14
Apply PR comments 2021-10-05 17:35:07 +02:00
many
3296bb243c
Simplify word level position DB into a word position DB 2021-10-05 12:15:02 +02:00
many
75d341d928
Re-implement set based algorithm for attribute criterion 2021-10-05 12:14:50 +02:00
Clémentine Urquizar
05d8a33a28
Update version for the next release (v0.17.1) 2021-10-02 16:21:31 +02:00
Tamo
d9eba9d145
improve and test the sort error message 2021-09-30 14:38:27 +02:00
Tamo
0ee67bb7d1
improve the reserved keyword error message for the filters 2021-09-30 14:38:27 +02:00
bors[bot]
22551d0941
Merge #379
379: Revert "Change chunk size to 4MiB to fit more the end user usage" r=curquiza a=ManyTheFish

Reverts meilisearch/milli#370

Co-authored-by: Many <legendre.maxime.isn@gmail.com>
2021-09-29 13:20:53 +00:00
Many
26b5dad042
Revert "Change chunk size to 4MiB to fit more the end user usage" 2021-09-29 15:08:39 +02:00
Many
2e49230ca2
Update milli/src/search/criteria/attribute.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-29 14:49:45 +02:00
Many
7ad0214089
Update milli/src/search/criteria/attribute.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-29 14:49:41 +02:00
many
1df5b8712b
Hotfix meilisearch#1707 2021-09-29 14:41:56 +02:00
bors[bot]
68c758a533
Merge #376
376: Stop casting integer docids to string r=Kerollmops a=irevoire

When a docid is an integer, we stop casting it to a string, and thus we don't add `"` around it.

Co-authored-by: Tamo <tamo@meilisearch.com>
2021-09-29 08:32:48 +00:00
Clémentine Urquizar
0e8665bf18
Update version for the next release (v0.17.0) 2021-09-28 19:38:12 +02:00
Tamo
f65153ad64
stop casting integer docids to string 2021-09-28 18:35:54 +02:00
Vishnu Gt
785c1372f2
Change "settings" to "setting"
Co-authored-by: Clément Renault <renault.cle@gmail.com>
2021-09-28 20:11:32 +05:30
Vishnu Ganesan
3580b2d803 Fixes #365 2021-09-28 19:30:23 +05:30
bors[bot]
3a12f5887e
Merge #373
373: Improve error message for bad sort syntax with geosearch r=Kerollmops a=irevoire

`@Kerollmops` This should be the last PR for the geosearch and error handling, sorry for doing it in so many steps 😬 

Co-authored-by: Tamo <tamo@meilisearch.com>
2021-09-28 12:39:32 +00:00
Tamo
a80dcfd4a3
improve error message for bad sort syntax with geosearch 2021-09-28 14:32:24 +02:00
bors[bot]
b2a332599e
Merge #372
372: Fix Meilisearch 1714 r=Kerollmops a=ManyTheFish

The bug comes from the typo tolerance: to know how many typos are accepted, we were counting bytes instead of characters in a word.
With Chinese script characters, we were allowing 2 typos on 3-character words.
We now count the number of chars instead of bytes to assign the typo tolerance.
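
The difference can be illustrated with a small sketch. The length thresholds below (no typo under 5, one typo from 5 to 8, two typos from 9) are assumptions chosen to reproduce the behavior described, and the function names are hypothetical.

```rust
/// Assumed mapping from word length to the number of accepted typos.
fn allowed_typos(len: usize) -> u8 {
    match len {
        0..=4 => 0,
        5..=8 => 1,
        _ => 2,
    }
}

/// Buggy variant: `str::len` returns the number of UTF-8 bytes.
fn typos_by_bytes(word: &str) -> u8 {
    allowed_typos(word.len())
}

/// Fixed variant: counts Unicode scalar values (`char`s) instead.
fn typos_by_chars(word: &str) -> u8 {
    allowed_typos(word.chars().count())
}
```

For a 3-character Chinese word such as `"你好吗"`, each character takes 3 bytes in UTF-8, so the byte-based variant sees a length of 9 and allows 2 typos, while the char-based variant sees a length of 3 and allows none.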

Related to [Meilisearch#1714](https://github.com/meilisearch/MeiliSearch/issues/1714)

Co-authored-by: many <maxime@meilisearch.com>
2021-09-28 11:59:45 +00:00
many
8046ae4bd5
Count the number of char instead of counting bytes to assign the typo tolerance 2021-09-28 12:10:43 +02:00
many
1988416295
Add failing test related to Meilisearch#1714 2021-09-28 12:05:11 +02:00
Tamo
c7cb816ae1
simplify the error handling of the sort syntax for meilisearch 2021-09-27 19:07:22 +02:00
many
b188063869
Change chunk size to 4MiB to fit more the end user usage 2021-09-27 14:26:21 +02:00
many
551df0cb77
Add test checking the bug reported in meilisearch issue 1716 2021-09-23 15:55:39 +02:00
bors[bot]
87dd441a3a
Merge #367
367: Update version for the next release (v0.16.0) r=Kerollmops a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-09-22 15:20:20 +00:00
Clémentine Urquizar
1eacab2169
Update version for the next release (v0.15.1) 2021-09-22 17:18:54 +02:00
Irevoire
218f0a6661
Apply suggestions from code review
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-22 17:00:27 +02:00
Tamo
47ee93b0bd
return an error when _geoPoint is used but _geo is not sortable 2021-09-22 16:37:41 +02:00
Tamo
1e5e3d57e2
auto convert AscDescError into CriterionError 2021-09-22 16:37:41 +02:00
Tamo
023446ecf3
create a smaller and easier to maintain CriterionError type 2021-09-22 16:37:41 +02:00
Tamo
86e272856a
create an asc_desc error type that is never supposed to be returned to the end user 2021-09-22 16:37:41 +02:00
Tamo
257e621d40
create an asc_desc module 2021-09-22 16:37:41 +02:00
Tamo
113a061bee
fix the error handling on the criterion side 2021-09-22 15:09:07 +02:00
Tamo
78b0bce9a1
fix the returned error when asc desc fails to be parsed 2021-09-22 11:37:05 +02:00
Clémentine Urquizar
f8ecbc28e2
Update version for the next release (v0.15.0) 2021-09-21 18:09:14 +02:00
mpostma
aa6c5df0bc Implement documents format
document reader transform

remove update format

support document sequences

fix document transform

clean transform

improve error handling

add documents! macro

fix transform bug

fix tests

remove csv dependency

Add comments on the transform process

replace search cli

fmt

review edits

fix http ui

fix clippy warnings

Revert "fix clippy warnings"

This reverts commit a1ce3cd96e603633dbf43e9e0b12b2453c9c5620.

fix review comments

remove smallvec in transform loop

review edits
2021-09-21 16:58:33 +02:00
bors[bot]
94764e5c7c
Merge #360
360: Update version for the next release (v0.14.0) r=Kerollmops a=curquiza

Release containing the geosearch, cf #322 

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-09-21 08:43:27 +00:00
bors[bot]
31c8de1cca
Merge #322
322: Geosearch r=ManyTheFish a=irevoire

This PR introduces [basic geo-search functionalities](https://github.com/meilisearch/specifications/pull/59): it makes the engine able to index, filter, and sort by geo-point. We decided to use [the rstar library](https://docs.rs/rstar) and to save the points in [an RTree](https://docs.rs/rstar/0.9.1/rstar/struct.RTree.html) that we de/serialize in the index database [by using serde](https://serde.rs/) with [bincode](https://docs.rs/bincode). This is not an efficient way to query this tree, as it consumes a lot of CPU and memory when a search is made, but it is an easy first way to do so.

### What we will have to do on the indexing part:
 - [x] Index the `_geo` fields from the documents.
   - [x] Create a new module with an extractor in the `extract` module that takes the `obkv_documents` and retrieves the latitude and longitude coordinates, outputting them in a `grenad::Reader` for further processing.
   - [x] Call the extractor in the `extract::extract_documents_data` function and send the result to the `TypedChunk` module.
   - [x] Get the `grenad::Reader` in the `typed_chunk::write_typed_chunk_into_index` function and store all the points in the `rtree`
- [x] Delete the documents from the `RTree` when deleting documents from the database. All this can be done in the `delete_documents.rs` file by getting the data structure and removing the points from it, inserting it back after the modification.
- [x] Clear the `RTree` entirely when we clear the documents from the database; everything happens in the `clear_documents.rs` file.
- [x] Save a Roaring bitmap of all documents containing the `_geo` field.

### What we will have to do on the query part:
- [x] Filter the documents at a certain distance around a point, this is done by [collecting the documents from the searched point](https://docs.rs/rstar/0.9.1/rstar/struct.RTree.html#method.nearest_neighbor_iter) while they are in range.
  - [x] We must introduce new `geoLowerThan` and `geoGreaterThan` variants to the `Operator` filter enum.
  - [x] Implement the `negative` method on both variants where the `geoGreaterThan` variant is implemented by executing the `geoLowerThan` and removing the results found from the whole list of geo faceted documents.
  - [x] Add the `_geoRadius` function in the pest parser.
- [x] Introduce a `_geo` ascending ranking function that takes a point in parameter, ~~this function must keep the iterator on the `RTree` and make it peekable~~ This was not possible for now, we had to collect the whole iterator. Only the documents that are part of the candidates must be sent too!
  - [x] This ascending ranking rule will only be active if the search is set up with the `_geoPoint` parameter that indicates the center point of the ascending ranking rule.
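
The radius filter and its negation can be sketched as follows. This is a simplified illustration under stated assumptions: it replaces the `RTree` nearest-neighbor iteration with a linear scan, uses a plain haversine formula for the distance, and uses a `BTreeSet` where the engine would use a Roaring bitmap. All names are hypothetical.

```rust
use std::collections::BTreeSet;

type DocumentId = u32;

/// Mean Earth radius in meters, used by the haversine formula.
const EARTH_RADIUS_M: f64 = 6_371_000.0;

/// Great-circle distance in meters between two `[lat, lng]` points in degrees.
fn haversine_distance(a: [f64; 2], b: [f64; 2]) -> f64 {
    let (lat1, lat2) = (a[0].to_radians(), b[0].to_radians());
    let dlat = lat2 - lat1;
    let dlng = (b[1] - a[1]).to_radians();
    let h = (dlat / 2.0).sin().powi(2) + lat1.cos() * lat2.cos() * (dlng / 2.0).sin().powi(2);
    2.0 * EARTH_RADIUS_M * h.sqrt().asin()
}

/// Documents whose point lies within `radius` meters of `center`.
fn geo_lower_than(
    points: &[(DocumentId, [f64; 2])],
    center: [f64; 2],
    radius: f64,
) -> BTreeSet<DocumentId> {
    points
        .iter()
        .filter(|(_, point)| haversine_distance(center, *point) <= radius)
        .map(|(id, _)| *id)
        .collect()
}

/// Negation: every geo document minus the ones found by `geo_lower_than`.
fn geo_greater_than(
    points: &[(DocumentId, [f64; 2])],
    center: [f64; 2],
    radius: f64,
) -> BTreeSet<DocumentId> {
    let all: BTreeSet<DocumentId> = points.iter().map(|(id, _)| *id).collect();
    let inside = geo_lower_than(points, center, radius);
    all.difference(&inside).copied().collect()
}
```

Implementing the "greater than" case as a set difference means only the "lower than" query needs to touch the spatial index.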

-----------

- On the Meilisearch side: we must introduce a new concept, returning the documents with a new `_geoDistance` field when they have gone through the `_geo` ranking rule; this has never been done before. We could do it afterward, once the documents have been retrieved from the database, by computing the distance between the `_geoPoint` and each of the documents to be returned.

Co-authored-by: Irevoire <tamo@meilisearch.com>
Co-authored-by: cvermand <33010418+bidoubiwa@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
2021-09-20 19:04:57 +00:00
Irevoire
0d104a0fce
Update milli/src/criterion.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-20 18:13:17 +02:00
Clémentine Urquizar
3f1453f470
Update version for the next release (v0.14.0) 2021-09-20 18:12:23 +02:00
Tamo
f4b8e5675d
move the reserved keyword logic for the criterion and sort + add test 2021-09-20 17:21:02 +02:00
Irevoire
3b7a2cdbce
fix typo
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-20 16:10:39 +02:00
Tamo
c695a1ffd2
add the possibility to sort by descending order on geoPoint 2021-09-15 11:49:58 +02:00
Tamo
91ce4d1721
Stop iterating through the whole list of points
We stop when there are no possible candidates left
2021-09-15 11:49:58 +02:00
Clémentine Urquizar
f167f7b412
Update version for the next release (v0.13.1) 2021-09-10 09:48:17 +02:00
Tamo
cfc62a1c15
use geoutils instead of haversine 2021-09-09 18:11:38 +02:00
many
26deeb45a3
Add lacking parameter to word level position builder 2021-09-09 17:49:04 +02:00
Tamo
3fc145c254
if we have no rtree we return all other provided documents 2021-09-09 17:44:09 +02:00
Irevoire
a84f3a8b31
Apply suggestions from code review
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-09 15:09:35 +02:00
Tamo
c81ff22c5b
delete the invalid criterion name error in favor of invalid ranking rule name 2021-09-08 19:17:00 +02:00
Tamo
bad8ea47d5
edit the last two TODO comments 2021-09-08 18:24:09 +02:00
Tamo
b15c77ebc4
return an error in case a user tries to sort with :desc 2021-09-08 18:24:09 +02:00