Commit Graph

195 Commits

Author SHA1 Message Date
bors[bot] 15c29cdd9b
Merge #401
401: Update version for the next release (v0.19.0) r=curquiza a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-10-25 12:49:53 +00:00
Clémentine Urquizar 208903ddde
Revert "Replacing pest with nom " 2021-10-25 11:58:00 +02:00
Clémentine Urquizar 679fe18b17
Update version for the next release (v0.19.0) 2021-10-25 11:52:17 +02:00
marin postma 0f86d6b28f
implement csv serialization 2021-10-25 10:26:42 +02:00
Tamo efb2f8b325
convert the errors 2021-10-22 16:38:35 +02:00
Tamo c27870e765
integrate a first version without any error handling 2021-10-22 14:33:18 +02:00
Tamo 01dedde1c9
update some names and move some parser out of the lib.rs 2021-10-22 01:59:38 +02:00
Clémentine Urquizar f8fe9316c0
Update version for the next release (v0.18.1) 2021-10-21 11:56:14 +02:00
Clémentine Urquizar 2209acbfe2
Update version for the next release (v0.18.2) 2021-10-18 13:45:48 +02:00
bors[bot] 59cc59e93e
Merge #358
358: Replacing pest with nom  r=Kerollmops a=CNLHC



Co-authored-by: 刘瀚骋 <cn_lhc@qq.com>
2021-10-16 20:44:38 +00:00
刘瀚骋 7666e4f34a follow the suggestions 2021-10-14 21:37:59 +08:00
bors[bot] c7db4176f3
Merge #384
384: Replace memmap with memmap2 r=Kerollmops a=palfrey

[memmap is unmaintained](https://rustsec.org/advisories/RUSTSEC-2020-0077.html) and needs replacing. memmap2 is a drop-in replacement fork that's well maintained. Note that the version numbers got reset on fork, hence the lower values.

Co-authored-by: Tom Parker-Shemilt <palfrey@tevp.net>
2021-10-13 13:47:23 +00:00
刘瀚骋 f7796edc7e remove everything about pest 2021-10-12 13:30:40 +08:00
刘瀚骋 8748df2ca4 draft without error handling 2021-10-12 13:30:40 +08:00
Clémentine Urquizar dd56e82dba
Update version for the next release (v0.17.2) 2021-10-11 15:20:35 +02:00
Tom Parker-Shemilt 2dfe24f067 memmap -> memmap2 2021-10-10 22:47:12 +01:00
Clémentine Urquizar 05d8a33a28
Update version for the next release (v0.17.1) 2021-10-02 16:21:31 +02:00
Clémentine Urquizar 0e8665bf18
Update version for the next release (v0.17.0) 2021-09-28 19:38:12 +02:00
Clémentine Urquizar 1eacab2169
Update version for the next release (v0.15.1) 2021-09-22 17:18:54 +02:00
Clémentine Urquizar f8ecbc28e2
Update version for the next release (v0.15.0) 2021-09-21 18:09:14 +02:00
mpostma aa6c5df0bc Implement documents format
document reader transform

remove update format

support document sequences

fix document transform

clean transform

improve error handling

add documents! macro

fix transform bug

fix tests

remove csv dependency

Add comments on the transform process

replace search cli

fmt

review edits

fix http ui

fix clippy warnings

Revert "fix clippy warnings"

This reverts commit a1ce3cd96e603633dbf43e9e0b12b2453c9c5620.

fix review comments

remove smallvec in transform loop

review edits
2021-09-21 16:58:33 +02:00
bors[bot] 94764e5c7c
Merge #360
360: Update version for the next release (v0.14.0) r=Kerollmops a=curquiza

Release containing the geosearch, cf #322 

Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-09-21 08:43:27 +00:00
bors[bot] 31c8de1cca
Merge #322
322: Geosearch r=ManyTheFish a=irevoire

This PR introduces [basic geo-search functionalities](https://github.com/meilisearch/specifications/pull/59): it makes the engine able to index, filter, and sort by geo-point. We decided to use [the rstar library](https://docs.rs/rstar) and to save the points in [an RTree](https://docs.rs/rstar/0.9.1/rstar/struct.RTree.html) that we de/serialize in the index database [by using serde](https://serde.rs/) with [bincode](https://docs.rs/bincode). This is not an efficient way to query the tree, as it consumes a lot of CPU and memory whenever a search is made, but it is at least an easy first way to do so.

### What we will have to do on the indexing part:
- [x] Index the `_geo` fields from the documents.
  - [x] Create a new module with an extractor in the `extract` module that takes the `obkv_documents` and retrieves the latitude and longitude coordinates, outputting them in a `grenad::Reader` for further processing.
  - [x] Call the extractor in the `extract::extract_documents_data` function and send the result to the `TypedChunk` module.
  - [x] Get the `grenad::Reader` in the `typed_chunk::write_typed_chunk_into_index` function and store all the points in the `rtree`.
- [x] Delete the documents from the `RTree` when deleting documents from the database. All of this can be done in the `delete_documents.rs` file by getting the data structure, removing the points from it, and inserting it back after the modification.
- [x] Clear the `RTree` entirely when we clear the documents from the database; everything happens in the `clear_documents.rs` file.
- [x] Save a Roaring bitmap of all documents containing the `_geo` field.

### What we will have to do on the query part:
- [x] Filter the documents at a certain distance around a point. This is done by [collecting the documents from the searched point](https://docs.rs/rstar/0.9.1/rstar/struct.RTree.html#method.nearest_neighbor_iter) while they are in range.
  - [x] We must introduce new `geoLowerThan` and `geoGreaterThan` variants to the `Operator` filter enum.
  - [x] Implement the `negative` method on both variants, where the `geoGreaterThan` variant is implemented by executing the `geoLowerThan` and removing the results found from the whole list of geo-faceted documents.
  - [x] Add the `_geoRadius` function in the pest parser.
- [x] Introduce a `_geo` ascending ranking function that takes a point as a parameter. ~~This function must keep the iterator on the `RTree` and make it peekable.~~ This was not possible for now; we had to collect the whole iterator. Only the documents that are part of the candidates must be sent too!
  - [x] This ascending ranking rule will only be active if the search is set up with the `_geoPoint` parameter that indicates the center point of the ascending ranking rule.
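
The radius filter described above can be illustrated with a minimal sketch. It is not the code from this PR: it replaces the `RTree` lookup with a linear scan, computes distances with the haversine formula, and the function names `haversine_m`, `geo_lower_than`, and `geo_greater_than` are hypothetical stand-ins for the `geoLowerThan` and `geoGreaterThan` operators. What it does mirror is the idea that the "greater than" case is obtained by removing the "lower than" results from the set of all geo-faceted documents.

```rust
/// Mean Earth radius in meters (an assumption of this sketch).
const EARTH_RADIUS_M: f64 = 6_371_000.0;

/// Great-circle distance in meters between two `[lat, lng]` points given in
/// degrees, computed with the haversine formula.
fn haversine_m(a: [f64; 2], b: [f64; 2]) -> f64 {
    let (lat1, lat2) = (a[0].to_radians(), b[0].to_radians());
    let dlat = lat2 - lat1;
    let dlng = (b[1] - a[1]).to_radians();
    let h = (dlat / 2.0).sin().powi(2) + lat1.cos() * lat2.cos() * (dlng / 2.0).sin().powi(2);
    2.0 * EARTH_RADIUS_M * h.sqrt().asin()
}

/// Hypothetical stand-in for `geoLowerThan`: ids of the documents located
/// within `radius_m` meters of `center`. A linear scan replaces the RTree.
fn geo_lower_than(points: &[(u32, [f64; 2])], center: [f64; 2], radius_m: f64) -> Vec<u32> {
    points
        .iter()
        .filter(|(_, p)| haversine_m(center, *p) <= radius_m)
        .map(|(id, _)| *id)
        .collect()
}

/// Hypothetical stand-in for `geoGreaterThan`: run the "lower than" filter,
/// then remove its results from the whole list of geo-faceted documents.
fn geo_greater_than(points: &[(u32, [f64; 2])], center: [f64; 2], radius_m: f64) -> Vec<u32> {
    let inside = geo_lower_than(points, center, radius_m);
    points
        .iter()
        .map(|(id, _)| *id)
        .filter(|id| !inside.contains(id))
        .collect()
}
```

With three documents located in Paris, Lille, and London and a 250 km radius around Paris, `geo_lower_than` would keep the first two and `geo_greater_than` only the third.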

-----------

- On the Meilisearch side: we must introduce a new concept, returning the documents with a new `_geoDistance` field when they go through the `_geo` ranking rule; this has never been done before. We could maybe just do it afterward, once the documents have been retrieved from the database, by computing the distance between the `_geoPoint` and each of the documents to be returned.

Co-authored-by: Irevoire <tamo@meilisearch.com>
Co-authored-by: cvermand <33010418+bidoubiwa@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
2021-09-20 19:04:57 +00:00
Clémentine Urquizar 3f1453f470
Update version for the next release (v0.14.0) 2021-09-20 18:12:23 +02:00
Clémentine Urquizar f167f7b412
Update version for the next release (v0.13.1) 2021-09-10 09:48:17 +02:00
Tamo cfc62a1c15
use geoutils instead of haversine 2021-09-09 18:11:38 +02:00
Tamo e5ef0cad9a
use meters in the filters 2021-09-08 18:24:09 +02:00
Irevoire 8d9c2c4425
create a new db with getters and setters 2021-09-08 17:51:07 +02:00
Kerollmops 8a088fb99e
Bump grenad to v0.3.1 2021-09-08 14:08:55 +02:00
Kerollmops 20ad43b908
Enable the grenad tempfile feature back 2021-09-08 14:06:28 +02:00
Clémentine Urquizar eb7b9d9dbf
Update version for the next release (v0.13.0) 2021-09-08 10:59:30 +02:00
mpostma cd043d4461 remove unused grenad default features 2021-09-07 16:21:46 +02:00
bors[bot] 5cbe879325
Merge #308
308: Implement a better parallel indexer r=Kerollmops a=ManyTheFish

Rewrite the indexer:
- enhance memory consumption control
- optimize parallelism using rayon and crossbeam channel
- refactor the different parts and make adding a new DB implementation easier
- optimize and fix prefix databases


Co-authored-by: many <maxime@meilisearch.com>
2021-09-02 15:03:52 +00:00
many db0c681bae
Fix PR comments 2021-09-02 15:17:52 +02:00
Clémentine Urquizar 285849e3a6
Update version for the next release (v0.12.0) 2021-09-02 10:08:41 +02:00
many 1d314328f0
Plug new indexer 2021-09-01 16:48:36 +02:00
Kerollmops f2e1591826
Remove the unused tinytemplate dependency 2021-08-24 18:10:58 +02:00
Kerollmops 2f20257070
Update milli to the v0.11.0 2021-08-24 18:10:11 +02:00
Clément Renault 89d0758713
Revert "Revert "Sort at query time"" 2021-08-24 11:55:16 +02:00
Clémentine Urquizar 88f6c18665
Update version for the next release (v0.10.2) 2021-08-23 11:33:30 +02:00
Clémentine Urquizar 922f9fd4d5
Revert "Sort at query time" 2021-08-20 18:09:17 +02:00
bors[bot] 41fc0dcb62
Merge #309
309: Sort at query time r=Kerollmops a=Kerollmops

This PR:
 - Makes the `Asc/Desc` criteria work with strings too: it first returns the documents ordered by numbers, then by strings, and finally the documents that can't be ordered. Note that strings are ordered lexicographically and not by character, which means that the ordering doesn't know about wide and short characters, e.g. `a`, `丹`, `▲`.
 - Changes the syntax for the `Asc/Desc` criterion, which now uses a colon to separate the name and the order, e.g. `title:asc`, `price:desc`.
 - Adds the `Sort` criterion at the third position in the ranking rules by default.
 - Adds the `sort_criteria` method to the `Search` builder struct to let the users define the `Asc/Desc` sortable attributes they want to use at query time. Note that we need to check that the fields are registered in the sortable attributes before performing the search.
 - Introduces a new `InvalidSortableAttribute` user error that is raised when the sort criteria declared at query time are not part of the sortable attributes.
 - Includes integration tests for the dynamic `Sort` criterion, introduced by `@ManyTheFish`.
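
As an illustration of the colon syntax and of the "numbers, then strings, then unorderable" ordering described above, here is a minimal sketch. It is not the parser or the criterion from this PR: the `AscDesc` and `Value` types, `parse_asc_desc`, and `compare_asc` are hypothetical names, and only the ascending case is shown.

```rust
use std::cmp::Ordering;

/// Hypothetical representation of a parsed `name:asc` / `name:desc` rule.
#[derive(Debug, Clone, PartialEq)]
enum AscDesc {
    Asc(String),
    Desc(String),
}

/// Splits on the last colon and accepts only `asc` or `desc` as the order.
fn parse_asc_desc(text: &str) -> Option<AscDesc> {
    let (name, order) = text.rsplit_once(':')?;
    match order {
        "asc" => Some(AscDesc::Asc(name.to_string())),
        "desc" => Some(AscDesc::Desc(name.to_string())),
        _ => None,
    }
}

/// Hypothetical field value: a number, a string, or something unorderable.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Number(f64),
    Text(String),
    Other,
}

/// Group index used to place numbers first, strings second, the rest last.
fn rank(value: &Value) -> u8 {
    match value {
        Value::Number(_) => 0,
        Value::Text(_) => 1,
        Value::Other => 2,
    }
}

/// Ascending comparison: numbers by value, strings lexicographically
/// (byte-wise), and values of different kinds by their group index.
fn compare_asc(a: &Value, b: &Value) -> Ordering {
    match (a, b) {
        (Value::Number(x), Value::Number(y)) => x.total_cmp(y),
        (Value::Text(x), Value::Text(y)) => x.cmp(y),
        _ => rank(a).cmp(&rank(b)),
    }
}
```

Passing `compare_asc` to `sort_by` on a vector of `Value`s would yield the numbers first, then the strings, then the remaining values. A real implementation would additionally reject any attribute that is not registered as sortable, which is what the `InvalidSortableAttribute` error is for.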

Fixes #305.

Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: many <maxime@meilisearch.com>
2021-08-18 16:55:32 +00:00
bors[bot] 198c416bd8
Merge #312
312: Update milli version to v0.10.1 r=Kerollmops a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2021-08-18 12:08:04 +00:00
Clémentine Urquizar 6cb9c3b81f
Update milli version to v0.10.1 2021-08-18 13:46:27 +02:00
Clémentine Urquizar 42cf847a63
Update tokenizer version to v0.2.5 2021-08-18 13:37:41 +02:00
Kerollmops fcedff95e8
Change the Asc/Desc criterion syntax to use a colon (:) 2021-08-17 14:03:21 +02:00
Clémentine Urquizar fcc520e49a
Update version for the next release (v0.10.0) 2021-08-16 12:00:28 +02:00
Clémentine Urquizar 7f26c75610
Update milli to v0.9.0 2021-08-04 16:04:55 +02:00
Kerollmops 341c244965
Bump milli to v0.8.1 2021-07-29 15:56:36 +02:00
Clémentine Urquizar 6a141694da
Update version for the next release (v0.8.0) 2021-07-27 16:38:42 +02:00
Kerollmops 0353fbb5df
Bump the tokenizer version to v0.2.4 2021-07-22 17:14:45 +02:00
Kerollmops 838ed1cd32
Use a u16 field id instead of one byte 2021-07-06 11:58:03 +02:00
Kerollmops 91c5d0c042
Use the AlwaysFreePages flag when opening an index 2021-07-05 16:36:13 +02:00
Kerollmops a6b4069172
Bump to v0.7.2 2021-07-05 10:54:53 +02:00
Clémentine Urquizar 3c149d8a43
Update tokenizer version to v0.2.3 2021-06-30 18:41:35 +02:00
Clémentine Urquizar b489515f4d
Update milli version to v0.7.1 2021-06-30 13:52:46 +02:00
Clément Renault 80c6aaf1fd
Bump milli to 0.7.0 2021-06-28 18:31:56 +02:00
Clément Renault bdc5599b73
Bump heed to use the git repo with v0.12.0 2021-06-28 18:26:20 +02:00
Kerollmops 98285b4b18
Bump milli to 0.6.0 2021-06-23 17:30:26 +02:00
Clémentine Urquizar 9885fb4159
Update version for the next release (v0.5.1) 2021-06-23 14:05:20 +02:00
Clémentine Urquizar 320670f8fe
Update version for the next release (v0.5.0) 2021-06-21 15:59:17 +02:00
Clémentine Urquizar 35fcc351a0
Update version for the next release (v0.4.2) 2021-06-20 17:37:24 +02:00
Kerollmops ccd6f13793
Update version to the next release (0.4.1) 2021-06-17 15:01:20 +02:00
Clémentine Urquizar f5ff3e8e19
Update version for the next release (v0.4.0) 2021-06-16 14:01:05 +02:00
Kerollmops 312c2d1d8e
Use the Error enum everywhere in the project 2021-06-14 16:58:38 +02:00
Clémentine Urquizar dc64e139b9
Update version for the next release (v0.3.1) 2021-06-09 14:39:21 +02:00
Kerollmops 103dddba2f
Move the UpdateStore into the http-ui crate 2021-06-08 17:59:51 +02:00
Clémentine Urquizar 3b2b3aeea9
Update Cargo.toml for next release v0.3.0 2021-06-03 12:24:27 +02:00
tamo 06c414a753
move the benchmarks to another crate so we can download the datasets automatically without adding overhead to the build of milli 2021-06-02 11:11:50 +02:00
tamo d0b44c380f
add benchmarks on a wiki dataset 2021-06-02 11:05:07 +02:00
tamo 5132a106a1
refactor everything related to the songs dataset in a songs benchmark file 2021-06-02 11:05:07 +02:00
tamo 3def42abd8
merge all the criterion only benchmarks in one file 2021-06-02 11:05:07 +02:00
tamo aee49bb3cd
add the proximity criterion 2021-06-02 11:05:07 +02:00
tamo 49e4cc3daf
add the words criterion to the bench 2021-06-02 11:05:07 +02:00
tamo 4fdbfd6048
push a first version of the benchmark for the typo 2021-06-02 11:05:07 +02:00
Clémentine Urquizar 1e11578ef0
Update version for the next release (v0.2.1) 2021-05-05 14:57:34 +02:00
Clémentine Urquizar a8680887d8
Upgrade Milli version (v0.2.0) 2021-05-03 14:50:47 +02:00
Clémentine Urquizar 34e02aba42
Upgrade Tokenizer version (v0.2.2) 2021-05-03 10:55:55 +02:00
many 0d7d3ce802
Update roaring package 2021-04-27 14:39:53 +02:00
many 71740805a7
Fix forgotten typo tests 2021-04-27 14:39:53 +02:00
Clément Renault 658f316511
Introduce the Initial Criterion 2021-04-27 14:35:43 +02:00
Kerollmops 0f4c0beffd
Introduce the Attribute criterion 2021-04-27 14:25:34 +02:00
Kerollmops 51767725b2
Simplify integer and float functions trait bounds 2021-04-20 10:23:31 +02:00
Clémentine Urquizar 127d3d028e
Update version for the next release (v0.1.1) 2021-04-19 14:48:13 +02:00
Clémentine Urquizar 2c5c79d68e
Update Tokenizer version to v0.2.1 2021-04-14 18:54:04 +02:00
tamo 62a8f1d707
bump the version of the tokenizer 2021-04-01 13:49:22 +02:00
tamo 73dcdb27f6
select a specific release of the tokenizer instead of using the latest git commit 2021-03-25 15:00:18 +01:00
mpostma 80d0f9c49d
methods to update index time metadata 2021-03-15 14:05:47 +01:00
Clément Renault b18ec00a7a
Add a logging_timer macro to the criterion next methods 2021-03-08 16:12:06 +01:00
Kerollmops 636a9df177
Temporarily fix the tinytemplate doc hidden issue 2021-03-08 15:57:45 +01:00
Kerollmops 79a143b32f
Introduce the query tree data structure 2021-03-03 13:40:18 +01:00
Kerollmops 519b1cb5c9
Update dependencies 2021-02-21 10:26:04 +01:00
Clément Renault fecf3d6fc1
Move the command lines helpers into different crates 2021-02-14 18:55:15 +01:00
Clément Renault d8f3421608
Update the dependencies and remove the unused ones 2021-02-14 18:32:46 +01:00
Clément Renault e8639517da
Change the project to become a workspace with milli as a default-member 2021-02-12 16:15:09 +01:00