169 Commits

Author SHA1 Message Date
bors[bot]
5fad37aebd
Merge #1711
1711: MeiliSearch refactor introducing OBKV format r=MarinPostma a=MarinPostma

This PR refactor some multiple components of meilisearch, and introduce the obkv document format to meilisearch

- [x] Split meilisearch-http and meilisearch-lib
- [x] Replace `IndexActor` and `UuidResolver` with `IndexResolver`
- [x] Remove mentions to Actor
- [x] Remove Actor traits to simplify code
- [x] Integrate obkv document format
- [x] Remove `Data`
- [x] Restore all route
- [x] Replace `Box<dyn error>` with `anyhow::Error`
- [x] Introduce update file store
- [x] Update file store error handling
- [x] Fix dumps
- [x] Fix snapshots
- [x] Fix tests
- [x] Update module documentation
- [x] add csv suppport (feat `@ManyTheFish` #1729 )
- [x] add jsonl support
- [x] integrate geosearch (feat `@irevoire` #1725) 

partially implements #1691 and #1690. The error handling is very basic now, I will finish it in the next pr.

Some unit tests have been disabled, I will re-enable them ASAP, but they need a bit more work.

close #1531 


P.S: sorry for this monstrous PR :'(

Co-authored-by: mpostma <postma.marin@protonmail.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: many <maxime@meilisearch.com>
2021-09-29 14:38:55 +00:00
mpostma
aa6c5df0bc Implement documents format
document reader transform

remove update format

support document sequences

fix document transform

clean transform

improve error handling

add documents! macro

fix transform bug

fix tests

remove csv dependency

Add comments on the transform process

replace search cli

fmt

review edits

fix http ui

fix clippy warnings

Revert "fix clippy warnings"

This reverts commit a1ce3cd96e603633dbf43e9e0b12b2453c9c5620.

fix review comments

remove smallvec in transform loop

review edits
2021-09-21 16:58:33 +02:00
mpostma
60518449fc split meilisearch-http and meilisearch-lib 2021-09-21 13:23:22 +02:00
happysalada
770b6d25ae deps: unify pest dependency 2021-09-15 12:15:44 +09:00
Many
41bdc90f46
Revert "Enable optimization in every profile" 2021-06-16 14:17:02 +02:00
Irevoire
86b916b008
enable optimization in every profile 2021-06-09 10:26:57 +02:00
Irevoire
e0c327bae2
Update Cargo.toml
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-06-08 11:39:10 +02:00
Irevoire
c82a382b0b
compile every build.rs with optimization 2021-06-08 11:19:22 +02:00
tamo
06c414a753
move the benchmarks to another crate so we can download the datasets automatically without adding overhead to the build of milli 2021-06-02 11:11:50 +02:00
Clément Renault
9423310816
Introduce an helpers crate that export the database to stdout 2021-03-01 19:55:04 +01:00
mpostma
a9a9ed6318
create workspace with meilisearch-error 2021-03-01 14:41:55 +01:00
mpostma
79708aeb67
add milli as git dep 2021-03-01 14:41:20 +01:00
mpostma
5ba58c1e9c
add Marin to authors 2021-02-28 10:09:56 +01:00
mpostma
91d6e90d5d
enable faceted searches 2021-02-16 19:20:39 +01:00
Clément Renault
fecf3d6fc1
Move the command lines helpers into different crates 2021-02-14 18:55:15 +01:00
Clément Renault
e8639517da
Change the project to become a workspace with milli as a default-member 2021-02-12 16:15:09 +01:00
mpostma
f8f02af23e
incorporate review changes 2021-02-04 13:21:15 +01:00
mpostma
9af0a08122
post review fixes 2021-02-02 17:34:06 +01:00
mpostma
17c463ca61
remove unused deps 2021-02-01 13:32:21 +01:00
mpostma
6c63ee6798
implement list all indexes 2021-01-28 18:32:24 +01:00
mpostma
74410d8c6b
architecture rework 2021-01-28 14:12:34 +01:00
Clément Renault
433ac8c38a
Remove the ordered-float serde feature 2021-01-27 14:11:10 +01:00
Kerollmops
61dbcfa44a
Bump the roaring to 0.6.4 2021-01-26 14:38:43 +01:00
Clément Renault
51a37de885
Introduce the FacetValue enum type 2021-01-26 14:09:09 +01:00
mpostma
6a3f625e11
WIP: refactor IndexController
change the architecture of the index controller to allow it to own an
index store.
2021-01-16 15:09:48 +01:00
mpostma
686f987180
fix compile errors 2021-01-14 11:27:07 +01:00
mpostma
ddd7789713
WIP: IndexController 2021-01-13 17:50:36 +01:00
mpostma
1ae761311e
integrate with meilisearch tokenizer 2021-01-07 16:14:27 +01:00
mpostma
0cd9e62fc6 search first iteration 2020-12-24 12:58:34 +01:00
mpostma
02ef1d41d7 route document add json 2020-12-23 16:12:37 +01:00
mpostma
1a38bfd31f data add documents 2020-12-23 13:52:28 +01:00
mpostma
55e1552957 update queue refactor, first iteration 2020-12-22 17:13:50 +01:00
mpostma
7c9eaaeadb clean code, and fix errors 2020-12-22 14:02:41 +01:00
Kerollmops
77e951e933
Use the byte-unit crate to ease library usage 2020-12-20 12:00:37 +01:00
mpostma
8c0ab106c7 initial commit 2020-12-12 13:32:06 +01:00
Clément Renault
e7f2ab9138
Bump grenad to fix an indexing bug 2020-12-05 16:39:15 +01:00
Clément Renault
0959e1501f
Introduce the FacetRevRange Iterator struct 2020-12-04 12:02:23 +01:00
Clément Renault
61b383f422
Introduce the criteria update setting 2020-12-04 12:02:22 +01:00
Clément Renault
a0adfb5e8e
Introduce a real pest parser and support every facet filter conditions 2020-11-23 16:43:55 +01:00
Clément Renault
07a0c82790
Bump heed to 0.10.4 to use be able to lazily decode roaring bitmaps 2020-11-23 16:43:53 +01:00
Clément Renault
38c76754ef
Make the facet level search system generic on f64 and i64 2020-11-23 16:43:52 +01:00
Clément Renault
b255be93fa
Bump heed to 0.10.3 2020-11-23 16:43:49 +01:00
Clément Renault
a18d9a1f87
Parse and store the faceted fields 2020-11-13 16:13:51 +01:00
Clément Renault
640c7d748a
Modify the highlight function to support any JSON type 2020-11-05 13:59:32 +01:00
Clément Renault
0408c9d66a
Move the http server into its own sub-module 2020-11-05 11:16:39 +01:00
Clément Renault
4fded5bd0e
Bump heed to be able to reference a RoTxn from multiple threads 2020-11-02 12:49:23 +01:00
Clément Renault
f0d028d3a4
Update the Transform struct to support JSON updates 2020-10-31 20:52:49 +01:00
Clément Renault
9d47ee52b4
Generate a uuid v4 based document id when missing 2020-10-31 15:11:06 +01:00
Clément Renault
085d3b9d94
Update heed to 0.10.0 2020-10-30 11:42:00 +01:00
Clément Renault
b5d52b6b45
Prefer using a smallstr instead of a real String to reduce allocations 2020-10-29 14:32:32 +01:00
Clément Renault
98fc24cbdf
Bump heed to fix a prefix iter bug 2020-10-28 10:55:21 +01:00
Clément Renault
b44b04d25b
Serialize the CSV record values as JSON strings 2020-10-24 14:43:46 +02:00
Clément Renault
802e925fd7
Switch to a JSON protocol for the front page 2020-10-21 18:26:29 +02:00
Clément Renault
2210818114
Introduce the obkv heed codec 2020-10-21 15:51:48 +02:00
Clément Renault
f948a03be2
Optimise the merge functions to avoid allocations 2020-10-20 16:40:50 +02:00
Clément Renault
cde8478388
Replace the panic in the merge function by actual errors 2020-10-20 16:19:07 +02:00
Clément Renault
35c9a3c558
Brodacast the updates infos to every ws clients 2020-10-20 11:19:34 +02:00
Clément Renault
871222aebd
Introduce some new routes to handle live indexing 2020-10-19 16:06:43 +02:00
Clément Renault
65e32fecb1
Move the binaries into one with subcommands 2020-10-19 13:44:17 +02:00
Clément Renault
eca49e3a03
Introduce a notification channel for the UpdateStore 2020-10-18 16:37:37 +02:00
Clément Renault
83c1db8763
Introduce the UpdateStore 2020-10-18 15:26:57 +02:00
Clément Renault
9021b2dba6
Introduce the enable-chunk-fusing flag 2020-10-14 18:44:59 +02:00
Kerollmops
f980422c57
Move from oxidized-mtbl to grenad 2020-10-14 12:47:32 +02:00
Kerollmops
4e9bd1fef5
Bump oxidized-mtbl 2020-10-07 14:23:22 +02:00
Kerollmops
433d9bbc6e
Use CompressionType::from_str rather than a custom function 2020-10-06 13:50:34 +02:00
Kerollmops
4b819457c9
Enable the strucopt/clap warp help feature 2020-10-06 13:06:22 +02:00
Clément Renault
770f29fd05
Bump the oxidized-mtbl dependency 2020-10-04 17:04:33 +02:00
Clément Renault
acd2a63879
Introduce a simple FST based chinese word segmenter 2020-10-04 17:04:33 +02:00
Kerollmops
68f4af7d2e
Improve the display of the number of processed documents 2020-09-29 16:08:58 +02:00
Clément Renault
ed05999f63
Replace the arc cache by a simple linked hash map 2020-09-23 14:50:52 +02:00
Clément Renault
d6fa9c0414
Index the intra documents word pair proximities 2020-09-22 14:04:33 +02:00
Kerollmops
3ded98e5fa
Bump the roaring version that fix a deserialization bug 2020-09-10 22:37:51 +02:00
Kerollmops
d5e5baa20f
Bump the oxidized-mtbl dependency 2020-09-10 13:29:12 +02:00
Kerollmops
0fb086f241
Use the crates.io raoring library 2020-09-08 15:16:04 +02:00
Clément Renault
bb1ab428db
Use another function to define the proximity 2020-09-06 17:55:07 +02:00
Clément Renault
f928b91e9d
Specify the exact rev for the near-proximity dep 2020-09-06 17:21:38 +02:00
Clément Renault
1c504471d3
Introduce the plane-sweep algorithm 2020-09-05 18:25:27 +02:00
Clément Renault
dc88a86259
Store the word positions under the documents 2020-09-05 18:03:06 +02:00
Kerollmops
580ed1119a
Make the engine to return csv string records as documents and headers 2020-08-31 19:02:00 +02:00
Clément Renault
bad0663138
Come back to the old tokenizer 2020-08-31 13:34:38 +02:00
Clément Renault
3fe497e129
Improve the Mtbl heed codec to only encode MTBL databases 2020-08-29 11:20:39 +02:00
Clément Renault
d19f394630
Make the indexer support gzipped CSV as input 2020-08-21 18:10:24 +02:00
Clément Renault
ff479c865d
Replace pipe by ringtail to improve stdin read performances 2020-08-21 17:45:52 +02:00
Clément Renault
8806fcd545
Introduce a better query and document lexer 2020-08-16 14:36:54 +02:00
Clément Renault
1e358e3ae8
Introduce the AstarBagIter that iterates through best paths 2020-08-15 16:24:06 +02:00
Clément Renault
d5a356902a
Update oxidized-mtbl 2020-08-07 12:14:03 +02:00
Clément Renault
405a71d3a4
Accept csv from stdin 2020-08-06 13:38:21 +02:00
Clément Renault
6508d497ce
Replace the regex highlighting by a simple algorithm 2020-08-05 13:52:27 +02:00
Clément Renault
bd4b18541c
Introduce a new indexer which uses an MTBL sorter 2020-08-04 15:44:37 +02:00
Kerollmops
085c376655
Use the regex crate to highlight "hello" 2020-07-14 11:28:40 +02:00
Kerollmops
12358476da
Use the log crate instead of stderr 2020-07-12 10:55:09 +02:00
Kerollmops
2c62eeea3c
Rename the project milli 2020-07-12 00:16:41 +02:00
Kerollmops
f6eae91c7d
Pretty print the new dashboard numbers 2020-07-11 14:17:37 +02:00
Kerollmops
11c7fef80a
Implement a memory dumper
It moves the in memory HashMaps used when indexing to a disk based MTBL file
2020-07-07 16:48:49 +02:00
Kerollmops
7178b6c2c4
First basic version using MTBL again 2020-07-07 11:32:33 +02:00
Kerollmops
2a3b03138b
Use heed 0.8.1 with the RwIter append method 2020-07-05 19:50:28 +02:00
Kerollmops
46ced5c828
Introduce the RwIter append heed API 2020-07-04 12:34:10 +02:00
Kerollmops
2ae3f40971
Make the indexer ignore certain words
This is a preparation for making the indexing fully parallel by making the
indexer only be aware of certain words for each threads to avoid postings lists
conflicts for each words
2020-07-01 17:49:46 +02:00
Kerollmops
f98b615bf3
Replace the LRU by an Arc cache 2020-06-29 20:48:57 +02:00
Kerollmops
07abebfc46
Introduce a (too big) LRU cache 2020-06-29 18:15:03 +02:00