MeiliDB
A full-text search database using a key-value store internally.
Features
- Provides 6 default ranking criteria used to bucket sort documents
- Accepts custom criteria and can apply them in any custom order
- Supports ranged queries, useful for paginating results
- Can deduplicate (distinct) and filter returned documents based on context-defined rules
- Can store complete documents or only user schema specified fields
- The default tokenizer can index Latin- and kanji-based languages
- Returns the matching text areas, useful to highlight matched words in results
- Accepts query-time search configuration, such as the searchable fields
- Supports runtime indexing (incremental indexing)
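To make the idea of bucket sorting with ordered criteria and ranged queries more concrete, here is a minimal sketch. It is not MeiliDB's API: the `Match` struct, the `fewer_typos` and `closer_words` criteria, and the `ranked_ids` helper are all hypothetical names chosen for illustration. Documents are sorted by the first criterion, and each bucket of ties is then refined by the next criterion, in whatever order the caller supplies.

```rust
use std::cmp::Ordering;
use std::ops::Range;

// Hypothetical per-document match data; not MeiliDB's actual types.
#[derive(Debug, Clone, PartialEq)]
struct Match {
    id: u32,
    typos: u32,
    proximity: u32,
}

// A criterion is any function that orders two matches.
type Criterion = fn(&Match, &Match) -> Ordering;

// Two illustrative criteria (assumed, not the library's default ones).
fn fewer_typos(a: &Match, b: &Match) -> Ordering {
    a.typos.cmp(&b.typos)
}

fn closer_words(a: &Match, b: &Match) -> Ordering {
    a.proximity.cmp(&b.proximity)
}

// Sort by the first criterion, then refine each bucket of ties with the rest.
fn bucket_sort(docs: &mut [Match], criteria: &[Criterion]) {
    let (first, rest) = match criteria.split_first() {
        Some(split) => split,
        None => return,
    };
    docs.sort_by(|a, b| first(a, b));
    let mut start = 0;
    while start < docs.len() {
        // Find the end of the bucket of documents that tie on `first`.
        let mut end = start + 1;
        while end < docs.len() && first(&docs[start], &docs[end]) == Ordering::Equal {
            end += 1;
        }
        bucket_sort(&mut docs[start..end], rest);
        start = end;
    }
}

// Rank the documents and return only the ids in the requested range (pagination).
fn ranked_ids(mut docs: Vec<Match>, criteria: &[Criterion], range: Range<usize>) -> Vec<u32> {
    bucket_sort(&mut docs, criteria);
    let end = range.end.min(docs.len());
    let start = range.start.min(end);
    docs[start..end].iter().map(|m| m.id).collect()
}

fn sample() -> Vec<Match> {
    vec![
        Match { id: 1, typos: 1, proximity: 2 },
        Match { id: 2, typos: 0, proximity: 5 },
        Match { id: 3, typos: 0, proximity: 1 },
        Match { id: 4, typos: 1, proximity: 1 },
    ]
}

fn main() {
    let criteria: [Criterion; 2] = [fewer_typos, closer_words];
    // First page of two results.
    println!("{:?}", ranked_ids(sample(), &criteria, 0..2));
}
```

Swapping the order of the criteria in the array changes the ranking without touching the sorting code, which is the property the feature list describes.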
It uses RocksDB as the internal key-value store. The key-value store allows us to handle updates and queries with small memory and CPU overheads. The whole ranking system is data-oriented and provides great performance.
You can read the deep dive if you want more information on the engine; it describes the whole process of generating updates and handling queries. You can also take a look at the typos and ranking rules if you want to know the default rules used to sort the documents.
We will be proud if you submit issues and pull requests. You can help to grow this project and start contributing by checking issues tagged "good-first-issue". It is a good start!
The project is only a library for now, which means that no binary is provided yet. To get started, you can check the examples, which are made to work with the data located in the misc/ folder.
MeiliDB will be a binary in the near future, so you will be able to use it as a database out of the box. We should be able to query it using a to-be-defined protocol. This is our current goal; see the milestones. In the end, the binary will be a bunch of network protocols and wrappers around the library, which will also be published on crates.io. Both the binary and the library will follow the same update cycle.
Performance
With a database composed of 100 353 documents with 352 attributes each, 3 of them indexed (so more than 300 000 fields indexed out of 35 million stored), we can handle more than 2.8k req/sec with an average response time of 9 ms on an Intel i7-7700 (8) @ 4.2GHz.
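The field counts quoted above follow directly from the document and attribute counts. The short sketch below only reproduces that arithmetic; the constant and function names are chosen for illustration.

```rust
// Figures quoted in the paragraph above.
const DOCUMENTS: u64 = 100_353;
const ATTRIBUTES_PER_DOCUMENT: u64 = 352;
const INDEXED_ATTRIBUTES: u64 = 3;

// Number of fields that go through the indexer.
fn indexed_fields() -> u64 {
    DOCUMENTS * INDEXED_ATTRIBUTES
}

// Number of fields stored in total.
fn stored_fields() -> u64 {
    DOCUMENTS * ATTRIBUTES_PER_DOCUMENT
}

fn main() {
    println!("indexed fields: {}", indexed_fields()); // 301059
    println!("stored fields:  {}", stored_fields()); // 35324256
}
```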
Requests are made using wrk and scripted to simulate real user queries.
Running 10s test @ http://localhost:2230
2 threads and 25 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 9.52ms 7.61ms 99.25ms 84.58%
Req/Sec 1.41k 119.11 1.78k 64.50%
28080 requests in 10.01s, 7.42MB read
Requests/sec: 2806.46
Transfer/sec: 759.17KB
Notes
The default Rust allocator has recently been changed to use the system allocator. We have seen much better performance when using jemalloc as the global allocator.
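In Rust, the global allocator is selected with the `#[global_allocator]` attribute. The sketch below shows that mechanism using only the standard library: `Counting` is a hypothetical stand-in that forwards to the system allocator and counts calls. To use jemalloc instead, one would typically add an allocator crate as a dependency and declare its allocator type in the same static; for example, assuming the `jemallocator` crate, `static GLOBAL: jemallocator::Jemalloc = jemallocator::Jemalloc;`.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Stand-in allocator (hypothetical): forwards to the system allocator
// and counts how many allocations were requested.
struct Counting;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Every heap allocation in the program now goes through `Counting`.
// With an allocator crate, this static would name that crate's type instead.
#[global_allocator]
static GLOBAL: Counting = Counting;

fn allocations() -> usize {
    ALLOCATIONS.load(Ordering::Relaxed)
}

fn main() {
    let before = allocations();
    // black_box keeps the optimizer from eliding the allocation.
    let buffer = std::hint::black_box(vec![0u8; 4096]);
    let after = allocations();
    println!("allocations made for the buffer: {}", after - before);
    drop(buffer);
}
```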
Usage and examples
Like most search engines, MeiliDB works with an index. To test the library, you can create one by indexing a simple CSV file.
cargo run --release --example create-database -- test.mdb examples/movies/movies.csv --schema examples/movies/schema-movies.toml
Once the command is executed, the index should be in the test.mdb folder. You are now able to run the query-database example and play with MeiliDB.
cargo run --release --example query-database -- test.mdb -n 10 id title overview release_date