MeiliDB

[Badges: Build Status · dependency status · License · Rust 1.31+]

A full-text search database using a key-value store internally.

Features

It uses sled as the internal key-value store. The key-value store allows us to handle updates and queries with small memory and CPU overhead. The whole ranking system is data-oriented and provides great performance.
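For context, here is a minimal, standalone illustration of the sled key-value API that MeiliDB builds on. The database path is arbitrary and the sled version shown may not match the one pinned by this repository, so treat it as a sketch rather than MeiliDB code.

fn main() -> sled::Result<()> {
    // Open (or create) an on-disk key-value store, the same kind of
    // storage MeiliDB uses internally for its indexes and documents.
    let db = sled::open("example.sled")?;

    // Keys and values are plain byte slices.
    db.insert(b"movie:123:title", "Inglorious Bastards".as_bytes())?;

    if let Some(value) = db.get(b"movie:123:title")? {
        println!("{}", String::from_utf8_lossy(&value));
    }

    db.flush()?;
    Ok(())
}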

You can read the deep dive if you want more information on the engine; it describes the whole process of generating updates and handling queries. You can also take a look at the typos and ranking rules if you want to know the default rules used to sort documents.

We will be proud if you submit issues and pull requests. You can help this project grow and start contributing by checking the issues tagged "good-first-issue". It is a good place to start!

The project is currently only a library, which means that no binary is provided yet. To get started, you can check the examples, which are made to work with the data located in the misc/ folder.

MeiliDB will become a binary in the near future, so you will be able to use it as a database out of the box. You should be able to query it using a to-be-defined protocol. This is our current goal; see the milestones. In the end, the binary will be a set of network protocols and wrappers around the library, which will also be published on crates.io. Both the binary and the library will follow the same update cycle.

Performance

The benchmark database is composed of 100 353 documents with 352 attributes each, 3 of which are indexed. That is more than 300 000 indexed fields out of roughly 35 million stored. On an Intel i7-7700 (8) @ 4.2GHz, we can handle more than 2.8k requests per second with an average response time of 9 ms.

Requests are made using wrk and scripted to simulate real user queries.

Running 10s test @ http://localhost:2230
  2 threads and 25 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     9.52ms    7.61ms  99.25ms   84.58%
    Req/Sec     1.41k   119.11     1.78k    64.50%
  28080 requests in 10.01s, 7.42MB read
Requests/sec:   2806.46
Transfer/sec:    759.17KB

Notes

The default Rust allocator has recently been changed to the system allocator. We have seen much better performance when using jemalloc as the global allocator.
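As a point of reference, opting into jemalloc is typically done with the jemallocator crate and the standard #[global_allocator] attribute. This is a generic sketch of that pattern, not necessarily the exact setup used in this repository.

// Route every heap allocation in the binary through jemalloc instead
// of the system allocator.
use jemallocator::Jemalloc;

#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;

fn main() {
    let data = vec![0u8; 1024];
    println!("allocated {} bytes through jemalloc", data.len());
}

Note that the jemallocator crate must be added as a dependency in Cargo.toml for this to compile.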

Usage and examples

You can try out a small part of MeiliDB with the following commands. They create an index named movies and insert two great Tarantino movies into it.

cargo run --release

curl -XPOST 'http://127.0.0.1:8000/movies' \
    -d '
identifier = "id"

[attributes.id]
stored = true

[attributes.title]
stored = true
indexed = true
'

curl -H 'Content-Type: application/json' \
     -XPUT 'http://127.0.0.1:8000/movies' \
     -d '{ "id": 123, "title": "Inglorious Bastards" }'

curl -H 'Content-Type: application/json' \
     -XPUT 'http://127.0.0.1:8000/movies' \
     -d '{ "id": 456, "title": "Django Unchained" }'

Once the database is initialized, you can query it with the following command:

curl -XGET 'http://127.0.0.1:8000/movies/search?q=inglo'