
MeiliDB


Ultra relevant and instant full-text search API.

MeiliSearch is a powerful, fast, open-source search engine that is easy to use and deploy. Both searching and indexing are fully customizable and handle features such as typo-tolerance, filters, and ranking.

Features

It uses LMDB as the internal key-value store. The key-value store allows us to handle updates and queries with small memory and CPU overheads. The whole ranking system is data-oriented and delivers great performance.

For more information on the engine, read the deep dive, which describes the whole process of generating updates and handling queries. To learn the default rules used to sort the documents, take a look at the typos and ranking rules.

We will be proud if you submit issues and pull requests. You can help grow this project and start contributing by checking the issues tagged "good-first-issue". It is a good start!

crates.io demo gif

Meili helps the Rust community find crates on crates.meilisearch.com

Quick Start

You can deploy your own instant, relevant and typo-tolerant MeiliDB search engine too. Something similar to the demo above can be achieved by following these three small steps. You will need to build your own web front end to make it pretty, though.

Deploy the Server

If you have not installed Rust and its package manager cargo yet, go to the installation page.
You can deploy the server on your own machine; it listens for HTTP requests on port 8080 by default.

rustup override set nightly
cargo run --release

For more logs during the execution, run:

RUST_LOG=info cargo run --release

Create an Index and Upload Some Documents

MeiliDB can serve multiple indexes, each with different kinds of documents, so you must create an index before sending documents to it.

curl -i -X POST 'http://127.0.0.1:8080/indexes/movies'

Now that the server knows about our brand new index, we can send it data. We provide a small dataset in the datasets/ directory.

curl -i -X POST 'http://127.0.0.1:8080/indexes/movies/documents' \
  --header 'content-type: application/json' \
  --data @datasets/movies/movies.json

Search for Documents

The search engine is now aware of our documents and can serve them through the same HTTP server. The jq command line tool can greatly help you read the server responses.

curl 'http://127.0.0.1:8080/indexes/movies/search?q=botman'
{
  "hits": [
    {
      "id": "29751",
      "title": "Batman Unmasked: The Psychology of the Dark Knight",
      "poster": "https://image.tmdb.org/t/p/w1280/jjHu128XLARc2k4cJrblAvZe0HE.jpg",
      "overview": "Delve into the world of Batman and the vigilante justice tha",
      "release_date": "2008-07-15"
    },
    {
      "id": "471474",
      "title": "Batman: Gotham by Gaslight",
      "poster": "https://image.tmdb.org/t/p/w1280/7souLi5zqQCnpZVghaXv0Wowi0y.jpg",
      "overview": "ve Victorian Age Gotham City, Batman begins his war on crime",
      "release_date": "2018-01-12"
    }
  ],
  "offset": 0,
  "limit": 2,
  "processingTimeMs": 1,
  "query": "botman"
}
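
When the query contains spaces or non-ASCII characters, it has to be percent-encoded before it goes into the q parameter. The sketch below builds the search URL used above with the standard library only. The helper names percent_encode and search_url are hypothetical and not part of MeiliDB; only the /indexes/<index>/search?q= route comes from this README.

```rust
/// Percent-encode every byte that is not an RFC 3986 "unreserved" character.
/// Hypothetical helper, written for illustration only.
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            // Everything else becomes %XX, one escape per UTF-8 byte.
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Build the search URL for `index` on the server at `base`.
/// Hypothetical helper; only the route shape is taken from the README.
fn search_url(base: &str, index: &str, query: &str) -> String {
    format!("{}/indexes/{}/search?q={}", base, index, percent_encode(query))
}

fn main() {
    println!("{}", search_url("http://127.0.0.1:8080", "movies", "botman"));
    println!("{}", search_url("http://127.0.0.1:8080", "movies", "dark knight"));
}
```

The resulting string can be passed to curl as is, or to any HTTP client of your choice.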

Performance

With a dataset of 100 353 documents, each with 352 attributes of which 3 are indexed (more than 300 000 indexed fields out of 35 million stored), we can handle more than 2.8k req/sec with an average response time of 9 ms on an Intel i7-7700 (8) @ 4.2GHz.

Requests are made using wrk and scripted to simulate real user queries.

Running 10s test @ http://localhost:2230
  2 threads and 25 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     9.52ms    7.61ms  99.25ms   84.58%
    Req/Sec     1.41k   119.11     1.78k    64.50%
  28080 requests in 10.01s, 7.42MB read
Requests/sec:   2806.46
Transfer/sec:    759.17KB
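
The figures quoted in this section can be checked with a few lines of arithmetic. The sketch below recomputes the field counts and the throughput from the numbers printed above; the function names are hypothetical. Dividing by the rounded 10.01 s duration gives about 2805 req/sec, marginally below the 2806.46 that wrk reports, presumably because wrk divides by the unrounded duration.

```rust
/// Total number of fields: one per attribute per document.
fn field_count(documents: u64, attributes_per_document: u64) -> u64 {
    documents * attributes_per_document
}

/// Average throughput in requests per second.
fn requests_per_second(requests: u64, seconds: f64) -> f64 {
    requests as f64 / seconds
}

fn main() {
    let documents = 100_353;
    // 3 indexed attributes per document: a little over 300 000 fields.
    println!("indexed fields: {}", field_count(documents, 3));
    // 352 attributes per document: roughly 35 million stored fields.
    println!("stored fields:  {}", field_count(documents, 352));
    // 28080 requests over the 10.01 s reported by wrk.
    println!("req/sec:        {:.2}", requests_per_second(28_080, 10.01));
}
```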

We also indexed a dataset containing about 12 million city names in 24 minutes on a machine with 8 cores, 64 GB of RAM and a 300 GB NVMe SSD.
The resulting database was 16 GB and search response times ranged from 30 ms to 4 seconds for short prefix queries.

Notes

Since Rust 1.32, the default global allocator is the system allocator. We have seen much better performance when using jemalloc as the global allocator.
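
Rust selects the global allocator through the #[global_allocator] attribute. The minimal sketch below shows the mechanism with std::alloc::System, so that it compiles without extra dependencies. Switching to jemalloc would mean replacing System with an allocator type from a third-party crate such as jemallocator (an assumption: this README does not say which crate MeiliDB uses) and adding that crate to Cargo.toml. The heap_sum function is a hypothetical stand-in for real work.

```rust
use std::alloc::System;

// Every heap allocation in the program goes through this static.
// To try jemalloc, swap `System` for the allocator type exported by a
// jemalloc binding crate (assumed here, not confirmed by the README).
#[global_allocator]
static GLOBAL: System = System;

/// Collect 1..=n into a heap-allocated Vec and sum it, so that the
/// selected allocator is actually exercised.
fn heap_sum(n: u64) -> u64 {
    let values: Vec<u64> = (1..=n).collect();
    values.iter().sum()
}

fn main() {
    println!("{}", heap_sum(100));
}
```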

Usage and Examples

MeiliDB also provides an example binary that is mostly used for feature testing. Note that the example binary indexes data faster because it reads CSV files directly rather than JSON HTTP payloads.

The index subcommand creates an index and injects documents into it. With the command line below, the index will be named movies and the 19 700 movies from the datasets/ directory will be injected into MeiliDB.

cargo run --release --example from_file -- \
    index example.mdb datasets/movies/movies.csv \
    --schema datasets/movies/schema.toml

Once the first command is done, you can query the freshly created movies index using the search subcommand. In this example we filter the dataset to show only non-adult movies using the !adult filter, whose syntax is not definitive.

cargo run --release --example from_file -- \
    search example.mdb \
    --number-results 4 \
    --filter '!adult' \
    id popularity adult original_title