mirror of https://github.com/meilisearch/MeiliSearch synced 2025-07-03 11:57:07 +02:00

Clément Renault ccded7b429 Improve the indexer to not not deunicode before indexing Revert of #179		2019-11-04 16:41:58 +01:00
datasets/movies	Add a movies example dataset to the repository	2019-10-09 16:46:11 +02:00
meilidb-core	Improve the indexer to not not deunicode before indexing	2019-11-04 16:41:58 +01:00
meilidb-http	Moving to heed v0.5.0	2019-11-04 10:49:27 +01:00
meilidb-schema	Bump the meili-core/schema/tokenizer crates to 0.6.0	2019-10-31 14:05:59 +01:00
meilidb-tokenizer	Bump the meili-core/schema/tokenizer crates to 0.6.0	2019-10-31 14:05:59 +01:00
misc	doc: add a new +19k movies example dataset	2019-04-13 21:11:28 +02:00
.gitignore	Merge branch 'moving-to-lmdb'	2019-10-09 17:23:48 +02:00
azure-pipelines.yml	Update the CI to check the fmt and clippy	2019-10-18 13:33:38 +02:00
Cargo.toml	Introduce the HTTP tide based library	2019-10-31 15:02:34 +01:00
deep-dive.md	Reintroduce the deep-dive and typos-ranking-rules explanations documents	2019-10-09 16:57:27 +02:00
LICENSE	Merge branch 'moving-to-lmdb'	2019-10-09 17:23:48 +02:00
README.md	Add information about search concat and split query words support	2019-10-23 18:19:15 +02:00
typos-ranking-rules.md	Reintroduce the deep-dive and typos-ranking-rules explanations documents	2019-10-09 16:57:27 +02:00

README.md

MeiliDB

A full-text search database based on the fast LMDB key-value store.

Features

Provides 6 default ranking criteria used to bucket sort documents
Accepts custom criteria and can apply them in any custom order
Supports ranged queries, useful for paginating results
Can deduplicate (distinct) and filter returned documents based on context-defined rules
Searches for concatenated and split query words to improve the search quality
Can store complete documents or only the fields specified in the user schema
The default tokenizer can index Latin- and kanji-based languages
Returns the matching text areas, useful to highlight matched words in results
Accepts query-time search configuration, such as the searchable attributes
Supports runtime incremental indexing
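
To make the "bucket sort" and "ranged queries" features above more concrete, here is a minimal sketch of the general idea, not the engine's actual implementation. The `Doc` struct, the two criteria (`by_typos`, `by_proximity`) and the `paginate` helper are hypothetical names chosen for illustration. Documents are sorted by the first criterion, then each group of ties is refined by the remaining criteria in order.

```rust
use std::cmp::Ordering;
use std::ops::Range;

// Hypothetical document carrying two illustrative ranking signals.
#[derive(Debug, Clone, PartialEq)]
struct Doc {
    id: u32,
    typos: u32,
    proximity: u32,
}

// A criterion compares two documents; "smaller" means "ranked first".
type Criterion = fn(&Doc, &Doc) -> Ordering;

fn by_typos(a: &Doc, b: &Doc) -> Ordering {
    a.typos.cmp(&b.typos)
}

fn by_proximity(a: &Doc, b: &Doc) -> Ordering {
    a.proximity.cmp(&b.proximity)
}

/// Sorts `docs` with the first criterion, then recursively refines each
/// bucket of equal documents with the remaining criteria.
fn bucket_sort(docs: &mut [Doc], criteria: &[Criterion]) {
    let (first, rest) = match criteria.split_first() {
        Some((first, rest)) => (*first, rest),
        None => return,
    };
    docs.sort_by(|a, b| first(a, b));

    let mut start = 0;
    while start < docs.len() {
        // Find the end of the bucket of documents equal under `first`.
        let mut end = start + 1;
        while end < docs.len() && first(&docs[start], &docs[end]) == Ordering::Equal {
            end += 1;
        }
        bucket_sort(&mut docs[start..end], rest);
        start = end;
    }
}

/// Returns the requested range of results, clamped to the available documents.
fn paginate(docs: &[Doc], range: Range<usize>) -> &[Doc] {
    let end = range.end.min(docs.len());
    let start = range.start.min(end);
    &docs[start..end]
}
```

Because the criteria are passed as a slice, a caller can reorder them or append custom ones without touching the sorting routine, which mirrors the "any custom order" feature listed above.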

It uses LMDB as the internal key-value store. The key-value store allows us to handle updates and queries with small memory and CPU overheads. The whole ranking system is data oriented and provides great performance.

You can read the deep dive if you want more information on the engine; it describes the whole process of generating updates and handling queries. You can also take a look at the typos and ranking rules if you want to know the default rules used to sort the documents.

We welcome issues and pull requests. You can help grow this project and start contributing by checking the issues tagged "good-first-issue". It is a good place to start!

The project is only a library for now, which means that no binary is provided yet. To get started, you can check the examples, which are made to work with the data located in the datasets/ folder.

MeiliDB will be a binary in the near future, so you will be able to use it as a database out of the box, and it should be queryable over HTTP. This is our current goal, see the milestones. In the end, the binary will be a set of network protocols and wrappers around the library, which will also be published on crates.io. Both the binary and the library will follow the same update cycle.

Performance

With a database of 100 353 documents, each with 352 attributes of which 3 are indexed (so more than 300 000 indexed fields out of 35 million stored), we can handle more than 2.8k req/sec with an average response time of 9 ms on an Intel i7-7700 (8) @ 4.2GHz.

Requests are made using wrk and scripted to simulate real user queries.

Running 10s test @ http://localhost:2230
  2 threads and 25 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     9.52ms    7.61ms  99.25ms   84.58%
    Req/Sec     1.41k   119.11     1.78k    64.50%
  28080 requests in 10.01s, 7.42MB read
Requests/sec:   2806.46
Transfer/sec:    759.17KB
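
The field counts quoted for this benchmark follow directly from the document and attribute counts. The short sketch below only reproduces that arithmetic; the `Dataset` type and its method names are hypothetical and exist purely for this illustration.

```rust
/// Hypothetical description of the benchmark dataset.
struct Dataset {
    documents: u64,
    attributes_per_document: u64,
    indexed_attributes_per_document: u64,
}

impl Dataset {
    /// Number of fields that go through the indexer.
    fn indexed_fields(&self) -> u64 {
        self.documents * self.indexed_attributes_per_document
    }

    /// Number of fields stored in total, indexed or not.
    fn stored_fields(&self) -> u64 {
        self.documents * self.attributes_per_document
    }
}

/// The dataset described in the benchmark paragraph.
fn benchmark_dataset() -> Dataset {
    Dataset {
        documents: 100_353,
        attributes_per_document: 352,
        indexed_attributes_per_document: 3,
    }
}
```

With these numbers, `indexed_fields()` gives 301 059 and `stored_fields()` gives 35 324 256, matching the "more than 300 000" and "35 million" figures.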

Notes

Since Rust 1.32, the default global allocator is the system allocator rather than jemalloc. We have seen much better performance when using jemalloc as the global allocator.
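
As a rough illustration of how a global allocator is selected in Rust, the sketch below registers one with the `#[global_allocator]` attribute. To stay free of external dependencies it registers the standard library's `System` allocator; using jemalloc instead assumes an extra crate such as `jemallocator` in Cargo.toml, which the source does not name. The `collect_lengths` helper is hypothetical and only exists to trigger heap allocations.

```rust
use std::alloc::System;

// Any type implementing `std::alloc::GlobalAlloc` can be registered here.
// Assumption: with the `jemallocator` crate as a dependency, this line would
// become `static GLOBAL: jemallocator::Jemalloc = jemallocator::Jemalloc;`.
#[global_allocator]
static GLOBAL: System = System;

/// Hypothetical helper: the returned `Vec` is allocated through `GLOBAL`.
fn collect_lengths(words: &[&str]) -> Vec<usize> {
    words.iter().map(|word| word.len()).collect()
}
```

Only one global allocator can be registered per program, so this choice is normally made in the final binary (or example) rather than in the library crates.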

Usage and examples

Currently MeiliDB does not provide an HTTP server, but you can run the example binary.

The index subcommand creates an index and injects documents into it. With the command line below, the index will be named movies and the 19 700 movies from the datasets/ folder will be injected into MeiliDB.

cargo run --release --example from_file -- \
    index example.mdb datasets/movies/data.csv \
    --schema datasets/movies/schema.toml

Once the first command is done, you can query the freshly created movies index using the search subcommand. In this example, we filter the dataset to show only non-adult movies using the non-definitive !adult filter syntax.

cargo run --release --example from_file -- \
    search example.mdb \
    --number 4 \
    --filter '!adult' \
    id popularity adult original_title