MeiliSearch
⚡ Ultra relevant and instant full-text search API 🔍
MeiliSearch is a powerful, fast, open-source search engine that is easy to use and deploy. Both searching and indexing are fully customizable, and it handles features such as typo tolerance, filters, and synonyms. For more details about those features, go to our documentation.
Meili helps the Rust community find crates on crates.meilisearch.com
Features
- Search as-you-type experience (answers < 50ms)
- Full-text search
- Typo tolerant (understands typos and spelling mistakes)
- Supports Kanji
- Supports synonyms
- Easy to install, deploy, and maintain
- Whole documents returned
- Highly customizable
- RESTful API
Quick Start
Deploy the Server
Run it using Docker
docker run -it -p 7700:7700 --rm getmeili/meilisearch
Installation using APT
echo "deb [trusted=yes] https://apt.fury.io/meilisearch/ /" > /etc/apt/sources.list.d/fury.list
apt update && apt install meilisearch-http
meilisearch
Download the binary
curl -L https://install.meilisearch.com | sh
./meilisearch
Compile and run it from sources
If you already have the Rust toolchain installed, you can compile from source:
git clone https://github.com/meilisearch/MeiliSearch.git
cd MeiliSearch
cargo run --release
Create an Index and Upload Some Documents
We provide a movie dataset that you can use for testing purposes.
curl -L 'https://bit.ly/2PAcw9l' -o movies.json
MeiliSearch can serve multiple indexes with different kinds of documents, so you must create an index before sending documents to it.
curl -i -X POST 'http://127.0.0.1:7700/indexes' --data '{ "name": "Movies", "uid": "movies" }'
Now that the server knows about our brand new index, we can send it data.
A small dataset is provided in the datasets/ directory.
curl -i -X POST 'http://127.0.0.1:7700/indexes/movies/documents' \
--header 'content-type: application/json' \
--data-binary @movies.json
Search for Documents
The search engine is now aware of our documents and can serve them through the HTTP server.
The jq command-line tool makes the server responses much easier to read.
curl 'http://127.0.0.1:7700/indexes/movies/search?q=botman+robin&limit=2' | jq
{
  "hits": [
    {
      "id": "415",
      "title": "Batman & Robin",
      "poster": "https://image.tmdb.org/t/p/w1280/79AYCcxw3kSKbhGpx1LiqaCAbwo.jpg",
      "overview": "Along with crime-fighting partner Robin and new recruit Batgirl...",
      "release_date": "1997-06-20"
    },
    {
      "id": "411736",
      "title": "Batman: Return of the Caped Crusaders",
      "poster": "https://image.tmdb.org/t/p/w1280/GW3IyMW5Xgl0cgCN8wu96IlNpD.jpg",
      "overview": "Adam West and Burt Ward returns to their iconic roles of Batman and Robin...",
      "release_date": "2016-10-08"
    }
  ],
  "offset": 0,
  "limit": 2,
  "processingTimeMs": 1,
  "query": "botman robin"
}
Documentation
Now that you have a running MeiliSearch instance, you can learn more and tune your search engine using the documentation.
How it works
MeiliSearch uses LMDB as its internal key-value store. The key-value store allows us to handle updates and queries with small memory and CPU overheads. The whole ranking system is data-oriented and provides great performance.
You can read the deep dive if you want more information on the engine; it describes the whole process of generating updates and handling queries. Also, you can take a look at the typos and ranking rules if you want to know the default rules used to sort the documents.
Technical features
- Provides 6 default ranking criteria used to bucket sort documents
- Accepts custom criteria and can apply them in any custom order
- Supports ranged queries, useful for paginating results
- Can apply distinct and filter rules to returned documents based on context-defined rules
- Searches for concatenated and split query words to improve the search quality
- Can store complete documents or only the fields specified in the user schema
- The default tokenizer can index Latin- and kanji-based languages
- Returns the matching text areas, useful to highlight matched words in results
- Accepts query-time search configuration, such as the searchable attributes
- Supports runtime incremental indexing
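To illustrate what bucket sorting documents with an ordered list of criteria can look like, here is a minimal sketch. It is not the engine's actual implementation: the `Candidate` struct, its fields, and the three criteria (`fewer_typos`, `more_words`, `closer_words`) are hypothetical names chosen for the example. The idea is to sort by the first criterion, then refine each group of ties (a bucket) with the remaining criteria.

```rust
use std::cmp::Ordering;

/// Hypothetical per-document ranking data; the fields are illustrative only.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    id: u32,
    typos: u32,
    words_matched: u32,
    proximity: u32,
}

/// A criterion compares two candidates; `Ordering::Less` means "ranks first".
type Criterion = fn(&Candidate, &Candidate) -> Ordering;

// Three hypothetical criteria, not the engine's real default rules.
fn fewer_typos(a: &Candidate, b: &Candidate) -> Ordering {
    a.typos.cmp(&b.typos)
}

fn more_words(a: &Candidate, b: &Candidate) -> Ordering {
    b.words_matched.cmp(&a.words_matched)
}

fn closer_words(a: &Candidate, b: &Candidate) -> Ordering {
    a.proximity.cmp(&b.proximity)
}

/// Sorts `docs` with the first criterion, then recursively refines every
/// bucket of documents that the first criterion considers equal.
fn bucket_sort(docs: &mut [Candidate], criteria: &[Criterion]) {
    let (first, rest) = match criteria.split_first() {
        Some(parts) => parts,
        None => return, // no criteria left: keep the current order
    };

    // `sort_by` is stable, so earlier ordering survives among ties.
    docs.sort_by(|a, b| first(a, b));

    let mut start = 0;
    while start < docs.len() {
        // Find the end of the bucket of candidates tied with `docs[start]`.
        let mut end = start + 1;
        while end < docs.len() && first(&docs[start], &docs[end]) == Ordering::Equal {
            end += 1;
        }
        // Only the remaining criteria are applied inside the bucket.
        bucket_sort(&mut docs[start..end], rest);
        start = end;
    }
}
```

Passing the criteria in a different order changes which one dominates, which is how a custom ordering of rules could be expressed in this sketch.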
Performance
With a dataset of 100 353 documents, each with 352 attributes of which 3 are indexed (more than 300 000 indexed fields out of 35 million stored), we can handle more than 2.8k req/sec with an average response time of 9 ms on an Intel i7-7700 (8) @ 4.2GHz.
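The field counts quoted above follow directly from the dataset shape. As a quick sanity check, the small helper below (a hypothetical function written only for this illustration) multiplies the number of documents by the indexed and total attribute counts.

```rust
/// Returns `(indexed_fields, stored_fields)` for a dataset in which every
/// document has `attributes_per_doc` attributes, `indexed_per_doc` of them indexed.
/// Hypothetical helper used only to check the figures in the text.
fn field_counts(documents: u64, attributes_per_doc: u64, indexed_per_doc: u64) -> (u64, u64) {
    (documents * indexed_per_doc, documents * attributes_per_doc)
}
```

Calling `field_counts(100_353, 352, 3)` gives 301 059 indexed fields and 35 324 256 stored fields, which matches the "more than 300 000" and "35 million" figures.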
Requests are made using wrk and scripted to simulate real users' queries.
Running 10s test @ http://localhost:2230
2 threads and 25 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 9.52ms 7.61ms 99.25ms 84.58%
Req/Sec 1.41k 119.11 1.78k 64.50%
28080 requests in 10.01s, 7.42MB read
Requests/sec: 2806.46
Transfer/sec: 759.17KB
We also indexed a dataset containing roughly 12 million city names in 24 minutes on a machine with 8 cores, 64 GB of RAM, and a 300 GB NVMe SSD.
The resulting database was 16 GB, and search times ranged from 30 ms to 4 seconds for short prefix queries.
Notes
Since Rust 1.32, the default allocator is the system allocator. We have seen much better performance when using jemalloc as the global allocator.
Contributing
We will be glad if you submit issues and pull requests. You can help to grow this project and start contributing by checking issues tagged "good-first-issue". It is a good start!
Analytic Events
We send events to our Amplitude instance to be aware of the number of people who use MeiliSearch.
We only send the platform on which the server runs, once a day. No other information is sent.
If you do not want us to send events, you can disable these analytics by using the MEILI_NO_ANALYTICS environment variable.