doc: Update the README

This commit is contained in:
Clément Renault 2018-12-11 16:17:22 +01:00
parent 2cbb943cbe
commit f97f7f93f3
GPG Key ID: 0151CDAB43460DAE


@@ -1,17 +1,24 @@
# MeiliDB
A search engine based on the [blog post series](https://blog.algolia.com/inside-the-algolia-engine-part-1-indexing-vs-search/) of the great Algolia company.
A _full-text search database_ using a key-value store internally.
If you want to be involved in the project you can [read the deep dive](deep-dive.md).
It uses [RocksDB](https://github.com/facebook/rocksdb) like a classic database, to store documents and internal data. The power of the key-value store allows us to handle updates and queries with small memory and CPU overheads.
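To give an idea of the layout, here is a minimal conceptual sketch of a full-text index on top of a key-value store. This is not MeiliDB's actual code: a standard-library `BTreeMap` stands in for RocksDB, and the helper names are hypothetical.

```rust
use std::collections::BTreeMap;

// Conceptual sketch only (not MeiliDB's real API): keys are words,
// values are the ids of the documents that contain them.
// A BTreeMap stands in for the RocksDB key-value store.
type Store = BTreeMap<String, Vec<u64>>;

// Add every word of a document to the inverted index.
fn index_document(store: &mut Store, id: u64, text: &str) {
    for word in text.split_whitespace() {
        store
            .entry(word.to_lowercase())
            .or_insert_with(Vec::new)
            .push(id);
    }
}

// Look up the ids of the documents containing a word.
fn query(store: &Store, word: &str) -> Vec<u64> {
    store.get(&word.to_lowercase()).cloned().unwrap_or_default()
}

fn main() {
    let mut store = Store::new();
    index_document(&mut store, 1, "fast search engine");
    index_document(&mut store, 2, "key-value search database");
    assert_eq!(query(&store, "Search"), vec![1, 2]);
}
```

Because the store is ordered by key, word lookups stay cheap even when the index no longer fits in memory, which is the property the engine leans on.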
This is a library; this means that binaries are not part of this repository,
but since I'm still nice I have made some examples for you in the `examples/` folder.
You can [read the deep dive](deep-dive.md) if you want more information about the engine; it describes the whole process of generating updates and handling queries.
We will be proud if you send pull requests to help us grow this project; you can start with [issues tagged "good-first-issue"](https://github.com/Kerollmops/MeiliDB/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22)!
At the moment this is a library only; this means that binaries are not part of this repository, but since I'm still nice I have made some examples for you in the `examples/` folder that work with the data located in the `misc/` folder.
In the near future MeiliDB will be a binary like any database: updated and queried using some kind of protocol. That is the final goal, [see the milestones](https://github.com/Kerollmops/MeiliDB/milestones). MeiliDB will just be a bunch of network and protocol functions wrapping the library, which itself will be published to https://crates.io, following the same update cycle.
## Performances
We ran some tests on remote machines and found that a server that costs $5/month with 1 vCPU and 1 GB of RAM can handle, on the same index and with a simple query:
_These measurements were made with a version dated October 2018; we must update them._
We ran some tests on remote machines and found that, with a dataset of nearly 280k products, a server that costs $5/month with 1 vCPU and 1 GB of RAM can handle, on the same index and with a simple query:
- nearly 190 users with an average response time of 90ms
- 150 users with an average response time of 70ms
@@ -27,21 +34,14 @@ MeiliDB works with an index like most search engines.
So to test the library you can create one by indexing a simple csv file.
```bash
cargo build --release --example csv-indexer
time ./target/release/examples/csv-indexer --stop-words misc/en.stopwords.txt misc/kaggle.csv
cargo run --release --example create-database -- test.mdb misc/kaggle.csv
```
The `en.stopwords.txt` file here is a simple file that contains one stop word per line (e.g. or, and).
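As an illustration of how such a one-word-per-line file can be consumed, here is a small stand-alone sketch. The helper names are hypothetical and this is not the engine's real API; it only shows the file format in action.

```rust
use std::collections::HashSet;

// Parse a stop-words file: one word per line, surrounding whitespace
// and blank lines ignored. (Hypothetical helper, not MeiliDB's API.)
fn parse_stop_words(contents: &str) -> HashSet<String> {
    contents
        .lines()
        .map(|line| line.trim().to_string())
        .filter(|line| !line.is_empty())
        .collect()
}

// Drop stop words from a whitespace-tokenized query before searching.
fn filter_query<'a>(query: &'a str, stop_words: &HashSet<String>) -> Vec<&'a str> {
    query
        .split_whitespace()
        .filter(|word| !stop_words.contains(*word))
        .collect()
}

fn main() {
    // In practice the contents would come from misc/en.stopwords.txt.
    let stop_words = parse_stop_words("or\nand\nthe\n");
    assert_eq!(
        filter_query("the fast and cheap engine", &stop_words),
        vec!["fast", "cheap", "engine"]
    );
}
```

Filtering these very frequent words out before indexing and querying keeps the index smaller and the matches more relevant.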
Once the command has finished indexing, the database will have been saved under the `test.mdb` folder.
Once the command has finished indexing, you will have 3 files that compose the index:
- The `xxx.map` represents the fst map.
- The `xxx.idx` represents the doc indexes matching the words in the map.
- The `xxx.sst` is a file that contains all the fields and the values associated with them; it is passed to the internal RocksDB.
Now you can easily run the `serve-console` or `serve-http` examples with the name of the dump (e.g. relaxed-colden).
Now you can easily run the `query-database` example to check what is stored in it.
```bash
cargo build --release --example serve-console
./target/release/examples/serve-console relaxed-colden
cargo run --release --example query-database -- test.mdb
```