<p align="center">
<img alt="the milli logo" src="http-ui/public/logo-black.svg">
</p>
<p align="center">a concurrent indexer combined with fast and relevant search algorithms</p>

## Introduction

This engine is a prototype; do not use it in production.
This is one of the most advanced search engines I have worked on.
It currently only supports the proximity criterion.

### Compile and Run the server

You can specify the number of threads used to index documents, along with many other settings.

```bash
cd http-ui
cargo run --release -- serve --db my-database.mdb -vvv --indexing-jobs 8
```

### Index your documents

It can index a massive number of documents quickly. So far I have managed to index:
 - 115M songs (song and artist name) in ~1h, taking 107GB on disk.
 - 12M cities (name, timezone and country ID) in 15min, taking 10GB on disk.

All of that on a $39/month machine with 4 cores.
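
As a rough sanity check on the figures above, the implied throughput and on-disk footprint per document can be worked out with shell arithmetic. This is only a back-of-the-envelope sketch: it treats "~1h" as exactly 3600 seconds and assumes GB means 10^9 bytes, and the variable names are purely illustrative.

```bash
# Figures from the list above (assumptions: ~1h = 3600 s, 1 GB = 10^9 bytes).
songs=115000000;  songs_secs=3600;  songs_bytes=107000000000
cities=12000000;  cities_secs=900;  cities_bytes=10000000000

# Integer division: documents indexed per second.
songs_rate=$(( songs / songs_secs ))
cities_rate=$(( cities / cities_secs ))

# Integer division: bytes stored on disk per document.
songs_size=$(( songs_bytes / songs ))
cities_size=$(( cities_bytes / cities ))

echo "songs:  ${songs_rate} docs/s, ${songs_size} bytes/doc"
echo "cities: ${cities_rate} docs/s, ${cities_size} bytes/doc"
```

Under those assumptions this comes to roughly 32k songs per second at about 930 bytes per song, and roughly 13k cities per second at about 833 bytes per city.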

You can feed the engine with your CSV (comma-separated, yes) data like this:

```bash
# printf expands \n in every shell; plain echo does not do so everywhere (e.g. bash)
printf 'name,age\nhello,32\nkiki,24\n' | http POST 127.0.0.1:9700/documents content-type:text/csv
```

Here, IDs will be automatically generated as UUID v4 if they don't exist in some or all of the documents.

Note that it also supports JSON and JSON streaming: you can send them to the engine by using
the `content-type:application/json` and `content-type:application/x-ndjson` headers respectively.
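
The two documents from the CSV example could be sent as JSON or NDJSON along these lines. This is a minimal sketch, assuming the server from the previous section is listening on 127.0.0.1:9700 and accepts these payloads on the same `/documents` route as CSV. The file names and the `send_json` / `send_ndjson` helpers are hypothetical and only serve the illustration.

```bash
# Hypothetical file names; the documents mirror the CSV example.
# Plain JSON: a single array of objects.
cat > docs.json <<'EOF'
[
  {"name": "hello", "age": 32},
  {"name": "kiki", "age": 24}
]
EOF

# JSON streaming (NDJSON): one JSON object per line.
cat > docs.ndjson <<'EOF'
{"name": "hello", "age": 32}
{"name": "kiki", "age": 24}
EOF

# Assumption: same /documents route as the CSV example, only the header changes.
# HTTPie sends redirected stdin as the raw request body.
send_json()   { http POST 127.0.0.1:9700/documents content-type:application/json < docs.json; }
send_ndjson() { http POST 127.0.0.1:9700/documents content-type:application/x-ndjson < docs.ndjson; }
```

Once the server is running, calling `send_json` or `send_ndjson` posts the corresponding file.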

### Querying the engine via the website

You can query the engine by opening [the HTML page itself](http://127.0.0.1:9700) in your browser.