<p align="center">
  <img alt="the milli logo" src="http-ui/public/logo-black.svg">
</p>

<p align="center">a concurrent indexer combined with fast and relevant search algorithms</p>

## Introduction

This repository contains the core engine used in [Meilisearch].

It contains a library that manages one and only one index; Meilisearch
handles multiple indexes itself. Milli cannot persist pending updates in a store:
that is the job of a layer above it, which is why it can only
process one update at a time.

This repository contains crates to quickly debug the engine:
 - There are benchmarks located in the `benchmarks` crate.
 - The `http-ui` crate is a simple HTTP dashboard to test the features as in real use.
 - The `infos` crate is used to dump the internal data structures and ensure correctness.
 - The `search` crate is a simple command-line tool that helps run [flamegraph] on top of it.
 - The `helpers` crate is only used, occasionally, to modify the database in place.

### Compile and run the HTTP debug server

You can specify the number of threads used to index documents, along with many other settings.

```bash
cd http-ui
cargo run --release -- --db my-database.mdb -vvv --indexing-jobs 8
```

### Index your documents

The engine can index a large number of documents quickly. I have already indexed:
 - 115m songs (song and artist name) in \~48min, taking 81GiB on disk.
 - 12m cities (name, timezone and country ID) in \~4min, taking 6GiB on disk.

These measurements were taken on a MacBook Pro with the M1 processor.

You can feed the engine with your CSV (comma-separated, yes) data like this:

```bash
printf "id,name,age\n1,hello,32\n2,kiki,24\n" | http POST 127.0.0.1:9700/documents content-type:text/csv
```

Don't forget to specify the `id` of each document. The engine also supports JSON and
newline-delimited JSON (JSON streaming): send them using the `content-type:application/json` and
`content-type:application/x-ndjson` headers respectively.
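
As a rough sketch of the two other formats, the snippet below writes the same two documents as a JSON array and as newline-delimited JSON, and defines a small helper to post a file with a given content type. The file names `documents.json` and `documents.ndjson` and the `send_documents` helper are hypothetical names chosen for this illustration; the endpoint and headers are the ones mentioned above.

```bash
# Same documents as the CSV example, as a single JSON array.
printf '%s\n' '[{"id":1,"name":"hello","age":32},{"id":2,"name":"kiki","age":24}]' > documents.json

# Same documents as newline-delimited JSON: one object per line.
printf '%s\n' '{"id":1,"name":"hello","age":32}' '{"id":2,"name":"kiki","age":24}' > documents.ndjson

# Hypothetical helper: post a file ($1) to the debug server with a content type ($2).
# It assumes the HTTP debug server is running on 127.0.0.1:9700.
send_documents() {
  http POST 127.0.0.1:9700/documents "content-type:$2" < "$1"
}
```

With the debug server running, `send_documents documents.json application/json` and `send_documents documents.ndjson application/x-ndjson` would send each file with its matching header.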

### Querying the engine via the website

You can query the engine by going to [the HTML page itself](http://127.0.0.1:9700).

## Contributing

You can set up a Git hook to stop you from committing too fast. It will stop you if:
- Any of the workspaces does not build
- Your code is not well-formatted

These two things are also checked in the CI, so ignoring the hook won't help you merge your code.
But if you need to, you can still add `--no-verify` when creating your commit to ignore the hook.

To enable the hook, run the following command from the root of the project:
```
cp script/pre-commit .git/hooks/pre-commit
```

[Meilisearch]: https://github.com/meilisearch/meilisearch
[flamegraph]: https://github.com/flamegraph-rs/flamegraph