
pentium

A search engine based on the blog post series by the great Algolia company.

If you want to be involved in the project you can read the deep dive.

This is a library, which means that binaries are not part of this repository. But since I'm still nice, I have made some examples for you in the examples/ folder.

Performance

We ran some tests on remote machines. On a server that costs $5/month with 1 vCPU and 1 GB of RAM, using the same index and a simple query, we found that we can handle:

  • nearly 190 users with an average response time of 90ms
  • 150 users with an average response time of 70ms
  • 100 users with an average response time of 45ms

Network latency is included in these measurements: the servers are located in Amsterdam and the tests are run between two different datacenters.

Usage and examples

Pentium works with an index, like most search engines. So to test the library you can create one by indexing a simple CSV file.

cargo build --release --example csv-indexer
time ./target/release/examples/csv-indexer --stop-words misc/en.stopwords.txt misc/kaggle.csv

The en.stopwords.txt file here is a simple file that contains one stop word per line (e.g. or, and).
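To make the file format concrete, here is a minimal sketch of how a one-word-per-line stop word list could be loaded and applied to a query. The function names `parse_stop_words` and `remove_stop_words` are hypothetical and are not part of Pentium's API. The lowercasing and whitespace tokenization are assumptions made for illustration, not a description of what the csv-indexer example actually does.

```rust
use std::collections::HashSet;

/// Parses stop words from text that holds one word per line.
/// Blank lines are skipped; words are trimmed and lowercased (an assumption).
fn parse_stop_words(text: &str) -> HashSet<String> {
    text.lines()
        .map(|line| line.trim().to_lowercase())
        .filter(|word| !word.is_empty())
        .collect()
}

/// Returns the whitespace-separated words of `query` that are not stop words.
/// This naive tokenization is only for illustration.
fn remove_stop_words<'a>(query: &'a str, stop_words: &HashSet<String>) -> Vec<&'a str> {
    query
        .split_whitespace()
        .filter(|word| !stop_words.contains(&word.to_lowercase()))
        .collect()
}
```

In a real program the text would typically come from something like `std::fs::read_to_string("misc/en.stopwords.txt")`.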

Once the command has finished indexing, you will have 3 files that make up the index:

  • The xxx.map file represents the fst map.
  • The xxx.idx file represents the doc indexes matching the words in the map.
  • The xxx.sst file contains all the fields and the values associated with them; it is passed to the internal RocksDB.

Now you can easily run the serve-console or serve-http examples with the name of the dump (e.g. relaxed-colden).
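As a rough mental model of how the map and the doc indexes relate, the sketch below builds a toy in-memory index: a sorted map from each word to a range inside one flat list of document ids. Everything here is an assumption made for illustration. `ToyIndex` is a hypothetical type, a `BTreeMap` stands in for the fst map, and the (offset, length) layout is not claimed to be Pentium's actual on-disk format.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for the .map/.idx pair (hypothetical, not Pentium's format).
struct ToyIndex {
    /// Plays the role of the map: word -> (offset, length) into `doc_ids`.
    map: BTreeMap<String, (usize, usize)>,
    /// Plays the role of the doc indexes: all posting lists, concatenated.
    doc_ids: Vec<u64>,
}

impl ToyIndex {
    /// Builds the index from (document id, text) pairs.
    fn build(docs: &[(u64, &str)]) -> ToyIndex {
        // Collect one posting list per word, in sorted word order.
        let mut postings: BTreeMap<String, Vec<u64>> = BTreeMap::new();
        for (id, text) in docs {
            for word in text.split_whitespace() {
                let list = postings.entry(word.to_lowercase()).or_default();
                // Skip the id if this document was already recorded for the word.
                if list.last() != Some(id) {
                    list.push(*id);
                }
            }
        }

        // Flatten the posting lists and remember where each one starts.
        let mut map = BTreeMap::new();
        let mut doc_ids = Vec::new();
        for (word, ids) in postings {
            map.insert(word, (doc_ids.len(), ids.len()));
            doc_ids.extend(ids);
        }
        ToyIndex { map, doc_ids }
    }

    /// Returns the ids of the documents containing `word`, or an empty slice.
    fn lookup(&self, word: &str) -> &[u64] {
        match self.map.get(&word.to_lowercase()) {
            Some(&(offset, len)) => &self.doc_ids[offset..offset + len],
            None => &[],
        }
    }
}
```

Keeping the words in a sorted structure and the document ids in a separate flat array is a common design for this kind of index, since the word structure can stay small while the posting lists are read as contiguous slices.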

cargo build --release --example serve-console
./target/release/examples/serve-console relaxed-colden