MeiliSearch

mirror of https://github.com/meilisearch/MeiliSearch synced 2025-07-04 20:37:15 +02:00

bors[bot] 758b4acea7 Merge #776 776: Reduce incremental indexing time of `words_prefix_position_docids` DB r=curquiza a=loiclec Fixes partially https://github.com/meilisearch/milli/issues/605 The `words_prefix_position_docids` can easily contain millions of entries. Thus, iterating over it can be very expensive. But we do so needlessly for every document addition tasks. It can sometimes cause indexing performance issues when : - a user sends many `documentAdditionOrUpdate` tasks that cannot be all batched together (for example if they are interspersed with `documentDeletion` tasks) - the documents contain long, diverse text fields, thus increasing the number of entries in `words_prefix_position_docids` - the index has accumulated many soft-deleted documents, further increasing the size of `words_prefix_position_docids` - the machine running Meilisearch does not have great IO performance (e.g. slow SSD, or quota-limited by the cloud provider) Note, before approving the PR: the only changed file should be `milli/src/update/words_prefix_position_docids.rs`. Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>		2023-01-31 15:52:28 +00:00
.github	clippy: allow uninlined_format_args	2023-01-31 11:13:47 +01:00
assets	chore: move logo to (new) assets folder	2022-10-04 12:20:24 +02:00
benchmarks	Update version for the next release (v0.41.1) in Cargo.toml files	2023-01-31 09:56:22 +00:00
cli	Update version for the next release (v0.41.1) in Cargo.toml files	2023-01-31 09:56:22 +00:00
filter-parser	Update version for the next release (v0.41.1) in Cargo.toml files	2023-01-31 09:56:22 +00:00
flatten-serde-json	Update version for the next release (v0.41.1) in Cargo.toml files	2023-01-31 09:56:22 +00:00
json-depth-checker	Update version for the next release (v0.41.1) in Cargo.toml files	2023-01-31 09:56:22 +00:00
milli	Merge #776	2023-01-31 15:52:28 +00:00
script	format the whole project	2021-06-16 18:33:33 +02:00
.gitignore	Ignore files generated by fuzzcheck	2022-10-26 13:47:46 +02:00
.rustfmt.toml	format the whole project	2021-06-16 18:33:33 +02:00
bors.toml	Add clippy job	2022-11-04 08:58:12 +09:00
Cargo.toml	Optimize a few performance sensitive dependencies on debug builds	2022-10-12 09:22:05 +02:00
CONTRIBUTING.md	add a sentence about installing rust-nightly	2022-12-07 12:31:43 +01:00
LICENSE	Update LICENSE	2022-02-15 15:52:50 +01:00
README.md	Update README.md	2023-01-24 15:58:41 +01:00

README.md

a concurrent indexer combined with fast and relevant search algorithms

DO NOT CONTRIBUTE TO THIS REPOSITORY ANYMORE. IT WILL BE ARCHIVED SOON. ONLY THE MEILISEARCH TEAM IS ALLOWED TO CONTRIBUTE.

The content of this repository now lives in the Meilisearch repository, in the milli workspace.

Introduction

This repository contains the core engine used in Meilisearch.

It contains a library that manages one and only one index; Meilisearch handles multiple indexes itself. Milli does not store pending updates: that is the job of the layer above it, which is why Milli can only process one update at a time.

This repository contains crates to quickly debug the engine:

There are benchmarks located in the benchmarks crate.
The cli crate is a simple command-line interface that helps run flamegraph on top of it.
The filter-parser crate contains the parser for the Meilisearch filter syntax.
The flatten-serde-json crate contains the library that flattens serde-json Value objects like Elasticsearch does.
The json-depth-checker crate is used to indicate whether a JSON document must be flattened.
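
To make the last two crates more concrete, here is a minimal sketch of what "flattening" and "checking whether flattening is needed" can mean. It is an illustration only: the `Json` enum and the functions `needs_flattening` and `flatten` are hypothetical names invented for this example, not the API of flatten-serde-json or json-depth-checker, which operate on serde-json values. The sketch handles nested objects only and ignores arrays.

```rust
use std::collections::BTreeMap;

/// A deliberately tiny JSON model used only for this illustration.
/// (Assumption: the real crates work on serde-json `Value` objects instead.)
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Scalar(String),
    Object(Vec<(String, Json)>),
}

/// Returns true when at least one top-level field holds a nested object,
/// i.e. when flattening would change the document.
fn needs_flattening(doc: &[(String, Json)]) -> bool {
    doc.iter().any(|(_, value)| matches!(value, Json::Object(_)))
}

/// Flattens nested objects into dotted keys: {"a": {"b": "1"}} becomes {"a.b": "1"}.
fn flatten(doc: &[(String, Json)]) -> BTreeMap<String, String> {
    let mut out = BTreeMap::new();
    flatten_into(doc, "", &mut out);
    out
}

/// Recursive helper: walks the fields and builds the dotted path as it descends.
fn flatten_into(fields: &[(String, Json)], prefix: &str, out: &mut BTreeMap<String, String>) {
    for (key, value) in fields {
        let path = if prefix.is_empty() {
            key.clone()
        } else {
            format!("{}.{}", prefix, key)
        };
        match value {
            Json::Scalar(s) => {
                out.insert(path, s.clone());
            }
            Json::Object(inner) => flatten_into(inner, &path, out),
        }
    }
}
```

With this sketch, a document whose `author` field is an object containing `name` would end up with a single `author.name` key, while a document made only of scalar fields is reported as not needing flattening and can be left untouched.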

How to use it?

Milli is a search library; it must be embedded in a program. You can generate its documentation by running cargo doc --open.

Here is an example usage of the library where we insert documents into the engine and search for one of them right after.

let path = tempfile::tempdir().unwrap();
let mut options = EnvOpenOptions::new();
options.map_size(10 * 1024 * 1024); // 10 MB
let index = Index::new(options, &path).unwrap();

let mut wtxn = index.write_txn().unwrap();
let content = documents!([
    {
        "id": 2,
        "title": "Pride and Prejudice",
        "author": "Jane Austen",
        "genre": "romance",
        "price$": "3.5$",
    },
    {
        "id": 456,
        "title": "Le Petit Prince",
        "author": "Antoine de Saint-Exupéry",
        "genre": "adventure",
        "price$": "10.0$",
    },
    {
        "id": 1,
        "title": "Wonderland",
        "author": "Lewis Carroll",
        "genre": "fantasy",
        "price$": "25.99$",
    },
    {
        "id": 4,
        "title": "Harry Potter and the Half-Blood Prince",
        "author": "J. K. Rowling",
        "genre": "fantasy",
    },
]);

let config = IndexerConfig::default();
let indexing_config = IndexDocumentsConfig::default();
let mut builder =
    IndexDocuments::new(&mut wtxn, &index, &config, indexing_config.clone(), |_| ())
        .unwrap();
builder.add_documents(content).unwrap();
builder.execute().unwrap();
wtxn.commit().unwrap();


// You can search in the index now!
let rtxn = index.read_txn().unwrap();
let mut search = Search::new(&rtxn, &index);
search.query("horry");
search.limit(10);

let result = search.execute().unwrap();
assert_eq!(result.documents_ids.len(), 1);
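
Note that the query "horry" does not appear verbatim in any document, yet the example expects exactly one hit. The likely explanation is typo tolerance: once lowercased, "horry" is one substitution away from "harry". The sketch below only illustrates that distance computation with a plain Levenshtein implementation; `edit_distance` is a hypothetical helper written for this example, and it is an assumption that this mirrors the idea behind the match, not a description of how Milli implements it.

```rust
/// Classic dynamic-programming Levenshtein distance between two words.
/// (Illustrative helper, not part of the milli API.)
fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();

    // prev[j] holds the distance between the processed prefix of `a`
    // and the first j characters of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();

    for i in 1..=a.len() {
        // curr[0] = i: deleting i characters turns the prefix of `a` into "".
        let mut curr = vec![i; b.len() + 1];
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1) // deletion
                .min(curr[j - 1] + 1) // insertion
                .min(prev[j - 1] + cost); // substitution or match
        }
        prev = curr;
    }
    prev[b.len()]
}
```

Calling `edit_distance("horry", "harry")` yields 1, which is consistent with the single result asserted in the example above.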

Contributing

We're glad you're thinking about contributing to this repository! Feel free to pick an issue and to ask any questions you have. Some points might not be clear, and we are available to help you!

Also, we recommend following the CONTRIBUTING.md to create your PR.