2e49d6aec1
3768: Fix bugs in graph-based ranking rules + make `words` a graph-based ranking rule r=dureuill a=loiclec

This PR contains three changes:

## 1. Don't call the `words` ranking rule if the term matching strategy is `All`

This is because the purpose of `words` is only to remove nodes from the query graph. It would never do any useful work when the matching strategy was `All`. Remember that the universe was already computed before by computing all the docids corresponding to the "maximally reduced" query graph, which, in the case of `All`, is equal to the original graph.

## 2. The `words` ranking rule is replaced by a graph-based ranking rule.

This is for three reasons:

1. **performance**: graph-based ranking rules benefit from a lot of optimisations by default, which ensures that they are never too slow. The previous implementation of `words` could call `compute_query_graph_docids` many times if some words had to be removed from the query, which would be quite expensive. I was especially worried about its performance in cases where it is placed right after the `sort` ranking rule. Furthermore, `compute_query_graph_docids` would clone a lot of bitmaps many times unnecessarily.
2. **consistency**: every other ranking rule (except `sort`) is graph-based. It makes sense to implement `words` like that as well. It will automatically benefit from all the features, optimisations, and bug fixes that all the other ranking rules get.
3. **surfacing bugs**: as the first ranking rule to be called (most of the time), I'd like `words` to behave the same as the other ranking rules so that we can quickly detect bugs in our graph algorithms. This actually already happened, which is why this PR also contains a bug fix.

## 3. Fix the `update_all_costs_before_nodes` function

It is a bit difficult to explain what was wrong, but I'll try. The bug happened when we had graphs like:

<img width="730" alt="Screenshot 2023-05-16 at 10 58 57" src="https://github.com/meilisearch/meilisearch/assets/6040237/40db1a68-d852-4e89-99d5-0d65757242a7">

and we gave the node `is` as argument. Then, we'd walk backwards from the node breadth-first. We'd update the costs of:

1. `sun`
2. `thesun`
3. `start`
4. `the`

which is an incorrect order. The correct order is:

1. `sun`
2. `thesun`
3. `the`
4. `start`

That is, we can only update the cost of a node when all of its successors have either already been visited or were not affected by the update to the node passed as argument.

To solve this bug, I factored out the graph-traversal logic into a `traverse_breadth_first_backward` function.

Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
||
---|---|---|
.github | ||
assets | ||
benchmarks | ||
dump | ||
file-store | ||
filter-parser | ||
flatten-serde-json | ||
grafana-dashboards | ||
index-scheduler | ||
json-depth-checker | ||
meili-snap | ||
meilisearch | ||
meilisearch-auth | ||
meilisearch-types | ||
milli | ||
permissive-json-pointer | ||
.dockerignore | ||
.gitignore | ||
.rustfmt.toml | ||
bors.toml | ||
Cargo.lock | ||
Cargo.toml | ||
CODE_OF_CONDUCT.md | ||
config.toml | ||
CONTRIBUTING.md | ||
Cross.toml | ||
Dockerfile | ||
download-latest.sh | ||
LICENSE | ||
README.md | ||
SECURITY.md |
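The ordering rule from the commit message above (a node's cost may only be updated once all of its affected successors have been handled) can be illustrated with a small sketch. This is not the code from the PR: `backward_update_order` and `example_graph` are hypothetical names, the graph is an assumed reconstruction of the `the sun is` example rather than a copy of the screenshot, and the real `traverse_breadth_first_backward` function may work differently. The sketch counts, for every ancestor of the starting node, how many of its successors are affected, and only emits the ancestor once that count reaches zero.

```rust
use std::collections::VecDeque;

/// Hypothetical helper, not Meilisearch's implementation.
/// `successors[n]` lists the nodes reachable from node `n` by one edge
/// (the graph is assumed to be acyclic). Returns the ancestors of `from`
/// in an order where each node appears only after all of its successors
/// that are `from` or an ancestor of `from`.
fn backward_update_order(successors: &[Vec<usize>], from: usize) -> Vec<usize> {
    let n = successors.len();

    // Invert the edges so we can walk backwards.
    let mut predecessors = vec![Vec::new(); n];
    for (node, succs) in successors.iter().enumerate() {
        for &s in succs {
            predecessors[s].push(node);
        }
    }

    // Mark the affected nodes: `from` and every node that can reach it.
    let mut affected = vec![false; n];
    affected[from] = true;
    let mut stack = vec![from];
    while let Some(node) = stack.pop() {
        for &p in &predecessors[node] {
            if !affected[p] {
                affected[p] = true;
                stack.push(p);
            }
        }
    }

    // For each node, count the successors that still have to be handled.
    let mut pending: Vec<usize> = (0..n)
        .map(|node| successors[node].iter().filter(|&&s| affected[s]).count())
        .collect();

    // Breadth-first walk backwards, releasing a node only when none of
    // its affected successors is still pending.
    let mut order = Vec::new();
    let mut queue = VecDeque::from([from]);
    while let Some(node) = queue.pop_front() {
        for &p in &predecessors[node] {
            pending[p] -= 1;
            if pending[p] == 0 {
                order.push(p);
                queue.push_back(p);
            }
        }
    }
    order
}

/// Assumed example graph: 0 = start, 1 = the, 2 = sun, 3 = thesun,
/// 4 = is, 5 = end.
fn example_graph() -> Vec<Vec<usize>> {
    vec![
        vec![1, 3], // start -> the, thesun
        vec![2],    // the -> sun
        vec![4],    // sun -> is
        vec![4],    // thesun -> is
        vec![5],    // is -> end
        vec![],     // end
    ]
}
```

With this assumed graph, `backward_update_order(&example_graph(), 4)` yields `[2, 3, 1, 0]`, that is `sun`, `thesun`, `the`, `start`. `start` is held back until both `the` and `thesun` have been handled, which is the property the commit message describes.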
Website | Roadmap | Blog | Documentation | FAQ | Discord
⚡ A lightning-fast search engine that fits effortlessly into your apps, websites, and workflow 🔍
Meilisearch helps you shape a delightful search experience in a snap, offering features that work out-of-the-box to speed up your workflow.
🔥 Try it! 🔥
✨ Features
- Search-as-you-type: find search results in less than 50 milliseconds
- Typo tolerance: get relevant matches even when queries contain typos and misspellings
- Filtering and faceted search: enhance your users' search experience with custom filters and build a faceted search interface in a few lines of code
- Sorting: sort results based on price, date, or pretty much anything else your users need
- Synonym support: configure synonyms to include more relevant content in your search results
- Geosearch: filter and sort documents based on geographic data
- Extensive language support: search datasets in any language, with optimized support for Chinese, Japanese, Hebrew, and languages using the Latin alphabet
- Security management: control which users can access what data with API keys that allow fine-grained permissions handling
- Multi-Tenancy: personalize search results for any number of application tenants
- Highly Customizable: customize Meilisearch to your specific needs or use our out-of-the-box and hassle-free presets
- RESTful API: integrate Meilisearch in your technical stack with our plugins and SDKs
- Easy to install, deploy, and maintain
📖 Documentation
You can consult Meilisearch's documentation at https://www.meilisearch.com/docs.
🚀 Getting started
For basic instructions on how to set up Meilisearch, add documents to an index, and search for documents, take a look at our Quick Start guide.
You may also want to check out Meilisearch 101 for an introduction to some of Meilisearch's most popular features.
☁️ Meilisearch Cloud
Let us manage your infrastructure so you can focus on integrating a great search experience. Try Meilisearch Cloud today.
🧰 SDKs & integration tools
Install one of our SDKs in your project for seamless integration between Meilisearch and your favorite language or framework!
Take a look at the complete Meilisearch integration list.
⚙️ Advanced usage
Experienced users will want to keep our API Reference close at hand.
We also offer a wide range of dedicated guides to all Meilisearch features, such as filtering, sorting, geosearch, API keys, and tenant tokens.
Finally, for more in-depth information, refer to our articles explaining fundamental Meilisearch concepts such as documents and indexes.
📊 Telemetry
Meilisearch collects anonymized data from users to help us improve our product. You can deactivate this whenever you want.
To request deletion of collected data, please write to us at privacy@meilisearch.com. Don't forget to include your Instance UID in the message, as this helps us quickly find and delete your data.
If you want to know more about the kind of data we collect and what we use it for, check the telemetry section of our documentation.
📫 Get in touch!
Meilisearch is a search engine created by Meili, a software development company based in France with team members all over the world. Want to know more about us? Check out our blog!
🗞 Subscribe to our newsletter if you don't want to miss any updates! We promise we won't clutter your mailbox: we only send one edition every two months.
💌 Want to make a suggestion or give feedback? Here are some of the channels where you can reach us:
- For feature requests, please visit our product repository
- Found a bug? Open an issue!
- Want to be part of our Discord community? Join us!
Thank you for your support!
👩‍💻 Contributing
Meilisearch is, and will always be, open-source! If you want to contribute to the project, please take a look at our contribution guidelines.
📦 Versioning
Meilisearch releases and their associated binaries are available on this GitHub page.
The binaries are versioned following SemVer conventions. To know more, read our versioning policy.
Unlike the binaries, crates in this repository are not currently available on crates.io and do not follow SemVer conventions.