MeiliSearch

mirror of https://github.com/meilisearch/MeiliSearch synced 2025-07-04 12:27:13 +02:00

meili-bors[bot] 8084cf29f3 Merge #3946

3946: Settings customizing tokenization r=irevoire a=ManyTheFish

# Pull Request

This pull request allows the user to customize Meilisearch tokenization by providing specialized settings.

## Small documentation

All the new settings can be set and reset like the other index settings by calling the route `/indexes/:name/settings`.

### `nonSeparatorTokens`

The Meilisearch word segmentation uses a default list of separators to segment words. For specific use cases, however, some of the default separators shouldn't be considered separators. The `nonSeparatorTokens` setting allows removing some tokens from the default list of separators.

*Request payload `PUT` - `/indexes/articles/settings/non-separator-tokens`*

```json
["@", "#", "&"]
```

### `separatorTokens`

Some use cases need additional separators: some relate to a specific way of parsing technical documents, others to encodings in documents. The `separatorTokens` setting allows adding some tokens to the list of separators.

*Request payload `PUT` - `/indexes/articles/settings/separator-tokens`*

```json
["§", "&sep"]
```

### `dictionary`

The Meilisearch word segmentation relies on separators and language-based word dictionaries to segment words. However, this segmentation is inaccurate on technical or use-case-specific vocabulary (like `G/Box` to say `Gear Box`) and on proper nouns (like `J. R. R.` when parsing `J. R. R. Tolkien`). The `dictionary` setting allows defining a list of words that would be segmented as described in the list.

*Request payload `PUT` - `/indexes/articles/settings/dictionary`*

```json
["J. R. R.", "J.R.R."]
```

This last feature works well with the `stopWords` setting or the `synonyms` setting, making it possible to segment words and correctly retrieve the synonyms:

*Request payload `PATCH` - `/indexes/articles/settings`*

```json
{
  "dictionary": ["J. R. R.", "J.R.R."],
  "synonyms": {
    "J.R.R.": ["jrr", "J. R. R."],
    "J. R. R.": ["jrr", "J.R.R."],
    "jrr": ["J.R.R.", "J. R. R."]
  }
}
```

### Related specifications:

- https://github.com/meilisearch/specifications/pull/255
- https://github.com/meilisearch/specifications/pull/254

### Try it with Docker

```bash
$ docker pull getmeili/meilisearch:prototype-tokenizer-customization-3
```

## Related issue

Fixes #3610
Fixes #3917
Fixes https://github.com/meilisearch/product/discussions/468
Fixes https://github.com/meilisearch/product/discussions/160
Fixes https://github.com/meilisearch/product/discussions/260
Fixes https://github.com/meilisearch/product/discussions/381
Fixes https://github.com/meilisearch/product/discussions/131
Related to https://github.com/meilisearch/meilisearch/issues/2879
Fixes #2760

## What does this PR do?

- Add a setting `nonSeparatorTokens` allowing to remove a token from the default separator tokens
- Add a setting `separatorTokens` allowing to add a token to the separator tokens
- Add a setting `dictionary` allowing to override the segmentation on specific words
- Add new error code `invalid_settings_non_separator_tokens` (invalid_request)
- Add new error code `invalid_settings_separator_tokens` (invalid_request)
- Add new error code `invalid_settings_dictionary` (invalid_request)

Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>

2023-08-10 10:01:18 +00:00
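
As a rough illustration of the settings described in the commit message above, the following sketch builds (but does not send) a single `PATCH` request that sets all three tokenization settings at once. The base URL, the helper name `build_tokenization_request`, and the optional bearer-token header are assumptions made for this example; only the route, the setting names, and the sample values come from the pull request text.

```python
import json
import urllib.request

# Assumed address of a local instance; adjust for your deployment.
BASE_URL = "http://localhost:7700"


def build_tokenization_request(index_uid, api_key=None):
    """Build a PATCH request for /indexes/<index_uid>/settings.

    Hypothetical helper: it only assembles the request so the payload
    can be inspected before anything is sent.
    """
    settings = {
        # Tokens removed from the default separator list.
        "nonSeparatorTokens": ["@", "#", "&"],
        # Tokens added to the separator list.
        "separatorTokens": ["§", "&sep"],
        # Words that must be segmented exactly as written.
        "dictionary": ["J. R. R.", "J.R.R."],
    }
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumption: the instance is protected by an API key.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url=f"{BASE_URL}/indexes/{index_uid}/settings",
        data=json.dumps(settings).encode("utf-8"),
        headers=headers,
        method="PATCH",
    )


request = build_tokenization_request("articles")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it to a running instance; the sketch stops short of that so it can be run without a server.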
.github	Improve test suite CI for workflow_dispatch event	2023-08-09 16:47:28 +02:00
assets	Introduce a PROFILING.md tutorial to profile Meilisearch	2023-07-18 17:38:13 +02:00
benchmarks	Merge branch 'main' into tmp-release-v1.3.0	2023-08-01 15:05:17 +02:00
dump	Merge branch 'main' into settings-customizing-tokenization	2023-08-08 16:08:16 +02:00
file-store	Upgrade the compatible versions of the dependencies	2023-04-24 17:50:52 +02:00
filter-parser	Make clippy happy (again)	2023-07-25 10:30:50 +02:00
flatten-serde-json	Update criterion to 0.5.1 to remove the atty dependency	2023-07-03 18:51:42 +02:00
fuzzers	Stop the fuzzer after an hour	2023-06-12 15:30:51 +02:00
index-scheduler	Merge branch 'main' into tmp-release-v1.3.0	2023-08-01 15:05:17 +02:00
json-depth-checker	Update criterion to 0.5.1 to remove the atty dependency	2023-07-03 18:51:42 +02:00
meili-snap	Make clippy happy (again)	2023-07-25 10:30:50 +02:00
meilisearch	Fix clippy	2023-08-10 11:27:56 +02:00
meilisearch-auth	Merge #3811	2023-06-06 13:10:24 +00:00
meilisearch-types	Merge branch 'main' into settings-customizing-tokenization	2023-08-08 16:08:16 +02:00
milli	Fix clippy	2023-08-10 11:27:56 +02:00
permissive-json-pointer	Use the workspace inheritance feature of rust 1.64	2023-02-15 13:51:07 +01:00
.dockerignore	Revert "Improve docker cache"	2023-05-25 11:48:26 +02:00
.gitignore	edit gitignore to ignore .idea and .vscode folders	2023-02-10 11:42:19 +04:00
.rustfmt.toml	Introduce a rustfmt file	2022-10-27 11:35:05 +02:00
bors.toml	Remove macos-latest and windows-latest usages	2022-12-20 11:10:09 +01:00
Cargo.lock	Merge branch 'main' into tmp-release-v1.3.0	2023-08-01 15:05:17 +02:00
Cargo.toml	Update version for the next release (v1.3.0) in Cargo.toml	2023-07-03 08:32:21 +00:00
CODE_OF_CONDUCT.md	Create CODE_OF_CONDUCT.md	2020-04-30 20:16:02 +02:00
config.toml	Merge branch 'main' into tmp-release-v1.2.0	2023-06-05 18:36:28 +02:00
CONTRIBUTING.md	Update links of the docs	2023-05-03 19:14:57 +02:00
Cross.toml	Cross build with action-rs	2021-10-10 02:21:30 +08:00
Dockerfile	Revert "Improve docker cache"	2023-05-25 11:48:26 +02:00
download-latest.sh	Update links of the docs	2023-05-03 19:14:57 +02:00
LICENSE	Update LICENSE	2022-02-15 15:54:45 +01:00
PROFILING.md	Introduce a PROFILING.md tutorial to profile Meilisearch	2023-07-18 17:38:13 +02:00
README.md	Fix README after git conflict	2023-08-01 16:06:33 +02:00
SECURITY.md	docs(security): Fix `Supported`	2022-05-31 14:21:34 -05:00

README.md

Website | Roadmap | Meilisearch Cloud | Blog | Documentation | FAQ | Discord

⚡ A lightning-fast search engine that fits effortlessly into your apps, websites, and workflow 🔍

Meilisearch helps you shape a delightful search experience in a snap, offering features that work out-of-the-box to speed up your workflow.

🔥 Try it! 🔥

✨ Features

Search-as-you-type: find search results in less than 50 milliseconds
Typo tolerance: get relevant matches even when queries contain typos and misspellings
Filtering and faceted search: enhance your users' search experience with custom filters and build a faceted search interface in a few lines of code
Sorting: sort results based on price, date, or pretty much anything else your users need
Synonym support: configure synonyms to include more relevant content in your search results
Geosearch: filter and sort documents based on geographic data
Extensive language support: search datasets in any language, with optimized support for Chinese, Japanese, Hebrew, and languages using the Latin alphabet
Security management: control which users can access what data with API keys that allow fine-grained permissions handling
Multi-Tenancy: personalize search results for any number of application tenants
Highly Customizable: customize Meilisearch to your specific needs or use our out-of-the-box and hassle-free presets
RESTful API: integrate Meilisearch in your technical stack with our plugins and SDKs
Easy to install, deploy, and maintain
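
To make the filtering and sorting features listed above a little more concrete, here is a minimal sketch of a search request against the REST API. It assumes a local instance at `http://localhost:7700`, a hypothetical `movies` index, and hypothetical `genres` and `price` fields that have already been declared as filterable and sortable attributes; the helper name `build_search_request` is likewise made up for this example. The request is only built, not sent.

```python
import json
import urllib.request

# Assumed address of a local instance; adjust for your deployment.
BASE_URL = "http://localhost:7700"


def build_search_request(index_uid, query, filter_expr=None, sort=None, limit=20):
    """Build a POST request for /indexes/<index_uid>/search.

    Hypothetical helper: optional parameters are only added to the
    body when the caller provides them.
    """
    body = {"q": query, "limit": limit}
    if filter_expr is not None:
        body["filter"] = filter_expr
    if sort is not None:
        body["sort"] = sort
    return urllib.request.Request(
        url=f"{BASE_URL}/indexes/{index_uid}/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Index name and field names below are illustrative only.
search = build_search_request(
    "movies",
    "wonder",
    filter_expr="genres = Drama AND price < 20",
    sort=["price:asc"],
)
print(search.get_method(), search.full_url)
print(search.data.decode("utf-8"))
```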

📖 Documentation

You can consult Meilisearch's documentation at https://www.meilisearch.com/docs.

🚀 Getting started

For basic instructions on how to set up Meilisearch, add documents to an index, and search for documents, take a look at our Quick Start guide.

You may also want to check out Meilisearch 101 for an introduction to some of Meilisearch's most popular features.

⚡ Supercharge your Meilisearch experience

Say goodbye to server deployment and manual updates with Meilisearch Cloud. No credit card required.

🧰 SDKs & integration tools

Install one of our SDKs in your project for seamless integration between Meilisearch and your favorite language or framework!

Take a look at the complete Meilisearch integration list.

⚙️ Advanced usage

Experienced users will want to keep our API Reference close at hand.

We also offer a wide range of dedicated guides to all Meilisearch features, such as filtering, sorting, geosearch, API keys, and tenant tokens.

Finally, for more in-depth information, refer to our articles explaining fundamental Meilisearch concepts such as documents and indexes.

📊 Telemetry

Meilisearch collects anonymized data from users to help us improve our product. You can deactivate this whenever you want.

To request deletion of collected data, please write to us at privacy@meilisearch.com. Don't forget to include your Instance UID in the message, as this helps us quickly find and delete your data.

If you want to know more about the kind of data we collect and what we use it for, check the telemetry section of our documentation.

📫 Get in touch!

Meilisearch is a search engine created by Meili, a software development company based in France with team members all over the world. Want to know more about us? Check out our blog!

🗞 Subscribe to our newsletter if you don't want to miss any updates! We promise we won't clutter your mailbox: we only send one edition every two months.

💌 Want to make a suggestion or give feedback? Here are some of the channels where you can reach us:

For feature requests, please visit our product repository
Found a bug? Open an issue!
Want to be part of our Discord community? Join us!

Thank you for your support!

👩‍💻 Contributing

Meilisearch is, and will always be, open-source! If you want to contribute to the project, please take a look at our contribution guidelines.

📦 Versioning

Meilisearch releases and their associated binaries are available on this GitHub page.

The binaries are versioned following SemVer conventions. To know more, read our versioning policy.

Unlike the binaries, crates in this repository are not currently available on crates.io and do not follow SemVer conventions.