Commit Graph

2491 Commits

Author SHA1 Message Date
meili-bors[bot]
e78da35287
Merge #4930
4930: Return `UserError::InvalidDocumentId` for primary keys with a length greater than 512 bytes r=curquiza a=flevi29

# Pull Request

## Related issue
Fixes #4843

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: F. Levi <55688616+flevi29@users.noreply.github.com>
2024-09-30 15:55:05 +00:00
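As an illustration of the check introduced by #4930, here is a minimal Rust sketch. This is not the actual milli code; `PRIMARY_KEY_MAX_BYTES`, `UserError`, and `validate_document_id` are placeholder names, and the real validation also enforces other rules (allowed characters, non-emptiness) that are omitted here:

```rust
// Illustrative sketch only: a primary key (document id) longer than
// 512 bytes is rejected up front instead of failing later.
const PRIMARY_KEY_MAX_BYTES: usize = 512;

#[derive(Debug)]
enum UserError {
    InvalidDocumentId { document_id: String },
}

fn validate_document_id(id: &str) -> Result<(), UserError> {
    // The limit is on the byte length, not the character count.
    if id.len() > PRIMARY_KEY_MAX_BYTES {
        return Err(UserError::InvalidDocumentId {
            document_id: id.to_string(),
        });
    }
    Ok(())
}

fn main() {
    assert!(validate_document_id(&"x".repeat(600)).is_err());
    assert!(validate_document_id("doc-1").is_ok());
}
```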
meili-bors[bot]
462a2329f1
Merge #4941
4941: Implement the binary quantization in meilisearch r=irevoire a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4873

## What does this PR do?
- Add a setting to enable binary quantization (an illustrative payload is sketched after this entry)
- Once enabled, binary quantization cannot be disabled

TODO:
- [ ] Missing a bunch of tests

Co-authored-by: Tamo <tamo@meilisearch.com>
2024-09-19 15:50:24 +00:00
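A minimal sketch of what enabling binary quantization could look like through the settings route, assuming an embedder-level `binaryQuantized` flag; the exact field name and schema should be checked against the PR and the docs:

```rust
use serde_json::json;

fn main() {
    // Payload that would be sent as PATCH /indexes/{index}/settings.
    let settings = json!({
        "embedders": {
            "default": {
                "source": "userProvided",
                "dimensions": 512,
                // Per the PR, once enabled this cannot be switched back off
                // for the embedder.
                "binaryQuantized": true
            }
        }
    });
    println!("{}", serde_json::to_string_pretty(&settings).unwrap());
}
```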
Tamo
f6483cf15d apply review comment 2024-09-19 16:47:06 +02:00
meili-bors[bot]
bd34ed01d9
Merge #4945
4945: Add swedish in default pipelines r=dureuill a=ManyTheFish

# Summary
## Fix Swedish support

In Swedish the characters `å`/`ä`/`ö` are completely different from `a` or `o` and should not be normalized to the same character.
Because the Swedish specialized pipeline was not activated by default, these characters were normalized even with the settings:
```json
{
  "localizedAttributes": [ { "locales": ["swe"], "attributePatterns": ["*"] } ]
}
```

## Update Charabia adding German support

German segmentation will now be activated using the setting:
```json
{
  "localizedAttributes": [ { "locales": ["deu"], "attributePatterns": ["*"] } ]
}
```

# TODO

- [x] Activate Swedish Pipeline
- [x] Add a test to avoid future regressions
- [x] Update Charabia


Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-09-19 14:42:03 +00:00
Tamo
74199f328d Make clippy happy 2024-09-19 16:27:34 +02:00
Tamo
1113c42de0 fix broken comments 2024-09-19 16:18:36 +02:00
ManyTheFish
7d6768e4c4 Add German tokenization pipeline 2024-09-19 16:09:01 +02:00
ManyTheFish
f77661ec44 Update Charabia v0.9.1 2024-09-19 16:08:59 +02:00
Tamo
b8fd85a46d Get rid of the useless collect before iterating on the readers 2024-09-19 15:57:38 +02:00
Tamo
fd43c6c404 Improve the error message explaining you can't un-bq an embedder 2024-09-19 15:51:29 +02:00
Tamo
2564ec1496
Update milli/src/index.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-09-19 15:41:44 +02:00
Tamo
b6b73fe41c
Update milli/src/update/settings.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-09-19 15:41:14 +02:00
Tamo
6dde41cc46 stop using a local version of arroy and instead point to the git repo with the rev 2024-09-19 15:25:38 +02:00
Tamo
163f8023a1 remove debug println 2024-09-19 12:13:25 +02:00
Tamo
633537ccd7 fix updating documents without updating the settings 2024-09-19 12:00:58 +02:00
Tamo
3f6301dbc9 fix the missing embedder name in the error message when trying to disable the binary quantization 2024-09-19 12:00:58 +02:00
Tamo
2b6952eda1 rename the ArroyReader to an ArroyWrapper since it can read and write 2024-09-19 12:00:58 +02:00
Tamo
79f29eed3c fix the tests and the arroy_readers method 2024-09-19 12:00:58 +02:00
Tamo
cc45e264ca implement the binary quantization in meilisearch 2024-09-19 12:00:56 +02:00
meili-bors[bot]
5f474a640d
Merge #4938
4938: Remove default embedder r=ManyTheFish a=dureuill

# Pull Request

## Related issue
Fixes #4738 

## What does this PR do?

[See public usage](https://meilisearch.notion.site/v1-11-AI-search-changes-0e37727193884a70999f254fa953ce6e#1044b06b651f80edb9d4ef6dc367bad0)

- Remove `hybrid.embedder` boolean from analytics because embedder is now mandatory and so the boolean would always be `true`
- Rework search kind so that a search without a query but with a vector is a vector search regardless of the (non-zero) semantic ratio (see the sketch after this entry)


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-09-19 09:17:14 +00:00
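A hedged sketch of a hybrid search payload under the new rules, assuming the route and field names of the current Meilisearch search API; the vector and embedder name are made up:

```rust
use serde_json::json;

fn main() {
    // Body for POST /indexes/{index}/search.
    let search = json!({
        "q": "",                    // no text query
        "vector": [0.1, 0.2, 0.3],  // a user-provided vector
        "hybrid": {
            "embedder": "default",  // now mandatory: no implicit default embedder
            "semanticRatio": 0.5
        }
    });
    // Per the PR, a query-less search that carries a vector is treated as a
    // pure vector search whatever the (non-zero) semantic ratio is.
    println!("{search}");
}
```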
ManyTheFish
bbaee3dbc6 Add Swedish pipeline in all-tokenization feature 2024-09-19 08:34:51 +02:00
meili-bors[bot]
ff523a2357
Merge #4939
4939: Introduce the `STARTS WITH` filter operator r=irevoire a=Kerollmops

This PR fixes #4872 by introducing the `STARTS WITH` filter operator and gating it under the _contains filter_ experimental feature along with the `CONTAINS` one. I also updated [the experimental feature discussion page](https://github.com/orgs/meilisearch/discussions/763).

Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-09-18 10:19:48 +00:00
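An illustrative search payload using the new operator; the attribute and value are made up, and the operator is only accepted once the experimental contains-filter feature is enabled:

```rust
use serde_json::json;

fn main() {
    // Body for POST /indexes/{index}/search.
    let search = json!({
        "q": "",
        "filter": "brand STARTS WITH 'nik'"
    });
    println!("{search}");
}
```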
Clément Renault
9f1fb4b425
Introduce the STARTS WITH filter operator gated under an experimental feature 2024-09-17 16:44:11 +02:00
Louis Dureuil
3c5e363554
Remove default embedders 2024-09-17 16:30:43 +02:00
F. Levi
e098cc8320 Make comparison simpler, add IndexUid error details similarly 2024-09-17 00:16:15 +03:00
F. Levi
dcb61f8b3a Return error for primary keys with a length greater than 512 bytes 2024-09-14 11:34:13 +03:00
Louis Dureuil
23e14138bb
facet distribution: implement Display for OrderBy 2024-09-12 17:43:50 +02:00
Louis Dureuil
e44325683a
Facet distribution: fix issue where truncated facet distribution would have a wrong order 2024-09-12 17:43:49 +02:00
Louis Dureuil
f18e9cb7b3
Change openai default model 2024-09-09 13:09:35 +02:00
Louis Dureuil
ed19b7c3c3
Only reindex if the size increased 2024-09-03 12:07:59 +02:00
Louis Dureuil
1ac008926b
Add maxBytes parameter 2024-09-03 12:07:15 +02:00
Louis Dureuil
c49d892c82
Changes to prompt 2024-09-03 12:07:10 +02:00
Louis Dureuil
de962a26f3
New error type when maxBytes is null 2024-09-03 12:01:04 +02:00
Louis Dureuil
21296190a3
Reindex embedders 2024-09-02 13:00:53 +02:00
Louis Dureuil
4464d319af
Change default template to use the new facility 2024-09-02 11:30:59 +02:00
Louis Dureuil
580ea2f450
Pass the fields <-> ids map with metadata to render 2024-09-02 11:30:10 +02:00
Louis Dureuil
915cf4bae5
Add field.is_searchable property to fields 2024-09-02 11:28:53 +02:00
meili-bors[bot]
9a756cf2c5
Merge #4888
4888: bring back v1.10.0 into main r=Kerollmops a=ManyTheFish



Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-08-27 14:02:08 +00:00
ManyTheFish
b12e997c8a Add pinyin flag 2024-08-21 14:38:04 +02:00
ManyTheFish
8bf89ec394 Infer locales from index settings 2024-08-21 10:47:40 +02:00
meili-bors[bot]
ee62d9ce30
Merge #4845
4845: Fix perf regression facet strings r=ManyTheFish a=dureuill

Benchmarks between v1.9 and v1.10 show a performance regression of about x2 (+3dB regression) for most indexing workloads (+44s for hackernews).

[Benchmark interpretation in the engine weekly meeting](https://www.notion.so/meilisearch/Engine-weekly-4d49560d374c4a87b4e3d126a261d4a0?pvs=4#98a709683276450295fcfe1f8ea5cef3).

- Initial investigation pointed to #4819 as the origin of the regression.
- Further investigation points towards the hypernormalization of each facet value in `extract_facet_string_docids`
- Most of the slowdown is in `normalize_facet_strings`, and precisely in `detection.language()`.

This PR improves the situation (-10s compared with `main` for hackernews, so only +34s regression compared with `v1.9`) by skipping normalization when it can be skipped.

I'm not sure how to fix the root cause though. Should we skip facet locale normalization for now? Cc `@ManyTheFish` 

---

Tentative resolution options:

1. remove locale normalization from facets. I'm not sure why it is required; I believe we weren't doing it before, so maybe we can simply stop doing it again.
2. avoid language detection whenever possible: this won't help with the regressions in the benchmark, but maybe we can skip language detection when the locales contain only one language? (sketched after this entry)
3. use a faster language detection library: `@Kerollmops` told me about https://github.com/quickwit-oss/whichlang which boasts 10x to 100x the throughput of whatlang. Should we consider replacing whatlang with whichlang? Now I understand whichlang supports fewer languages than whatlang, so I also suggest:
4. use whichlang when the list of locales is empty (autodetection), or when it only contains locales that whichlang can detect. If the list of locales contains locales that whichlang *cannot* detect, **then** use whatlang instead.

---

> [!CAUTION]
> this PR contains a commit that adds detailed spans, that were used to detect which part of `extract_facet_string_docids` was taking too much time. As this commit adds spans that are called too often and adds 7s overhead, it should be removed before landing.

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-08-19 06:29:48 +00:00
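A rough sketch of option 2 above (the direction later taken by commit ade54493ab): only pay for language detection on a facet value when several locales are configured for it. The helper functions are placeholders, not milli APIs:

```rust
// `normalize_for` and `detect_language` stand in for charabia / whatlang
// calls; the dispatch on the number of configured locales is the point.
fn normalize_facet_string(value: &str, locales: &[&str]) -> String {
    match locales {
        // No locale configured: keep the cheap default normalization.
        [] => value.to_lowercase(),
        // A single locale: use it directly and skip the costly detection.
        [only] => normalize_for(value, only),
        // Several candidate locales: only then pay for language detection.
        _ => normalize_for(value, detect_language(value, locales)),
    }
}

fn normalize_for(value: &str, _locale: &str) -> String {
    value.to_lowercase() // stand-in for locale-aware normalization
}

fn detect_language<'a>(_value: &str, locales: &'a [&'a str]) -> &'a str {
    locales[0] // stand-in for whatlang-style detection restricted to `locales`
}

fn main() {
    println!("{}", normalize_facet_string("Göteborg", &["swe"]));
}
```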
ManyTheFish
0f965d3574 Remove hotloop's spans 2024-08-14 14:33:36 +02:00
ManyTheFish
ade54493ab Only detect language for a facet if several locales have been specified by the user in the settings 2024-08-14 12:03:52 +02:00
Louis Dureuil
c3cdc407ec
Avoid unnecessary clone() 2024-08-08 14:57:02 +02:00
Louis Dureuil
2f10273d14
Group by normalized values, make sure you don't remove a value while there remains at least one value that normalizes to it 2024-08-08 14:02:53 +02:00
Louis Dureuil
e3ef0ae19e
also intersect the universe for searchOnAttributes 2024-08-06 14:06:56 +02:00
meili-bors[bot]
57f7af77c7
Merge #4846
4846: Add OpenAI tests r=dureuill a=dureuill

# Pull Request

## Related issue
Part of fixing #4757 

## What does this PR do?
- OpenAI embedder: don't pass apiKey when it is empty (slightly improves error messages)
- rest embedder and rest-based embedders: specialize the authorization denied error message depending on the configuration source
- fix existing tests
- Adds assets containing prerecorded texts to embed and the embeddings obtained from OpenAI
- Adds an asset containing a tokenized long document and the embedding obtained from OpenAI for this token
- Uses the wiremock crate to mock the OpenAI API: parse the OpenAI request, look up the response in the assets, and craft an OpenAI response (sketched after this entry)


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-08-05 10:49:28 +00:00
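A hedged sketch of the wiremock-based mocking described above; the endpoint path and the canned response body are illustrative, not the actual prerecorded assets from the PR:

```rust
use serde_json::json;
use wiremock::matchers::{method, path};
use wiremock::{Mock, MockServer, ResponseTemplate};

#[tokio::main]
async fn main() {
    // Local server standing in for api.openai.com.
    let server = MockServer::start().await;

    // Answer embedding requests with a canned payload, similar in spirit to
    // the prerecorded assets mentioned in the PR description.
    Mock::given(method("POST"))
        .and(path("/v1/embeddings"))
        .respond_with(ResponseTemplate::new(200).set_body_json(json!({
            "object": "list",
            "data": [{ "object": "embedding", "index": 0, "embedding": [0.015, -0.02, 0.031] }],
            "model": "text-embedding-3-small",
            "usage": { "prompt_tokens": 5, "total_tokens": 5 }
        })))
        .mount(&server)
        .await;

    // The embedder under test would then be configured with `server.uri()`
    // instead of the real OpenAI URL.
    println!("mock OpenAI endpoint at {}", server.uri());
}
```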
Louis Dureuil
e64d0e0ca8
use insert instead of push for bitmaps 2024-08-01 18:32:45 +02:00
Louis Dureuil
9ef710cad4
Use wrapper that forces the desired date format 2024-07-31 17:12:19 +02:00
Louis Dureuil
5aa6cb3600
Specialize authorized error message depending on config source 2024-07-31 15:03:44 +02:00