4888: bring back v1.10.0 into main r=Kerollmops a=ManyTheFish
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
4881: Infer locales from index settings r=curquiza a=ManyTheFish
# Pull Request
## Related issue
Fixes #4828
Fixes #4816
## What does this PR do?
- Add some tests using `AttributesToSearchOn`
- Make the search infer the language from the index settings when the `locales` field is not provided (see the sketch below)
CI is now working:
https://github.com/meilisearch/meilisearch/actions/runs/10490050545/job/29055955667
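A minimal sketch of the fallback idea, not Meilisearch's actual code: `Locale` is a placeholder type and `effective_locales` a hypothetical helper.

```rust
/// Placeholder for the real locale type used by the tokenizer.
type Locale = String;

/// Prefer the locales explicitly passed with the query; otherwise fall back
/// to the locales declared in the index settings for the searched attributes.
fn effective_locales(
    query_locales: Option<Vec<Locale>>,
    settings_locales: Vec<Locale>,
) -> Vec<Locale> {
    match query_locales {
        Some(locales) if !locales.is_empty() => locales,
        _ => settings_locales,
    }
}

fn main() {
    // No locales in the query: the index settings take over.
    let inferred = effective_locales(None, vec!["jpn".to_string()]);
    assert_eq!(inferred, vec!["jpn".to_string()]);
}
```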
Co-authored-by: ManyTheFish <many@meilisearch.com>
4845: Fix perf regression facet strings r=ManyTheFish a=dureuill
Benchmarks between v1.9 and v1.10 show a performance regression of about 2x (+3 dB) for most indexing workloads (+44s for hackernews).
[Benchmark interpretation in the engine weekly meeting](https://www.notion.so/meilisearch/Engine-weekly-4d49560d374c4a87b4e3d126a261d4a0?pvs=4#98a709683276450295fcfe1f8ea5cef3).
- Initial investigation pointed to #4819 as the origin of the regression.
- Further investigation points towards the normalization of each facet value in `extract_facet_string_docids`.
- Most of the slowdown is in `normalize_facet_strings`, and more precisely in `detection.language()`.
This PR improves the situation (-10s compared with `main` for hackernews, so only a +34s regression compared with `v1.9`) by skipping normalization when it isn't needed.
I'm not sure how to fix the root cause though. Should we skip facet locale normalization for now? Cc `@ManyTheFish`
---
Tentative resolution options:
1. Remove locale normalization from facets. I'm not sure why it is required; I believe we weren't doing this before, so maybe we can stop doing it again.
2. Don't do language detection when it can be avoided: this won't help with the regression in the benchmarks, but maybe we can skip language detection when the locales contain only one language?
3. Use a faster language detection library: `@Kerollmops` told me about https://github.com/quickwit-oss/whichlang, which boasts 10x to 100x the throughput of whatlang. Should we consider replacing whatlang with whichlang? Now, I understand that whichlang supports fewer languages than whatlang, so I also suggest:
4. Use whichlang when the list of locales is empty (autodetection), or when it only contains locales that whichlang can detect. If the list of locales contains locales that whichlang *cannot* detect, **then** use whatlang instead (a sketch of this fallback follows the list).
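To make option 4 concrete, here is a rough, non-authoritative sketch assuming the public `whichlang::detect_language` and `whatlang::detect` functions; the `whichlang_supports` helper is hypothetical and deliberately incomplete.

```rust
use whatlang::Lang as WhatLang;
use whichlang::Lang as WhichLang;

/// The detection result, from whichever detector ran.
#[derive(Debug)]
enum Detected {
    Fast(WhichLang),
    Slow(Option<WhatLang>),
}

/// Hypothetical, deliberately partial check: does whichlang cover this locale?
/// A real implementation would list whichlang's whole supported set.
fn whichlang_supports(lang: &WhatLang) -> bool {
    matches!(
        *lang,
        WhatLang::Eng | WhatLang::Fra | WhatLang::Deu | WhatLang::Spa | WhatLang::Jpn
    )
}

/// Option 4: use the fast detector when the allowed locales are empty or all
/// covered by whichlang, and fall back to whatlang otherwise.
fn detect(text: &str, allowed_locales: &[WhatLang]) -> Detected {
    if allowed_locales.is_empty() || allowed_locales.iter().all(whichlang_supports) {
        Detected::Fast(whichlang::detect_language(text))
    } else {
        Detected::Slow(whatlang::detect(text).map(|info| info.lang()))
    }
}

fn main() {
    println!("{:?}", detect("the quick brown fox", &[]));
    println!("{:?}", detect("どうもありがとう", &[WhatLang::Jpn, WhatLang::Kor]));
}
```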
---
> [!CAUTION]
> This PR contains a commit that adds detailed spans, which were used to find out which part of `extract_facet_string_docids` was taking too much time. As this commit adds spans that are entered too often and adds a 7s overhead, it should be removed before landing.
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
4864: Don't remove facet value when multiple original values map to the same normalized value r=ManyTheFish a=dureuill
# Pull Request
## Related issue
Fixes #4860
> [!WARNING]
> This PR contains a fix to the immediate issue, but it looks like the underlying data model is faulty: a facet of a document stores only one possible "original" value for each normalized value, whereas, because of array values (or manually written nested fields, if you're evil), it is technically possible to have multiple distinct original values mapping to the same normalized value (illustrated below).
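A toy illustration of the data-model concern, with lowercasing standing in for the real normalization:

```rust
use std::collections::{BTreeMap, BTreeSet};

fn main() {
    // Two distinct original facet values coming from an array field…
    let originals = ["Été", "ÉTÉ"];

    // …normalize to the same key, so a `normalized -> original` map
    // (one slot per normalized value) can only keep one of them:
    let mut one_original: BTreeMap<String, &str> = BTreeMap::new();
    for original in originals {
        one_original.insert(original.to_lowercase(), original); // last write wins
    }
    assert_eq!(one_original.len(), 1);

    // A `normalized -> set of originals` map keeps both:
    let mut all_originals: BTreeMap<String, BTreeSet<&str>> = BTreeMap::new();
    for original in originals {
        all_originals
            .entry(original.to_lowercase())
            .or_default()
            .insert(original);
    }
    assert_eq!(all_originals["été"].len(), 2);
}
```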
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4861: Make sure the index scheduler never stops running r=irevoire a=irevoire
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4748
## What does this PR do?
- Whatever happens, we always try to process tasks once every minute (if no tasks are enqueued, that's practically free); see the sketch below.
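A minimal sketch of that kind of "tick at least once a minute" loop, not the actual index scheduler code; `process_pending_tasks` is a placeholder:

```rust
use std::sync::mpsc::{channel, Receiver, RecvTimeoutError};
use std::time::Duration;

/// Signal sent whenever a new task is enqueued.
struct Wake;

fn process_pending_tasks() {
    // Placeholder for the real batch processing; a no-op when nothing is enqueued.
}

fn scheduler_loop(wake_up: Receiver<Wake>) {
    loop {
        process_pending_tasks();

        // Wait for a wake-up signal, but never longer than a minute, so the
        // scheduler keeps ticking even if a notification is missed.
        match wake_up.recv_timeout(Duration::from_secs(60)) {
            Ok(Wake) | Err(RecvTimeoutError::Timeout) => continue,
            Err(RecvTimeoutError::Disconnected) => break,
        }
    }
}

fn main() {
    let (sender, receiver) = channel();
    let handle = std::thread::spawn(move || scheduler_loop(receiver));

    // Enqueue a task and wake the scheduler up immediately.
    sender.send(Wake).unwrap();

    // Dropping the sender disconnects the channel and stops the loop.
    drop(sender);
    handle.join().unwrap();
}
```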
Co-authored-by: Tamo <tamo@meilisearch.com>
4858: also intersect the universe for searchOnAttributes r=irevoire a=dureuill
# Pull Request
## Related issue
Fixes #4857
## What does this PR do?
- Intersect with the universe (which does not contain the filtered-out ids) when looking up documents for words, even when using `searchOnAttributes`; see the sketch below.
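A toy illustration of the idea, using the `roaring` bitmaps that milli relies on (not the actual resolution code):

```rust
use roaring::RoaringBitmap;

fn main() {
    // The universe: every document that survived the filter.
    let universe: RoaringBitmap = (0u32..8).filter(|id| id % 2 == 0).collect();

    // Documents found for a word, restricted to the searched-on attributes.
    let word_docids: RoaringBitmap = [1u32, 2, 3, 4].into_iter().collect();

    // Without this intersection, filtered-out documents (the odd ids here)
    // would leak back into the candidates.
    let candidates = &word_docids & &universe;
    assert_eq!(candidates.iter().collect::<Vec<u32>>(), vec![2, 4]);
}
```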
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4846: Add OpenAI tests r=dureuill a=dureuill
# Pull Request
## Related issue
Part of fixing #4757
## What does this PR do?
- OpenAI embedder: don't pass `apiKey` when it is empty (slightly improves error messages)
- REST embedder and REST-based embedders: specialize the authorization-denied error message depending on the configuration source
- Fix existing tests
- Add assets containing prerecorded texts to embed and the embeddings obtained from OpenAI
- Add an asset containing a tokenized long document and the embedding obtained from OpenAI for these tokens
- Use the wiremock crate to mock the OpenAI API: parse the OpenAI request, look up the response in the assets, and craft an OpenAI response (see the sketch below)
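A minimal sketch of that kind of wiremock setup, assuming the standard `wiremock`, `tokio`, and `serde_json` APIs; the response body below is illustrative, not one of the PR's recorded assets:

```rust
use serde_json::json;
use wiremock::matchers::{method, path};
use wiremock::{Mock, MockServer, ResponseTemplate};

#[tokio::main]
async fn main() {
    // Start a local server standing in for api.openai.com.
    let server = MockServer::start().await;

    // Answer POST /v1/embeddings with a canned embedding, as if it had been
    // looked up in the prerecorded assets.
    Mock::given(method("POST"))
        .and(path("/v1/embeddings"))
        .respond_with(ResponseTemplate::new(200).set_body_json(json!({
            "object": "list",
            "data": [{ "object": "embedding", "index": 0, "embedding": [0.1, 0.2, 0.3] }],
            "model": "text-embedding-3-small",
            "usage": { "prompt_tokens": 3, "total_tokens": 3 }
        })))
        .mount(&server)
        .await;

    // The embedder under test is then pointed at `server.uri()` instead of
    // the real OpenAI endpoint.
    println!("mock OpenAI listening on {}", server.uri());
}
```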
Co-authored-by: Louis Dureuil <louis@meilisearch.com>