Commit Graph

204 Commits

Author SHA1 Message Date
Louis Dureuil
6028d6ba43
Remove some warnings 2024-10-10 22:42:37 +02:00
Louis Dureuil
68a2502388
Introduce indexer level bumpalo 2024-10-10 22:23:05 +02:00
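A minimal sketch of what an indexer-level arena buys, assuming the bumpalo crate the commit names; the allocations shown are illustrative, not milli's actual data:

```rust
use bumpalo::Bump;

fn main() {
    // One arena for a whole indexing run: allocations are bump-pointer
    // fast, and everything is freed at once when the arena is dropped.
    let bump = Bump::new();

    let word = bump.alloc_str("temporary per-batch string");
    let ids = bump.alloc_slice_copy(&[1u32, 2, 3]);

    println!("{word} {ids:?}");
    // `bump` is dropped here; all arena allocations go away together.
}
```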
Clément Renault
39b27e42be
Plug the deletion pipeline 2024-10-08 16:04:19 +02:00
ManyTheFish
974272f2e9 Merge branch 'main' into indexer-edition-2024 2024-09-25 07:41:16 +02:00
Clément Renault
e0c7067355
Expose an IndexedParallelIterator to the index function 2024-09-24 17:24:59 +02:00
Clément Renault
3e9198ebaa
Support guessing primary key again 2024-09-11 17:25:40 +02:00
Clément Renault
8fd0afaaaa
Make sure we iterate over the payload documents in order 2024-09-06 08:09:08 +02:00
Clément Renault
72c6a21a30
Use raw JSON to read the payloads 2024-09-05 20:08:23 +02:00
Clément Renault
52d32b4ee9
Move the channel sender in the closure to stop the merger thread 2024-09-03 16:08:33 +02:00
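A minimal sketch of the pattern this commit message describes, assuming std::sync::mpsc; the thread names are illustrative. Once every `Sender` has been moved into the producing closure and dropped there, the channel disconnects and the merger's receive loop ends on its own:

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let (sender, receiver) = mpsc::channel::<u32>();

    // `move` transfers ownership of `sender` into the producer closure,
    // so no copy of it survives in the parent scope.
    let producer = thread::spawn(move || {
        for n in 0..3 {
            sender.send(n).unwrap();
        }
        // `sender` is dropped here, disconnecting the channel.
    });

    // The merger loop exits when the channel disconnects, instead of
    // blocking forever on a sender kept alive elsewhere.
    let merger = thread::spawn(move || {
        for n in receiver {
            println!("merging {n}");
        }
    });

    producer.join().unwrap();
    merger.join().unwrap();
}
```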
Clément Renault
c1557734dc
Use the GlobalFieldsIdsMap everywhere and write it to disk
Co-authored-by: Dureuill <louis@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-09-03 12:01:01 +02:00
Clément Renault
bcb1aa3d22
Find a temporary solution to par-into-iter on a HashMap
Spoiler: Do not use a HashMap but drain it into a Vec
2024-09-02 19:39:48 +02:00
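A minimal sketch of the workaround, assuming the rayon crate: rayon can iterate a `HashMap` in parallel, but not as an `IndexedParallelIterator`, whereas a `Vec` provides one; the data below is illustrative.

```rust
use rayon::prelude::*;
use std::collections::HashMap;

fn main() {
    let mut map: HashMap<String, u64> = HashMap::new();
    map.insert("a".into(), 1);
    map.insert("b".into(), 2);

    // Drain the map into a Vec to get contiguous, indexable storage.
    let entries: Vec<(String, u64)> = map.drain().collect();

    // `Vec` implements IndexedParallelIterator, so `enumerate`,
    // `zip`, etc. become available in the parallel pipeline.
    let total: u64 = entries
        .par_iter()
        .enumerate()
        .map(|(_i, (_k, v))| *v)
        .sum();

    assert_eq!(total, 3);
}
```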
Tamo
e6dd66e4a0 Do not fail the whole batch when a single document deletion by filter fails 2024-09-02 16:27:51 +02:00
Tamo
6e3839d8b6 autobatch document deletion by filter 2024-09-02 16:27:51 +02:00
Clément Renault
794ebcd582
Replace grenad with the new grenad various-improvement branch 2024-08-30 11:53:59 +02:00
Tamo
cf760cbfb1 Log the time to index a batch of tasks 2024-07-17 11:56:57 +02:00
Clément Renault
33fa17bf12
Support deleting documents with functions 2024-07-10 16:28:15 +02:00
Clément Renault
400e6b93ce
Support user-provided context for documents edition 2024-07-10 16:28:15 +02:00
Clément Renault
f32e6c32fc
Rename editionCode to function 2024-07-10 16:28:15 +02:00
Clément Renault
efc156a4a4
Executing Lua works correctly 2024-07-10 16:27:36 +02:00
Clément Renault
ba85959642
Support filtering the documents to edit with lua 2024-07-10 16:23:21 +02:00
Clément Renault
1702b5cf44
Prepare for processing documents edition 2024-07-10 16:23:21 +02:00
Louis Dureuil
3bc8f81abc
user_provided => regenerate 2024-06-12 18:12:20 +02:00
Tamo
d85ab23b82 rename all occurrences of user_defined to user_provided for consistency 2024-06-06 11:39:29 +02:00
Tamo
cc5dca8321 fix two bugs and add a dump test 2024-06-06 11:39:29 +02:00
Clément Renault
dc949ab46a
Remove puffin usage 2024-05-27 15:59:14 +02:00
Louis Dureuil
8a941c0241
Smaller review changes 2024-05-22 14:44:42 +02:00
Louis Dureuil
9969f7a638
Add test on index-scheduler 2024-05-20 14:44:10 +02:00
Louis Dureuil
02714ef5ed
Add vectors from vector DB in dump 2024-05-20 10:36:18 +02:00
Tamo
897d25780e update milli to latest version 2024-05-16 18:31:32 +02:00
writegr
ab43a8a949 chore: fix some typos in comments
Signed-off-by: writegr <wellweek@outlook.com>
2024-04-18 14:12:52 +08:00
Louis Dureuil
f82d056072
Hide secrets in settings and task queue 2024-03-26 10:36:24 +01:00
Tamo
066a7a3cde takes only one read transaction per thread 2024-02-26 10:43:04 +01:00
Louis Dureuil
02e6c8a440
Add tracing to index-scheduler 2024-02-08 15:03:31 +01:00
meili-bors[bot]
1ccde9bf0b
Merge #4316
4316: Autobatch the task deletions r=curquiza a=irevoire

# Pull Request

## Related issue
Fix part of https://github.com/meilisearch/meilisearch-support/issues/69
Fix #4315 

## What does this PR do?
- Autobatch the task deletions

Co-authored-by: Tamo <tamo@meilisearch.com>
2024-01-15 17:54:50 +00:00
Tamo
b4d7d80ad9
autobatch the task deletions 2024-01-11 14:58:07 +01:00
Louis Dureuil
97bb1ff9e2
Move currently_updating_index to IndexMapper 2024-01-09 15:37:27 +01:00
Louis Dureuil
ee54d3171e
Check experimental feature at query time 2023-12-21 15:26:12 +01:00
Many the fish
9e1b458010
Merge branch 'main' into change-proximity-precision-settings 2023-12-18 09:08:47 +01:00
Louis Dureuil
13c2c6c16b
Small commit to add hybrid search and autoembedding 2023-12-14 16:07:48 +01:00
ManyTheFish
35e1981488 Remove proximityPrecision from the experimental feature 2023-12-14 15:52:42 +01:00
Clément Renault
7e259cb0d2
Expose the --max-number-of-batched-tasks argument 2023-12-11 16:08:39 +01:00
ManyTheFish
1f4fc9c229 Make the feature experimental 2023-12-06 15:49:05 +01:00
Clément Renault
ec9b52d608
Rename copy_to_path to copy_to_file 2023-11-28 14:32:30 +01:00
Clément Renault
0dbf1a16ff
Make clippy happy 2023-11-23 14:11:38 +01:00
Clément Renault
0d4482625a
Make the changes to use heed v0.20-alpha.6 2023-11-23 11:43:58 +01:00
Clément Renault
7cb7e37ba8
Merge branch 'main' into tmp-release-v1.5.0 2023-11-21 16:30:46 +01:00
meili-bors[bot]
33b7c574ea
Merge #4090
4090: Diff indexing r=ManyTheFish a=ManyTheFish

This pull request aims to reduce the indexing time by computing a difference between the data added to the index and the data removed from the index before writing in LMDB.

## Why focus on reducing the writes in LMDB?

The indexing in Meilisearch is split into 3 main phases:
1) The computation or extraction of the data (multi-threaded)
2) The writing of the data in LMDB (mono-threaded)
3) The processing of the prefix databases (mono-threaded)

see below:
![Capture d’écran 2023-09-28 à 20 01 45](https://github.com/meilisearch/meilisearch/assets/6482087/51513162-7c39-4244-978b-2c6b60c43a56)


Because the writing is mono-threaded, it represents a bottleneck in the indexing; reducing the number of writes in LMDB will reduce the pressure on the main thread and should reduce the global time spent on indexing.
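A hypothetical, heavily simplified sketch of the diff idea (the real types live in milli and are not shown here): each extracted entry is tagged as an addition or a deletion, and the single writer applies both in one pass instead of running a separate deletion pipeline.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy)]
enum AddDel {
    Deletion,
    Addition,
}

fn main() {
    // Stand-in for an LMDB database: word -> posting list.
    let mut db: BTreeMap<&str, Vec<u32>> = BTreeMap::new();
    db.insert("hello", vec![1, 2]);

    // Diff computed by the multi-threaded extractors: only what
    // actually changed reaches the mono-threaded writer.
    let diff: Vec<(&str, AddDel, u32)> = vec![
        ("hello", AddDel::Deletion, 1), // docid 1 no longer contains "hello"
        ("world", AddDel::Addition, 3), // docid 3 now contains "world"
    ];

    for (word, op, docid) in diff {
        let postings = db.entry(word).or_default();
        match op {
            AddDel::Deletion => postings.retain(|&d| d != docid),
            AddDel::Addition => postings.push(docid),
        }
    }

    assert_eq!(db["hello"], vec![2]);
    assert_eq!(db["world"], vec![3]);
}
```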

## Give Feedback

We created [a dedicated discussion](https://github.com/meilisearch/meilisearch/discussions/4196) for users to try this new feature and to give feedback on bugs or performance issues.

## Technical approach
### Part 1: merge the addition and the deletion process
This part:
a) Aims to reduce the time spent on indexing only the filterable/sortable fields of documents, for example:
  - Updating the number of "likes" or "stars" of a song or a movie
  - Updating the "stock count" or the "price" of a product

b) Aims to reduce the time spent on writing in LMDB, which should reduce the global indexing time on highly multi-threaded machines by reducing the writing bottleneck.

c) Aims to reduce the average time spent deleting documents, without having to keep the soft-deleted documents implementation.

- [x] Create a preprocessing function that creates the diff-based documents chunk (`OBKV<fid, OBKV<AddDel, value>>`)
  - [x] and clearly separate the faceted fields and the searchable fields in two different chunks
- Change the parameters of the input extractor by taking an `OBKV<fid, OBKV<AddDel, value>>` instead of `OBKV<fid, value>`.
  - [x] extract_docid_word_positions
  - [x] extract_geo_points
  - [x] extract_vector_points
  - [x] extract_fid_docid_facet_values
- Adapt the searchable extractors to the new diff-chunks
  - [x] extract_fid_word_count_docids
  - [x] extract_word_pair_proximity_docids
  - [x] extract_word_position_docids
  - [x] extract_word_docids
- Adapt the facet extractors to the new diff-chunks
  - [x] extract_facet_number_docids
  - [x] extract_facet_string_docids
  - [x] extract_fid_docid_facet_values
  - [x] FacetsUpdate
- [x] Adapt the prefix database extractors ⚠️ ⚠️ 
- [x] Make the LMDB writer remove the document_ids to delete at the same time the new document_ids are added (see the sketch after this list)
- [x] Remove document deletion pipeline
  - [x] remove `new_documents_ids` and `replaced_documents_ids` entirely
  - [x] reuse extracted external id from transform instead of re-extracting in `TypedChunks::Documents`
  - [x] Remove deletion pipeline after autobatcher
  - [x] remove autobatcher deletion pipeline
    - [x] everything uses `IndexOperation::DocumentOperation`
    - [x] repair deletion by internal id for delete by filter
    - [x] Improve the deletion via internal ids by avoiding iterating over the whole set of external document ids.
- [x] Remove soft-deleted documents
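A minimal sketch of the merged write from the checklist above, assuming the roaring crate; `apply_diff` is a hypothetical helper, not an actual milli function.

```rust
use roaring::RoaringBitmap;

// One merge: subtract deletions, union additions, in a single pass
// over the posting list instead of a separate deletion pipeline.
fn apply_diff(existing: &mut RoaringBitmap, added: &RoaringBitmap, deleted: &RoaringBitmap) {
    *existing -= deleted;
    *existing |= added;
}

fn main() {
    let mut postings: RoaringBitmap = (0u32..4).collect(); // docids {0,1,2,3}
    let added: RoaringBitmap = [7u32, 8].into_iter().collect();
    let deleted: RoaringBitmap = [1u32, 2].into_iter().collect();

    apply_diff(&mut postings, &added, &deleted);
    assert_eq!(postings.iter().collect::<Vec<_>>(), vec![0, 3, 7, 8]);
}
```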

#### FIXME

- [x] field distribution is not correctly updated after deletion
- [x] missing documents in the tests of tokenizer_customization

### Part 2: Only compute the documents field by field
This part aims to reduce the global indexing time for any kind of partial document modification, on any size of machine from mono-threaded ones to highly multi-threaded ones (see the sketch after this list).

- [ ] Make the preprocessing function only send the fields that changed to the extractors
- [ ] remove the `word_docids` and `exact_word_docids` databases and adapt the search (⚠️ could impact the search performance)
- [ ] replace the `word_pair_proximity_docids` database with a `word_pair_proximity_fid_docids` database and adapt the search (⚠️ could impact the search performance)
- [ ] Adapt the prefix database extractors ⚠️ ⚠️
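A hypothetical sketch of the field-by-field idea using serde_json: compare the old and new versions of a document and forward only the fields whose value changed, so untouched fields never reach the extractors; `changed_fields` is illustrative, not part of the codebase.

```rust
use serde_json::{json, Map, Value};

fn changed_fields(old: &Map<String, Value>, new: &Map<String, Value>) -> Vec<String> {
    let mut changed = Vec::new();
    // Fields added or modified in the new version.
    for (field, value) in new {
        if old.get(field) != Some(value) {
            changed.push(field.clone());
        }
    }
    // Fields removed from the new version.
    for field in old.keys() {
        if !new.contains_key(field) {
            changed.push(field.clone());
        }
    }
    changed
}

fn main() {
    let old = json!({ "title": "Dune", "likes": 41, "year": 1965 });
    let new = json!({ "title": "Dune", "likes": 42, "year": 1965 });

    let (old, new) = (old.as_object().unwrap(), new.as_object().unwrap());
    // Only "likes" changed, so only "likes" would reach the extractors.
    assert_eq!(changed_fields(old, new), vec!["likes".to_string()]);
}
```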

## Technical Concerns
- The part 1 implementation could increase the indexing time on the smallest machines (with few threads) by increasing the extraction time (multi-threaded) more than it reduces the writing time (mono-threaded)
- The part 2 implementation needs to change the databases, which could have a significant impact on the search performance
- The prefix databases are a bit special to process and may be a pain to adapt to the difference-based indexing

Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-11-21 09:44:38 +00:00
Tamo
5b57fbab08 makes the dump cancellable 2023-11-14 11:23:13 +01:00
Louis Dureuil
a2d6dc8571
Fix typo, remove caching for the change of index 2023-11-13 10:44:36 +01:00
Louis Dureuil
492fc086f0
cargo fmt 2023-11-12 21:53:11 +01:00