MeiliSearch

mirror of https://github.com/meilisearch/MeiliSearch synced 2024-11-10 23:18:55 +01:00

Author	SHA1	Message	Date
bors[bot]	290a29b5fb	Merge #457 457: Avoid iterating on big databases when useless r=Kerollmops a=Kerollmops This PR makes the prefix database updates to avoid iterating on big grenad files when it is unnecessary. We introduced this regression in #436 but it went unnoticed. --- According to the following benchmark results, we take more time when we index documents in one run than before #436. It looks like it is probably due to the fact that, now, instead of computing the prefixes database by iterating on the LMDB we directly iterate on the grenad file. Those could be slower to iterate on and could be the slowdown cause. I just pushed a commit that tests this branch with the new unreleased version of grenad where some work was done to speed up the iteration on grenad files. [The benchmarks for this last commit](https://github.com/meilisearch/milli/actions/runs/1927187408) are currently running. You can [see the diff](https://github.com/meilisearch/grenad/compare/v0.4.1...main) between the v0.4 and the unreleased v0.5 version of grenad. ```diff group indexing_benchmark-multi-batch-indexing-before-speed-up_45f52620 indexing_stop-iterating-on-big-grenad-files_ac8b85c4 ----- ---------------------------------------------------------------- ---------------------------------------------------- + indexing/Indexing songs in three batches with default settings 1.12 57.7±2.14s ? ?/sec 1.00 51.3±2.76s ? ?/sec - indexing/Indexing wiki 1.00 917.3±30.01s ? ?/sec 1.10 1008.4±38.27s ? ?/sec + indexing/Indexing wiki in three batches 1.10 1091.2±32.73s ? ?/sec 1.00 995.5±24.33s ? ?/sec ``` Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-03-09 16:46:34 +00:00
Kerollmops	1ae13c1374	Avoid iterating on big databases when useless	2022-03-09 15:43:54 +01:00
bors[bot]	a8d28e364d	Merge #461 461: Add a new error message when the `valid_fields` is empty r=curquiza a=brunoocasali I've created a test case to handle the new error formatting behavior, but I'm not sure if: - this is the right place to add the test? - this is the best way to test this behavior? And I'm not sure also regarding the `match` implementation, is this something required? Or maybe just an `if` statement is ok as well? I left the two messages literally without "reusing the prefix" in the implementation because I think this could help the "searchability" of the error in the future. # Pull Request ## What does this PR do? Fixes https://github.com/meilisearch/meilisearch/issues/2140 ## PR checklist Please check if your PR fulfills the following requirements: - [x] Does this PR fix an existing issue? - [ ] Have you read the contributing guidelines? - [x] Have you made sure that the title is accurate and descriptive of the changes? Thank you so much for contributing to Meilisearch! Co-authored-by: Bruno Casali <brunoocasali@gmail.com>	2022-03-08 09:55:58 +00:00
Liu Hancheng	35bf7ee538	fix test	2022-03-08 12:26:02 +08:00
LiuHanCheng	c8895cab77	Update meilisearch-lib/src/document_formats.rs Co-authored-by: Clément Renault <renault.cle@gmail.com>	2022-03-08 12:03:59 +08:00
bors[bot]	b669a73432	Merge #2209 2209: rename auto batching cli r=curquiza a=MarinPostma rename `--enable-autobatching` to `--enable-auto-batching`. as per https://github.com/meilisearch/specifications/pull/96#issuecomment-1060693721 Co-authored-by: ad hoc <postma.marin@protonmail.com>	2022-03-07 15:58:58 +00:00
bors[bot]	833f7fbdbe	Merge #2204 2204: Fix blocking auth r=Kerollmops a=MarinPostma Fix auth blocking runtime I have decided to remove async code from `meilisearch-auth` and let `meilisearch-http` handle that. Because Actix polls the extractor futures concurrently, I have made a wrapper extractor that forces the errors from the futures to be returned sequentially (though is still polls them sequentially). close #2201 Co-authored-by: ad hoc <postma.marin@protonmail.com>	2022-03-07 15:39:24 +00:00
ad hoc	62ce8e0bda	chore(http): rename auto batching cli option	2022-03-07 15:19:19 +01:00
ad hoc	ddd25bfe01	remove token from InvalidToken error	2022-03-07 15:16:07 +01:00
bors[bot]	2ef5751795	Merge #463 463: Allow setting the primary-key in the cli r=irevoire a=irevoire Co-authored-by: Tamo <tamo@meilisearch.com>	2022-03-07 14:11:40 +00:00
ad hoc	19da45c53b	Update meilisearch-http/src/extractors/sequential_extractor.rs Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-03-07 15:02:07 +01:00
Tamo	8bb45956d4	allow to set the primary key in the cli	2022-03-07 14:56:49 +01:00
ad hoc	0026410c61	review edits	2022-03-07 14:21:44 +01:00
bors[bot]	3cbadf92b6	Merge #462 462: cli improvements r=Kerollmops a=MarinPostma a few improvements: - use bufreader to load documents, so the loading of the document doesn't appear on flamegraphs - set default db path to current directory so the `-i` flag can be omitted. Co-authored-by: ad hoc <postma.marin@protonmail.com>	2022-03-07 09:39:01 +00:00
ad hoc	db3a1905de	default db path	2022-03-07 10:30:47 +01:00
ad hoc	6cf82ba993	bufread documents	2022-03-07 10:29:52 +01:00
Bruno Casali	66c6d5e1ef	Add a new error message when the `valid_fields` is empty > "Attribute `{}` is not sortable. This index doesn't have configured sortable attributes." > "Attribute `{}` is not sortable. Available sortable attributes are: `{}`." coexist in the error handling	2022-03-05 10:38:18 -03:00
ad hoc	b57c59baa4	sequential extractor	2022-03-04 20:43:12 +01:00
Liu Hancheng	a356c8359c	fix broken test	2022-03-04 15:39:18 +08:00
Liu Hancheng	b138b92d39	show detailed error message	2022-03-04 15:31:11 +08:00
Liu Hancheng	58e2903177	first try	2022-03-04 10:46:59 +08:00
ad hoc	af8a5f2c21	async auth	2022-03-02 19:25:51 +01:00
ad hoc	d6400aef27	remove async from meilsearch-authentication	2022-03-02 18:22:34 +01:00
bors[bot]	81fe65afed	Merge #2200 2200: Fix(dumps): Explicitly define serde for time r=ManyTheFish a=ManyTheFish fix #2199 Co-authored-by: ManyTheFish <many@meilisearch.com>	2022-03-02 10:39:24 +00:00
ManyTheFish	c2b58720d1	Fix(dumps): Explicitly define serde for time	2022-03-02 11:37:48 +01:00
bors[bot]	df518d8b0b	Merge #459 459: Update heed link in cargo toml r=Kerollmops a=curquiza Since grenad and heed have been moved to the meilisearch orga, this PR changes the link. This is a minor change since GitHub handles automatically the redirection. This PR is only for consisitency. Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2022-03-01 19:47:14 +00:00
Clémentine Urquizar	d9ed9de2b0	Update heed link in cargo toml	2022-03-01 19:45:29 +01:00
bors[bot]	51cf44d6fd	Merge #456 456: Remove useless grenad merging r=Kerollmops a=Kerollmops This PR must be merged after #454. This PR removes the part of code that was merging all of the grenad Readers merging that we don't need as the indexer should have merged them and, therefore, we should only have one final grenad Reader. We reduce the amount of CPU usage and memory pressure we were doing uselessly. `@ManyTheFish` are you sure I can skip merging the `word_docids` database? Here is the benchmark comparison with the previously merged PR #454: ``` group indexing_reintroduce-appending-sorted-values_c05e42a8 indexing_remove-useless-grenad-merging_d5b8b5a2 ----- ----------------------------------------------------- ----------------------------------------------- indexing/Indexing movies with default settings 1.06 16.6±1.04s ? ?/sec 1.00 15.7±0.93s ? ?/sec indexing/Indexing songs with default settings 1.16 60.1±7.07s ? ?/sec 1.00 51.7±5.98s ? ?/sec indexing/Indexing songs without faceted numbers 1.06 55.4±6.14s ? ?/sec 1.00 52.2±4.13s ? ?/sec ``` And the comparison with multi-batch indexing before #436, we can see that we gain time for benchmarks that index datasets in multiple batches but there is _so much_ variance that it's not clear. ``` group indexing_benchmark-multi-batch-indexing-before-speed-up_45f52620 indexing_remove-useless-grenad-merging_d5b8b5a2 ----- ---------------------------------------------------------------- ----------------------------------------------- indexing/Indexing geo_point 1.07 6.6±0.08s ? ?/sec 1.00 6.2±0.11s ? ?/sec indexing/Indexing songs in three batches with default settings 1.12 57.7±2.14s ? ?/sec 1.00 51.5±3.80s ? ?/sec indexing/Indexing songs with default settings 1.00 47.5±2.52s ? ?/sec 1.09 51.7±5.98s ? ?/sec indexing/Indexing songs without any facets 1.00 43.5±1.43s ? ?/sec 1.12 48.8±3.73s ? ?/sec indexing/Indexing songs without faceted numbers 1.00 47.1±2.23s ? ?/sec 1.11 52.2±4.13s ? ?/sec indexing/Indexing wiki 1.00 917.3±30.01s ? ?/sec 1.09 998.7±38.92s ? ?/sec indexing/Indexing wiki in three batches 1.09 1091.2±32.73s ? ?/sec 1.00 996.5±15.70s ? ?/sec ``` What do you think `@irevoire?` Should we change the benchmarks to make them do more runs? Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-03-01 16:48:08 +00:00
bors[bot]	5515aa5045	Merge #2197 2197: Additions to 0.26 (Update actix-web dependency to 4.0) r=curquiza a=MarinPostma - `@robjtede` `@MarinPostma` [update actix-web dependency to 4.0](`3b2e467ca6`) From main to release-v0.26.0 Co-authored-by: Rob Ede <robjtede@icloud.com>	2022-02-28 18:09:43 +00:00
Rob Ede	15150db957	clippy	2022-02-28 19:03:38 +01:00
Rob Ede	3b2e467ca6	update actix-web dependency to 4.0	2022-02-28 19:03:37 +01:00
Kerollmops	d5b8b5a2f8	Replace the ugly unwraps by clean if let Somes	2022-02-28 16:31:33 +01:00
Kerollmops	8d26f3040c	Remove a useless grenad file merging	2022-02-28 16:31:33 +01:00
bors[bot]	0c9e8cdf8d	Merge #2194 2194: update actix-web dependency to 4.0 r=irevoire a=robjtede # Pull Request ## What does this PR do? Updates Actix Web ecosystem crates to 4.0 stable. ## PR checklist Please check if your PR fulfills the following requirements: - [x] ~~Does this PR fix an existing issue?~~ - [x] Have you read the contributing guidelines? - [x] Have you made sure that the title is accurate and descriptive of the changes? Co-authored-by: Rob Ede <robjtede@icloud.com>	2022-02-28 15:23:42 +00:00
bors[bot]	21898ffc60	Merge #454 454: Reintroduce appending sorted entries when possible r=Kerollmops a=Kerollmops This PR modifies the `sorter_into_lmdb_database` function to append values into the database instead of get-put-merging them, it should improve the indexation speed for when the database is empty. ```txt group indexing_main_25123af3 indexing_reintroduce-appending-sorted-values_c05e42a8 ----- ---------------------- ----------------------------------------------------- indexing/Indexing movies with default settings 1.07 17.8±0.99s ? ?/sec 1.00 16.6±1.04s ? ?/sec indexing/Indexing songs with default settings 1.00 57.0±6.01s ? ?/sec 1.05 60.1±7.07s ? ?/sec indexing/Indexing songs without any facets 1.10 51.8±5.36s ? ?/sec 1.00 47.3±3.30s ? ?/sec ``` Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-02-28 14:55:37 +00:00
Rob Ede	8d624b3800	clippy	2022-02-28 13:43:22 +00:00
ad hoc	4fbb83a34d	bug(snapshot): Correctly open environments in snapshots	2022-02-28 12:37:30 +01:00
Rob Ede	961e22493c	update actix-web dependency to 4.0	2022-02-25 23:28:55 +00:00
Clément Renault	04b1bbf932	Reintroduce appending sorted entries when possible	2022-02-24 14:50:45 +01:00
bors[bot]	382be56d36	Merge #453 453: Benchmark multi batch indexing r=Kerollmops a=Kerollmops Hey `@irevoire,` could you please add the new benchmarks into influx? Co-authored-by: Kerollmops <clement@meilisearch.com> Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-02-24 12:33:13 +00:00
bors[bot]	09ee8e34a5	Merge #2192 2192: Fix max dbs error r=Kerollmops a=MarinPostma Factor the way we open environments to make sure they are always opened with the same options. The issue was that indexes were first opened in snapshots with incorrect options, and heed cache returned an environment with incorrect open options on subsequent index open. fix #2190 Co-authored-by: ad hoc <postma.marin@protonmail.com>	2022-02-23 16:18:23 +00:00
ad hoc	7e832105d7	bug(snapshot): Correctly open environments in snapshots	2022-02-23 17:12:08 +01:00
Clément Renault	acfc96525c	Apply GitHub suggestions	2022-02-23 16:20:29 +01:00
Clément Renault	a820aa11e6	Add a new movies benchmark to test multi batch indexing	2022-02-23 16:20:29 +01:00
Kerollmops	8d2e3e4aba	Add a new wiki benchmark to test multi batch indexing	2022-02-23 16:20:29 +01:00
Kerollmops	ab5247dc64	Add a new songs benchmark to test multi batch indexing	2022-02-23 16:20:28 +01:00
bors[bot]	acd9535588	Merge #455 455: Raise the GitHub CI timeout limit to 72h r=irevoire a=Kerollmops Co-authored-by: Kerollmops <clement@meilisearch.com>	2022-02-23 14:33:31 +00:00
Kerollmops	19bfb2649b	Raise the GitHub CI timeout limit to 72h	2022-02-23 15:27:51 +01:00
bors[bot]	ff6a7b6007	Merge #2191 2191: fix(analytics): flatten the scheduler options r=curquiza a=irevoire Implement missing part of [this spec](https://github.com/meilisearch/specifications/blob/develop/text/0034-telemetry-policies.md) by flattening the scheduler options. Co-authored-by: Tamo <tamo@meilisearch.com>	2022-02-22 16:19:07 +00:00
Tamo	6312e7f1f3	fix(analytics): flatten the scheduler options	2022-02-22 15:55:50 +01:00

... 51 52 53 54 55 ...

7520 Commits