MeiliSearch

mirror of https://github.com/meilisearch/MeiliSearch synced 2024-11-25 14:24:26 +01:00

Author	SHA1	Message	Date
Tamo	099abefc6d	Merge branch 'main' into metrics/prometheus-setup	2022-08-22 09:56:15 +02:00
mohandasspat	a05101af4d	clippy & fmt fixed	2022-08-22 13:21:22 +05:30
mohandasspat	109540011a	conflict fixes	2022-08-22 13:21:22 +05:30
mohandasspat	2f92169e48	clippy issue in metrics fixed	2022-08-22 13:21:22 +05:30
Pavo-Tusker	a58b00d8f1	Update meilisearch-http/src/option.rs Co-authored-by: Tamo <irevoire@protonmail.ch>	2022-08-22 13:21:22 +05:30
mohandasspat	2b8f3c26ec	Changed prometheus metrics feature as optional	2022-08-22 13:21:22 +05:30
mohandasspat	0b6ca73790	review fixes	2022-08-22 13:21:22 +05:30
Pavo-Tusker	1f1482e97c	Update meilisearch-http/src/routes/mod.rs Co-authored-by: Tamo <irevoire@protonmail.ch>	2022-08-22 13:21:22 +05:30
mohandasspat	25fecf9360	clippy & rustfmt fixed	2022-08-22 13:21:22 +05:30
mohandasspat	4bee0565e8	prometheus and grafana dashboards implemented	2022-08-22 13:21:22 +05:30
mohandasspat	d5da063666	clippy & fmt fixed	2022-08-22 10:52:09 +05:30
mohandasspat	43bb5176a9	conflict fixes	2022-08-22 10:30:07 +05:30
Irevoire	e7624abe63	share heed between all sub-crates	2022-08-19 11:23:41 +02:00
ManyTheFish	993aa1321c	Fix query tree building	2022-08-18 17:56:06 +02:00
ManyTheFish	bff9653050	Fix remove count	2022-08-18 17:36:30 +02:00
ManyTheFish	9640976c79	Rename TermMatchingPolicies	2022-08-18 17:36:08 +02:00
bors[bot]	a0734c991c	Merge #2674 2674: Add analytics on the stats routes r=ManyTheFish a=irevoire # Pull Request ## What does this PR do? Implements https://github.com/meilisearch/specifications/pull/169 ## PR checklist Please check if your PR fulfills the following requirements: - [ ] Does this PR fix an existing issue? - [ ] Have you read the contributing guidelines? - [ ] Have you made sure that the title is accurate and descriptive of the changes? Thank you so much for contributing to Meilisearch! Co-authored-by: Irevoire <tamo@meilisearch.com>	2022-08-18 14:19:56 +00:00
bors[bot]	cb29d7d124	Merge #2678 2678: Accept either an array of documents or a single document r=irevoire a=Kerollmops # Pull Request ## What does this PR do? Fixes #2671 ## PR checklist Please check if your PR fulfills the following requirements: - [x] Does this PR fix an existing issue? - [x] Have you read the contributing guidelines? - [x] Have you made sure that the title is accurate and descriptive of the changes? Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-08-18 14:00:01 +00:00
Clément Renault	e32d5ef2b3	Fix the test with an uncomprehensible user error message	2022-08-18 14:37:44 +02:00
bors[bot]	60a7221827	Merge #609 609: Retry downloading the benchmarks datasets r=Kerollmops a=irevoire Downloading the benchmarks datasets is failing [more and more](https://github.com/meilisearch/milli/pull/607#pullrequestreview-1076023074) often; thus, instead of fixing the issue, I thought we could retry multiple times. Co-authored-by: Irevoire <tamo@meilisearch.com>	2022-08-18 11:47:09 +00:00
bors[bot]	afc10acd19	Merge #596 596: Filter operators: NOT + IN[..] r=irevoire a=loiclec # Pull Request ## What does this PR do? Implements the changes described in https://github.com/meilisearch/meilisearch/issues/2580 It is based on top of #556 Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>	2022-08-18 11:24:32 +00:00
Loïc Lecrenier	c7a86b56ef	Fix filter parser compilation error	2022-08-18 13:16:56 +02:00
Loïc Lecrenier	9b6602cba2	Avoid cloning FilterCondition in filter array parsing	2022-08-18 13:06:57 +02:00
Loïc Lecrenier	8a271223a9	Change a macro_rules to a function in filter parser	2022-08-18 13:03:55 +02:00
bors[bot]	ee69ede1ce	Merge #2677 2677: Hide the batch_uid field from the tasks route r=Kerollmops a=Kerollmops # Pull Request ## What does this PR do? Fixes #2676 ## PR checklist Please check if your PR fulfills the following requirements: - [x] Does this PR fix an existing issue? - [x] Have you read the contributing guidelines? - [x] Have you made sure that the title is accurate and descriptive of the changes? Co-authored-by: Clément Renault <clement@meilisearch.com>	2022-08-18 10:01:09 +00:00
Clément Renault	9b2036ac05	Accept either an array of documents or a single document	2022-08-18 11:55:14 +02:00
Loïc Lecrenier	dd34dbaca5	Add more filter parser tests	2022-08-18 11:55:01 +02:00
Loïc Lecrenier	5d74ebd5e5	Cargo fmt	2022-08-18 11:36:38 +02:00
Clément Renault	5c543f9d94	Add a test for single document upload	2022-08-18 11:33:22 +02:00
Loïc Lecrenier	9af69c151b	Limit the maximum depth of filters This should have no impact on the user but is there to safeguard meilisearch against malicious inputs.	2022-08-18 11:31:38 +02:00
Clément Renault	0c03ed3c1e	Hide the batch_uid field from the tasks route	2022-08-18 11:15:21 +02:00
Loïc Lecrenier	c51dcad51b	Don't recompute filterable fields in evaluation of IN[] filter	2022-08-18 10:59:21 +02:00
Loïc Lecrenier	98f0da6b38	Simplify representation of nested NOT filters	2022-08-18 10:58:24 +02:00
Loïc Lecrenier	b030efdc83	Fix parsing of IN[] filter followed by whitespace + factorise its impl	2022-08-18 10:58:04 +02:00
Irevoire	84a784834e	retry downloading the benchmarks datasets	2022-08-17 19:25:05 +02:00
bors[bot]	79094bcbcf	Merge #607 607: Better threshold r=Kerollmops a=irevoire # Pull Request ## What does this PR do? Fixes #570 This PR tries to improve the threshold used to trigger the real deletion of documents. The deletion is now triggered in two cases; - 10% of the total available space is used by soft deleted documents - 90% of the total available space is used. In this context, « total available space » means the `map_size` of lmdb. And the size used by the soft deleted documents is actually an estimation. We can't determine precisely the size used by one document thus what we do is; take the total space used, divide it by the number of documents + soft deleted documents to estimate the size of one average document. Then multiply the size of one avg document by the number of soft deleted document. -------- <img width="808" alt="image" src="https://user-images.githubusercontent.com/7032172/185083075-92cf379e-8ae1-4bfc-9ca6-93b54e6ab4e9.png"> Here we can see we have a ~10GB drift in the end between the space used by the soft deleted and the real space used by the documents. Personally I don’t think that's a big issue because once the red line reach 90GB everything will be freed but now you know. If you have an idea on how to improve this estimation I would love to hear it. It look like the difference is linear so maybe we could simply multiply the current estimation by two? Co-authored-by: Irevoire <tamo@meilisearch.com>	2022-08-17 16:31:04 +00:00
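The threshold described in #607 above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code merged in milli: the function name `should_purge_soft_deleted`, its parameters, and the use of inclusive (`>=`) comparisons are all hypothetical. Only the 10% and 90% figures and the average-document-size estimate come from the PR description.

```rust
/// Hypothetical helper: decides whether soft-deleted documents should be
/// really deleted, following the two rules quoted in the PR description.
///
/// * `used_bytes`: total space currently used in the environment
/// * `map_size`: the "total available space" (the LMDB `map_size`)
/// * `live_docs` / `soft_deleted_docs`: document counts
fn should_purge_soft_deleted(
    used_bytes: u64,
    map_size: u64,
    live_docs: u64,
    soft_deleted_docs: u64,
) -> bool {
    let total_docs = live_docs + soft_deleted_docs;
    if total_docs == 0 || map_size == 0 {
        // Nothing to estimate from (assumption: treat this as "do not purge").
        return false;
    }

    // Estimate the size of one average document from the total space used.
    let avg_doc_size = used_bytes / total_docs;
    // Estimated space held by soft-deleted documents.
    let soft_deleted_bytes = avg_doc_size * soft_deleted_docs;

    let soft_deleted_ratio = soft_deleted_bytes as f64 / map_size as f64;
    let used_ratio = used_bytes as f64 / map_size as f64;

    // Rule 1: soft-deleted documents use 10% of the available space.
    // Rule 2: 90% of the available space is used overall.
    soft_deleted_ratio >= 0.10 || used_ratio >= 0.90
}
```

For example, with a `map_size` of 100,000 bytes, 50,000 bytes used, and 300 soft-deleted documents out of 1,000, the estimate is 50 bytes per document and 15,000 bytes of soft-deleted data, so the first rule triggers. As the PR description notes, this is only an estimation and can drift from the real space used.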
mohandasspat	54a0b47c2b	clippy issue in metrics fixed	2022-08-17 21:08:28 +05:30
Loïc Lecrenier	497f9817a2	Use snapshot testing for the filter parser	2022-08-17 17:35:01 +02:00
Pavo-Tusker	947fb5c956	Update meilisearch-http/src/option.rs Co-authored-by: Tamo <irevoire@protonmail.ch>	2022-08-17 20:57:07 +05:30
mohandasspat	cd18459484	Changed prometheus metrics feature as optional	2022-08-17 20:56:15 +05:30
mohandasspat	225d9936ed	review fixes	2022-08-17 20:55:29 +05:30
Pavo-Tusker	93daa4c464	Update meilisearch-http/src/routes/mod.rs Co-authored-by: Tamo <irevoire@protonmail.ch>	2022-08-17 20:55:29 +05:30
mohandasspat	d08c77706c	clippy & rustfmt fixed	2022-08-17 20:55:29 +05:30
mohandasspat	de58ccd4ba	prometheus and grafana dashboards implemented	2022-08-17 20:54:39 +05:30
Irevoire	4aae07d5f5	expose the size methods	2022-08-17 17:07:38 +02:00
Irevoire	e96b852107	bump heed	2022-08-17 17:05:50 +02:00
Loïc Lecrenier	238a7be58d	Fix filter parser handling of keywords and surrounding spaces Now the following fragments are allowed: AND(field = AND'field' = AND"field" =	2022-08-17 16:53:40 +02:00
Irevoire	62240b7e19	add analytics on the stats routes	2022-08-17 16:12:26 +02:00
Loïc Lecrenier	b09a8f1b91	Filters: add explicit error message when using a keyword as value	2022-08-17 16:07:00 +02:00
bors[bot]	087da5621a	Merge #587 587: Word prefix pair proximity docids indexation refactor r=Kerollmops a=loiclec # Pull Request ## What does this PR do? Refactor the code of `WordPrefixPairProximityDocIds` to make it much faster, fix a bug, and add a unit test. ## Why is it faster? Because we avoid using a sorter to insert the (`word1`, `prefix`, `proximity`) keys and their associated bitmaps, and thus we don't have to sort a potentially very big set of data. I have also added a couple of other optimisations: 1. reusing allocations 2. using a prefix trie instead of an array of prefixes to get all the prefixes of a word 3. inserting directly into the database instead of putting the data in an intermediary grenad when possible. Also avoid checking for pre-existing values in the database when we know for certain that they do not exist. ## What bug was fixed? When reindexing, the `new_prefix_fst_words` prefixes may look like: ``` ["ant", "axo", "bor"] ``` which we group by first letter: ``` [["ant", "axo"], ["bor"]] ``` Later in the code, if we have the word2 "axolotl", we try to find which subarray of prefixes contains its prefixes. This check is done with `word2.starts_with(subarray_prefixes[0])`, but `"axolotl".starts_with("ant")` is false, and thus we wrongly think that there are no prefixes in `new_prefix_fst_words` that are prefixes of `axolotl`. ## StrStrU8Codec I had to change the encoding of `StrStrU8Codec` to make the second string null-terminated as well. I don't think this should be a problem, but I may have missed some nuances about the impacts of this change. ## Requests when reviewing this PR I have explained what the code does in the module documentation of `word_pair_proximity_prefix_docids`. It would be nice if someone could read it and give their opinion on whether it is a clear explanation or not. 
I also have a couple questions regarding the code itself: - Should we clean up and factor out the `PrefixTrieNode` code to try and make broader use of it outside this module? For now, the prefixes undergo a few transformations: from FST, to array, to prefix trie. It seems like it could be simplified. - I wrote a function called `write_into_lmdb_database_without_merging`. (1) Are we okay with such a function existing? (2) Should it be in `grenad_helpers` instead? ## Benchmark Results We reduce the time it takes to index about 8% in most cases, but it varies between -3% and -20%. ``` group indexing_main_ce90fc62 indexing_word-prefix-pair-proximity-docids-refactor_cbad2023 ----- ---------------------- ------------------------------------------------------------ indexing/-geo-delete-facetedNumber-facetedGeo-searchable- 1.00 1893.0±233.03µs ? ?/sec 1.01 1921.2±260.79µs ? ?/sec indexing/-movies-delete-facetedString-facetedNumber-searchable- 1.05 9.4±3.51ms ? ?/sec 1.00 9.0±2.14ms ? ?/sec indexing/-movies-delete-facetedString-facetedNumber-searchable-nested- 1.22 18.3±11.42ms ? ?/sec 1.00 15.0±5.79ms ? ?/sec indexing/-songs-delete-facetedString-facetedNumber-searchable- 1.00 41.4±4.20ms ? ?/sec 1.28 53.0±13.97ms ? ?/sec indexing/-wiki-delete-searchable- 1.00 285.6±18.12ms ? ?/sec 1.03 293.1±16.09ms ? ?/sec indexing/Indexing geo_point 1.03 60.8±0.45s ? ?/sec 1.00 58.8±0.68s ? ?/sec indexing/Indexing movies in three batches 1.14 16.5±0.30s ? ?/sec 1.00 14.5±0.24s ? ?/sec indexing/Indexing movies with default settings 1.11 13.7±0.07s ? ?/sec 1.00 12.3±0.28s ? ?/sec indexing/Indexing nested movies with default settings 1.10 10.6±0.11s ? ?/sec 1.00 9.6±0.15s ? ?/sec indexing/Indexing nested movies without any facets 1.11 9.4±0.15s ? ?/sec 1.00 8.5±0.10s ? ?/sec indexing/Indexing songs in three batches with default settings 1.18 66.2±0.39s ? ?/sec 1.00 56.0±0.67s ? ?/sec indexing/Indexing songs with default settings 1.07 58.7±1.26s ? ?/sec 1.00 54.7±1.71s ? 
?/sec indexing/Indexing songs without any facets 1.08 53.1±0.88s ? ?/sec 1.00 49.3±1.43s ? ?/sec indexing/Indexing songs without faceted numbers 1.08 57.7±1.33s ? ?/sec 1.00 53.3±0.98s ? ?/sec indexing/Indexing wiki 1.06 1051.1±21.46s ? ?/sec 1.00 989.6±24.55s ? ?/sec indexing/Indexing wiki in three batches 1.20 1184.8±8.93s ? ?/sec 1.00 989.7±7.06s ? ?/sec indexing/Reindexing geo_point 1.04 67.5±0.75s ? ?/sec 1.00 64.9±0.32s ? ?/sec indexing/Reindexing movies with default settings 1.12 13.9±0.17s ? ?/sec 1.00 12.4±0.13s ? ?/sec indexing/Reindexing songs with default settings 1.05 60.6±0.84s ? ?/sec 1.00 57.5±0.99s ? ?/sec indexing/Reindexing wiki 1.07 1725.0±17.92s ? ?/sec 1.00 1611.4±9.90s ? ?/sec ``` Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>	2022-08-17 14:06:12 +00:00
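The bug described in #587 above can be reproduced in isolation. The sketch below is an illustration only: the function names are hypothetical, and `prefixes_of_fixed` is one possible minimal correction shown for contrast. According to the PR description, the actual change replaced the array of prefixes with a prefix trie rather than patching this check.

```rust
/// Groups an already sorted list of prefixes by their first byte,
/// e.g. ["ant", "axo", "bor"] -> [["ant", "axo"], ["bor"]].
fn group_by_first_letter<'a>(prefixes: &[&'a str]) -> Vec<Vec<&'a str>> {
    let mut groups: Vec<Vec<&'a str>> = Vec::new();
    for &prefix in prefixes {
        let same_group = groups
            .last()
            .map_or(false, |g| g[0].as_bytes().first() == prefix.as_bytes().first());
        if same_group {
            groups.last_mut().unwrap().push(prefix);
        } else {
            groups.push(vec![prefix]);
        }
    }
    groups
}

/// Buggy lookup: selects the group by testing `word.starts_with(group[0])`,
/// which fails when the first prefix of the right group is not itself a
/// prefix of the word.
fn prefixes_of_buggy<'a>(groups: &[Vec<&'a str>], word: &str) -> Vec<&'a str> {
    groups
        .iter()
        .find(|group| word.starts_with(group[0]))
        .map(|group| group.iter().copied().filter(|p| word.starts_with(*p)).collect())
        .unwrap_or_default()
}

/// Hypothetical minimal fix: select the group by first byte only, then
/// test every prefix in that group.
fn prefixes_of_fixed<'a>(groups: &[Vec<&'a str>], word: &str) -> Vec<&'a str> {
    groups
        .iter()
        .find(|group| group[0].as_bytes().first() == word.as_bytes().first())
        .map(|group| group.iter().copied().filter(|p| word.starts_with(*p)).collect())
        .unwrap_or_default()
}
```

With the prefixes `["ant", "axo", "bor"]` and the word `"axolotl"`, the buggy lookup finds no prefixes because `"axolotl".starts_with("ant")` is false, while the corrected lookup finds `"axo"`.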


8312 Commits