MeiliSearch

mirror of https://github.com/meilisearch/MeiliSearch synced 2025-06-12 19:11:36 +02:00

Author	SHA1	Message	Date
ManyTheFish	2d8d0af1a6	Rename short name bc by ic for initial_candidates	2022-12-13 10:56:38 +01:00
ManyTheFish	80d34a4169	Fix typo initial candiddates computation	2022-12-12 19:02:48 +01:00
bors[bot]	1f1beae077	Merge #729 729: Fix distincted exhaustive hits r=Kerollmops a=ManyTheFish This PR changes the name and behavior of `bucket_candidates`: - `bucket_candidates` become `initial_candidates` that is less confusing - `initial_candidates` is no more a simple `RoaringBitmap` but an enum allowing us to precise if the candidates are exhaustive or not - this enum ensures that any modification is allowed only if the candidates are not already exhaustive. The bug occurred because `initial_candidates` are modified during the bucket sort allowing the estimation to be more and more precise along the search, and this was an issue when the `initial_candidates` were already exhaustive, now, if candidates are exhaustive, then no modifications are made. Co-authored-by: ManyTheFish <many@meilisearch.com>	2022-12-08 09:26:34 +00:00
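The enum described in #729 can be sketched as follows. This is a minimal sketch, not milli's code. The variant names, the `extend_if_estimated` method, and the use of `BTreeSet<u32>` in place of `RoaringBitmap` are assumptions made to keep it dependency-free. It shows the invariant the fix introduces: an estimation may keep growing during the bucket sort, but an exhaustive set is never modified.

```rust
use std::collections::BTreeSet;

// Stand-in for milli's `RoaringBitmap` of document ids (assumption).
type DocIds = BTreeSet<u32>;

/// Hypothetical model of `initial_candidates`: the variant records whether
/// the set is an exhaustive count or only an estimation.
#[derive(Debug, Clone, PartialEq)]
enum InitialCandidates {
    Exhaustive(DocIds),
    Estimated(DocIds),
}

impl InitialCandidates {
    /// Widens an estimation with candidates found later in the bucket sort.
    /// An exhaustive set is left untouched.
    fn extend_if_estimated(&mut self, found: &DocIds) {
        if let InitialCandidates::Estimated(candidates) = self {
            candidates.extend(found.iter().copied());
        }
    }

    fn is_exhaustive(&self) -> bool {
        matches!(self, InitialCandidates::Exhaustive(_))
    }

    fn len(&self) -> usize {
        match self {
            InitialCandidates::Exhaustive(c) | InitialCandidates::Estimated(c) => c.len(),
        }
    }
}
```

Encoding exhaustiveness in the type, rather than in a separate boolean, means every code path that refines the candidates has to match on the variant, which makes it harder to modify an exhaustive count by accident.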
ManyTheFish	55724f2412	Introduce an initial candidates set that makes the difference between an exhaustive count and an estimation	2022-12-08 09:41:34 +01:00
ManyTheFish	6d50ea0830	add tests	2022-12-08 08:56:57 +01:00
bors[bot]	098c410612	Merge #727 727: Fix bug in filter search r=Kerollmops a=loiclec # Pull Request ## Related issue Fixes (partially, until merged into meilisearch) https://github.com/meilisearch/meilisearch/issues/3178 ## What does this PR do? The most important change is this one: ```rust // in milli/src/search/facet/facet_range_search.rs, line 239 let should_stop = { match self.right { Bound::Included(right) => right < previous_key.left_bound, Bound::Excluded(right) => right <= previous_key.left_bound, Bound::Unbounded => false, } }; ``` where the operations `<` and `<=` between the two branches were switched. This caused (very few) documents to be missing from filter results. The second change is a simplification of the algorithm for filters such as `field = value`, where we now perform a direct query into the "Level 0" of the facet db to retrieve the docids instead of invoking the full facet search algorithm. This change is done in `milli/src/search/facet/filter.rs`. I have added yet more insta-snapshot tests, rechecked the content of the snapshots, and added some integration tests as well. This is purely a fix in the search algorithms. Based on this PR alone, a dump will not be necessary to switch from v0.30.1 (where this bug is present) to v0.30.2 (where this PR is merged). Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>	2022-12-07 14:34:59 +00:00
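The corrected `should_stop` match from #727 can be isolated into a standalone function to check the boundary cases. This is a sketch: the function name, the generic key type `T`, and passing `previous_key.left_bound` as a plain `left_bound` argument are simplifications assumed here, not milli's actual signature.

```rust
use std::ops::Bound;

/// Decides whether a facet range search can stop once it reaches a group
/// whose smallest key is `left_bound`, given the query's upper bound `right`.
fn should_stop<T: PartialOrd>(right: &Bound<T>, left_bound: &T) -> bool {
    match right {
        // `..=right`: a key equal to `right` still matches, so stop only past it.
        Bound::Included(right) => right < left_bound,
        // `..right`: a key equal to `right` is already out of range.
        Bound::Excluded(right) => right <= left_bound,
        Bound::Unbounded => false,
    }
}
```

With the two operators swapped, an `Included(5)` bound would stop as soon as a group starting at 5 is reached. Documents whose value is exactly 5 would then be dropped, which is consistent with only a few documents going missing.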
bors[bot]	ee10cb8c87	Merge #726 726: Update the contributing.md r=curquiza a=irevoire Co-authored-by: Tamo <tamo@meilisearch.com>	2022-12-07 13:59:04 +00:00
Loïc Lecrenier	d38cc73630	Add one more filter "integration" test	2022-12-07 14:38:25 +01:00
Loïc Lecrenier	e688581c36	Add tests for facet range search on different field ids	2022-12-07 14:38:21 +01:00
Loïc Lecrenier	4ac8f96342	Simplify implementation of equality condition in filters	2022-12-07 14:38:18 +01:00
Loïc Lecrenier	1c9555566e	Fix bug in facet range search	2022-12-07 14:38:14 +01:00
Loïc Lecrenier	303d740245	Prepare fix within facet range search By creating snapshots and updating the format of the existing snapshots. The next commit will apply the fix, which will show its effects cleanly on the old and new snapshot tests	2022-12-07 14:38:10 +01:00
Tamo	250743885d	add a sentence about installing rust-nightly	2022-12-07 12:31:43 +01:00
Tamo	5eecb8489d	Update CONTRIBUTING.md Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>	2022-12-07 12:23:12 +01:00
Tamo	0e5c3b1f64	Update CONTRIBUTING.md Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>	2022-12-07 12:23:06 +01:00
Tamo	f53bdc4320	update the contributing.md	2022-12-06 17:41:05 +01:00
bors[bot]	0a301b5f88	Merge #723 723: Fix bug in handling of soft deleted documents when updating settings r=Kerollmops a=loiclec # Pull Request ## Related issue Fixes (partially, until merged into meilisearch) https://github.com/meilisearch/meilisearch/issues/3021 ## What does this PR do? This PR fixes the bug where a `missing key in documents database` internal error message could appear when indexing documents. When updating the settings, before clearing the database and before creating the transform output, we now modify the `ExternalDocumentsIds` structure to get rid of all references to soft deleted document ids in its FSTs. It used to be that updating the settings would clear the soft-deleted document ids, but keep the original `ExternalDocumentsIds` structure. As a consequence of this, when processing a future document addition, we could wrongly believe that a document was being replaced when, in fact, it was a completely new document. See the tests `bug_3021_first`, `bug_3021_second`, and `bug_3021` for a minimal test case that would have reproduced the issue. We need to take special care to: - evaluate how users should update to v0.30.1 (containing this fix): dump? reimporting all documents from scratch? - understand IF/HOW this bug could have caused duplicate documents to be returned - and evaluate the correctness of the fix, of course :) Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>	2022-12-06 14:37:38 +00:00
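The clean-up described in #723 can be modeled with a map from external to internal ids. This is a minimal sketch under stated assumptions: milli stores these mappings in FSTs inside `ExternalDocumentsIds`, and the `HashMap`, `forget_soft_deleted`, and `is_replacement` names below are hypothetical stand-ins. Once stale entries are removed, a new document that reuses an old external id is no longer mistaken for a replacement.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-in for the FST-backed external -> internal id mapping.
type ExternalIds = HashMap<String, u32>;

/// Drops every external id that still points at a soft-deleted internal id,
/// before the settings update clears the soft-deleted set.
fn forget_soft_deleted(external_ids: &mut ExternalIds, soft_deleted: &HashSet<u32>) {
    external_ids.retain(|_, &mut internal| !soft_deleted.contains(&internal));
}

/// A document addition is treated as a replacement when its external id is known.
fn is_replacement(external_ids: &ExternalIds, external_id: &str) -> bool {
    external_ids.contains_key(external_id)
}
```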
Loïc Lecrenier	a993b68684	Cargo fmt >:-(	2022-12-06 15:22:10 +01:00
Loïc Lecrenier	80c7a00567	Fix compilation error in tests of settings update	2022-12-06 15:19:26 +01:00
Loïc Lecrenier	67d8cec209	Fix bug in handling of soft deleted documents when updating settings	2022-12-06 15:09:19 +01:00
bors[bot]	2a846aaae7	Merge #719 719: Add more members of `filter_parser` to `milli::` & `From<&str>` implementation for `Token` r=Kerollmops a=GregoryConrad ## What does this PR do? The current `milli::Filter` and `milli::FilterCondition` APIs require working with some members of `filter_parser` directly that `milli::` does not re-export to its users (at least when not parsing input using `parse`). Also, using `filter_parser` does not make sense when using milli from an embedded context where there is no query to parse. Instead of reworking `milli::Filter` and `milli::FilterCondition`, this PR adds two non-breaking changes that ease the use of milli: - Re-exports more members of the dependent version of `filter_parser` in `milli` - Implements `From<&str>` for `filter_parser::Token` - This will also allow some basic tests that need to create a `Token` from a string to avoid some boilerplate. In conjunction, both of these will allow milli users to easily create a `Token` from a `&str` without needing to add `filter_parser` as an extra dependency. Note: I wanted to use `FromStr` for the `From` implementation; however, it requires returning a `Result` which is not needed for the conversion. Thus, I just left it as `From<&str>`. Co-authored-by: Gregory Conrad <gregorysconrad@gmail.com>	2022-12-06 10:36:00 +00:00
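The `From<&str>` choice in #719 can be illustrated on a simplified token type. The `Token` below is hypothetical: milli's real `filter_parser::Token` wraps a located `Span`, which this sketch omits. It shows why `From` fits better than `FromStr`. `From<&'a str>` cannot fail and may borrow from its input, whereas `FromStr::from_str` must return a `Result` and its output cannot borrow from the input string.

```rust
/// Hypothetical, simplified token that borrows the text it was built from.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Token<'a> {
    value: &'a str,
}

// Infallible conversion that keeps the borrow of the input string.
impl<'a> From<&'a str> for Token<'a> {
    fn from(value: &'a str) -> Self {
        Token { value }
    }
}
```

Usage: `let t: Token = "genres".into();` builds a token without pulling in `filter_parser` as a separate dependency.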
bors[bot]	d6eacb2aac	Merge #722 722: Geosearch for zero radius r=irevoire a=amab8901 # Pull Request ## Related issue Fixes #3167 (https://github.com/meilisearch/meilisearch/issues/3167) ## What does this PR do? - allows Geosearch with zero radius to return the specified location when the coordinates match perfectly (instead of returning nothing). See link for more details. - new attempt on https://github.com/meilisearch/milli/pull/713 ## PR checklist Please check if your PR fulfills the following requirements: - [ X ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)? - [ X ] Have you read the contributing guidelines? - [ X ] Have you made sure that the title is accurate and descriptive of the changes? Thank you so much for contributing to Meilisearch! Co-authored-by: amab8901 <amab8901@protonmail.com> Co-authored-by: Tamo <irevoire@protonmail.ch>	2022-12-05 19:57:08 +00:00
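The log does not show the changed comparison for #722. The sketch below assumes the fix amounts to an inclusive radius check (`distance <= radius`), so a point at exactly the center matches a zero radius. The haversine formula, the mean Earth radius of 6,371 km, and the function names are assumptions for illustration, not milli's geo code.

```rust
/// Great-circle distance in meters between two (lat, lng) points in degrees,
/// using the haversine formula and an assumed mean Earth radius.
fn haversine_m(a: (f64, f64), b: (f64, f64)) -> f64 {
    const EARTH_RADIUS_M: f64 = 6_371_000.0;
    let (lat1, lng1) = (a.0.to_radians(), a.1.to_radians());
    let (lat2, lng2) = (b.0.to_radians(), b.1.to_radians());
    let dlat = lat2 - lat1;
    let dlng = lng2 - lng1;
    let h = (dlat / 2.0).sin().powi(2) + lat1.cos() * lat2.cos() * (dlng / 2.0).sin().powi(2);
    2.0 * EARTH_RADIUS_M * h.sqrt().asin()
}

/// Inclusive check: with a zero radius, only the exact center point matches.
fn within_radius(center: (f64, f64), point: (f64, f64), radius_m: f64) -> bool {
    haversine_m(center, point) <= radius_m
}
```

For identical coordinates every term is zero, so the distance is exactly `0.0` and the inclusive comparison keeps the point; a strict `<` would return nothing for a zero radius.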
Tamo	212dbfa3b5	Update milli/src/search/facet/filter.rs	2022-12-05 20:56:21 +01:00
amab8901	456da5de9c	Geosearch for zero radius	2022-12-05 20:11:46 +01:00
bors[bot]	46e26ab550	Merge #720 720: Make soft deletion optional in document addition and deletion + add lots of tests r=irevoire a=loiclec # Pull Request ## What does this PR do? When debugging recent issues, I created a few unit tests in the hopes reproducing the bugs I was looking for. In the end, I didn't find any, but I thought it would still be good to keep those tests. More importantly, I added a field to the `DeleteDocuments` and `IndexDocuments` builders, called `disable_soft_deletion`. If set to `true`, the indexing/deletion will never add documents to the `soft_deleted_documents_ids` and instead perform a real deletion of the documents from the databases. For the new tests, I have: - Improved the insta-snapshot format of the `external_documents_ids` structure - Added more tests for the facet DB indexing, deletion, and search algorithms, making sure to test them when the facet DB contains strings (instead of numbers) as well. - Added more tests for the incremental indexing of the prefix proximity databases. For example, to see if documents are replaced correctly and if common prefixes are deleted correctly. - Added tests that mix soft deletion and hard deletion, including when processing batches of document updates. Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>	2022-12-05 18:26:01 +00:00
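The effect of the `disable_soft_deletion` flag from #720 can be sketched on a toy index. The `Index` struct and its `delete` method are hypothetical; milli's real `DeleteDocuments` and `IndexDocuments` builders are not reproduced here. By default a deletion only marks ids as soft-deleted; with the flag set, the documents are removed outright.

```rust
use std::collections::BTreeSet;

/// Hypothetical in-memory model of an index's document ids.
#[derive(Debug, Default)]
struct Index {
    documents: BTreeSet<u32>,
    soft_deleted: BTreeSet<u32>,
}

impl Index {
    /// Soft-deletes `ids` by default; hard-deletes them when
    /// `disable_soft_deletion` is true.
    fn delete(&mut self, ids: &[u32], disable_soft_deletion: bool) {
        for &id in ids {
            if disable_soft_deletion {
                self.documents.remove(&id);
                self.soft_deleted.remove(&id);
            } else if self.documents.contains(&id) {
                self.soft_deleted.insert(id);
            }
        }
    }

    /// Number of documents still visible to search.
    fn visible(&self) -> usize {
        self.documents.difference(&self.soft_deleted).count()
    }
}
```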
Loïc Lecrenier	cda4ba2bb6	Add document import tests	2022-12-05 12:02:49 +01:00
Loïc Lecrenier	ae59d37b75	Improve insta-snap of the external document ids	2022-12-05 10:51:02 +01:00
Loïc Lecrenier	f2cf981641	Add more tests and allow disabling of soft-deletion outside of tests Also allow disabling soft-deletion in the IndexDocumentsConfig	2022-12-05 10:51:01 +01:00
Gregory Conrad	50954d31fa	feat: Re-export Span and Token to milli::	2022-12-03 13:37:33 -05:00
Gregory Conrad	1b5b5778c1	feat: Add From<&str> implementation for Token	2022-12-03 13:13:41 -05:00
bors[bot]	d3731dda48	Merge #706 706: Limit the reindexing caused by updating settings when not needed r=curquiza a=GregoryConrad ## What does this PR do? When updating index settings using `update::Settings`, sometimes a `reindex` of `update::Settings` is triggered when it doesn't need to be. This PR aims to prevent those unnecessary `reindex` calls. For reference, here is a snippet from the current `execute` method in `update::Settings`: ```rust // ... if stop_words_updated || faceted_updated || synonyms_updated || searchable_updated || exact_attributes_updated { self.reindex(&progress_callback, &should_abort, old_fields_ids_map)?; } ``` - [x] `faceted_updated` - looks good as-is ✅ - [x] `stop_words_updated` - looks good as-is ✅ - [x] `synonyms_updated` - looks good as-is ✅ - [x] `searchable_updated` - fixed in this PR - [x] `exact_attributes_updated` - fixed in this PR ## PR checklist Please check if your PR fulfills the following requirements: - [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)? - [x] Have you read the contributing guidelines? - [x] Have you made sure that the title is accurate and descriptive of the changes? Thank you so much for contributing to Meilisearch! Co-authored-by: Gregory Conrad <gregorysconrad@gmail.com>	2022-12-01 13:58:02 +00:00
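The reindex condition quoted in #706 can be modeled as a set of flags. The sketch assumes, without the log confirming it, that the fix reports `searchable_updated` and `exact_attributes_updated` only when the new value actually differs from the stored one. `SettingsDiff` and `changed` are hypothetical names.

```rust
/// Hypothetical flags mirroring the condition in `update::Settings::execute`.
#[derive(Debug, Default)]
struct SettingsDiff {
    stop_words_updated: bool,
    faceted_updated: bool,
    synonyms_updated: bool,
    searchable_updated: bool,
    exact_attributes_updated: bool,
}

impl SettingsDiff {
    /// Same disjunction as the quoted `if`: any real change forces a reindex.
    fn needs_reindex(&self) -> bool {
        self.stop_words_updated
            || self.faceted_updated
            || self.synonyms_updated
            || self.searchable_updated
            || self.exact_attributes_updated
    }
}

/// Reports an update only when the new value differs from the stored one,
/// so re-sending identical settings does not trigger a reindex.
fn changed<T: PartialEq>(old: &T, new: &T) -> bool {
    old != new
}
```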
bors[bot]	51a2613c5c	Merge #715 715: Fix benchmark CI r=irevoire a=curquiza Fixes #714 Tested with our actions: https://github.com/meilisearch/milli/actions/runs/3591527753/jobs/6046157141 Co-authored-by: curquiza <clementine@meilisearch.com>	2022-12-01 10:39:38 +00:00
bors[bot]	82e1c4f468	Merge #716 716: Bump Swatinem/rust-cache from 2.0.1 to 2.2.0 r=curquiza a=dependabot[bot] Bumps [Swatinem/rust-cache](https://github.com/Swatinem/rust-cache) from 2.0.1 to 2.2.0. Release notes (sourced from Swatinem/rust-cache's releases, https://github.com/Swatinem/rust-cache/releases): v2.2.0 - Add new `save-if` option to always restore, but only conditionally save the cache. v2.1.0 - Only hash `Cargo.{lock,toml}` files in the configured workspace directories. v2.0.2 - Avoid calling cargo metadata on pre-cleanup. - Added `prefix-key`, `cache-directories` and `cache-targets` options. Changelog (sourced from Swatinem/rust-cache's changelog, https://github.com/Swatinem/rust-cache/blob/master/CHANGELOG.md): 2.2.0, 2.1.0 and 2.0.2 as in the release notes. 2.0.1 - Primarily just updating dependencies to fix GitHub deprecation notices. 2.0.0 - The action code was refactored to allow for caching multiple workspaces and different `target` directory layouts. - The `working-directory` and `target-dir` input options were replaced by a single `workspaces` option that has the form of `$workspace -> $target`. - Support for considering `env-vars` as part of the cache key. - The `sharedKey` input option was renamed to `shared-key` for consistency. 1.4.0 - Clean both `debug` and `release` target directories. 1.3.0 - Use Rust toolchain file as additional cache key. - Allow for a configurable target-dir. 1.2.0 - Cache `~/.cargo/bin`. - Support for custom `$CARGO_HOME`. - Add a `cache-hit` output. - Add a new `sharedKey` option that overrides the automatic job-name based key. 1.1.0 - Add a new `working-directory` input. - Support caching git dependencies. - Lots of other improvements. 1.0.2 ... (truncated) Commits: - 359a70e 2.2.0 - ecee04e feat: add save-if option, closes #66 (#91) - b894d59 2.1.0 - e78327d small code style improvements, README and CHANGELOG updates - ccdddcc only hash Cargo.toml/Cargo.lock that belong to a configured workspace (#90) - b5ec9ed 2.0.2 - 3f2513f avoid calling cargo metadata on pre-cleanup - 19c4658 update dependencies - b8e72aa Added `prefix-key` `cache-directories` and `cache-targets` options (#85) - See full diff in compare view: https://github.com/Swatinem/rust-cache/compare/v2.0.1...v2.2.0 Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-12-01 10:08:58 +00:00
curquiza	5bdf5c0aaf	Update the steps to set variables	2022-12-01 11:07:54 +01:00
dependabot[bot]	282b2e3b98	Bump Swatinem/rust-cache from 2.0.1 to 2.2.0 Bumps [Swatinem/rust-cache](https://github.com/Swatinem/rust-cache) from 2.0.1 to 2.2.0. - [Release notes](https://github.com/Swatinem/rust-cache/releases) - [Changelog](https://github.com/Swatinem/rust-cache/blob/master/CHANGELOG.md) - [Commits](https://github.com/Swatinem/rust-cache/compare/v2.0.1...v2.2.0) --- updated-dependencies: - dependency-name: Swatinem/rust-cache dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	2022-12-01 10:02:54 +00:00
bors[bot]	5e754b3ee0	Merge #708 708: Reduce memory usage of the MatchingWords structure r=ManyTheFish a=loiclec # Pull Request ## Related issue Fixes (partially) https://github.com/meilisearch/meilisearch/issues/3115 ## What does this PR do? 1. Reduces the memory usage caused by the creation of a 10-word query tree by 20x. This is done by deduplicating the `MatchingWord` values, which are heavy because of their inner DFA. The deduplication works by wrapping each `MatchingWord` in a reference-counted box and using a hash map to determine whether a `MatchingWord` DFA already exists for a certain signature, or whether a new one needs to be built. 2. Avoid the worst-case scenario of creating a `MatchingWord` for extremely long words that cannot be indexed by milli. Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>	2022-11-30 17:47:34 +00:00
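The deduplication in #708 (a reference-counted box per `MatchingWord` plus a hash map keyed by a signature) can be sketched as a small cache. The `MatchingWord` fields, the `(word, typo, prefix)` signature, and the `get_or_build` name are assumptions; the real value owns a DFA, which is what makes duplicates expensive. Identical signatures share one allocation instead of building a second DFA.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical stand-in for milli's `MatchingWord` (the real one owns a DFA).
#[derive(Debug)]
struct MatchingWord {
    word: String,
    typo: u8,
    prefix: bool,
}

/// Hypothetical signature: the inputs a DFA would be built from.
type Signature = (String, u8, bool);

#[derive(Default)]
struct MatchingWordCache {
    built: HashMap<Signature, Rc<MatchingWord>>,
}

impl MatchingWordCache {
    /// Builds a `MatchingWord` only the first time its signature is seen;
    /// later calls hand out another `Rc` pointing at the same value.
    fn get_or_build(&mut self, word: &str, typo: u8, prefix: bool) -> Rc<MatchingWord> {
        let signature = (word.to_string(), typo, prefix);
        let entry = self.built.entry(signature).or_insert_with(|| {
            Rc::new(MatchingWord { word: word.to_string(), typo, prefix })
        });
        Rc::clone(entry)
    }
}
```

The map holds one strong reference and every caller holds another, so the value lives as long as any query-tree node still uses it.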
bors[bot]	e1612fcb01	Merge #712 712: Fix bulk facet indexing bug r=Kerollmops a=loiclec # Pull Request ## Related issue Fixes (partially, until merged into meilisearch) https://github.com/meilisearch/meilisearch/issues/3165 ## What does this PR do? Fixes a bug where indexing certain numbers of filterable attribute values in bulk led to corrupted facet databases. This was due to a lossy integer conversion which would ultimately prevent entire levels of the facet database to be written into LMDB. More specifically, this change was made: ```diff - if cur_writer_len as u8 >= self.min_level_size { + if cur_writer_len >= self.min_level_size as usize { ``` I also checked other comparisons to `min_level_size` and other conversions such as `x as u8` in this part of the codebase. Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>	2022-11-30 16:51:48 +00:00
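The lossy conversion fixed in #712 can be reproduced in isolation. The function names are hypothetical; the two comparisons mirror the removed and added lines of the diff above, assuming `cur_writer_len` is a `usize` and `min_level_size` a `u8`, as the casts imply. Casting a `usize` to `u8` with `as` keeps only the low 8 bits, so a level of 256 entries looks empty.

```rust
/// Pre-fix comparison: `as u8` silently wraps lengths of 256 and above.
fn level_is_full_buggy(cur_writer_len: usize, min_level_size: u8) -> bool {
    cur_writer_len as u8 >= min_level_size
}

/// Post-fix comparison: widen the small operand instead of narrowing the
/// large one. `usize::from` is lossless, so it matches `as usize` here.
fn level_is_full_fixed(cur_writer_len: usize, min_level_size: u8) -> bool {
    cur_writer_len >= usize::from(min_level_size)
}
```

Preferring `usize::from` over `as` makes the widening explicit: `From` is only implemented for conversions that cannot lose information, so a narrowing cast cannot slip in unnoticed.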
Loïc Lecrenier	9dd4b33a9a	Fix bulk facet indexing bug	2022-11-30 14:27:36 +01:00
bors[bot]	de22116b3d	Merge #711 711: Replace deprecated gh actions r=curquiza a=pnhatminh # Pull Request ## Related issue Fixes #678 ## What does this PR do? - Replace deprecated github action command with newly defined command. ## PR checklist Please check if your PR fulfills the following requirements: - [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)? - [x] Have you read the contributing guidelines? - [x] Have you made sure that the title is accurate and descriptive of the changes? Thank you so much for contributing to Meilisearch! Co-authored-by: Minh Pham <minh.pham@codelink.io>	2022-11-29 09:56:22 +00:00
Minh Pham	5f78522044	Updagte	2022-11-29 10:11:38 +07:00
Gregory Conrad	87e2bc3bed	fix(reindex): reindex in a few more cases Cases: whenever searchable_fields OR user_defined_searchable_fields is modified	2022-11-28 13:12:19 -05:00
Loïc Lecrenier	61b58b115a	Don't create partial matching words for synonyms in ngrams	2022-11-28 16:32:28 +01:00
Gregory Conrad	d3182f3830	refactor: Change return type to keep consistency with others	2022-11-28 10:02:03 -05:00
bors[bot]	f698e6cfdf	Merge #707 707: Add all_obkv_to_json function r=Kerollmops a=GregoryConrad ## What does this PR do? When embedding milli in an application (other than Meilisearch), it often makes sense to not use the `displayed_attributes` functionality and instead just use milli as a full document store. Thus, this PR adds a function, `all_obkv_to_json`, to supplement the already exposed `milli::obkv_to_json` so that those embedding milli do not need to deal with `displayed_attributes` if they don't need to. ~This PR also introduces a slight breaking change: `obkv_to_json` now accepts a reference to `obkv::KvReaderU16` instead of taking ownership of it. As far as I can tell, this seems like a change for the better (`obkv_to_json` only acts upon `obkv` rather than consuming it), but I can change it back if you so desire.~ (reverted in 935a724c57) ## PR checklist Please check if your PR fulfills the following requirements: - [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)? - [x] Have you read the contributing guidelines? - [x] Have you made sure that the title is accurate and descriptive of the changes? Thank you so much for contributing to Meilisearch! Co-authored-by: Gregory Conrad <gregorysconrad@gmail.com>	2022-11-28 14:52:45 +00:00
Loïc Lecrenier	f70856bab1	Remove memory usage test that fails when many tests are run in parallel	2022-11-28 12:55:28 +01:00
Loïc Lecrenier	80588daae5	Fix compilation error in formatting benches	2022-11-28 10:27:15 +01:00
Loïc Lecrenier	e2ebed62b1	Don't create partial matching words for synonyms, split words, phrases	2022-11-28 10:20:13 +01:00
Loïc Lecrenier	8284bd760f	Relax memory ordering of operations within the test CountingAlloc	2022-11-28 10:20:13 +01:00
Loïc Lecrenier	8d0ace2d64	Avoid creating a MatchingWord for words that exceed the length limit	2022-11-28 10:20:13 +01:00
Loïc Lecrenier	86c34a996b	Deduplicate matching words	2022-11-28 10:20:13 +01:00


2326 Commits