MeiliSearch

mirror of https://github.com/meilisearch/MeiliSearch synced 2025-06-25 16:08:29 +02:00

Author	SHA1	Message	Date
Tamo	8af8aa5a33	add a test	2023-05-03 17:41:49 +02:00
Tamo	6df2ba93a9	remove one useless txn	2023-05-03 17:41:49 +02:00
Louis Dureuil	3680a6bf1e	extract impl to a function	2023-05-03 17:41:49 +02:00
Louis Dureuil	732c52093d	Processing time without autobatching implementation	2023-05-03 17:41:48 +02:00
Louis Dureuil	05cc463fbc	Draft implementation of filter support for /delete-by-batch route	2023-05-03 17:41:48 +02:00
meili-bors[bot]	1afde4fea5	Merge #3542 3542: Refactor of the search algorithms r=dureuill a=loiclec This PR refactors a large part of the search logic (related to https://github.com/meilisearch/meilisearch/issues/3547) - The "query tree" is replaced by a "query graph", which describes the different ways in which the search query can be interpreted and precomputes the word derivations for each query term. Example: <img width="1162" alt="Screenshot 2023-02-27 at 10 26 50" src="https://user-images.githubusercontent.com/6040237/221525270-87917cc0-60d1-473f-847f-2c5a7de9e370.png"> - The control flow between the ~criterions~ ranking rules is managed in a single place instead of being independently implemented by each ranking rule. - The set of document candidates is determined greedily from the beginning. It is often referred as the "universe" in the code. - The ranking rules `proximity`, `attribute`, `typo`, and (maybe) `exactness` are or will be implemented using a K-shortest path graph algorithm. This minimises the number of database and bitmap operations we need to do to compute each ranking rule bucket. It also simplifies the code a lot since a lot of ranking rules will share a large part of their implementation. - Pointers to database values are stored in a cache to avoid searching in the LMDB databases needlessly. - The result of some roaring bitmap operations are also stored in a cache, although we'll need to measure the memory pressure this puts on the system and maybe deactivate this cache later on. - Search requests can be visually logged and debugged in tests. 
TODO: - [ ] Reintroduce search benchmarks - [x] Implement `disableOnWords` and `disableOnAttributes` settings of typo tolerance - [x] Implement "exhaustive number of hits - [x] Implement `attribute` ranking rule - [x] Indexing changes: split into `word_fid_docids` and `word_position_docids` (with bucketed position) - [x] Ranking rule implementations - [ ] Implement `exactness` ranking rule - [x] Initial implementation - [ ] Correct implementation when followed by `Words` - [ ] Implement `geosort` ranking rule - [ ] Add tests - [x] Typo tolerance `disableOnWords`/`disableOnAttributes` - [ ] Geosort - [x] Exactness - [ ] Attribute/Position - [ ] Interactions between ranking rules: - [x] Typo/Proximity/Attribute not preceded by Words - [x] Exactness not preceded by Words - [x] Exactness -> Words (+ check universe correctness) - [x] Exactness -> Typo, etc. - [ ] Sort -> Words (performance tests) - [ ] Attribute/Position -> Typo - [ ] Attribute/Position -> Proximity - [x] Typo -> Exactness - [x] Typo -> Proximity - [x] Proximity -> Typo - [x] Words - [x] Typo - [x] Proximity - [x] Sort - [x] Ngrams - [x] Split words - [x] Ngram + Split Words - [x] Term matching strategy - [x] Distinct attribute - [x] Phrase Search - [x] Placeholder search - [x] Highlighter - [x] Limit the number of word derivations in a search query - [x] Compute the initial universe correctly according to the terms matching strategy - [x] Implement placeholder search - [x] Get the list of ranking rules from the settings - [x] Implement `distinct` - [x] Determine what to do when one of `attribute`, `proximity`, `typo`, or `exactness` is placed before `words` - [x] Make sure the correct number of allowed typos is used for each word, including the prefix one - [x] Make sure stop words are treated correctly (e.g. correct position in query graph), including in phrases - [x] Support phrases correctly - [x] Support synonyms - [x] Support split words - [x] Support combination of ngram + split-words (e.g. 
`whiteh orse` -> `"white horse"`) - [x] Implement `typo` ranking rule - [x] Implement `sort` ranking rule - [x] Use existing `Search` interface to use the new search algorithms - [x] Remove old code Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>	2023-05-03 13:42:51 +00:00
Louis Dureuil	f8f190cd40	Update exactness tests following charabia camelCase tokenization	2023-05-03 14:45:09 +02:00
Louis Dureuil	3a408e8287	Increase map size for tests following charabia camelCase tokenization	2023-05-03 14:44:48 +02:00
Louis Dureuil	d3e5b10e23	fix nb of dbs	2023-05-03 14:11:20 +02:00
Louis Dureuil	1aaf24ccbf	Cargo fmt	2023-05-03 12:21:58 +02:00
Louis Dureuil	90bc230820	Merge remote-tracking branch 'origin/main' into search-refactor. Conflicts and resolutions: Cargo.lock: added mimalloc; Cargo.toml: took origin/main version; milli/src/search/criteria/exactness.rs: deleted after checking it was only clippy changes; milli/src/search/query_tree.rs: deleted after checking it was only clippy changes	2023-05-03 12:19:06 +02:00
Louis Dureuil	342c4ff85d	geosort: Remove rtree unwrap	2023-05-03 09:52:16 +02:00
Tamo	c85392ce40	make the descendent geosort fast	2023-05-03 09:13:12 +02:00
Tamo	8875d24a48	deserialize the rtree only when its needed, and keep it in memory once it has been deserialized	2023-05-03 09:13:12 +02:00
Tamo	c470b67fa2	revamp the test to use execute_iterative_and_rtree_returns_the_same	2023-05-03 09:13:12 +02:00
meili-bors[bot]	c0e081cd98	Merge #3702 #3710 3702: Update charabia v0.7.2 r=curquiza a=ManyTheFish fixes #3701 fixes #3689 fixes #3285 3710: Updated messages pointing to the docs website r=curquiza a=roy9495 # Pull Request Fixes partially #3668 ## What does this PR do? - ...Any messages referencing this docs site https://docs.meilisearch.com has been changed to this docs site https://meilisearch.com/docs . Thanks. ## PR checklist Please check if your PR fulfills the following requirements: - [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)? - [x] Have you read the contributing guidelines? - [x] Have you made sure that the title is accurate and descriptive of the changes? Thank you so much for contributing to Meilisearch! Co-authored-by: ManyTheFish <many@meilisearch.com> Co-authored-by: TATHAGATA ROY <98920199+roy9495@users.noreply.github.com>	2023-05-02 17:27:57 +00:00
Louis Dureuil	b60840ebff	Remove self.iterating from words	2023-05-02 18:54:23 +02:00
Louis Dureuil	fdc1763838	Use MultiOps for resolve_query_graph	2023-05-02 18:54:09 +02:00
Louis Dureuil	75819bc940	Remove too many arguments on resolve_maximally_reduced_query_graph	2023-05-02 18:53:40 +02:00
Louis Dureuil	7b8cc25625	rename located_query_terms_from_string -> located_query_terms_from_tokens	2023-05-02 18:53:01 +02:00
meili-bors[bot]	2be641f373	Merge #3718 3718: Fix broken README links r=curquiza a=Kerollmops This PR fixes #3708 by changing the link to the new SDKs and API Reference pages. I would like to thank `@Tommy-42,` who also found the issue. Co-authored-by: Clément Renault <clement@meilisearch.com>	2023-05-02 16:23:38 +00:00
Clément Renault	d89d2efb7e	Change a the text of a link	2023-05-02 13:53:36 +02:00
Clément Renault	f284a9c0dd	Fix the README.md broken links	2023-05-02 13:51:50 +02:00
bors[bot]	134e7fc433	Merge #3709 3709: Add SDKs test in a CI r=Kerollmops a=curquiza Add a CI running every week to run the `nightly` docker image of Meilisearch with the most "strategic" SDKs (most used, well tested, strongly typed SDK) - meilisearch-js - instant-meilisearch - meilisearch-php - meilisearch-python - meilisearch-go - meilisearch-ruby - meilisearch-rust Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>	2023-05-02 11:22:09 +00:00
Clémentine Urquizar	0cba919228	Add SDKs test in a CI	2023-05-02 11:53:28 +02:00
Loïc Lecrenier	aa63091752	Fix bug in exact_attribute	2023-05-02 10:48:32 +02:00
Loïc Lecrenier	58735d6d8f	Fix outdated relevancy test	2023-05-02 10:48:32 +02:00
Loïc Lecrenier	1b514517f5	Fix bug in computation of query term at a position	2023-05-02 10:48:32 +02:00
Loïc Lecrenier	11f814821d	Minor cleanup	2023-05-02 10:48:32 +02:00
Loïc Lecrenier	30fb1153cc	Speed up graph based ranking rule when a lot of different costs exist	2023-05-02 09:59:42 +02:00
Loïc Lecrenier	3b2c8b9f25	Improve performance of position rr	2023-05-02 09:59:42 +02:00
Loïc Lecrenier	2a7f9adf78	Build query graph more correctly from paths Update snapshots	2023-05-02 09:59:42 +02:00
Loïc Lecrenier	608ceea440	Fix bug in position rr	2023-05-02 09:59:42 +02:00
Loïc Lecrenier	79001b9c97	Improve performance of the cheapest path finder algorithm	2023-05-02 09:59:42 +02:00
Loïc Lecrenier	59b12fca87	Fix errors, clippy warnings, and add review comments	2023-04-29 11:48:11 +02:00
Loïc Lecrenier	48f5bb1693	Implements the geo-sort ranking rule	2023-04-29 11:02:16 +02:00
Loïc Lecrenier	93188b3c88	Fix indexing of word_prefix_fid_docids	2023-04-29 10:56:48 +02:00
Loïc Lecrenier	bc4efca611	Add more tests for the attribute ranking rule	2023-04-29 10:56:48 +02:00
TATHAGATA ROY	feaf25a95d	Updated messages pointing to the docs website	2023-04-28 20:52:03 +00:00
bors[bot]	414b3fae89	Merge #3571 3571: Introduce two filters to select documents with `null` and empty fields r=irevoire a=Kerollmops # Pull Request ## Related issue This PR implements the `X IS NULL`, `X IS NOT NULL`, `X IS EMPTY`, `X IS NOT EMPTY` filters that [this comment](https://github.com/meilisearch/product/discussions/539#discussioncomment-5115884) is describing in a very detailed manner. ## What does this PR do? ### `IS NULL` and `IS NOT NULL` This PR will be exposed as a prototype for now. Below is the copy/pasted version of a spec that defines this filter. - `IS NULL` matches fields that `EXISTS` AND `= IS NULL` - `IS NOT NULL` matches fields that `NOT EXISTS` OR `!= IS NULL` 1. `{"name": "A", "price": null}` 2. `{"name": "A", "price": 10}` 3. `{"name": "A"}` `price IS NULL` would match 1 `price IS NOT NULL` or `NOT price IS NULL` would match 2,3 `price EXISTS` would match 1, 2 `price NOT EXISTS` or `NOT price EXISTS` would match 3 common query : `(price EXISTS) AND (price IS NOT NULL)` would match 2 ### `IS EMPTY` and `IS NOT EMPTY` - `IS EMPTY` matches Array `[]`, Object `{}`, or String `""` fields that `EXISTS` and are empty - `IS NOT EMPTY` matches fields that `NOT EXISTS` OR are not empty. 1. `{"name": "A", "tags": null}` 2. `{"name": "A", "tags": [null]}` 3. `{"name": "A", "tags": []}` 4. `{"name": "A", "tags": ["hello","world"]}` 5. `{"name": "A", "tags": [""]}` 6. `{"name": "A"}` 7. `{"name": "A", "tags": {}}` 8. `{"name": "A", "tags": {"t1":"v1"}}` 9. `{"name": "A", "tags": {"t1":""}}` 10. `{"name": "A", "tags": ""}` `tags IS EMPTY` would match 3,7,10 `tags IS NOT EMPTY` or `NOT tags IS EMPTY` would match 1,2,4,5,6,8,9 `tags IS NULL` would match 1 `tags IS NOT NULL` or `NOT tags IS NULL` would match 2,3,4,5,6,7,8,9,10 `tags EXISTS` would match 1,2,3,4,5,7,8,9,10 `tags NOT EXISTS` or `NOT tags EXISTS` would match 6 common query : `(tags EXISTS) AND (tags IS NOT NULL) AND (tags IS NOT EMPTY)` would match 2,4,5,8,9 ## What should the reviewer do? 
- Check that I tested the filters - Check that I deleted the ids of the documents when deleting documents Co-authored-by: Clément Renault <clement@meilisearch.com> Co-authored-by: Kerollmops <clement@meilisearch.com>	2023-04-27 13:14:00 +00:00
Loïc Lecrenier	899baa0ea5	Update forgotten snapshot from previous commit	2023-04-27 13:43:04 +02:00
Loïc Lecrenier	374095d42c	Add tests for stop words and fix a couple of bugs	2023-04-27 13:30:09 +02:00
Loïc Lecrenier	dd007dceca	Merge pull request #3703 from meilisearch/search-refactor-test-typo-tolerance Search refactor test typo tolerance + some bugfixes	2023-04-27 11:01:35 +02:00
bors[bot]	3ae587205c	Merge #3464 3464: Remove CLI changes for clippy r=curquiza a=dureuill # Pull Request ## Related issue Reverts #3434, which was linked to https://github.com/rust-lang/rust-clippy/issues/10087, as putting the lint in the pedantic group [is being uplifted to Rust 1.67.1](https://github.com/rust-lang/rust/pull/107743#issue-1573438821) (my thanks to everyone involved in this work 🎉). ## Motivation - Using "standard issue" clippy in the CI spares our contributors and us from knowing/remembering to add the lint when running clippy locally - In particular, spares us from configuring tools like rust-analyzer to take the lint into account. - Should this lint come back in another form in the future, we won't blindly ignore it, and we will be able to reassess it, which will be good wrt writing idiomatic Rust. By the time this occurs, lints might be configurable through `clippy.toml` too, which would make disabling one globally much more convenient if needs be. ## Note We should wait for the release of Rust 1.67.1 and its propagation to our CI before merging this. The PR won't pass CI before this. Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2023-04-26 17:36:56 +00:00
ManyTheFish	1bf2694604	Update cargo lock	2023-04-26 17:41:29 +02:00
Louis Dureuil	ed9cc1af55	Remove CLI changes for clippy	2023-04-26 17:04:09 +02:00
Louis Dureuil	b41a6cbd7a	Check sort criteria also in placeholder search	2023-04-26 16:28:17 +02:00
Louis Dureuil	c8af572697	Add tests for exact words and exact attributes	2023-04-26 16:13:01 +02:00
ManyTheFish	249053e514	Update feature flags	2023-04-26 14:59:25 +02:00
ManyTheFish	ff2cf2a5ae	Update charabia in milli	2023-04-26 14:56:54 +02:00

7927 Commits
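
The `null` and empty filter rules listed in the description of merge commit 414b3fae89 (#3571) can be modeled in a few lines of Python. This is only a sketch of the rules as that description states them, not Meilisearch's actual implementation (which is written in Rust). The helper names `exists`, `is_null`, `is_empty`, and `matching_ids` are hypothetical and exist only for illustration; the sample documents are the ten `tags` documents from the commit message.

```python
# Sentinel that distinguishes "field absent" from "field present with value null".
_MISSING = object()


def exists(doc, field):
    """`field EXISTS`: the key is present, whatever its value (including null)."""
    return field in doc


def is_null(doc, field):
    """`field IS NULL`: the key is present and its value is null."""
    return doc.get(field, _MISSING) is None


def is_empty(doc, field):
    """`field IS EMPTY`: the key is present and holds [], {} or ""."""
    value = doc.get(field, _MISSING)
    # Check the type first so that 0, False, or null never count as empty.
    return isinstance(value, (list, dict, str)) and len(value) == 0


def matching_ids(docs, predicate):
    """Return the 1-based positions of the documents accepted by `predicate`."""
    return [i for i, doc in enumerate(docs, start=1) if predicate(doc)]


# The ten example documents from the commit message, in the same order.
TAGS_DOCS = [
    {"name": "A", "tags": None},
    {"name": "A", "tags": [None]},
    {"name": "A", "tags": []},
    {"name": "A", "tags": ["hello", "world"]},
    {"name": "A", "tags": [""]},
    {"name": "A"},
    {"name": "A", "tags": {}},
    {"name": "A", "tags": {"t1": "v1"}},
    {"name": "A", "tags": {"t1": ""}},
    {"name": "A", "tags": ""},
]

# `tags IS EMPTY`
print(matching_ids(TAGS_DOCS, lambda d: is_empty(d, "tags")))      # [3, 7, 10]
# `tags IS NOT NULL` (also matches document 6, where the field is absent)
print(matching_ids(TAGS_DOCS, lambda d: not is_null(d, "tags")))   # [2, 3, 4, 5, 6, 7, 8, 9, 10]
# `(tags EXISTS) AND (tags IS NOT NULL) AND (tags IS NOT EMPTY)`
print(matching_ids(
    TAGS_DOCS,
    lambda d: exists(d, "tags") and not is_null(d, "tags") and not is_empty(d, "tags"),
))                                                                  # [2, 4, 5, 8, 9]
```

In this sketch the negated filters are plain complements of the positive ones, which is why `IS NOT NULL` and `IS NOT EMPTY` also match documents where the field is absent, and why the "common query" in the commit message adds `EXISTS` explicitly.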