MeiliSearch

mirror of https://github.com/meilisearch/MeiliSearch synced 2024-11-18 02:44:25 +01:00

Author	SHA1	Message	Date
Tamo	68e30214ca	remove the feature flag and reorganize the module slightly	2023-02-23 12:26:21 +01:00
bors[bot]	b985b96e4e	Merge #3530 3530: Fix highlighter bug r=Kerollmops a=ManyTheFish # Pull Request There was a highlighting issue on CJK's character, we were highlighting too many characters and these additional characters were duplicated after the highlight tag. ## Related issue Fixes #3517 Fixes #3526 ## What does this PR do? - add a test showcasing the bug - fix the bug by activating the char_map creation of the tokenizer during the highlighting process Co-authored-by: ManyTheFish <many@meilisearch.com>	2023-02-23 10:59:43 +00:00
Louis Dureuil	71e7900c67	move index_map to file	2023-02-23 11:29:11 +01:00
Louis Dureuil	431782f3ee	Move index_mapper to mod.rs	2023-02-23 11:29:11 +01:00
Louis Dureuil	3db613ff77	Don't iterate all indexes manually	2023-02-23 11:29:09 +01:00
Louis Dureuil	5822764be9	Skip computing index budget in tests	2023-02-23 11:23:39 +01:00
Louis Dureuil	c63294f331	Switch to 2TiB default index size, updates documentation	2023-02-23 11:23:39 +01:00
Louis Dureuil	a529bf160c	Compute budget	2023-02-23 11:23:39 +01:00
Louis Dureuil	f1119f2dc2	Add dichotomic search to utils	2023-02-23 11:23:39 +01:00
Louis Dureuil	1db7d5d851	Add basic tests for index eviction and resize	2023-02-23 11:23:39 +01:00
Louis Dureuil	80b060f920	Use LRU cache	2023-02-23 11:23:39 +01:00
Louis Dureuil	fdf043580c	Add LruMap	2023-02-23 11:23:38 +01:00
bors[bot]	f62703cd67	Merge #3534 3534: Update the csv error code from InvalidIndexCsvDelimiter to InvalidDocumentCsvDelimiter r=Kerollmops a=irevoire Fixes #3533 Co-authored-by: Tamo <tamo@meilisearch.com>	2023-02-23 07:05:12 +00:00
Tamo	76f82c880d	update the csv error code from InvalidIndexCsvDelimiter to InvalidDocumentCsvDelimiter	2023-02-22 19:26:48 +01:00
bors[bot]	6eeba3a8ab	Merge #3417 3417: Allow multiple searches in a single request r=irevoire a=dureuill # Pull Request ## Related issue Fixes #3427 ## What does this PR do? ### User standpoint - Adds a new `/multi-search` entry point (not to be confused with the existing `/{index_uid}/search` entry points) that accepts a POST whose body is an object containing an array of queries. - Each query must specify on which index it acts by providing its `indexUid`. Other parameters are identical to the one in the existing search routes (`q`, `limit`, etc.). - The response is a JSON object containing an array of the results for each search query as if it had been performed using the `/{index_uid}/search` routes. ### Implementation standpoint - Refactor authentication module: - Allow tenant token to be checked even without an index in URL - Add `meilisearch-auth` as a dependency to `index-scheduler` so as to have a working method of checking if the indexes are authorized there that takes into account both the API key and the tenant token (existing method relied on a behavior that was returning the allowed indexes from the API key as long as there weren't any tenant token) - Make `AuthFilter` an object with invariants and so its fields are now private - Use the methods of `AuthFilter` to know if an index is authorized rather than relying on its internal search rules. - Make tenant token search rules optional and `None` when the `AuthFilter` was not built with a tenant token. - Add a new `routes::index::search::multiple_search` module containing a post handler that performs the same work as the existing `routes::index::search` post handler, but in a loop. 
- Add various tests - Add authentication test suite ### Sample request <details> <summary> Click to see request/response </summary> ```json ~/datasets ❯ curl \ -X POST 'http://localhost:7700/multi-search' \ -H 'Content-Type: application/json' \ --data-binary '{"queries": [{ "indexUid": "index-0", "q": "toto", "limit": 1 }, {"indexUid": "index-1", "q": "titi", "limit": 1}]}' \| jsonxf { "results": [ { "indexUid": "index-0", "hits": [ { "id": 20480, "title": "Toto - 25th Anniversary - Live in Amsterdam", "overview": "Filmed in High Definition in Amsterdam on Toto's 25th Anniversary Tour in 2003, this stunning concert captures the band at their very best, reunited with original vocalist Bobby Kimball. The set combines all their hits with tracks from their latest album \"Through the Looking Glass\" and other live favorites, performed in front of a wildly enthusastic sell-out crowd. Extras include 35 minute behind-the-scenes film following the band through various stages of their world tour including footage from Japan, Thailand, South Korea, and France. Toto celebrate their 25th anniversary with this blistering live concert, filmed in Amsterdam on February 25th, 2003. Proving they've still got exactly what it takes to move a crowd, the band perform a mixture of medley's, solo spots, and huge hits. Tracks include \"Rosanna,\" \"Africa,\" \"Hold The Line,\" a cover of the Beatles' \"While My Guitar Gently Weeps,\" and many more.", "genres": [ "Music" ], "poster": "https://image.tmdb.org/t/p/w500/7SCbUPwoB8Z7VUIA1Rn1WWwjNiT.jpg", "release_date": 1064275200 } ], "query": "toto", "processingTimeMs": 1, "limit": 1, "offset": 0, "estimatedTotalHits": 17 }, { "indexUid": "index-1", "hits": [ { "id": 41212, "title": "Titicut Follies", "overview": "The film is a stark and graphic portrayal of the conditions that existed at the State Prison for the Criminally Insane at Bridgewater, Massachusetts. 
TITICUT FOLLIES documents the various ways the inmates are treated by the guards, social workers and psychiatrists.", "genres": [ "Documentary" ], "poster": "https://image.tmdb.org/t/p/w500/2Ju5hn1ofOPeP1eRJtQWakiHuhW.jpg", "release_date": -70934400 } ], "query": "titi", "processingTimeMs": 0, "limit": 1, "offset": 0, "estimatedTotalHits": 7 } ]} ``` </details> ## PR checklist Please check if your PR fulfills the following requirements: - [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)? - [ ] Have you read the contributing guidelines? - [ ] Have you made sure that the title is accurate and descriptive of the changes? Thank you so much for contributing to Meilisearch! Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2023-02-22 17:23:45 +00:00
ManyTheFish	28d6a4466d	Make the tokenizer creating a char map during highlighting	2023-02-22 17:43:10 +01:00
Louis Dureuil	1ba2fae3ae	multi-search/authentication: Add authentication tests	2023-02-22 17:04:12 +01:00
Louis Dureuil	28d6ab78de	multi-search: Add multi search tests	2023-02-22 17:04:12 +01:00
Louis Dureuil	3ba5dfb6ec	multi-search: Add test server search method for multi search	2023-02-22 17:04:12 +01:00
Louis Dureuil	a23fbf6c7b	multi-search: Add search with an array of indexes	2023-02-22 17:04:12 +01:00
Louis Dureuil	596a98f7c6	multi-search: Add basic analytics	2023-02-22 16:37:18 +01:00
Louis Dureuil	14c4a222da	Authentication: AuthFilter::allow_index_creation both check that the index is authorized and the IndexCreate action	2023-02-22 16:37:13 +01:00
Louis Dureuil	690bb2e5cc	Authentication: Make allow_index_creation a private field	2023-02-22 16:35:52 +01:00
Louis Dureuil	d0f2c9c72e	Authentication: Make search_rules optional in AuthFilter	2023-02-22 16:35:52 +01:00
Louis Dureuil	42577403d8	Authentication: Directly pass the authfilter to the index scheduler	2023-02-22 16:35:52 +01:00
Louis Dureuil	c8c5944094	Authentication: is_index_authorized takes into account API key indexes even with a tenant token	2023-02-22 16:35:52 +01:00
Louis Dureuil	4b65851793	Authentication: Refactor authentication check to work for tenant token even without an index in URL Callers need to manually check `is_index_authorized` when using the route without an index in URL	2023-02-22 16:35:51 +01:00
Louis Dureuil	10d4a1a9af	Make ResponseError code and message pub so that they can be modified	2023-02-22 16:35:51 +01:00
ManyTheFish	ad35edfa32	Add test	2023-02-22 15:47:15 +01:00
Tamo	033417e9cc	add an analytics on the geo bounding box feature	2023-02-22 15:35:26 +01:00
bors[bot]	ac5a1e4c4b	Merge #3423 3423: Add min and max facet stats r=dureuill a=dureuill # Pull Request ## Related issue Fixes #3426 ## What does this PR do? ### User standpoint - When using a `facets` parameter in search, the facets that have numeric values are displayed in a new section of the response called `facetStats` that contains, per facet, the numeric min and max value of the hits returned by the search. <details> <summary> Sample request/response </summary> ```json ❯ curl \ -X POST 'http://localhost:7700/indexes/meteorites/search?facets=mass' \ -H 'Content-Type: application/json' \ --data-binary '{ "q": "LL6", "facets":["mass", "recclass"], "limit": 5 }' \| jsonxf { "hits": [ { "name": "Niger (LL6)", "id": "16975", "nametype": "Valid", "recclass": "LL6", "mass": 3.3, "fall": "Fell" }, { "name": "Appley Bridge", "id": "2318", "nametype": "Valid", "recclass": "LL6", "mass": 15000, "fall": "Fell", "_geo": { "lat": 53.58333, "lng": -2.71667 } }, { "name": "Athens", "id": "4885", "nametype": "Valid", "recclass": "LL6", "mass": 265, "fall": "Fell", "_geo": { "lat": 34.75, "lng": -87.0 } }, { "name": "Bandong", "id": "4935", "nametype": "Valid", "recclass": "LL6", "mass": 11500, "fall": "Fell", "_geo": { "lat": -6.91667, "lng": 107.6 } }, { "name": "Benguerir", "id": "30443", "nametype": "Valid", "recclass": "LL6", "mass": 25000, "fall": "Fell", "_geo": { "lat": 32.25, "lng": -8.15 } } ], "query": "LL6", "processingTimeMs": 15, "limit": 5, "offset": 0, "estimatedTotalHits": 42, "facetDistribution": { "mass": { "110000": 1, "11500": 1, "1161": 1, "12000": 1, "1215.5": 1, "127000": 1, "15000": 1, "1676": 1, "1700": 1, "1710.5": 1, "18000": 1, "19000": 1, "220000": 1, "2220": 1, "22300": 1, "25000": 2, "265": 1, "271000": 1, "2840": 1, "3.3": 1, "3000": 1, "303": 1, "32000": 1, "34000": 1, "36.1": 1, "45000": 1, "460": 1, "478": 1, "483": 1, "5500": 2, "600": 1, "6000": 1, "67.8": 1, "678": 1, "680.5": 1, "6930": 1, "8": 1, "8300": 1, "840": 1, "8400": 1 }, 
"recclass": { "L/LL6": 3, "LL6": 39 } }, "facetStats": { "mass": { "min": 3.3, "max": 271000.0 } } } ``` </details> ## PR checklist Please check if your PR fulfills the following requirements: - [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)? - [ ] Have you read the contributing guidelines? - [ ] Have you made sure that the title is accurate and descriptive of the changes? Thank you so much for contributing to Meilisearch! Co-authored-by: Louis Dureuil <louis@meilisearch.com>	2023-02-22 13:06:43 +00:00
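The `facetStats` section described in #3423 can be pictured as a single pass over the numeric values of one facet. The following is a minimal sketch under that assumption, not Meilisearch's implementation; `FacetStats` and `facet_stats` are hypothetical names introduced only for illustration.

```rust
/// Hypothetical container mirroring the `min`/`max` pair shown under `facetStats`.
#[derive(Debug, PartialEq)]
pub struct FacetStats {
    pub min: f64,
    pub max: f64,
}

/// Computes the minimum and maximum of the numeric values of one facet.
/// NaN values are skipped. Returns `None` when no usable value remains,
/// in which case the facet would simply have no stats entry.
pub fn facet_stats(values: &[f64]) -> Option<FacetStats> {
    let mut iter = values.iter().copied().filter(|v| !v.is_nan());
    // Seed both bounds with the first value, then widen them as needed.
    let first = iter.next()?;
    let (min, max) = iter.fold((first, first), |(lo, hi), v| (lo.min(v), hi.max(v)));
    Some(FacetStats { min, max })
}
```

Called on the five `mass` values visible in the sample hits (3.3, 15000, 265, 11500, 25000), this sketch returns a minimum of 3.3 and a maximum of 25000.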
curquiza	3eb9a08b5c	Update comments in version bump CI	2023-02-21 19:14:59 +01:00
ManyTheFish	900bae3d9d	keep phrases that has at least one word	2023-02-21 18:16:51 +01:00
ManyTheFish	28b7d73d4a	Remove an unefficient part of a test on milli	2023-02-21 18:16:51 +01:00
ManyTheFish	6841f167b4	Add test	2023-02-21 18:02:52 +01:00
bors[bot]	c88b6f331f	Merge #3482 3482: Optimize meilisearch uffizzi build r=curquiza a=waveywaves # Pull Request ## Related issue Fixes https://github.com/meilisearch/meilisearch/issues/3476 ## What does this PR do? even though docker cache was being used earlier for uffizzi builds, seems like the cache layers weren't persisting. This commit adds changes to move meilisearch building outside the dockerfile so that we can use the rust cache action. We are also building to the musl target so that the binary for meilisearch which is created can be used for the uffizzi ttyd image which uses alpine. Meilisearch build time brought to 5 mins example https://github.com/waveywaves/meilisearch/actions/runs/4142776058 we also update the version of uffizzi action used here which fixes another uffizzi bug where the environments are not deployed. https://app.uffizzi.com/github.com/waveywaves/meilisearch/pull/2 was built as a part of a test for this PR and we can be sure that the deployment works well now. ## PR checklist Please check if your PR fulfills the following requirements: - [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)? - [ ] Have you read the contributing guidelines? - [ ] Have you made sure that the title is accurate and descriptive of the changes? Thank you so much for contributing to Meilisearch! Co-authored-by: Vibhav Bobade <vibhav.bobde@gmail.com>	2023-02-21 16:06:53 +00:00
Vibhav Bobade	09a94e0db3	optimize meilisearch uffizzi build even though docker cache was being used earlier for uffizzi builds, seems like the cache layers weren't persisting. This commit adds changes to move meilisearch building outside the dockerfile so that we can use the rust cache action. We are also building to the musl target so that the binary for meilisearch which is created can be used for the uffizzi ttyd image which uses alpine.	2023-02-21 17:25:28 +05:30
bors[bot]	39407885c2	Merge #3347 3347: Enhance language detection r=irevoire a=ManyTheFish ## Summary Some completely unrelated Languages can share the same characters, in Meilisearch we detect the Languages using `whatlang`, which works well on large texts but fails on small search queries leading to a bad segmentation and normalization of the query. This PR now stores the Languages detected during the indexing in order to reduce the Languages list that can be detected during the search. ## Detail - Create a 19th database mapping the scripts and the Languages detected with the documents where the Language is detected - Fill the newly created database during indexing - Create an allow-list with this database and pass it to Charabia - Add a test ensuring that a Japanese request containing kanjis only is detected as Japanese and not Chinese ## Related issues Fixes #2403 Fixes #3513 Co-authored-by: f3r10 <frledesma@outlook.com> Co-authored-by: ManyTheFish <many@meilisearch.com> Co-authored-by: Many the fish <many@meilisearch.com>	2023-02-21 10:52:13 +00:00
bors[bot]	a3e41ba33e	Merge #3496 3496: Fix metrics feature r=irevoire a=james-2001 # Pull Request ## Related issue Resolves: #3469 See also: #2763 ## What does this PR do? As reported the metrics feature was broken by still using and old reference to `meilisearch_auth::actions`. This commit switches to the new location, `meilisearch_types::keys::actions`. The original issue was not that clear as to exactly what was broken, and the build logs have disappeared, but it seemed to just be this one line fix. If this is not the case and I've missed the mark let me know, and i'll head back to the drawing board. ## PR checklist Please check if your PR fulfills the following requirements: - [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)? - [x] Have you read the contributing guidelines? - [x] Have you made sure that the title is accurate and descriptive of the changes? Co-authored-by: James <james.a.may.2001@gmail.com>	2023-02-21 10:13:11 +00:00
James	ce807d760b	Fix formatting issue on Opt struct tab in enable_metrics_route to fix cargo fmt issues Resolves: #3469 See also: #2763	2023-02-21 09:45:18 +00:00
ManyTheFish	bbecab8948	fix clippy	2023-02-21 10:18:44 +01:00
James	5cff435bf6	Add feature flags to Opt structure Resolves: #3469 See also: #2763	2023-02-21 07:41:41 +00:00
ManyTheFish	8aa808d51b	Merge branch 'main' into enhance-language-detection	2023-02-20 18:14:34 +01:00
bors[bot]	1e9ac00800	Merge #3505 3505: Csv delimiter r=irevoire a=irevoire Fixes https://github.com/meilisearch/meilisearch/issues/3442 Closes https://github.com/meilisearch/meilisearch/pull/2803 Specified in https://github.com/meilisearch/specifications/pull/221 This PR is a reimplementation of https://github.com/meilisearch/meilisearch/pull/2803, on the new engine. Thanks for your idea and initial PR `@MixusMinimax;` sorry I couldn’t update/merge your PR. Way too many changes happened on the engine in the meantime. Attention to reviewer; I had to update deserr to implement the support of deserializing `char`s ------- It introduces four new error messages; - Invalid value in parameter csvDelimiter: expected a string of one character, but found an empty string - Invalid value in parameter csvDelimiter: expected a string of one character, but found the following string of 5 characters: doggo - csv delimiter must be an ascii character. Found: 🍰 - The Content-Type application/json does not support the use of a csv delimiter. The csv delimiter can only be used with the Content-Type text/csv. And one error code; - `invalid_index_csv_delimiter` The `invalid_content_type` error code is now also used when we encounter the `csvDelimiter` query parameter with a non-csv content type. Co-authored-by: Tamo <tamo@meilisearch.com>	2023-02-20 17:01:36 +00:00
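The `csvDelimiter` rules listed in #3505 (exactly one character, and that character must be ASCII) can be sketched as a small validation function. This is an illustrative sketch only: `parse_csv_delimiter` is a hypothetical name, the real PR relies on `deserr` for deserialization, and only the error message texts are taken from the description above.

```rust
/// Hypothetical validation of a `csvDelimiter` query parameter value.
/// Returns the delimiter as a single byte, or an error message modelled
/// on the ones quoted in the PR description.
pub fn parse_csv_delimiter(value: &str) -> Result<u8, String> {
    let mut chars = value.chars();
    let first = match chars.next() {
        Some(c) => c,
        None => {
            return Err("Invalid value in parameter csvDelimiter: expected a string \
                        of one character, but found an empty string"
                .to_string())
        }
    };
    // Anything after the first character means the string is too long.
    if chars.next().is_some() {
        return Err(format!(
            "Invalid value in parameter csvDelimiter: expected a string of one \
             character, but found the following string of {} characters: {}",
            value.chars().count(),
            value
        ));
    }
    // A single character is accepted only if it fits in one ASCII byte.
    if !first.is_ascii() {
        return Err(format!(
            "csv delimiter must be an ascii character. Found: {}",
            first
        ));
    }
    Ok(first as u8)
}
```

The fourth message in the list (a delimiter sent with a non-CSV `Content-Type`) depends on the request headers rather than on the value itself, so it is left out of this sketch.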
bors[bot]	b08a49a16e	Merge #3319 #3470 3319: Transparently resize indexes on MaxDatabaseSizeReached errors r=Kerollmops a=dureuill # Pull Request ## Related issue Related to https://github.com/meilisearch/meilisearch/discussions/3280, depends on https://github.com/meilisearch/milli/pull/760 ## What does this PR do? ### User standpoint - Meilisearch no longer fails tasks that encounter the `milli::UserError(MaxDatabaseSizeReached)` error. - Instead, these tasks are retried after increasing the maximum size allocated to the index where the failure occurred. ### Implementation standpoint - Add `Batch::index_uid` to get the `index_uid` of a batch of task if there is one - `IndexMapper::create_or_open_index` now takes an additional `size` argument that allows to (re)open indexes with a size different from the base `IndexScheduler::index_size` field - `IndexScheduler::tick` now returns a `Result<TickOutcome>` instead of a `Result<usize>`. This offers more explicit control over what the behavior should be wrt the next tick. - Add `IndexStatus::BeingResized` that contains a handle that a thread can use to await for the resize operation to complete and the index to be available again. - Add `IndexMapper::resize_index` to increase the size of an index. - In `IndexScheduler::tick`, intercept task batches that failed due to `MaxDatabaseSizeReached` and resize the index that caused the error, then request a new tick that will eventually handle the still enqueued task. 
## Testing the PR The following diff can be applied to this branch to make testing the PR easier: <details> ```diff diff --git a/index-scheduler/src/index_mapper.rs b/index-scheduler/src/index_mapper.rs index 553ab45a..022b2f00 100644 --- a/index-scheduler/src/index_mapper.rs +++ b/index-scheduler/src/index_mapper.rs `@@` -228,13 +228,15 `@@` impl IndexMapper { drop(lock); + std::thread::sleep_ms(2000); + let current_size = index.map_size()?; let closing_event = index.prepare_for_closing(); - log::info!("Resizing index {} from {} to {} bytes", name, current_size, current_size * 2); + log::error!("Resizing index {} from {} to {} bytes", name, current_size, current_size * 2); closing_event.wait(); - log::info!("Resized index {} from {} to {} bytes", name, current_size, current_size * 2); + log::error!("Resized index {} from {} to {} bytes", name, current_size, current_size * 2); let index_path = self.base_path.join(uuid.to_string()); let index = self.create_or_open_index(&index_path, None, 2 * current_size)?; `@@` -268,8 +270,10 `@@` impl IndexMapper { match index { Some(Available(index)) => break index, Some(BeingResized(ref resize_operation)) => { + log::error!("waiting for resize end"); // Deadlock: no lock taken while doing this operation. resize_operation.wait(); + log::error!("trying our luck again!"); continue; } Some(BeingDeleted) => return Err(Error::IndexNotFound(name.to_string())), diff --git a/index-scheduler/src/lib.rs b/index-scheduler/src/lib.rs index 11b17d05..242dc095 100644 --- a/index-scheduler/src/lib.rs +++ b/index-scheduler/src/lib.rs `@@` -908,6 +908,7 `@@` impl IndexScheduler { /// /// Returns the number of processed tasks. 
fn tick(&self) -> Result<TickOutcome> { + log::error!("ticking!"); #[cfg(test)] { *self.run_loop_iteration.write().unwrap() += 1; diff --git a/meilisearch/src/main.rs b/meilisearch/src/main.rs index 050c825a..63f312f6 100644 --- a/meilisearch/src/main.rs +++ b/meilisearch/src/main.rs `@@` -25,7 +25,7 `@@` fn setup(opt: &Opt) -> anyhow::Result<()> { #[actix_web::main] async fn main() -> anyhow::Result<()> { - let (opt, config_read_from) = Opt::try_build()?; + let (mut opt, config_read_from) = Opt::try_build()?; setup(&opt)?; `@@` -56,6 +56,8 `@@` We generated a secure master key for you (you can safely copy this token): _ => (), } + opt.max_index_size = byte_unit::Byte::from_str("1MB").unwrap(); + let (index_scheduler, auth_controller) = setup_meilisearch(&opt)?; #[cfg(all(not(debug_assertions), feature = "analytics"))] ``` </details> Mainly, these debug changes do the following: - Set the default index size to 1MiB so that index resizes are initially frequent - Turn some logs from info to error so that they can be displayed with `--log-level ERROR` (hiding the other infos) - Add a long sleep between the beginning and the end of the resize so that we can observe the `BeingResized` index status (otherwise it would never come up in my tests) ## Open questions - Is the growth factor of x2 the correct solution? For a `Vec` in memory it makes sense, but here we're manipulating quantities that are potentially in the order of 500GiBs. For bigger indexes it may make more sense to add at most e.g. 100GiB on each resize operation, avoiding big steps like 500GiB -> 1TiB. ## PR checklist Please check if your PR fulfills the following requirements: - [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)? - [ ] Have you read the contributing guidelines? - [ ] Have you made sure that the title is accurate and descriptive of the changes? Thank you so much for contributing to Meilisearch! 
3470: Autobatch addition and deletion r=irevoire a=irevoire This PR adds the capability to meilisearch to batch document addition and deletion together. Fix https://github.com/meilisearch/meilisearch/issues/3440 -------------- Things to check before merging; - [x] What happens if we delete multiple time the same documents -> add a test - [x] If a documentDeletion gets batched with a documentAddition but the index doesn't exist yet? It should not work Co-authored-by: Louis Dureuil <louis@meilisearch.com> Co-authored-by: Tamo <tamo@meilisearch.com>	2023-02-20 15:00:19 +00:00
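The open question in #3319 about the x2 growth factor can be made concrete with a capped-doubling policy: double the index size, but never add more than a fixed step in one resize. This is a sketch of the alternative floated in the PR description, not what the PR implements (the PR doubles the size); `next_index_size`, `max_step`, and `GIB` are hypothetical names.

```rust
/// One gibibyte, in bytes.
pub const GIB: u64 = 1024 * 1024 * 1024;

/// Hypothetical growth policy for the maximum size of an index, in bytes.
/// Doubles `current`, but never adds more than `max_step` bytes at once.
/// Saturating arithmetic avoids overflow for very large sizes.
/// Note that a `current` of 0 stays at 0 under this policy.
pub fn next_index_size(current: u64, max_step: u64) -> u64 {
    let doubled = current.saturating_mul(2);
    let capped = current.saturating_add(max_step);
    doubled.min(capped)
}
```

With a 100 GiB cap, a 1 GiB index still doubles to 2 GiB, while a 500 GiB index grows to 600 GiB instead of jumping to 1 TiB, which is the large step the open question is concerned about.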
ManyTheFish	23f4e82b53	Add test ensuring that Meilisearch works on kanji only requests	2023-02-20 15:43:29 +01:00
Many the fish	119e6d8811	Update milli/src/search/mod.rs Co-authored-by: Tamo <tamo@meilisearch.com>	2023-02-20 15:33:10 +01:00
bors[bot]	a8f6f108e0	Merge #3515 3515: Consider null as a valid geo field r=irevoire a=irevoire Fix #3497 Associated spec; https://github.com/meilisearch/specifications/pull/222 Co-authored-by: Tamo <tamo@meilisearch.com>	2023-02-20 14:12:55 +00:00
Tamo	1479050f7a	apply review suggestions	2023-02-20 14:53:37 +01:00
bors[bot]	97b8c32e22	Merge #3514 3514: Bump version of mini-dashboard to v0.2.6 r=irevoire a=bidoubiwa Update the version of the mini-dashboard to v0.2.6. See [release notes](https://github.com/meilisearch/mini-dashboard/releases/tag/v0.2.6). Co-authored-by: Charlotte Vermandel <charlottevermandel@gmail.com>	2023-02-20 13:21:00 +00:00

10042 Commits