- DistributionShift in Search object (to be set from model in embed?)
- Fix issue where embedder index wasn't computed at search time
- Accept as default embedder either the "default" one, or the only embedder when there is only one
4223: Update to heed 0.20 r=dureuill a=Kerollmops
This PR brings the v0.20-alpha.9 version of heed into Meilisearch 🎉 The main goal is to test it in a real environment and make any necessary changes. We also want to merge it as soon as possible during the pre-release phase to ensure we catch bugs before the release.
Most of the calls to heed are the same as before, except:
- The `PolyDatabase` has been replaced with a `Database<Unspecified, Unspecified>`. We replaced the `get<T, U>()` calls with `remap<T, U>().get()` calls.
- The `Database` `append(...)` method has been replaced with a `put_with_flags(PutFlags::APPEND, ...)`.
- The `RwTxn<'e, 'p>` has been simplified into a `RwTxn<'e>`.
- The `BytesEncode/Decode` traits return a `Result<_, BoxedError>` instead of an `Option<_>` (see the sketch after this list).
- We no longer need to wrap and unwrap `BEU32` integers when storing/getting them from heed.
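For illustration, here is a minimal sketch of a codec under the new fallible traits (the `U32Codec` name is ours; only the trait signatures come from heed):

```rust
use std::borrow::Cow;

use heed::{BoxedError, BytesDecode, BytesEncode};

// A hypothetical big-endian u32 codec written against heed 0.20's
// fallible codec traits.
pub struct U32Codec;

impl<'a> BytesEncode<'a> for U32Codec {
    type EItem = u32;

    fn bytes_encode(item: &'a Self::EItem) -> Result<Cow<'a, [u8]>, BoxedError> {
        // Encoding now returns a `Result` instead of an `Option`.
        Ok(Cow::Owned(item.to_be_bytes().to_vec()))
    }
}

impl<'a> BytesDecode<'a> for U32Codec {
    type DItem = u32;

    fn bytes_decode(bytes: &'a [u8]) -> Result<Self::DItem, BoxedError> {
        // A real error type (`TryFromSliceError`) replaces the bare `None`.
        let bytes: [u8; 4] = bytes.try_into()?;
        Ok(u32::from_be_bytes(bytes))
    }
}
```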
### TODO
- [x] Create actual, simple error types instead of using strings in the codecs.
### Follow-up work
- Move the codecs into another member crate (we depend on the uuid one in the meilitool crate).
- Display the internal decoding error in the `SerializationError` internal error variant.
Co-authored-by: Clément Renault <clement@meilisearch.com>
3907: Add telemetry for define field to search on at query time r=dureuill a=ManyTheFish
Add "attributes_to_search_on" telemetry usage counter:
```json
"attributes_to_search_on": {
"total_number_of_use": 12,
},
```
This measures the number of search queries in which the user uses the `attributesToSearchOn` field.
related to https://github.com/meilisearch/specifications/pull/251
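A rough sketch of how such a counter can be aggregated on the analytics side (struct and method names are ours, not Meilisearch's):

```rust
use serde_json::{json, Value};

// Hypothetical aggregate: every search request that sets
// `attributesToSearchOn` bumps the counter, and batches merge by summing.
#[derive(Default)]
struct AttributesToSearchOnAggregate {
    total_number_of_use: usize,
}

impl AttributesToSearchOnAggregate {
    fn from_query(attributes_to_search_on: Option<&[String]>) -> Self {
        Self { total_number_of_use: attributes_to_search_on.is_some() as usize }
    }

    fn aggregate(&mut self, other: Self) {
        self.total_number_of_use += other.total_number_of_use;
    }

    fn into_event(self) -> Value {
        json!({
            "attributes_to_search_on": {
                "total_number_of_use": self.total_number_of_use,
            }
        })
    }
}
```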
## Reviewers:
- `@macraig` for validating the telemetry's name
- `@dureuill` for validating the code
Co-authored-by: ManyTheFish <many@meilisearch.com>
3889: Display the total number of tasks matching a filter/query r=dureuill a=Kerollmops
This PR returns a new field on the `/tasks` routes. The `total` field exposes the total number of tasks that match the given filter/query. It is useful for displaying information on a user interface and can help to understand when progress is being made in processing tasks, e.g., the total number of tasks on `/tasks?statuses=succeeded` will increase over time.
Fixes #3888.
- [ ] Update the specs for the `/tasks` route.
## How have I implemented it?
I found it much easier to run the task filtering system twice: once with the original `from` and `limit` parameters, and a second time without them. The second call returns the total number of tasks that match the query, not only the number of tasks on the current page.
So far, in terms of performance, there doesn't seem to be any issue. I tried different filters with something like 250k tasks. Note that there is a limit of 1M tasks in the queue.
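Here is a minimal self-contained sketch of the two-pass approach (the `Query` shape and filtering are simplified stand-ins for the index scheduler's real types):

```rust
type TaskId = u32;

#[derive(Clone)]
struct Query {
    statuses: Option<Vec<String>>,
    from: Option<usize>,
    limit: Option<usize>,
}

// Stand-in for the real task-filtering system.
fn get_task_ids(tasks: &[(TaskId, &str)], query: &Query) -> Vec<TaskId> {
    tasks
        .iter()
        .filter(|(_, status)| {
            query
                .statuses
                .as_ref()
                .map_or(true, |wanted| wanted.iter().any(|s| s.as_str() == *status))
        })
        .skip(query.from.unwrap_or(0))
        .take(query.limit.unwrap_or(usize::MAX))
        .map(|(id, _)| *id)
        .collect()
}

// Run the filtering twice: once paginated for the current page,
// once without `from`/`limit` for the `total` field.
fn tasks_and_total(tasks: &[(TaskId, &str)], query: &Query) -> (Vec<TaskId>, u64) {
    let page = get_task_ids(tasks, query);
    let unpaginated = Query { from: None, limit: None, ..query.clone() };
    let total = get_task_ids(tasks, &unpaginated).len() as u64;
    (page, total)
}

fn main() {
    let tasks = [(1, "succeeded"), (2, "failed"), (3, "succeeded")];
    let query = Query { statuses: Some(vec!["succeeded".into()]), from: None, limit: Some(1) };
    // One task on the page, but `total` reports both matching tasks.
    assert_eq!(tasks_and_total(&tasks, &query), (vec![1], 2));
}
```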
Co-authored-by: Clément Renault <clement@meilisearch.com>
3891: Fix the way we compute the 99th percentile r=dureuill a=Kerollmops
This PR fixes how we compute the 99th percentile by avoiding floats and doing the multiplication and division in the correct order, which avoids going out of bounds of the timings buffer. You can see the issue on [this rust playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021).
When there is a very small number of successful requests, the number is so tiny that the 99th-percentile computation sometimes gives an index outside the buffer. In this example, the `1`/`1.0` represents the number of timings collected (one). As you can see, the float computation gives us the index `1.0`, which is out of bounds for a vector of only one value. This makes the engine generate a `null` value.
```rust
1 * 99 / 100 = 0 // with integers
0.99_f64 * (1.0 - 1.0) + 1.0 = 1.0 // with floats
```
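Below is a sketch of the integer-only version, as our own illustration of the fix rather than the exact patched code:

```rust
// p-th percentile of a non-empty, sorted slice of timings, using
// integer arithmetic only: the multiplication happens before the
// division, and the result always stays within `0..len`.
fn percentile(sorted_timings: &[u64], p: usize) -> u64 {
    let index = (sorted_timings.len() * p / 100).min(sorted_timings.len() - 1);
    // With one timing and p = 99: 1 * 99 / 100 == 0, a valid index,
    // whereas the float formula above yields 1.0, one past the end.
    sorted_timings[index]
}

fn main() {
    assert_eq!(percentile(&[42], 99), 42);
}
```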
Co-authored-by: Clément Renault <clement@meilisearch.com>
3877: update the total_received properties of multiple events r=dureuill a=dureuill
# Pull Request
## Related issue
Fixes #3814
## What does this PR do?
- fix the name of `total_received` for several events
Co-authored-by: Tamo <tamo@meilisearch.com>
3851: Expose lastUpdate and isIndexing in /stats endpoint r=dureuill a=gentcys
# Pull Request
## Related issue
Fixes #3843
## What does this PR do?
- expose lastUpdate in the `/stats` endpoint
- expose isIndexing in the `/stats` endpoint
- add a method `is_task_processing` in index-scheduler/src/lib.rs (sketched below).
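A rough sketch of the shape of `is_task_processing` (the field is hypothetical; the real scheduler state is more involved):

```rust
use std::collections::BTreeSet;

// Hypothetical, simplified view of the scheduler: the set of task
// uids currently being processed.
struct IndexScheduler {
    processing_tasks: BTreeSet<u32>,
}

impl IndexScheduler {
    // `true` whenever at least one task is being processed; this is
    // what the `/stats` route can expose as `isIndexing`.
    fn is_task_processing(&self) -> bool {
        !self.processing_tasks.is_empty()
    }
}
```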
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Cong Chen <cong.chen@ocrlabs.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3867: Add a new link to the cloud pricing page r=curquiza a=Kerollmops
This PR promotes the Cloud by adding a link to the Pricing page to the startup message!
<img width="1002" alt="Capture d’écran 2023-06-29 à 17 40 22" src="https://github.com/meilisearch/meilisearch/assets/3610253/b0528c24-fcc2-43ff-a6a1-3ed91716663b">
Co-authored-by: Clément Renault <clement@meilisearch.com>
3866: Update charabia v0.8.0 r=dureuill a=ManyTheFish
# Pull Request
Update Charabia:
- enhance Japanese segmentation
- enhance Latin Tokenization
- words containing `_` are now properly segmented into several words
- brackets `{([])}` are no longer considered context separators, so words separated by brackets are now considered close together for the proximity ranking rule
- fixes #3815
- fixes #3778
- fixes [product#151](https://github.com/meilisearch/product/discussions/151)
> Important note: float numbers are now segmented around the `.`, so `3.22` is segmented as [`3`, `.`, `22`], but the middle dot isn't considered a hard separator, which means that searching for `3.22` still finds documents containing `3.22`.
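This can be checked against charabia directly; a small sketch, assuming charabia v0.8's `Segment` trait:

```rust
use charabia::Segment;

fn main() {
    // `3.22` should now be segmented around the dot: ["3", ".", "22"].
    let segments: Vec<&str> = "3.22".segment_str().collect();
    println!("{segments:?}");

    // Words containing `_` are now segmented into several words as well.
    let segments: Vec<&str> = "attributes_to_search_on".segment_str().collect();
    println!("{segments:?}");
}
```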
Co-authored-by: ManyTheFish <many@meilisearch.com>
3864: Remove `/experimental-features` verbs that weren't in the PRD r=dureuill a=dureuill
Removes:
- POST `/experimental-features`
- DELETE `/experimental-features`
keeping only:
- PATCH `/experimental-features`
- GET `/experimental-features`
The two routes that are described in the PRD.
Following `@guimachiavelli`'s [question](https://github.com/meilisearch/documentation/issues/2482#issuecomment-1611845372) about the POST route.
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3861: Add "meilisearch" prefix to last metrics that were missing it r=Kerollmops a=dureuill
# Pull Request
## Related issue
Related to #3790
## What does this PR do?
- change the implementation to follow the spec on metric names
- regenerate grafana dashboard from the code
## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
3834: Define searchable fields at runtime r=Kerollmops a=ManyTheFish
## Summary
This feature allows the end-user to search in one or multiple attributes using the search parameter `attributesToSearchOn`:
```json
{
"q": "Captain Marvel",
"attributesToSearchOn": ["title"]
}
```
This feature acts like a filter, forcing Meilisearch to only return the documents containing the requested words in the attributes-to-search-on. Note that, with the matching strategy `last`, Meilisearch will only ensure that the first word is in the attributes-to-search-on, but the retrieved documents will be ordered taking into account the words contained in the attributes-to-search-on.
## Trying the prototype
A dedicated docker image has been released for this feature:
#### Last prototype version:
```bash
docker pull getmeili/meilisearch:prototype-define-searchable-fields-at-search-time-1
```
#### Other prototype versions:
```bash
docker pull getmeili/meilisearch:prototype-define-searchable-fields-at-search-time-0
```
## Technical Detail
The attributes-to-search-on list is given to the search context; the search context then uses the `fid_word_docids` database with only the allowed field ids instead of the global `word_docids` database. The same applies to the prefix databases.
The database cache is updated with the merged values, meaning that the union of the field-id-database values is only made if the requested key is missing from the cache.
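As an illustration, here is a simplified sketch of that lazy-union cache, using `roaring` bitmaps; the cache and database shapes are stand-ins for milli's real types:

```rust
use std::collections::HashMap;

use roaring::RoaringBitmap;

// Stand-in for the `fid_word_docids` database: (field id, word) -> docids.
type FidWordDocids = HashMap<(u16, String), RoaringBitmap>;

// Docids containing `word` in any of the allowed fields; the union is
// only computed when the word is missing from the cache.
fn word_docids<'c>(
    cache: &'c mut HashMap<String, RoaringBitmap>,
    db: &FidWordDocids,
    allowed_fids: &[u16],
    word: &str,
) -> &'c RoaringBitmap {
    cache.entry(word.to_string()).or_insert_with(|| {
        let mut union = RoaringBitmap::new();
        for fid in allowed_fids {
            if let Some(docids) = db.get(&(*fid, word.to_string())) {
                union |= docids;
            }
        }
        union
    })
}
```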
### Relevancy limits
Almost all ranking rules behave as expected when ordering the documents.
Only `proximity` could mis-order documents if all the searched words are in the restricted attributes but a better proximity is found in an ignored attribute of a document that should be ranked lower. I put a failing test showing it below:
```rust
#[actix_rt::test]
async fn proximity_ranking_rule_order() {
    let server = Server::new().await;
    let index = index_with_documents(
        &server,
        &json!([
            {
                "title": "Captain super mega cool. A Marvel story",
                // Perfect distance between words in an ignored attribute
                "desc": "Captain Marvel",
                "id": "1",
            },
            {
                "title": "Captain America from Marvel",
                "desc": "a Shazam ersatz",
                "id": "2",
            }
        ]),
    )
    .await;

    // Document 2 should appear before document 1.
    index
        .search(
            json!({"q": "Captain Marvel", "attributesToSearchOn": ["title"], "attributesToRetrieve": ["id"]}),
            |response, code| {
                assert_eq!(code, 200, "{}", response);
                assert_eq!(
                    response["hits"],
                    json!([
                        {"id": "2"},
                        {"id": "1"},
                    ])
                );
            },
        )
        .await;
}
```
Fixing this would force us to create `fid_word_pair_proximity_docids` and `fid_word_prefix_pair_proximity_docids` databases, which may multiply the keys of `word_pair_proximity_docids` and `word_prefix_pair_proximity_docids` by the number of attributes in the searchable_attributes list. If we think we should fix this test, I suggest doing it in another PR.
## Related
Fixes #3772
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
3745: tests: add unit test for `PayloadTooLarge` error r=curquiza a=cymruu
# Pull Request
Add a unit test for the `Payload`, which verifies that a request with a payload that is too large is rejected with the appropriate message.
This was requested in this PR https://github.com/meilisearch/meilisearch/pull/3739
## Related issue
https://github.com/meilisearch/meilisearch/pull/3739
## What does this PR do?
- Adds requested test
## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Filip Bachul <filipbachul@gmail.com>
3738: Add analytics on the get documents resource r=dureuill a=irevoire
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3737
Related spec https://github.com/meilisearch/specifications/pull/234
## What does this PR do?
Add the analytics for the following routes:
- `GET` - `/indexes/:uid/documents`
- `GET` - `/indexes/:uid/documents/:doc_id`
- `POST` - `/indexes/:uid/documents/fetch`
These analytics are aggregated between two events:
- `Documents Fetched GET`
- `Documents Fetched POST`
They share the same payload:
| Property name | Description | Example |
|---------------|-------------|---------|
| `requests.total_received` | Total number of requests received in this batch | 325 |
| `per_document_id` | `true` if the `GET /indexes/:indexUid/documents/:doc_id` endpoint was used in this batch, otherwise `false` | false |
| `per_filter` | `true` if the `POST /indexes/:indexUid/documents/fetch` endpoint was used with a filter in this batch, otherwise `false` | false |
| `pagination.max_limit` | Highest value given for the `limit` parameter in this batch | 60 |
| `pagination.max_offset` | Highest value given for the `offset` parameter in this batch | 1000 |
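A sketch of how one of these events could be aggregated across a batch (hypothetical struct, mirroring the table above):

```rust
// Hypothetical aggregate mirroring the payload above; one instance per
// event kind (`Documents Fetched GET` / `Documents Fetched POST`).
#[derive(Default)]
struct DocumentsFetchedAggregate {
    total_received: usize,
    per_document_id: bool,
    per_filter: bool,
    max_limit: usize,
    max_offset: usize,
}

impl DocumentsFetchedAggregate {
    // Fold one request of the batch into the aggregate.
    fn record(&mut self, document_id: Option<&str>, filter: Option<&str>, limit: usize, offset: usize) {
        self.total_received += 1;
        self.per_document_id |= document_id.is_some();
        self.per_filter |= filter.is_some();
        self.max_limit = self.max_limit.max(limit);
        self.max_offset = self.max_offset.max(offset);
    }
}
```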
Co-authored-by: Tamo <tamo@meilisearch.com>
3550: Delete documents by filter r=irevoire a=dureuill
# Prototype `prototype-delete-by-filter-0`
Usage:
A new route is available under `POST /indexes/{index_uid}/documents/delete` that allows you to delete your documents by filter.
The expected payload looks like this:
```json
{
"filter": "doggo = bernese",
}
```
It'll then enqueue a task in your task queue that'll delete all the documents matching this filter once it's processed.
Here is an example of the associated details:
```json
"details": {
"deletedDocuments": 53,
"originalFilter": "\"doggo = bernese\""
}
```
----------
# Pull Request
## Related issue
Related to https://github.com/meilisearch/meilisearch/issues/3477
## What does this PR do?
### User standpoint
- Modifies the `/indexes/{:indexUid}/documents/delete-batch` route to accept either the existing array of documents ids, or a JSON object with a `filter` field representing a filter to apply. If that latter variant is used, any document matching the filter will be deleted.
### Implementation standpoint
- (processing time version) Adds a new BatchKind that is not autobatchable and that performs the delete by filter
- Reuse the `documentDeletion` task with a new `originalFilter` detail that replaces the `providedIds` detail.
## Example
<details>
<summary>Sample request, response and task result</summary>
Request:
```
curl \
-X POST 'http://localhost:7700/indexes/index-10/documents/delete-batch' \
-H 'Content-Type: application/json' \
--data-binary '{ "filter" : "mass = 600"}'
```
Response:
```
{
"taskUid": 3902,
"indexUid": "index-10",
"status": "enqueued",
"type": "documentDeletion",
"enqueuedAt": "2023-02-28T20:50:31.667502Z"
}
```
Task log:
```json
{
"uid": 3906,
"indexUid": "index-12",
"status": "succeeded",
"type": "documentDeletion",
"canceledBy": null,
"details": {
"deletedDocuments": 3,
"originalFilter": "\"mass = 600\""
},
"error": null,
"duration": "PT0.001819S",
"enqueuedAt": "2023-03-07T08:57:20.11387Z",
"startedAt": "2023-03-07T08:57:20.115895Z",
"finishedAt": "2023-03-07T08:57:20.117714Z"
}
```
</details>
## Draft status
- [ ] Error handling
- [ ] Analytics
- [ ] Do we want to reuse the `delete-batch` route in this way, or create a new route instead?
- [ ] Should the filter be applied at request time or when the deletion task is processed?
- The first commit in this PR applies the filter at request time, meaning that even if a document is modified in a way that no longer matches the filter in a later update, it will be deleted as long as the deletion task is processed after that update.
- The other commits in this PR apply the filter only when the asynchronous deletion task is processed, meaning that documents that match the filter at processing time are deleted even if they didn't match the filter at request time.
- [ ] If keeping the filter at request time, find a more elegant way to recover the user document ids from the internal document ids. The current way implemented in the first commit of this PR involves getting all the documents matching the filter, looking for the value of their primary key, and turning it into a string by copy-pasting routines found in milli...
- [ ] Security consideration, if any
- [ ] Fix the tests (but waiting until product questions are resolved)
- [ ] Add delete by filter specific tests
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
3688: Following release v1.1.1: bring back changes into `main` r=curquiza a=curquiza
`@meilisearch/engine-team`, ensure the changes we bring to `main` are the ones you want
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: dureuill <dureuill@users.noreply.github.com>
3568: CI: Fix `publish-aarch64` job that still uses ubuntu-18.04 r=Kerollmops a=curquiza
Fixes #3563
Main change
- use an `ubuntu-18.04` container instead of GitHub Actions' native `ubuntu-18.04` runner: I had to install Docker in the container.
Small additional changes
- remove useless `fail-fast` and unused/irrelevant matrix inputs (`build`, `linker`, `os`, `use-cross`...)
- Remove useless step in job
Proof of work with this CI triggered on this current branch: https://github.com/meilisearch/meilisearch/actions/runs/4366233882
3569: Enhance Japanese language detection r=dureuill a=ManyTheFish
# Pull Request
This PR is a prototype and can be tested by downloading [the dedicated docker image](https://hub.docker.com/layers/getmeili/meilisearch/prototype-better-language-detection-0/images/sha256-a12847de00e21a71ab797879fd09777dadcb0881f65b5f810e7d1ed434d116ef?context=explore):
```bash
$ docker pull getmeili/meilisearch:prototype-better-language-detection-0
```
## Context
Some Languages are harder to detect than others, and this misdetection leads to bad tokenization, making some words or even whole documents completely unsearchable. Japanese is the main Language affected: it can be detected as Chinese, which has a completely different way of tokenizing.
A [first iteration has been implemented for v1.1.0](https://github.com/meilisearch/meilisearch/pull/3347) but is an insufficient enhancement to make Japanese work properly. This first implementation detected the Language during the indexing to avoid bad detections during the search.
Unfortunately, some documents (shorter ones) can be wrongly detected as Chinese, which runs bad tokenization on these documents and makes the detection of Chinese possible during the search, because it has been detected during the indexing.
For instance, a Japanese document `{"id": 1, "name": "東京スカパラダイスオーケストラ"}` is detected as Japanese during indexing; during the search, the query `東京` will be detected as Japanese because only Japanese documents were detected during indexing, despite the fact that v1.0.2 would detect it as Chinese.
However, if the dataset contains at least one document with a field containing only Kanjis, like:
_A document with only 1 field containing only Kanjis:_
```json
{
"id":4,
"name": "東京特許許可局"
}
```
_A document with 1 field containing only Kanjis and 1 field containing several Japanese characters:_
```json
{
"id":105,
"name": "東京特許許可局",
"desc": "日経平均株価は26日 に約8カ月ぶりに2万4000円の心理的な節目を上回った。株高を支える材料のひとつは、自民党総裁選で3選を決めた安倍晋三首相の経済政策への期待だ。恩恵が見込まれるとされる人材サービスや建設株の一角が買われている。ただ思惑が先行して資金が集まっている面 は否めない。実際に政策効果を取り込む企業はどこか、なお未知数だ。"
}
```
Then, in both cases, the field `name` will be detected as Chinese during indexing, allowing the search to detect Chinese in queries. Therefore, the query `東京` will be detected as Chinese, and only the last two documents will be retrieved by Meilisearch.
## Technical Approach
The current PR partially fixes these issues by:
1) Adding a check over potential misdetections and rerunning the extraction of the document, forcing tokenization over the main Languages detected in it.
> 1) run a first extraction allowing the tokenizer to detect any Language in any Script
> 2) generate a distribution of tokens by Script and Languages (`script_language`)
> 3) if, for a Script, the token distribution of one of the Languages is under the threshold, then we rerun the extraction, forbidding the tokenizer from detecting the marginal Languages
> 4) the tokenizer will fall back on the other available Languages to tokenize the text. For example, if Chinese was marginally detected compared to Japanese on the CJ script, then the second extraction will force Japanese tokenization for CJ text in the document. However, text in another script, like Latin, will not be impacted by this restriction.
2) Adding a filtering threshold during the search over Languages that have been marginally detected in documents
## Limits
This PR introduces 2 arbitrary thresholds:
1) during the indexing, a Language is considered misdetected if the number of detected tokens of this Language is under 10% of the tokens detected in the same Script (Japanese and Chinese are 2 different Languages sharing the "same" script "CJK").
2) during the search, a Language is considered marginal if less than 5% of documents are detected as this Language.
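For illustration, a sketch of the indexing-side check under the 10% threshold (hypothetical types; the real code works on charabia's Script/Language values):

```rust
use std::collections::HashMap;

// Given per-(script, language) token counts, return the pairs that look
// misdetected: languages holding less than 10% of the tokens counted
// for their script.
fn marginal_languages(
    token_counts: &HashMap<(String, String), usize>,
) -> Vec<(String, String)> {
    // Total tokens per script, across all detected languages.
    let mut per_script: HashMap<&str, usize> = HashMap::new();
    for ((script, _language), count) in token_counts {
        *per_script.entry(script.as_str()).or_default() += *count;
    }

    token_counts
        .iter()
        .filter(|&(key, count)| *count * 10 < per_script[key.0.as_str()])
        .map(|(key, _)| key.clone())
        .collect()
}
```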
This PR only partially fixes these issues:
- ✅ the query `東京` now finds Japanese documents if less than 5% of documents are detected as Chinese.
- ✅ the document with the id `105` containing the Japanese field `desc` but the misdetected field `name` is now completely detected and tokenized as Japanese and is found with the query `東京`.
- ❌ the document with the id `4` no longer breaks the search Language detection but continues to be detected as a Chinese document and can't be found during the search.
## Related issue
Fixes #3565
## Possible future enhancements
- Change or contribute to the Library used to detect the Language
- the related issue on Whatlang: https://github.com/greyblake/whatlang-rs/issues/122
Co-authored-by: curquiza <clementine@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
3529: Add an analytics on the geo bounding box feature r=ManyTheFish a=irevoire
Fixes #3527
[The specification of the geoBoundingBox](https://github.com/meilisearch/specifications/pull/223) feature has been updated and now introduces a new analytics to follow the usage of the geoBoundingBox feature in the search requests.
Co-authored-by: Tamo <tamo@meilisearch.com>
3524: Update the metrics route r=irevoire a=irevoire
Fixes #3523
Make the metrics available by default without a feature flag.
+ Rename the CLI flag to `experimental-enable-metrics`.
Co-authored-by: Tamo <tamo@meilisearch.com>