Commit Graph

214 Commits

Author SHA1 Message Date
Louis Dureuil
73e66d5a97
Add dummy log when calling tasks 2024-02-08 15:03:32 +01:00
Louis Dureuil
b8da117b9c
Simplify stream implementation 2024-02-08 15:03:32 +01:00
Louis Dureuil
5e52107474
better than before??? 2024-02-08 15:03:32 +01:00
Tamo
bcf1c4dae5
make it compile and runtime error 2024-02-08 15:03:32 +01:00
Tamo
50f84d43f5
init commit 2024-02-08 15:03:32 +01:00
Tamo
f76cc0806e
WIP: first draft at introducing a new log route 2024-02-08 15:03:32 +01:00
Louis Dureuil
05edd85d75
Stabilize scoreDetails 2024-02-06 11:15:19 +01:00
ManyTheFish
b79b03d4e2 Fix proximity precision telemetry 2024-01-11 13:24:26 +01:00
meili-bors[bot]
658ec6e0a4
Merge #4279
4279: Check experimental feature on setting update query rather than in the task. r=ManyTheFish a=dureuill

Improve the UX by checking for the vector store feature and returning an error synchronously when sending a setting update, rather than in the indexing task.

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-12-22 11:36:12 +00:00
meili-bors[bot]
43e822e802
Merge #4238
4238: Task queue webhook r=dureuill a=irevoire

# Prototype `prototype-task-queue-webhook-1`

The prototype is available through Docker by using the following command:

```bash
docker run -p 7700:7700 -v $(pwd)/meili_data:/meili_data getmeili/meilisearch:prototype-task-queue-webhook-1
```

# Pull Request

Implements the task queue webhook.

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4236

## What does this PR do?
- Provide a new cli and env var for the webhook, respectively called `--task-webhook-url` and `MEILI_TASK_WEBHOOK_URL`
- Also supports sending the requests with a custom `Authorization` header by specifying the optional `--task-webhook-authorization-header` CLI parameter or `MEILI_TASK_WEBHOOK_AUTHORIZATION_HEADER` env variable.
- Throw an error if the specified URL is invalid
- Every time a batch is processed, send all the finished tasks into the webhook with our public `TaskView` type as a JSON Line GZIPed body.
- Add one test.

## PR checklist

### Before becoming ready to review
- [x] Add a test
- [x] Compress the data we send
- [x] Chunk and stream the data we send
- [x] Remove the unwrap in the index-scheduler when sending the data fails
- [x] The analytics are missing

### Before merging
- [x] Release a prototype



Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-12-21 14:43:46 +00:00
Louis Dureuil
ee54d3171e
Check experimental feature at query time 2023-12-21 15:26:12 +01:00
Louis Dureuil
2e4c9651df
Validate settings in route 2023-12-20 17:16:46 +01:00
Tamo
3adbc2b942
return a task view instead of a task 2023-12-19 10:35:51 +01:00
Many the fish
9e1b458010
Merge branch 'main' into change-proximity-precision-settings 2023-12-18 09:08:47 +01:00
ManyTheFish
e741bc1c62 Add proximity_precision value into the analytics 2023-12-14 16:48:06 +01:00
Louis Dureuil
87bba98bd8
Various changes
- fixed seed for arroy
- check vector dimensions as soon as it is provided to search
- don't embed whitespace
2023-12-14 16:08:42 +01:00
Louis Dureuil
217105b7da
hybrid search uses semantic ratio, error handling 2023-12-14 16:08:42 +01:00
ManyTheFish
f3f3944469
Fix error checking 2023-12-14 16:08:42 +01:00
ManyTheFish
93dcbf598d
Deserialize semantic ratio 2023-12-14 16:08:42 +01:00
Louis Dureuil
3c1a14f1cd
Add settings routes 2023-12-14 16:08:42 +01:00
Louis Dureuil
e0cc775dc4
Various changes
- DistributionShift in Search object (to be set from model in embed?)
- Fix issue where embedder index wasn't computed at search time
- Accept as default embedder either the "default" one, or the only embedder when there is only one
2023-12-14 16:08:41 +01:00
Louis Dureuil
12940d79a9
WIP
- manual embedder
- multi embedders OK
- clippy + tests OK
2023-12-14 16:08:41 +01:00
Louis Dureuil
922a640188
WIP multi embedders
fixed template bugs
2023-12-14 16:08:41 +01:00
Louis Dureuil
13c2c6c16b
Small commit to add hybrid search and autoembedding 2023-12-14 16:07:48 +01:00
ManyTheFish
35e1981488 Remove proximityPrecision form the experimental feature 2023-12-14 15:52:42 +01:00
ManyTheFish
1f4fc9c229 Make the feature experimental 2023-12-06 15:49:05 +01:00
ManyTheFish
8cc3c54117 Add proximityPrecision setting in settings route 2023-12-06 15:49:05 +01:00
Clément Renault
5b563f872b
Move the clippy attribute on the problematic part of the code 2023-11-28 14:37:58 +01:00
Clément Renault
1575456594
Further reduce an async block 2023-11-28 14:23:32 +01:00
Clément Renault
d32eb11329
Move to the v0.20.0-alpha.9 of heed 2023-11-27 11:52:22 +01:00
Clément Renault
0dbf1a16ff
Make clippy happy 2023-11-23 14:11:38 +01:00
Clément Renault
dfab6293c9
Use an LMDB database to store the external documents ids 2023-10-30 11:41:23 +01:00
Louis Dureuil
cf8dad1ca0
index_scheduler.features() is no longer fallible 2023-10-23 10:38:56 +02:00
bwbonanno
12fc878640 Merge remote-tracking branch 'origin/main' into enable-metrics-http 2023-10-16 13:48:01 -07:00
bwbonanno
689ec7c7ad Make the experimental route /metrics activable via HTTP 2023-10-13 22:12:54 +00:00
Kerollmops
58db8d85ec
Add the exportPuffinReports option to the runtime features route 2023-10-13 13:11:29 +02:00
meili-bors[bot]
86b314626d
Merge #4080
4080: Bring back changes from v1.4.0 into main r=Kerollmops a=curquiza



Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: curquiza <clementine@meilisearch.com>
Co-authored-by: Vivek Kumar <vivek.26@outlook.com>
Co-authored-by: dogukanakkaya <doguakkaya27@hotmail.com>
2023-09-26 08:13:49 +00:00
Tamo
e8c9367686 implement the snapshots on demand 2023-09-11 12:35:57 +02:00
Tamo
66aa682e23 Register the swap indexe task in a spawn blocking to be sure to never block the main thread 2023-09-07 11:37:02 +02:00
ManyTheFish
4a21fecf67 Merge branch 'main' into settings-customizing-tokenization 2023-08-08 16:08:16 +02:00
ManyTheFish
ae8e69c030 Add API route for the new settings 2023-08-08 16:03:16 +02:00
ManyTheFish
d8d12d5979 Be able to set and reset settings 2023-07-24 17:00:18 +02:00
Kerollmops
516d2df862
Stop computing the update files size 2023-07-18 11:51:30 +02:00
meili-bors[bot]
177e6e27f9
Merge #3901
3901: Fix experimental analytics r=curquiza a=dureuill

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/specifications/pull/250#discussion_r1253191583

## What does this PR do?
- `snake_case` instead of `camelCase` for feature fields


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-07-10 16:22:59 +00:00
Louis Dureuil
d59e969c16
Allow a comma-separated value to the vector argument in GET search 2023-07-10 16:16:34 +02:00
Louis Dureuil
bb40ce6e35
Experimental features analytics match the spec 2023-07-10 08:57:53 +02:00
meili-bors[bot]
ff192bb480
Merge #3889
3889: Display the total number of tasks matching a filter/query r=dureuill a=Kerollmops

This PR returns a new field on the `/tasks` routes. The `total` field exposes the total number of tasks that matches the given filter/query. It is useful to display information on a user interface and can help understand when progress is made in processing tasks, i.e., the total number of tasks on `/tasks?statuses=succeeded` will increase over time.

Fixes #3888.

- [ ] Update the specs fo the `/tasks` route.

## How have I implemented it?

I found it much easier to run two times the task filtering system. Once with the original `from` and `limit` parameters and a second time without. The second call will return the total number of tasks that match the query, not only the number of tasks on the current page.

So far, in terms of performance, there doesn't seem to be any issue. I tried different filters with something like 250k tasks. Note that there is a limit of 1M tasks in the queue.

Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-07-06 10:23:09 +00:00
Clément Renault
86b834c9e4
Display the total number of tasks in the tasks route 2023-07-06 10:05:18 +02:00
Clément Renault
da39a7b29e
Return the right analytics 2023-07-05 17:27:51 +02:00
meili-bors[bot]
aae099e330
Merge #3851
3851: Expose lastUpdate and isIndexing in /stats endpoint r=dureuill a=gentcys

# Pull Request

## Related issue
Fixes #3843

## What does this PR do?
- expose lastUpdate in `/stats` endpoint
- expose isIndex in `stats` endpoint
- add a method `is_task_processing` in index-scheduler/src/lib.rs.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Cong Chen <cong.chen@ocrlabs.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-07-03 13:41:04 +00:00
Louis Dureuil
5387cf1718
Don't unwrap in case of error/missing last_update field 2023-07-03 15:32:11 +02:00
ManyTheFish
7a80c0dfb3 Fix invalid attributeToSearchOn error code to be consistent with the others search parameters error codes 2023-07-03 11:52:43 +02:00
Cong Chen
9859e65d2f fix tests 2023-07-01 09:32:50 +08:00
Cong Chen
3bdf01bc1c Fix failed test 2023-06-30 17:39:23 +08:00
Clément Renault
1d8dfafd25
Add analytics when all facets are sorted by count and the number of modified ones 2023-06-29 14:33:31 +02:00
Kerollmops
9917bf046a
Move the sortFacetValuesBy in the faceting settings 2023-06-29 14:33:31 +02:00
Kerollmops
d9fea0143f
Make Clippy happy 2023-06-29 14:33:31 +02:00
Kerollmops
34b2e98fe9
Expose a sortFacetValuesBy parameter to the user 2023-06-29 14:33:00 +02:00
meili-bors[bot]
34a07110de
Merge #3864
3864: Remove `/experimental-features` verbs that weren't in the PRD r=dureuill a=dureuill

Removes:

- POST `/experimental-features`
- DELETE `/experimental-features`

keeping only:

- PATCH `/experimental-features`
- GET `/experimental-features`

The two routes that are described in the PRD.

Following `@guimachiavelli's` [question](https://github.com/meilisearch/documentation/issues/2482#issuecomment-1611845372) about the POST route.

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-06-29 09:43:14 +00:00
Clément Renault
44b5b9e1a7
Improve the documentation of the FacetSearchQuery struct 2023-06-29 10:28:23 +02:00
Louis Dureuil
68356869c0
Remove /experimental-features verbs that weren't in the PRD 2023-06-29 10:02:55 +02:00
Louis Dureuil
82e1f59f1e
Add attributes_to_search_on 2023-06-28 15:28:24 +02:00
Kerollmops
63fd10aaa5
Fix the invalid facet name field error code 2023-06-28 15:06:09 +02:00
Kerollmops
29b40295b8
Ignore unknown facet search query parameters 2023-06-28 15:06:09 +02:00
Kerollmops
cb0bb399fa
Fix the error code returned when the facetName field is missing 2023-06-28 15:06:08 +02:00
Clément Renault
87e22e436a
Fix compilation issues 2023-06-28 15:01:51 +02:00
Clément Renault
702041b7e1
Improve the returned errors from the facet-search route 2023-06-28 15:01:48 +02:00
Clément Renault
93f30e65a9
Return the correct response JSON object from the facet-search route 2023-06-28 14:58:42 +02:00
Clément Renault
893592c5e9
Send analytics about the facet-search route 2023-06-28 14:58:42 +02:00
Clément Renault
e81809aae7
Make the search for facet work 2023-06-28 14:58:41 +02:00
Kerollmops
ce7e7f12c8
Introduce the facet search route 2023-06-28 14:58:41 +02:00
meili-bors[bot]
d4f10800f2
Merge #3834
3834: Define searchable fields at runtime r=Kerollmops a=ManyTheFish

## Summary
This feature allows the end-user to search in one or multiple attributes using the search parameter `attributesToSearchOn`:

```json
{
  "q": "Captain Marvel",
  "attributesToSearchOn": ["title"]
}
```

This feature act like a filter, forcing Meilisearch to only return the documents containing the requested words in the attributes-to-search-on. Note that, with the matching strategy `last`, Meilisearch will only ensure that the first word is in the attributes-to-search-on, but, the retrieved documents will be ordered taking into account the word contained in the attributes-to-search-on. 

## Trying the prototype

A dedicated docker image has been released for this feature:

#### last prototype version:

```bash
docker pull getmeili/meilisearch:prototype-define-searchable-fields-at-search-time-1
```

#### others prototype versions:

```bash
docker pull getmeili/meilisearch:prototype-define-searchable-fields-at-search-time-0
```

## Technical Detail

The attributes-to-search-on list is given to the search context, then, the search context uses the `fid_word_docids`database using only the allowed field ids instead of the global `word_docids` database. This is the same for the prefix databases.
The database cache is updated with the merged values, meaning that the union of the field-id-database values is only made if the requested key is missing from the cache.

### Relevancy limits

Almost all ranking rules behave as expected when ordering the documents.
Only `proximity` could miss-order documents if all the searched words are in the restricted attribute but a better proximity is found in an ignored attribute in a document that should be ranked lower. I put below a failing test showing it:
```rust
#[actix_rt::test]
async fn proximity_ranking_rule_order() {
    let server = Server::new().await;
    let index = index_with_documents(
        &server,
        &json!([
        {
            "title": "Captain super mega cool. A Marvel story",
            // Perfect distance between words in an ignored attribute
            "desc": "Captain Marvel",
            "id": "1",
        },
        {
            "title": "Captain America from Marvel",
            "desc": "a Shazam ersatz",
            "id": "2",
        }]),
    )
    .await;

    // Document 2 should appear before document 1.
    index
        .search(json!({"q": "Captain Marvel", "attributesToSearchOn": ["title"], "attributesToRetrieve": ["id"]}), |response, code| {
            assert_eq!(code, 200, "{}", response);
            assert_eq!(
                response["hits"],
                json!([
                    {"id": "2"},
                    {"id": "1"},
                ])
            );
        })
        .await;
}
```

Fixing this would force us to create a `fid_word_pair_proximity_docids` and a `fid_word_prefix_pair_proximity_docids` databases which may multiply the keys of `word_pair_proximity_docids` and `word_prefix_pair_proximity_docids` by the number of attributes in the searchable_attributes list. If we think we should fix this test, I'll suggest doing it in another PR.

## Related

Fixes #3772

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-06-28 08:19:23 +00:00
Louis Dureuil
b4b686d253
Merge all analytics events pertaining to updating the experimental features 2023-06-27 15:16:23 +02:00
Kerollmops
eecf20f109
Introduce a new invalid_vector_store 2023-06-27 12:32:42 +02:00
Clément Renault
cad90e8cbc
Add a vector field to the search routes 2023-06-27 12:32:38 +02:00
Louis Dureuil
cca6e47ec1
Errors when GETting metrics without the feature gate 2023-06-26 16:29:43 +02:00
Louis Dureuil
6196a53668
Gate score_details behind a runtime experimental feature flag 2023-06-26 16:29:43 +02:00
Louis Dureuil
eef9293630
New route to set some experimental features 2023-06-26 16:29:43 +02:00
ManyTheFish
114f878205 Rename restrictSearchableAttributes into attributesToSearchOn 2023-06-26 14:55:57 +02:00
ManyTheFish
461b5118bd Add API search setting 2023-06-26 14:55:14 +02:00
Cong Chen
6d4981ec25 Expose lastUpdate and isIndexing in /stats endpoint 2023-06-23 07:24:25 +08:00
Louis Dureuil
da833eb095
Expose the scores and detailed scores in the API 2023-06-22 12:39:14 +02:00
meili-bors[bot]
c1e3cc04b0
Merge #3811
3811: Bring back changes from `release-v1.2.0` to `main` r=Kerollmops a=curquiza



Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Filip Bachul <filipbachul@gmail.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-06-06 13:10:24 +00:00
Tamo
2acc3ec5ee
fix the type of the document deletion by filter tasks 2023-05-30 15:18:52 +02:00
Tamo
c9b65677bf
return the on disk size actually used by meilisearch 2023-05-25 18:30:30 +02:00
Tamo
35d5556f1f
prefix all the metrics by meilisearch_ 2023-05-25 17:41:53 +02:00
Tamo
c433bdd1cd add a view for the task queue in the metrics 2023-05-25 12:58:13 +02:00
Tamo
9111f5176f get rid of the invalid document delete filter in favor of the invalid document filter 2023-05-24 11:53:16 +02:00
Tamo
b9dd092a62 make the details return null in the originalFilter field if no filter was provided + add a big test on the details 2023-05-24 11:48:22 +02:00
Tamo
ca99bc3188 implement the missing document filter error code when deleting documents 2023-05-24 11:29:20 +02:00
Tamo
57d53de402 Increase the number of buckets 2023-05-24 10:47:15 +02:00
meili-bors[bot]
6ce1ce77e6
Merge #3738
3738: Add analytics on the get documents resource r=dureuill a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3737
Related spec https://github.com/meilisearch/specifications/pull/234

## What does this PR do?
Add the analytics for the following routes:
- `GET` - `/indexes/:uid/documents`
- `GET` - `/indexes/:uid/documents/:doc_id`
- `POST` - `/indexes/:uid/documents/fetch`

These analytics are aggregated between two events:
- `Documents Fetched GET`
- `Documents Fetched POST`

That shares the same payload:
 Property name | Description | Example |
|---------------|-------------|---------|
| `requests.total_received` | Total number of request received in this batch | 325 |
| `per_document_id` | `false` | false |
| `per_filter` | `true` if `POST /indexes/:indexUid/documents/fetch` endpoint was used with a filter in this batch, otherwise `false` | false |
| `pagination.max_limit` | Highest value given for the `limit` parameter in this batch | 60 |
| `pagination.max_offset` | Highest value given for the `offset` parameter in this batch | 1000 |

Co-authored-by: Tamo <tamo@meilisearch.com>
2023-05-16 19:37:41 +00:00
Tamo
96da5130a4
fix the error code in case of not filterable attributes on the get / delete documents by filter routes 2023-05-16 13:56:18 +02:00
Tamo
d08f8690d2
add analytics on the get documents resource 2023-05-10 14:28:30 +02:00
Tamo
11e394dba1
merge the document fetch and get error codes 2023-05-04 15:39:49 +02:00
Tamo
469d2f2a9c
fix the fields field of the POST fetch document API 2023-05-04 15:34:09 +02:00
Tamo
ed3dfbe729
add error codes and tests 2023-05-04 15:34:08 +02:00
Louis Dureuil
441641397b
Implement document get with filters 2023-05-04 15:32:34 +02:00
Louis Dureuil
d5059520aa
Fix typo 2023-05-03 22:27:03 +02:00
Tamo
b5fe0b2b07
fix the details 2023-05-03 17:41:50 +02:00