Commit Graph

588 Commits

Author SHA1 Message Date
Tamo
4d5005b01a make clippy happy 2024-07-10 10:06:59 +02:00
Tamo
9feba5028d update byte-unit 2024-07-09 23:41:29 +02:00
karribalu
ea21b948b1 Address PR review changes 2024-07-09 09:18:57 +01:00
karribalu
47e526f5ea Add index exists function in index_scheduler 2024-07-08 22:27:10 +01:00
Tamo
4aa7d386d8 remove http and uses actix_web::http instead 2024-07-08 21:17:10 +02:00
Louis Dureuil
128e6c7502
Search: spans with a finer granularity 2024-07-02 16:13:53 +02:00
ManyTheFish
015d90a962 merge main 2024-07-01 11:50:36 +02:00
meili-bors[bot]
809e742253
Merge #4731
4731: Fix the missing geo distance when one or both of the lat / lng are string r=irevoire a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4193

## What does this PR do?
- Properly extract the lat / lng when one or both of them are string
- Add a test 


Co-authored-by: Tamo <tamo@meilisearch.com>
2024-06-27 07:33:22 +00:00
meili-bors[bot]
decdfe03bc
Merge #4724
4724: Improve tenant token error messages r=ManyTheFish a=irevoire

# Pull Request

## Related issue
Fixes  #4727

## What does this PR do?
- Introduce a bunch of new error messages around tenant tokens
- Ignore the error messages in most tests that were doing for loop over multiple kinds of errors
- Introduce new tests that specifically test these error messages


Co-authored-by: Tamo <tamo@meilisearch.com>
2024-06-27 06:47:40 +00:00
Tamo
eb292a7a62 Fix the missing geo distance when one or both of the lat / lng are string 2024-06-26 14:50:15 +02:00
Tamo
a1dcde6b9a
Update meilisearch/src/extractors/authentication/mod.rs
Co-authored-by: Many the fish <many@meilisearch.com>
2024-06-26 14:00:21 +02:00
karribalu
2608a596a0 Update error message and add tests for incomplete compressed document 2024-06-25 18:36:29 +01:00
Tamo
7da21bb601 introduce as many custom error message as possible 2024-06-25 12:40:51 +02:00
Tamo
a74fb87d1e start introducing new error messages 2024-06-24 19:00:53 +02:00
karribalu
2a38f5c757 Run Rustfmt 2024-06-21 00:14:26 +01:00
karribalu
fb683fe88b Fix bad http status and error message on wrong payload 2024-06-20 23:55:09 +01:00
meili-bors[bot]
e580d6b98f
Merge #4693
4693: Introduce distinct attributes at search time r=irevoire a=Kerollmops

This PR fixes #4611.

### To Do
- [x] Remove the `distinguishableAttributes` settings (not even a commit about that).
- [x] Use the `filterableAttributes` to be able to use the `distinct` parameter at search.
- [x] Work on the errors and make tests.

Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
2024-06-18 07:45:03 +00:00
meili-bors[bot]
e9bf4c43a4
Merge #4649
4649: Don't store the vectors in the documents database r=dureuill a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4607

## What does this PR do?
- Ensure that anything falling under `_vectors` is NOT searchable, filterable or sortable
- [x] per embedder, add a roaring bitmap of documents that provide "userProvided" embeddings
- [x] in the indexing process in extract_vector_points, set the bit corresponding to the document depending on the "userProvided" subfield in the _vectors field.
- [x] in the document DB in typed chunks, when writing the _vectors field, remove all keys corresponding to an embedder

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-06-17 12:32:03 +00:00
Tamo
a8a0854421
Update meilisearch/src/analytics/segment_analytics.rs 2024-06-17 14:30:50 +02:00
Louis Dureuil
b9b938c902
Change retrieveVectors behavior:
- when the feature is disabled, documents are never modified
- when the feature is enabled and `retrieveVectors` is disabled, `_vectors` is removed from documents
- when the feature is enabled and `retrieveVectors` is enabled, vectors from the vectors DB are merged with `_vectors` in documents

Additionally `_vectors` is never displayed when the `displayedAttributes` list does not contain either `*` or `_vectors`

- fixed an issue where `_vectors` was not injected when all vectors in the dataset where always generated
2024-06-13 17:13:36 +02:00
Louis Dureuil
3bc8f81abc
user_provided => regenerate 2024-06-12 18:12:20 +02:00
Clément Renault
ee39309aae
Improve errors and introduce a new InvalidSearchDistinct error code 2024-06-11 16:03:39 -04:00
Clément Renault
0d31be1494
Make the distinct work at search 2024-06-11 11:39:35 -04:00
Tamo
600e97d9dc gate the retrieveVectors parameter behind the vectors feature flag 2024-06-10 18:26:12 +02:00
Louis Dureuil
7559dfc814
Merge tag 'v1.8.2' into release-v1.9.0 2024-06-10 15:07:34 +02:00
Louis Dureuil
50f8218a5d
Asynchronously drop permits 2024-06-10 10:19:57 +02:00
Tamo
63dded3961 implements the new analytics for the get documents routes 2024-06-06 11:39:29 +02:00
Tamo
6607875f49 add the retrieveVectors parameter to the get and fetch documents route 2024-06-06 11:39:29 +02:00
Tamo
31a793d226 fix the regeneration of the embeddings in the search 2024-06-06 11:39:29 +02:00
Tamo
d85ab23b82 rename all occurences of user_defined to user_provided for consistency 2024-06-06 11:39:29 +02:00
Tamo
49fa41ce65 apply first round of review comments 2024-06-06 11:39:29 +02:00
Tamo
376b3a19a7 makes clippy and fmt happy 2024-06-06 11:39:29 +02:00
Tamo
caad40964a implements the analytics 2024-06-06 11:39:29 +02:00
Tamo
cc5dca8321 fix two bug and add a dump test 2024-06-06 11:39:29 +02:00
Tamo
04f6523f3c expose a new parameter to retrieve the embedders at search time 2024-06-06 11:36:11 +02:00
meili-bors[bot]
fc584f1db3
Merge #4666
4666: Add a score threshold search parameter r=ManyTheFish a=dureuill

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4609

## What does this PR do?
- See [usage](https://meilisearch.notion.site/Filter-by-score-usage-224a183ce7b24ca99b6a9a8da755668a?pvs=25#95b76ded400342ba9ab3d67c734836f0) and [the known limitation](https://meilisearch.notion.site/Filter-by-score-usage-224a183ce7b24ca99b6a9a8da755668a?pvs=25#e4e32195bf0e4195b5daecdbb7a97a17)


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-06-03 08:42:44 +00:00
Louis Dureuil
2b6db6541e
Changes after review 2024-06-03 10:30:00 +02:00
meili-bors[bot]
d6bd88ce4f
Merge #4667
4667: Frequency matching strategy r=Kerollmops a=ManyTheFish

# Pull Request

## Related issue
Fixes #3773

## What does this PR do?
- add test for matching strategy
- implement frequency matching strategy

See the [PRD for more details](https://www.notion.so/meilisearch/Frequency-Matching-Strategy-0f3ba08833a442a39590a53a1505ab00).

[Public API](https://www.notion.so/meilisearch/frequency-matching-strategy-89868fb7fc584026bc56e378eb854a7f).


Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-05-30 14:53:31 +00:00
Louis Dureuil
c2fb7afe59
fmt 2024-05-30 12:06:46 +02:00
Louis Dureuil
c36410fcbf
Analytics for ranking score threshold 2024-05-30 11:22:12 +02:00
Louis Dureuil
7ce2691374
Add ranking score threshold to similar API 2024-05-30 11:21:31 +02:00
Louis Dureuil
c26db7878c
Expose rankingScoreThreshold in API 2024-05-30 10:32:35 +02:00
ManyTheFish
6a4b2516aa WIP 2024-05-29 16:21:24 +02:00
Louis Dureuil
ca6cc4654b
Add similar route 2024-05-28 15:28:19 +02:00
Louis Dureuil
3bd9d2478c
Add error codes 2024-05-28 15:27:43 +02:00
Louis Dureuil
54b15059a0
Analytics changes 2024-05-28 15:27:43 +02:00
Louis Dureuil
e172e938e7
add search rules directly takes the filter rather than the searchquery 2024-05-28 15:22:25 +02:00
Louis Dureuil
02b3d82c60
filtered_universe accepts index and txn instead of SearchContext 2024-05-28 15:22:12 +02:00
Clément Renault
b6d450d484
Remove puffin experimental feature 2024-05-27 15:59:28 +02:00
Tamo
7e251b43d4
Revert "Stream documents" 2024-05-20 15:09:45 +02:00
Tamo
273c6e8c5c uses the latest version of heed to get rid of unsafe code 2024-05-16 18:31:32 +02:00
Tamo
c85d1752dd keep the same rtxn to compute the filters on the documents and to stream the documents later on 2024-05-16 18:31:32 +02:00
Tamo
8e6ffbfc6f stream documents 2024-05-16 18:31:32 +02:00
Tamo
7ec4e2a3fb apply all style review comments 2024-05-15 15:02:26 +02:00
Tamo
9fffb8e83d make clippy happy 2024-05-14 17:36:32 +02:00
Tamo
caa6a7149a make the attribute ranking rule use the weights and fix the tests 2024-05-14 17:36:32 +02:00
meili-bors[bot]
4d5971f343
Merge #4621
4621: Bring back changes from v1.8.0 into main r=curquiza a=curquiza



Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-05-06 13:46:39 +00:00
Tamo
3698aef66b fix warning 2024-05-06 11:36:37 +02:00
Simon Detheridge
7f5ab3cef5
Use http path pattern instead of full path in metrics 2024-05-03 12:29:31 +01:00
Louis Dureuil
5a305bfdea
Remove unused struct 2024-05-02 16:14:37 +02:00
meili-bors[bot]
c793b6ef6d
Merge #4600
4600: Fix embedders api r=ManyTheFish a=ManyTheFish

# Pull Request

## Related issue
Fixes #4594
Fixes #4595


Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-04-25 13:16:33 +00:00
ManyTheFish
cbbfff3594 Remove debuging prints 2024-04-25 10:37:18 +02:00
ManyTheFish
7468c1cf8d Introduce WildcardSetting that are serialized as wildcards by default 2024-04-24 18:15:03 +02:00
Clément Renault
d4aeff92d0
Introduce the ThreadPoolNoAbort wrapper 2024-04-24 16:40:12 +02:00
ManyTheFish
e87cb373de Avoid intermediate serializing when displaying settings 2024-04-24 12:33:07 +02:00
Clément Renault
0c7003c5df
Introduce an atomic to catch panics in thread pools 2024-04-22 18:09:33 +02:00
meili-bors[bot]
aa0bbbb246
Merge #4578
4578: Remove useless analytics r=ManyTheFish a=irevoire

# Pull Request

## Related issue
Fixes #4577

## What does this PR do?
Remove the following analytics:
- `Health Seen`
- `Stats Seen`
- `Task Seen`
- `Version Seen`


Co-authored-by: Tamo <tamo@meilisearch.com>
2024-04-18 13:30:42 +00:00
Tamo
4a68e9f6ae reorganize the debug implementation of the search results and only dispaly the meaningful informations 2024-04-17 13:42:10 +02:00
Tamo
206887c7a2 update the SearchQuery Debug implementation so it’s smaller and gives the most important informations first 2024-04-17 12:57:19 +02:00
Tamo
2dd9dd6d0a remove the Health Seen analytic 2024-04-17 11:43:40 +02:00
Tamo
e1f27de51a remove the Stats Seen analytic 2024-04-16 18:49:41 +02:00
Tamo
abae31aee0 remove the Task Seen analytic 2024-04-16 18:48:10 +02:00
Tamo
70ce0095ea remove the Version Seen analytic 2024-04-16 18:48:03 +02:00
Louis Dureuil
a9013ed683
Fix comment mistake
Co-authored-by: Tamo <tamo@meilisearch.com>
2024-04-04 17:21:47 +02:00
Louis Dureuil
355e5282b2
Remove _semanticScore 2024-04-04 16:04:07 +02:00
Louis Dureuil
1ff2a2d6fb
Add semanticHitCount 2024-04-04 16:04:06 +02:00
Louis Dureuil
4564a38ae7
Bail earlier when the experimental feature is not enabled 2024-04-04 15:58:19 +02:00
Louis Dureuil
6ebb6b55a6
Lazily embed, don't fail hybrid search on embedding failure 2024-04-04 15:58:17 +02:00
Louis Dureuil
190933f6e1
Breaking: Remove vector from SearchResult 2024-04-04 15:57:29 +02:00
Louis Dureuil
928e6e4c05
Breaking change: remove vector for score details 2024-04-04 15:57:29 +02:00
meili-bors[bot]
5509bafff8
Merge #4535
4535: Support Negative Keywords r=ManyTheFish a=Kerollmops

This PR fixes #4422 by supporting `-` before any word in the query.

The minus symbol `-`, from the ASCII table, is not the only character that can be considered the negative operator. You can see the two other matching characters under the `Based on "-" (U+002D)` section on [this unicode reference website](https://www.compart.com/en/unicode/U+002D).

It's important to notice the strange behavior when a query includes and excludes the same word; only the derivative ( synonyms and split) will be kept:
 - If you input `progamer -progamer`, the engine will still search for `pro gamer`.
 - If you have the synonym `like = love` and you input `like -like`, it will still search for `love`.

## TODO
 - [x] Add analytics
 - [x] Add support to the `-` operator
 - [x] Make sure to support spaces around `-` well
 - [x] Support phrase negation
 - [x] Add tests


Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-04-04 13:10:27 +00:00
meili-bors[bot]
78668584cd
Merge #4533
4533: Hide api key in settings and task queue r=dureuill a=dureuill

# Pull Request

See [Usage page](https://meilisearch.notion.site/v1-8-AI-search-API-usage-135552d6e85a4a52bc7109be82aeca42#117f5ff7b19f4d95bb3ae0005f6c6633)

## Motivation

See [slack discussion (internal link)](https://meilisearch.slack.com/archives/C06GQP7FQ6P/p1709804022298749)


## Changes

- The value of the `apiKey` parameter is now hidden in the settings and the details of the task queue.

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-03-28 16:02:53 +00:00
meili-bors[bot]
fa9748cc99
Merge #4536
4536: Limit concurrent search requests r=ManyTheFish a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4489

## What does this PR do?
- Adds a « search queue » that limits the number of search requests we can process at the same time and stores search requests to be processed
- Process only one search request per core/thread (we use available_parallelism)
- When the search queue is full, new search requests replace old ones **randomly**. The reason is that:
  - If we serve the oldest one first, like Typesense, we give the worst performances to everyone
  - If we serve the latest one, it gets too easy to DoS us (you just need to fill the queue with as many search requests as we can process simultaneously to ensure no other request will ever be processed)
  - By picking the search request randomly, we give a chance to recent search requests to be processed while ensuring that we can't be owned unless they fill our queue entirely and we start returning errors 5xx
- Adds an experimental parameter to control the size of the queue
- Adds a bunch of tests to ensure the search queue works correctly
- Ensure the loop consuming the search queue is running in the health route and crashes if it’s not the case

Co-authored-by: Tamo <tamo@meilisearch.com>
2024-03-28 15:01:52 +00:00
Tamo
06a11b5b21
Improve error message 2024-03-27 17:34:49 +01:00
Tamo
b7c582e4f3 connect the search queue with the health route 2024-03-27 15:49:43 +01:00
Tamo
03c886ac1b adds a bit of documentation 2024-03-27 15:38:36 +01:00
Tamo
087a96d22e fix flaky test 2024-03-27 11:05:37 +01:00
meili-bors[bot]
34dfea72cc
Merge #4509
4509: Rest embedder r=ManyTheFish a=dureuill

Fixes #4531 

See [Usage page](https://meilisearch.notion.site/v1-8-AI-search-API-usage-135552d6e85a4a52bc7109be82aeca42?pvs=25#e6f58c3b742c4effb4ddc625ce12ee16)

### Implementation changes

- Remove tokio, futures, reqwests
- Add a new `milli::vector::rest::Embedder` embedder
- Update OpenAI and Ollama embedders to use the REST embedder internally
- Make Embedder::embed a sync method
- Add the new embedder source as described in the usage


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-03-27 09:27:46 +00:00
Tamo
55df9daaa0 adds a comment about the safety of an operation 2024-03-26 19:34:55 +01:00
Tamo
2e36f069c2 fmt imports 2024-03-26 19:23:55 +01:00
Tamo
8f5d9f501a update the discussion link 2024-03-26 19:18:32 +01:00
Tamo
8127c9a115 handle the case of a queue of zero elements 2024-03-26 19:04:39 +01:00
Clément Renault
34262c7a0d
Add analytics for the negative operator 2024-03-26 18:01:27 +01:00
Tamo
e2a1bbae37 simplify and improve the http error 2024-03-26 17:53:37 +01:00
Tamo
e433fd53e6 rename the method to get a permit and use it in all search requests 2024-03-26 17:28:03 +01:00
Tamo
3f23fbb46d create the experimental CLI argument 2024-03-26 16:43:40 +01:00
Tamo
c41e1274dc push and test the search queue datastructure 2024-03-26 15:56:43 +01:00
Louis Dureuil
f82d056072
Hide secrets in settings and task queue 2024-03-26 10:36:24 +01:00
meili-bors[bot]
5ea017b922
Merge #4530
4530: fix: set the histogram bucket boundaries to follow the otel spec r=curquiza a=rohankmr414

# Pull Request

## What does this PR do?
- Fixes the http request duration histogram bucket boundaries to follow the opentelemetry spec, currently the bucket boundaries are too granular and only track latencies below 1s.

## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Rohan Kumar <rohankmr414@gmail.com>
2024-03-25 12:23:31 +00:00
Louis Dureuil
dfa5e41ea6
Check validity of the URL setting 2024-03-25 11:23:16 +01:00