Commit Graph

535 Commits

Author SHA1 Message Date
Tamo
7ec4e2a3fb apply all style review comments 2024-05-15 15:02:26 +02:00
Tamo
9fffb8e83d make clippy happy 2024-05-14 17:36:32 +02:00
Tamo
caa6a7149a make the attribute ranking rule use the weights and fix the tests 2024-05-14 17:36:32 +02:00
meili-bors[bot]
4d5971f343
Merge #4621
4621: Bring back changes from v1.8.0 into main r=curquiza a=curquiza



Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-05-06 13:46:39 +00:00
Tamo
3698aef66b fix warning 2024-05-06 11:36:37 +02:00
Simon Detheridge
7f5ab3cef5
Use http path pattern instead of full path in metrics 2024-05-03 12:29:31 +01:00
Louis Dureuil
5a305bfdea
Remove unused struct 2024-05-02 16:14:37 +02:00
meili-bors[bot]
c793b6ef6d
Merge #4600
4600: Fix embedders api r=ManyTheFish a=ManyTheFish

# Pull Request

## Related issue
Fixes #4594
Fixes #4595


Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-04-25 13:16:33 +00:00
ManyTheFish
cbbfff3594 Remove debuging prints 2024-04-25 10:37:18 +02:00
ManyTheFish
7468c1cf8d Introduce WildcardSetting that are serialized as wildcards by default 2024-04-24 18:15:03 +02:00
Clément Renault
d4aeff92d0
Introduce the ThreadPoolNoAbort wrapper 2024-04-24 16:40:12 +02:00
ManyTheFish
e87cb373de Avoid intermediate serializing when displaying settings 2024-04-24 12:33:07 +02:00
Clément Renault
0c7003c5df
Introduce an atomic to catch panics in thread pools 2024-04-22 18:09:33 +02:00
meili-bors[bot]
aa0bbbb246
Merge #4578
4578: Remove useless analytics r=ManyTheFish a=irevoire

# Pull Request

## Related issue
Fixes #4577

## What does this PR do?
Remove the following analytics:
- `Health Seen`
- `Stats Seen`
- `Task Seen`
- `Version Seen`


Co-authored-by: Tamo <tamo@meilisearch.com>
2024-04-18 13:30:42 +00:00
Tamo
4a68e9f6ae reorganize the debug implementation of the search results and only dispaly the meaningful informations 2024-04-17 13:42:10 +02:00
Tamo
206887c7a2 update the SearchQuery Debug implementation so it’s smaller and gives the most important informations first 2024-04-17 12:57:19 +02:00
Tamo
2dd9dd6d0a remove the Health Seen analytic 2024-04-17 11:43:40 +02:00
Tamo
e1f27de51a remove the Stats Seen analytic 2024-04-16 18:49:41 +02:00
Tamo
abae31aee0 remove the Task Seen analytic 2024-04-16 18:48:10 +02:00
Tamo
70ce0095ea remove the Version Seen analytic 2024-04-16 18:48:03 +02:00
Louis Dureuil
a9013ed683
Fix comment mistake
Co-authored-by: Tamo <tamo@meilisearch.com>
2024-04-04 17:21:47 +02:00
Louis Dureuil
355e5282b2
Remove _semanticScore 2024-04-04 16:04:07 +02:00
Louis Dureuil
1ff2a2d6fb
Add semanticHitCount 2024-04-04 16:04:06 +02:00
Louis Dureuil
4564a38ae7
Bail earlier when the experimental feature is not enabled 2024-04-04 15:58:19 +02:00
Louis Dureuil
6ebb6b55a6
Lazily embed, don't fail hybrid search on embedding failure 2024-04-04 15:58:17 +02:00
Louis Dureuil
190933f6e1
Breaking: Remove vector from SearchResult 2024-04-04 15:57:29 +02:00
Louis Dureuil
928e6e4c05
Breaking change: remove vector for score details 2024-04-04 15:57:29 +02:00
meili-bors[bot]
5509bafff8
Merge #4535
4535: Support Negative Keywords r=ManyTheFish a=Kerollmops

This PR fixes #4422 by supporting `-` before any word in the query.

The minus symbol `-`, from the ASCII table, is not the only character that can be considered the negative operator. You can see the two other matching characters under the `Based on "-" (U+002D)` section on [this unicode reference website](https://www.compart.com/en/unicode/U+002D).

It's important to notice the strange behavior when a query includes and excludes the same word; only the derivative ( synonyms and split) will be kept:
 - If you input `progamer -progamer`, the engine will still search for `pro gamer`.
 - If you have the synonym `like = love` and you input `like -like`, it will still search for `love`.

## TODO
 - [x] Add analytics
 - [x] Add support to the `-` operator
 - [x] Make sure to support spaces around `-` well
 - [x] Support phrase negation
 - [x] Add tests


Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-04-04 13:10:27 +00:00
meili-bors[bot]
78668584cd
Merge #4533
4533: Hide api key in settings and task queue r=dureuill a=dureuill

# Pull Request

See [Usage page](https://meilisearch.notion.site/v1-8-AI-search-API-usage-135552d6e85a4a52bc7109be82aeca42#117f5ff7b19f4d95bb3ae0005f6c6633)

## Motivation

See [slack discussion (internal link)](https://meilisearch.slack.com/archives/C06GQP7FQ6P/p1709804022298749)


## Changes

- The value of the `apiKey` parameter is now hidden in the settings and the details of the task queue.

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-03-28 16:02:53 +00:00
meili-bors[bot]
fa9748cc99
Merge #4536
4536: Limit concurrent search requests r=ManyTheFish a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4489

## What does this PR do?
- Adds a « search queue » that limits the number of search requests we can process at the same time and stores search requests to be processed
- Process only one search request per core/thread (we use available_parallelism)
- When the search queue is full, new search requests replace old ones **randomly**. The reason is that:
  - If we serve the oldest one first, like Typesense, we give the worst performances to everyone
  - If we serve the latest one, it gets too easy to DoS us (you just need to fill the queue with as many search requests as we can process simultaneously to ensure no other request will ever be processed)
  - By picking the search request randomly, we give a chance to recent search requests to be processed while ensuring that we can't be owned unless they fill our queue entirely and we start returning errors 5xx
- Adds an experimental parameter to control the size of the queue
- Adds a bunch of tests to ensure the search queue works correctly
- Ensure the loop consuming the search queue is running in the health route and crashes if it’s not the case

Co-authored-by: Tamo <tamo@meilisearch.com>
2024-03-28 15:01:52 +00:00
Tamo
06a11b5b21
Improve error message 2024-03-27 17:34:49 +01:00
Tamo
b7c582e4f3 connect the search queue with the health route 2024-03-27 15:49:43 +01:00
Tamo
03c886ac1b adds a bit of documentation 2024-03-27 15:38:36 +01:00
Tamo
087a96d22e fix flaky test 2024-03-27 11:05:37 +01:00
meili-bors[bot]
34dfea72cc
Merge #4509
4509: Rest embedder r=ManyTheFish a=dureuill

Fixes #4531 

See [Usage page](https://meilisearch.notion.site/v1-8-AI-search-API-usage-135552d6e85a4a52bc7109be82aeca42?pvs=25#e6f58c3b742c4effb4ddc625ce12ee16)

### Implementation changes

- Remove tokio, futures, reqwests
- Add a new `milli::vector::rest::Embedder` embedder
- Update OpenAI and Ollama embedders to use the REST embedder internally
- Make Embedder::embed a sync method
- Add the new embedder source as described in the usage


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-03-27 09:27:46 +00:00
Tamo
55df9daaa0 adds a comment about the safety of an operation 2024-03-26 19:34:55 +01:00
Tamo
2e36f069c2 fmt imports 2024-03-26 19:23:55 +01:00
Tamo
8f5d9f501a update the discussion link 2024-03-26 19:18:32 +01:00
Tamo
8127c9a115 handle the case of a queue of zero elements 2024-03-26 19:04:39 +01:00
Clément Renault
34262c7a0d
Add analytics for the negative operator 2024-03-26 18:01:27 +01:00
Tamo
e2a1bbae37 simplify and improve the http error 2024-03-26 17:53:37 +01:00
Tamo
e433fd53e6 rename the method to get a permit and use it in all search requests 2024-03-26 17:28:03 +01:00
Tamo
3f23fbb46d create the experimental CLI argument 2024-03-26 16:43:40 +01:00
Tamo
c41e1274dc push and test the search queue datastructure 2024-03-26 15:56:43 +01:00
Louis Dureuil
f82d056072
Hide secrets in settings and task queue 2024-03-26 10:36:24 +01:00
meili-bors[bot]
5ea017b922
Merge #4530
4530: fix: set the histogram bucket boundaries to follow the otel spec r=curquiza a=rohankmr414

# Pull Request

## What does this PR do?
- Fixes the http request duration histogram bucket boundaries to follow the opentelemetry spec, currently the bucket boundaries are too granular and only track latencies below 1s.

## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Rohan Kumar <rohankmr414@gmail.com>
2024-03-25 12:23:31 +00:00
Louis Dureuil
dfa5e41ea6
Check validity of the URL setting 2024-03-25 11:23:16 +01:00
Louis Dureuil
f649f58013
embed no longer async 2024-03-25 11:23:03 +01:00
Rohan Kumar
13a84ae557 fix: set the histogram bucket boundaries to follow the otel spec 2024-03-25 11:20:30 +05:30
Rohan Kumar
5833070358 feat: add status code label to prometheus http request counter 2024-03-25 10:49:40 +05:30
Tamo
0ae39644f7 fix the facet search 2024-03-19 15:07:06 +01:00
Tamo
4369e9e97c add an error code test on the setting 2024-03-19 11:14:28 +01:00
Tamo
7bd881b9bc adds the degraded searches to the prometheus dashboard 2024-03-19 10:35:47 +01:00
Tamo
6a0c399c2f rename the search_cutoff parameter to search_cutoff_ms 2024-03-19 10:35:47 +01:00
Tamo
038c26c118 stop returning the degraded boolean when a search was cutoff 2024-03-19 10:35:47 +01:00
Tamo
b72495eb58 fix the settings tests 2024-03-19 10:28:23 +01:00
Tamo
d1db495119 add a settings for the search cutoff 2024-03-19 10:28:23 +01:00
Tamo
4a467739cd implements a first version of the cutoff without settings 2024-03-19 10:28:21 +01:00
shuangcui
5c95b5c933 chore: remove repetitive words
Signed-off-by: shuangcui <fliter@qq.com>
2024-03-14 21:28:55 +08:00
meili-bors[bot]
abd954755d
Merge #4476
4476: Make the `/facet-search` route use the `sortFacetValuesBy` setting r=irevoire a=Kerollmops

This PR fixes #4423 by ensuring that the `/facet-search` route uses the `sortFacetValuesBy` setting.

Note for the documentation team (to be moved in the tracking issue): Using the new `sortFacetValuesBy` setting can slow down the facet-search requests as Meilisearch iterates over the whole list of facet values and computes the count of documents on every entry. That is hardly or even impossible to optimize correctly.

### TODO
 - [x] Create a custom HashMap wrapper for the facet `OrderBy` settings.
         This wrapper will return the `OrderBy` setting of the facet, if not defined will use the default `*` one, and if not there either (strange) will fall back on the lexicographic one.
- [x] Create a `ValuesCollection` wrapper that implements the logic for the lexicographic and count order by.
  - [x] Use it when there is no search query.
  - [x] Use it when there is a search query with and without allowed typos.
  - [x] Do not change the original logic, only use a wrapper.
- [x] Add tests

Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-03-13 14:36:14 +00:00
Clément Renault
9f7a4fbfeb
Return the facets of a placeholder facet-search sorted by count 2024-03-13 10:09:01 +01:00
meili-bors[bot]
5ed7b6a0b2
Merge #4456
4456: Add Ollama as an embeddings provider r=dureuill a=jakobklemm

# Pull Request

## Related issue
[Related Discord Thread](https://discord.com/channels/1006923006964154428/1211977150316683305)

## What does this PR do?
- Adds Ollama as a provider of Embeddings besides HuggingFace and OpenAI under the name `ollama`
- Adds the environment variable `MEILI_OLLAMA_URL` to set the embeddings URL of an Ollama instance with a default value of `http://localhost:11434/api/embeddings` if no variable is set
- Changes some of the structs and functions in `openai.rs` to be public so that they can be shared.
- Added more error variants for Ollama specific errors
- It uses the model `nomic-embed-text` as default, but any string value is allowed, however it won't automatically check if the model actually exists or is an embedding model

Tested against Ollama version `v0.1.27` and the `nomic-embed-text` model.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Co-authored-by: Jakob Klemm <jakob@jeykey.net>
Co-authored-by: Louis Dureuil <louis.dureuil@gmail.com>
2024-03-13 08:48:47 +00:00
Clément Renault
d3a95ea2f6
Introduce a new OrderByMap struct to simplify the sort by usage 2024-03-12 13:56:56 +01:00
Clément Renault
69c118ef76
Extract the facet order before extracting the facets values 2024-03-12 10:35:39 +01:00
Tamo
8ec3e30d2b Merge branch 'main' into tmp-release-v1.7.0 2024-03-11 15:39:51 +01:00
Louis Dureuil
7408db2a46
Meilisearch: fix date formatting 2024-03-05 14:56:48 +01:00
Louis Dureuil
15c38dca78
Output RFC 3339 dates where we can
Co-authored-by: Tamo <tamo@meilisearch.com>
2024-03-05 14:44:48 +01:00
Louis Dureuil
c608b3f9b5
Factor vergen stuff to a build-info crate 2024-03-05 10:11:43 +01:00
Jakob Klemm
d3004d8040
Implemented Ollama as an embeddings provider
Initial prototype of Ollama embeddings actually working, error handlign / retries still missing.

Allow model to be any String and require dimensions parameter

Fixed rustfmt formatting issues

There were some formatting issues in the initial PR and this should not make the changes comply with the Rust style guidelines

Because I accidentally didn't follow the style guide for commits in my commit messages I squashed them into one to comply
2024-03-04 15:09:43 +01:00
Louis Dureuil
452a343a2b
Fix imports 2024-02-28 18:09:40 +01:00
Louis Dureuil
716ffc07ee
Build the embedders when importing a dump 2024-02-26 22:15:57 +01:00
meili-bors[bot]
9e664d87eb
Merge #4443
4443: Add GPU analytics r=dureuill a=dureuill

# Pull Request

## Related issue

Adds analytics indicating whether Meilisearch  was compiled with the `milli/cuda` feature.

Cc `@macraig` 

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-02-26 17:13:45 +00:00
Tamo
0562818c2a fix and remove the file-store hack of /dev/null 2024-02-26 13:59:41 +01:00
Tamo
bbf3fb88ca rename the cli parameter 2024-02-26 13:59:40 +01:00
Tamo
60510e037b update the discussion link 2024-02-26 13:58:04 +01:00
Tamo
36c27a18a1 implement the dry run ha parameter 2024-02-26 13:58:04 +01:00
Tamo
1eb1c043b5 disable the auto deletion of tasks when the ha mode is enabled 2024-02-26 13:58:04 +01:00
Tamo
507739bd98 add an experimental cli parameter to allow specifying your task id 2024-02-26 13:58:03 +01:00
Tamo
eb25b07390 let you specify your task id 2024-02-26 13:56:31 +01:00
meili-bors[bot]
938149f814
Merge #4042
4042: Implements the new replication parameters r=ManyTheFish a=irevoire

### This PR implements the necessary parameters for the High Availability

- [ ] Update the spec

Introduce a new CLI flag called `--experimental-replication-parameters` that changes a few behaviors in the engine:
- [The auto-deletion of tasks is disabled](https://specs.meilisearch.com/specifications/text/0060-tasks-api.html#_2-technical-details)
- Upon registering a task, you can choose its task ID by sending a new header: `TaskId: 456645`. It must be a valid number, which must be superior to the last task id ever seen.
- Add the ability to « dry-register » a task. That means meilisearch will answer to you with a valid task ID like everything went well, but won’t actually write anything in the database. To do that, you need to use the `DryRun: true` header.

----

Old prototype `prototype-custom-task-id-0`:
-  Adds the capability to specify your own task ID via the `TaskId` http header
- Make the task IDs a u64 instead of a u32


Co-authored-by: Tamo <tamo@meilisearch.com>
2024-02-26 11:37:34 +00:00
Louis Dureuil
55796406c5
Add GPU analytics 2024-02-26 10:41:47 +01:00
Tamo
eb90f0b4fb fix and remove the file-store hack of /dev/null 2024-02-26 10:19:07 +01:00
Tamo
693ba8dd15 rename the cli parameter 2024-02-21 14:33:40 +01:00
Tamo
e1a3eed1eb update the discussion link 2024-02-21 12:30:28 +01:00
Tamo
05ae291989 implement the dry run ha parameter 2024-02-21 11:21:26 +01:00
Tamo
6ba9994916 disable the auto deletion of tasks when the ha mode is enabled 2024-02-20 12:23:39 +01:00
Tamo
01ae46dd80 add an experimental cli parameter to allow specifying your task id 2024-02-20 11:24:44 +01:00
Tamo
9ee4f55e6c let you specify your task id 2024-02-19 14:29:33 +01:00
Tamo
4148d391b8 move logs to stderr 2024-02-15 15:24:16 +01:00
Tamo
d097431113
Update meilisearch/src/option.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-02-15 10:58:43 +01:00
Tamo
1f8af81ba9 update the log mode discussion link 2024-02-15 10:32:48 +01:00
Tamo
5d3bad4120
Update meilisearch/src/option.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-02-15 10:31:23 +01:00
Tamo
a081da0d90 add support for the json format in the stream route 2024-02-14 15:34:39 +01:00
Tamo
3b6544db6d Implement the experimental log mode cli flag 2024-02-13 18:09:15 +01:00
Eric Long
c02d585f5b
Upgrade rustls to 0.21.10 and ring to 0.17 2024-02-12 14:32:29 +08:00
Tamo
285aa15d2f
make the mode camelCase instead of lowercase 2024-02-08 15:04:06 +01:00
Tamo
2c88131bb1
rename the fmt mode to human 2024-02-08 15:04:06 +01:00
Tamo
35aa9d5904
fix an error message 2024-02-08 15:04:06 +01:00
Tamo
cfb3e6b51f
update the actix-web trace 2024-02-08 15:04:06 +01:00
Tamo
1502382316
use debug instead of debug_span 2024-02-08 15:04:06 +01:00