Commit Graph

2297 Commits

Author SHA1 Message Date
Louis Dureuil
2f7a8a4efb
Don't write vectors that weren't autogenerated in document DB 2024-05-20 10:36:18 +02:00
Louis Dureuil
52d9cb6e5a
Refactor vector indexing
- use the parsed_vectors module
- only parse `_vectors` once per document, instead of once per embedder per document
2024-05-20 10:36:17 +02:00
Louis Dureuil
261de888b7
Add function to get the embeddings of a document in an index 2024-05-20 10:36:17 +02:00
Louis Dureuil
98c811247e
Add parsed vectors module 2024-05-20 10:25:59 +02:00
Tamo
273c6e8c5c uses the latest version of heed to get rid of unsafe code 2024-05-16 18:31:32 +02:00
Tamo
897d25780e update milli to latest version 2024-05-16 18:31:32 +02:00
Tamo
f2d0a59f1d when no searchable attributes are defined, makes all the weight equals to zero 2024-05-16 01:06:33 +02:00
Tamo
c78a2fa4f5 rename method and variable around the attributes to search on feature 2024-05-15 18:04:42 +02:00
Tamo
5542f1d9f1 get back to what we were doingb efore in the DB cache and with the restricted field id 2024-05-15 18:00:39 +02:00
Tamo
ad4d8502b3 stops storing the whole fieldids weights map when no searchable are defined 2024-05-15 17:16:10 +02:00
Tamo
7ec4e2a3fb apply all style review comments 2024-05-15 15:02:26 +02:00
Tamo
9fffb8e83d make clippy happy 2024-05-14 17:36:32 +02:00
Tamo
caa6a7149a make the attribute ranking rule use the weights and fix the tests 2024-05-14 17:36:32 +02:00
Tamo
a0082c4df9 add a failing test on the attribute ranking rule 2024-05-14 17:00:02 +02:00
Tamo
b0afe0972e stop updating the fields ids map when fields are only swapped 2024-05-14 17:00:02 +02:00
Tamo
9ecde41853 add a test on the current behaviour 2024-05-14 17:00:02 +02:00
Tamo
685f452fb2 Fix the indexing of the searchable 2024-05-14 17:00:02 +02:00
Tamo
4e4a1ddff7 gate a test behind the required feature 2024-05-14 17:00:02 +02:00
Tamo
c22460045c Stops returning an option in the internal searchable fields 2024-05-14 17:00:02 +02:00
Clément Renault
ac4bc143c4
Bump ureq to v2.9.7 2024-05-07 10:39:38 +02:00
meili-bors[bot]
4d5971f343
Merge #4621
4621: Bring back changes from v1.8.0 into main r=curquiza a=curquiza



Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-05-06 13:46:39 +00:00
Louis Dureuil
f4dd73ec8c
Destructure EmbedderOptions so we don't miss some options 2024-05-02 15:39:36 +02:00
ManyTheFish
88174b8ae4 Update charabia v0.8.10 2024-04-30 14:30:23 +02:00
meili-bors[bot]
ebca29f3de
Merge #4597
4597: Fix embeddings settings update r=ManyTheFish a=ManyTheFish

# Pull Request
- add some conditions reducing the work done when changing the settings
- add some benchmarks on embedders

## Related issue
Fixes #4585


Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-04-25 16:37:28 +00:00
meili-bors[bot]
c793b6ef6d
Merge #4600
4600: Fix embedders api r=ManyTheFish a=ManyTheFish

# Pull Request

## Related issue
Fixes #4594
Fixes #4595


Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-04-25 13:16:33 +00:00
Clément Renault
d4aeff92d0
Introduce the ThreadPoolNoAbort wrapper 2024-04-24 16:40:12 +02:00
ManyTheFish
9b76501875 Display set API key for Ollama embedder 2024-04-24 12:33:07 +02:00
Clément Renault
b3173d0423
Remove useless dots in the error messages 2024-04-22 18:09:33 +02:00
Clément Renault
96cc5319c8
Introduce a new internal error type to categorize panics 2024-04-22 18:09:33 +02:00
Clément Renault
0c7003c5df
Introduce an atomic to catch panics in thread pools 2024-04-22 18:09:33 +02:00
ManyTheFish
a1aa999026 Add conditions reducing wrok 2024-04-22 14:18:35 +02:00
ManyTheFish
c71b5d09ff Updatre charabia v0.8.9 2024-04-18 11:38:26 +02:00
writegr
ab43a8a949 chore: fix some typos in comments
Signed-off-by: writegr <wellweek@outlook.com>
2024-04-18 14:12:52 +08:00
meili-bors[bot]
4a8459b799
Merge #4576
4576: increase the default search time budget from 150ms to 1.5s r=ManyTheFish a=irevoire

# Pull Request

## Related issue
Fixes #4575

## What does this PR do?
- increase the default search time budget from 150ms to 1.5s


Co-authored-by: Tamo <tamo@meilisearch.com>
2024-04-17 16:04:47 +00:00
Clément Renault
c923adf222
Fix facet distribution for alpha on facet numbers 2024-04-17 16:31:16 +02:00
ManyTheFish
df29ba709a Make some cleaning in Arcs 2024-04-17 12:33:25 +02:00
ManyTheFish
3acfab2eb7 Fix PR comments 2024-04-17 10:55:51 +02:00
Tamo
19137be0ea increase the default search time budget from 150ms to 1.5s 2024-04-16 18:09:49 +02:00
ManyTheFish
87a93ba47d fix clippy 2024-04-16 14:39:30 +02:00
ManyTheFish
eaf113ef34 Fix wod pair proximity error when nothing has to be extracted 2024-04-16 14:39:30 +02:00
ManyTheFish
e5ae337aae Comeback to sorters in extract_word_docids
using buffers and merge the keys manually is less efficient
2024-04-16 14:39:30 +02:00
ManyTheFish
a489b406b4 fix test 2024-04-16 14:39:06 +02:00
ManyTheFish
02c3d6b265 finish work 2024-04-16 14:39:06 +02:00
ManyTheFish
b5e4a55af6 refactor faceted and searchable pipeline 2024-04-16 14:39:06 +02:00
ManyTheFish
a7e368aaa6 Create InnerIndexSettingsDiffs struct and populate it 2024-04-16 14:39:06 +02:00
ManyTheFish
893200ab87 Avoid clearing documents in transform 2024-04-16 14:39:06 +02:00
ManyTheFish
aabce52b1b Fix test 2024-04-16 14:39:06 +02:00
ManyTheFish
8fff5fc281 update tests 2024-04-16 14:39:06 +02:00
yudrywet
cf864a1c2e chore: fix some typos in comments
Signed-off-by: yudrywet <yudeyao@yeah.net>
2024-04-14 20:11:34 +08:00
Louis Dureuil
89e72fab32
Update grenad to fix rare DB corruption 2024-04-11 21:06:59 +02:00
meili-bors[bot]
b1844b0c27
Merge #4548
4548: v1.8 hybrid search changes r=dureuill a=dureuill

Implements the search changes from the [usage page](https://meilisearch.notion.site/v1-8-AI-search-API-usage-135552d6e85a4a52bc7109be82aeca42#40f24df3da694428a39cc8043c9cfc64)

### ⚠️ Breaking changes in an experimental feature:

- Removed the `_semanticScore`. Use the `_rankingScore` instead.
- Removed `vector` in the response of the search (output was too big).
- Removed all the vectors from the `vectorSort` ranking score details
  - target vector appearing in the name of the rule
  - matched vector appearing in the details of the rule

### Other user-facing changes

- Added `semanticHitCount`, indicating how many hits were returned from the semantic search. This is especially useful in the hybrid search.
- Embed lazily: Meilisearch no longer generates an embedding when the keyword results are "good enough".
- Graceful embedding failure in hybrid search: when doing hybrid search (`semanticRatio in ]0.0, 1.0[`), an embedding failure no longer causes the search request to fail. Instead, only the keyword search is performed. When doing a full vector search (`semanticRatio==1.0`), a failure to embed will still result in failing that search.

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-04-04 16:00:20 +00:00
Louis Dureuil
1ff2a2d6fb
Add semanticHitCount 2024-04-04 16:04:06 +02:00
Louis Dureuil
3c6e9851a4
Correct error formatting 2024-04-04 15:58:19 +02:00
Louis Dureuil
466d718a05
Fix test 2024-04-04 15:58:19 +02:00
Louis Dureuil
6ebb6b55a6
Lazily embed, don't fail hybrid search on embedding failure 2024-04-04 15:58:17 +02:00
Louis Dureuil
fabc9cf14a
milli: add Embedder::embed_one 2024-04-04 15:57:29 +02:00
Louis Dureuil
00c4ed3bc2
milli: refactor getting embedder and embedder name 2024-04-04 15:57:29 +02:00
Louis Dureuil
928e6e4c05
Breaking change: remove vector for score details 2024-04-04 15:57:29 +02:00
meili-bors[bot]
339a5e3431
Merge #4549
4549: Hugging Face embedder improvements r=dureuill a=dureuill

Architectural changes/Internal improvements

### 1. Prefer safetensors weights over pytorch weights when available

safetensors weights are memory mapped, which reduces memory usage of supported models.

### 2. Update candle

Updates candle to `0.4.1`, now targeting crates.io and the tokenizers to `v0.15.2` (still on github).

This might fix https://github.com/meilisearch/meilisearch/issues/4399 thanks to the now included https://github.com/huggingface/candle/issues/1454

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-04-04 13:47:18 +00:00
meili-bors[bot]
5509bafff8
Merge #4535
4535: Support Negative Keywords r=ManyTheFish a=Kerollmops

This PR fixes #4422 by supporting `-` before any word in the query.

The minus symbol `-`, from the ASCII table, is not the only character that can be considered the negative operator. You can see the two other matching characters under the `Based on "-" (U+002D)` section on [this unicode reference website](https://www.compart.com/en/unicode/U+002D).

It's important to notice the strange behavior when a query includes and excludes the same word; only the derivative ( synonyms and split) will be kept:
 - If you input `progamer -progamer`, the engine will still search for `pro gamer`.
 - If you have the synonym `like = love` and you input `like -like`, it will still search for `love`.

## TODO
 - [x] Add analytics
 - [x] Add support to the `-` operator
 - [x] Make sure to support spaces around `-` well
 - [x] Support phrase negation
 - [x] Add tests


Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-04-04 13:10:27 +00:00
Louis Dureuil
58cafcc824
Update candle 2024-04-03 13:11:56 +02:00
meili-bors[bot]
56bf8503db
Merge #4537
4537: Expose distribution shift in settings r=ManyTheFish a=dureuill

See [usage page](https://meilisearch.notion.site/v1-8-AI-search-API-usage-135552d6e85a4a52bc7109be82aeca42#d652adc0890445658aaf36352dbc8802)

# Changes

- Distribution shift added to all embedders.
- Exposed in settings
- Changed the reindexing logic to not trigger a reindex operation when only the distribution shift or API key change

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-04-03 09:08:58 +00:00
Louis Dureuil
a1eccc762a
Prefer safetensors to pytorch when both are available 2024-04-03 11:05:59 +02:00
meili-bors[bot]
75f81a0bab
Merge #4547
4547: Fix milli/Cargo.toml for usage as dependency via git r=dureuill a=Toromyx

# Pull Request

## Related issues/discussions
This enables th usage of `milli` [via git repository](https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#specifying-dependencies-from-git-repositories) as mentioned in <https://github.com/meilisearch/meilisearch/issues/3367#issuecomment-1422613815>, <https://github.com/meilisearch/meilisearch/discussions/1523#discussioncomment-1039338>, and <https://github.com/meilisearch/meilisearch/discussions/1981#discussioncomment-1771568>

## What does this PR do?
Trying to depend on `milli` like

```
[dependencies.milli]
git = "https://github.com/meilisearch/meilisearch.git"
tag = "v1.7.4"
```

leads to the following error:

```
error: failed to select a version for the requirement `candle-core = "^0.3.1"`
candidate versions found which didn't match: 0.4.2
location searched: Git repository https://github.com/huggingface/candle.git
required by package `milli v1.7.4 (https://github.com/meilisearch/meilisearch.git?tag=v1.7.4#0259ad60)`
```

because the default branch of <https://github.com/huggingface/candle> does not contain the correct version.

To fix this, i added a `rev="..."` entry in the relevant dependencies, specifiyng the commit already present in the `Cargo.lock` file.
I also updated the version to the one in the Cargo.lock. This also updated `candle-kernels` sub-dependency from 0.3.1 to 0.3.3 which is probably correct?

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Thomas Gauges <thomas.gauges@gmail.com>
2024-04-03 07:31:36 +00:00
Thomas Gauges
d55d496250 Fix milli/Cargo.toml for usage as dependency via git 2024-04-02 15:19:30 +02:00
redistay
182cb42953 chore: fix some typos in conments
Signed-off-by: redistay <wujunjing@outlook.com>
2024-04-02 19:37:55 +08:00
meili-bors[bot]
92a049c2dd
Merge #4543
4543: Bring back changes from v1.7.4 into main r=Kerollmops a=dureuill



Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: dureuill <dureuill@users.noreply.github.com>
2024-03-28 16:53:51 +00:00
Clément Renault
877f4b1045
Support negative phrases 2024-03-28 15:51:43 +01:00
Louis Dureuil
796213af9a
Merge branch 'main' into tmp-release-v1.7.4 2024-03-28 10:51:49 +01:00
Clément Renault
69f8b2730d
Fix the tests 2024-03-28 10:47:04 +01:00
Louis Dureuil
ee8cbea810
Don't optimize reindexing when fields contain dots 2024-03-27 17:04:45 +01:00
Louis Dureuil
572fb3a51d
Finer granularity for embedder needs reindex 2024-03-27 12:01:34 +01:00
Louis Dureuil
4ff0255783
remove unused function 2024-03-27 11:51:14 +01:00
Louis Dureuil
a25456120d
Expose distribution in settings 2024-03-27 11:51:04 +01:00
Louis Dureuil
168ded3b9d
Deserr for distribution 2024-03-27 11:50:33 +01:00
Louis Dureuil
afd1da5642
Add distribution to all embedders 2024-03-27 11:50:22 +01:00
Clément Renault
34262c7a0d
Add analytics for the negative operator 2024-03-26 18:01:27 +01:00
Clément Renault
1da9e0f246
Better support space around the negative operator (-) 2024-03-26 17:47:13 +01:00
Clément Renault
e4a3e603b3
Expose a first working version of the negative keyword 2024-03-26 17:47:13 +01:00
Louis Dureuil
817ccc089a
also allow api_key 2024-03-25 11:50:00 +01:00
Louis Dureuil
4136630ea5
Use constants instead of raw strings in set_*set() 2024-03-25 11:39:33 +01:00
Louis Dureuil
58972f35cb
Allow url parameter for ollama embedder 2024-03-25 11:32:55 +01:00
Louis Dureuil
dfa5e41ea6
Check validity of the URL setting 2024-03-25 11:23:16 +01:00
Louis Dureuil
a1db342f01
Expose REST embedder to the API 2024-03-25 11:23:15 +01:00
Louis Dureuil
f87747f4d3
Remove unwraps 2024-03-25 11:23:04 +01:00
Louis Dureuil
b6b4b6bab7
Remove the tokio and the reqwests 2024-03-25 11:23:03 +01:00
Louis Dureuil
ac52c857e8
Update ollama and openai impls to use the rest embedder internally 2024-03-25 11:23:03 +01:00
Louis Dureuil
8708cbef25
Add RestEmbedder 2024-03-25 11:23:03 +01:00
Louis Dureuil
c3d02f092d
OpenAI sync 2024-03-25 11:23:03 +01:00
Louis Dureuil
bc58e8a310
Documentation for the vector module 2024-03-25 11:23:03 +01:00
meili-bors[bot]
ec81c2bf1a
Merge #4511
4511: Bump charabia to 0.8.8 r=ManyTheFish a=6543

... and update lock file

this will add the fix (https://github.com/meilisearch/charabia/pull/275) to support markdown formatted codeblocks

Co-authored-by: 6543 <6543@obermui.de>
2024-03-25 09:26:11 +00:00
meili-bors[bot]
fc1c3f4a29
Merge #4466
4466: Implements the search cutoff r=irevoire a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4488

## What does this PR do?
- Adds a cutoff to the bucket sort after 150ms has been spent
- Adds a new setting to customize the default value of 150ms
- When the time is exceeded, we exit early with what we had the time to sort
- If the cutoff has been reached, the search details are updated with a new `Skip` ranking details for the ranking rules that were skipped
- Adds analytics to measure the total number of degraded search requests
- Adds the number of degraded search requests to the Prometheus metrics and Grafana dashboard
- The cutoff **must not** skip the filters; otherwise, we would leak documents to people who don’t have the right to see them


Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-03-20 13:06:53 +00:00
6543
4628b7b7bd bump charabia to 0.8.8
and update lock file
2024-03-20 13:39:00 +01:00
Tamo
c5322df519
Revert "Revert "Merge remote-tracking branch 'origin/main' into release-v1.7.1"" 2024-03-20 10:08:28 +01:00
Tamo
6079141ea6 snapshot the scores side by side with the score details 2024-03-19 18:30:14 +01:00
Tamo
2c3af8e513 query the detailed score detail in the test 2024-03-19 18:09:02 +01:00
Louis Dureuil
098ab594eb
A score of 0.0 is now lesser than a sort result
handles the niche case 🐩 in the hybrid search where:
1. a sort ranking rule is the first rule.
2. the keyword search is skipped at the first rule.
3. the semantic search is not skipped at the first rule.

Previously, we would have the skipped search winning, whereas we want the non skipped one winning.
2024-03-19 17:32:32 +01:00
Tamo
567194b925 Revert "Merge remote-tracking branch 'origin/main' into release-v1.7.1"
This reverts commit bd74cce86a, reversing
changes made to d2f77e88bd.
2024-03-19 16:56:21 +01:00
Tamo
d8fe4fe49d return the order in the score details 2024-03-19 15:45:04 +01:00
Tamo
7b9e0d2944 forward the degraded parameter to the hybrid search 2024-03-19 15:11:21 +01:00