2506 Commits

Author SHA1 Message Date
F. Levi
40336ce87d Fix and refactor crop_bounds 2024-10-03 10:40:14 +03:00
F. Levi
37a9d64c44 Fix failing test, refactor 2024-10-01 22:52:01 +03:00
F. Levi
d9e4db9983 Refactor 2024-10-01 17:50:59 +03:00
F. Levi
6d16230f17 Refactor 2024-10-01 17:19:15 +03:00
F. Levi
eabc14c268 Refactor, handle more cases for phrases 2024-09-30 21:24:41 +03:00
F. Levi
00ccf53ffa Merge branch 'main' into change-matches-position-phrase-search 2024-09-27 15:52:05 +03:00
F. Levi
d20a39b959 Refactor find_best_match_interval 2024-09-27 15:44:30 +03:00
meili-bors[bot]
462a2329f1
Merge #4941
4941: Implement the binary quantization in meilisearch r=irevoire a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4873

## What does this PR do?
- Add a settings for the binary quantization
- Once enabled, the bq cannot be disabled

TODO:
- [ ] Missing a bunch of tests

Co-authored-by: Tamo <tamo@meilisearch.com>
2024-09-19 15:50:24 +00:00
Tamo
f6483cf15d apply review comment 2024-09-19 16:47:06 +02:00
meili-bors[bot]
bd34ed01d9
Merge #4945
4945: Add swedish in default pipelines r=dureuill a=ManyTheFish

# Summary
## Fix Swedish support

In Swedish the characters `å`/`ä`/`ö` are completely different than `a` or `o`  and should not be normalized as the same character.
because the Swedish specialized pipeline was not activated by default, these characters were normalized even with the settings:
```json
{
  "localizedAttributes": [ { "locales": ["swe"], "attributePatterns": ["*"] } ]
}
```

## Update Charabia adding German support

German segmentation will now be activated using the setting:
```json
{
  "localizedAttributes": [ { "locales": ["deu"], "attributePatterns": ["*"] } ]
}
```

# TODO

- [x] Activate Swedish Pipeline
- [x] Add a test to avoid future regressions
- [x] Update Charabia


Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-09-19 14:42:03 +00:00
Tamo
74199f328d Make clippy happy 2024-09-19 16:27:34 +02:00
Tamo
1113c42de0 fix broken comments 2024-09-19 16:18:36 +02:00
ManyTheFish
7d6768e4c4 Add german tokenization pipeline 2024-09-19 16:09:01 +02:00
ManyTheFish
f77661ec44 Update Charabia v0.9.1 2024-09-19 16:08:59 +02:00
Tamo
b8fd85a46d Get rids of useless collect before an iteration on the readers 2024-09-19 15:57:38 +02:00
Tamo
fd43c6c404 Improve the error message explaining you can't un-bq an embedder 2024-09-19 15:51:29 +02:00
Tamo
2564ec1496
Update milli/src/index.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-09-19 15:41:44 +02:00
Tamo
b6b73fe41c
Update milli/src/update/settings.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-09-19 15:41:14 +02:00
Tamo
6dde41cc46 stop using a local version of arroy and instead point to the git repo with the rev 2024-09-19 15:25:38 +02:00
Tamo
163f8023a1 remove debug println 2024-09-19 12:13:25 +02:00
Tamo
633537ccd7 fix updating documents without updating the settings 2024-09-19 12:00:58 +02:00
Tamo
3f6301dbc9 fix the missing embedder name in the error message when trying to disable the binary quantization 2024-09-19 12:00:58 +02:00
Tamo
2b6952eda1 rename the ArroyReader to an ArroyWrapper since it can read and write 2024-09-19 12:00:58 +02:00
Tamo
79f29eed3c fix the tests and the arroy_readers method 2024-09-19 12:00:58 +02:00
Tamo
cc45e264ca implement the binary quantization in meilisearch 2024-09-19 12:00:56 +02:00
meili-bors[bot]
5f474a640d
Merge #4938
4938: Remove default embedder r=ManyTheFish a=dureuill

# Pull Request

## Related issue
Fixes #4738 

## What does this PR do?

[See public usage](https://meilisearch.notion.site/v1-11-AI-search-changes-0e37727193884a70999f254fa953ce6e#1044b06b651f80edb9d4ef6dc367bad0)

- Remove `hybrid.embedder` boolean from analytics because embedder is now mandatory and so the boolean would always be `true`
- Rework search kind so that a search without query but with vector is a vector search regardless of (non-zero) semantic ratio


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-09-19 09:17:14 +00:00
ManyTheFish
bbaee3dbc6 Add Swedish pipeline in all-tokenization feature 2024-09-19 08:34:51 +02:00
F. Levi
0ffeea5a52 Remove wrong comments 2024-09-19 09:06:40 +03:00
meili-bors[bot]
ff523a2357
Merge #4939
4939: Introduce the `STARTS WITH` filter operator r=irevoire a=Kerollmops

This PR fixes #4872 by introducing the `STARTS WITH` filter operator and gating it under the _contains filter_ experimental feature along with the `CONTAINS` one. I also updated [the experimental feature discussion page](https://github.com/orgs/meilisearch/discussions/763).

Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-09-18 10:19:48 +00:00
F. Levi
83113998f9 Add more test assertions 2024-09-18 10:35:23 +03:00
Clément Renault
9f1fb4b425
Introduce the STARTS WITH filter operator gated under an experimental feature 2024-09-17 16:44:11 +02:00
F. Levi
f7337affd6 Adjust tests to changes 2024-09-17 17:31:09 +03:00
Louis Dureuil
3c5e363554
Remove default embedders 2024-09-17 16:30:43 +02:00
F. Levi
993408d3ba Change closure to fn 2024-09-15 16:15:09 +03:00
F. Levi
51085206cc Misc adjustments 2024-09-14 10:14:07 +03:00
F. Levi
a2a16bf846 Move MatchPosition impl to Match, adjust counting score for phrases 2024-09-13 21:20:06 +03:00
F. Levi
cab63abc84 Improve MatchesPosition enum with an impl 2024-09-13 14:35:28 +03:00
F. Levi
65e3d61a95 Make use of helper function in one more place 2024-09-13 13:35:58 +03:00
F. Levi
cc6a2aec06 Improve changes to Matcher 2024-09-13 13:31:07 +03:00
Louis Dureuil
23e14138bb
facet distribution: implement Display for OrderBy 2024-09-12 17:43:50 +02:00
Louis Dureuil
e44325683a
Facet distribution: fix issue where truncated facet distribution would have a wrong order 2024-09-12 17:43:49 +02:00
F. Levi
e7af499314 Improve changes to Matcher 2024-09-12 16:58:13 +03:00
F. Levi
edcb4c60ba Change Matcher so that phrases are counted as one instead of word by word 2024-09-12 09:46:08 +03:00
Louis Dureuil
f18e9cb7b3
Change openai default model 2024-09-09 13:09:35 +02:00
Louis Dureuil
ed19b7c3c3
Only reindex if the size increased 2024-09-03 12:07:59 +02:00
Louis Dureuil
1ac008926b
Add maxBytes parameter 2024-09-03 12:07:15 +02:00
Louis Dureuil
c49d892c82
Changes to prompt 2024-09-03 12:07:10 +02:00
Louis Dureuil
de962a26f3
New error type when maxBytes is null 2024-09-03 12:01:04 +02:00
Louis Dureuil
21296190a3
Reindex embedders 2024-09-02 13:00:53 +02:00
Louis Dureuil
4464d319af
Change default template to use the new facility 2024-09-02 11:30:59 +02:00