11162 Commits

Author SHA1 Message Date
ManyTheFish
967033579d Refactor search and facet-search
**Changes:**
The search filters are now using the FilterableAttributesFeatures from the FilterableAttributesRules to know if a field is filterable.
Moreover, the FilterableAttributesFeatures is more precise and an error will be returned if an operator is used on a field that doesn't have the related feature.
The facet-search is now checking if the feature is allowed in the FilterableAttributesFeatures and an error will be returned if the field doesn't have the related feature.

**Impact:**
- facet-search is now relying on AttributePatterns to match the locales
- search using filters is now relying on FilterableAttributesFeatures
- distinct attribute is now relying on FilterableAttributesRules
2025-03-03 10:25:32 +01:00
ManyTheFish
0200c65ebf Change the filterableAttributes setting API
**Changes:**
The filterableAttributes type has been changed from a `BTreeSet<String>` to a `Vec<FilterableAttributesRule>`,
Which is a list of rules defining patterns to match the documents' fields and a set of feature to apply on the matching fields.
The rule order given by the user is now an important information, the features applied on a filterable field will be chosen based on the rule order as we do for the LocalizedAttributesRules.
This means that the list will not be reordered anymore and will keep the user defined order,
moreover, if there are any duplicates, they will not be de-duplicated anymore.

**Impact:**
- Settings API
- the database format of the filterable attributes changed
- may impact the LocalizedAttributesRules due to the AttributePatterns factorization
- OpenAPI generator
2025-03-03 10:22:02 +01:00
meili-bors[bot]
c63c25a9a2
Merge #5355
5355: Support fetching the pooling method from the model configuration r=Kerollmops a=dureuill

# Pull Request

## Related issue
Fixes #5354 

## What does this PR do?
- Fetches the pooling configuration from the model repository
- Use a pooling method that depends on the pooling configuration of that model.
- Allow overriding the pooling method with a new huggingFace embedder parameter `pooling`
  - for backward-compatibility with Meilisearch v1.13
  - for compatibility with embedders that exhibit the same behavior as Meilisearch v1.13
- Handle the default value of that new parameter
   - for compatibility, when importing a db/a dump, it should be set to `forceMean`
   - when (re)set from the settings for an embedder, it should be set to `useModel`


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2025-02-27 14:55:13 +00:00
Louis Dureuil
046bbea864
Keep old stat format to make sure the number of documents is available during dumpless upgrade 2025-02-27 15:17:23 +01:00
Louis Dureuil
c5cb7d2f2c
Forbid opening a db of v1.13.x from v1.13.y 2025-02-27 15:17:23 +01:00
Louis Dureuil
5e7f226ac9
Support dumpless upgrade for all v1.13 patches 2025-02-27 15:17:23 +01:00
Louis Dureuil
754f254a00
Update snapshots following version bump 2025-02-27 15:17:23 +01:00
Kerollmops
39b5ad3c86
Update version for the next release (v1.13.2) in Cargo.toml 2025-02-27 15:17:22 +01:00
meili-bors[bot]
80adbb1bdc
Merge #5338
5338: Bump Ubuntu in the CI from 20.04 to 22.04 r=dureuill a=Kerollmops

This PR bumps the Ubuntu version we use in the CI from version 20.04 to version 22.04. This also means we are [using GLIBC version 2.35 and not version 2.28](https://gist.github.com/zchrissirhcz/ee13f604996bbbe312ba1d105954d2ed).

Note, the indentation fix is done by my IDE (Zed), sorry about that 🤦 

Fixes https://github.com/meilisearch/meilisearch/issues/5374

Co-authored-by: Kerollmops <clement@meilisearch.com>
2025-02-27 08:14:12 +00:00
meili-bors[bot]
4b6fa1cf41
Merge #5372
5372: Bring back changes from v1.13.1 to main r=irevoire a=Kerollmops



Co-authored-by: Kerollmops <Kerollmops@users.noreply.github.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Strift <lau.cazanove@gmail.com>
Co-authored-by: Many the fish <many@meilisearch.com>
2025-02-26 17:24:51 +00:00
Kerollmops
dc78d8e9c4
Fix the dumpless upgrade log 2025-02-26 17:02:46 +01:00
ManyTheFish
d4063c9dcd
Fix fmt 2025-02-26 17:02:45 +01:00
Many the fish
abebc574f6
Update crates/milli/src/index.rs
Co-authored-by: Tamo <tamo@meilisearch.com>
2025-02-26 17:02:45 +01:00
Many the fish
f32ab67819
Update crates/milli/src/index.rs
Co-authored-by: Tamo <tamo@meilisearch.com>
2025-02-26 17:02:44 +01:00
ManyTheFish
d25953f322
fix clippy 2025-02-26 17:02:43 +01:00
ManyTheFish
405bbd04c1
Dumpless upgrade 2025-02-26 17:01:38 +01:00
ManyTheFish
5d421abdc4
Update Snapshots 2025-02-26 17:01:37 +01:00
ManyTheFish
9f3663e768
Implement Incremental document database stats computing 2025-02-26 17:01:35 +01:00
ManyTheFish
d9642ec916
Use checked_div in average computation 2025-02-26 17:01:34 +01:00
ManyTheFish
818e8b0237
Fix zero division 2025-02-26 17:01:31 +01:00
ManyTheFish
4f77a7fba5
fix clippy 2025-02-26 17:01:29 +01:00
ManyTheFish
058f08dff5
fix snapshots 2025-02-26 17:01:26 +01:00
ManyTheFish
9a6c1730aa
Add document database stats 2025-02-26 17:01:25 +01:00
Strift
91a8a97045
Bump 2025-02-26 17:01:24 +01:00
ManyTheFish
15788773af
Check the exact_word database when computing zero typo query 2025-02-26 17:01:22 +01:00
Kerollmops
025b9b79bb
Update the snapshots 2025-02-26 17:01:21 +01:00
Kerollmops
1c60b17a37
Update version for the next release (v1.13.1) in Cargo.toml 2025-02-26 17:01:19 +01:00
Louis Dureuil
3b2cd54b9d
tests: add a check to know if a Value has an uid 2025-02-25 17:24:45 +01:00
Tamo
0833cb7d34 Mention openAPI in CONTRIBUTING.md 2025-02-25 12:01:26 +01:00
meili-bors[bot]
b0d4f9590f
Merge #5364
5364: Rename `callTrace` into `progressTrace` r=Kerollmops a=Kerollmops

Rename the `callTrace` field into a `progressTrace`.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2025-02-25 09:34:13 +00:00
Kerollmops
dfce20be21
Rename callTrace into progressTrace 2025-02-25 10:09:03 +01:00
Louis Dureuil
24fe6cd205
Fix multiple embeddings in hf 2025-02-24 16:24:04 +01:00
Louis Dureuil
e374b095a2
Fix tests 2025-02-24 14:11:26 +01:00
Louis Dureuil
9f3e4801b1
Refactor settings validation and introduce SubEmbedderSettings 2025-02-24 13:58:26 +01:00
Louis Dureuil
b85180fedb
Error types 2025-02-24 13:58:26 +01:00
Louis Dureuil
3cdcc54a9e
analytics 2025-02-24 13:58:26 +01:00
Louis Dureuil
294cf39cad
Integrate composite embedder 2025-02-24 13:58:26 +01:00
Louis Dureuil
4a2643daa2
Rename embed_one to embed_search and embed_chunks* to embed_index* 2025-02-24 13:58:26 +01:00
Louis Dureuil
8d2d9066ba
Add composite embedder 2025-02-24 13:58:26 +01:00
Louis Dureuil
526476e168
Move settings test to its own file 2025-02-24 13:58:26 +01:00
meili-bors[bot]
ea7bae9a71
Merge #5356
5356: Display the internal indexing steps with timings on the `/batches` route r=irevoire a=Kerollmops

This PR computes the durations of each step, stores them in a map, and prints them (for now).

```
"callTrace": {
    "processing tasks > retrieving config": "185.38µs",
    "processing tasks > computing document changes > preparing update file > payload": "23.11ms",
    "processing tasks > computing document changes > preparing update file": "23.26ms",
    "processing tasks > computing document changes": "24.06ms",
    "processing tasks > indexing > extracting documents > document": "15.13ms",
    "processing tasks > indexing > extracting documents": "15.13ms",
    "processing tasks > indexing > extracting facets > document": "5.70ms",
    "processing tasks > indexing > extracting facets": "5.72ms",
    "processing tasks > indexing > extracting words > document": "597.24ms",
    "processing tasks > indexing > extracting words": "597.25ms",
    "processing tasks > indexing > extracting word proximity > document": "1.14s",
    "processing tasks > indexing > extracting word proximity": "1.15s",
    "processing tasks > indexing > tail writing to database": "430.91ms",
    "processing tasks > indexing > waiting for extractors": "52.54µs",
    "processing tasks > indexing > writing embeddings to database": "47.79µs",
    "processing tasks > indexing > post-processing facets": "476.04µs",
    "processing tasks > indexing > post-processing words": "97.82ms",
    "processing tasks > indexing > finalizing": "67.41ms",
    "processing tasks > indexing": "2.40s",
    "processing tasks": "2.43s",
    "writing tasks to disk > task": "37.71µs",
    "writing tasks to disk": "67.13µs"
},
"writeChannelCongestion": {
    "attempts": 2608482,
    "blocking_attempts": 0,
    "blocking_ratio": 0.0
}
```

## To Do
- [x] Update the batches PRD + delivery + tracking issue.
- [x] Store that in the batches to be visible from the `/batches` route.
- [x] Display the writer's congestion.
- [x] Display the info back in the logs too.
- [ ] (optional) Compute the size of each database by [using LMDB](https://docs.rs/heed/latest/heed/struct.DatabaseStat.html).
- [x] Push them in reverse order so that "processing task" is after the other sub-steps.


Co-authored-by: Kerollmops <clement@meilisearch.com>
2025-02-20 17:38:50 +00:00
Kerollmops
76fd5d92d7
Clarify the tail writing to database 2025-02-20 17:35:23 +01:00
Kerollmops
245a55722a
Remove commented code 2025-02-20 16:48:18 +01:00
Kerollmops
434fad5327
Fix insta tests again 2025-02-20 16:41:48 +01:00
Kerollmops
243a5fa6a8
Log the call trace and congestion 2025-02-20 14:17:34 +01:00
Kerollmops
9d314ace09
Fix the insta tests 2025-02-20 11:51:58 +01:00
Kerollmops
1b1172ad16
Fix dump tests 2025-02-20 10:44:53 +01:00
Kerollmops
1d99c8465c
Hide the batch stats to make insta pass 2025-02-20 10:16:54 +01:00
Kerollmops
05cc8c650c
Expose the write channel congestion in the batches 2025-02-19 15:47:54 +01:00
Louis Dureuil
14e1459bf5
Document settings 2025-02-19 15:06:22 +01:00