11086 Commits

Author SHA1 Message Date
meili-bors[bot]
80adbb1bdc
Merge #5338
5338: Bump Ubuntu in the CI from 20.04 to 22.04 r=dureuill a=Kerollmops

This PR bumps the Ubuntu version we use in the CI from version 20.04 to version 22.04. This also means we are [using GLIBC version 2.35 and not version 2.28](https://gist.github.com/zchrissirhcz/ee13f604996bbbe312ba1d105954d2ed).

Note, the indentation fix is done by my IDE (Zed), sorry about that 🤦 

Fixes https://github.com/meilisearch/meilisearch/issues/5374

Co-authored-by: Kerollmops <clement@meilisearch.com>
2025-02-27 08:14:12 +00:00
meili-bors[bot]
4b6fa1cf41
Merge #5372
5372: Bring back changes from v1.13.1 to main r=irevoire a=Kerollmops



Co-authored-by: Kerollmops <Kerollmops@users.noreply.github.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Strift <lau.cazanove@gmail.com>
Co-authored-by: Many the fish <many@meilisearch.com>
2025-02-26 17:24:51 +00:00
Kerollmops
dc78d8e9c4
Fix the dumpless upgrade log 2025-02-26 17:02:46 +01:00
ManyTheFish
d4063c9dcd
Fix fmt 2025-02-26 17:02:45 +01:00
Many the fish
abebc574f6
Update crates/milli/src/index.rs
Co-authored-by: Tamo <tamo@meilisearch.com>
2025-02-26 17:02:45 +01:00
Many the fish
f32ab67819
Update crates/milli/src/index.rs
Co-authored-by: Tamo <tamo@meilisearch.com>
2025-02-26 17:02:44 +01:00
ManyTheFish
d25953f322
fix clippy 2025-02-26 17:02:43 +01:00
ManyTheFish
405bbd04c1
Dumpless upgrade 2025-02-26 17:01:38 +01:00
ManyTheFish
5d421abdc4
Update Snapshots 2025-02-26 17:01:37 +01:00
ManyTheFish
9f3663e768
Implement Incremental document database stats computing 2025-02-26 17:01:35 +01:00
ManyTheFish
d9642ec916
Use checked_div in average computation 2025-02-26 17:01:34 +01:00
ManyTheFish
818e8b0237
Fix zero division 2025-02-26 17:01:31 +01:00
ManyTheFish
4f77a7fba5
fix clippy 2025-02-26 17:01:29 +01:00
ManyTheFish
058f08dff5
fix snapshots 2025-02-26 17:01:26 +01:00
ManyTheFish
9a6c1730aa
Add document database stats 2025-02-26 17:01:25 +01:00
Strift
91a8a97045
Bump 2025-02-26 17:01:24 +01:00
ManyTheFish
15788773af
Check the exact_word database when computing zero typo query 2025-02-26 17:01:22 +01:00
Kerollmops
025b9b79bb
Update the snapshots 2025-02-26 17:01:21 +01:00
Kerollmops
1c60b17a37
Update version for the next release (v1.13.1) in Cargo.toml 2025-02-26 17:01:19 +01:00
meili-bors[bot]
b0d4f9590f
Merge #5364
5364: Rename `callTrace` into `progressTrace` r=Kerollmops a=Kerollmops

Rename the `callTrace` field into a `progressTrace`.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2025-02-25 09:34:13 +00:00
Kerollmops
dfce20be21
Rename callTrace into progressTrace 2025-02-25 10:09:03 +01:00
meili-bors[bot]
ea7bae9a71
Merge #5356
5356: Display the internal indexing steps with timings on the `/batches` route r=irevoire a=Kerollmops

This PR computes the durations of each step, stores them in a map, and prints them (for now).

```
"callTrace": {
    "processing tasks > retrieving config": "185.38µs",
    "processing tasks > computing document changes > preparing update file > payload": "23.11ms",
    "processing tasks > computing document changes > preparing update file": "23.26ms",
    "processing tasks > computing document changes": "24.06ms",
    "processing tasks > indexing > extracting documents > document": "15.13ms",
    "processing tasks > indexing > extracting documents": "15.13ms",
    "processing tasks > indexing > extracting facets > document": "5.70ms",
    "processing tasks > indexing > extracting facets": "5.72ms",
    "processing tasks > indexing > extracting words > document": "597.24ms",
    "processing tasks > indexing > extracting words": "597.25ms",
    "processing tasks > indexing > extracting word proximity > document": "1.14s",
    "processing tasks > indexing > extracting word proximity": "1.15s",
    "processing tasks > indexing > tail writing to database": "430.91ms",
    "processing tasks > indexing > waiting for extractors": "52.54µs",
    "processing tasks > indexing > writing embeddings to database": "47.79µs",
    "processing tasks > indexing > post-processing facets": "476.04µs",
    "processing tasks > indexing > post-processing words": "97.82ms",
    "processing tasks > indexing > finalizing": "67.41ms",
    "processing tasks > indexing": "2.40s",
    "processing tasks": "2.43s",
    "writing tasks to disk > task": "37.71µs",
    "writing tasks to disk": "67.13µs"
},
"writeChannelCongestion": {
    "attempts": 2608482,
    "blocking_attempts": 0,
    "blocking_ratio": 0.0
}
```

## To Do
- [x] Update the batches PRD + delivery + tracking issue.
- [x] Store that in the batches to be visible from the `/batches` route.
- [x] Display the writer's congestion.
- [x] Display the info back in the logs too.
- [ ] (optional) Compute the size of each database by [using LMDB](https://docs.rs/heed/latest/heed/struct.DatabaseStat.html).
- [x] Push them in reverse order so that "processing task" is after the other sub-steps.


Co-authored-by: Kerollmops <clement@meilisearch.com>
2025-02-20 17:38:50 +00:00
Kerollmops
76fd5d92d7
Clarify the tail writing to database 2025-02-20 17:35:23 +01:00
Kerollmops
245a55722a
Remove commented code 2025-02-20 16:48:18 +01:00
Kerollmops
434fad5327
Fix insta tests again 2025-02-20 16:41:48 +01:00
Kerollmops
243a5fa6a8
Log the call trace and congestion 2025-02-20 14:17:34 +01:00
Kerollmops
9d314ace09
Fix the insta tests 2025-02-20 11:51:58 +01:00
Kerollmops
1b1172ad16
Fix dump tests 2025-02-20 10:44:53 +01:00
Kerollmops
1d99c8465c
Hide the batch stats to make insta pass 2025-02-20 10:16:54 +01:00
Kerollmops
05cc8c650c
Expose the write channel congestion in the batches 2025-02-19 15:47:54 +01:00
Kerollmops
3ff1de0a21
Expose the call trace in the batch stats 2025-02-19 11:24:11 +01:00
Kerollmops
e9add14189
Reorder steps 2025-02-18 19:26:41 +01:00
Kerollmops
4a058a080e
Simplify the name generation 2025-02-18 18:48:44 +01:00
Kerollmops
11a11fc870
Accumulate step durations from the progress system 2025-02-18 18:33:19 +01:00
meili-bors[bot]
0f1aeb8eaa
Merge #5351
5351: Bring back v1.13.0 changes into main r=irevoire a=Kerollmops

This PR brings back the changes made in v1.13 into the main branch.

Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: Clémentine <clementine@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2025-02-18 08:05:02 +00:00
meili-bors[bot]
5e7803632d
Merge #5342
5342: Fix workload sha r=dureuill a=ManyTheFish

The dataset shasum was wrong for some workloads making the `/bench workloads/*.json` crash

Co-authored-by: ManyTheFish <many@meilisearch.com>
v1.13.0-rc.3 v1.13.0
2025-02-12 16:27:09 +00:00
meili-bors[bot]
885710a07b
Merge #5341
5341: Embeddings stats r=ManyTheFish a=ManyTheFish

# Pull Request

## Related issue
Fixes #5321

## What does this PR do?
- Add embedding stats
- force dumpless upgrade to recompute stats
- add tests


Co-authored-by: ManyTheFish <many@meilisearch.com>
2025-02-12 15:46:37 +00:00
ManyTheFish
c55fdad2c3 Fix dumpless upgrade target version 2025-02-12 16:35:05 +01:00
ManyTheFish
1caad4c4b0 Add multiple embeddings for the same embedder in tests 2025-02-12 16:13:34 +01:00
ManyTheFish
8419ed52a1 fix clippy 2025-02-12 14:38:51 +01:00
ManyTheFish
a65c52cc97 Convert dump test into snapshots 2025-02-12 14:14:10 +01:00
ManyTheFish
49e9655c24 Update snapshots 2025-02-12 14:05:32 +01:00
meili-bors[bot]
fa763ca5dc
Merge #5339
5339: Add back timeout from v1.11.3 r=Kerollmops a=dureuill

# Pull Request

## Related issue
Fixes #5337

## What does this PR do?
- Fix regression compared with v1.11 by reintroducing the 30s timeout on all REST API calls.

Thanks to `@migueltarga` for reporting the issue


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
v1.13.0-rc.2
2025-02-12 12:50:27 +00:00
ManyTheFish
c7aeb554b2 Add tests 2025-02-12 13:37:41 +01:00
ManyTheFish
88d9d47928 Fix benchmark sha 2025-02-12 13:27:15 +01:00
Louis Dureuil
8e0d8d31f9
Add back timeout from v1.11.3 2025-02-12 11:53:00 +01:00
meili-bors[bot]
81a38099ec
Merge #5336
5336: Meilitool Hair Dryer r=dureuill a=Kerollmops

This pull request introduces a new subcommand to hair dry a specific part of specific indexes. It is useful when [the memory-mapped pages are not hot in the cache](https://arc.net/l/quote/ixhcdwcq) and must be. Hair drying those interesting pages makes the search requests using the vector store much faster.

The previous technique used the "cat method," which consists of reading the whole LMDB data file and pipping it into the null file descriptor. By doing that, the whole LMDB data file becomes hot in the cache. However, when the database is large, at least 30% of it is free, and unused pages and many other pages don't need to be hot, e.g., raw JSON documents or uninteresting parts of the inverted index.

This new subcommand reads all the Arroy pages of a given index to make them hot, and only those. More coming...

The current algorithm is single-threaded and takes a lot of time. I am in the process of multithreading it. This is the time it takes to hair dry a 305GiB database with a single thread.

```
real    21m51.054s
user    0m3.155s
sys     0m19.393s
```

## To Do
- [ ] (optional) Do the reads in parallel.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2025-02-12 10:45:16 +00:00
ManyTheFish
bd27fe7d02 force dumpless upgrade to recompute stats 2025-02-12 11:45:02 +01:00
ManyTheFish
41203f0931 Add embedders stats 2025-02-12 11:37:47 +01:00
Kerollmops
803a699b15
Remove unsafes 2025-02-12 10:46:45 +01:00