Commit Graph

9067 Commits

Author SHA1 Message Date
Louis Dureuil
452a343a2b
Fix imports 2024-02-28 18:09:40 +01:00
meili-bors[bot]
b87485e80d
Merge #4433
4433: Enhance facet incremental r=Kerollmops a=ManyTheFish

# Pull Request

## Related issue
Fixes #4367
Fixes #4409

## What does this PR do?

- Add a test reproducing #4409
- Fix #4409 by removing a document from a level only if it is no more present in all the linked sub-level nodes
- Optimize facet Incremental indexing by creating or deleting a complete level once per field id instead of for each facet value
- Optimize facet Incremental indexing by doing the additions and the deletions in the same process instead of doing them separately


Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-02-28 15:28:46 +00:00
meili-bors[bot]
147a67dc82
Merge #4446
4446: Do not omit vectors when importing a dump r=irevoire a=dureuill

# Pull Request

## Related issue
Fixes #4447 

## What does this PR do?
- Correctly populate the maps of embedders before starting the indexing operations, while importing a dump


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-02-27 09:11:00 +00:00
Louis Dureuil
716ffc07ee
Build the embedders when importing a dump 2024-02-26 22:15:57 +01:00
meili-bors[bot]
b005eb3289
Merge #4435
4435: Make update file deletion atomic r=Kerollmops a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4432
Fixes https://github.com/meilisearch/meilisearch/issues/4438 by adding the logs the user asked

## What does this PR do?
- Adds a bunch of logs to help debug this kind of issue in the future
- Delete the update files AFTER committing the update in the `index-scheduler` (thus, if a restart happens, we are able to re-process the batch successfully)
- Multi-thread the deletion of all update files.


Co-authored-by: Tamo <tamo@meilisearch.com>
2024-02-26 17:54:40 +00:00
meili-bors[bot]
9e664d87eb
Merge #4443
4443: Add GPU analytics r=dureuill a=dureuill

# Pull Request

## Related issue

Adds analytics indicating whether Meilisearch  was compiled with the `milli/cuda` feature.

Cc `@macraig` 

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-02-26 17:13:45 +00:00
meili-bors[bot]
6dcb5219a0
Merge #4442
4442: Send custom task r=ManyTheFish a=irevoire

This PR has already been merged on main but was supposed to be merged on `release-v1.7.0` thus we need to merge it a second time; sorry 😓 

### This PR implements the necessary parameters for the High Availability

Introduce a new CLI flag called `--experimental-replication-parameters` that changes a few behaviors in the engine:
- [The auto-deletion of tasks is disabled](https://specs.meilisearch.com/specifications/text/0060-tasks-api.html#_2-technical-details)
- Upon registering a task, you can choose its task ID by sending a new header: `TaskId: 456645`. It must be a valid number, which must be superior to the last task id ever seen.
- Add the ability to « dry-register » a task. That means meilisearch will answer to you with a valid task ID like everything went well, but won’t actually write anything in the database. To do that, you need to use the `DryRun: true` header.
- Specification’s here: https://github.com/meilisearch/specifications/pull/266

Co-authored-by: Tamo <tamo@meilisearch.com>
2024-02-26 15:20:16 +00:00
ManyTheFish
5e83bac448 Fix PR comments 2024-02-26 15:40:15 +01:00
Tamo
0562818c2a fix and remove the file-store hack of /dev/null 2024-02-26 13:59:41 +01:00
Tamo
a478392b7a create a test with the dry-run parameter enabled 2024-02-26 13:59:41 +01:00
Tamo
bbf3fb88ca rename the cli parameter 2024-02-26 13:59:40 +01:00
Tamo
60510e037b update the discussion link 2024-02-26 13:58:04 +01:00
Tamo
36c27a18a1 implement the dry run ha parameter 2024-02-26 13:58:04 +01:00
Tamo
1eb1c043b5 disable the auto deletion of tasks when the ha mode is enabled 2024-02-26 13:58:04 +01:00
Tamo
507739bd98 add an experimental cli parameter to allow specifying your task id 2024-02-26 13:58:03 +01:00
Tamo
eb25b07390 let you specify your task id 2024-02-26 13:56:31 +01:00
Tamo
066a7a3cde takes only one read transaction per thread 2024-02-26 10:43:04 +01:00
Louis Dureuil
55796406c5
Add GPU analytics 2024-02-26 10:41:47 +01:00
Tamo
91cdd502f8 When processing tasks, make the update file deletion atomic 2024-02-22 14:56:22 +01:00
ManyTheFish
a493a50825 Fix clippy 2024-02-22 14:53:33 +01:00
ManyTheFish
9d1f489a37 Fix facet incremental indexing 2024-02-21 18:42:16 +01:00
ManyTheFish
865b415b3f Add test rerpoducing bug 2024-02-15 16:00:48 +01:00
meili-bors[bot]
5ee6aaddc4
Merge #4418
4418: Output logs to stderr r=dureuill a=irevoire

Output the logs to `stderr` instead of `stdout`. This was introduced in the `v1.7.0-rc.0` and is a bug; logs should always be outputted to stderr.

Fix #4419

Co-authored-by: Tamo <tamo@meilisearch.com>
2024-02-15 14:31:37 +00:00
Tamo
4148d391b8 move logs to stderr 2024-02-15 15:24:16 +01:00
meili-bors[bot]
88c6165e20
Merge #4410
4410: Implement the experimental log mode cli flag and log level updates at runtime r=dureuill a=irevoire

# Pull Request
This PR fixes two issues at once because they’re highly correlated in the codebase.

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4415
Fixes https://github.com/meilisearch/meilisearch/issues/4413

## What does this PR do?
- It makes the fmt logger configurable to output json or human-readable logs (like we already do today)
- It moves the fmt logger under a `reload` layer so we can update its targets at runtime
- Add the possibility to stream logs in the json mode
- Adds an analytics for the new CLI flag

Co-authored-by: Tamo <tamo@meilisearch.com>
2024-02-15 10:01:06 +00:00
Tamo
d097431113
Update meilisearch/src/option.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-02-15 10:58:43 +01:00
Tamo
1f8af81ba9 update the log mode discussion link 2024-02-15 10:32:48 +01:00
Tamo
5d3bad4120
Update meilisearch/src/option.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-02-15 10:31:23 +01:00
meili-bors[bot]
d34692e30b
Merge #4365
4365: Update charabia r=dureuill a=ManyTheFish

Update Charabia v0.8.7,

- Add Vietnamese Normalization (Ð and Đ into d)

Fixes #4357

Charabia versions:
- https://github.com/meilisearch/charabia/releases/tag/v0.8.6
- https://github.com/meilisearch/charabia/releases/tag/v0.8.7

Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-02-14 16:57:25 +00:00
Tamo
a081da0d90 add support for the json format in the stream route 2024-02-14 15:34:39 +01:00
ManyTheFish
78e04520fc Update charabia version 2024-02-14 15:16:16 +01:00
meili-bors[bot]
72c1674a31
Merge #4350
4350: Make several indexing optimizations r=Kerollmops a=ManyTheFish

# Summary

Implement several enhancements to reduce the indexing time.

# Steps

- Compute the indexing chunk size dynamically based on the available threads and the data size
- Remove the merging step before the writing step and merge at the writing time
- Remove append function
- Make Facet search indexing incremental

# Running Indexing process

## `main`
Each type of data is written after a merging phase:
![Capture d’écran 2024-01-23 à 10 18 08](https://github.com/meilisearch/meilisearch/assets/6482087/6203c3ce-407c-46b4-8b83-04282da1bb16)

> Highlighted parts are the writings

## `remove-merging-phase-from-indexing`
When the extraction of a chunk is finished, the data is written:
![Capture d’écran 2024-01-23 à 10 18 18](https://github.com/meilisearch/meilisearch/assets/6482087/ab1307b4-d0a9-42ac-abbb-fdeb27ddf0d4)

> Highlighted parts are the writings

## Related

This PR removes the appending writes on several indexing parts, which may fix https://github.com/meilisearch/meilisearch/issues/4300. However, all of the appending writes are not removed. There are 2 remaining calls that could trigger this bug:
- When [putting embedders in the settings](b6fc181993/milli/src/update/settings.rs (L996))
- when [bulk indexing the facets](b6fc181993/milli/src/update/facet/bulk.rs (L150))


Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
2024-02-14 14:12:48 +00:00
ManyTheFish
03bb6372af Change is_batchable_with by mergeable_with 2024-02-14 11:50:22 +01:00
ManyTheFish
3beda8833d Fix and add logs 2024-02-14 11:46:30 +01:00
Tamo
3b6544db6d Implement the experimental log mode cli flag 2024-02-13 18:09:15 +01:00
ManyTheFish
55e942cd45 buggy 2024-02-13 15:26:30 +01:00
ManyTheFish
48026aa75c fix PR comments 2024-02-13 15:19:01 +01:00
Many the fish
e5e811e2c9
Update milli/src/update/index_documents/extract/mod.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-02-13 14:22:21 +01:00
Many the fish
55de96f74e
Update milli/src/update/facet/mod.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-02-13 14:22:10 +01:00
meili-bors[bot]
15dafde21d
Merge #4401
4401: Update version for the next release (v1.7.0) in Cargo.toml r=irevoire a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.

Co-authored-by: irevoire <irevoire@users.noreply.github.com>
2024-02-12 10:17:10 +00:00
irevoire
290f6d15e7 Update version for the next release (v1.7.0) in Cargo.toml 2024-02-12 10:15:00 +00:00
ManyTheFish
39c83cb3d9 fix clippy 2024-02-12 09:12:54 +01:00
Louis Dureuil
7efb1cae11 yield in loop when the channel is not disconnected 2024-02-12 09:12:54 +01:00
Louis Dureuil
7877788510 fix logs 2024-02-12 09:12:54 +01:00
ManyTheFish
be1b054b05
Compute chunk size based on the input data size ant the number of indexing threads 2024-02-08 17:28:37 +01:00
meili-bors[bot]
023c2d755f
Merge #4391
4391: Tracing r=dureuill a=irevoire

# Pull Request

- [ ] Hide the parameters of the process batch
- [x] Make actix-web trace every call on every route
- [x] Remove all `env_logger`/`logs` dependencies
- [x] Be able to enable or disable the memory measurement using the `/logs` route parameters

See the following product discussion: https://github.com/orgs/meilisearch/discussions/721

Supersedes https://github.com/meilisearch/meilisearch/pull/4338

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4317

## What does this PR do?

Update the format of the logs from:
```
[2024-02-06T14:54:11Z INFO  actix_server::builder] starting 10 workers
```

to

```
2024-02-06T13:58:14.710803Z  INFO actix_server::builder: 200: starting 10 workers
```

First, run meilisearch with the route enabled via the feature flag:
- `cargo run --experimental-enable-logs-route`
- Or at runtime by sending the following payload:
```
curl \
  -X PATCH 'http://localhost:7700/experimental-features/' \
  -H 'Content-Type: application/json'  \
--data-binary '{
    "logsRoute": true
  }'
```

Then gather data from meilisearch by calling for example:
```
curl \
	-X POST http://localhost:7700/logs \
	-H 'Content-Type: application/json' \
	--data-binary '{
	    "mode": "fmt",
            "target": "milli=trace"
    }'
```

Once your operation is over, tell meilisearch to stop the route:
```
curl \
	-X DELETE http://localhost:7700/logs
```

----

In the case you’re profiling code, you will be interested by the next command that converts the output of the route to a format that the firefox profiler can understand.

```bash
cargo run --release --bin trace-to-firefox -- 2024-01-17_17:07:55-indexing-trace.json
```

Then go to https://profiler.firefox.com and load it.
Note that we can also share the profiles using the https://share.firefox.dev website.


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
2024-02-08 14:16:56 +00:00
Louis Dureuil
407ad753ed
rust fmt 2024-02-08 15:11:42 +01:00
Tamo
285aa15d2f
make the mode camelCase instead of lowercase 2024-02-08 15:04:06 +01:00
Tamo
bf43a3f60a
fix typo 2024-02-08 15:04:06 +01:00
Tamo
2c88131bb1
rename the fmt mode to human 2024-02-08 15:04:06 +01:00