Commit Graph

9976 Commits

Author SHA1 Message Date
ManyTheFish
0bc1a18f52 Use Languages list detected during indexing at search time 2023-02-01 18:57:43 +01:00
ManyTheFish
643d99e0f9 Add expectancy test 2023-02-01 18:39:54 +01:00
Kerollmops
a36b1dbd70
Fix the tasks with the new patterns 2023-02-01 18:21:45 +01:00
dependabot[bot]
5672165e44
Bump docker/build-push-action from 3 to 4
Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 3 to 4.
- [Release notes](https://github.com/docker/build-push-action/releases)
- [Commits](https://github.com/docker/build-push-action/compare/v3...v4)

---
updated-dependencies:
- dependency-name: docker/build-push-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-02-01 17:02:17 +00:00
Kerollmops
d563ed8a39
Making it work with index uid patterns 2023-02-01 17:51:30 +01:00
bors[bot]
36cae3b480
Merge #3399
3399: Rework technical information in the README r=Kerollmops a=curquiza

Following this https://github.com/meilisearch/meilisearch/pull/3346#discussion_r1073289399

Co-authored-by: curquiza <clementine@meilisearch.com>
Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
2023-02-01 14:34:55 +00:00
ManyTheFish
064158e4e2 Update test 2023-02-01 15:34:01 +01:00
ManyTheFish
77d32d0ee8 Fix codec deserialization 2023-02-01 15:26:26 +01:00
ManyTheFish
f4569b04ad Update Charabia version 2023-02-01 15:26:26 +01:00
bors[bot]
5e12af88e2
Merge #3445
3445: Bump milli to v0.41.1 r=curquiza a=dureuill

# Pull Request

## Related issue

Fixes #3438.

## What does this PR do?
- Bump milli to [v0.41.1](https://github.com/meilisearch/milli/releases/tag/v0.41.1) that includes a bugfix for #3438 

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-02-01 11:07:46 +00:00
Louis Dureuil
231067a1c4
Bump milli to v0.41.1 2023-02-01 11:53:39 +01:00
Vibhav Bobade
2a1a7ef00a Integrate Uffizzi 2023-02-01 13:06:27 +05:30
bors[bot]
758b4acea7
Merge #776
776: Reduce incremental indexing time of `words_prefix_position_docids` DB r=curquiza a=loiclec

Fixes partially https://github.com/meilisearch/milli/issues/605

The `words_prefix_position_docids` can easily contain millions of entries. Thus, iterating
over it can be very expensive. But we do so needlessly for every document addition tasks.

It can sometimes cause indexing performance issues when :
- a user sends many `documentAdditionOrUpdate` tasks that cannot be all batched together (for example if they are interspersed with `documentDeletion` tasks)
- the documents contain long, diverse text fields, thus increasing the number of entries in `words_prefix_position_docids`
- the index has accumulated many soft-deleted documents, further increasing the size of `words_prefix_position_docids`
- the machine running Meilisearch does not have great IO performance (e.g. slow SSD, or quota-limited by the cloud provider)

Note, before approving  the PR: the only changed file should be `milli/src/update/words_prefix_position_docids.rs`.

Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2023-01-31 15:52:28 +00:00
bors[bot]
20f8184c06
Merge #3441
3441: Fix import of dump v2 r=dureuill a=irevoire

# Pull Request
This bug was introduced because of a mistake we did earlier: We said the last version to export dump v2 was the v0.21.0 while it was the v0.22.0.
To fix the bug I updated our whole v2 reader to use the code from meilisearch v0.22.0.
Also:
- Import the bugged dump in the tests
- Test the import of this dump in the v2 reader and current reader

## Related issue
Fixes #3435


Co-authored-by: Tamo <tamo@meilisearch.com>
2023-01-31 13:23:57 +00:00
bors[bot]
2f8ebd0501
Merge #3439
3439: Add git config about ownership in Docker CI r=curquiza a=curquiza

The docker CI si failing because of git usage: https://github.com/meilisearch/meilisearch/actions/runs/4053334082/jobs/6973827940

<img width="960" alt="Capture d’écran 2023-01-31 à 12 12 44" src="https://user-images.githubusercontent.com/20380692/215745119-b866bcf2-7077-48e4-b018-7a2085b23680.png">


> fatal: detected dubious ownership in repository at '/home/meili/actions-runner/_work/meilisearch/meilisearch'

I made some research and I found out this https://github.com/actions/runner-images/issues/6775

Co-authored-by: curquiza <clementine@meilisearch.com>
2023-01-31 12:58:59 +00:00
Tamo
6be9a828fa makes clippy happy 2023-01-31 13:03:28 +01:00
Tamo
4b7b2d6a90 fix the import of dump v2 generated by meilisearch v0.22.0 2023-01-31 13:03:28 +01:00
bors[bot]
a4e8158239
Merge #774
774: Update version for the next release (v0.41.1) in Cargo.toml files r=curquiza a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one before merging.

Co-authored-by: curquiza <curquiza@users.noreply.github.com>
2023-01-31 11:51:42 +00:00
bors[bot]
151e52c481
Merge #3433
3433: Add prototype guide to CONTRIBUTING.md r=curquiza a=curquiza



Co-authored-by: curquiza <clementine@meilisearch.com>
Co-authored-by: Clémentine Urquizar - curqui <clementine@meilisearch.com>
2023-01-31 11:25:46 +00:00
curquiza
e269027cdd Add git config about ownershio in Docker CI 2023-01-31 12:04:41 +01:00
Loïc Lecrenier
a2690ea8d4 Reduce incremental indexing time of words_prefix_position_docids DB
This database can easily contain millions of entries. Thus, iterating
over it can be very expensive.

For regular `documentAdditionOrUpdate` tasks, `del_prefix_fst_words`
will always be empty. Thus, we can save a significant amount of time
by adding this `if !del_prefix_fst_words.is_empty()` condition.

The code's behaviour remains completely unchanged.
2023-01-31 11:42:24 +01:00
bors[bot]
33f61d2cd4
Merge #775
775: Fix clippy for Rust 1.67, allow `uninlined_format_args` r=dureuill a=dureuill

# Pull Request

milli part of https://github.com/meilisearch/meilisearch/pull/3437

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-01-31 10:29:24 +00:00
bors[bot]
544b581b15
Merge #3437
3437: Make clippy happy for Rust 1.67, allow uninlined_format_args r=Kerollmops a=dureuill

# Pull Request

This PR is the equivalent of #3434 for the `release-v1.0.0` branch.

See #3434 for more information.

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-01-31 10:29:12 +00:00
f3r10
2922c5c899 Fix code format 2023-01-31 11:28:05 +01:00
f3r10
7681be5367 Format code 2023-01-31 11:28:05 +01:00
f3r10
50bc156257 Fix tests 2023-01-31 11:28:05 +01:00
f3r10
d8207356f4 Skip script,language insertion if language is undetected 2023-01-31 11:28:05 +01:00
f3r10
2d58b28f43 Improve script language codec 2023-01-31 11:28:05 +01:00
f3r10
fd60a39f1c Format code 2023-01-31 11:28:05 +01:00
f3r10
369c05732e Add test checking if from script_language_docids database were removed
deleted docids
2023-01-31 11:28:05 +01:00
f3r10
34d04f3d3f Filter from script_language_docids database soft deleted documents 2023-01-31 11:28:05 +01:00
f3r10
a27f329e3a Add tests for checking that detected script and language associated with document(s) were stored during indexing 2023-01-31 11:28:05 +01:00
f3r10
b216ddba63 Delete and clear data from the new database 2023-01-31 11:28:05 +01:00
f3r10
d97fb6117e Extract and index data 2023-01-31 11:28:05 +01:00
f3r10
c45d1e3610 Create a new database on index and add a specialized codec for it 2023-01-31 11:28:05 +01:00
Louis Dureuil
5c0668afcf
clippy: allow uninlined_format_args 2023-01-31 11:13:47 +01:00
Louis Dureuil
20f05efb3c
clippy: needless_lifetimes 2023-01-31 11:12:59 +01:00
Louis Dureuil
cbf029f64c
clippy: --fix 2023-01-31 11:12:59 +01:00
curquiza
bffabf9cc6 Update version for the next release (v0.41.1) in Cargo.toml files 2023-01-31 09:56:22 +00:00
bors[bot]
f647b20818
Merge #3434
3434: Make clippy happy for Rust 1.67, allow `uninlined_format_args` r=Kerollmops a=dureuill

# Pull Request

This PR allows `uninlined_format_args` in CI for clippy.

This is due to https://github.com/rust-lang/rust-clippy/issues/10087, which in particular has correctness issues wrt edition 2018 crates, and is a big change altogether. https://github.com/rust-lang/rust-clippy/pull/10265 is already open in order to change the category of this lint to "pedantic", meaning that if this latter PR merges, a future Rust release will accept our code unmodified wrt uninlined format arguments.

As a result, this PR introduces the following changes:

1. Allow `uninlined_format_args` in the clippy command in CI.
2. Use rewind rather than seek(0)
3. Remove lifetimes that clippy deems needless.

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-01-31 09:45:12 +00:00
Louis Dureuil
924d5d4c11
clippy: remove needless lifetimes 2023-01-31 10:40:48 +01:00
Louis Dureuil
771a367b97
clippy: use rewind instead of seek 0 2023-01-31 10:40:48 +01:00
Louis Dureuil
07603373f3
clippy: allow uninlined_format_args 2023-01-31 10:15:07 +01:00
Louis Dureuil
d91f8fc493
clippy: Allow uninlined_format_args in CI 2023-01-31 09:56:26 +01:00
Louis Dureuil
3296cf7ae6
clippy: remove needless lifetimes 2023-01-31 09:32:40 +01:00
Louis Dureuil
89675e5f15
clippy: Replace seek 0 by rewind 2023-01-31 09:32:40 +01:00
Louis Dureuil
47b7d515ed
Add more detailed contribution instructions for tests 2023-01-30 17:39:05 +01:00
Clémentine Urquizar - curqui
2ba4629938
Update CONTRIBUTING.md
Co-authored-by: Many the fish <many@meilisearch.com>
2023-01-30 15:56:30 +01:00
curquiza
982dd76042 Improve readability 2023-01-30 14:36:22 +01:00
curquiza
3505ee47f8 Add volume to docker command 2023-01-30 14:33:50 +01:00