Commit Graph

606 Commits

Author SHA1 Message Date
Kerollmops
65528a3e06 Update version for the next release (v1.4.0) in Cargo.toml 2023-08-28 11:52:28 +00:00
dependabot[bot]
e59d7f238c
Bump rustls-webpki from 0.100.1 to 0.100.2
Bumps [rustls-webpki](https://github.com/rustls/webpki) from 0.100.1 to 0.100.2.
- [Release notes](https://github.com/rustls/webpki/releases)
- [Commits](https://github.com/rustls/webpki/compare/v/0.100.1...v/0.100.2)

---
updated-dependencies:
- dependency-name: rustls-webpki
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-08-22 18:10:53 +00:00
Kerollmops
717b069907
Bump charabia to 0.8.3 2023-08-22 16:25:00 +02:00
irevoire
b947f3bb9d Update version for the next release (v1.3.2) in Cargo.toml 2023-08-16 08:20:36 +00:00
ManyTheFish
cab27c2ab4 upgrade indexmap = "2.0.0" 2023-08-10 18:09:02 +02:00
ManyTheFish
624fa9052f upgrade deserr = "0.6.0" 2023-08-10 18:09:02 +02:00
ManyTheFish
359ede4862 upgrade fastrand = "2.0.0" 2023-08-10 18:09:02 +02:00
ManyTheFish
60c11dbdbd upgrade rstar - "0.11.0" 2023-08-10 18:09:02 +02:00
ManyTheFish
dacee40ebc upgrade memmap2 = "0.7.1" 2023-08-10 18:09:02 +02:00
ManyTheFish
6089083a8e upgrade sysinfo = "0.29.7" 2023-08-10 18:09:02 +02:00
ManyTheFish
cc2c19d4c3 upgrade itertools = "0.10.5" 2023-08-10 18:09:02 +02:00
ManyTheFish
a5c56fac8a Update dependencies 2023-08-10 18:09:02 +02:00
meili-bors[bot]
e4e49e63d0
Merge #3993
3993: Bringing back changes from v1.3.1 to `main` r=irevoire a=curquiza



Co-authored-by: irevoire <irevoire@users.noreply.github.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-08-10 14:30:02 +00:00
irevoire
75c87d5391 Update version for the next release (v1.3.1) in Cargo.toml 2023-08-08 10:30:06 +00:00
ManyTheFish
b45c36cd71 Merge branch 'main' into tmp-release-v1.3.0 2023-08-01 15:05:17 +02:00
Clément Renault
d8b47b689e
Use the new read-txn-no-tls heed feature 2023-07-26 15:45:15 +02:00
Kerollmops
29ab54b259
Replace the hnsw crate by the instant-distance one 2023-07-25 12:37:35 +02:00
ManyTheFish
0497f93494 Update Charabia to the last version 2023-07-19 15:19:32 +02:00
meili-bors[bot]
2dfbb6813a
Merge #3913
3913: Expose a Puffin server to profile the indexing process r=Kerollmops a=Kerollmops

This PR exposes a puffin HTTP server to expose the internal timing it takes to index documents, delete documents, or update the settings of an index.

<img width="1752" alt="Capture d’écran 2023-07-10 à 18 44 58" src="https://github.com/meilisearch/meilisearch/assets/3610253/a3c7a6bf-db5b-42f4-8be1-c4e31c869843">

## To be done

 - [x] Move the puffin HTTP server under a feature flag.
 - [x] Use [the `puffin::set_scopes_on` function](https://docs.rs/puffin/latest/puffin/fn.set_scopes_on.html) to toggle it (by using the feature directly).
     When this function is called with `false`, [a call to `profile_scope!` talked 1-2ns](https://docs.rs/puffin/latest/puffin/fn.set_scopes_on.html).
 - [x] Create a _PROFILING.md_ file explaining how to use it.
   - [x] Explain that merging scopes on the interface is not always useful.
 - [x] Add more info on the number of batched tasks (using the `puffin::profile_scope!` macro data).
   - I added more info, but that's more continuous work when we consider we need more info here and there.
 - [x] Clean up some scopes, and don't touch too much code to inject puffin.
   - I am not sure that the _index_documents/mod.rs_ function is that complex with the addition of the scope.
 - [x] Think about what we consider frames. One indexation operation or the wall program. When must we stop the frame, then?
   - What we consider a frame is one single `IndexScheduler::tick` execution.
   - We can change that later.

Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-07-19 09:44:01 +00:00
Kerollmops
eef95de30e
First iteration on exposing puffin profiling 2023-07-18 17:38:13 +02:00
ManyTheFish
c106906f8f deactivate camelCase segmentation 2023-07-13 12:06:27 +02:00
Kerollmops
8ba1c8f88f
Update proc-macro2 to compile with the latest nightly 2023-07-12 11:47:27 +02:00
Kerollmops
e7f8daaf86
Update criterion to 0.5.1 to remove the atty dependency 2023-07-03 18:51:42 +02:00
Kerollmops
d1ff631df8
Replace the atty dependency with the is-terminal one 2023-07-03 18:51:42 +02:00
gillian-meilisearch
1d40452057 Update version for the next release (v1.3.0) in Cargo.toml 2023-07-03 08:32:21 +00:00
meili-bors[bot]
661d1f90dc
Merge #3866
3866: Update charabia v0.8.0 r=dureuill a=ManyTheFish

# Pull Request

Update Charabia:
- enhance Japanese segmentation
- enhance Latin Tokenization
  - words containing `_` are now properly segmented into several words
  - brackets `{([])}` are no more considered as context separators so word separated by brackets are now considered near together for the proximity ranking rule
- fixes #3815
- fixes #3778
- fixes [product#151](https://github.com/meilisearch/product/discussions/151)

> Important note: now the float numbers are segmented around the `.` so `3.22` is segmented as [`3`, `.`, `22`] but the middle dot isn't considered as a hard separator, which means that if we search `3.22` we find documents containing `3.22`

Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-06-29 15:24:36 +00:00
ManyTheFish
e8dee3ca65 Update lock file 2023-06-29 17:02:24 +02:00
ManyTheFish
84845de9ef Update Charabia 2023-06-29 15:56:32 +02:00
Kerollmops
a385642ec3
Replace the BTreeMap by an IndexMap to return values in order 2023-06-29 14:33:31 +02:00
Kerollmops
737aec1705
Expose an _semanticSimilarity as a dot product in the documents 2023-06-27 12:32:41 +02:00
Kerollmops
c79e82c62a
Move back to the hnsw crate
This reverts commit 7a4b6c065482f988b01298642f4c18775503f92f.
2023-06-27 12:32:39 +02:00
Kerollmops
268a9ef416
Move to the hgg crate 2023-06-27 12:32:38 +02:00
Clément Renault
4571e512d2
Store the vectors in an HNSW in LMDB 2023-06-27 12:32:38 +02:00
Clément Renault
34349faeae
Create a new _vector extractor 2023-06-27 12:32:37 +02:00
meili-bors[bot]
45636d315c
Merge #3670
3670: Fix addition deletion bug r=irevoire a=irevoire

The first commit of this PR is a revert of https://github.com/meilisearch/meilisearch/pull/3667. It re-enable the auto-batching of addition and deletion of tasks. No new changes have been introduced outside of `milli`. So all the changes you see on the autobatcher have actually already been reviewed.

It fixes https://github.com/meilisearch/meilisearch/issues/3440.

### What was happening?

The issue was that the `external_documents_ids` generated in the `transform` were used in a very strange way that wasn’t compatible with the deletion of documents.
Instead of doing a clear merge between the external document IDs of the DB and the one returned by the transform + writing it on disk, we were doing some weird tricks with the soft-deleted to avoid writing the fst on disk as much as possible.
The new algorithm may be a bit slower but is way more straightforward and doesn’t change depending on if the soft deletion was used or not. Here is a list of the changes introduced:
1. We now do a clear distinction between the `new_external_documents_ids` coming from the transform and only held on RAM and the `external_documents_ids` coming from the DB.
2. The `new_external_documents_ids` (coming out of the transform) are now represented as an `fst`. We don't need to struggle with the hard, soft distinction + the soft_deleted => That's easier to understand
3. When indexing documents, we merge the `external_documents_ids` coming from the DB and the `new_external_documents_ids` coming from the transform.

### Other things introduced in this  PR

Since we constantly have to write small, very specialized fuzzers for this kind of bug, we decided to push the one used to reproduce this bug.
It's not perfect, but it's easy to improve in the future.
It'll also run for as long as possible on every merge on the main branch.

Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Loïc Lecrenier <loic.lecrenier@icloud.com>
2023-06-19 09:09:30 +00:00
Tamo
6c6387d05e
move the fuzzer to its own crate 2023-05-29 12:27:39 +02:00
Tamo
4391cba6ca
fix the addition + deletion bug 2023-05-17 18:28:57 +02:00
Kerollmops
1a79fd0c3c
Use the new heed v0.12.6 2023-05-15 11:42:30 +02:00
Kerollmops
c4a40e7110
Use the writemap flag to reduce the memory usage 2023-05-15 10:15:33 +02:00
curquiza
3533d4f2bb Update version for the next release (v1.2.0) in Cargo.toml 2023-05-08 17:52:33 +00:00
Louis Dureuil
90bc230820
Merge remote-tracking branch 'origin/main' into search-refactor
Conflicts | resolution
----------|-----------
Cargo.lock | added mimalloc
Cargo.toml |  took origin/main version
milli/src/search/criteria/exactness.rs | deleted after checking it was only clippy changes
milli/src/search/query_tree.rs | deleted after checking it was only clippy changes
2023-05-03 12:19:06 +02:00
ManyTheFish
1bf2694604 Update cargo lock 2023-04-26 17:41:29 +02:00
Kerollmops
a109802d45
Upgrade the incompatible versions of the dependencies 2023-04-24 17:50:57 +02:00
Kerollmops
47b66e49b8
Upgrade the compatible versions of the dependencies 2023-04-24 17:50:52 +02:00
bors[bot]
654a3a9e19
Merge #3688
3688: Following release v1.1.1: bring back changes into `main` r=curquiza a=curquiza

`@meilisearch/engine-team` ensure the changes we bring to `main` are the ones you want

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: dureuill <dureuill@users.noreply.github.com>
2023-04-24 11:38:23 +00:00
dependabot[bot]
f0b4046c43
Bump h2 from 0.3.15 to 0.3.17
Bumps [h2](https://github.com/hyperium/h2) from 0.3.15 to 0.3.17.
- [Release notes](https://github.com/hyperium/h2/releases)
- [Changelog](https://github.com/hyperium/h2/blob/master/CHANGELOG.md)
- [Commits](https://github.com/hyperium/h2/compare/v0.3.15...v0.3.17)

---
updated-dependencies:
- dependency-name: h2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-04-13 17:03:48 +00:00
dureuill
cd45d21d6e Update version for the next release (v1.1.1) in Cargo.toml 2023-04-13 13:25:10 +00:00
Loïc Lecrenier
9259cdb12e Update Cargo.lock (was mistakenly changed during rebase) 2023-03-20 09:41:56 +01:00
Loïc Lecrenier
6c659dc12f Use MiMalloc in milli tests 2023-03-20 09:41:37 +01:00
curquiza
577e7126f9 Update version for the next release (v1.1.0) in Cargo.toml 2023-03-06 13:52:54 +00:00
Louis Dureuil
42577403d8
Authentication: Directly pass the authfilter to the index scheduler 2023-02-22 16:35:52 +01:00
bors[bot]
39407885c2
Merge #3347
3347: Enhance language detection r=irevoire a=ManyTheFish

## Summary

Some completely unrelated Languages can share the same characters, in Meilisearch we detect the Languages using `whatlang`, which works well on large texts but fails on small search queries leading to a bad segmentation and normalization of the query.

This PR now stores the Languages detected during the indexing in order to reduce the Languages list that can be detected during the search.

## Detail

- Create a 19th database mapping the scripts and the Languages detected with the documents where the Language is detected
- Fill the newly created database during indexing
- Create an allow-list with this database and pass it to Charabia
- Add a test ensuring that a Japanese request containing kanjis only is detected as Japanese and not Chinese

## Related issues
Fixes #2403
Fixes #3513

Co-authored-by: f3r10 <frledesma@outlook.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
2023-02-21 10:52:13 +00:00
ManyTheFish
8aa808d51b Merge branch 'main' into enhance-language-detection 2023-02-20 18:14:34 +01:00
bors[bot]
1e9ac00800
Merge #3505
3505: Csv delimiter r=irevoire a=irevoire

Fixes https://github.com/meilisearch/meilisearch/issues/3442
Closes https://github.com/meilisearch/meilisearch/pull/2803
Specified in https://github.com/meilisearch/specifications/pull/221

This PR is a reimplementation of https://github.com/meilisearch/meilisearch/pull/2803, on the new engine. Thanks for your idea and initial PR `@MixusMinimax;` sorry I couldn’t update/merge your PR. Way too many changes happened on the engine in the meantime.

**Attention to reviewer**; I had to update deserr to implement the support of deserializing `char`s

-------

It introduces four new error messages;
- Invalid value in parameter csvDelimiter: expected a string of one character, but found an empty string
- Invalid value in parameter csvDelimiter: expected a string of one character, but found the following string of 5 characters: doggo
- csv delimiter must be an ascii character. Found: 🍰 
- The Content-Type application/json does not support the use of a csv delimiter. The csv delimiter can only be used with the Content-Type text/csv.

And one error code;
- `invalid_index_csv_delimiter`

The `invalid_content_type` error code is now also used when we encounter the `csvDelimiter` query parameter with a non-csv content type.

Co-authored-by: Tamo <tamo@meilisearch.com>
2023-02-20 17:01:36 +00:00
ManyTheFish
cb8d5f2d4b Update Charabia to 0.7.1 2023-02-20 14:00:31 +01:00
Tamo
8c074f5028 implements the csv delimiter without tests
Co-authored-by: Maxi Barmetler <maxi.barmetler@gmail.com>
2023-02-16 17:35:36 +01:00
Tamo
74d1a67a99 Use the workspace inheritance feature of rust 1.64 2023-02-15 13:51:07 +01:00
Tamo
a43765d454
use the pre-defined deserr extractors 2023-02-14 20:05:30 +01:00
Tamo
8fb7b1d10f
bump deserr 2023-02-14 20:04:30 +01:00
Clément Renault
4570d5bf3a
Merge remote-tracking branch 'origin/main' into temp-wildcard 2023-02-09 13:14:05 +01:00
dependabot[bot]
5f56e6dd58
Bump tokio from 1.24.1 to 1.24.2
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.24.1 to 1.24.2.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/commits)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-02-07 12:14:05 +00:00
bors[bot]
c88c3637b4
Merge #3461
3461: Bring v1 changes into main r=curquiza a=Kerollmops

Also bring back changes in milli (the remote repository) into main done during the pre-release

Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Philipp Ahlner <philipp@ahlner.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-02-07 11:27:27 +00:00
bors[bot]
821d92b5d0
Merge #3407
3407: Add Cargo feature for LMDB's POSIX semaphores r=dureuill a=GregoryConrad

See https://github.com/meilisearch/milli/pull/757

Co-authored-by: Gregory Conrad <gregorysconrad@gmail.com>
2023-02-07 08:25:20 +00:00
Kerollmops
fbec48f56e
Merge remote-tracking branch 'milli/main' into bring-v1-changes 2023-02-06 16:48:10 +01:00
Kerollmops
a377a49218
Make meiliserach depend on the local milli 2023-02-06 16:44:43 +01:00
Kerollmops
a015e232ab
Merge remote-tracking branch 'origin/release-v1.0.0' into bring-v1-changes 2023-02-06 16:41:10 +01:00
Kerollmops
d563ed8a39
Making it work with index uid patterns 2023-02-01 17:51:30 +01:00
ManyTheFish
f4569b04ad Update Charabia version 2023-02-01 15:26:26 +01:00
Louis Dureuil
231067a1c4
Bump milli to v0.41.1 2023-02-01 11:53:39 +01:00
Tamo
8356f109c1 bump milli to fix the last test 2023-01-25 16:45:11 +01:00
bors[bot]
217504fff3
Merge #3406
3406: Master Key: Implements errors and warnings from the specification r=irevoire a=dureuill

<sub>Now in technicolor</sub>

# Pull Request

## What does this PR do?
- Uses `atty` and `termcolor` as dependency
- Use these dependencies to print colored background for warning messages
- Update messages to match https://github.com/meilisearch/specifications/pull/209

## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-01-23 16:39:18 +00:00
Tamo
c79b6a1ee4
bump milli 2023-01-23 14:13:19 +01:00
dependabot[bot]
5f4497935f
Bump libgit2-sys from 0.14.1+1.5.0 to 0.14.2+1.5.1
Bumps [libgit2-sys](https://github.com/rust-lang/git2-rs) from 0.14.1+1.5.0 to 0.14.2+1.5.1.
- [Release notes](https://github.com/rust-lang/git2-rs/releases)
- [Commits](https://github.com/rust-lang/git2-rs/compare/0.14.1...libgit2-sys-0.14.2)

---
updated-dependencies:
- dependency-name: libgit2-sys
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-01-20 23:38:40 +00:00
Gregory Conrad
3f69dd6450 feat: add Cargo feature for LMDB's POSIX semaphores 2023-01-19 12:08:38 -05:00
Louis Dureuil
0de9a3ffe7
Implements errors and warnings from the specification
Now in technicolor
2023-01-19 18:04:45 +01:00
Loïc Lecrenier
c71a8ea183 Update to latest milli and deserr 2023-01-17 13:10:38 +01:00
Loïc Lecrenier
07b90dec08 Remove unused proptest dependency 2023-01-17 11:07:07 +01:00
Loïc Lecrenier
9194508a0f Refactor query parameter deserialisation logic 2023-01-17 11:07:07 +01:00
Loïc Lecrenier
766dd830ae Update deserr to latest version + add new error codes for missing fields
- missing_api_key_indexes
- missing_api_key_actions
- missing_api_key_expires_at

- missing_swap_indexes_indexes
2023-01-17 09:43:07 +01:00
Loïc Lecrenier
436ae4e466 Improve error messages generated by deserr
Split Json and Query Parameter error types
2023-01-17 09:43:07 +01:00
Kerollmops
507a7bad96
Use the local milli subcrate 2023-01-16 17:35:54 +01:00
Kerollmops
cde62fcb5b
Merge remote-tracking branch 'origin/release-v1.0.0' into import-milli 2023-01-16 17:35:18 +01:00
Kerollmops
03a82136dc
Remove the useless cli subcrate 2023-01-16 17:08:43 +01:00
Kerollmops
97005dd505
Bump the milli-imported crates to v1.0.0 2023-01-16 16:29:12 +01:00
Kerollmops
0cec352d2b
Merge remote-tracking branch 'milli/main' into import-milli 2023-01-16 16:20:22 +01:00
Tamo
bf573885ea
integrate the latest version of milli 2023-01-11 19:08:39 +01:00
Loïc Lecrenier
b0b7ad7caf
Apply review suggestions 2023-01-11 19:08:39 +01:00
Loïc Lecrenier
c91ffec72e
Update Cargo.toml 2023-01-11 19:08:39 +01:00
Loïc Lecrenier
1fc11264e8
Refactor deserr integration 2023-01-11 19:08:39 +01:00
Tamo
d17efb9ed6
use the published version of deserr 2023-01-09 12:51:10 +01:00
Tamo
50ce0409bc
Integrate deserr on the most important routes 2023-01-05 20:48:29 +01:00
curquiza
28408816ef Update version for the next release (v1.0.0) in Cargo.toml files 2023-01-05 11:45:15 +00:00
Loïc Lecrenier
2d74678b51 Replace underscores with hyphens in doc link to error code 2023-01-05 10:09:02 +01:00
Louis Dureuil
fcbd47281b
Fix tests 2023-01-04 14:24:20 +01:00
Louis Dureuil
0e98a71a24
Update milli to v0.38 2023-01-04 14:24:20 +01:00
Louis Dureuil
66e18eae79
auth: add generate_master_key function 2022-12-22 11:55:27 +01:00
Tamo
d8fb506c92
handle most io error instead of tagging everything as an internal 2022-12-19 20:50:40 +01:00
Kerollmops
60c3bac108 Bump milli to v0.37.3 2022-12-14 17:25:40 +01:00
jiangbo212
7c24fea9f2 Merge branch 'main' into fix-3037 2022-12-13 05:16:03 +08:00
curquiza
4631f4d97f Bump milli to v0.37.2 2022-12-08 18:16:48 +01:00