9033 Commits

Author SHA1 Message Date
Clément Renault
af0f6f0bf0
Merge branch 'main' into update-version-v1.4.0 2023-08-28 15:08:59 +02:00
meili-bors[bot]
ccf3ba3f32
Merge #4019
4019: Bringing back changes from `v1.3.2` onto `main` r=irevoire a=Kerollmops



Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: irevoire <irevoire@users.noreply.github.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2023-08-28 12:14:11 +00:00
Kerollmops
65528a3e06 Update version for the next release (v1.4.0) in Cargo.toml 2023-08-28 11:52:28 +00:00
Clément Renault
6db80b0836
Define the full Homebrew formula path 2023-08-24 11:24:47 +02:00
meili-bors[bot]
cdb4b3e024
Merge #4013
4013: Fix the ranking rule by temporarily disabling an assert in the bucket sort algorithm r=Kerollmops a=Kerollmops

This PR temporarily disables an assertion, making the search crash. [I created a tracking issue](https://github.com/meilisearch/meilisearch/issues/4012) to find a better way to fix this.

It no longer reverts a20e4d447ce3022443d25223d34d1a7a8e98e31f, which seemed to generate unreachable graphs and make the bucket sort ranking algorithm panic because of entering an unreachable state. We discussed that below in the comments.

Temporary fixes #4002, fixes #4006, and fixes #3995.

---

It took me approximately 2 days to find the first bad commit just because I'm bad in `git bisect` x `bash`, i.e. [I misused `%1` with `$!` to kill the most recently backgrounded job](https://unix.stackexchange.com/a/340084/212574)...

<details>
  <summary>Here is the script I used to find the invalid commit</summary>

```bash
#!/usr/bin/env bash

set -x

# remove the data
rm -rf data.ms

# build meilisearch
cargo build --release
# ignore this commit if it doesn't compile
if [[ $? != 0 ]]; then
    exit 125
fi

# index the dump and start from it
./target/release/meilisearch \
--http-addr 'localhost:7705' \
--import-dump $HOME/Downloads/modified-20230822-083016113.dump &

# wait 10 sec while it indexes the docs
sleep 5

# check if the server crashes on requests
echo '{
    "q": "rtx 305",
    "attributesToHighlight": [
        "*"
    ],
    "highlightPreTag": "<ais-highlight-0000000000>",
    "highlightPostTag": "</ais-highlight-0000000000>",
    "limit": 21,
    "offset": 0
}' | xh 'localhost:7705/indexes/arvutitark_local_orderables/search'

last_exit_code=$?

# Now kill Meilisearch
kill $!

# Clean the potential Cargo.lock
git checkout .

exit $last_exit_code
```
</details>

Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
v1.3.2
2023-08-23 15:30:56 +00:00
Clément Renault
8c0ebd1331
Update milli/src/search/new/bucket_sort.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-08-23 16:40:39 +02:00
Kerollmops
5130e06b41
Temporarily disable an assert in the ranking rules 2023-08-23 16:11:54 +02:00
Clément Renault
08e27ef73f
Merge pull request #4008 from meilisearch/fix-highlighting-panic
Bump charabia to 0.8.3
2023-08-23 11:56:45 +02:00
meili-bors[bot]
914b125c5f
Merge #3945
3945: Do not leak field information on error r=Kerollmops a=vivek-26

# Pull Request

## Related issue
Fixes #3865

## What does this PR do?
This PR ensures that `InvalidSortableAttribute`and `InvalidFacetSearchFacetName` errors do not leak field information i.e. fields which are not part of `displayedAttributes` in the settings are hidden from the error message.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Vivek Kumar <vivek.26@outlook.com>
2023-08-22 18:55:27 +00:00
dependabot[bot]
e59d7f238c
Bump rustls-webpki from 0.100.1 to 0.100.2
Bumps [rustls-webpki](https://github.com/rustls/webpki) from 0.100.1 to 0.100.2.
- [Release notes](https://github.com/rustls/webpki/releases)
- [Commits](https://github.com/rustls/webpki/compare/v/0.100.1...v/0.100.2)

---
updated-dependencies:
- dependency-name: rustls-webpki
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-08-22 18:10:53 +00:00
Kerollmops
717b069907
Bump charabia to 0.8.3 2023-08-22 16:25:00 +02:00
meili-bors[bot]
7ea154673a
Merge #4000
4000: Update version for the next release (v1.3.2) in Cargo.toml r=irevoire a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.

Co-authored-by: irevoire <irevoire@users.noreply.github.com>
2023-08-16 10:41:33 +00:00
irevoire
b947f3bb9d Update version for the next release (v1.3.2) in Cargo.toml 2023-08-16 08:20:36 +00:00
meili-bors[bot]
4c35817c5f
Merge #3998
3998: Accept the `null` JSON value as a value of the `_vectors` field r=irevoire a=Kerollmops

This PR fixes #3979 by accepting `null` JSON values in the `_vectors` fields provided by the user.

Can the reviewer please verify that I am merging in the right branch?
I think we must create a new _release-v1.3.2_.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-08-16 08:12:24 +00:00
Kerollmops
c53841e166
Accept the null JSON value as the value of _vectors 2023-08-14 16:03:55 +02:00
meili-bors[bot]
fd81945597
Merge #3987
3987: Update dependencies for v1.4 r=curquiza a=ManyTheFish

# Pull Request

## Related issue
Fixes #3870 

## What does this PR do?
- [Update dependencies](d7ff5368b4)
- [upgrade itertools = "0.10.5"](d0582d01f4)
- [upgrade sysinfo = "0.29.7"](507c661352)
- [upgrade memmap2 = "0.7.1"](489e0d5cd0)
- [upgrade rstar = "0.11.0"](3d9d08e3b2)
- [upgrade fastrand = "2.0.0"](1af7083c48)
- [upgrade deserr = "0.6.0"](7fe77045af)
- [upgrade indexmap = "2.0.0"](95e4960b0c)
- [update rust toolchain = "1.71.1"](937b7b5da5)

## Remaining un-upgraded dependencies
- vergen 7.5.1 --> 8.2.4: I wasn't able to quickly understand the changes in the lib API to upgrade the dependency
- rustls 0.20.8 --> 0.21.6: Meilisearch doesn't have any direct dependency on it


Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-08-10 16:46:17 +00:00
ManyTheFish
794e491152 update rust toolchain 2023-08-10 18:09:02 +02:00
ManyTheFish
cab27c2ab4 upgrade indexmap = "2.0.0" 2023-08-10 18:09:02 +02:00
ManyTheFish
624fa9052f upgrade deserr = "0.6.0" 2023-08-10 18:09:02 +02:00
ManyTheFish
359ede4862 upgrade fastrand = "2.0.0" 2023-08-10 18:09:02 +02:00
ManyTheFish
60c11dbdbd upgrade rstar - "0.11.0" 2023-08-10 18:09:02 +02:00
ManyTheFish
dacee40ebc upgrade memmap2 = "0.7.1" 2023-08-10 18:09:02 +02:00
ManyTheFish
6089083a8e upgrade sysinfo = "0.29.7" 2023-08-10 18:09:02 +02:00
ManyTheFish
cc2c19d4c3 upgrade itertools = "0.10.5" 2023-08-10 18:09:02 +02:00
ManyTheFish
a5c56fac8a Update dependencies 2023-08-10 18:09:02 +02:00
meili-bors[bot]
e4e49e63d0
Merge #3993
3993: Bringing back changes from v1.3.1 to `main` r=irevoire a=curquiza



Co-authored-by: irevoire <irevoire@users.noreply.github.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-08-10 14:30:02 +00:00
meili-bors[bot]
00bd7bd19a
Merge #3990
3990: Removed unnecessary borrow call that failed nightly tests r=irevoire a=JannisK89

# Pull Request

## Related issue
Fixes #3988

## What does this PR do?
- Removes unnecessary borrow call that was causing warnings when running tests on nightly.

## PR checklist
Please check if your PR fulfills the following requirements:
- [ x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ x] Have you read the contributing guidelines?
- [ x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!

Please let me know if there is anything else I can do to improve this PR.
Thank you.

Co-authored-by: JannisK89 <jannis.karanikis@gmail.com>
2023-08-10 11:42:19 +00:00
meili-bors[bot]
ef3d098b4d
Merge #3976
3976: Fix the get stats method r=ManyTheFish a=irevoire

# Pull Request

- The get stats method of the index-scheduler was not using at all the processing tasks. That was returning a wrong number of enqueued tasks and 0 processing tasks.
- Added a test
- Currently this method was **ONLY** used to compute the `meilisearch_nb_tasks` field of the **experimental feature** metrics.

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3972


Co-authored-by: Tamo <tamo@meilisearch.com>
v1.3.1
2023-08-10 10:55:50 +00:00
meili-bors[bot]
8084cf29f3
Merge #3946
3946: Settings customizing tokenization r=irevoire a=ManyTheFish

# Pull Request
This pull Request allows the User to customize Meilisearch Tokenization by providing specialized settings.

## Small documentation
All the new settings can be set and reset like the other index settings by calling the route `/indexes/:name/settings`

### `nonSeparatorTokens`
The Meilisearch word segmentation uses a default list of separators to segment words, however, for specific use cases some of the default separators shouldn't be considered separators, the `nonSeparatorTokens` setting allows to remove of some tokens from the default list of separators.

***Request payload `PUT`- `/indexes/articles/settings/non-separator-tokens`***
```json
["`@",` "#", "&"]
```

### `separatorTokens`
Some use cases need to define additional separators, some are related to a specific way of parsing technical documents some others are related to encodings in documents,  the `separatorTokens` setting allows adding some tokens to the list of separators.

***Request payload `PUT`- `/indexes/articles/settings/separator-tokens`***
```json
["&sect;", "&sep"]
```

### `dictionary`
The Meilisearch word segmentation relies on separators and language-based word-dictionaries to segment words, however, this segmentation is inaccurate on technical or use-case specific vocabulary (like `G/Box` to say `Gear Box`), or on proper nouns (like `J. R. R.` when parsing `J. R. R. Tolkien`), the `dictionary` setting allows defining a list of words that would be segmented as described in the list.

***Request payload `PUT`- `/indexes/articles/settings/dictionary`***
```json
["J. R. R.", "J.R.R."]
```

these last feature synergies well with the `stopWords` setting or the `synonyms` setting allowing to segment words and correctly retrieve the synonyms:
***Request payload `PATCH`- `/indexes/articles/settings`***
```json
{
    "dictionary": ["J. R. R.", "J.R.R."],
    "synonyms": {
            "J.R.R.": ["jrr", "J. R. R."],
            "J. R. R.": ["jrr", "J.R.R."],
            "jrr": ["J.R.R.", "J. R. R."],
    }
}
```

### Related specifications:
- https://github.com/meilisearch/specifications/pull/255
- https://github.com/meilisearch/specifications/pull/254

### Try it with Docker

```bash
$ docker pull getmeili/meilisearch:prototype-tokenizer-customization-3
```

## Related issue
Fixes #3610
Fixes #3917
Fixes https://github.com/meilisearch/product/discussions/468
Fixes https://github.com/meilisearch/product/discussions/160
Fixes https://github.com/meilisearch/product/discussions/260
Fixes https://github.com/meilisearch/product/discussions/381
Fixes https://github.com/meilisearch/product/discussions/131
Related to https://github.com/meilisearch/meilisearch/issues/2879

Fixes #2760

## What does this PR do?
- Add a setting `nonSeparatorTokens` allowing to remove a token from the default separator tokens
- Add a setting `separatorTokens` allowing to add a token in the separator tokens
- Add a setting `dictionary` allowing to override the segmentation on specific words
- add new error code `invalid_settings_non_separator_tokens` (invalid_request)
- add new error code `invalid_settings_separator_tokens` (invalid_request)
- add new error code `invalid_settings_dictionary` (invalid_request)

Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
2023-08-10 10:01:18 +00:00
ManyTheFish
5a7c1bde84 Fix clippy 2023-08-10 11:27:56 +02:00
ManyTheFish
6b2d671be7 Fix PR comments 2023-08-10 10:44:07 +02:00
Many the fish
43c13faeda
Update milli/src/update/index_documents/extract/extract_docid_word_positions.rs
Co-authored-by: Tamo <tamo@meilisearch.com>
2023-08-10 10:05:03 +02:00
meili-bors[bot]
29adfc2f68
Merge #3989
3989: Improve test suite CI for manual trigger events r=irevoire a=curquiza

# Why?

To be able to test https://github.com/meilisearch/meilisearch/issues/3988 before merging the PR solving it

# How do we ensure this PR works?

I triggered `workflow_dispatch` (i.e. manual trigger) on this branch, and we can see all the jobs have been triggered (even if some of them are failing -> it's another issue)
https://github.com/meilisearch/meilisearch/actions/runs/5810609073

We can see the tests triggered by the PR are restricted as expected: https://github.com/meilisearch/meilisearch/actions/runs/5810605977

Co-authored-by: curquiza <clementine@meilisearch.com>
2023-08-10 07:55:48 +00:00
JannisK89
064ee95b1c removed unnecessary borrow call 2023-08-10 08:41:25 +02:00
curquiza
604d533b31 Improve test suite CI for workflow_dispatch event 2023-08-09 16:47:28 +02:00
meili-bors[bot]
44c1900f36
Merge #3986
3986: Fix geo bounding box with strings r=ManyTheFish a=irevoire

# Pull Request

When sending a document with one geofield of type string (i.e.: `{ "_geo": { "lat": 12, "lng": "13" }}`), the geobounding box would exclude this document.

This PR fixes this issue by automatically parsing the string value in case we're working on a geofield.

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/3973

## What does this PR do?
- Automatically parse the facet value iif we're working on a geofield.
- Make insta works with snapshots in loops or closure executed multiple times. (you may need to update your cli if it panics after this PR: `cargo install cargo-insta`).
- Add one integration test in milli and in meilisearch to ensure it works forever.
- Add three snapshots for the dump that mysteriously disappeared I don't know how


Co-authored-by: Tamo <tamo@meilisearch.com>
2023-08-09 07:58:15 +00:00
meili-bors[bot]
04671d0751
Merge #3981
3981: Truncate the normalized long facets used in the search for facet value r=irevoire a=ManyTheFish

# Pull Request
 Truncate the normalized long facets used in the search for facet value

## targeted release

v1.3.1

## Related issue
Fixes #3978


Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-08-08 15:07:07 +00:00
Tamo
4f4c669d50 add back some dump snapshots that disappeared. it's completely unrelated to this PR 2023-08-08 16:58:14 +02:00
ManyTheFish
8dc5acf998 Try fix prototype-fix-synonyms-with-separators-0 2023-08-08 16:52:36 +02:00
ManyTheFish
fc2590fc9d Add a test 2023-08-08 16:43:08 +02:00
ManyTheFish
35758db9ec Truncate the the normalized long facets used in search for facet value 2023-08-08 16:38:30 +02:00
Tamo
4988199bb9 ensure the geoboundingbox works with strings and int geofields in milli and meilisearch 2023-08-08 16:29:25 +02:00
Tamo
83991ee770 enable the multi-snapshot attribute in insta. This will let us use insta in loops 2023-08-08 16:28:38 +02:00
Tamo
9d061cec26 automatically parse the filterable attribute to float if it's a geo field 2023-08-08 16:28:07 +02:00
ManyTheFish
4a21fecf67 Merge branch 'main' into settings-customizing-tokenization prototype-tokenizer-customization-3 2023-08-08 16:08:16 +02:00
ManyTheFish
ae8e69c030 Add API route for the new settings 2023-08-08 16:03:16 +02:00
Tamo
fe819a9d80 fix the get stats method
It was not taking into account the processing tasks at all
2023-08-08 13:21:15 +02:00
meili-bors[bot]
e338ceb97f
Merge #3982
3982: Update version for the next release (v1.3.1) in Cargo.toml r=irevoire a=meili-bot

⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.

Co-authored-by: irevoire <irevoire@users.noreply.github.com>
2023-08-08 10:30:56 +00:00
irevoire
75c87d5391 Update version for the next release (v1.3.1) in Cargo.toml 2023-08-08 10:30:06 +00:00
Vivek Kumar
dd57873f8e
hide fields not in the displayedAttributes list from errors 2023-08-05 16:03:10 +05:30