662: Enhance word splitting strategy r=ManyTheFish a=akki1306
# Pull Request
## Related issue
Fixes#648
## What does this PR do?
- [split_best_frequency](55d889522b/milli/src/search/query_tree.rs (L282-L301)) to use frequency of word pairs near together with proximity value of 1 instead of considering the frequency of individual words. Word pairs having max frequency are considered.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Akshay Kulkarni <akshayk.gj@gmail.com>
604: Speed up debug builds r=Kerollmops a=loiclec
Note: this draft PR is based on https://github.com/meilisearch/milli/pull/601 , for no particular reason.
## What does this PR do?
Make a series of changes with the goal of speeding up debug builds:
1. Add an `all_languages` feature which compiles charabia with its `default` features activated.
The `all_languages` feature is activated by default. But running:
```
cargo build --no-default-features
```
on `milli` is now much faster.
2. Reduce the debug optimisation level from 3 to 0, except for a few critical dependencies.
3. Compile the build dependencies quicker as well. Previously, all build dependencies were compiled with `opt-level = 3`. Now, only the critical build dependencies are compiled with optimisations.
4. Reduce the amount of code generated by the `documents!` macro
5. Make the "progress update" closure provided to indexing functions a trait object instead of a generic parameter. This avoids monomorphising the indexing code multiple times needlessly.
## Results
Initial build times on my computer before and after these changes:
| | cargo check | cargo check --no-default-features | cargo test | cargo test --lib | cargo test --no-default-features | cargo test --lib --no-default-features |
|--------|-------------|-----------------------------------|------------|------------------|----------------------------------|----------------------------------------|
| before | 1m05s | 1m05s | 2m06s | 1m47s | 2m06 | 1m47s |
| after | 28.9s | 13.1s | 40s | 38s | 23s | 21s |
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
658: Add proximity calculation for the same word r=ManyTheFish a=msvaljek
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/milli/issues/647
## What does this PR do?
- During [the increase of the current word position](d94339a858/milli/src/update/index_documents/extract/extract_word_pair_proximity_docids.rs (L129-L135)) we extract the proximity between the current position and the next one.
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: msvaljek <marko.svaljek@commercetools.com>
654: Re-upload milli's logo r=curquiza a=jeertmans
# Pull Request
## Related issue
None
## What does this PR do?
Apparently, some [commit](add96f921b) deleted the logo file, and updated the `src` path. It seems to me that this was an error, and that the logo file should have been moved, not deleted.
This fixes the problem of seeing this (see image) instead of the actual logo.
![image](https://user-images.githubusercontent.com/27275099/193786803-e0d11a59-48fa-4331-bd92-48457969d766.png)
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Co-authored-by: Jérome Eertmans <jeertmans@icloud.com>
650: Add missing logging timer to extractors r=Kerollmops a=vishalsodani
# Pull Request
## What does this PR do?
#645
<!-- Please link the issue you're trying to fix with this PR, if none then please create an issue first. -->
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: vishalsodani <vishalsodani@rediffmail.com>
653: Fix#652 - Change Spelling of `author` in `README.md` r=curquiza a=anirudhRowjee
# Pull Request
## What does this PR do?
Fixes#652
- Changes spellings of `au{hor` to `author`
- Minor formatting changes in Markdown
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Co-authored-by: Anirudh Rowjee <ani.rowjee@gmail.com>
649: Update Hacktoberfest section in CONTRIBUTING.md r=curquiza a=meili-bot
_This PR is auto-generated._
Following: af850854e4
Update Hacktoberfest section in CONTRIBUTING.md with the global guideline information.
Co-authored-by: meili-bot <74670311+meili-bot@users.noreply.github.com>
642: Remove LTO in release profile r=Kerollmops a=loiclec
Since we can't enable it in Meilisearch (see https://github.com/meilisearch/meilisearch/pull/2717 ), we should not enable it in milli either. The goal is for milli's benchmarks to accurately represent its performance within meilisearch.
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
641: Remove `helpers` crate r=Kerollmops a=loiclec
# Pull Request
## What does this PR do?
Remove the `helpers` crates, because (I think) we don't use it. This should have been part of https://github.com/meilisearch/milli/pull/636 , but I forgot about it then :)
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
636: Remove unused `infos`, `http-ui`, and `milli/fuzz`, crates r=ManyTheFish a=loiclec
We haven't used the `infos/`, `http-ui/` and `milli/fuzz/` crates in a long time. They are not properly maintained and probably do not work correctly anymore.
This PR removes these crates entirely from the workspace to reduce the amount of code we need to maintain.
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
635: Use an unstable algorithm for `grenad::Sorter` when possible r=Kerollmops a=loiclec
# Pull Request
## What does this PR do?
Use an unstable algorithm to sort the internal vector used by `grenad::Sorter` whenever possible to speed up indexing.
In practice, every time the merge function creates a `RoaringBitmap`, we use an unstable sort. For every other merge function, such as `keep_first`, `keep_last`, etc., a stable sort is used.
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
638: Update version for the next release (v0.33.4) in Cargo.toml files r=curquiza a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one before merging.
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
637: We avoid skipping errors in the indexing pipeline r=ManyTheFish a=Kerollmops
This PR is related to https://github.com/meilisearch/meilisearch/issues/2764 and should fix it when merged into Meilisearch.
Co-authored-by: Kerollmops <clement@meilisearch.com>
632: Make charabia default feature optional r=ManyTheFish a=vincent-herlemont
# Pull Request
## What does this PR do?
Fixes [#627](https://github.com/meilisearch/milli/issues/627#issuecomment-1239769122)
Thank you so much for contributing to Meilisearch!
Co-authored-by: Vincent Herlemont <vincent@herlemont.fr>
633: Upgrade ubuntu-18.04 to 20.04 r=Kerollmops a=curquiza
Ubuntu-18.04 is going to be deprecated by GitHub
https://github.com/actions/runner-images/issues/6002
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>