5135: Check all search filter attributes are filterable upfront r=curquiza a=jameshiew
# Pull Request
## Related issue
Fixes#5069
## What does this PR do?
- checks all `fid`s in the `Filter` tree are filterable before evaluating search query
- returns AttributeNotFilterable error if any are not
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: James Hiew <james@hiew.net>
5187: Bring back v1.12.0 of pre-release changes into `main` r=irevoire a=curquiza
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Many the fish <many@meilisearch.com>
5174: Split tests for option crate meilisearch in a separate test file r=irevoire a=K-Kumar-01
# Pull Request
Splits the tests for meilisearch option crate in a separate testfile.
## Related issue
Partially solves #5116
## What does this PR do?
- Splits the test for `/src/option.rs` into a separate file `/src/option_test.rs` in meilisearch crate
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Co-authored-by: Kushal Kumar <kushalkumargupta4@gmail.com>
5175: fix the flaky batches test r=dureuill a=irevoire
## What does this PR do?
I finally reproduced the flaky test in the CI here: https://github.com/meilisearch/meilisearch/actions/runs/12390709982/job/34586313125
I cannot reproduce it locally even with `cargo flaky --iter 2000` so I'm not 100% my fix will work.
But what I did was definitely part of the flakyness of the tests, we were querying a batch that could in some cases not be started.
That worked well for the tasks since an enqueued task is already written on disk, but since the batch do not exist if they're not processing they were just missing.
---
I also changed what we were doing because there is no point in doing an indexing process for this test
Co-authored-by: Tamo <tamo@meilisearch.com>
5159: Fix the New Indexer Spilling r=irevoire a=Kerollmops
Fix two bugs in the merging of the spilled caches. Thanks to `@ManyTheFish` and `@irevoire` 👏
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
5158: Indexer edition 2024 fix facet fst r=Kerollmops a=ManyTheFish
# Pull Request
Fix a regression in the new indexer; when several filterable attributes containing strings were set, all the field IDs were shifted, and the last one was overwriting the previous FST.
## What does this PR do?
- Add a test reproducing the bug
- fix the bug
Co-authored-by: ManyTheFish <many@meilisearch.com>
5152: Make xtasks be able to use the specified binary r=dureuill a=Kerollmops
Makes it possible to specify the binary to run. It is useful to run PGO optimized binaries.
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
5147: Batch progress r=dureuill a=irevoire
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/5068
## What does this PR do?
- ...
## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Tamo <tamo@meilisearch.com>
5144: Exactly 512 bytes docid fails r=Kerollmops a=dureuill
# Pull Request
## Related issue
Fixes#5050
## What does this PR do?
- Return a user error rather than an internal one for docids of exactly 512 bytes
- Fix up error message to indicate that exactly 512 bytes long docids are not supported.
- Fix up error message to reflect that index uids are actually limited to 400 bytes in length
## Impact
- Impacts docs:
- update [this paragraph](https://www.meilisearch.com/docs/learn/resources/known_limitations#length-of-primary-key-values) to say 511 bytes instead of 512
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
5153: Return docid in case of errors while rendering the document template r=Kerollmops a=dureuill
Improves error message:
Before:
```
ERROR index_scheduler: Batch failed Index `mieli`: user error: missing field in document: liquid: Unknown index
with:
variable=doc
requested index=title
available indexes=by, id, kids, parent, text, time, type
```
After:
```
ERROR index_scheduler: Batch failed Index `mieli`: user error: missing field in document `11345147`: liquid: Unknown index
with:
variable=doc
requested index=title
available indexes=by, id, kids, parent, text, time, type
```
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
5146: Offline upgrade v1.12 r=irevoire a=ManyTheFish
# Pull Request
## Related issue
Fixes#4978
## What does this PR do?
- add v1_11_to_v1_12 function to upgrade Meilisearch from v1.11 to v1.12
- Convert the update files from OBKV to ndjson format
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
5148: Do not duplicate NDJson data when unecessary r=dureuill a=Kerollmops
This PR improves the NDJSON support. Usually, we save all of the user's document content into a temporary file, validate its content, and then convert everything into NDJSON in the file store (update files in the tasks).
It is a waste of time when users are already sending NDJSON. So, this PR removes the last copy and directly stores the user content in the file store, validating it from the file store. If an issue arises, the file will not persist and will be dropped/deleted instead.
Related to #5078.
Co-authored-by: Kerollmops <clement@meilisearch.com>
5141: Use the right amount of max memory and not impact the settings r=curquiza a=Kerollmops
Fixes#5132. Related to #5125.
Co-authored-by: Kerollmops <clement@meilisearch.com>
5056: Attach index name in error message r=irevoire a=airycanon
# Pull Request
## Related issue
Fixes#4392
## What does this PR do?
- ...
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: airycanon <airycanon@airycanon.me>
5123: Fix batch details r=dureuill a=irevoire
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/5079
Fixes https://github.com/meilisearch/meilisearch/issues/5112
## What does this PR do?
- Make the processing tasks actually processing in the stats of the batch instead of enqueued
- Stop counting one extra task for all non-prioritized batches in the stats
- Add a test
Co-authored-by: Tamo <tamo@meilisearch.com>
5125: Change the default max memory usage to 5% of the total memory r=ManyTheFish a=Kerollmops
After thorough testing, we found that giving 5% of the total available memory to allocate resident memory (caches and channels) is the best approach.
The main reason is that the new indexer is highly memory-map oriented, with LMDB, and reads the database while performing the indexation. So, by allowing the maximum amount of memory available to LMDB and the OS, it will perform the key-value store reads and all other indexation operations faster by keeping more pages hot in the cache. In #5124, we also sorted the entries to merge to improve the read speed of LMDB.
This is common in database management systems: Reading stuff on the disk is much faster when done in lexicographic order (the default sorted order of key values). The entries have a great chance of already being in the OS memory cache, as they were loaded in a previous read, and reading stuff on the disk is very slow compared to reading memory.
Co-authored-by: Kerollmops <clement@meilisearch.com>
5124: Optimize Prefixes and Merges r=ManyTheFish a=Kerollmops
In this PR, we plan to optimize the read of LMDB to use read the entries in lexicographic order and better use the memory-mapping OS cache:
- Optimize the prefix generation for word position docids (`@manythefish)`
- Optimize the parallel merging of the caches to sort entries before merging the caches (`@kerollmops)`
## Benchmarks on 1cpu 2gb gpo3 (5k IOps)
Before on the tag meilisearch-v1.12.0-rc.3.
```
word_position_docids:merge_and_send_docids: 988s
compute_word_fst: 23.3s
word_pair_proximity_docids:merge_and_send_docids: 428s
compute_word_prefix_fid_docids:recompute_modified_prefixes: 76.3s
compute_word_prefix_position_docids:recompute_modified_prefixes:from_prefixes: 429s
```
After sorting the whole `HashMap`s in a `Vec` on this branch.
```
word_position_docids:merge_and_send_docids: 202s
compute_word_fst: 20.4s
word_pair_proximity_docids:merge_and_send_docids: 427s
compute_word_prefix_fid_docids:recompute_modified_prefixes: 65.5s
compute_word_prefix_position_docids:recompute_modified_prefixes:from_prefixes: 62.5s
```
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>