5144: Exactly 512 bytes docid fails r=Kerollmops a=dureuill
# Pull Request
## Related issue
Fixes #5050
## What does this PR do?
- Return a user error rather than an internal one for docids of exactly 512 bytes (see the sketch below)
- Fix the error message to indicate that docids of exactly 512 bytes are not supported.
- Fix the error message to reflect that index uids are actually limited to 400 bytes in length
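Below is a minimal sketch of the new length check, assuming a hypothetical `validate_docid` helper and error type (the real code uses milli's own error types): docids up to 511 bytes pass, and 512 bytes or more now produce a user error.
```rust
/// Hypothetical limit and error type; Meilisearch's real names differ.
const MAX_DOCID_BYTES: usize = 511;

#[derive(Debug)]
enum UserError {
    InvalidDocumentId { docid: String },
}

fn validate_docid(docid: &str) -> Result<(), UserError> {
    // A docid of exactly 512 bytes must also be rejected, hence the 511-byte maximum.
    if docid.len() > MAX_DOCID_BYTES {
        return Err(UserError::InvalidDocumentId { docid: docid.to_string() });
    }
    Ok(())
}

fn main() {
    assert!(validate_docid(&"a".repeat(511)).is_ok());
    // A docid of exactly 512 bytes now yields a user error instead of an internal one.
    if let Err(err) = validate_docid(&"a".repeat(512)) {
        println!("user error: {err:?}");
    }
}
```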
## Impact
- Impacts docs:
  - Update [this paragraph](https://www.meilisearch.com/docs/learn/resources/known_limitations#length-of-primary-key-values) to say 511 bytes instead of 512
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
5153: Return docid in case of errors while rendering the document template r=Kerollmops a=dureuill
Improves the error message:
Before:
```
ERROR index_scheduler: Batch failed Index `mieli`: user error: missing field in document: liquid: Unknown index
with:
variable=doc
requested index=title
available indexes=by, id, kids, parent, text, time, type
```
After:
```
ERROR index_scheduler: Batch failed Index `mieli`: user error: missing field in document `11345147`: liquid: Unknown index
with:
variable=doc
requested index=title
available indexes=by, id, kids, parent, text, time, type
```
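A minimal sketch of the change, with a hypothetical error type standing in for the real milli error: the docid is carried alongside the underlying liquid error so it can appear in the message.
```rust
use std::fmt;

/// Hypothetical error type; the real code extends milli's template/prompt errors.
#[derive(Debug)]
struct MissingFieldInDocument {
    docid: String,
    source: String, // the underlying liquid error, kept as plain text in this sketch
}

impl fmt::Display for MissingFieldInDocument {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Include the docid so the log points at the offending document.
        write!(f, "missing field in document `{}`: {}", self.docid, self.source)
    }
}

fn main() {
    let err = MissingFieldInDocument {
        docid: "11345147".to_string(),
        source: "liquid: Unknown index".to_string(),
    };
    eprintln!("user error: {err}");
}
```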
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
5146: Offline upgrade v1.12 r=irevoire a=ManyTheFish
# Pull Request
## Related issue
Fixes #4978
## What does this PR do?
- Add a `v1_11_to_v1_12` function to upgrade Meilisearch from v1.11 to v1.12
- Convert the update files from OBKV to ndjson format (a conversion sketch follows below)
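Below is a rough sketch of the conversion step, assuming a hypothetical `obkv_to_json` decoder (the real upgrade code decodes OBKV with milli's readers and field-id maps): each document is re-serialized as one JSON object per line.
```rust
use std::io::Write;

use serde_json::{Map, Value};

/// Hypothetical decoder: the real upgrade code reads OBKV entries with milli's
/// readers and maps field ids back to field names.
fn obkv_to_json(raw: &[u8]) -> Result<Map<String, Value>, Box<dyn std::error::Error>> {
    // Placeholder: pretend the payload is already JSON for this sketch.
    Ok(serde_json::from_slice(raw)?)
}

/// Rewrite an update file as NDJSON: one JSON object per document, one per line.
fn convert_update_file<W: Write>(
    raw_documents: &[Vec<u8>],
    mut output: W,
) -> Result<(), Box<dyn std::error::Error>> {
    for raw in raw_documents {
        let document = obkv_to_json(raw)?;
        serde_json::to_writer(&mut output, &document)?;
        output.write_all(b"\n")?;
    }
    Ok(())
}
```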
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
5148: Do not duplicate NDJSON data when unnecessary r=dureuill a=Kerollmops
This PR improves the NDJSON support. Currently, we save all of the user's document content into a temporary file, validate its content, and then convert everything into NDJSON in the file store (the update files of the tasks).
This is a waste of time when users are already sending NDJSON. So, this PR removes that extra copy and stores the user content directly in the file store, validating it from there. If an issue arises, the file does not persist and is dropped/deleted instead.
Related to #5078.
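As a simplified sketch of that flow (the function and path handling are assumptions, not the actual file-store API): the payload is written straight to its final location, validated in place, and removed on failure so no invalid file persists.
```rust
use std::fs::{self, File};
use std::io::{self, BufRead, BufReader};
use std::path::Path;

/// Stream the payload directly into its final location in the file store,
/// then validate each NDJSON line in place. On error, remove the file so
/// nothing invalid persists. The path layout is an assumption for this sketch.
fn store_ndjson_payload(payload: &mut impl io::Read, path: &Path) -> io::Result<()> {
    let mut file = File::create(path)?;
    io::copy(payload, &mut file)?; // single write, no intermediate temporary copy
    file.sync_all()?;

    let reader = BufReader::new(File::open(path)?);
    for (index, line) in reader.lines().enumerate() {
        let line = line?;
        if line.trim().is_empty() {
            continue;
        }
        if let Err(e) = serde_json::from_str::<serde_json::Value>(&line) {
            // Invalid NDJSON: drop the file instead of persisting it.
            let _ = fs::remove_file(path);
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                format!("invalid NDJSON on line {}: {e}", index + 1),
            ));
        }
    }
    Ok(())
}
```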
Co-authored-by: Kerollmops <clement@meilisearch.com>
5145: Use bumparaw-collections in Meilisearch/milli r=dureuill a=Kerollmops
This PR is related to #5078. It uses the now-published bumparaw-collections crate and (soon) makes the `RawMap` hasher nonrandom.
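For the "nonrandom hasher" part, here is a generic illustration of the technique using only the standard library (not bumparaw-collections' actual API): replacing the default randomly seeded hasher with a fixed-state one makes hashing deterministic across runs.
```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

// `BuildHasherDefault<DefaultHasher>` always starts from the same state,
// unlike `RandomState`, which seeds itself randomly in every process.
type NonRandomMap<K, V> = HashMap<K, V, BuildHasherDefault<DefaultHasher>>;

fn main() {
    let mut map: NonRandomMap<&str, u32> = NonRandomMap::default();
    map.insert("doc", 1);
    assert_eq!(map.get("doc"), Some(&1));
}
```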
Co-authored-by: Kerollmops <clement@meilisearch.com>
5141: Use the right amount of max memory and not impact the settings r=curquiza a=Kerollmops
Fixes #5132. Related to #5125.
Co-authored-by: Kerollmops <clement@meilisearch.com>
5056: Attach index name in error message r=irevoire a=airycanon
# Pull Request
## Related issue
Fixes #4392
## What does this PR do?
- ...
## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: airycanon <airycanon@airycanon.me>
5123: Fix batch details r=dureuill a=irevoire
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/5079
Fixes https://github.com/meilisearch/meilisearch/issues/5112
## What does this PR do?
- Report the tasks that are being processed as processing in the batch stats instead of enqueued
- Stop counting one extra task for all non-prioritized batches in the stats
- Add a test
Co-authored-by: Tamo <tamo@meilisearch.com>
5125: Change the default max memory usage to 5% of the total memory r=ManyTheFish a=Kerollmops
After thorough testing, we found that dedicating 5% of the total available memory to resident memory (caches and channels) is the best approach.
The main reason is that the new indexer is highly memory-map oriented, with LMDB, and reads the database while performing the indexation. So, by leaving the maximum amount of memory available to LMDB and the OS, it performs the key-value store reads and all other indexation operations faster by keeping more pages hot in the cache. In #5124, we also sorted the entries to merge to improve the read speed of LMDB.
This is common in database management systems: reading from disk is much faster when done in lexicographic order (the default sort order of the key values), since the entries have a great chance of already being in the OS memory cache from a previous read, and reading from disk is very slow compared to reading memory.
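As a rough sketch of the resulting default (the memory probe is a placeholder, not how Meilisearch actually detects it): take 5% of the detected total memory as the indexing budget and leave the rest to LMDB and the OS page cache.
```rust
/// Hypothetical probe: the real code detects total memory via a system crate.
fn total_memory_bytes() -> Option<u64> {
    Some(16 * 1024 * 1024 * 1024) // pretend the machine has 16 GiB
}

/// Default max memory budget: 5% of the total memory, or a conservative
/// fallback when the total cannot be detected.
fn default_max_memory() -> u64 {
    const FALLBACK: u64 = 100 * 1024 * 1024; // 100 MiB, an arbitrary fallback for this sketch
    match total_memory_bytes() {
        Some(total) => total / 20, // 5%
        None => FALLBACK,
    }
}

fn main() {
    println!("max indexing memory: {} bytes", default_max_memory());
}
```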
Co-authored-by: Kerollmops <clement@meilisearch.com>
5124: Optimize Prefixes and Merges r=ManyTheFish a=Kerollmops
In this PR, we optimize the LMDB reads by accessing the entries in lexicographic order and making better use of the memory-mapped OS cache:
- Optimize the prefix generation for word position docids (`@ManyTheFish`)
- Optimize the parallel merging of the caches by sorting entries before merging them (`@Kerollmops`)
## Benchmarks on 1cpu 2gb gpo3 (5k IOps)
Before, on the tag `meilisearch-v1.12.0-rc.3`:
```
word_position_docids:merge_and_send_docids: 988s
compute_word_fst: 23.3s
word_pair_proximity_docids:merge_and_send_docids: 428s
compute_word_prefix_fid_docids:recompute_modified_prefixes: 76.3s
compute_word_prefix_position_docids:recompute_modified_prefixes:from_prefixes: 429s
```
After sorting the whole `HashMap`s into a `Vec` on this branch:
```
word_position_docids:merge_and_send_docids: 202s
compute_word_fst: 20.4s
word_pair_proximity_docids:merge_and_send_docids: 427s
compute_word_prefix_fid_docids:recompute_modified_prefixes: 65.5s
compute_word_prefix_position_docids:recompute_modified_prefixes:from_prefixes: 62.5s
```
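As a minimal illustration of the sorting idea behind these numbers (the cache and write types are placeholders, not milli's real ones): the hash map entries are collected into a `Vec`, sorted by key, and only then flushed, so the key-value store is accessed in lexicographic order.
```rust
use std::collections::HashMap;

/// Placeholder for an LMDB write; the real code merges into heed/LMDB databases.
fn write_entry(key: &[u8], value: &[u8]) {
    let _ = (key, value);
}

/// Sort the cached entries by key before flushing them, so lookups and writes
/// touch the key-value store in lexicographic order and stay in the OS cache.
fn flush_cache_sorted(cache: HashMap<Vec<u8>, Vec<u8>>) {
    let mut entries: Vec<_> = cache.into_iter().collect();
    entries.sort_unstable_by(|(a, _), (b, _)| a.cmp(b));
    for (key, value) in entries {
        write_entry(&key, &value);
    }
}

fn main() {
    let mut cache = HashMap::new();
    cache.insert(b"word/2".to_vec(), b"docids-b".to_vec());
    cache.insert(b"word/1".to_vec(), b"docids-a".to_vec());
    flush_cache_sorted(cache); // entries are flushed as word/1 then word/2
}
```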
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>