4664: Update README.md r=curquiza a=tpayet
Add hybrid & semantic as a feature
# Pull Request
## Related issue
Fixes #<issue_number>
## What does this PR do?
- ...
## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Thomas Payet <thomas@meilisearch.com>
4663: Bring back release v1.8.1 into main r=ManyTheFish a=ManyTheFish
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: ManyTheFish <ManyTheFish@users.noreply.github.com>
Co-authored-by: Many the fish <many@meilisearch.com>
4657: Update version for the next release (v1.9.0) in Cargo.toml r=curquiza a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
4655: Remove `exportPuffinReport` experimental feature r=Kerollmops a=Kerollmops
This PR fixes#4605 by removing every trace of Puffin. Puffin is a great tool, but we use a better approach to measuring performance.
Co-authored-by: Clément Renault <clement@meilisearch.com>
4646: Reduce `Transform`'s disk usage r=Kerollmops a=Kerollmops
This PR implements what is described in #4485. It reduces the number of disk writes and disk usage.
Co-authored-by: Clément Renault <clement@meilisearch.com>
4633: Allow to mark vectors as "userProvided" r=Kerollmops a=dureuill
# Pull Request
## Related issue
Fixes#4606
## What does this PR do?
[See usage in PRD](https://meilisearch.notion.site/v1-9-AI-search-changes-e90d6803eca8417aa70a1ac5d0225697#deb96fb0595947bda7d4a371100326eb)
- Extends the shape of the special `_vectors` field in documents.
- previously, the `_vectors` field had to be an object, with each field the name of a configured embedder, and each value either `null`, an embedding (array of numbers), or an array of embeddings.
- In this PR, the value of an embedder in the `_vectors` field can additionally be an object. The object has two fields:
1. `embeddings`: `null`, an embedding (array of numbers), or an array of embeddings.
2. `userProvided`: a boolean indicating if the vector was provided by the user.
- The previous form `embedder_or_array_of_embedders` is semantically equivalent to:
```json
{
"embeddings": embedder_or_array_of_embedders,
"userProvided": true
}
```
- During the indexing step, the subfields and values of the `_vectors` field that have `userProvided` set to **false** are added in the vector DB, but not in the documents DB: that means that future modifications of the documents will trigger a regeneration of that particular vector using the document template.
- This allows **importing** embeddings as a one-shot process, while still retaining the ability to regenerate embeddings on document change.
- The dump process now uses this ability: it enriches the `_vectors` fields of documents with the embeddings that were autogenerated, marking them as not `userProvided`. This allows importing the vectors from a dump without regenerating them.
### Tests
This PR adds the following tests
- Long-needed hybrid search tests of a simple hf embedder
- Dump test that imports vectors. Due to the difficulty of actually importing a dump in tests, we just read the dump and check it contains the expected content.
- Tests in the index-scheduler: this tests that documents containing the same kind of instructions as in the dump indexes as expected
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
4648: Update version for the next release (v1.8.1) in Cargo.toml r=ManyTheFish a=meili-bot
⚠️ This PR is automatically generated. Check the new version is the expected one and Cargo.lock has been updated before merging.
Co-authored-by: ManyTheFish <ManyTheFish@users.noreply.github.com>
4642: Index the _geo fields when changing the setting while there is already documents in the DB r=ManyTheFish a=irevoire
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4640
Fixes https://github.com/meilisearch/meilisearch/issues/4628
## What does this PR do?
- Add an integration test that first indexes the document and then changes the settings
- Fix `extract_geo_point` by detecting if the `_geo` field has been faceted in this setting change and index all documents
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
4644: Revert "Stream documents" and keep heed+arroy to the latest verion r=Kerollmops a=irevoire
Reverts meilisearch/meilisearch#4544
Fixes https://github.com/meilisearch/meilisearch/issues/4641
I didn’t realize that some http clients were not handling chunked http requests like you would expect (if you ask the body, it gives you the body), which made the previous PR breaking.
There is no way to provide a good fix to the issue we initially wanted to fix without breaking meilisearch and that’s not planned for now.
Co-authored-by: Tamo <irevoire@protonmail.ch>
Co-authored-by: Tamo <tamo@meilisearch.com>