3825: Accept semantic vectors and allow users to query nearest neighbors r=Kerollmops a=Kerollmops
This Pull Request brings a new feature to the current API. The engine accepts a new `_vectors` field akin to the `_geo` one. This vector is stored in Meilisearch and can be retrieved via search. This work is the first step toward hybrid search, bringing the best of both worlds: keyword and semantic search ❤️🔥
## ToDo
- [x] Make it possible to get the `limit` nearest neighbors from a user-generated vector by using the `vector` field of search route.
- [x] Delete the documents and vectors from the HNSW-related data structures.
- [x] Do it the slow and ugly way (we need to be able to iterate over all the values).
- [ ] Do it the efficient way (Wait for a new method or implement it myself).
- [ ] ~~Move from the `hnsw` crate to the hgg one~~ The hgg crate is too slow.
Meilisearch takes approximately 88s to answer a query. It is related to the time it takes to deserialize the `Hgg` data structure or search in it. I didn't take the time to measure precisely. We moved back to the hnsw crate which takes approximately 40ms to answer.
- [ ] ~~Wait for a fix for https://github.com/rust-cv/hgg/issues/4.~~
- [x] Fix the current dot product function.
- [x] Fill in the other `SearchResult` fields.
- [x] Remove the `hnsw` dependency of the meilisearch crate.
- [x] Fix the pages by taking the offset into account.
- [x] Release a first prototype https://github.com/meilisearch/product/discussions/621#discussioncomment-6183647
- [x] Make the pagination and filtering faster and more correct.
- [x] Return the original vector in the output search results (like `query`).
- [x] Return an `_semanticSimilarity` field in the documents (it's a dot product)
- [x] Return this score even if the `_vectors` field is not displayed
- [x] Rename the field `_semanticScore`.
- [ ] Return the `_geoDistance` value even if the `_geo` field is not displayed
- [x] Store the HNSW on possibly multiple LMDB values.
- [ ] Measure it and make it faster if needed
- [ ] Export the `ReadableSlices` type into a small external crate
- [x] Accept an `_vectors` field instead of the `_vector` one.
- [x] Normalize all vectors.
- [ ] Remove the `_vectors` field from the default searchable attributes (as we do with `_geo`?).
- [ ] Correctly compute the candidates by remembering the documents having a valid `_vectors` field.
- [ ] Return the right errors:
- [ ] Return an error when the query vector is not the same length as the vectors in the HNSW.
- [ ] We must return the user document id that triggered the vector dimension issue.
- [x] If an indexation error occurs.
- [ ] Fix the error codes when using the search route.
- [ ] ~~Introduce some settings:~~
We currently ensure that the vector length is consistent over the whole set of documents and return an error for when a vector dimension doesn't follow the current number of dimensions.
- [ ] The length of the vector the user will provide.
- [ ] The distance function (we only support dot as of now).
- [ ] Introduce other distance functions
- [ ] Euclidean
- [ ] Dot Product
- [ ] Cosine
- [ ] Make them SIMD optimized
- [ ] Give credit to qdrant
- [ ] Add tests.
- [ ] Write a mini spec.
- [ ] Release it in v1.3 as an experimental feature.
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
3853: docs: fixed some broken links r=gillian-meilisearch a=0xflotus
Some of the links in the README file were broken.
Co-authored-by: 0xflotus <0xflotus@gmail.com>
3850: Experimental features r=Kerollmops a=dureuill
# Pull Request
## Related issue
- Fixes https://github.com/meilisearch/meilisearch/issues/3857
- Related to https://github.com/meilisearch/meilisearch/issues/3771
## What does this PR do?
### Example
<details>
<summary>Using the feature to enable `scoreDetails`</summary>
```json
❯ curl \
-X POST 'http://localhost:7700/indexes/index-word-count-10-count/search' \
-H 'Content-Type: application/json' \
--data-binary '{ "q": "Batman", "limit": 1, "showRankingScoreDetails": true, "attributesToRetrieve": ["title"]}' | jsonxf
{
"message": "Computing score details requires enabling the `score details` experimental feature. See https://github.com/meilisearch/product/discussions/674",
"code": "feature_not_enabled",
"type": "invalid_request",
"link": "https://docs.meilisearch.com/errors#feature_not_enabled"
}
```
```json
❯ curl \
-X PATCH 'http://localhost:7700/experimental-features/' \
-H 'Content-Type: application/json' \
--data-binary '{
"scoreDetails": true
}'
{"scoreDetails":true,"vectorSearch":false}
```
```json
❯ curl \
-X POST 'http://localhost:7700/indexes/index-word-count-10-count/search' \
-H 'Content-Type: application/json' \
--data-binary '{ "q": "Batman", "limit": 1, "showRankingScoreDetails": true, "attributesToRetrieve": ["title"]}' | jsonxf
{
"hits": [
{
"title": "Batman",
"_rankingScoreDetails": {
"words": {
"order": 0,
"matchingWords": 1,
"maxMatchingWords": 1,
"score": 1.0
},
"typo": {
"order": 1,
"typoCount": 0,
"maxTypoCount": 1,
"score": 1.0
},
"proximity": {
"order": 2,
"score": 1.0
},
"attribute": {
"order": 3,
"attribute_ranking_order_score": 1.0,
"query_word_distance_score": 1.0,
"score": 1.0
},
"exactness": {
"order": 4,
"matchType": "exactMatch",
"score": 1.0
}
}
}
],
"query": "Batman",
"processingTimeMs": 3,
"limit": 1,
"offset": 0,
"estimatedTotalHits": 46
}
```
</details>
### User standpoint
- Add new route GET/POST/PATCH/DELETE `/experimental-features` to switch on or off some of the experimental features in a manner persistent between instance restarts
- Use these new routes to allow setting on/off the following experimental features:
- vector store **TODO:** fill in issue
- score details (related to https://github.com/meilisearch/meilisearch/issues/3771)
- Make the way of checking feature availability and error message uniform for the Prometheus metrics experimental feature
- Save the enabled features in dump, restore from dumps
- **TODO:** tests:
- Test new security permissions (do they allow access with ALL, do they prevent access when missing)
- Test dump behavior, in particular ability to import existing v6 dumps
- Test basic behavior when calling the rule
### Implementation standpoint
- New DB "experimental-features"
- dumps are modified to save the state of that new DB as a `experimental-features.json` file, that is then loaded back when importing the dump. This doesn't change the dump version, as the file is optional and it missing will not cause the dump to fail
Co-authored-by: Louis Dureuil <louis@meilisearch.com>