Jakob Klemm
d3004d8040
Implemented Ollama as an embeddings provider
...
Initial prototype of Ollama embeddings actually working, error handling / retries still missing.
Allow model to be any String and require dimensions parameter
Fixed rustfmt formatting issues
There were some formatting issues in the initial PR and this should now make the changes comply with the Rust style guidelines
Because I accidentally didn't follow the style guide for commits in my commit messages, I squashed them into one to comply
2024-03-04 15:09:43 +01:00
Louis Dureuil
452a343a2b
Fix imports
2024-02-28 18:09:40 +01:00
meili-bors[bot]
b87485e80d
Merge #4433
...
4433: Enhance facet incremental r=Kerollmops a=ManyTheFish
# Pull Request
## Related issue
Fixes #4367
Fixes #4409
## What does this PR do?
- Add a test reproducing #4409
- Fix #4409 by removing a document from a level only if it is no longer present in any of the linked sub-level nodes
- Optimize incremental facet indexing by creating or deleting a complete level once per field id instead of once per facet value
- Optimize incremental facet indexing by doing the additions and the deletions in the same process instead of separately
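The removal rule in the second bullet can be illustrated with a minimal sketch. The function name and the use of `BTreeSet<u32>` as a stand-in for the document-id bitmaps are assumptions made for illustration; this is not the code from the PR.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: removes `docid` from a parent level node only when
/// none of the linked sub-level nodes still contains it.
/// Returns `true` if the document id was actually removed from the parent.
fn remove_from_parent_if_absent(
    parent: &mut BTreeSet<u32>,
    children: &[BTreeSet<u32>],
    docid: u32,
) -> bool {
    // The id must stay in the parent while any child still references it.
    let still_present = children.iter().any(|child| child.contains(&docid));
    if still_present {
        false
    } else {
        parent.remove(&docid)
    }
}
```

Checking the children before touching the parent keeps each upper level equal to the union of its sub-level nodes, which is the invariant the fix appears to restore.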
Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-02-28 15:28:46 +00:00
ManyTheFish
5e83bac448
Fix PR comments
2024-02-26 15:40:15 +01:00
Louis Dureuil
55796406c5
Add GPU analytics
2024-02-26 10:41:47 +01:00
ManyTheFish
a493a50825
Fix clippy
2024-02-22 14:53:33 +01:00
ManyTheFish
9d1f489a37
Fix facet incremental indexing
2024-02-21 18:42:16 +01:00
meili-bors[bot]
d34692e30b
Merge #4365
...
4365: Update charabia r=dureuill a=ManyTheFish
Update Charabia to v0.8.7:
- Add Vietnamese Normalization (Ð and Đ into d)
Fixes #4357
Charabia versions:
- https://github.com/meilisearch/charabia/releases/tag/v0.8.6
- https://github.com/meilisearch/charabia/releases/tag/v0.8.7
Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-02-14 16:57:25 +00:00
ManyTheFish
78e04520fc
Update charabia version
2024-02-14 15:16:16 +01:00
ManyTheFish
03bb6372af
Replace is_batchable_with with mergeable_with
2024-02-14 11:50:22 +01:00
ManyTheFish
3beda8833d
Fix and add logs
2024-02-14 11:46:30 +01:00
ManyTheFish
55e942cd45
buggy
2024-02-13 15:26:30 +01:00
ManyTheFish
48026aa75c
fix PR comments
2024-02-13 15:19:01 +01:00
Many the fish
e5e811e2c9
Update milli/src/update/index_documents/extract/mod.rs
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-02-13 14:22:21 +01:00
Many the fish
55de96f74e
Update milli/src/update/facet/mod.rs
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-02-13 14:22:10 +01:00
ManyTheFish
39c83cb3d9
fix clippy
2024-02-12 09:12:54 +01:00
Louis Dureuil
7efb1cae11
yield in loop when the channel is not disconnected
2024-02-12 09:12:54 +01:00
Louis Dureuil
7877788510
fix logs
2024-02-12 09:12:54 +01:00
ManyTheFish
be1b054b05
Compute chunk size based on the input data size and the number of indexing threads
2024-02-08 17:28:37 +01:00
meili-bors[bot]
023c2d755f
Merge #4391
...
4391: Tracing r=dureuill a=irevoire
# Pull Request
- [ ] Hide the parameters of the process batch
- [x] Make actix-web trace every call on every route
- [x] Remove all `env_logger`/`logs` dependencies
- [x] Be able to enable or disable the memory measurement using the `/logs` route parameters
See the following product discussion: https://github.com/orgs/meilisearch/discussions/721
Supersedes https://github.com/meilisearch/meilisearch/pull/4338
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4317
## What does this PR do?
Update the format of the logs from:
```
[2024-02-06T14:54:11Z INFO actix_server::builder] starting 10 workers
```
to
```
2024-02-06T13:58:14.710803Z INFO actix_server::builder: 200: starting 10 workers
```
First, run meilisearch with the route enabled via the feature flag:
- `cargo run -- --experimental-enable-logs-route`
- Or at runtime by sending the following payload:
```
curl \
-X PATCH 'http://localhost:7700/experimental-features/' \
-H 'Content-Type: application/json' \
--data-binary '{
"logsRoute": true
}'
```
Then gather data from meilisearch by calling for example:
```
curl \
-X POST http://localhost:7700/logs \
-H 'Content-Type: application/json' \
--data-binary '{
"mode": "fmt",
"target": "milli=trace"
}'
```
Once your operation is over, tell meilisearch to stop the route:
```
curl \
-X DELETE http://localhost:7700/logs
```
----
If you are profiling code, you will be interested in the following command, which converts the output of the route to a format that the Firefox Profiler can understand.
```bash
cargo run --release --bin trace-to-firefox -- 2024-01-17_17:07:55-indexing-trace.json
```
Then go to https://profiler.firefox.com and load it.
Note that we can also share the profiles using the https://share.firefox.dev website.
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
2024-02-08 14:16:56 +00:00
Louis Dureuil
407ad753ed
rust fmt
2024-02-08 15:11:42 +01:00
Tamo
bf43a3f60a
fix typo
2024-02-08 15:04:06 +01:00
Tamo
1502382316
use debug instead of debug_span
2024-02-08 15:04:06 +01:00
Tamo
08af0e690c
Structures a bunch of logs
2024-02-08 15:04:06 +01:00
Louis Dureuil
db722d201a
Write entries into database downgraded to trace level
2024-02-08 15:04:05 +01:00
Tamo
e773dfa9ba
get rid of log in milli and add logs for the bucket sort
2024-02-08 15:04:05 +01:00
Louis Dureuil
5d7061682e
Add tracing to milli
2024-02-08 15:03:31 +01:00
meili-bors[bot]
72ebac1fbb
Merge #4388
...
4388: Cap the maximum memory of the grenad sorters r=curquiza a=Kerollmops
This PR clamps the memory usage of the grenad sorters to a reasonable maximum. Grenad sorters are opened on multiple threads at a time. This can result in higher memory usage than expected, even though it shouldn't consume more than the memory available.
Fixes #4152 .
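As a rough illustration of the clamping described above, here is a minimal sketch. The constant, the function name, and the even split between sorters are assumptions; the 500 MiB value is borrowed from the "Try with 500MiB" commit in this history and may not be the final limit.

```rust
/// Hypothetical upper bound for a single sorter, in bytes (500 MiB).
const MAX_SORTER_MEMORY: usize = 500 * 1024 * 1024;

/// Splits an optional global memory budget evenly between `sorter_count`
/// sorters, then clamps each share to `MAX_SORTER_MEMORY`.
fn clamped_sorter_memory(total_budget: Option<usize>, sorter_count: usize) -> usize {
    // Guard against a division by zero when no sorter count is known.
    let sorter_count = sorter_count.max(1);
    match total_budget {
        Some(budget) => (budget / sorter_count).min(MAX_SORTER_MEMORY),
        // Without an explicit budget, fall back to the cap itself.
        None => MAX_SORTER_MEMORY,
    }
}
```

With this shape, a large budget shared by a few threads no longer translates into very large per-sorter buffers, while small budgets are still divided as before.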
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-02-08 13:19:28 +00:00
Louis Dureuil
a1caac9bfb
Correct distribution shifts for new models
2024-02-07 15:09:16 +01:00
Louis Dureuil
88d03c56ab
Don't accept dimensions of 0 (ever) or dimensions greater than the default dimensions of the model
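The rule stated in this commit title could look roughly like the sketch below. The error type, the function name, and the `Option<usize>` parameter are hypothetical and only illustrate the check; they are not taken from the actual change.

```rust
/// Hypothetical error type for rejected `dimensions` values.
#[derive(Debug, PartialEq, Eq)]
enum DimensionsError {
    Zero,
    TooLarge { requested: usize, max: usize },
}

/// Returns the dimensions to use for embedding, or an error when the
/// requested value is 0 or exceeds the model's default dimensions.
fn validate_dimensions(
    requested: Option<usize>,
    model_default: usize,
) -> Result<usize, DimensionsError> {
    match requested {
        // No override: use the model's default dimensions.
        None => Ok(model_default),
        Some(0) => Err(DimensionsError::Zero),
        Some(d) if d > model_default => {
            Err(DimensionsError::TooLarge { requested: d, max: model_default })
        }
        Some(d) => Ok(d),
    }
}
```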
2024-02-07 11:52:09 +01:00
Louis Dureuil
32ee05ccef
Fix default dimensions for models
2024-02-07 11:52:09 +01:00
Louis Dureuil
74c180267e
pass dimensions only when defined
2024-02-07 11:52:08 +01:00
Louis Dureuil
517f5332d6
Allow actually passing dimensions
for OpenAI source
...
-> make sure the settings change is rejected or the settings task fails when the specified model doesn't support
overriding `dimensions` and the passed `dimensions` differs from the model's default dimensions.
2024-02-07 11:51:44 +01:00
Louis Dureuil
9ac5750096
Retrieve the overridden dimensions from the configuration when fetching settings
2024-02-07 11:51:44 +01:00
Louis Dureuil
7ae4013478
Make sure the overridden dimensions are always used when embedding
2024-02-07 11:51:44 +01:00
Gosti
fb705116a6
feat: add new models and ability to override dimensions
2024-02-07 11:51:42 +01:00
Clément Renault
053306c0e7
Try with 500MiB
2024-02-07 11:24:43 +01:00
Clément Renault
9eeb75d501
Clamp the max memory of the grenad sorters to a reasonable maximum
2024-02-06 10:47:04 +01:00
Louis Dureuil
fbf5f2a392
Don't use a runtime in extract_embedder, use it only for OpenAI
2024-02-01 10:33:27 +01:00
Louis Dureuil
1555870088
Truncate HuggingFace vectors that are too long
2024-02-01 10:33:27 +01:00
Tamo
9f8f3105d5
make clippy happy
2024-02-01 10:33:27 +01:00
Tamo
318843aacd
add a bunch of tests and fix the error message when adding the geosearch as filterable/sortable while there are malformed documents in the DB
2024-02-01 10:33:27 +01:00
Louis Dureuil
dff2707471
Use MatchingWords from keyword search instead of the one from vector search
2024-02-01 10:33:27 +01:00
Tamo
c1bf33a112
Revert "Remove panic on the geosearch"
2024-01-25 18:51:19 +01:00
Louis Dureuil
f692021bfc
Implement PR comments
2024-01-22 10:25:56 +01:00
Louis Dureuil
84f49d76cd
Add cuda feature
2024-01-22 10:25:16 +01:00
Tamo
0887186ecf
make clippy happy
2024-01-17 16:07:10 +01:00
Tamo
7d190d8078
add a bunch of tests and fix the error message when adding the geosearch as filterable/sortable while there are malformed documents in the DB
2024-01-17 15:51:52 +01:00
Clément Renault
01e2c3d6bb
Bump arroy to v0.2.0
2024-01-16 16:45:55 +01:00
Clément Renault
9f9ad4cc05
Fix Clippy warnings
2024-01-16 15:27:24 +01:00
Clément Renault
3ee7682fa7
Fix some integer comparisons
2024-01-16 15:22:23 +01:00
Clément Renault
7f125bfb12
Update incompatible dependencies
2024-01-16 15:15:54 +01:00
Clément Renault
5869ca7716
Upgrade all compatible dependencies
2024-01-16 15:05:03 +01:00
meili-bors[bot]
e93d36d5b9
Merge #4313
...
4313: Fix document formatting performances r=Kerollmops a=ManyTheFish
Reduce the formatted option list to the attributes that should be formatted,
instead of all the attributes to display.
The time to compute the `format` list scales with the number of fields to format;
combined with `map_leaf_values`, which iterates over all the nested fields, this gives a quadratic complexity:
`d*f` where `d` is the total number of fields to display and `f` is the total number of fields to format.
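A minimal sketch of the idea, under assumptions: the `FormatOptions` struct and the `fields_to_format` function are hypothetical names used for illustration and do not reproduce the types from the PR. The point is only that the per-field lookup list shrinks from every displayed field to the fields that actually need formatting.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-field formatting options.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct FormatOptions {
    highlight: bool,
    crop: Option<usize>,
}

impl FormatOptions {
    /// A field needs formatting only if it is highlighted or cropped.
    fn should_format(&self) -> bool {
        self.highlight || self.crop.is_some()
    }
}

/// Keeps only the displayed fields that actually require formatting, so a
/// later pass over every leaf value searches a list of size `f`, not `d`.
fn fields_to_format(
    displayed: &BTreeMap<String, FormatOptions>,
) -> BTreeMap<&str, FormatOptions> {
    displayed
        .iter()
        .filter(|(_, options)| options.should_format())
        .map(|(name, options)| (name.as_str(), *options))
        .collect()
}
```

When no field is highlighted or cropped, the resulting map is empty and the formatting pass has nothing to match against.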
Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-01-11 14:19:44 +00:00
ManyTheFish
5f5a486895
Reduce formatting time
2024-01-11 11:36:41 +01:00
ManyTheFish
5f4fc6c955
Add timer logs
2024-01-11 09:44:16 +01:00
Clément Renault
3f3462ab62
Limit the number of values returned by the facet search
2024-01-10 16:54:08 +01:00
Tamo
54ae6951eb
fix warning
2024-01-02 15:19:30 +01:00
Louis Dureuil
0bf879fb88
Fix warning on rust stable
2023-12-20 17:48:09 +01:00
Louis Dureuil
6ff81de401
Fix tests
2023-12-20 17:16:46 +01:00
Louis Dureuil
9123370e90
Validate fused settings in settings task after fusing with existing setting
2023-12-20 17:16:46 +01:00
Louis Dureuil
14b396d302
Add new errors
2023-12-20 17:16:45 +01:00
Louis Dureuil
393216bf30
Flatten embedders settings
2023-12-20 17:16:43 +01:00
Louis Dureuil
e249e4db7b
Change Setting::apply function signature
2023-12-20 17:15:24 +01:00
Louis Dureuil
333ce12eb2
Fixed issue where the default revision is always the one we picked for the default model
2023-12-20 10:17:49 +01:00
Louis Dureuil
942d49314c
Remove dependency that requires libstdc++
2023-12-18 22:17:18 +01:00
Many the fish
9e1b458010
Merge branch 'main' into change-proximity-precision-settings
2023-12-18 09:08:47 +01:00
ManyTheFish
6425996e36
Change the naming of attributeScale and wordScale into byAttribute and byWord
2023-12-14 16:31:00 +01:00
Louis Dureuil
eb5cb91da2
Switch default from hf to openai
2023-12-14 16:19:46 +01:00
Louis Dureuil
87bba98bd8
Various changes
...
- fixed seed for arroy
- check vector dimensions as soon as it is provided to search
- don't embed whitespace
2023-12-14 16:08:42 +01:00
Louis Dureuil
217105b7da
hybrid search uses semantic ratio, error handling
2023-12-14 16:08:42 +01:00
ManyTheFish
9991152bbe
Add TODOs
2023-12-14 16:08:42 +01:00
Louis Dureuil
a4536b1381
Small adjustments to respect the spec
2023-12-14 16:08:42 +01:00
Louis Dureuil
5b51cb04af
Remove some settings
2023-12-14 16:08:42 +01:00
Louis Dureuil
b8e4709dfa
Remove prompt strategy and fallback
2023-12-14 16:08:41 +01:00
Louis Dureuil
806e5b6899
Tests pass
2023-12-14 16:08:41 +01:00
Louis Dureuil
e0cc775dc4
Various changes
...
- DistributionShift in Search object (to be set from model in embed?)
- Fix issue where embedder index wasn't computed at search time
- Accept as default embedder either the "default" one, or the only embedder when there is only one
2023-12-14 16:08:41 +01:00
Louis Dureuil
12940d79a9
WIP
...
- manual embedder
- multi embedders OK
- clippy + tests OK
2023-12-14 16:08:41 +01:00
Louis Dureuil
922a640188
WIP multi embedders
...
fixed template bugs
2023-12-14 16:08:41 +01:00
Louis Dureuil
d4715e0c4d
Fix same vector sort bug
2023-12-14 16:08:41 +01:00
Louis Dureuil
11e2a2c1aa
Fix geosort bug
2023-12-14 16:08:41 +01:00
Louis Dureuil
65e49b7092
Remove stuff, add distribution shift (WIP)
2023-12-14 16:08:38 +01:00
Louis Dureuil
e56f160032
Actually pass embedders on reindex
2023-12-14 16:07:49 +01:00
Louis Dureuil
687d92f217
prompt bifluor+
2023-12-14 16:07:49 +01:00
Louis Dureuil
fb539f61fe
WIP
2023-12-14 16:07:49 +01:00
Louis Dureuil
cb4ebe163e
WIP
2023-12-14 16:07:49 +01:00
Louis Dureuil
dde3a04679
WIP arroy integration
2023-12-14 16:07:49 +01:00
Louis Dureuil
13c2c6c16b
Small commit to add hybrid search and autoembedding
2023-12-14 16:07:48 +01:00
Louis Dureuil
21bcf32109
Add candle and hg_hub, updating a lot of deps in the process
2023-12-14 16:07:48 +01:00
Clément Renault
56571f762a
Merge remote-tracking branch 'origin/main' into tmp-release-v1.5.1
2023-12-13 11:57:01 +01:00
ManyTheFish
467b49153d
Implement proximityPrecision setting on milli side
2023-12-06 15:49:02 +01:00
ManyTheFish
bddc168d83
List TODOs
2023-12-06 14:59:23 +01:00
ManyTheFish
3b3fa38f27
Put the restrict list in a sub-struct
2023-11-28 18:37:57 +01:00
Clément Renault
170e063b80
Remove the actix-web dependency from milli
2023-11-28 17:19:57 +01:00
ManyTheFish
d6c2ee15a9
Filter on attributes before computing the docids when attribute restriction is on
2023-11-28 14:55:29 +01:00
Clément Renault
ec9b52d608
Rename copy_to_path to copy_to_file
2023-11-28 14:32:30 +01:00
Clément Renault
34c67ac389
Remove the possibility to fail fetching the env info
2023-11-28 14:31:23 +01:00
Clément Renault
d050c9b4ae
Only remap the main database once
2023-11-28 14:27:30 +01:00
Clément Renault
7dd1226faf
Clarify an unreachable unwrap
2023-11-28 14:26:31 +01:00
Clément Renault
548c8247c2
Create and use real error types in the codecs
2023-11-28 10:11:17 +01:00
Clément Renault
d32eb11329
Move to the v0.20.0-alpha.9 of heed
2023-11-27 11:52:22 +01:00
Clément Renault
58dac8af42
Remove the panics and unwraps
2023-11-23 15:00:48 +01:00
Clément Renault
0dbf1a16ff
Make clippy happy
2023-11-23 14:11:38 +01:00
Clément Renault
462b4c0080
Fix the tests
2023-11-23 12:07:35 +01:00
Clément Renault
0d4482625a
Make the changes to use heed v0.20-alpha.6
2023-11-23 11:43:58 +01:00
Clément Renault
56a0d91ecd
Update the heed dependency and lock file
2023-11-22 15:11:09 +01:00
Clément Renault
7cb7e37ba8
Merge branch 'main' into tmp-release-v1.5.0
2023-11-21 16:30:46 +01:00
ManyTheFish
d3575fb028
Make into_del_add_obkv parameters more human readable
2023-11-20 16:10:39 +01:00
ManyTheFish
39cbb499c2
Small fixes
2023-11-20 10:20:39 +01:00
ManyTheFish
ebef6bc24d
Simplify documents database writing
2023-11-20 10:14:57 +01:00
ManyTheFish
d59b7db8d0
remove unused code
2023-11-20 10:10:45 +01:00
ManyTheFish
263e825619
Fix typos in comments
2023-11-20 10:06:29 +01:00
Many the fish
b0adc73ce6
Merge pull request #4207 from meilisearch/diff-indexing-prefix-databases
...
Diff indexing prefix databases
2023-11-14 16:04:05 +01:00
Louis Dureuil
772964125d
Factor removal of document from DB
2023-11-13 13:51:22 +01:00
Louis Dureuil
378deb0bef
Rename trait
2023-11-13 13:38:36 +01:00
ManyTheFish
1f36410541
Update tests
2023-11-13 13:36:39 +01:00
Louis Dureuil
8c649d8061
Throw error when the vector search is sent with the wrong size
2023-11-13 09:57:42 +01:00
Louis Dureuil
264b10ec20
Fixup documentation
2023-11-09 16:23:20 +01:00
Louis Dureuil
3053e01c05
Batch::remove_documents_from_db_no_batch
2023-11-09 14:23:02 +01:00
Louis Dureuil
b11c2afac0
Index::external_id_of
2023-11-09 14:22:43 +01:00
Louis Dureuil
9cef800b2a
Enrich uses the new type
2023-11-09 14:22:05 +01:00
Louis Dureuil
db2fb86b8b
Extract PrimaryKey logic to a type
2023-11-09 14:19:16 +01:00
ManyTheFish
882ab9cc85
remove warnings
2023-11-09 11:35:33 +01:00
ManyTheFish
5a9c96e1db
Compute word integer prefix cache
2023-11-09 11:34:26 +01:00
ManyTheFish
70ce40828c
Compute word docids prefix cache
2023-11-08 17:01:00 +01:00
ManyTheFish
688266c83e
Remove word pair proximity prefix cache and compute it at search time
2023-11-08 14:16:01 +01:00
ManyTheFish
6dab826908
Reactivate prefix databases
2023-11-08 13:58:01 +01:00
ManyTheFish
1e2fbc6a42
revert "REVERT ME: ignore prefix pair databases tests"
...
This reverts commit 1b2ea6cf19309782a2e3b2ff2fe6d7708dd5de4f.
2023-11-08 11:50:52 +01:00
Louis Dureuil
cbaa54cafd
Fix clippy issues
2023-11-06 11:19:31 +01:00
Louis Dureuil
1bccf2079e
Correctly mark non-tests as non-tests
2023-11-06 11:03:56 +01:00
ManyTheFish
1b2ea6cf19
REVERT ME: ignore prefix pair databases tests
2023-11-06 10:46:22 +01:00
Louis Dureuil
1ad1fcc8c8
Remove all warnings
2023-11-06 10:31:14 +01:00
ManyTheFish
87610a5f98
Don't try to delete a document that is not in the database
2023-11-02 16:49:03 +01:00
Clément Renault
ff522c919d
Fix the vector extractions for the diff indexing
2023-11-02 15:58:08 +01:00
ManyTheFish
bf0651f23c
Implement iter method on ExternalDocumentsIds
2023-11-02 15:38:00 +01:00
ManyTheFish
5b20e625f3
fix merge
2023-11-02 15:31:37 +01:00
ManyTheFish
bc51d6157a
Fix transform reindexing path
2023-11-02 15:26:20 +01:00
ManyTheFish
1b4ff991c0
update typed chunks
2023-11-02 15:26:20 +01:00
ManyTheFish
4b64c33aa2
update vector extractor
2023-11-02 15:26:20 +01:00
ManyTheFish
12323d610e
Change the original document sorter key from the internal docid to a concatenation of the internal and the external docid
2023-11-02 15:26:20 +01:00
Clément Renault
4d864f0702
Always sort internal Sorter entries in parallel
2023-11-02 14:47:43 +01:00
Clément Renault
b10c060bf7
Cleanup TOML
2023-11-01 14:03:04 +01:00
Clément Renault
c71b1d33ae
Sort entries using rayon in the transform sorters
2023-11-01 11:07:16 +01:00
Clément Renault
0fc446c62f
Add more timing logs to the Transform
2023-11-01 11:07:16 +01:00
Louis Dureuil
0fb6acefc3
Add snapshots for facets
2023-10-31 17:11:08 +01:00
Louis Dureuil
b1d1355b69
remove tests on soft-deleted
2023-10-31 16:36:27 +01:00
Louis Dureuil
f19332466e
Extract field value as values instead of Option<Value>
2023-10-31 16:36:27 +01:00
Louis Dureuil
03ddb4f310
use deladd in facet update tests
2023-10-31 16:36:27 +01:00
Louis Dureuil
c855cc2721
Remove unused test
2023-10-31 16:36:27 +01:00
Louis Dureuil
da0503ef80
Fix document count
2023-10-31 16:36:27 +01:00