Commit Graph

261 Commits

Author SHA1 Message Date
Clément Renault
2094ce8a9a
Move the arroy building after the writing loop 2024-11-27 14:30:33 +01:00
Clément Renault
8442db8101
Implement mostly all senders 2024-11-27 14:16:35 +01:00
Clément Renault
79671c9faa
Implement a first version of the bbqueue channels 2024-11-27 14:15:00 +01:00
meili-bors[bot]
a2f64f6552
Merge #5095
5095: Span to measure the part of db writes that is after the merge/extraction r=curquiza a=dureuill



Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-27 11:10:00 +00:00
ManyTheFish
18a9af353c Update Charabia version to v0.9.2 2024-11-27 11:12:08 +01:00
meili-bors[bot]
aae0dc715d
Merge #5063
5063: Fix pagination when embedding fails r=Kerollmops a=dureuill

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/5045

## What does this PR do?
- Use `return_keyword_results` function when embedding fails


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-27 09:13:28 +00:00
meili-bors[bot]
d0b2c0a523
Merge #5091
5091: Settings opt out r=Kerollmops a=ManyTheFish

# Pull Request

Related PRD: https://www.notion.so/meilisearch/API-usage-Settings-to-opt-out-indexing-features-fff4b06b651f8108ade3f858aeb16b14?pvs=4

## Related issue
Fixes #4979 

- [x] Add setting opt-out
- [x] Add analytics
- [x] Add tests


Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
2024-11-26 15:50:28 +00:00
ManyTheFish
2e896f30a5 Fix PR comments 2024-11-26 16:06:33 +01:00
Louis Dureuil
8f57b4fdf4
Span to measure the part of db writes that is after the merge/extraction 2024-11-26 14:46:36 +01:00
Many the fish
f014e78684
Update crates/milli/src/index.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-11-26 14:46:01 +01:00
Many the fish
9008ecda3d
Update crates/meilisearch-types/src/settings.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-11-26 14:44:24 +01:00
ManyTheFish
d7bcfb2d19 fix clippy 2024-11-26 14:04:16 +01:00
meili-bors[bot]
fb66fec398
Merge #5092
5092: Precise spans for new indexer r=dureuill a=dureuill

- Separate extract and merge spans
- Add span around commit

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-26 10:59:40 +00:00
Louis Dureuil
fa15be5bc4
Add span around commit 2024-11-26 09:45:48 +01:00
Louis Dureuil
aa460819a7
Add more precise spans 2024-11-26 09:45:36 +01:00
meili-bors[bot]
e241f91285
Merge #5062
5062: Fix bugs for v1.12 r=Kerollmops a=ManyTheFish

# Pull Request

## Related issue
Fixes #4984
Fixes https://github.com/meilisearch/meilisearch/issues/4974
Fixes [SDK test](https://github.com/meilisearch/meilisearch/actions/runs/11886701996/job/33118278794)
## What does this PR do?
- add 3 tests
- fix bugs

Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-11-26 08:10:50 +00:00
ManyTheFish
d66dc363ed Test and implement settings opt-out 2024-11-25 18:23:22 +01:00
meili-bors[bot]
5560452ef9
Merge #5089
5089: Improve error handling when writing into LMDB r=dureuill a=Kerollmops

This PR exposes two new internal error variants: `StoreDelete` and `StorePut`. So that the error messages are better when we fail at writing into LMDB.

Related to #5078

Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-11-25 16:19:41 +00:00
Clément Renault
b4fb2dabd4
Use the grenad rayon feature 2024-11-25 16:31:21 +01:00
Clément Renault
5606679c53
Use the obkv and grenad crates.io versions 2024-11-25 16:24:59 +01:00
Clément Renault
a3103f347e
Fix the facet f64 database name 2024-11-25 16:05:31 +01:00
Clément Renault
25aac45fc7
Expose better error messages 2024-11-25 15:54:43 +01:00
meili-bors[bot]
98a785b0d7
Merge #5080
5080: Fix getting a single batch through the GET route r=Kerollmops a=dureuill

# Pull Request

## Related issue
Fixes a bug where getting a single batch does not work

Related to #5070 


fix by `@Kerollmops` 

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-21 17:08:46 +00:00
Louis Dureuil
ba7500998e
Fix getting a single batch through the GET route 2024-11-21 17:59:31 +01:00
meili-bors[bot]
19e6f675b3
Merge #4900
4900: Indexer edition 2024 r=Kerollmops a=dureuill

This PR is implementing the indexer edition 2024, largely inspired by [the ideas from this blog post](https://blog.kerollmops.com/meilisearch-is-too-slow).

Fixes https://github.com/meilisearch/meilisearch/issues/4985

## Features
- Stream-first approach to reading documents.
- Minimum disk write operations.
- RAM usage-first approach to avoid modifying common bitmaps on disk but in memory.
- Reduced LMDB fragmentation by writing entries only once...
- ...computing the final version of the entries in parallel...
- ...and storing them in write-optimized data structures before sending them to the BTree (LMDB).
- Indexing in multiple transactions to improve large dataset support (dumps).


Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-21 16:19:10 +00:00
Louis Dureuil
323ecbb885
Add span on document operation 2024-11-21 17:01:10 +01:00
Louis Dureuil
ffb60cb885
Add comment explaining why we fixed the version of insta 2024-11-21 16:56:56 +01:00
Louis Dureuil
dcc3caef0d
Remove TopLevelMap 2024-11-21 16:56:46 +01:00
Louis Dureuil
221e547e86
Slight changes 2024-11-21 16:47:44 +01:00
Clément Renault
61d0615253
Document the geo point extractor 2024-11-21 16:47:08 +01:00
Clément Renault
5727e00374
Remove useless geo skipped 2024-11-21 16:47:08 +01:00
Clément Renault
9b60843831
Remove commented lines 2024-11-21 16:47:07 +01:00
ManyTheFish
36962b943b First batch of PR comment 2024-11-21 16:38:11 +01:00
Louis Dureuil
32bcacefd5
Changes Document::len to Document::top_level_fields_count 2024-11-21 15:01:07 +01:00
Louis Dureuil
4ed195426c
remove unused stuff in global.rs 2024-11-21 15:01:07 +01:00
Many the fish
ff38f29981
Update crates/index-scheduler/src/batch.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-21 14:18:39 +01:00
ManyTheFish
94b260fd25 Remove orphan span 2024-11-21 12:12:07 +01:00
Clément Renault
ab2c83f868
Use the disk less when computing prefixes 2024-11-21 10:45:37 +01:00
Louis Dureuil
1f9692cd04
Increase map size for tests 2024-11-20 17:52:21 +01:00
Tamo
1e694ae432
improve the count of the number of tasks in a batch 2024-11-20 17:48:26 +01:00
Tamo
71807cac6d
makes clippy happy 2024-11-20 17:40:58 +01:00
Tamo
21a2264782
improve the details and stats of the current batch processing 2024-11-20 17:25:55 +01:00
Louis Dureuil
bda2b41d11
update snaps after merge 2024-11-20 17:08:30 +01:00
Louis Dureuil
6e6acfcf1b
Merge branch 'main' into indexer-edition-2024 2024-11-20 16:59:58 +01:00
Louis Dureuil
e0864f1b21
Separate side effect and debug asserts 2024-11-20 16:25:17 +01:00
Clément Renault
a38344acb3
Replace eprintlns by tracing 2024-11-20 15:29:51 +01:00
ManyTheFish
4d616f8794 Parse every attributes and filter before tokenization 2024-11-20 15:15:25 +01:00
Louis Dureuil
ff9c92c409
rename documents -> substep 2024-11-20 15:12:02 +01:00
Clément Renault
8380ddbdcd
Fix progress of into_changes 2024-11-20 15:10:09 +01:00
Louis Dureuil
867138f166
Add SP to into_changes 2024-11-20 15:07:05 +01:00
Clément Renault
567bd4538b
Fxi the into_changes stop processing 2024-11-20 14:58:25 +01:00
Louis Dureuil
84600a10d1
Add MSP to document_update.into_changes() 2024-11-20 14:53:37 +01:00
ManyTheFish
35bbe1c2a2 Add failing test on settings changes 2024-11-20 14:48:12 +01:00
Louis Dureuil
7d64e8dbd3
Fix Windows compilation 2024-11-20 14:40:38 +01:00
Tamo
ec06879d28
apply review changes 2024-11-20 14:40:36 +01:00
Tamo
83d1f858c1
Update crates/index-scheduler/src/lib.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-11-20 14:36:05 +01:00
Louis Dureuil
cae8c89467
"fix" last warnings 2024-11-20 14:03:52 +01:00
Tamo
a7ac590e9e
implements the reverse query parameter for the batches 2024-11-20 13:29:52 +01:00
Clément Renault
7cb8732b45
Introduce a new bincode internal error 2024-11-20 13:23:11 +01:00
Tamo
8ad68dd708
stop leaking the update files of the canceled tasks 2024-11-20 13:17:54 +01:00
ManyTheFish
fe5d50969a
Fix filed selector in extrators 2024-11-20 13:16:44 +01:00
Clément Renault
56c7c5d5f0
Fix comments 2024-11-20 13:16:44 +01:00
Louis Dureuil
4cdfdddd6d
Fix one more 2024-11-20 13:16:43 +01:00
Louis Dureuil
2afa33011a
Fix tokenize_document 2024-11-20 13:16:43 +01:00
Louis Dureuil
61feca1f41
More tests pass 2024-11-20 13:16:43 +01:00
Louis Dureuil
f893b5153e
Don't mark [""] as empty facet 2024-11-20 13:16:42 +01:00
Louis Dureuil
ca779c21f9
facets: Handle boolean and skip empty strings 2024-11-20 13:16:42 +01:00
Louis Dureuil
477077bdc2
Remove _vectors from fid map when there are no vectors in sight 2024-11-20 13:16:42 +01:00
ManyTheFish
b1f8aec348
Fix index_documents_check_exists_database 2024-11-20 13:16:41 +01:00
ManyTheFish
ba7f091db3
Use tokenizer on numbers and booleans 2024-11-20 13:16:41 +01:00
Louis Dureuil
8049df125b
Add depth to facet extraction so that null inside an array doesn't mark the entire field as null 2024-11-20 13:16:40 +01:00
Clément Renault
50d1bd01df
We no longer index geo lat and lng 2024-11-20 13:16:40 +01:00
Louis Dureuil
a28d4f5d0c
Fix setup_search_index_with_criteria 2024-11-20 13:16:40 +01:00
Louis Dureuil
fc14f4bc66
Attempt to fix setup_search_index_with_criteria 2024-11-20 13:16:39 +01:00
Clément Renault
5f8a82d6f5
Improve test 2024-11-20 13:16:39 +01:00
Clément Renault
fe04e51a49
One more 2024-11-20 13:16:38 +01:00
Clément Renault
01b27e40ad
Fix a bit of the placeholder search tests 2024-11-20 13:16:38 +01:00
Louis Dureuil
8076d98544
Fix stats_should_not_return_deleted_documents 2024-11-20 13:16:37 +01:00
Louis Dureuil
9e951baad5
One more test passing 2024-11-20 13:16:37 +01:00
Louis Dureuil
52f2fc4c46
Fail in case of user error in tests 2024-11-20 13:16:37 +01:00
Clément Renault
3957917e0b
Correctly count indexed documents 2024-11-20 13:16:36 +01:00
Louis Dureuil
651c30899e
Allow fetching embedders from inside tests 2024-11-20 13:16:36 +01:00
Clément Renault
2c7a7fe4e8
Count the number of documents correctly 2024-11-20 13:16:35 +01:00
Clément Renault
23f0c2c29b
Generate internal ids only when needed 2024-11-20 13:16:35 +01:00
Louis Dureuil
6641c3f59b
Remove all autogenerated tests 2024-11-20 13:16:34 +01:00
Louis Dureuil
07a72824b7
Subfields of _vectors are no longer part of the fid map 2024-11-20 13:16:34 +01:00
Louis Dureuil
000eb55c4e
fix one 2024-11-20 13:16:34 +01:00
Clément Renault
b4bf7ce9b0
Increase the number of readers as the indexer uses readers too 2024-11-20 13:16:33 +01:00
Louis Dureuil
1aef0e4037
documents! macro accepts a single object again 2024-11-20 13:16:33 +01:00
Clément Renault
32d0e50a75
Fix all the benchmark compilation errors 2024-11-20 13:16:32 +01:00
Louis Dureuil
df5884b0c1
Fix settings test 2024-11-20 13:16:32 +01:00
Louis Dureuil
9e0eb5ebb0
Removed some warnings 2024-11-20 13:16:32 +01:00
Clément Renault
3cf1352ae1
Fix the benchmark tests 2024-11-20 13:16:31 +01:00
Clément Renault
aba8a0e9e0
Fix some tests but not all of them 2024-11-20 13:16:31 +01:00
Clément Renault
670aff5553
Remove useless Transform methods 2024-11-20 13:16:08 +01:00
Tamo
7e379b3d14
remove useless prints 2024-11-20 12:27:12 +01:00
Tamo
56eacd221f
update the tests after the rebase 2024-11-20 10:54:38 +01:00
Tamo
bdb51a85fe
now that the task cancelation shares their started at with all the tasks of their batch we don't need the trick of retrieving the previous batch anymore 2024-11-20 10:51:07 +01:00
Tamo
b24a34830d
fix the dump test -> the only change is that we now have a null batch_uid in all the tasks 2024-11-20 10:51:06 +01:00
Tamo
e145d71a62
implements the two last TODOs 2024-11-20 10:51:06 +01:00