Louis Dureuil
5f896b1050
Fix geo when spilling
2024-12-04 17:51:12 +01:00
Kerollmops
d0c4e6da6b
Make clippy happy
2024-12-04 17:39:10 +01:00
Kerollmops
2da5584bb5
Make the tasks pulling timeout configurable
2024-12-04 17:39:07 +01:00
Kerollmops
2e32d0474c
Lexicographically sort all the map to merge
2024-12-04 17:05:11 +01:00
Kerollmops
cb99ac6f7e
Consume vec instead of draining
2024-12-04 17:00:22 +01:00
Kerollmops
be411435f5
Use the merge_caches_alt function in the docids merging
2024-12-04 16:37:29 +01:00
Kerollmops
29ef164530
Introduce a new semi ordered merge function
2024-12-04 16:33:35 +01:00
ManyTheFish
739c52a3cd
Replace HashSets by BTreeSets for the prefixes
2024-12-04 16:16:48 +01:00
Tamo
7a2af06b1e
update the impacted snapshots
2024-12-04 15:52:24 +01:00
Tamo
cb0c3a5aad
stop adding one enqueued tasks to all unprioritized batches
2024-12-04 15:48:28 +01:00
Tamo
cbcf6c9ba3
make the processing tasks as processing in a batch
2024-12-04 14:48:48 +01:00
Tamo
bf742d81cf
add a test
2024-12-04 14:47:02 +01:00
ManyTheFish
fc1df5793c
fix tests
2024-12-04 14:35:20 +01:00
meili-bors[bot]
3ded069042
Merge #5122
...
5122: Yield the BBQueue writing loop r=ManyTheFish a=Kerollmops
We prefer yielding to let the writing thread do its job instead of spin looping.
Co-authored-by: Kerollmops <clement@meilisearch.com>
2024-12-04 13:33:51 +00:00
Kerollmops
261d2ceb06
Yield the BBQueue writer instead of spin looping
2024-12-04 14:16:40 +01:00
meili-bors[bot]
5b8cd68abe
Merge #5110
...
5110: Increase margin on deletion of task r=dureuill a=irevoire
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/5077
## What does this PR do?
- Increase the margin we keep to enqueue task deletion
The issue was that we had not enough space on the reserved memory to write both the batch and the deletion task we just enqueued.
We could fix it only for this test as it’s not an issue in production where we have 10GiB of margin, but I thought it wasn’t a bad idea either to increase our margin a bit since we’re effectively writing more to lmdb.
Co-authored-by: Tamo <tamo@meilisearch.com>
2024-12-04 12:54:48 +00:00
ManyTheFish
953a82ca04
Add new error message
2024-12-04 11:15:29 +01:00
Kerollmops
96831ed9bb
Send the WakeUp message if necessary in the reserve function
2024-12-04 11:03:01 +01:00
Kerollmops
0459b1a242
Change the reserve and grant function to accept a closure
2024-12-04 10:32:25 +01:00
Kerollmops
8ecb726683
Fix the minimun BBQueue channel threshold
2024-12-03 15:49:11 +01:00
Clément Renault
0ad2f57a92
Update bbqueue repo to point to the meilisearch org
2024-12-03 12:00:04 +01:00
Tamo
71d53f413f
increase the margin allowed to delete task
2024-12-03 11:07:03 +01:00
meili-bors[bot]
054622bd16
Merge #5094
...
5094: Implement a bbqueue channel between the extractors and the writer r=dureuill a=Kerollmops
This PR switches from a bounded crossbeam channel only with allocated entries for the communication between the extractors and the writer to a [BBQueue](https://github.com/jamesmunns/bbqueue )-based system with a Single Producer Single Consumer kind of Circular/Ring Buffers channel.
- [x] Implement the BBQueue channel system...
- [x] with a crossbeam channel to wake up the receiver.
- [x] Manage the BBQueue allocated memory dynamically.
- [x] Support content that doesn't fit in the bbqueues.
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-12-03 08:00:55 +00:00
Louis Dureuil
e905a72d73
remove mimalloc on Windows
2024-12-02 18:13:56 +01:00
meili-bors[bot]
2e879c1df8
Merge #5109
...
5109: Fix autobatch r=dureuill a=dureuill
Fixes most SDK tests and flaky failures
Changes:
- Make sure that the settings are not autobatched with document operations, as the new indexer no longer supports this operating mode
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-12-02 16:30:51 +00:00
Louis Dureuil
d040aff101
Stop allocating 1GiB for documents
2024-12-02 16:30:14 +01:00
Tamo
beeb31ce41
Update crates/index-scheduler/src/lib.rs
2024-12-02 15:32:16 +01:00
Louis Dureuil
057143214d
Fix warnings
2024-12-02 14:42:31 +01:00
Louis Dureuil
6a1d26a60c
Update autobatching tests
2024-12-02 14:15:15 +01:00
Louis Dureuil
d78f4666a0
Fix autobatching of documents and settings
2024-12-02 12:25:01 +01:00
Tamo
a439fa3e1a
While spamming the batches route we could see a processing batch becoming missing and then finished, this commit ensures the batches goes from processing to finished directly
2024-12-02 12:02:16 +01:00
Clément Renault
767259be7e
Prefer returning a abort indexation rather than throwing a panic
2024-12-02 11:53:42 +01:00
Clément Renault
e9f34fb4b1
Make the frame consumer pulling fair
2024-12-02 11:49:01 +01:00
Clément Renault
d5c07ef7b3
Manage key length conversion error correctly
2024-12-02 11:03:00 +01:00
Clément Renault
5e218f3f4d
Remove a sync_all (mark my words)
2024-12-02 11:03:00 +01:00
Clément Renault
bcab61ab1d
Do spurious wake ups on the receiver side
2024-12-02 11:03:00 +01:00
Clément Renault
263c5a348e
Move the spin looping for BBQueue frames into a dedicated function
2024-12-02 10:33:49 +01:00
Clément Renault
be7d2fbe63
Move the EntryHeader up in the file and document the safety related to the size
2024-12-02 10:19:11 +01:00
Clément Renault
f7f9a131e4
Improve copying bytes into aligned memory area
2024-12-02 10:15:58 +01:00
Clément Renault
5df5eb2db2
Clarify a method name
2024-12-02 10:10:48 +01:00
Clément Renault
30eb0e5b5b
Rename recv and read methods to recv_action and recv_frame
2024-12-02 10:08:01 +01:00
Clément Renault
5b860cb989
Fix english in the doc
2024-12-02 10:06:35 +01:00
Clément Renault
76d0623b11
Reduce the number of unwraps
2024-12-02 10:05:06 +01:00
Clément Renault
db4eaf4d2d
Rename serialize_into into serialize_into_writer
2024-12-02 10:03:27 +01:00
Clément Renault
13f21206a6
Call the serialize_into_writer method from the serialize_into one
2024-12-02 10:03:01 +01:00
Clément Renault
14ee7aa84c
Make sure the BBQueue is at least 50 MiB
2024-11-28 18:02:48 +01:00
Clément Renault
8a35cd1743
Adjust the BBQueue buffers to use 2% instead of 10%
2024-11-28 16:00:15 +01:00
meili-bors[bot]
8d33af1dff
Merge #5102
...
5102: Update mini-dashboard to v0.2.16 version r=curquiza a=curquiza
Fixes https://github.com/meilisearch/meilisearch/issues/5093
Fixes this bug: https://github.com/meilisearch/mini-dashboard/issues/563
Co-authored-by: curquiza <clementine@meilisearch.com>
2024-11-28 14:57:27 +00:00
Clément Renault
3c7ac093d3
Take the BBQueue capacity into account in the max memory
2024-11-28 15:43:14 +01:00
Clément Renault
b57dd5c58e
Remove the Vector variant and use the Vectors
2024-11-28 15:20:43 +01:00
ManyTheFish
90b428a8c3
Apply change requests
2024-11-28 15:16:13 +01:00
Clément Renault
096a28656e
Fix a bug around deleting all the vectors of a doc
2024-11-28 15:15:06 +01:00
curquiza
3dc87f5baa
Update mini-dashboard to v0.2.16 version
2024-11-28 14:33:05 +01:00
Clément Renault
cc4bd54669
Correctly construct the Embeddings struct
2024-11-28 13:53:25 +01:00
ManyTheFish
5383f41bba
Polish test_setting_routes!
2024-11-28 12:04:21 +01:00
Clément Renault
58eab9a018
Send large payload through crossbeam
2024-11-28 12:01:06 +01:00
ManyTheFish
9f36ffcbdb
Polish make_setting_routes!
2024-11-28 11:44:09 +01:00
ManyTheFish
68c4717e21
Change the settings tests and macros to avoid oversights
2024-11-28 11:34:35 +01:00
Clément Renault
5c488e20cc
Send the geo rtree through crossbeam channel
2024-11-27 18:03:45 +01:00
Clément Renault
da650f834e
Plug the NoPanicThreadPool in the tests and benchmarks
2024-11-27 17:04:49 +01:00
Clément Renault
e83534a430
Fix the indexer::index to correctly use the rayon::ThreadPool
2024-11-27 16:27:43 +01:00
Clément Renault
98d4a2909e
Fix the way we spawn the rayon threadpool
2024-11-27 16:05:44 +01:00
Clément Renault
a514ce472a
Make clippy happy
2024-11-27 14:59:04 +01:00
Clément Renault
cc63802115
Modify and return the IndexEmbeddings to write them later
2024-11-27 14:58:03 +01:00
Clément Renault
acec45ad7c
Send a WakeUp when writing data in the BBQueue buffers
2024-11-27 14:33:23 +01:00
Clément Renault
08d6413365
Fix result types
2024-11-27 14:32:42 +01:00
Clément Renault
70802eb7c7
Fix most issues with the lifetimes
2024-11-27 14:32:42 +01:00
Clément Renault
6ac5b3b136
Finish most of the channels types
2024-11-27 14:32:26 +01:00
Clément Renault
e1e76f39d0
Clean up dependencies
2024-11-27 14:30:34 +01:00
Clément Renault
2094ce8a9a
Move the arroy building after the writing loop
2024-11-27 14:30:33 +01:00
Clément Renault
8442db8101
Implement mostly all senders
2024-11-27 14:16:35 +01:00
Clément Renault
79671c9faa
Implement a first version of the bbqueue channels
2024-11-27 14:15:00 +01:00
meili-bors[bot]
a2f64f6552
Merge #5095
...
5095: Span to measure the part of db writes that is after the merge/extraction r=curquiza a=dureuill
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-27 11:10:00 +00:00
ManyTheFish
18a9af353c
Update Charabia version to v0.9.2
2024-11-27 11:12:08 +01:00
meili-bors[bot]
aae0dc715d
Merge #5063
...
5063: Fix pagination when embedding fails r=Kerollmops a=dureuill
# Pull Request
## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/5045
## What does this PR do?
- Use `return_keyword_results` function when embedding fails
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-27 09:13:28 +00:00
meili-bors[bot]
d0b2c0a523
Merge #5091
...
5091: Settings opt out r=Kerollmops a=ManyTheFish
# Pull Request
Related PRD: https://www.notion.so/meilisearch/API-usage-Settings-to-opt-out-indexing-features-fff4b06b651f8108ade3f858aeb16b14?pvs=4
## Related issue
Fixes #4979
- [x] Add setting opt-out
- [x] Add analytics
- [x] Add tests
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
2024-11-26 15:50:28 +00:00
ManyTheFish
2e896f30a5
Fix PR comments
2024-11-26 16:06:33 +01:00
Louis Dureuil
8f57b4fdf4
Span to measure the part of db writes that is after the merge/extraction
2024-11-26 14:46:36 +01:00
Many the fish
f014e78684
Update crates/milli/src/index.rs
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-11-26 14:46:01 +01:00
Many the fish
9008ecda3d
Update crates/meilisearch-types/src/settings.rs
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-11-26 14:44:24 +01:00
ManyTheFish
d7bcfb2d19
fix clippy
2024-11-26 14:04:16 +01:00
meili-bors[bot]
fb66fec398
Merge #5092
...
5092: Precise spans for new indexer r=dureuill a=dureuill
- Separate extract and merge spans
- Add span around commit
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-26 10:59:40 +00:00
Louis Dureuil
fa15be5bc4
Add span around commit
2024-11-26 09:45:48 +01:00
Louis Dureuil
aa460819a7
Add more precise spans
2024-11-26 09:45:36 +01:00
meili-bors[bot]
e241f91285
Merge #5062
...
5062: Fix bugs for v1.12 r=Kerollmops a=ManyTheFish
# Pull Request
## Related issue
Fixes #4984
Fixes https://github.com/meilisearch/meilisearch/issues/4974
Fixes [SDK test](https://github.com/meilisearch/meilisearch/actions/runs/11886701996/job/33118278794 )
## What does this PR do?
- add 3 tests
- fix bugs
Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-11-26 08:10:50 +00:00
ManyTheFish
d66dc363ed
Test and implement settings opt-out
2024-11-25 18:23:22 +01:00
meili-bors[bot]
5560452ef9
Merge #5089
...
5089: Improve error handling when writing into LMDB r=dureuill a=Kerollmops
This PR exposes two new internal error variants: `StoreDelete` and `StorePut`. So that the error messages are better when we fail at writing into LMDB.
Related to #5078
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-11-25 16:19:41 +00:00
Clément Renault
b4fb2dabd4
Use the grenad rayon feature
2024-11-25 16:31:21 +01:00
Clément Renault
5606679c53
Use the obkv and grenad crates.io versions
2024-11-25 16:24:59 +01:00
Clément Renault
a3103f347e
Fix the facet f64 database name
2024-11-25 16:05:31 +01:00
Clément Renault
25aac45fc7
Expose better error messages
2024-11-25 15:54:43 +01:00
meili-bors[bot]
98a785b0d7
Merge #5080
...
5080: Fix getting a single batch through the GET route r=Kerollmops a=dureuill
# Pull Request
## Related issue
Fixes a bug where getting a single batch does not work
Related to #5070
fix by `@Kerollmops`
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-21 17:08:46 +00:00
Louis Dureuil
ba7500998e
Fix getting a single batch through the GET route
2024-11-21 17:59:31 +01:00
meili-bors[bot]
19e6f675b3
Merge #4900
...
4900: Indexer edition 2024 r=Kerollmops a=dureuill
This PR is implementing the indexer edition 2024, largely inspired by [the ideas from this blog post](https://blog.kerollmops.com/meilisearch-is-too-slow ).
Fixes https://github.com/meilisearch/meilisearch/issues/4985
## Features
- Stream-first approach to reading documents.
- Minimum disk write operations.
- RAM usage-first approach to avoid modifying common bitmaps on disk but in memory.
- Reduced LMDB fragmentation by writing entries only once...
- ...computing the final version of the entries in parallel...
- ...and storing them in write-optimized data structures before sending them to the BTree (LMDB).
- Indexing in multiple transactions to improve large dataset support (dumps).
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-21 16:19:10 +00:00
Louis Dureuil
323ecbb885
Add span on document operation
2024-11-21 17:01:10 +01:00
Louis Dureuil
ffb60cb885
Add comment explaining why we fixed the version of insta
2024-11-21 16:56:56 +01:00
Louis Dureuil
dcc3caef0d
Remove TopLevelMap
2024-11-21 16:56:46 +01:00
Louis Dureuil
221e547e86
Slight changes
2024-11-21 16:47:44 +01:00
Clément Renault
61d0615253
Document the geo point extractor
2024-11-21 16:47:08 +01:00
Clément Renault
5727e00374
Remove useless geo skipped
2024-11-21 16:47:08 +01:00
Clément Renault
9b60843831
Remove commented lines
2024-11-21 16:47:07 +01:00
ManyTheFish
36962b943b
First batch of PR comment
2024-11-21 16:38:11 +01:00
Louis Dureuil
32bcacefd5
Changes Document::len to Document::top_level_fields_count
2024-11-21 15:01:07 +01:00
Louis Dureuil
4ed195426c
remove unused stuff in global.rs
2024-11-21 15:01:07 +01:00
Many the fish
ff38f29981
Update crates/index-scheduler/src/batch.rs
...
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-21 14:18:39 +01:00
ManyTheFish
94b260fd25
Remove orphan span
2024-11-21 12:12:07 +01:00
Clément Renault
ab2c83f868
Use the disk less when computing prefixes
2024-11-21 10:45:37 +01:00
Louis Dureuil
1f9692cd04
Increase map size for tests
2024-11-20 17:52:21 +01:00
Tamo
1e694ae432
improve the count of the number of tasks in a batch
2024-11-20 17:48:26 +01:00
Tamo
71807cac6d
makes clippy happy
2024-11-20 17:40:58 +01:00
Tamo
21a2264782
improve the details and stats of the current batch processing
2024-11-20 17:25:55 +01:00
Louis Dureuil
bda2b41d11
update snaps after merge
2024-11-20 17:08:30 +01:00
Louis Dureuil
6e6acfcf1b
Merge branch 'main' into indexer-edition-2024
2024-11-20 16:59:58 +01:00
Louis Dureuil
e0864f1b21
Separate side effect and debug asserts
2024-11-20 16:25:17 +01:00
Clément Renault
a38344acb3
Replace eprintlns by tracing
2024-11-20 15:29:51 +01:00
ManyTheFish
4d616f8794
Parse every attributes and filter before tokenization
2024-11-20 15:15:25 +01:00
Louis Dureuil
ff9c92c409
rename documents -> substep
2024-11-20 15:12:02 +01:00
Clément Renault
8380ddbdcd
Fix progress of into_changes
2024-11-20 15:10:09 +01:00
Louis Dureuil
867138f166
Add SP to into_changes
2024-11-20 15:07:05 +01:00
Clément Renault
567bd4538b
Fxi the into_changes stop processing
2024-11-20 14:58:25 +01:00
Louis Dureuil
84600a10d1
Add MSP to document_update.into_changes()
2024-11-20 14:53:37 +01:00
ManyTheFish
35bbe1c2a2
Add failing test on settings changes
2024-11-20 14:48:12 +01:00
Louis Dureuil
7d64e8dbd3
Fix Windows compilation
2024-11-20 14:40:38 +01:00
Tamo
ec06879d28
apply review changes
2024-11-20 14:40:36 +01:00
Tamo
83d1f858c1
Update crates/index-scheduler/src/lib.rs
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-11-20 14:36:05 +01:00
Louis Dureuil
cae8c89467
"fix" last warnings
2024-11-20 14:03:52 +01:00
Tamo
a7ac590e9e
implements the reverse query parameter for the batches
2024-11-20 13:29:52 +01:00
Clément Renault
7cb8732b45
Introduce a new bincode internal error
2024-11-20 13:23:11 +01:00
Tamo
8ad68dd708
stop leaking the update files of the canceled tasks
2024-11-20 13:17:54 +01:00
ManyTheFish
fe5d50969a
Fix filed selector in extrators
2024-11-20 13:16:44 +01:00
Clément Renault
56c7c5d5f0
Fix comments
2024-11-20 13:16:44 +01:00
Louis Dureuil
4cdfdddd6d
Fix one more
2024-11-20 13:16:43 +01:00
Louis Dureuil
2afa33011a
Fix tokenize_document
2024-11-20 13:16:43 +01:00
Louis Dureuil
61feca1f41
More tests pass
2024-11-20 13:16:43 +01:00
Louis Dureuil
f893b5153e
Don't mark [""] as empty facet
2024-11-20 13:16:42 +01:00
Louis Dureuil
ca779c21f9
facets: Handle boolean and skip empty strings
2024-11-20 13:16:42 +01:00
Louis Dureuil
477077bdc2
Remove _vectors
from fid map when there are no vectors in sight
2024-11-20 13:16:42 +01:00
ManyTheFish
b1f8aec348
Fix index_documents_check_exists_database
2024-11-20 13:16:41 +01:00
ManyTheFish
ba7f091db3
Use tokenizer on numbers and booleans
2024-11-20 13:16:41 +01:00
Louis Dureuil
8049df125b
Add depth to facet extraction so that null inside an array doesn't mark the entire field as null
2024-11-20 13:16:40 +01:00
Clément Renault
50d1bd01df
We no longer index geo lat and lng
2024-11-20 13:16:40 +01:00
Louis Dureuil
a28d4f5d0c
Fix setup_search_index_with_criteria
2024-11-20 13:16:40 +01:00
Louis Dureuil
fc14f4bc66
Attempt to fix setup_search_index_with_criteria
2024-11-20 13:16:39 +01:00
Clément Renault
5f8a82d6f5
Improve test
2024-11-20 13:16:39 +01:00
Clément Renault
fe04e51a49
One more
2024-11-20 13:16:38 +01:00
Clément Renault
01b27e40ad
Fix a bit of the placeholder search tests
2024-11-20 13:16:38 +01:00
Louis Dureuil
8076d98544
Fix stats_should_not_return_deleted_documents
2024-11-20 13:16:37 +01:00
Louis Dureuil
9e951baad5
One more test passing
2024-11-20 13:16:37 +01:00
Louis Dureuil
52f2fc4c46
Fail in case of user error in tests
2024-11-20 13:16:37 +01:00
Clément Renault
3957917e0b
Correctly count indexed documents
2024-11-20 13:16:36 +01:00