342 Commits

Author SHA1 Message Date
Louis Dureuil
c77073efcc
Update::has_changed_for_fields 2024-12-05 15:50:12 +01:00
meili-bors[bot]
1537323eb9
Merge #5119
5119: Settings opt out error msg r=Kerollmops a=ManyTheFish

# Pull Request

## Related issue
PRD: https://meilisearch.notion.site/API-usage-Settings-to-opt-out-indexing-features-fff4b06b651f8108ade3f858aeb16b14?pvs=4
## What does this PR do?

Add a new error code and message when the user tries a facet search on an index where the facet search is disabled:
```json
{
  "message": "The facet search is disabled for this index",
  "code": "facet_search_disabled",
  "type": "invalid_request",
  "link": "https://docs.meilisearch.com/errors#invalid_facet_search_disabled"
}
```


Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-12-05 13:51:11 +00:00
ManyTheFish
a0a3b55700 Change error code 2024-12-05 14:48:29 +01:00
Tamo
214b51de87
try to fix the snapshot on demand flaky test 2024-12-05 14:45:54 +01:00
Tamo
95975944d7
fix the dumps missing the empty swap index tasks 2024-12-05 14:23:38 +01:00
meili-bors[bot]
9a9383643f
Merge #5125
5125: Change the default max memory usage to 5% of the total memory r=ManyTheFish a=Kerollmops

After thorough testing, we found that giving 5% of the total available memory to allocate resident memory (caches and channels) is the best approach.

The main reason is that the new indexer is highly memory-map oriented, with LMDB, and reads the database while performing the indexation. So, by allowing the maximum amount of memory available to LMDB and the OS, it will perform the key-value store reads and all other indexation operations faster by keeping more pages hot in the cache. In #5124, we also sorted the entries to merge to improve the read speed of LMDB.

This is common in database management systems: reading from disk is much faster when done in lexicographic order (the default sort order of keys). The entries are then likely to already be in the OS memory cache, since a previous read loaded them, and reading from disk is very slow compared to reading from memory.
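
As a rough illustration of the 5% rule, here is a minimal sketch. The function and constant names are hypothetical and not taken from the Meilisearch code base; the 100 MiB floor is borrowed from the separate "Force max_memory to a min of 100MiB" commit in this log, and combining the two in one helper is an assumption.

```rust
/// Hypothetical floor for the indexing memory budget (100 MiB).
const MIN_BUDGET_BYTES: u64 = 100 * 1024 * 1024;

/// Hypothetical helper: returns 5% of the total memory,
/// but never less than `MIN_BUDGET_BYTES`.
fn default_max_indexing_memory(total_memory_bytes: u64) -> u64 {
    // Dividing by 20 is the same as taking 5% and cannot overflow.
    let five_percent = total_memory_bytes / 20;
    five_percent.max(MIN_BUDGET_BYTES)
}
```

Under these assumptions, a machine with 16 GiB of RAM would reserve roughly 819 MiB for caches and channels, leaving the rest to LMDB and the OS page cache.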

Co-authored-by: Kerollmops <clement@meilisearch.com>
2024-12-05 10:11:25 +00:00
meili-bors[bot]
cac355bfa7
Merge #5124
5124: Optimize Prefixes and Merges r=ManyTheFish a=Kerollmops

In this PR, we optimize LMDB reads so that entries are read in lexicographic order, making better use of the OS memory-mapping cache:

 - Optimize the prefix generation for word position docids (`@manythefish`)
 - Optimize the parallel merging of the caches to sort entries before merging the caches (`@kerollmops`)
 
## Benchmarks on 1cpu 2gb gpo3 (5k IOps)
 
Before on the tag meilisearch-v1.12.0-rc.3.

```
word_position_docids:merge_and_send_docids: 988s
compute_word_fst: 23.3s
word_pair_proximity_docids:merge_and_send_docids: 428s
compute_word_prefix_fid_docids:recompute_modified_prefixes: 76.3s
compute_word_prefix_position_docids:recompute_modified_prefixes:from_prefixes: 429s
```

After sorting the whole `HashMap`s in a `Vec` on this branch.

```
word_position_docids:merge_and_send_docids: 202s
compute_word_fst: 20.4s
word_pair_proximity_docids:merge_and_send_docids: 427s
compute_word_prefix_fid_docids:recompute_modified_prefixes: 65.5s
compute_word_prefix_position_docids:recompute_modified_prefixes:from_prefixes: 62.5s
```
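
The idea of moving the `HashMap`s into a sorted `Vec` before merging can be sketched as follows. This is only an illustration: the function name, the key type (`Vec<u8>`) and the value type (`Vec<u32>` standing in for document ids) are assumptions, not the actual milli types.

```rust
use std::collections::HashMap;

/// Hypothetical sketch: drains several caches into one Vec sorted by key,
/// merging the values that share the same key.
fn sorted_merge(caches: Vec<HashMap<Vec<u8>, Vec<u32>>>) -> Vec<(Vec<u8>, Vec<u32>)> {
    // Consume every map and collect all (key, ids) pairs.
    let mut entries: Vec<(Vec<u8>, Vec<u32>)> = caches.into_iter().flatten().collect();
    // Vec<u8> compares byte by byte, i.e. lexicographically.
    entries.sort_unstable_by(|a, b| a.0.cmp(&b.0));

    let mut merged: Vec<(Vec<u8>, Vec<u32>)> = Vec::with_capacity(entries.len());
    for (key, mut ids) in entries {
        // Equal keys are adjacent after sorting, so only the last entry needs checking.
        if let Some((last_key, last_ids)) = merged.last_mut() {
            if *last_key == key {
                last_ids.append(&mut ids);
                continue;
            }
        }
        merged.push((key, ids));
    }
    // Keep each id list sorted and free of duplicates.
    for (_, ids) in merged.iter_mut() {
        ids.sort_unstable();
        ids.dedup();
    }
    merged
}
```

Iterating over the result visits keys in ascending byte order, so database reads performed while merging touch neighbouring pages instead of jumping around the file.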

Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2024-12-05 09:35:52 +00:00
Kerollmops
9020a50df8
Change the default max memory usage to 5% of the total memory 2024-12-05 10:14:46 +01:00
Kerollmops
52843123d4
Clean up and remove the non-sorted merge_caches function 2024-12-05 10:03:05 +01:00
meili-bors[bot]
6298db5bea
Merge #5113
5113: Fix the Minimum BBQueue channel threshold r=Kerollmops a=Kerollmops



Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-12-05 09:01:02 +00:00
meili-bors[bot]
a003a0934a
Merge #5121
5121: Make the tasks pulling timeout configurable r=dureuill a=Kerollmops



Co-authored-by: Kerollmops <clement@meilisearch.com>
2024-12-04 17:04:14 +00:00
Louis Dureuil
3a11e39c01
Force max_memory to a min of 100MiB 2024-12-04 17:53:30 +01:00
Louis Dureuil
5f896b1050
Fix geo when spilling 2024-12-04 17:51:12 +01:00
Kerollmops
d0c4e6da6b
Make clippy happy 2024-12-04 17:39:10 +01:00
Kerollmops
2da5584bb5
Make the tasks pulling timeout configurable 2024-12-04 17:39:07 +01:00
Kerollmops
2e32d0474c
Lexicographically sort all the map to merge 2024-12-04 17:05:11 +01:00
Kerollmops
cb99ac6f7e
Consume vec instead of draining 2024-12-04 17:00:22 +01:00
Kerollmops
be411435f5
Use the merge_caches_alt function in the docids merging 2024-12-04 16:37:29 +01:00
Kerollmops
29ef164530
Introduce a new semi ordered merge function 2024-12-04 16:33:35 +01:00
ManyTheFish
739c52a3cd Replace HashSets by BTreeSets for the prefixes 2024-12-04 16:16:48 +01:00
Tamo
7a2af06b1e
update the impacted snapshots 2024-12-04 15:52:24 +01:00
Tamo
cb0c3a5aad
stop adding one enqueued tasks to all unprioritized batches 2024-12-04 15:48:28 +01:00
Tamo
cbcf6c9ba3
make the processing tasks as processing in a batch 2024-12-04 14:48:48 +01:00
Tamo
bf742d81cf
add a test 2024-12-04 14:47:02 +01:00
ManyTheFish
fc1df5793c fix tests 2024-12-04 14:35:20 +01:00
meili-bors[bot]
3ded069042
Merge #5122
5122: Yield the BBQueue writing loop r=ManyTheFish a=Kerollmops

We prefer yielding to let the writing thread do its job instead of spin looping.
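
A minimal sketch of the difference, using only the standard library. The `wait_for_space` function and the atomic counter are hypothetical stand-ins for the BBQueue reservation logic, which is not shown in this log.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Hypothetical sketch: blocks until at least `needed` bytes are free,
/// yielding to the scheduler between checks instead of busy-spinning.
fn wait_for_space(free_bytes: &AtomicUsize, needed: usize) {
    while free_bytes.load(Ordering::Acquire) < needed {
        // `std::hint::spin_loop()` here would keep the core busy;
        // `yield_now` hands the time slice to other threads, such as the writer.
        thread::yield_now();
    }
}
```

On a machine with few cores, yielding gives the thread that frees the space a chance to run, whereas a spin loop may compete with it for the same CPU.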

Co-authored-by: Kerollmops <clement@meilisearch.com>
2024-12-04 13:33:51 +00:00
Kerollmops
261d2ceb06
Yield the BBQueue writer instead of spin looping 2024-12-04 14:16:40 +01:00
meili-bors[bot]
5b8cd68abe
Merge #5110
5110: Increase margin on deletion of task r=dureuill a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/5077

## What does this PR do?
- Increase the margin we keep to enqueue task deletion

The issue was that we did not have enough space in the reserved memory to write both the batch and the deletion task we just enqueued.
We could fix it only for this test as it’s not an issue in production where we have 10GiB of margin, but I thought it wasn’t a bad idea either to increase our margin a bit since we’re effectively writing more to lmdb.


Co-authored-by: Tamo <tamo@meilisearch.com>
2024-12-04 12:54:48 +00:00
ManyTheFish
953a82ca04 Add new error message 2024-12-04 11:15:29 +01:00
Kerollmops
96831ed9bb
Send the WakeUp message if necessary in the reserve function 2024-12-04 11:03:01 +01:00
Kerollmops
0459b1a242
Change the reserve and grant function to accept a closure 2024-12-04 10:32:25 +01:00
Kerollmops
8ecb726683
Fix the minimum BBQueue channel threshold 2024-12-03 15:49:11 +01:00
Clément Renault
0ad2f57a92
Update bbqueue repo to point to the meilisearch org 2024-12-03 12:00:04 +01:00
Tamo
71d53f413f increase the margin allowed to delete task 2024-12-03 11:07:03 +01:00
meili-bors[bot]
054622bd16
Merge #5094
5094: Implement a bbqueue channel between the extractors and the writer r=dureuill a=Kerollmops

This PR replaces the bounded crossbeam channel, which only carried allocated entries, used for the communication between the extractors and the writer with a [BBQueue](https://github.com/jamesmunns/bbqueue)-based system: a Single Producer Single Consumer circular (ring) buffer channel.

 - [x] Implement the BBQueue channel system...
 - [x] with a crossbeam channel to wake up the receiver.
 - [x] Manage the BBQueue allocated memory dynamically.
 - [x] Support content that doesn't fit in the bbqueues.

Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-12-03 08:00:55 +00:00
Louis Dureuil
e905a72d73 remove mimalloc on Windows 2024-12-02 18:13:56 +01:00
meili-bors[bot]
2e879c1df8
Merge #5109
5109: Fix autobatch r=dureuill a=dureuill

Fixes most SDK tests and flaky failures

Changes:

- Make sure that the settings are not autobatched with document operations, as the new indexer no longer supports this operating mode

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-12-02 16:30:51 +00:00
Louis Dureuil
d040aff101 Stop allocating 1GiB for documents 2024-12-02 16:30:14 +01:00
Tamo
beeb31ce41
Update crates/index-scheduler/src/lib.rs 2024-12-02 15:32:16 +01:00
Louis Dureuil
057143214d
Fix warnings 2024-12-02 14:42:31 +01:00
Louis Dureuil
6a1d26a60c
Update autobatching tests 2024-12-02 14:15:15 +01:00
Louis Dureuil
d78f4666a0
Fix autobatching of documents and settings 2024-12-02 12:25:01 +01:00
Tamo
a439fa3e1a
While spamming the batches route we could see a processing batch becoming missing and then finished, this commit ensures the batches goes from processing to finished directly 2024-12-02 12:02:16 +01:00
Clément Renault
767259be7e
Prefer returning a abort indexation rather than throwing a panic 2024-12-02 11:53:42 +01:00
Clément Renault
e9f34fb4b1
Make the frame consumer pulling fair 2024-12-02 11:49:01 +01:00
Clément Renault
d5c07ef7b3
Manage key length conversion error correctly 2024-12-02 11:03:00 +01:00
Clément Renault
5e218f3f4d
Remove a sync_all (mark my words) 2024-12-02 11:03:00 +01:00
Clément Renault
bcab61ab1d
Do spurious wake ups on the receiver side 2024-12-02 11:03:00 +01:00
Clément Renault
263c5a348e
Move the spin looping for BBQueue frames into a dedicated function 2024-12-02 10:33:49 +01:00
Clément Renault
be7d2fbe63
Move the EntryHeader up in the file and document the safety related to the size 2024-12-02 10:19:11 +01:00
Clément Renault
f7f9a131e4
Improve copying bytes into aligned memory area 2024-12-02 10:15:58 +01:00
Clément Renault
5df5eb2db2
Clarify a method name 2024-12-02 10:10:48 +01:00
Clément Renault
30eb0e5b5b
Rename recv and read methods to recv_action and recv_frame 2024-12-02 10:08:01 +01:00
Clément Renault
5b860cb989
Fix english in the doc 2024-12-02 10:06:35 +01:00
Clément Renault
76d0623b11
Reduce the number of unwraps 2024-12-02 10:05:06 +01:00
Clément Renault
db4eaf4d2d
Rename serialize_into into serialize_into_writer 2024-12-02 10:03:27 +01:00
Clément Renault
13f21206a6
Call the serialize_into_writer method from the serialize_into one 2024-12-02 10:03:01 +01:00
Clément Renault
14ee7aa84c
Make sure the BBQueue is at least 50 MiB 2024-11-28 18:02:48 +01:00
Clément Renault
8a35cd1743
Adjust the BBQueue buffers to use 2% instead of 10% 2024-11-28 16:00:15 +01:00
meili-bors[bot]
8d33af1dff
Merge #5102
5102: Update mini-dashboard to v0.2.16 version r=curquiza a=curquiza

Fixes https://github.com/meilisearch/meilisearch/issues/5093

Fixes this bug: https://github.com/meilisearch/mini-dashboard/issues/563

Co-authored-by: curquiza <clementine@meilisearch.com>
2024-11-28 14:57:27 +00:00
Clément Renault
3c7ac093d3
Take the BBQueue capacity into account in the max memory 2024-11-28 15:43:14 +01:00
Clément Renault
b57dd5c58e
Remove the Vector variant and use the Vectors 2024-11-28 15:20:43 +01:00
ManyTheFish
90b428a8c3 Apply change requests 2024-11-28 15:16:13 +01:00
Clément Renault
096a28656e
Fix a bug around deleting all the vectors of a doc 2024-11-28 15:15:06 +01:00
curquiza
3dc87f5baa Update mini-dashboard to v0.2.16 version 2024-11-28 14:33:05 +01:00
Clément Renault
cc4bd54669
Correctly construct the Embeddings struct 2024-11-28 13:53:25 +01:00
ManyTheFish
5383f41bba Polish test_setting_routes! 2024-11-28 12:04:21 +01:00
Clément Renault
58eab9a018
Send large payload through crossbeam 2024-11-28 12:01:06 +01:00
ManyTheFish
9f36ffcbdb Polish make_setting_routes! 2024-11-28 11:44:09 +01:00
ManyTheFish
68c4717e21 Change the settings tests and macros to avoid oversights 2024-11-28 11:34:35 +01:00
Clément Renault
5c488e20cc
Send the geo rtree through crossbeam channel 2024-11-27 18:03:45 +01:00
Clément Renault
da650f834e
Plug the NoPanicThreadPool in the tests and benchmarks 2024-11-27 17:04:49 +01:00
Clément Renault
e83534a430
Fix the indexer::index to correctly use the rayon::ThreadPool 2024-11-27 16:27:43 +01:00
Clément Renault
98d4a2909e
Fix the way we spawn the rayon threadpool 2024-11-27 16:05:44 +01:00
Clément Renault
a514ce472a
Make clippy happy 2024-11-27 14:59:04 +01:00
Clément Renault
cc63802115
Modify and return the IndexEmbeddings to write them later 2024-11-27 14:58:03 +01:00
Clément Renault
acec45ad7c
Send a WakeUp when writing data in the BBQueue buffers 2024-11-27 14:33:23 +01:00
Clément Renault
08d6413365
Fix result types 2024-11-27 14:32:42 +01:00
Clément Renault
70802eb7c7
Fix most issues with the lifetimes 2024-11-27 14:32:42 +01:00
Clément Renault
6ac5b3b136
Finish most of the channels types 2024-11-27 14:32:26 +01:00
Clément Renault
e1e76f39d0
Clean up dependencies 2024-11-27 14:30:34 +01:00
Clément Renault
2094ce8a9a
Move the arroy building after the writing loop 2024-11-27 14:30:33 +01:00
Clément Renault
8442db8101
Implement mostly all senders 2024-11-27 14:16:35 +01:00
Clément Renault
79671c9faa
Implement a first version of the bbqueue channels 2024-11-27 14:15:00 +01:00
meili-bors[bot]
a2f64f6552
Merge #5095
5095: Span to measure the part of db writes that is after the merge/extraction r=curquiza a=dureuill



Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-27 11:10:00 +00:00
ManyTheFish
18a9af353c Update Charabia version to v0.9.2 2024-11-27 11:12:08 +01:00
meili-bors[bot]
aae0dc715d
Merge #5063
5063: Fix pagination when embedding fails r=Kerollmops a=dureuill

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/5045

## What does this PR do?
- Use `return_keyword_results` function when embedding fails


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-27 09:13:28 +00:00
meili-bors[bot]
d0b2c0a523
Merge #5091
5091: Settings opt out r=Kerollmops a=ManyTheFish

# Pull Request

Related PRD: https://www.notion.so/meilisearch/API-usage-Settings-to-opt-out-indexing-features-fff4b06b651f8108ade3f858aeb16b14?pvs=4

## Related issue
Fixes #4979 

- [x] Add setting opt-out
- [x] Add analytics
- [x] Add tests


Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
2024-11-26 15:50:28 +00:00
ManyTheFish
2e896f30a5 Fix PR comments 2024-11-26 16:06:33 +01:00
Louis Dureuil
8f57b4fdf4
Span to measure the part of db writes that is after the merge/extraction 2024-11-26 14:46:36 +01:00
Many the fish
f014e78684
Update crates/milli/src/index.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-11-26 14:46:01 +01:00
Many the fish
9008ecda3d
Update crates/meilisearch-types/src/settings.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-11-26 14:44:24 +01:00
ManyTheFish
d7bcfb2d19 fix clippy 2024-11-26 14:04:16 +01:00
meili-bors[bot]
fb66fec398
Merge #5092
5092: Precise spans for new indexer r=dureuill a=dureuill

- Separate extract and merge spans
- Add span around commit

Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-11-26 10:59:40 +00:00
Louis Dureuil
fa15be5bc4
Add span around commit 2024-11-26 09:45:48 +01:00
Louis Dureuil
aa460819a7
Add more precise spans 2024-11-26 09:45:36 +01:00
meili-bors[bot]
e241f91285
Merge #5062
5062: Fix bugs for v1.12 r=Kerollmops a=ManyTheFish

# Pull Request

## Related issue
Fixes #4984
Fixes https://github.com/meilisearch/meilisearch/issues/4974
Fixes [SDK test](https://github.com/meilisearch/meilisearch/actions/runs/11886701996/job/33118278794)
## What does this PR do?
- add 3 tests
- fix bugs

Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-11-26 08:10:50 +00:00
ManyTheFish
d66dc363ed Test and implement settings opt-out 2024-11-25 18:23:22 +01:00
meili-bors[bot]
5560452ef9
Merge #5089
5089: Improve error handling when writing into LMDB r=dureuill a=Kerollmops

This PR exposes two new internal error variants: `StoreDelete` and `StorePut`. So that the error messages are better when we fail at writing into LMDB.

Related to #5078

Co-authored-by: Clément Renault <clement@meilisearch.com>
2024-11-25 16:19:41 +00:00
Clément Renault
b4fb2dabd4
Use the grenad rayon feature 2024-11-25 16:31:21 +01:00