307 Commits

Author SHA1 Message Date
ManyTheFish
659855c88e Refactor Settings Indexing process
**Changes:**
The transform structure is now relying on FieldIdMapWithMetadata and AttributePatterns to prepare
the obkv documents during a settings reindexing.
The InnerIndexSettingsDiff and InnerIndexSettings structs are now relying on FieldIdMapWithMetadata, FilterableAttributesRule and AttributePatterns to define the field and the databases that should be reindexed.
The faceted_fields_ids, localized_searchable_fields_ids and localized_faceted_fields_ids have been removed in favor of the FieldIdMapWithMetadata.
We are now relying on the FieldIdMapWithMetadata to retain vectors_fids from the facets and the searchables.

The searchable database computing is now relying on the FieldIdMapWithMetadata to know if a field is searchable and retrieve the locales.

The facet database computing is now relying on the FieldIdMapWithMetadata to compute the facet databases, the facet-search and retrieve the locales.

The facet level database computing is now relying on the FieldIdMapWithMetadata and the facet level database are cleared depending on the settings differences (clear_facet_levels_based_on_settings_diff).

The vector point extraction uses the FieldIdMapWithMetadata instead of FieldsIdsMapWithMetadata.

**Impact:**
- Dump import
- Settings update
2025-03-03 10:32:02 +01:00
meili-bors[bot]
c63c25a9a2
Merge #5355
5355: Support fetching the pooling method from the model configuration r=Kerollmops a=dureuill

# Pull Request

## Related issue
Fixes #5354 

## What does this PR do?
- Fetches the pooling configuration from the model repository
- Use a pooling method that depends on the pooling configuration of that model.
- Allow overriding the pooling method with a new huggingFace embedder parameter `pooling`
  - for backward-compatibility with Meilisearch v1.13
  - for compatibility with embedders that exhibit the same behavior as Meilisearch v1.13
- Handle the default value of that new parameter
   - for compatibility, when importing a db/a dump, it should be set to `forceMean`
   - when (re)set from the settings for an embedder, it should be set to `useModel`


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2025-02-27 14:55:13 +00:00
Louis Dureuil
5e7f226ac9
Support dumpless upgrade for all v1.13 patches 2025-02-27 15:17:23 +01:00
ManyTheFish
d25953f322
fix clippy 2025-02-26 17:02:43 +01:00
ManyTheFish
405bbd04c1
Dumpless upgrade 2025-02-26 17:01:38 +01:00
ManyTheFish
9f3663e768
Implement Incremental document database stats computing 2025-02-26 17:01:35 +01:00
Louis Dureuil
e374b095a2
Fix tests 2025-02-24 14:11:26 +01:00
Louis Dureuil
9f3e4801b1
Refactor settings validation and introduce SubEmbedderSettings 2025-02-24 13:58:26 +01:00
Louis Dureuil
4a2643daa2
Rename embed_one to embed_search and embed_chunks* to embed_index* 2025-02-24 13:58:26 +01:00
Louis Dureuil
526476e168
Move settings test to its own file 2025-02-24 13:58:26 +01:00
Kerollmops
76fd5d92d7
Clarify the tail writing to database 2025-02-20 17:35:23 +01:00
Kerollmops
245a55722a
Remove commented code 2025-02-20 16:48:18 +01:00
Kerollmops
05cc8c650c
Expose the write channel congestion in the batches 2025-02-19 15:47:54 +01:00
Louis Dureuil
7b4ce468a6
Allow overriding pooling method 2025-02-18 17:12:23 +01:00
meili-bors[bot]
0f1aeb8eaa
Merge #5351
5351: Bring back v1.13.0 changes into main r=irevoire a=Kerollmops

This PR brings back the changes made in v1.13 into the main branch.

Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: Clémentine <clementine@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2025-02-18 08:05:02 +00:00
ManyTheFish
c55fdad2c3 Fix dumpless upgrade target version 2025-02-12 16:35:05 +01:00
ManyTheFish
bd27fe7d02 force dumpless upgrade to recompute stats 2025-02-12 11:45:02 +01:00
Louis Dureuil
b83275c9c5
Change the updated* functions to only_new functions, hopefully better communicating what they do 2025-02-11 15:27:10 +01:00
Louis Dureuil
d7f35ee3ba
Use merged document instead of updated 2025-02-11 15:27:10 +01:00
meili-bors[bot]
0c3e7fe963
Merge #5316
5316: Fix the dumpless upgrade corruption r=dureuill a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/5280

## What does this PR do?
- Add a test that ensure we write the version in the index-scheduler even if we have a bug while writing the VERSION file
- Do what was described in the issue


Co-authored-by: Tamo <tamo@meilisearch.com>
2025-02-10 09:53:57 +00:00
Tamo
45f843ccb9 fmt 2025-02-10 10:46:42 +01:00
Kerollmops
2b0e17ede0
Make sure arroy is using the rayon thread-pool 2025-02-06 15:28:10 +01:00
meili-bors[bot]
796acd1aee
Merge #5288
5288: Improve AI logging r=dureuill a=Kerollmops

This PR fixes #5285 and brings the changes from #5233 to simplify debugging indexation and search performance issues related to AI. The following texts can be found in the logs to debug and understand performance issues:

 - `embed_one: search` represents the time we spent waiting for the embedding generation, i.e., OpenAI, local HuggingFace, Ollama.
 - `filtered_universe: search::universe` the time spent filtering the documents.
 - ~`next_bucket: search::vector_sort` is the time spent finding the nearest neighbors (ANNs) in the vector store (arroy), locally~ was being triggered too many times.
 - `indexing::vectors` is the time arroy spends indexing the new vectors for a batch.
 - `documents::extract vectors` and `documents::merge vectors` to see the time spent generating and writing the embeddings.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2025-02-04 10:20:45 +00:00
Tamo
d34f0b606c
Update crates/milli/src/update/new/document_change.rs 2025-02-03 12:08:52 +01:00
Kerollmops
acc400face
Support merging update and replacement operations 2025-02-03 11:47:17 +01:00
Kerollmops
aa2327591e
Add more mixing updates and replacements tests 2025-02-03 10:34:07 +01:00
Kerollmops
60470bb647
Fix the tests to use the new replace/update documents 2025-02-03 10:34:07 +01:00
Kerollmops
8e6893ddbe
Make sure we correctly mix different document operations 2025-02-03 10:34:06 +01:00
Kerollmops
424c5bde40
Move the embedding computation and extraction log to debug 2025-01-29 16:40:36 +01:00
Kerollmops
cb1b7513af
Log the memory metrics only once 2025-01-29 15:21:52 +01:00
Clément Renault
a9d0f4a002
Improve english comments
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2025-01-29 15:16:40 +01:00
Kerollmops
db032079d8
Show indexation allocated memory 2025-01-29 14:21:02 +01:00
Clément Renault
a00796c46a
Improve the naming in the log message 2025-01-29 14:21:02 +01:00
Kerollmops
6112bd8caa
Display the channel congestion 2025-01-29 14:21:02 +01:00
Kerollmops
cec88cfc29
Measure the bbqueue congestion 2025-01-29 14:21:02 +01:00
Kerollmops
4a5923a55e
log the time arroy took to insert embeddings 2025-01-27 14:22:17 +01:00
Clément Renault
9b579069df
Comment the max grant of the bbqueue
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2025-01-24 12:18:32 +01:00
Louis Dureuil
f5a4a1c8b2
Give more RAM to bbqueue.
- bbqueue buffers used to have (5% * 2%) / num_threads
- they now have 5% / num_threads
2025-01-24 12:18:32 +01:00
Kerollmops
5ab4cdb1f3
Reduce the maximum grant possible we can store in the BBQueue 2025-01-24 12:18:32 +01:00
Tamo
787472453d
write the version of the index while upgrading it 2025-01-23 16:51:24 +01:00
Tamo
c27c923439
introduce a trait to upgrade the indexes 2025-01-23 16:51:23 +01:00
Tamo
41eeffd88d
fmt 2025-01-23 16:51:20 +01:00
Tamo
20ac59c946
fix the field distribution when upgrading from the v1_12 2025-01-23 16:51:19 +01:00
Tamo
cfc1e193b6
update the test with the stats 2025-01-23 16:51:19 +01:00
Tamo
0cc25c7e4c
add a large test importing a data.ms from the v1.12.0 2025-01-23 16:51:18 +01:00
Tamo
3ef7a478cd
move the version check to the task queue 2025-01-23 16:48:32 +01:00
Tamo
d3654906bf
Add the new tasks with most of the job done 2025-01-23 16:48:32 +01:00
Louis Dureuil
d6063079af
Unify facet strings by their normalized value 2025-01-22 15:50:42 +01:00
Louis Dureuil
a6470a0c37
Improve error log 2025-01-22 15:50:41 +01:00
Louis Dureuil
8a54f14b8e
Demote panic to error log 2025-01-22 15:49:24 +01:00