410 Commits

Author SHA1 Message Date
ManyTheFish
659855c88e Refactor Settings Indexing process
**Changes:**
The transform structure is now relying on FieldIdMapWithMetadata and AttributePatterns to prepare
the obkv documents during a settings reindexing.
The InnerIndexSettingsDiff and InnerIndexSettings structs are now relying on FieldIdMapWithMetadata, FilterableAttributesRule and AttributePatterns to define the field and the databases that should be reindexed.
The faceted_fields_ids, localized_searchable_fields_ids and localized_faceted_fields_ids have been removed in favor of the FieldIdMapWithMetadata.
We are now relying on the FieldIdMapWithMetadata to retain vectors_fids from the facets and the searchables.

The searchable database computing is now relying on the FieldIdMapWithMetadata to know if a field is searchable and retrieve the locales.

The facet database computing is now relying on the FieldIdMapWithMetadata to compute the facet databases, the facet-search and retrieve the locales.

The facet level database computing is now relying on the FieldIdMapWithMetadata and the facet level database are cleared depending on the settings differences (clear_facet_levels_based_on_settings_diff).

The vector point extraction uses the FieldIdMapWithMetadata instead of FieldsIdsMapWithMetadata.

**Impact:**
- Dump import
- Settings update
2025-03-03 10:32:02 +01:00
ManyTheFish
286d310287 Fix inconsistency in attribute ranking rule computation
**Changes:**
The building of the Attributes ranking rule graph was comparing fieldids with weights
which doesn't make sense and may be bug prone, we are now comparing fieldids with fieldids.

**Impact:**
- search: Attribute ranking rule
2025-03-03 10:29:34 +01:00
ManyTheFish
4f7ece2411 Refactor the FieldIdMapWithMetadata
**Changes:**
The FieldIdMapWithMetadata structure now stores more information about fields.
The metadata_for_field function computes all the needed information relying on the user provided data instead of the enriched data (searchable/sortable)
which may solve an indexing bug on sortable attributes that was not matching the nested fields.

The FieldIdMapWithMetadata structure was duplicated in the embeddings as FieldsIdsMapWithMetadata,
so the FieldsIdsMapWithMetadata has been removed in favor of FieldIdMapWithMetadata.

The Facet distribution is now relying on the FieldIdMapWithMetadata with metadata to match is a field can be faceted.

**Impact:**
- searchable attributes matching
- searchable attributes weight computation
- sortable attributes matching
- faceted fields matching
- prompt computing
- facet distribution
2025-03-03 10:29:33 +01:00
ManyTheFish
967033579d Refactor search and facet-search
**Changes:**
The search filters are now using the FilterableAttributesFeatures from the FilterableAttributesRules to know if a field is filterable.
Moreover, the FilterableAttributesFeatures is more precise and an error will be returned if an operator is used on a field that doesn't have the related feature.
The facet-search is now checking if the feature is allowed in the FilterableAttributesFeatures and an error will be returned if the field doesn't have the related feature.

**Impact:**
- facet-search is now relying on AttributePatterns to match the locales
- search using filters is now relying on FilterableAttributesFeatures
- distinct attribute is now relying on FilterableAttributesRules
2025-03-03 10:25:32 +01:00
ManyTheFish
0200c65ebf Change the filterableAttributes setting API
**Changes:**
The filterableAttributes type has been changed from a `BTreeSet<String>` to a `Vec<FilterableAttributesRule>`,
Which is a list of rules defining patterns to match the documents' fields and a set of feature to apply on the matching fields.
The rule order given by the user is now an important information, the features applied on a filterable field will be chosen based on the rule order as we do for the LocalizedAttributesRules.
This means that the list will not be reordered anymore and will keep the user defined order,
moreover, if there are any duplicates, they will not be de-duplicated anymore.

**Impact:**
- Settings API
- the database format of the filterable attributes changed
- may impact the LocalizedAttributesRules due to the AttributePatterns factorization
- OpenAPI generator
2025-03-03 10:22:02 +01:00
meili-bors[bot]
c63c25a9a2
Merge #5355
5355: Support fetching the pooling method from the model configuration r=Kerollmops a=dureuill

# Pull Request

## Related issue
Fixes #5354 

## What does this PR do?
- Fetches the pooling configuration from the model repository
- Use a pooling method that depends on the pooling configuration of that model.
- Allow overriding the pooling method with a new huggingFace embedder parameter `pooling`
  - for backward-compatibility with Meilisearch v1.13
  - for compatibility with embedders that exhibit the same behavior as Meilisearch v1.13
- Handle the default value of that new parameter
   - for compatibility, when importing a db/a dump, it should be set to `forceMean`
   - when (re)set from the settings for an embedder, it should be set to `useModel`


Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2025-02-27 14:55:13 +00:00
Louis Dureuil
5e7f226ac9
Support dumpless upgrade for all v1.13 patches 2025-02-27 15:17:23 +01:00
ManyTheFish
d4063c9dcd
Fix fmt 2025-02-26 17:02:45 +01:00
Many the fish
abebc574f6
Update crates/milli/src/index.rs
Co-authored-by: Tamo <tamo@meilisearch.com>
2025-02-26 17:02:45 +01:00
Many the fish
f32ab67819
Update crates/milli/src/index.rs
Co-authored-by: Tamo <tamo@meilisearch.com>
2025-02-26 17:02:44 +01:00
ManyTheFish
d25953f322
fix clippy 2025-02-26 17:02:43 +01:00
ManyTheFish
405bbd04c1
Dumpless upgrade 2025-02-26 17:01:38 +01:00
ManyTheFish
9f3663e768
Implement Incremental document database stats computing 2025-02-26 17:01:35 +01:00
ManyTheFish
d9642ec916
Use checked_div in average computation 2025-02-26 17:01:34 +01:00
ManyTheFish
818e8b0237
Fix zero division 2025-02-26 17:01:31 +01:00
ManyTheFish
4f77a7fba5
fix clippy 2025-02-26 17:01:29 +01:00
ManyTheFish
9a6c1730aa
Add document database stats 2025-02-26 17:01:25 +01:00
ManyTheFish
15788773af
Check the exact_word database when computing zero typo query 2025-02-26 17:01:22 +01:00
Louis Dureuil
24fe6cd205
Fix multiple embeddings in hf 2025-02-24 16:24:04 +01:00
Louis Dureuil
e374b095a2
Fix tests 2025-02-24 14:11:26 +01:00
Louis Dureuil
9f3e4801b1
Refactor settings validation and introduce SubEmbedderSettings 2025-02-24 13:58:26 +01:00
Louis Dureuil
b85180fedb
Error types 2025-02-24 13:58:26 +01:00
Louis Dureuil
294cf39cad
Integrate composite embedder 2025-02-24 13:58:26 +01:00
Louis Dureuil
4a2643daa2
Rename embed_one to embed_search and embed_chunks* to embed_index* 2025-02-24 13:58:26 +01:00
Louis Dureuil
8d2d9066ba
Add composite embedder 2025-02-24 13:58:26 +01:00
Louis Dureuil
526476e168
Move settings test to its own file 2025-02-24 13:58:26 +01:00
Kerollmops
76fd5d92d7
Clarify the tail writing to database 2025-02-20 17:35:23 +01:00
Kerollmops
245a55722a
Remove commented code 2025-02-20 16:48:18 +01:00
Kerollmops
05cc8c650c
Expose the write channel congestion in the batches 2025-02-19 15:47:54 +01:00
Louis Dureuil
14e1459bf5
Document settings 2025-02-19 15:06:22 +01:00
Kerollmops
e9add14189
Reorder steps 2025-02-18 19:26:41 +01:00
Kerollmops
4a058a080e
Simplify the name generation 2025-02-18 18:48:44 +01:00
Kerollmops
11a11fc870
Accumulate step durations from the progress system 2025-02-18 18:33:19 +01:00
Louis Dureuil
7b4ce468a6
Allow overriding pooling method 2025-02-18 17:12:23 +01:00
Louis Dureuil
11759c4be4
Support pooling 2025-02-18 16:10:51 +01:00
meili-bors[bot]
0f1aeb8eaa
Merge #5351
5351: Bring back v1.13.0 changes into main r=irevoire a=Kerollmops

This PR brings back the changes made in v1.13 into the main branch.

Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: Clémentine <clementine@meilisearch.com>
Co-authored-by: meili-bors[bot] <89034592+meili-bors[bot]@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
2025-02-18 08:05:02 +00:00
meili-bors[bot]
885710a07b
Merge #5341
5341: Embeddings stats r=ManyTheFish a=ManyTheFish

# Pull Request

## Related issue
Fixes #5321

## What does this PR do?
- Add embedding stats
- force dumpless upgrade to recompute stats
- add tests


Co-authored-by: ManyTheFish <many@meilisearch.com>
2025-02-12 15:46:37 +00:00
ManyTheFish
c55fdad2c3 Fix dumpless upgrade target version 2025-02-12 16:35:05 +01:00
ManyTheFish
8419ed52a1 fix clippy 2025-02-12 14:38:51 +01:00
Louis Dureuil
8e0d8d31f9
Add back timeout from v1.11.3 2025-02-12 11:53:00 +01:00
ManyTheFish
bd27fe7d02 force dumpless upgrade to recompute stats 2025-02-12 11:45:02 +01:00
ManyTheFish
41203f0931 Add embedders stats 2025-02-12 11:37:47 +01:00
Louis Dureuil
b83275c9c5
Change the updated* functions to only_new functions, hopefully better communicating what they do 2025-02-11 15:27:10 +01:00
Louis Dureuil
d7f35ee3ba
Use merged document instead of updated 2025-02-11 15:27:10 +01:00
meili-bors[bot]
0c3e7fe963
Merge #5316
5316: Fix the dumpless upgrade corruption r=dureuill a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/5280

## What does this PR do?
- Add a test that ensure we write the version in the index-scheduler even if we have a bug while writing the VERSION file
- Do what was described in the issue


Co-authored-by: Tamo <tamo@meilisearch.com>
2025-02-10 09:53:57 +00:00
Tamo
45f843ccb9 fmt 2025-02-10 10:46:42 +01:00
Kerollmops
2b0e17ede0
Make sure arroy is using the rayon thread-pool 2025-02-06 15:28:10 +01:00
Louis Dureuil
04ac0af54b
Add WeightedScoreValues to be able to compare remote scores 2025-02-05 15:03:16 +01:00
Louis Dureuil
9996533364
Make search types serialize and deserialize so that reading from a proxy is possible 2025-02-05 15:03:16 +01:00
meili-bors[bot]
796acd1aee
Merge #5288
5288: Improve AI logging r=dureuill a=Kerollmops

This PR fixes #5285 and brings the changes from #5233 to simplify debugging indexation and search performance issues related to AI. The following texts can be found in the logs to debug and understand performance issues:

 - `embed_one: search` represents the time we spent waiting for the embedding generation, i.e., OpenAI, local HuggingFace, Ollama.
 - `filtered_universe: search::universe` the time spent filtering the documents.
 - ~`next_bucket: search::vector_sort` is the time spent finding the nearest neighbors (ANNs) in the vector store (arroy), locally~ was being triggered too many times.
 - `indexing::vectors` is the time arroy spends indexing the new vectors for a batch.
 - `documents::extract vectors` and `documents::merge vectors` to see the time spent generating and writing the embeddings.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2025-02-04 10:20:45 +00:00