67 Commits

Author SHA1 Message Date
Louis Dureuil
d6063079af
Unify facet strings by their normalized value 2025-01-22 15:50:42 +01:00
Louis Dureuil
a21711f473
Fix test 2025-01-14 10:23:59 +01:00
Clément Renault
00a03742ff
Prefer using extend when merging bitmaps than unions (less allocations) 2025-01-09 10:42:38 +01:00
Louis Dureuil
de7f8c4406
refactor indexer mod 2025-01-07 15:29:02 +01:00
Gnosnay
44eb153619 Replace hardcoded string with constants 2024-12-28 20:35:55 +08:00
ManyTheFish
acdd5aa6ea
Use the thread source id instead of the destination id
when filtering on the cache to merge
2024-12-12 18:12:00 +01:00
Kerollmops
2f3cc8cdd2
Fix the merge_caches_sorted function 2024-12-12 16:15:37 +01:00
meili-bors[bot]
1fc90fbacb
Merge #5147
5147: Batch progress r=dureuill a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/5068

## What does this PR do?
- ...

## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Tamo <tamo@meilisearch.com>
2024-12-12 09:15:54 +00:00
Tamo
df9b68f8ed
inital implementation of the progress 2024-12-11 16:25:01 +01:00
Louis Dureuil
bfca54cc2c
Return docid in case of errors while rendering the document template 2024-12-11 15:26:18 +01:00
Kerollmops
a751972c57
Prefer using a stable than a random hash builder 2024-12-10 14:25:53 +01:00
Kerollmops
89637bcaaf
Use bumparaw-collections in Meilisearch/milli 2024-12-10 11:52:20 +01:00
ManyTheFish
07f42e8057 Do not index a filed count when no word is counted 2024-12-09 15:45:12 +01:00
Louis Dureuil
bd5110a2fe
Fix clippy warnings 2024-12-05 16:13:07 +01:00
Louis Dureuil
fa8b9acdf6
Ignore documents that didn't change in facets 2024-12-05 16:12:52 +01:00
Louis Dureuil
2b74d1824b
Ignore documents that didn't change any field in word pair proximity 2024-12-05 15:56:22 +01:00
Louis Dureuil
c77b00d3ac
Don't extract word docids when no searchable changed 2024-12-05 15:51:58 +01:00
meili-bors[bot]
cac355bfa7
Merge #5124
5124: Optimize Prefixes and Merges r=ManyTheFish a=Kerollmops

In this PR, we plan to optimize the read of LMDB to use read the entries in lexicographic order and better use the memory-mapping OS cache:

 - Optimize the prefix generation for word position docids (`@manythefish)`
 - Optimize the parallel merging of the caches to sort entries before merging the caches (`@kerollmops)`
 
## Benchmarks on 1cpu 2gb gpo3 (5k IOps)
 
Before on the tag meilisearch-v1.12.0-rc.3.

```
word_position_docids:merge_and_send_docids: 988s
compute_word_fst: 23.3s
word_pair_proximity_docids:merge_and_send_docids: 428s
compute_word_prefix_fid_docids:recompute_modified_prefixes: 76.3s
compute_word_prefix_position_docids:recompute_modified_prefixes:from_prefixes: 429s
```

After sorting the whole `HashMap`s in a `Vec` on this branch.

```
word_position_docids:merge_and_send_docids: 202s
compute_word_fst: 20.4s
word_pair_proximity_docids:merge_and_send_docids: 427s
compute_word_prefix_fid_docids:recompute_modified_prefixes: 65.5s
compute_word_prefix_position_docids:recompute_modified_prefixes:from_prefixes: 62.5s
```

Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2024-12-05 09:35:52 +00:00
Kerollmops
52843123d4
Clean up and remove the non-sorted merge_caches function 2024-12-05 10:03:05 +01:00
Louis Dureuil
5f896b1050
Fix geo when spilling 2024-12-04 17:51:12 +01:00
Kerollmops
2e32d0474c
Lexicographically sort all the map to merge 2024-12-04 17:05:11 +01:00
Kerollmops
cb99ac6f7e
Consume vec instead of draining 2024-12-04 17:00:22 +01:00
Kerollmops
be411435f5
Use the merge_caches_alt function in the docids merging 2024-12-04 16:37:29 +01:00
Kerollmops
29ef164530
Introduce a new semi ordered merge function 2024-12-04 16:33:35 +01:00
Clément Renault
db4eaf4d2d
Rename serialize_into into serialize_into_writer 2024-12-02 10:03:27 +01:00
Clément Renault
08d6413365
Fix result types 2024-11-27 14:32:42 +01:00
Clément Renault
70802eb7c7
Fix most issues with the lifetimes 2024-11-27 14:32:42 +01:00
Clément Renault
6ac5b3b136
Finish most of the channels types 2024-11-27 14:32:26 +01:00
Clément Renault
8442db8101
Implement mostly all senders 2024-11-27 14:16:35 +01:00
Louis Dureuil
221e547e86
Slight changes 2024-11-21 16:47:44 +01:00
Clément Renault
61d0615253
Document the geo point extractor 2024-11-21 16:47:08 +01:00
Clément Renault
5727e00374
Remove useless geo skipped 2024-11-21 16:47:08 +01:00
ManyTheFish
36962b943b First batch of PR comment 2024-11-21 16:38:11 +01:00
Clément Renault
a38344acb3
Replace eprintlns by tracing 2024-11-20 15:29:51 +01:00
ManyTheFish
4d616f8794 Parse every attributes and filter before tokenization 2024-11-20 15:15:25 +01:00
ManyTheFish
fe5d50969a
Fix filed selector in extrators 2024-11-20 13:16:44 +01:00
Clément Renault
56c7c5d5f0
Fix comments 2024-11-20 13:16:44 +01:00
Louis Dureuil
2afa33011a
Fix tokenize_document 2024-11-20 13:16:43 +01:00
Louis Dureuil
f893b5153e
Don't mark [""] as empty facet 2024-11-20 13:16:42 +01:00
Louis Dureuil
ca779c21f9
facets: Handle boolean and skip empty strings 2024-11-20 13:16:42 +01:00
ManyTheFish
b1f8aec348
Fix index_documents_check_exists_database 2024-11-20 13:16:41 +01:00
ManyTheFish
ba7f091db3
Use tokenizer on numbers and booleans 2024-11-20 13:16:41 +01:00
Louis Dureuil
8049df125b
Add depth to facet extraction so that null inside an array doesn't mark the entire field as null 2024-11-20 13:16:40 +01:00
ManyTheFish
41dbdd2d18 Fix filtered_placeholder_search_should_not_return_deleted_documents and word_scale_set_and_reset 2024-11-19 16:08:25 +01:00
Louis Dureuil
c782c09208
Move step to a dedicated mod and replace it with an enum 2024-11-18 18:22:13 +01:00
Louis Dureuil
04c38220ca
Move MostlySend, ThreadLocal, FullySend to their own commit 2024-11-18 16:43:05 +01:00
Louis Dureuil
5f93651cef
fixes 2024-11-18 16:23:11 +01:00
Louis Dureuil
0a21d9bfb3
Fix double borrow of new fields id map 2024-11-18 15:56:01 +01:00
Clément Renault
5b4c06c24c
Plug the grenad max memory parameter 2024-11-18 11:28:04 +01:00
ManyTheFish
4ff2b3c2ee Fix test on locales 2024-11-14 15:45:04 +01:00