Louis Dureuil
bd5110a2fe
Fix clippy warnings
2024-12-05 16:13:07 +01:00
Louis Dureuil
fa8b9acdf6
Ignore documents that didn't change in facets
2024-12-05 16:12:52 +01:00
Louis Dureuil
2b74d1824b
Ignore documents that didn't change any field in word pair proximity
2024-12-05 15:56:22 +01:00
Louis Dureuil
c77b00d3ac
Don't extract word docids when no searchable changed
2024-12-05 15:51:58 +01:00
meili-bors[bot]
cac355bfa7
Merge #5124
5124: Optimize Prefixes and Merges r=ManyTheFish a=Kerollmops
In this PR, we plan to optimize LMDB reads by reading the entries in lexicographic order, making better use of the OS cache behind the memory map:
- Optimize the prefix generation for word position docids (`@manythefish`)
- Optimize the parallel merging of the caches by sorting entries before merging them (`@kerollmops`)
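The second point, turning each per-thread `HashMap` cache into a `Vec` sorted by key and then merging the sorted runs so that keys are visited in lexicographic order, can be sketched roughly as follows. This is a minimal, std-only illustration, not Meilisearch's actual implementation. The `Cache` type alias, the `merge_caches_sorted` function, and the use of `BTreeSet<u32>` in place of roaring bitmaps are all hypothetical, chosen only for the example. LMDB itself is not involved here: the point is that the merged output arrives in key order, so any later database lookups for those keys would also happen in order.

```rust
use std::cmp::Reverse;
use std::collections::{BTreeSet, BinaryHeap, HashMap};

/// Hypothetical per-thread cache: key bytes (e.g. a word) -> set of document ids.
/// BTreeSet<u32> stands in for a roaring bitmap in this sketch.
type Cache = HashMap<Vec<u8>, BTreeSet<u32>>;

/// Hypothetical helper: sorts every cache by key, then k-way merges them so the
/// output is in byte-wise lexicographic key order. Docids of equal keys are unioned.
fn merge_caches_sorted(caches: Vec<Cache>) -> Vec<(Vec<u8>, BTreeSet<u32>)> {
    // Consume each HashMap into a Vec (no draining) and sort it by key.
    let mut sources: Vec<std::vec::IntoIter<(Vec<u8>, BTreeSet<u32>)>> = caches
        .into_iter()
        .map(|cache| {
            let mut entries: Vec<_> = cache.into_iter().collect();
            entries.sort_unstable_by(|a, b| a.0.cmp(&b.0));
            entries.into_iter()
        })
        .collect();

    // Min-heap of (key, source index, docids); Reverse turns the max-heap into a min-heap.
    // Source indices are unique, so docids are never actually compared.
    let mut heap = BinaryHeap::new();
    for (i, source) in sources.iter_mut().enumerate() {
        if let Some((key, docids)) = source.next() {
            heap.push(Reverse((key, i, docids)));
        }
    }

    let mut merged: Vec<(Vec<u8>, BTreeSet<u32>)> = Vec::new();
    while let Some(Reverse((key, i, docids))) = heap.pop() {
        // Refill the heap from the source that just yielded its smallest entry.
        if let Some((next_key, next_docids)) = sources[i].next() {
            heap.push(Reverse((next_key, i, next_docids)));
        }
        // Equal keys coming from different caches pop consecutively: union their docids.
        if let Some((last_key, last_docids)) = merged.last_mut() {
            if *last_key == key {
                last_docids.extend(docids);
                continue;
            }
        }
        merged.push((key, docids));
    }
    merged
}

fn main() {
    let mut a: Cache = HashMap::new();
    a.insert(b"world".to_vec(), BTreeSet::from([3, 4]));
    a.insert(b"hello".to_vec(), BTreeSet::from([1]));

    let mut b: Cache = HashMap::new();
    b.insert(b"hello".to_vec(), BTreeSet::from([2]));
    b.insert(b"apple".to_vec(), BTreeSet::from([5]));

    // Prints apple => {5}, hello => {1, 2}, world => {3, 4}, one per line.
    for (key, docids) in merge_caches_sorted(vec![a, b]) {
        println!("{} => {:?}", String::from_utf8_lossy(&key), docids);
    }
}
```

Sorting each cache once up front costs `O(n log n)` per thread, but it replaces random-order key accesses with a single ordered pass, which is the access pattern a memory-mapped B-tree rewards.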
## Benchmarks on 1 CPU, 2 GB RAM, gp3 disk (5k IOPS)
Before, on the tag meilisearch-v1.12.0-rc.3:
```
word_position_docids:merge_and_send_docids: 988s
compute_word_fst: 23.3s
word_pair_proximity_docids:merge_and_send_docids: 428s
compute_word_prefix_fid_docids:recompute_modified_prefixes: 76.3s
compute_word_prefix_position_docids:recompute_modified_prefixes:from_prefixes: 429s
```
After, on this branch, sorting the entire `HashMap`s into a `Vec`:
```
word_position_docids:merge_and_send_docids: 202s
compute_word_fst: 20.4s
word_pair_proximity_docids:merge_and_send_docids: 427s
compute_word_prefix_fid_docids:recompute_modified_prefixes: 65.5s
compute_word_prefix_position_docids:recompute_modified_prefixes:from_prefixes: 62.5s
```
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2024-12-05 09:35:52 +00:00
Kerollmops
52843123d4
Clean up and remove the non-sorted merge_caches function
2024-12-05 10:03:05 +01:00
Louis Dureuil
5f896b1050
Fix geo when spilling
2024-12-04 17:51:12 +01:00
Kerollmops
2e32d0474c
Lexicographically sort all the maps to merge
2024-12-04 17:05:11 +01:00
Kerollmops
cb99ac6f7e
Consume vec instead of draining
2024-12-04 17:00:22 +01:00
Kerollmops
be411435f5
Use the merge_caches_alt function in the docids merging
2024-12-04 16:37:29 +01:00
Kerollmops
29ef164530
Introduce a new semi ordered merge function
2024-12-04 16:33:35 +01:00
Clément Renault
db4eaf4d2d
Rename serialize_into into serialize_into_writer
2024-12-02 10:03:27 +01:00
Clément Renault
08d6413365
Fix result types
2024-11-27 14:32:42 +01:00
Clément Renault
70802eb7c7
Fix most issues with the lifetimes
2024-11-27 14:32:42 +01:00
Clément Renault
6ac5b3b136
Finish most of the channel types
2024-11-27 14:32:26 +01:00
Clément Renault
8442db8101
Implement almost all senders
2024-11-27 14:16:35 +01:00
Louis Dureuil
221e547e86
Slight changes
2024-11-21 16:47:44 +01:00
Clément Renault
61d0615253
Document the geo point extractor
2024-11-21 16:47:08 +01:00
Clément Renault
5727e00374
Remove useless geo skipped
2024-11-21 16:47:08 +01:00
ManyTheFish
36962b943b
First batch of PR comments
2024-11-21 16:38:11 +01:00
Clément Renault
a38344acb3
Replace eprintlns by tracing
2024-11-20 15:29:51 +01:00
ManyTheFish
4d616f8794
Parse all attributes and filter before tokenization
2024-11-20 15:15:25 +01:00
ManyTheFish
fe5d50969a
Fix field selector in extractors
2024-11-20 13:16:44 +01:00
Clément Renault
56c7c5d5f0
Fix comments
2024-11-20 13:16:44 +01:00
Louis Dureuil
2afa33011a
Fix tokenize_document
2024-11-20 13:16:43 +01:00
Louis Dureuil
f893b5153e
Don't mark [""] as empty facet
2024-11-20 13:16:42 +01:00
Louis Dureuil
ca779c21f9
facets: Handle boolean and skip empty strings
2024-11-20 13:16:42 +01:00
ManyTheFish
b1f8aec348
Fix index_documents_check_exists_database
2024-11-20 13:16:41 +01:00
ManyTheFish
ba7f091db3
Use tokenizer on numbers and booleans
2024-11-20 13:16:41 +01:00
Louis Dureuil
8049df125b
Add depth to facet extraction so that null inside an array doesn't mark the entire field as null
2024-11-20 13:16:40 +01:00
ManyTheFish
41dbdd2d18
Fix filtered_placeholder_search_should_not_return_deleted_documents and word_scale_set_and_reset
2024-11-19 16:08:25 +01:00
Louis Dureuil
c782c09208
Move step to a dedicated mod and replace it with an enum
2024-11-18 18:22:13 +01:00
Louis Dureuil
04c38220ca
Move MostlySend, ThreadLocal, FullySend to their own commit
2024-11-18 16:43:05 +01:00
Louis Dureuil
5f93651cef
fixes
2024-11-18 16:23:11 +01:00
Louis Dureuil
0a21d9bfb3
Fix double borrow of new fields id map
2024-11-18 15:56:01 +01:00
Clément Renault
5b4c06c24c
Plug the grenad max memory parameter
2024-11-18 11:28:04 +01:00
ManyTheFish
4ff2b3c2ee
Fix test on locales
2024-11-14 15:45:04 +01:00
ManyTheFish
91c58cfa38
Fix positional databases
2024-11-14 11:40:12 +01:00
Clément Renault
9e8367f1e6
Move the rayon thread pool outside the extract method
2024-11-14 10:40:32 +01:00
Clément Renault
8e5b1a3ec1
Compute the field distribution and convert _geo into f64s
2024-11-13 17:44:05 +01:00
ManyTheFish
e627e182ce
Fix facet strings
2024-11-13 17:43:02 +01:00
ManyTheFish
51b6293738
Add linear facet databases
2024-11-13 17:43:02 +01:00
Clément Renault
b17896d899
Finalize the GeoExtractor
2024-11-13 17:43:02 +01:00
Louis Dureuil
3b0cb5b487
Fix vector error messages
2024-11-12 23:26:16 +01:00
Louis Dureuil
c4e9f761e9
Emit better error messages when parsing vectors
2024-11-12 22:49:22 +01:00
Louis Dureuil
980921e078
Vector fixes
2024-11-12 16:31:22 +01:00
Louis Dureuil
6094bb299a
Fix user_provided vectors
2024-11-12 10:15:55 +01:00
ManyTheFish
1f5d801271
Fix crashes in facet search indexing
2024-11-07 17:22:30 +01:00
Clément Renault
0e4e9e866a
Move the RefCellExt trait in a dedicated module
2024-11-07 11:36:09 +01:00
Clément Renault
c9f478bc45
Fix bbbul merger
2024-11-07 10:53:46 +01:00