2521 Commits

Author SHA1 Message Date
Clément Renault
f4ab1f168e
Prefer using Rc<str> than String when cloning a lot 2024-09-16 15:41:29 +02:00
ManyTheFish
1a0e962299 Replace hashmap by vectors in wpp 2024-09-16 15:01:20 +02:00
ManyTheFish
f13e076b8a Use hashmap instead of Btree in wpp extractor 2024-09-16 14:40:40 +02:00
ManyTheFish
7ba49b849e Extract and write facet databases 2024-09-16 09:35:16 +02:00
Clément Renault
f7652186e1
WIP geo fields 2024-09-12 18:01:02 +02:00
Clément Renault
b2f4e67c9a
Do not store useless updates 2024-09-12 15:38:31 +02:00
Clément Renault
ff5d3b59f5
Move the document id extraction to the primary key code 2024-09-12 12:01:42 +02:00
ManyTheFish
aa69308e45 Use a bufWriter to build word FSTs 2024-09-12 11:48:00 +02:00
ManyTheFish
eb9a20ff0b Fix fid_word_docids extraction 2024-09-12 11:08:18 +02:00
Clément Renault
3e9198ebaa
Support guessing primary key again 2024-09-11 17:25:40 +02:00
Clément Renault
2a0ad0982f
Fix the document counter 2024-09-11 15:59:36 +02:00
ManyTheFish
2b317c681b Build mergers in parallel 2024-09-11 11:49:26 +02:00
ManyTheFish
39b5990f64 Mutualize tokenization 2024-09-11 10:22:38 +02:00
Clément Renault
8287c2644f
Support CSV again 2024-09-10 21:10:28 +01:00
Clément Renault
c1c44a0b81
Impl serialize on TopLevelMap 2024-09-10 19:32:03 +01:00
Clément Renault
04596f3616
Move the TopLevelMap into a dedicated module 2024-09-10 18:01:17 +01:00
Clément Renault
24cb5839ad
Move the document changes sorting logic to a new trait 2024-09-10 17:37:52 +01:00
ManyTheFish
f69688e8f7 Fix several warnings in extractors and remove unreachable macros 2024-09-09 14:52:50 +02:00
Clément Renault
8fd0afaaaa
Make sure we iterate over the payload documents in order 2024-09-06 08:09:08 +02:00
Clément Renault
72c6a21a30
Use raw JSON to read the payloads 2024-09-05 20:08:23 +02:00
Clément Renault
8412be4a7d
Cleanup CowStr and TopLevelMap struct 2024-09-05 18:32:55 +02:00
Louis Dureuil
10f09c531f
add some commented code to read from json with raw values 2024-09-05 18:22:16 +02:00
ManyTheFish
8fd99b111b Add tracing timers logs 2024-09-05 18:00:22 +02:00
Clément Renault
f6b3d1f9a5
Increase some channel sizes 2024-09-05 15:12:07 +02:00
Clément Renault
73ce67862d
Use the word pair proximity and fid word count docids extractors
Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-09-05 10:56:22 +02:00
Clément Renault
0fc02f7351
Move the facet extraction to dedicated modules 2024-09-05 10:32:27 +02:00
ManyTheFish
34f11e3380 Implement word count and word pair proximity extractors 2024-09-05 10:30:39 +02:00
Clément Renault
27308eaab1
Import the facet extractors 2024-09-04 17:58:15 +02:00
Clément Renault
b33ec9ba3f
Introduce the FieldIdFacetIsNullDocidsExtractor 2024-09-04 17:50:08 +02:00
Clément Renault
9c0a1cd9fd
Introduce the FieldIdFacetExistsDocidsExtractor 2024-09-04 17:48:49 +02:00
Clément Renault
0b061f1e70
Introduce the FieldIdFacetIsEmptyDocidsExtractor 2024-09-04 17:40:24 +02:00
Clément Renault
19d937ab21
Introduce the facet extractors 2024-09-04 17:03:54 +02:00
Clément Renault
1d59c19cd2
Send the WordsFst by using an Mmap 2024-09-04 14:30:09 +02:00
Clément Renault
98e48371c3
Factorize some stuff 2024-09-04 12:17:13 +02:00
Clément Renault
6d74fb0229
Introduce the WordFidWordDocids database 2024-09-04 11:40:55 +02:00
ManyTheFish
1eb75a1040 remove milli/src/update/new/extract/tokenize_document.rs 2024-09-04 11:40:26 +02:00
Clément Renault
3b82d8b5b9
Fix the cache to serialize entries correctly 2024-09-04 10:55:36 +02:00
ManyTheFish
781a186f75 remove milli/src/update/new/extract/extract_word_docids.rs 2024-09-04 10:28:31 +02:00
ManyTheFish
6a399556b5 Implement more searchable extractor 2024-09-04 10:20:18 +02:00
Clément Renault
27b4cab857
Extract and write the documents and words fst in the database 2024-09-04 09:59:19 +02:00
Clément Renault
52d32b4ee9
Move the channel sender in the closure to stop the merger thread 2024-09-03 16:08:33 +02:00
ManyTheFish
da61408e52 Remove unimplemented from document changes 2024-09-03 15:14:16 +02:00
ManyTheFish
fe69385bd7 Fix tokenizer test 2024-09-03 14:24:37 +02:00
Clément Renault
c1557734dc
Use the GlobalFieldsIdsMap everywhere and write it to disk
Co-authored-by: Dureuill <louis@meilisearch.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-09-03 12:01:01 +02:00
ManyTheFish
c50d3edc4a Integrate first searchable exctrator 2024-09-03 11:02:39 +02:00
Clément Renault
5369bf4a62
Change some lifetimes 2024-09-02 19:51:22 +02:00
Clément Renault
bcb1aa3d22
Find a temporary solution to par into iter on an HashMap
Spoiler: Do not use an HashMap but drain it into a Vec
2024-09-02 19:39:48 +02:00
Clément Renault
9b7858fb90
Expose the new indexer 2024-09-02 15:21:59 +02:00
Clément Renault
ab01679a8f
Remove the useless option from the document changes 2024-09-02 15:21:00 +02:00
Clément Renault
521775f788
I push for Many 2024-09-02 15:10:21 +02:00