ManyTheFish
eb9a20ff0b
Fix fid_word_docids extraction
2024-09-12 11:08:18 +02:00
F. Levi
edcb4c60ba
Change Matcher so that phrases are counted as one instead of word by word
2024-09-12 09:46:08 +03:00
Clément Renault
0d868f36d7
Make sure we always use a BufWriter to write the update files
2024-09-11 18:38:04 +02:00
Clément Renault
e7d9db078f
Use the right key name when converting from CSV to NDJSON
2024-09-11 18:27:00 +02:00
Clément Renault
3e9198ebaa
Support guessing primary key again
2024-09-11 17:25:40 +02:00
Clément Renault
2a0ad0982f
Fix the document counter
2024-09-11 15:59:36 +02:00
ManyTheFish
2b317c681b
Build mergers in parallel
2024-09-11 11:49:26 +02:00
ManyTheFish
39b5990f64
Mutualize tokenization
2024-09-11 10:22:38 +02:00
Clément Renault
3848adf5a2
Improve error management and simplify JSON read
2024-09-11 10:10:51 +02:00
Clément Renault
b4de06259e
Better CSV support
2024-09-11 10:02:00 +02:00
meili-bors[bot]
02c2b660f8
Merge #4920
4920: Change OpenAI default model r=dureuill a=dureuill
# Pull Request
## Related issue
Fixes #4856
See also [public usage](https://meilisearch.notion.site/v1-11-AI-search-changes-0e37727193884a70999f254fa953ce6e#b4685a48c4784262a149ec307ec58671)
## What does this PR do?
- make `text-embedding-3-small` the default model for OpenAI instead of `text-embedding-ada-002`. Existing embedders are not impacted.
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-09-11 07:08:39 +00:00
Clément Renault
8287c2644f
Support CSV again
2024-09-10 21:10:28 +01:00
Clément Renault
c1c44a0b81
Impl serialize on TopLevelMap
2024-09-10 19:32:03 +01:00
Clément Renault
04596f3616
Move the TopLevelMap into a dedicated module
2024-09-10 18:01:17 +01:00
Clément Renault
24cb5839ad
Move the document changes sorting logic to a new trait
2024-09-10 17:37:52 +01:00
Clément Renault
8d97b7b28c
Support JSON payloads again (not perfectly though)
2024-09-10 17:09:49 +01:00
ManyTheFish
f69688e8f7
Fix several warnings in extractors and remove unreachable macros
2024-09-09 14:52:50 +02:00
Louis Dureuil
f18e9cb7b3
Change openai default model
2024-09-09 13:09:35 +02:00
Clément Renault
8fd0afaaaa
Make sure we iterate over the payload documents in order
2024-09-06 08:09:08 +02:00
Clément Renault
72c6a21a30
Use raw JSON to read the payloads
2024-09-05 20:08:23 +02:00
Clément Renault
8412be4a7d
Cleanup CowStr and TopLevelMap struct
2024-09-05 18:32:55 +02:00
Louis Dureuil
10f09c531f
add some commented code to read from json with raw values
2024-09-05 18:22:16 +02:00
ManyTheFish
8fd99b111b
Add tracing timers logs
2024-09-05 18:00:22 +02:00
Clément Renault
f6b3d1f9a5
Increase some channel sizes
2024-09-05 15:12:07 +02:00
meili-bors[bot]
db0cf3b2ed
Merge #4912
4912: Allow Meilitool to dumplessly, offline upgrade v1.9 -> v1.10 in some conditions r=Kerollmops a=dureuill
- bail early if the DB contains at least 1 REST embedder, providing the list of detected REST embedders, and without modifying the DB
- Might depend on the feature set that meilitool was compiled with and the feature set that the Meilisearch that created the DB was compiled with 💀. In case of runtime error, try again with a different feature set (passing or not passing `-p meilitool` when building after a `cargo clean`)
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-09-05 09:11:23 +00:00
Clément Renault
73ce67862d
Use the word pair proximity and fid word count docids extractors
Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-09-05 10:56:22 +02:00
Louis Dureuil
f6abf01d2c
Check REST embedders before touching the DB
2024-09-05 10:49:59 +02:00
Clément Renault
0fc02f7351
Move the facet extraction to dedicated modules
2024-09-05 10:32:27 +02:00
ManyTheFish
34f11e3380
Implement word count and word pair proximity extractors
2024-09-05 10:30:39 +02:00
Louis Dureuil
28da759f11
meilitool: Support dumpless upgrade from v1.9 to v1.10 when there are no REST embedders
2024-09-05 10:08:38 +02:00
Louis Dureuil
ea96d19525
Change versioning in meili
2024-09-05 10:08:06 +02:00
Louis Dureuil
d352b1ee83
Add serde to meilitool
2024-09-05 10:07:33 +02:00
Clément Renault
27308eaab1
Import the facet extractors
2024-09-04 17:58:15 +02:00
Clément Renault
b33ec9ba3f
Introduce the FieldIdFacetIsNullDocidsExtractor
2024-09-04 17:50:08 +02:00
Clément Renault
9c0a1cd9fd
Introduce the FieldIdFacetExistsDocidsExtractor
2024-09-04 17:48:49 +02:00
Clément Renault
0b061f1e70
Introduce the FieldIdFacetIsEmptyDocidsExtractor
2024-09-04 17:40:24 +02:00
Clément Renault
19d937ab21
Introduce the facet extractors
2024-09-04 17:03:54 +02:00
Clément Renault
1d59c19cd2
Send the WordsFst by using an Mmap
2024-09-04 14:30:09 +02:00
Clément Renault
98e48371c3
Factorize some stuff
2024-09-04 12:17:13 +02:00
Clément Renault
6d74fb0229
Introduce the WordFidWordDocids database
2024-09-04 11:40:55 +02:00
ManyTheFish
1eb75a1040
remove milli/src/update/new/extract/tokenize_document.rs
2024-09-04 11:40:26 +02:00
Clément Renault
3b82d8b5b9
Fix the cache to serialize entries correctly
2024-09-04 10:55:36 +02:00
ManyTheFish
781a186f75
remove milli/src/update/new/extract/extract_word_docids.rs
2024-09-04 10:28:31 +02:00
ManyTheFish
6a399556b5
Implement more searchable extractor
2024-09-04 10:20:18 +02:00
Clément Renault
27b4cab857
Extract and write the documents and words fst in the database
2024-09-04 09:59:19 +02:00
dependabot[bot]
3f3cebf5f9
Bump quinn-proto from 0.11.3 to 0.11.8
Bumps [quinn-proto](https://github.com/quinn-rs/quinn) from 0.11.3 to 0.11.8.
- [Release notes](https://github.com/quinn-rs/quinn/releases)
- [Commits](https://github.com/quinn-rs/quinn/compare/quinn-proto-0.11.3...quinn-proto-0.11.8)
---
updated-dependencies:
- dependency-name: quinn-proto
  dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
2024-09-03 20:50:30 +00:00
meili-bors[bot]
b278815617
Merge #4908
4908: Bring back changes from release v1.10.1 to main r=dureuill a=irevoire
# Pull Request
Following the [latest release](https://github.com/meilisearch/meilisearch/releases/tag/v1.10.1), this PR brings back the changes to main.
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: irevoire <irevoire@users.noreply.github.com>
2024-09-03 14:28:12 +00:00
Clément Renault
52d32b4ee9
Move the channel sender in the closure to stop the merger thread
2024-09-03 16:08:33 +02:00
ManyTheFish
da61408e52
Remove unimplemented from document changes
2024-09-03 15:14:16 +02:00
ManyTheFish
fe69385bd7
Fix tokenizer test
2024-09-03 14:24:37 +02:00