Commit Graph

10097 Commits

Author SHA1 Message Date
ManyTheFish
bb7a503e5d Compute prefix databases
We are now computing the prefix FST and a prefix delta in the Merger thread,
after all the databases are written, the main thread will recompute the prefix databases based on the prefix delta without needing any grenad temporary file anymore
2024-10-01 09:57:06 +02:00
Louis Dureuil
64589278ac
Appease *some* of clippy warnings 2024-09-30 16:08:29 +02:00
ManyTheFish
8df6daf308 Remove fid_wordcount_docids.rs 2024-09-30 11:52:31 +02:00
ManyTheFish
5b552caf42 Fix position in insertions 2024-09-30 11:46:32 +02:00
ManyTheFish
2b51a63418 Remove dead code 2024-09-30 11:42:36 +02:00
Louis Dureuil
3d8024fb2b
write the weighted fields ids map 2024-09-30 11:35:03 +02:00
Louis Dureuil
4b0da0ff24
Fix inversion of field_id and position 2024-09-30 11:34:50 +02:00
Louis Dureuil
079f2b5de0
Format error messages consistently 2024-09-30 11:34:31 +02:00
ManyTheFish
960060ebdf Fix fst builder when their is no previous FST 2024-09-25 16:53:00 +02:00
Clément Renault
3d244451df
Reduce the lru key size from 8 to 12 bytes 2024-09-25 16:14:13 +02:00
Clément Renault
5f53935c8a
Fix a bug in the Lru 2024-09-25 16:09:34 +02:00
Clément Renault
29a7623c3f
Fxi some logs 2024-09-25 15:57:50 +02:00
Clément Renault
e97041f7d0
Replace the Lru free list by a simple increment 2024-09-25 15:55:52 +02:00
Clément Renault
52d7f3ed1c
Reduce the lru key size from 20 to 8 bytes 2024-09-25 15:37:13 +02:00
Clément Renault
86d5e6d9ff
Use the new Lru 2024-09-25 14:54:56 +02:00
Clément Renault
759b9b1546
Introduce a new custom Lru 2024-09-25 14:49:12 +02:00
ManyTheFish
3f7a500f3b Build prefix fst 2024-09-25 14:36:06 +02:00
ManyTheFish
974272f2e9 Merge branch 'main' into indexer-edition-2024 2024-09-25 07:41:16 +02:00
Clément Renault
7ad037841f
Move the tracing info to eprintln 2024-09-24 18:21:58 +02:00
Clément Renault
e0c7067355
Expose an IndexedParallelIterator to the index function 2024-09-24 17:24:59 +02:00
ManyTheFish
6e87332410 Change the way the FST is built 2024-09-24 16:28:31 +02:00
Clément Renault
2d1caf27df
Use eprintln to log 2024-09-24 15:59:50 +02:00
Clément Renault
92678383d6
Update charabia 2024-09-24 15:37:56 +02:00
Clément Renault
7f148c127c
Measure the SmallVec efficacity 2024-09-24 15:32:15 +02:00
Clément Renault
4ce5d3d66d
Do not check before pushing in bitmaps 2024-09-24 09:43:16 +02:00
Clément Renault
ff931edb55
Update roaring to inline max calls 2024-09-23 16:53:42 +02:00
Clément Renault
42b093687d
Introduce the new PushOptimizedBitmap 2024-09-23 16:38:21 +02:00
Clément Renault
835c5f98f9
Remove the debug symbols 2024-09-23 15:49:24 +02:00
Clément Renault
f00664247d
Add more stats about the channel message sent 2024-09-23 15:13:52 +02:00
Clément Renault
3c63d4a1e5
Fix charabia Zho 2024-09-23 14:50:17 +02:00
Clément Renault
4551abf6d4
Update roaring to the latest version 2024-09-23 14:35:33 +02:00
Clément Renault
193d7f5d34
Add the mutualized charabia normalization 2024-09-23 14:24:25 +02:00
Clément Renault
013acb3d93
Measure merger writer channel contention 2024-09-23 11:07:59 +02:00
meili-bors[bot]
7f20c13f3f
Merge #4943
4943: Correct broken links in README r=curquiza a=iornstein

# Pull Request

## Related issue
Fixes #4942

## What does this PR do?
- Corrects some broken links in the README. My suspicion is that some of these documentation articles were moved around without someone updating links in the README.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)? _(well the contributing guidelines led me to create an issue first)_
- [x] Have you read the contributing guidelines? _yes_
- [x] Have you made sure that the title is accurate and descriptive of the changes? _yes_

Thank you so much for contributing to Meilisearch!


Co-authored-by: Ian Ornstein <ian.ornstein@gmail.com>
2024-09-19 19:22:04 +00:00
meili-bors[bot]
462a2329f1
Merge #4941
4941: Implement the binary quantization in meilisearch r=irevoire a=irevoire

# Pull Request

## Related issue
Fixes https://github.com/meilisearch/meilisearch/issues/4873

## What does this PR do?
- Add a settings for the binary quantization
- Once enabled, the bq cannot be disabled

TODO:
- [ ] Missing a bunch of tests

Co-authored-by: Tamo <tamo@meilisearch.com>
2024-09-19 15:50:24 +00:00
Tamo
f6483cf15d apply review comment 2024-09-19 16:47:06 +02:00
meili-bors[bot]
bd34ed01d9
Merge #4945
4945: Add swedish in default pipelines r=dureuill a=ManyTheFish

# Summary
## Fix Swedish support

In Swedish the characters `å`/`ä`/`ö` are completely different than `a` or `o`  and should not be normalized as the same character.
because the Swedish specialized pipeline was not activated by default, these characters were normalized even with the settings:
```json
{
  "localizedAttributes": [ { "locales": ["swe"], "attributePatterns": ["*"] } ]
}
```

## Update Charabia adding German support

German segmentation will now be activated using the setting:
```json
{
  "localizedAttributes": [ { "locales": ["deu"], "attributePatterns": ["*"] } ]
}
```

# TODO

- [x] Activate Swedish Pipeline
- [x] Add a test to avoid future regressions
- [x] Update Charabia


Co-authored-by: ManyTheFish <many@meilisearch.com>
2024-09-19 14:42:03 +00:00
Tamo
74199f328d Make clippy happy 2024-09-19 16:27:34 +02:00
Tamo
1113c42de0 fix broken comments 2024-09-19 16:18:36 +02:00
ManyTheFish
465afe01b2 Add test for German 2024-09-19 16:09:01 +02:00
ManyTheFish
7d6768e4c4 Add german tokenization pipeline 2024-09-19 16:09:01 +02:00
ManyTheFish
f77661ec44 Update Charabia v0.9.1 2024-09-19 16:08:59 +02:00
Tamo
b8fd85a46d Get rids of useless collect before an iteration on the readers 2024-09-19 15:57:38 +02:00
Tamo
fd43c6c404 Improve the error message explaining you can't un-bq an embedder 2024-09-19 15:51:29 +02:00
Tamo
2564ec1496
Update milli/src/index.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-09-19 15:41:44 +02:00
Tamo
b6b73fe41c
Update milli/src/update/settings.rs
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2024-09-19 15:41:14 +02:00
Tamo
6dde41cc46 stop using a local version of arroy and instead point to the git repo with the rev 2024-09-19 15:25:38 +02:00
Tamo
163f8023a1 remove debug println 2024-09-19 12:13:25 +02:00
Tamo
2b120b89e4 update the test now that the embedder must be specified 2024-09-19 12:08:59 +02:00
Tamo
84f842233d snapshots the embedder settings in the dump import with vector test 2024-09-19 12:00:58 +02:00