3530: Fix highlighter bug r=Kerollmops a=ManyTheFish

# Pull Request

There was a highlighting issue on CJK's character, we were highlighting too many characters and these additional characters were duplicated after the highlight tag.

## Related issue
Fixes #3517 
Fixes #3526 

## What does this PR do?
- add a test showcasing the bug
- fix the bug by activating the char_map creation of the tokenizer during the highlighting process


Co-authored-by: ManyTheFish <many@meilisearch.com>
This commit is contained in:
bors[bot] 2023-02-23 10:59:43 +00:00 committed by GitHub
commit b985b96e4e
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
2 changed files with 37 additions and 2 deletions

View file

@ -375,9 +375,10 @@ pub fn perform_search(
&displayed_ids,
);
let tokenizer = TokenizerBuilder::default().build();
let mut tokenizer_buidler = TokenizerBuilder::default();
tokenizer_buidler.create_char_map(true);
let mut formatter_builder = MatcherBuilder::new(matching_words, tokenizer);
let mut formatter_builder = MatcherBuilder::new(matching_words, tokenizer_buidler.build());
formatter_builder.crop_marker(query.crop_marker);
formatter_builder.highlight_prefix(query.highlight_pre_tag);
formatter_builder.highlight_suffix(query.highlight_post_tag);