Commit Graph

872 Commits

Author SHA1 Message Date
Loïc Lecrenier
36bd66281d Add method to create a new Index with specific creation dates 2022-10-25 14:37:56 +02:00
Loïc Lecrenier
9a569d73d1 Minor code style change 2022-10-24 15:30:43 +02:00
Loïc Lecrenier
be302fd250 Remove outdated workaround for duplicate words in phrase search 2022-10-24 15:27:06 +02:00
Loïc Lecrenier
d76d0cb1bf Merge branch 'main' into word-pair-proximity-docids-refactor 2022-10-24 15:23:00 +02:00
Loïc Lecrenier
a983129613 Apply suggestions from code review 2022-10-20 09:49:37 +02:00
bors[bot]
f11a4087da
Merge #665
665: Fixing piles of clippy errors. r=ManyTheFish a=ehiggs

## Related issue
No issue fixed. Simply cleaning up some code for clippy on the march towards a clean build when #659 is merged.

## What does this PR do?
Most of these are calling clone when the struct supports Copy.

Many are using & and &mut on `self` when the function they are called from already has an immutable or mutable borrow so this isn't needed.

I tried to stay away from actual changes or places where I'd have to name fresh variables.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Co-authored-by: Ewan Higgs <ewan.higgs@gmail.com>
2022-10-20 07:19:46 +00:00
Loïc Lecrenier
176ffd23f5 Fix compile error after rebasing wppd-refactor 2022-10-18 10:40:26 +02:00
Loïc Lecrenier
ab2f6f3aa4 Refine some details in word_prefix_pair_proximity indexing code 2022-10-18 10:37:34 +02:00
Loïc Lecrenier
e6e76fbefe Improve performance of resolve_phrase at the cost of some relevancy 2022-10-18 10:37:34 +02:00
Loïc Lecrenier
178d00f93a Cargo fmt 2022-10-18 10:37:34 +02:00
Loïc Lecrenier
830a7c0c7a Use resolve_phrase function for exactness criteria as well 2022-10-18 10:37:34 +02:00
Loïc Lecrenier
18d578dfc4 Adjust some algorithms using DBs of word pair proximities 2022-10-18 10:37:34 +02:00
Loïc Lecrenier
072b576514 Fix proximity value in keys of prefix_word_pair_proximity_docids 2022-10-18 10:37:34 +02:00
Loïc Lecrenier
6c3a5d69e1 Update snapshots 2022-10-18 10:37:34 +02:00
Loïc Lecrenier
a7de4f5b85 Don't add swapped word pairs to the word_pair_proximity_docids db 2022-10-18 10:37:34 +02:00
Loïc Lecrenier
264a04922d Add prefix_word_pair_proximity database
Similar to the word_prefix_pair_proximity one but instead the keys are:
(proximity, prefix, word2)
2022-10-18 10:37:34 +02:00
Loïc Lecrenier
1dbbd8694f Rename StrStrU8Codec to U8StrStrCodec and reorder its fields 2022-10-18 10:37:34 +02:00
Loïc Lecrenier
bdeb47305e Change encoding of word_pair_proximity DB to (proximity, word1, word2)
Same for word_prefix_pair_proximity
2022-10-18 10:37:34 +02:00
Many the fish
81919a35a2
Update milli/src/search/criteria/initial.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-10-17 18:23:20 +02:00
Many the fish
516e838eb4
Update milli/src/search/criteria/initial.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-10-17 18:23:15 +02:00
ManyTheFish
6f55e7844c Add some code comments 2022-10-17 14:41:57 +02:00
ManyTheFish
cf203b7fde Take filter in account when computing the pages candidates 2022-10-17 14:13:44 +02:00
ManyTheFish
d71bc1e69f Compute an exact count when using distinct 2022-10-17 14:13:44 +02:00
ManyTheFish
a396806343 Add settings to force milli to exhaustively compute the total number of hits 2022-10-17 14:13:44 +02:00
Ewan Higgs
beb987d3d1 Fixing piles of clippy errors.
Most of these are calling clone when the struct supports Copy.

Many are using & and &mut on `self` when the function they are called
from already has an immutable or mutable borrow so this isn't needed.

I tried to stay away from actual changes or places where I'd have to
name fresh variables.
2022-10-13 22:02:54 +02:00
bors[bot]
f30979d021
Merge #662
662: Enhance word splitting strategy r=ManyTheFish a=akki1306

# Pull Request

## Related issue
Fixes #648 

## What does this PR do?
- [split_best_frequency](55d889522b/milli/src/search/query_tree.rs (L282-L301)) to use frequency of word pairs near together with proximity value of 1 instead of considering the frequency of individual words. Word pairs having max frequency are considered.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!

Co-authored-by: Akshay Kulkarni <akshayk.gj@gmail.com>
2022-10-13 08:14:22 +00:00
Akshay Kulkarni
85f3028317
remove underscore and introduce back word_documents_count 2022-10-13 13:21:59 +05:30
Akshay Kulkarni
8195fc6141
revert removal of word_documents_count method 2022-10-13 13:14:27 +05:30
Akshay Kulkarni
32f825d442
move default implementation of word_pair_frequency to TestContext 2022-10-13 12:57:50 +05:30
Akshay Kulkarni
ff8b2d4422
formatting 2022-10-13 12:44:08 +05:30
Akshay Kulkarni
6cb8b46900
use word_pair_frequency and remove word_documents_count 2022-10-13 12:43:11 +05:30
Akshay Kulkarni
8c9245149e
format file 2022-10-12 15:27:56 +05:30
Akshay Kulkarni
63e79a9039
update comment 2022-10-12 13:36:48 +05:30
Akshay Kulkarni
7f9680f0a0
Enhance word splitting strategy 2022-10-12 13:18:23 +05:30
Loïc Lecrenier
6fbf5dac68 Simplify documents! macro to reduce compile times 2022-10-12 09:22:05 +02:00
msvaljek
762e320c35
Add proximity calculation for the same word 2022-10-07 12:59:12 +02:00
vishalsodani
00c02d00f3 Add missing logging timer to extractors 2022-09-30 22:17:06 +05:30
bors[bot]
15d478cf4d
Merge #635
635: Use an unstable algorithm for `grenad::Sorter` when possible r=Kerollmops a=loiclec

# Pull Request
## What does this PR do?

Use an unstable algorithm to sort the internal vector used by `grenad::Sorter` whenever possible to speed up indexing.

In practice, every time the merge function creates a `RoaringBitmap`, we use an unstable sort. For every other merge function, such as `keep_first`, `keep_last`, etc., a stable sort is used.


Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2022-09-14 12:00:52 +00:00
Loïc Lecrenier
3794962330 Use an unstable algorithm for grenad::Sorter when possible 2022-09-13 14:49:53 +02:00
Kerollmops
d4d7c9d577
We avoid skipping errors in the indexing pipeline 2022-09-13 14:03:00 +02:00
Kerollmops
fe3973a51c
Make sure that long words are correctly skipped 2022-09-07 15:03:32 +02:00
Kerollmops
c83c3cd796
Add a test to make sure that long words are correctly skipped 2022-09-07 14:12:36 +02:00
ManyTheFish
bf750e45a1 Fix word removal issue 2022-09-01 12:10:47 +02:00
ManyTheFish
a38608fe59 Add test mixing phrased and no-phrased words 2022-09-01 12:02:10 +02:00
Clément Renault
7f92116b51
Accept again integers as document ids 2022-08-31 10:56:39 +02:00
Irevoire
f6024b3269
Remove the artifacts of the past 2022-08-23 16:10:38 +02:00
ManyTheFish
5391e3842c replace optional_words by term_matching_strategy 2022-08-22 17:47:19 +02:00
ManyTheFish
993aa1321c Fix query tree building 2022-08-18 17:56:06 +02:00
ManyTheFish
bff9653050 Fix remove count 2022-08-18 17:36:30 +02:00
ManyTheFish
9640976c79 Rename TermMatchingPolicies 2022-08-18 17:36:08 +02:00