Commit Graph

123 Commits

Author SHA1 Message Date
Jakub Jirutka e615fa5ec6 Fix unused_imports warning in milli when japanese is not enabled 2023-05-04 15:46:11 +02:00
Jakub Jirutka 13f1277637 Allow to disable specialized tokenizations (again)
In PR #2773, I added the `chinese`, `hebrew`, `japanese` and `thai`
feature flags to allow melisearch to be built without huge specialed
tokenizations that took up 90% of the melisearch binary size.
Unfortunately, due to some recent changes, this doesn't work anymore.
The problem lies in excessive use of the `default` feature flag, which
infects the dependency graph.

Instead of adding `default-features = false` here and there, it's easier
and more future-proof to not declare `default` in `milli` and
`meilisearch-types`. I've renamed it to `all-tokenizers`, which also
makes it a bit clearer what it's about.
2023-05-04 15:45:40 +02:00
Loïc Lecrenier 48f5bb1693 Implements the geo-sort ranking rule 2023-04-29 11:02:16 +02:00
Loïc Lecrenier d1fdbb63da Make all search tests pass, fix distinctAttribute bug 2023-04-24 12:12:08 +02:00
ManyTheFish 47f6a3ad3d Take into account that a logger need the search context 2023-04-06 15:02:23 +02:00
ManyTheFish a1148c09c2 remove old matcher 2023-04-06 14:00:21 +02:00
ManyTheFish 9c5f64769a Integrate the new Highlighter in the search 2023-04-06 13:58:56 +02:00
Clément Renault 0d2e7bcc13 Implement the previous way for the exhaustive distinct candidates 2023-04-03 10:08:10 +02:00
Louis Dureuil abb19d368d
Initialize query time ranking rule for query search 2023-03-28 12:40:52 +02:00
Loïc Lecrenier 862714a18b Remove criterion_implementation_strategy param of Search 2023-03-23 09:44:12 +01:00
Loïc Lecrenier d18ebe4f3a Remove more warnings 2023-03-23 09:41:18 +01:00
Loïc Lecrenier 7169d85115 Remove old query_tree code and make clippy happy 2023-03-23 09:39:16 +01:00
Loïc Lecrenier f5f5f03ec0 Remove old criteria code 2023-03-23 09:35:53 +01:00
Loïc Lecrenier 83e5b4ed0d Compute edges of proximity graph lazily 2023-03-21 10:44:40 +01:00
Loïc Lecrenier 2d88089129 Remove unused term matching strategies 2023-03-20 09:41:55 +01:00
ManyTheFish 8aa808d51b Merge branch 'main' into enhance-language-detection 2023-02-20 18:14:34 +01:00
Many the fish 119e6d8811
Update milli/src/search/mod.rs
Co-authored-by: Tamo <tamo@meilisearch.com>
2023-02-20 15:33:10 +01:00
Tamo 7a38fe624f
throw an error if the top left corner is found below the bottom right corner 2023-02-06 17:50:47 +01:00
ManyTheFish 0bc1a18f52 Use Languages list detected during indexing at search time 2023-02-01 18:57:43 +01:00
ManyTheFish 643d99e0f9 Add expectancy test 2023-02-01 18:39:54 +01:00
Loïc Lecrenier 229405aeb9 Choose implementation strategy of criterion at runtime 2022-12-21 09:29:39 +01:00
ManyTheFish 55724f2412 Introduce an initial candidates set that makes the difference between an exhaustive count and an estimation 2022-12-08 09:41:34 +01:00
Loïc Lecrenier cb8442a119 Further unify facet databases of f64s and strings 2022-10-26 13:47:04 +02:00
Loïc Lecrenier e8a156d682 Reorganise facets database indexing code 2022-10-26 13:46:46 +02:00
Loïc Lecrenier c3f49f766d Prepare refactor of facets database
Prepare refactor of facets database
2022-10-26 13:46:14 +02:00
bors[bot] f11a4087da
Merge #665
665: Fixing piles of clippy errors. r=ManyTheFish a=ehiggs

## Related issue
No issue fixed. Simply cleaning up some code for clippy on the march towards a clean build when #659 is merged.

## What does this PR do?
Most of these are calling clone when the struct supports Copy.

Many are using & and &mut on `self` when the function they are called from already has an immutable or mutable borrow so this isn't needed.

I tried to stay away from actual changes or places where I'd have to name fresh variables.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Co-authored-by: Ewan Higgs <ewan.higgs@gmail.com>
2022-10-20 07:19:46 +00:00
ManyTheFish 6f55e7844c Add some code comments 2022-10-17 14:41:57 +02:00
ManyTheFish d71bc1e69f Compute an exact count when using distinct 2022-10-17 14:13:44 +02:00
ManyTheFish a396806343 Add settings to force milli to exhaustively compute the total number of hits 2022-10-17 14:13:44 +02:00
Ewan Higgs beb987d3d1 Fixing piles of clippy errors.
Most of these are calling clone when the struct supports Copy.

Many are using & and &mut on `self` when the function they are called
from already has an immutable or mutable borrow so this isn't needed.

I tried to stay away from actual changes or places where I'd have to
name fresh variables.
2022-10-13 22:02:54 +02:00
ManyTheFish 5391e3842c replace optional_words by term_matching_strategy 2022-08-22 17:47:19 +02:00
ManyTheFish 9640976c79 Rename TermMatchingPolicies 2022-08-18 17:36:08 +02:00
Tamo 3b309f654a
Fasten the document deletion
When a document deletion occurs, instead of deleting the document we mark it as deleted
in the new “soft deleted” bitmap. It is then removed from the search, and all the other
endpoints.
2022-07-05 15:30:33 +02:00
Kerollmops d2f84a9d9e
Improve the estimatedNbHits when distinct is enabled 2022-06-22 11:39:21 +02:00
Kerollmops 69931e50d2
Add the max_values_by_facet setting to the database 2022-06-08 17:54:56 +02:00
ManyTheFish 86ac8568e6 Use Charabia in milli 2022-06-02 16:59:11 +02:00
ad hoc ac975cc747
cache context's exact words 2022-05-24 09:43:17 +02:00
bors[bot] ea4bb9402f
Merge #483
483: Enhance matching words r=Kerollmops a=ManyTheFish

# Summary

Enhance milli word-matcher making it handle match computing and cropping.

# Implementation

## Computing best matches for cropping

Before we were considering that the first match of the attribute was the best one, this was accurate when only one word was searched but was missing the target when more than one word was searched.

Now we are searching for the best matches interval to crop around, the chosen interval is the one:
1) that have the highest count of unique matches
> for example, if we have a query `split the world`, then the interval `the split the split the` has 5 matches but only 2 unique matches (1 for `split` and 1 for `the`) where the interval `split of the world` has 3 matches and 3 unique matches. So the interval `split of the world` is considered better.
2) that have the minimum distance between matches
> for example, if we have a query `split the world`, then the interval `split of the world` has a distance of 3 (2 between `split` and `the`, and 1 between `the` and `world`) where the interval `split the world` has a distance of 2. So the interval `split the world` is considered better.
3) that have the highest count of ordered matches
> for example, if we have a query `split the world`, then the interval `the world split` has 2 ordered words where the interval `split the world` has 3. So the interval `split the world` is considered better.

## Cropping around the best matches interval

Before we were cropping around the interval without checking the context.

Now we are cropping around words in the same context as matching words.
This means that we will keep words that are farther from the matching words but are in the same phrase, than words that are nearer but separated by a dot.

> For instance, for the matching word `Split` the text:
`Natalie risk her future. Split The World is a book written by Emily Henry. I never read it.`
will be cropped like:
`…. Split The World is a book written by Emily Henry. …`
and  not like:
`Natalie risk her future. Split The World is a book …`


Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-04-19 11:42:32 +00:00
ad hoc dda28d7415
exclude excluded canditates from search result candidates 2022-04-13 12:10:35 +02:00
ad hoc bbb6728d2f
add distinct attributes to cli 2022-04-13 12:10:35 +02:00
ManyTheFish 5809d3ae0d Add first benchmarks on formatting 2022-04-12 16:31:58 +02:00
ManyTheFish 827cedcd15 Add format option structure 2022-04-12 13:42:14 +02:00
Irevoire 4f3ce6d9cd
nested fields 2022-04-07 16:58:46 +02:00
ManyTheFish 3bb1e35ada Fix match count 2022-04-05 17:48:45 +02:00
ManyTheFish b3f0f39106 Make some cleaning 2022-04-05 17:41:32 +02:00
ManyTheFish 734d0899d3 Publish Matcher 2022-04-05 17:41:32 +02:00
ManyTheFish d96e72e5dc Create formater with some tests 2022-04-05 17:41:32 +02:00
ad hoc 9fe40df960
add word derivations tests 2022-04-01 11:05:18 +02:00
ad hoc d5ddc6b080
fix 2 typos word derivation bug 2022-04-01 10:51:22 +02:00
ad hoc 6ef3bb9d83
fmt 2022-03-31 14:06:23 +02:00