Commit Graph

124 Commits

Author SHA1 Message Date
Louis Dureuil
da833eb095
Expose the scores and detailed scores in the API 2023-06-22 12:39:14 +02:00
Jakub Jirutka
e615fa5ec6 Fix unused_imports warning in milli when japanese is not enabled 2023-05-04 15:46:11 +02:00
Jakub Jirutka
13f1277637 Allow to disable specialized tokenizations (again)
In PR #2773, I added the `chinese`, `hebrew`, `japanese` and `thai`
feature flags to allow meilisearch to be built without huge specialized
tokenizations that took up 90% of the meilisearch binary size.
Unfortunately, due to some recent changes, this doesn't work anymore.
The problem lies in excessive use of the `default` feature flag, which
infects the dependency graph.

Instead of adding `default-features = false` here and there, it's easier
and more future-proof to not declare `default` in `milli` and
`meilisearch-types`. I've renamed it to `all-tokenizers`, which also
makes it a bit clearer what it's about.
2023-05-04 15:45:40 +02:00
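The feature gating described in commit 13f1277637 can be pictured on the Rust side roughly as follows. This is only a sketch: the four flag names come from the commit message, while the helper `enabled_tokenizers` is hypothetical and not part of milli. It shows how code can react at compile time to whichever tokenizer features a build enables.

```rust
/// Lists the specialized tokenizers compiled into this build.
/// Hypothetical helper for illustration; it is not part of milli.
pub fn enabled_tokenizers() -> Vec<&'static str> {
    let mut enabled = Vec::new();
    // `cfg!` evaluates to a compile-time boolean for each Cargo feature.
    if cfg!(feature = "chinese") {
        enabled.push("chinese");
    }
    if cfg!(feature = "hebrew") {
        enabled.push("hebrew");
    }
    if cfg!(feature = "japanese") {
        enabled.push("japanese");
    }
    if cfg!(feature = "thai") {
        enabled.push("thai");
    }
    enabled
}
```

When none of the four features is enabled, the function returns an empty list, which is the situation the commit wants to make reachable again.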
Loïc Lecrenier
48f5bb1693 Implements the geo-sort ranking rule 2023-04-29 11:02:16 +02:00
Loïc Lecrenier
d1fdbb63da Make all search tests pass, fix distinctAttribute bug 2023-04-24 12:12:08 +02:00
ManyTheFish
47f6a3ad3d Take into account that a logger needs the search context 2023-04-06 15:02:23 +02:00
ManyTheFish
a1148c09c2 remove old matcher 2023-04-06 14:00:21 +02:00
ManyTheFish
9c5f64769a Integrate the new Highlighter in the search 2023-04-06 13:58:56 +02:00
Clément Renault
0d2e7bcc13 Implement the previous way for the exhaustive distinct candidates 2023-04-03 10:08:10 +02:00
Louis Dureuil
abb19d368d
Initialize query time ranking rule for query search 2023-03-28 12:40:52 +02:00
Loïc Lecrenier
862714a18b Remove criterion_implementation_strategy param of Search 2023-03-23 09:44:12 +01:00
Loïc Lecrenier
d18ebe4f3a Remove more warnings 2023-03-23 09:41:18 +01:00
Loïc Lecrenier
7169d85115 Remove old query_tree code and make clippy happy 2023-03-23 09:39:16 +01:00
Loïc Lecrenier
f5f5f03ec0 Remove old criteria code 2023-03-23 09:35:53 +01:00
Loïc Lecrenier
83e5b4ed0d Compute edges of proximity graph lazily 2023-03-21 10:44:40 +01:00
Loïc Lecrenier
2d88089129 Remove unused term matching strategies 2023-03-20 09:41:55 +01:00
ManyTheFish
8aa808d51b Merge branch 'main' into enhance-language-detection 2023-02-20 18:14:34 +01:00
Many the fish
119e6d8811
Update milli/src/search/mod.rs
Co-authored-by: Tamo <tamo@meilisearch.com>
2023-02-20 15:33:10 +01:00
Tamo
7a38fe624f
throw an error if the top left corner is found below the bottom right corner 2023-02-06 17:50:47 +01:00
ManyTheFish
0bc1a18f52 Use Languages list detected during indexing at search time 2023-02-01 18:57:43 +01:00
ManyTheFish
643d99e0f9 Add expectancy test 2023-02-01 18:39:54 +01:00
Loïc Lecrenier
229405aeb9 Choose implementation strategy of criterion at runtime 2022-12-21 09:29:39 +01:00
ManyTheFish
55724f2412 Introduce an initial candidates set that makes the difference between an exhaustive count and an estimation 2022-12-08 09:41:34 +01:00
Loïc Lecrenier
cb8442a119 Further unify facet databases of f64s and strings 2022-10-26 13:47:04 +02:00
Loïc Lecrenier
e8a156d682 Reorganise facets database indexing code 2022-10-26 13:46:46 +02:00
Loïc Lecrenier
c3f49f766d Prepare refactor of facets database
2022-10-26 13:46:14 +02:00
bors[bot]
f11a4087da
Merge #665
665: Fixing piles of clippy errors. r=ManyTheFish a=ehiggs

## Related issue
No issue fixed. Simply cleaning up some code for clippy on the march towards a clean build when #659 is merged.

## What does this PR do?
Most of these are calling clone when the struct supports Copy.

Many are using & and &mut on `self` when the function they are called from already has an immutable or mutable borrow so this isn't needed.

I tried to stay away from actual changes or places where I'd have to name fresh variables.
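
As an illustration of the two kinds of fix described above, here is a small sketch. The types `DocId` and `Index` and the function `count` are hypothetical and do not come from milli; the lint names `clippy::clone_on_copy` and `clippy::needless_borrow` are the Clippy lints that usually report these patterns.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct DocId(u32);

struct Index {
    ids: Vec<DocId>,
}

// Free function that only needs a shared borrow of the index.
fn count(index: &Index) -> usize {
    index.ids.len()
}

impl Index {
    fn first(&self) -> Option<DocId> {
        let id = self.ids.first()?;
        // Before: `Some(id.clone())` (clippy::clone_on_copy).
        // After: `DocId` is `Copy`, so dereferencing is enough.
        Some(*id)
    }

    fn len(&self) -> usize {
        // Before: `count(&self)` (clippy::needless_borrow), which builds a `&&Index`.
        // After: `self` is already a `&Index`, so it is passed through directly.
        count(self)
    }
}
```

Neither change alters behavior: both versions return the same values, the second one simply avoids the redundant call and the extra reference.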

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Co-authored-by: Ewan Higgs <ewan.higgs@gmail.com>
2022-10-20 07:19:46 +00:00
ManyTheFish
6f55e7844c Add some code comments 2022-10-17 14:41:57 +02:00
ManyTheFish
d71bc1e69f Compute an exact count when using distinct 2022-10-17 14:13:44 +02:00
ManyTheFish
a396806343 Add settings to force milli to exhaustively compute the total number of hits 2022-10-17 14:13:44 +02:00
Ewan Higgs
beb987d3d1 Fixing piles of clippy errors.
Most of these are calling clone when the struct supports Copy.

Many are using & and &mut on `self` when the function they are called
from already has an immutable or mutable borrow so this isn't needed.

I tried to stay away from actual changes or places where I'd have to
name fresh variables.
2022-10-13 22:02:54 +02:00
ManyTheFish
5391e3842c replace optional_words by term_matching_strategy 2022-08-22 17:47:19 +02:00
ManyTheFish
9640976c79 Rename TermMatchingPolicies 2022-08-18 17:36:08 +02:00
Tamo
3b309f654a
Speed up the document deletion
When a document deletion occurs, instead of deleting the document we mark it as deleted
in the new “soft deleted” bitmap. It is then removed from the search, and all the other
endpoints.
2022-07-05 15:30:33 +02:00
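The soft-deletion idea from commit 3b309f654a can be sketched as below. This is a simplified illustration, not milli's code: the `Index` struct and its methods are hypothetical, and a standard-library `BTreeSet<u32>` stands in for the bitmap mentioned in the message.

```rust
use std::collections::BTreeSet;

/// Hypothetical index: stored document ids plus a "soft deleted" set.
struct Index {
    documents: BTreeSet<u32>,
    soft_deleted: BTreeSet<u32>,
}

impl Index {
    /// Marks a document as deleted without touching the stored data.
    /// Returns `true` only if the document exists and was not already marked.
    fn delete(&mut self, id: u32) -> bool {
        self.documents.contains(&id) && self.soft_deleted.insert(id)
    }

    /// Ids that searches and other endpoints are allowed to see.
    fn visible(&self) -> Vec<u32> {
        self.documents
            .difference(&self.soft_deleted)
            .copied()
            .collect()
    }
}
```

Deletion then costs a single set insertion, and every read path filters out the marked ids; physically removing the data can be deferred to a later pass.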
Kerollmops
d2f84a9d9e
Improve the estimatedNbHits when distinct is enabled 2022-06-22 11:39:21 +02:00
Kerollmops
69931e50d2
Add the max_values_by_facet setting to the database 2022-06-08 17:54:56 +02:00
ManyTheFish
86ac8568e6 Use Charabia in milli 2022-06-02 16:59:11 +02:00
ad hoc
ac975cc747
cache context's exact words 2022-05-24 09:43:17 +02:00
bors[bot]
ea4bb9402f
Merge #483
483: Enhance matching words r=Kerollmops a=ManyTheFish

# Summary

Enhance the milli word matcher so that it handles match computing and cropping.

# Implementation

## Computing best matches for cropping

Previously, we considered the first match in the attribute to be the best one. This was accurate when only one word was searched, but it missed the target when more than one word was searched.

Now we search for the best interval of matches to crop around. The chosen interval is the one:
1) that has the highest count of unique matches
> for example, if we have a query `split the world`, then the interval `the split the split the` has 5 matches but only 2 unique matches (1 for `split` and 1 for `the`), whereas the interval `split of the world` has 3 matches and 3 unique matches. So the interval `split of the world` is considered better.
2) that has the minimum distance between matches
> for example, if we have a query `split the world`, then the interval `split of the world` has a distance of 3 (2 between `split` and `the`, and 1 between `the` and `world`), whereas the interval `split the world` has a distance of 2. So the interval `split the world` is considered better.
3) that has the highest count of ordered matches
> for example, if we have a query `split the world`, then the interval `the world split` has 2 ordered words, whereas the interval `split the world` has 3. So the interval `split the world` is considered better.
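
The three criteria above can be sketched as a lexicographic comparison. This is a minimal illustration under stated assumptions, not the code from this PR: `Match`, `find_matches`, `interval_score` and `best_interval` are hypothetical names, matches are assumed to be sorted by position, and the ordered count is read as one plus the number of adjacent pairs that follow the query order, which reproduces the numbers in the examples.

```rust
/// One matched word inside a candidate interval (hypothetical type).
#[derive(Clone, Copy)]
struct Match {
    /// Index of the word in the query (`split the world` => 0, 1, 2).
    query_index: usize,
    /// Position of the word in the candidate text.
    position: usize,
}

/// Collects exact matches of query words in a list of text words.
fn find_matches(query: &[&str], words: &[&str]) -> Vec<Match> {
    words
        .iter()
        .enumerate()
        .filter_map(|(position, word)| {
            query
                .iter()
                .position(|q| q == word)
                .map(|query_index| Match { query_index, position })
        })
        .collect()
}

/// Score compared lexicographically, larger is better:
/// (unique matches, negated distance, ordered matches).
fn interval_score(matches: &[Match]) -> (usize, i64, usize) {
    let mut unique: Vec<usize> = matches.iter().map(|m| m.query_index).collect();
    unique.sort_unstable();
    unique.dedup();

    let mut distance = 0i64;
    // Assumption: the first match counts as ordered by itself.
    let mut ordered = usize::from(!matches.is_empty());
    for pair in matches.windows(2) {
        distance += pair[1].position as i64 - pair[0].position as i64;
        if pair[1].query_index > pair[0].query_index {
            ordered += 1;
        }
    }
    (unique.len(), -distance, ordered)
}

/// Picks the candidate interval with the highest score.
fn best_interval(candidates: &[Vec<Match>]) -> Option<&Vec<Match>> {
    candidates.iter().max_by_key(|c| interval_score(c))
}
```

Negating the distance lets a single tuple comparison express "more unique matches, then smaller distance, then more ordered matches".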

## Cropping around the best matches interval

Previously, we cropped around the interval without checking the context.

Now we crop around words in the same context as the matching words.
This means that we prefer to keep words that are farther from the matching words but in the same phrase over words that are nearer but separated by a period.

> For instance, for the matching word `Split` the text:
`Natalie risk her future. Split The World is a book written by Emily Henry. I never read it.`
will be cropped like:
`…. Split The World is a book written by Emily Henry. …`
and not like:
`Natalie risk her future. Split The World is a book …`
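
A heavily simplified sketch of the context-aware choice, for illustration only: the function `crop_to_sentence` is hypothetical, it treats a period as the only context separator, and it ignores the crop length and the `…` markers that the real formatter deals with. It only shows the preference for the sentence containing the match over nearer words in another sentence.

```rust
/// Returns the sentence that contains `matching_word`, if any.
/// Hypothetical helper: sentences are split on '.' and words on whitespace.
fn crop_to_sentence<'a>(text: &'a str, matching_word: &str) -> Option<&'a str> {
    text.split('.')
        .map(str::trim)
        .find(|sentence| sentence.split_whitespace().any(|w| w == matching_word))
}
```

Applied to the text above with the matching word `Split`, this keeps `Split The World is a book written by Emily Henry` and drops the preceding sentence.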


Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-04-19 11:42:32 +00:00
ad hoc
dda28d7415
exclude excluded candidates from search result candidates 2022-04-13 12:10:35 +02:00
ad hoc
bbb6728d2f
add distinct attributes to cli 2022-04-13 12:10:35 +02:00
ManyTheFish
5809d3ae0d Add first benchmarks on formatting 2022-04-12 16:31:58 +02:00
ManyTheFish
827cedcd15 Add format option structure 2022-04-12 13:42:14 +02:00
Irevoire
4f3ce6d9cd
nested fields 2022-04-07 16:58:46 +02:00
ManyTheFish
3bb1e35ada Fix match count 2022-04-05 17:48:45 +02:00
ManyTheFish
b3f0f39106 Make some cleaning 2022-04-05 17:41:32 +02:00
ManyTheFish
734d0899d3 Publish Matcher 2022-04-05 17:41:32 +02:00
ManyTheFish
d96e72e5dc Create formatter with some tests 2022-04-05 17:41:32 +02:00
ad hoc
9fe40df960
add word derivations tests 2022-04-01 11:05:18 +02:00
ad hoc
d5ddc6b080
fix 2 typos word derivation bug 2022-04-01 10:51:22 +02:00