Commit Graph

785 Commits

Author SHA1 Message Date
Many the fish
e261ef64d7
Update milli/src/search/matches/mod.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-07-18 10:18:51 +02:00
Many the fish
1da4ab5918
Update milli/src/search/matches/mod.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-07-18 10:18:03 +02:00
Kerollmops
448114cc1c
Fix the benchmarks with the new indexation API 2022-07-12 15:22:09 +02:00
Kerollmops
25e768f31c
Fix another issue with the nested primary key selector 2022-07-12 15:14:07 +02:00
Kerollmops
192793ee38
Add some tests to check for the nested documents ids 2022-07-12 15:14:07 +02:00
Kerollmops
a892a4a79c
Introduce a function to extend from a JSON array of objects 2022-07-12 15:14:06 +02:00
Kerollmops
dc61105554
Fix the nested document id fetching function 2022-07-12 15:14:06 +02:00
Kerollmops
2eec290424
Check the validity of the latitute and longitude numbers 2022-07-12 15:14:06 +02:00
Kerollmops
5d149d631f
Remove tests for a function that no more exists 2022-07-12 15:14:06 +02:00
Kerollmops
0bbcc7b180
Expose the DocumentId struct to be sure to inject the generated ids 2022-07-12 15:14:06 +02:00
Kerollmops
d1a4da9812
Generate a real UUIDv4 when ids are auto-generated 2022-07-12 15:14:06 +02:00
Kerollmops
c8ebf0de47
Rename the validate function as an enriching function 2022-07-12 15:14:06 +02:00
Kerollmops
905af2a2e9
Use the primary key and external id in the transform 2022-07-12 15:14:05 +02:00
Kerollmops
742543091e
Constify the default primary key name 2022-07-12 14:55:52 +02:00
Kerollmops
5f1bfb73ee
Extract the primary key name and make it accessible 2022-07-12 14:55:52 +02:00
Kerollmops
6a0a0ae94f
Make the Transform read from an EnrichedDocumentsBatchReader 2022-07-12 14:55:52 +02:00
Kerollmops
dc3f092d07
Do not leak an internal grenad Error 2022-07-12 14:55:52 +02:00
Kerollmops
8ebf5eed0d
Make the nested primary key work 2022-07-12 14:55:52 +02:00
Kerollmops
19eb3b4708
Make sur that we do not accept floats as documents ids 2022-07-12 14:55:52 +02:00
Kerollmops
2ceeb51c37
Support the auto-generated ids when validating documents 2022-07-12 14:55:51 +02:00
Kerollmops
399eec5c01
Fix the indexation tests 2022-07-12 14:55:51 +02:00
Kerollmops
fcfc4caf8c
Move the Object type in the lib.rs file and use it everywhere 2022-07-12 14:55:51 +02:00
Kerollmops
0146175fe6
Introduce the validate_documents_batch function 2022-07-12 14:55:51 +02:00
Kerollmops
bdc4263883
Introduce the validate_documents_batch function 2022-07-12 14:55:51 +02:00
Kerollmops
e8297ad27e
Fix the tests for the new DocumentsBatchBuilder/Reader 2022-07-12 14:52:56 +02:00
Kerollmops
419ce3966c
Rework the DocumentsBatchBuilder/Reader to use grenad 2022-07-12 14:52:55 +02:00
Kerollmops
048e174efb
Do not allocate when parsing CSV headers 2022-07-12 14:52:55 +02:00
ManyTheFish
5d79617a56 Chores: Enhance smart-crop code comments 2022-07-07 16:28:09 +02:00
bors[bot]
ebddfdb9a3
Merge #578
578: Bump uuid to 1.1.2 r=ManyTheFish a=Kerollmops

Just to [align the version with Meilisearch](https://github.com/meilisearch/meilisearch/pull/2584).

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-07-05 14:56:08 +00:00
Kerollmops
1bfdcfc84f
Bump uuid to 1.1.2 2022-07-05 16:23:36 +02:00
Tamo
250be9fe6c
put the threshold back to 10k 2022-07-05 15:57:44 +02:00
Tamo
b61efd09fc
Makes the internal soft deleted error a UserError 2022-07-05 15:34:45 +02:00
Tamo
eaf28b0628
Apply review suggestions
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-07-05 15:30:33 +02:00
Tamo
3b309f654a
Fasten the document deletion
When a document deletion occurs, instead of deleting the document we mark it as deleted
in the new “soft deleted” bitmap. It is then removed from the search, and all the other
endpoints.
2022-07-05 15:30:33 +02:00
Dmytro Gordon
3ff03a3f5f Fix not equal filter when field contains both number and strings 2022-06-27 15:55:17 +03:00
Kerollmops
238692a8e7
Introduce the copy_to_path method on the Index 2022-06-22 16:49:47 +02:00
bors[bot]
290a40b7a5
Merge #564
564: Rename the limitedTo parameter into maxTotalHits r=curquiza a=Kerollmops

This PR is related to https://github.com/meilisearch/meilisearch/issues/2542, it renames the `limitedTo` parameter into `maxTotalHits`.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-06-22 13:48:33 +00:00
Kerollmops
d7c248042b
Rename the limitedTo parameter into maxTotalHits 2022-06-22 12:00:48 +02:00
Kerollmops
d2f84a9d9e
Improve the estimatedNbHits when distinct is enabled 2022-06-22 11:39:21 +02:00
ManyTheFish
a0ab90a4d7 Avoid having an ending separator before crop marker 2022-06-16 18:23:57 +02:00
ManyTheFish
177154828c Extends deletion tests 2022-06-13 17:34:16 +02:00
ManyTheFish
0d1d354052 Ensure that Index methods are not bypassed by Meilisearch 2022-06-13 17:34:11 +02:00
bors[bot]
f1d848bb9a
Merge #552
552: Fix escaped quotes in filter r=Kerollmops a=irevoire

Will fix https://github.com/meilisearch/meilisearch/issues/2380

The issue was that in the evaluation of the filter, I was using the deref implementation instead of calling the `value` method of my token.

To avoid the problem happening again, I removed the deref implementation; now, you need to either call the `lexeme` or the `value` methods but can't rely on a « default » implementation to get a string out of a token.

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-06-09 14:56:44 +00:00
Tamo
90afde435b
fix escaped quotes in filter 2022-06-09 16:03:49 +02:00
Kerollmops
445d5474cc
Add the pagination_limited_to setting to the database 2022-06-08 18:14:27 +02:00
Kerollmops
69931e50d2
Add the max_values_by_facet setting to the database 2022-06-08 17:54:56 +02:00
Kerollmops
52a494bd3b
Add the new pagination.limited_to and faceting.max_values_per_facet settings 2022-06-08 17:15:36 +02:00
Kerollmops
2a505503b3
Change the number of facet values returned by default to 100 2022-06-08 15:58:57 +02:00
Kerollmops
bae4007447
Remove the hard limit on the number of facet values returned 2022-06-08 15:58:57 +02:00
Tamo
d0aaa7ff00
Fix wrong internal ids assignments 2022-06-07 15:49:33 +02:00
ad hoc
31776fdc3f
add failing test 2022-06-07 15:49:33 +02:00
ManyTheFish
d212dc6b8b Remove useless newline 2022-06-02 18:22:56 +02:00
ManyTheFish
7aabe42ae0 Refactor matching words 2022-06-02 17:59:04 +02:00
ManyTheFish
86ac8568e6 Use Charabia in milli 2022-06-02 16:59:11 +02:00
bors[bot]
74d1914a64
Merge #535
535: Reintroduce the max values by facet limit r=ManyTheFish a=Kerollmops

This PR reintroduces the max values by facet limit this is related to https://github.com/meilisearch/meilisearch/issues/2349.

~I would like some help in deciding on whether I keep the default 100 max values in milli and set up the `FacetDistribution` settings in Meilisearch to use 1000 as the new value, I expose the `max_values_by_facet` for this purpose.~

I changed the default value to 1000 and the max to 10000, thank you `@ManyTheFish` for the help!

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-06-01 14:30:50 +00:00
bors[bot]
582930dbbb
Merge #538
538: speedup exact words r=Kerollmops a=MarinPostma

This PR make `exact_words` return an `Option` instead of an empty set, since set creation is costly, as noticed by `@kerollmops.`

I was not convinces that this was the cause for all of the performance drop we measured, and then realized that methods that initialized it were called recursively which caused initialization times to add up. While the first fix solves the issue when not using exact words, using exact word remained way more expensive that it should be. To address this issue, the exact words are cached into the `Context`, so they are only initialized once.


Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-05-30 08:20:34 +00:00
ad hoc
25fc576696
review changes 2022-05-24 14:15:33 +02:00
ad hoc
69dc4de80f
change &Option<Set> to Option<&Set> 2022-05-24 12:14:55 +02:00
ad hoc
ac975cc747
cache context's exact words 2022-05-24 09:43:17 +02:00
ad hoc
8993fec8a3
return optional exact words 2022-05-24 09:15:49 +02:00
Matthias Wright
754f48a4fb Improves ranking rules error message 2022-05-20 21:25:43 +02:00
Kerollmops
cd7c6e19ed
Reintroduce the max values by facet limit 2022-05-18 15:57:57 +02:00
ManyTheFish
137434a1c8 Add some implementation on MatchBounds 2022-05-17 15:57:09 +02:00
bors[bot]
08c6d50cd1
Merge #531
531: fix the mixed dataset geosearch indexing bug r=Kerollmops a=irevoire

port #529 to main

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-05-16 16:06:36 +00:00
bors[bot]
cf3e574cb4
Merge #530
530: fix the searchable fields bug when a field is nested r=Kerollmops a=irevoire

port #528 to main

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-05-16 15:52:30 +00:00
Tamo
0af399a6d7
fix the mixed dataset geosearch indexing bug 2022-05-16 17:37:45 +02:00
Tamo
f586028f9a
fix the searchable fields bug when a field is nested
Update milli/src/index.rs

Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-05-16 17:24:36 +02:00
bors[bot]
e1e85267fd
Merge #526
526: remove useless comment r=irevoire a=MarinPostma



Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-05-16 10:01:43 +00:00
bors[bot]
51809eb260
Merge #525
525: Simplify the error creation with thiserror r=irevoire a=irevoire

I introduced [`thiserror`](https://docs.rs/thiserror/latest/thiserror/) to implements all the `Display` trait and most of the `impl From<xxx> for yyy` in way less lines.
And then I introduced a cute macro to implements the `impl<X, Y, Z> From<X> for Z where Y: From<X>, Z: From<X>` more easily.

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-05-04 15:47:32 +00:00
Tamo
484a9ddb27
Simplify the error creation with thiserror and a smol friendly macro 2022-05-04 17:24:00 +02:00
bors[bot]
65e6aa0de2
Merge #523
523: Improve geosearch error messages r=irevoire a=irevoire

Improve the geosearch error messages (#488).
And try to parse the string as specified in https://github.com/meilisearch/meilisearch/issues/2354

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-05-04 13:36:11 +00:00
Tamo
c55368ddd4
apply code suggestion
Co-authored-by: Kerollmops <kero@meilisearch.com>
2022-05-04 14:11:03 +02:00
ad hoc
5ad5d56f7e
remove useless comment 2022-05-04 10:43:54 +02:00
bors[bot]
0c2c8af44e
Merge #520
520: fix mistake in Settings initialization r=irevoire a=MarinPostma

fix settings not being correctly initialized and add a test to make sure that they are in the future.

fix https://github.com/meilisearch/meilisearch/issues/2358


Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-05-03 15:32:18 +00:00
Kerollmops
211c8763b9
Make sure that we do not generate too long keys 2022-05-03 10:03:15 +02:00
Kerollmops
7e47031bdc
Add a test for long keys in LMDB 2022-05-03 10:03:13 +02:00
Tamo
3cb1f6d0a1
improve geosearch error messages 2022-05-02 19:20:47 +02:00
ad hoc
1ee3d6ae33
fix mistake in Settings initialization 2022-04-29 16:24:25 +02:00
bors[bot]
9db86aac51
Merge #518
518: Return facets even when there is no value associated to it r=Kerollmops a=Kerollmops

This PR is related to https://github.com/meilisearch/meilisearch/issues/2352 and should fix the issue when Meilisearch is up-to-date with this PR.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-04-28 09:04:36 +00:00
Kerollmops
7d1c2d97bf
Return facets even when there is no values associated to it 2022-04-26 17:59:53 +02:00
bors[bot]
d388ea0f9d
Merge #506
506: fix cargo warnings r=Kerollmops a=MarinPostma

fix cargo warnings


Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-04-26 15:45:20 +00:00
ad hoc
5c29258e8e
fix cargo warnings 2022-04-26 17:33:11 +02:00
Tamo
f19d2dc548
Only flatten the required fields
apply review comments

Co-authored-by: Kerollmops <kero@meilisearch.com>
2022-04-26 12:33:46 +02:00
bors[bot]
8010eca9c7
Merge #505
505: normalize exact words r=curquiza a=MarinPostma

Normalize the exact words, as specified in the specification.


Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-04-25 09:35:32 +00:00
ad hoc
2e0089d5ff
normalize exact words 2022-04-21 15:38:40 +02:00
ad hoc
3a2451fcba
add test normalize exact words 2022-04-21 13:52:09 +02:00
Clément Renault
eb5830aa40
Add a test to make sure that long words are handled 2022-04-21 13:45:28 +02:00
ad hoc
8b14090927
fix min-word-len-for-typo not reset properly 2022-04-19 15:20:16 +02:00
bors[bot]
ea4bb9402f
Merge #483
483: Enhance matching words r=Kerollmops a=ManyTheFish

# Summary

Enhance milli word-matcher making it handle match computing and cropping.

# Implementation

## Computing best matches for cropping

Before we were considering that the first match of the attribute was the best one, this was accurate when only one word was searched but was missing the target when more than one word was searched.

Now we are searching for the best matches interval to crop around, the chosen interval is the one:
1) that have the highest count of unique matches
> for example, if we have a query `split the world`, then the interval `the split the split the` has 5 matches but only 2 unique matches (1 for `split` and 1 for `the`) where the interval `split of the world` has 3 matches and 3 unique matches. So the interval `split of the world` is considered better.
2) that have the minimum distance between matches
> for example, if we have a query `split the world`, then the interval `split of the world` has a distance of 3 (2 between `split` and `the`, and 1 between `the` and `world`) where the interval `split the world` has a distance of 2. So the interval `split the world` is considered better.
3) that have the highest count of ordered matches
> for example, if we have a query `split the world`, then the interval `the world split` has 2 ordered words where the interval `split the world` has 3. So the interval `split the world` is considered better.

## Cropping around the best matches interval

Before we were cropping around the interval without checking the context.

Now we are cropping around words in the same context as matching words.
This means that we will keep words that are farther from the matching words but are in the same phrase, than words that are nearer but separated by a dot.

> For instance, for the matching word `Split` the text:
`Natalie risk her future. Split The World is a book written by Emily Henry. I never read it.`
will be cropped like:
`…. Split The World is a book written by Emily Henry. …`
and  not like:
`Natalie risk her future. Split The World is a book …`


Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-04-19 11:42:32 +00:00
ManyTheFish
f1115e274f Use Copy impl of FormatOption instead of clonning 2022-04-19 10:35:50 +02:00
Tamo
00f78d6b5a
Apply code suggestions
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-04-14 11:14:08 +02:00
Tamo
399fba16bb
only flatten an object if it's nested 2022-04-14 11:14:08 +02:00
Tamo
ee64f4a936
Use smartstring to store the external id in our hashmap
We need to store all the external id (primary key) in a hashmap
associated to their internal id during.
The smartstring remove heap allocation / memory usage and should
improve the cache locality.
2022-04-13 21:22:07 +02:00
ad hoc
dda28d7415
exclude excluded canditates from search result candidates 2022-04-13 12:10:35 +02:00
ad hoc
bbb6728d2f
add distinct attributes to cli 2022-04-13 12:10:35 +02:00
ManyTheFish
5809d3ae0d Add first benchmarks on formatting 2022-04-12 16:31:58 +02:00
ManyTheFish
827cedcd15 Add format option structure 2022-04-12 13:42:14 +02:00
ManyTheFish
011f8210ed Make compute_matches more rust idiomatic 2022-04-12 10:19:02 +02:00
ManyTheFish
a16de5de84 Symplify format and remove intermediate function 2022-04-08 11:20:41 +02:00
ManyTheFish
a769e09dfa Make token_crop_bounds more rust idiomatic 2022-04-07 20:15:14 +02:00