Loïc Lecrenier
80b962b4f4
Run cargo fmt
2022-07-19 10:07:33 +02:00
Loïc Lecrenier
c17d616250
Refactor index_documents_check_exists_database tests
2022-07-19 10:07:33 +02:00
Loïc Lecrenier
30bd4db0fc
Simplify indexing task for facet_exists_docids database
2022-07-19 10:07:33 +02:00
Loïc Lecrenier
392472f4bb
Apply suggestions from code review
...
Co-authored-by: Tamo <tamo@meilisearch.com>
2022-07-19 10:07:33 +02:00
Loïc Lecrenier
0388b2d463
Run cargo fmt
2022-07-19 10:07:33 +02:00
Loïc Lecrenier
dc64170a69
Improve syntax of EXISTS filter, allow “value NOT EXISTS”
2022-07-19 10:07:33 +02:00
Loïc Lecrenier
72452f0cb2
Implements the EXIST filter operator
2022-07-19 10:07:33 +02:00
Loïc Lecrenier
453d593ce8
Add a database containing the docids where each field exists
2022-07-19 10:07:33 +02:00
Many the fish
2d79720f5d
Update milli/src/search/matches/mod.rs
2022-07-18 17:48:04 +02:00
Many the fish
8ddb4e750b
Update milli/src/search/matches/mod.rs
2022-07-18 17:47:39 +02:00
Many the fish
a277daa1f2
Update milli/src/search/matches/mod.rs
2022-07-18 17:47:13 +02:00
Many the fish
fb794c6b5e
Update milli/src/search/matches/mod.rs
2022-07-18 17:46:00 +02:00
Many the fish
1237cfc249
Update milli/src/search/matches/mod.rs
2022-07-18 17:45:37 +02:00
Many the fish
d7fd5c58cd
Update milli/src/search/matches/mod.rs
2022-07-18 17:45:06 +02:00
Loïc Lecrenier
fc9f3f31e7
Change DocumentsBatchReader to access cursor and index at same time
...
Otherwise it is not possible to iterate over all documents while
using the fields index at the same time.
2022-07-18 16:08:14 +02:00
Loïc Lecrenier
ab1571cdec
Simplify Transform::read_documents, enabled by enriched documents reader
2022-07-18 12:45:47 +02:00
Many the fish
e261ef64d7
Update milli/src/search/matches/mod.rs
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-07-18 10:18:51 +02:00
Many the fish
1da4ab5918
Update milli/src/search/matches/mod.rs
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-07-18 10:18:03 +02:00
Kerollmops
448114cc1c
Fix the benchmarks with the new indexation API
2022-07-12 15:22:09 +02:00
Kerollmops
25e768f31c
Fix another issue with the nested primary key selector
2022-07-12 15:14:07 +02:00
Kerollmops
192793ee38
Add some tests to check for the nested documents ids
2022-07-12 15:14:07 +02:00
Kerollmops
a892a4a79c
Introduce a function to extend from a JSON array of objects
2022-07-12 15:14:06 +02:00
Kerollmops
dc61105554
Fix the nested document id fetching function
2022-07-12 15:14:06 +02:00
Kerollmops
2eec290424
Check the validity of the latitude and longitude numbers
2022-07-12 15:14:06 +02:00
Kerollmops
5d149d631f
Remove tests for a function that no longer exists
2022-07-12 15:14:06 +02:00
Kerollmops
0bbcc7b180
Expose the DocumentId struct to be sure to inject the generated ids
2022-07-12 15:14:06 +02:00
Kerollmops
d1a4da9812
Generate a real UUIDv4 when ids are auto-generated
2022-07-12 15:14:06 +02:00
Kerollmops
c8ebf0de47
Rename the validate function as an enriching function
2022-07-12 15:14:06 +02:00
Kerollmops
905af2a2e9
Use the primary key and external id in the transform
2022-07-12 15:14:05 +02:00
Kerollmops
742543091e
Constify the default primary key name
2022-07-12 14:55:52 +02:00
Kerollmops
5f1bfb73ee
Extract the primary key name and make it accessible
2022-07-12 14:55:52 +02:00
Kerollmops
6a0a0ae94f
Make the Transform read from an EnrichedDocumentsBatchReader
2022-07-12 14:55:52 +02:00
Kerollmops
dc3f092d07
Do not leak an internal grenad Error
2022-07-12 14:55:52 +02:00
Kerollmops
8ebf5eed0d
Make the nested primary key work
2022-07-12 14:55:52 +02:00
Kerollmops
19eb3b4708
Make sure that we do not accept floats as document ids
2022-07-12 14:55:52 +02:00
Kerollmops
2ceeb51c37
Support the auto-generated ids when validating documents
2022-07-12 14:55:51 +02:00
Kerollmops
399eec5c01
Fix the indexation tests
2022-07-12 14:55:51 +02:00
Kerollmops
fcfc4caf8c
Move the Object type in the lib.rs file and use it everywhere
2022-07-12 14:55:51 +02:00
Kerollmops
0146175fe6
Introduce the validate_documents_batch function
2022-07-12 14:55:51 +02:00
Kerollmops
bdc4263883
Introduce the validate_documents_batch function
2022-07-12 14:55:51 +02:00
Kerollmops
e8297ad27e
Fix the tests for the new DocumentsBatchBuilder/Reader
2022-07-12 14:52:56 +02:00
Kerollmops
419ce3966c
Rework the DocumentsBatchBuilder/Reader to use grenad
2022-07-12 14:52:55 +02:00
Kerollmops
048e174efb
Do not allocate when parsing CSV headers
2022-07-12 14:52:55 +02:00
ManyTheFish
5d79617a56
Chores: Enhance smart-crop code comments
2022-07-07 16:28:09 +02:00
bors[bot]
ebddfdb9a3
Merge #578
...
578: Bump uuid to 1.1.2 r=ManyTheFish a=Kerollmops
Just to [align the version with Meilisearch](https://github.com/meilisearch/meilisearch/pull/2584).
Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-07-05 14:56:08 +00:00
Kerollmops
1bfdcfc84f
Bump uuid to 1.1.2
2022-07-05 16:23:36 +02:00
Tamo
250be9fe6c
put the threshold back to 10k
2022-07-05 15:57:44 +02:00
Tamo
b61efd09fc
Makes the internal soft deleted error a UserError
2022-07-05 15:34:45 +02:00
Tamo
eaf28b0628
Apply review suggestions
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-07-05 15:30:33 +02:00
Tamo
3b309f654a
Speed up the document deletion
...
When a document deletion occurs, instead of deleting the document we mark it as deleted
in the new “soft deleted” bitmap. It is then removed from the search, and all the other
endpoints.
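
A minimal sketch of the soft-deletion idea described above, assuming a `BTreeSet<u32>` as a stand-in for the real bitmap type. The `Index` struct, its fields, and the method names below are hypothetical and only illustrate the technique; they are not milli's actual API.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified index: both sets hold internal document ids.
/// A `BTreeSet<u32>` stands in for a compressed bitmap (assumption).
struct Index {
    documents_ids: BTreeSet<u32>,
    soft_deleted: BTreeSet<u32>,
}

impl Index {
    /// Marks a document as deleted without touching any other structure.
    /// Returns `true` only if the document existed and was not already marked.
    fn soft_delete(&mut self, docid: u32) -> bool {
        self.documents_ids.contains(&docid) && self.soft_deleted.insert(docid)
    }

    /// Ids that search and the other endpoints are allowed to see:
    /// every known document minus the soft-deleted ones.
    fn visible_ids(&self) -> BTreeSet<u32> {
        self.documents_ids.difference(&self.soft_deleted).copied().collect()
    }
}
```

The deletion itself becomes a single set insertion; the cost of physically removing the document is deferred.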
2022-07-05 15:30:33 +02:00
Dmytro Gordon
3ff03a3f5f
Fix not equal filter when field contains both number and strings
2022-06-27 15:55:17 +03:00
Kerollmops
238692a8e7
Introduce the copy_to_path method on the Index
2022-06-22 16:49:47 +02:00
bors[bot]
290a40b7a5
Merge #564
...
564: Rename the limitedTo parameter into maxTotalHits r=curquiza a=Kerollmops
This PR is related to https://github.com/meilisearch/meilisearch/issues/2542 and renames the `limitedTo` parameter to `maxTotalHits`.
Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-06-22 13:48:33 +00:00
Kerollmops
d7c248042b
Rename the limitedTo parameter into maxTotalHits
2022-06-22 12:00:48 +02:00
Kerollmops
d2f84a9d9e
Improve the estimatedNbHits when distinct is enabled
2022-06-22 11:39:21 +02:00
ManyTheFish
a0ab90a4d7
Avoid having an ending separator before crop marker
2022-06-16 18:23:57 +02:00
ManyTheFish
177154828c
Extends deletion tests
2022-06-13 17:34:16 +02:00
ManyTheFish
0d1d354052
Ensure that Index methods are not bypassed by Meilisearch
2022-06-13 17:34:11 +02:00
bors[bot]
f1d848bb9a
Merge #552
...
552: Fix escaped quotes in filter r=Kerollmops a=irevoire
Will fix https://github.com/meilisearch/meilisearch/issues/2380
The issue was that in the evaluation of the filter, I was using the deref implementation instead of calling the `value` method of my token.
To avoid the problem happening again, I removed the deref implementation; now, you need to either call the `lexeme` or the `value` methods but can't rely on a « default » implementation to get a string out of a token.
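
A minimal sketch of the distinction between the two accessors, assuming a hypothetical `Token` type and a simplified `unescape` helper; neither is the filter parser's real definition. It shows why a default string conversion is risky: the raw lexeme and the unescaped value differ as soon as the input contains an escaped quote.

```rust
/// Hypothetical token: `lexeme` is the raw slice of the filter string,
/// `value` is only set when unescaping changed the text.
struct Token<'a> {
    lexeme: &'a str,
    value: Option<String>,
}

/// Simplified unescaping (assumption): a backslash keeps the next character as-is.
fn unescape(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c == '\\' {
            if let Some(next) = chars.next() {
                out.push(next);
            }
        } else {
            out.push(c);
        }
    }
    out
}

impl<'a> Token<'a> {
    fn new(lexeme: &'a str) -> Self {
        let value = if lexeme.contains('\\') { Some(unescape(lexeme)) } else { None };
        Token { lexeme, value }
    }

    /// The text exactly as written in the filter, escapes included.
    fn lexeme(&self) -> &str {
        self.lexeme
    }

    /// The text to compare against document values, escapes resolved.
    fn value(&self) -> &str {
        self.value.as_deref().unwrap_or(self.lexeme)
    }
}
```

With no `Deref` implementation, every call site has to choose between `lexeme()` and `value()` explicitly.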
Co-authored-by: Tamo <tamo@meilisearch.com>
2022-06-09 14:56:44 +00:00
Tamo
90afde435b
fix escaped quotes in filter
2022-06-09 16:03:49 +02:00
Kerollmops
445d5474cc
Add the pagination_limited_to setting to the database
2022-06-08 18:14:27 +02:00
Kerollmops
69931e50d2
Add the max_values_by_facet setting to the database
2022-06-08 17:54:56 +02:00
Kerollmops
52a494bd3b
Add the new pagination.limited_to and faceting.max_values_per_facet settings
2022-06-08 17:15:36 +02:00
Kerollmops
2a505503b3
Change the number of facet values returned by default to 100
2022-06-08 15:58:57 +02:00
Kerollmops
bae4007447
Remove the hard limit on the number of facet values returned
2022-06-08 15:58:57 +02:00
Tamo
d0aaa7ff00
Fix wrong internal ids assignments
2022-06-07 15:49:33 +02:00
ad hoc
31776fdc3f
add failing test
2022-06-07 15:49:33 +02:00
ManyTheFish
d212dc6b8b
Remove useless newline
2022-06-02 18:22:56 +02:00
ManyTheFish
7aabe42ae0
Refactor matching words
2022-06-02 17:59:04 +02:00
ManyTheFish
86ac8568e6
Use Charabia in milli
2022-06-02 16:59:11 +02:00
bors[bot]
74d1914a64
Merge #535
...
535: Reintroduce the max values by facet limit r=ManyTheFish a=Kerollmops
This PR reintroduces the max values by facet limit (related to https://github.com/meilisearch/meilisearch/issues/2349).
~I would like some help in deciding on whether I keep the default 100 max values in milli and set up the `FacetDistribution` settings in Meilisearch to use 1000 as the new value, I expose the `max_values_by_facet` for this purpose.~
I changed the default value to 1000 and the max to 10000, thank you `@ManyTheFish` for the help!
Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-06-01 14:30:50 +00:00
bors[bot]
582930dbbb
Merge #538
...
538: speedup exact words r=Kerollmops a=MarinPostma
This PR makes `exact_words` return an `Option` instead of an empty set, since set creation is costly, as noticed by `@kerollmops`.
I was not convinced that this was the cause of all of the performance drop we measured, and then realized that the methods that initialized it were called recursively, which caused initialization times to add up. While the first fix solves the issue when not using exact words, using exact words remained far more expensive than it should be. To address this issue, the exact words are cached in the `Context`, so they are only initialized once.
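
A minimal sketch of the two fixes, assuming simplified `Index` and `Context` types and a `BTreeSet<String>` as a stand-in for the real set type. All names and fields below are hypothetical; the `builds` counter exists only to make the caching observable.

```rust
use std::cell::Cell;
use std::collections::BTreeSet;

/// Hypothetical index holding the configured exact words.
struct Index {
    exact_words: Vec<String>,
    /// Counts how many times the (costly) set was built.
    builds: Cell<usize>,
}

impl Index {
    fn new(words: &[&str]) -> Self {
        Index {
            exact_words: words.iter().map(|w| w.to_string()).collect(),
            builds: Cell::new(0),
        }
    }

    /// Returns `None` instead of building an empty set when nothing is configured.
    fn exact_words(&self) -> Option<BTreeSet<String>> {
        if self.exact_words.is_empty() {
            return None;
        }
        self.builds.set(self.builds.get() + 1);
        Some(self.exact_words.iter().cloned().collect())
    }
}

/// Hypothetical search context: the set is built once, at construction.
struct Context {
    exact_words: Option<BTreeSet<String>>,
}

impl Context {
    fn new(index: &Index) -> Self {
        Context { exact_words: index.exact_words() }
    }

    /// Hands out a borrowed view (`Option<&Set>` rather than `&Option<Set>`).
    fn exact_words(&self) -> Option<&BTreeSet<String>> {
        self.exact_words.as_ref()
    }

    fn is_exact(&self, word: &str) -> bool {
        self.exact_words().map_or(false, |set| set.contains(word))
    }
}
```

However many times `is_exact` is called, even recursively, the set is built at most once per `Context`.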
Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-05-30 08:20:34 +00:00
ad hoc
25fc576696
review changes
2022-05-24 14:15:33 +02:00
ad hoc
69dc4de80f
change &Option<Set> to Option<&Set>
2022-05-24 12:14:55 +02:00
ad hoc
ac975cc747
cache context's exact words
2022-05-24 09:43:17 +02:00
ad hoc
8993fec8a3
return optional exact words
2022-05-24 09:15:49 +02:00
Matthias Wright
754f48a4fb
Improves ranking rules error message
2022-05-20 21:25:43 +02:00
Kerollmops
cd7c6e19ed
Reintroduce the max values by facet limit
2022-05-18 15:57:57 +02:00
ManyTheFish
137434a1c8
Add some implementation on MatchBounds
2022-05-17 15:57:09 +02:00
bors[bot]
08c6d50cd1
Merge #531
...
531: fix the mixed dataset geosearch indexing bug r=Kerollmops a=irevoire
port #529 to main
Co-authored-by: Tamo <tamo@meilisearch.com>
2022-05-16 16:06:36 +00:00
bors[bot]
cf3e574cb4
Merge #530
...
530: fix the searchable fields bug when a field is nested r=Kerollmops a=irevoire
port #528 to main
Co-authored-by: Tamo <tamo@meilisearch.com>
2022-05-16 15:52:30 +00:00
Tamo
0af399a6d7
fix the mixed dataset geosearch indexing bug
2022-05-16 17:37:45 +02:00
Tamo
f586028f9a
fix the searchable fields bug when a field is nested
...
Update milli/src/index.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-05-16 17:24:36 +02:00
bors[bot]
e1e85267fd
Merge #526
...
526: remove useless comment r=irevoire a=MarinPostma
Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-05-16 10:01:43 +00:00
bors[bot]
51809eb260
Merge #525
...
525: Simplify the error creation with thiserror r=irevoire a=irevoire
I introduced [`thiserror`](https://docs.rs/thiserror/latest/thiserror/) to implement all the `Display` traits and most of the `impl From<xxx> for yyy` in far fewer lines.
And then I introduced a cute macro to implement the `impl<X, Y, Z> From<X> for Z where Y: From<X>, Z: From<X>` more easily.
Co-authored-by: Tamo <tamo@meilisearch.com>
2022-05-04 15:47:32 +00:00
Tamo
484a9ddb27
Simplify the error creation with thiserror and a smol friendly macro
2022-05-04 17:24:00 +02:00
bors[bot]
65e6aa0de2
Merge #523
...
523: Improve geosearch error messages r=irevoire a=irevoire
Improve the geosearch error messages (#488).
And try to parse the string as specified in https://github.com/meilisearch/meilisearch/issues/2354
Co-authored-by: Tamo <tamo@meilisearch.com>
2022-05-04 13:36:11 +00:00
Tamo
c55368ddd4
apply code suggestion
...
Co-authored-by: Kerollmops <kero@meilisearch.com>
2022-05-04 14:11:03 +02:00
ad hoc
5ad5d56f7e
remove useless comment
2022-05-04 10:43:54 +02:00
bors[bot]
0c2c8af44e
Merge #520
...
520: fix mistake in Settings initialization r=irevoire a=MarinPostma
fix settings not being correctly initialized and add a test to make sure that they are in the future.
fix https://github.com/meilisearch/meilisearch/issues/2358
Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-05-03 15:32:18 +00:00
Kerollmops
211c8763b9
Make sure that we do not generate too long keys
2022-05-03 10:03:15 +02:00
Kerollmops
7e47031bdc
Add a test for long keys in LMDB
2022-05-03 10:03:13 +02:00
Tamo
3cb1f6d0a1
improve geosearch error messages
2022-05-02 19:20:47 +02:00
ad hoc
1ee3d6ae33
fix mistake in Settings initialization
2022-04-29 16:24:25 +02:00
bors[bot]
9db86aac51
Merge #518
...
518: Return facets even when there is no value associated to it r=Kerollmops a=Kerollmops
This PR is related to https://github.com/meilisearch/meilisearch/issues/2352 and should fix the issue when Meilisearch is up-to-date with this PR.
Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-04-28 09:04:36 +00:00
Kerollmops
7d1c2d97bf
Return facets even when there is no values associated to it
2022-04-26 17:59:53 +02:00
bors[bot]
d388ea0f9d
Merge #506
...
506: fix cargo warnings r=Kerollmops a=MarinPostma
fix cargo warnings
Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-04-26 15:45:20 +00:00
ad hoc
5c29258e8e
fix cargo warnings
2022-04-26 17:33:11 +02:00
Tamo
f19d2dc548
Only flatten the required fields
...
apply review comments
Co-authored-by: Kerollmops <kero@meilisearch.com>
2022-04-26 12:33:46 +02:00
bors[bot]
8010eca9c7
Merge #505
...
505: normalize exact words r=curquiza a=MarinPostma
Normalize the exact words, as specified in the specification.
Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-04-25 09:35:32 +00:00
ad hoc
2e0089d5ff
normalize exact words
2022-04-21 15:38:40 +02:00
ad hoc
3a2451fcba
add test normalize exact words
2022-04-21 13:52:09 +02:00
Clément Renault
eb5830aa40
Add a test to make sure that long words are handled
2022-04-21 13:45:28 +02:00
ad hoc
8b14090927
fix min-word-len-for-typo not reset properly
2022-04-19 15:20:16 +02:00
bors[bot]
ea4bb9402f
Merge #483
...
483: Enhance matching words r=Kerollmops a=ManyTheFish
# Summary
Enhance milli word-matcher making it handle match computing and cropping.
# Implementation
## Computing best matches for cropping
Before, we considered the first match of the attribute to be the best one. This was accurate when only one word was searched, but missed the target when more than one word was searched.
Now we search for the best interval of matches to crop around. The chosen interval is the one:
1) that has the highest count of unique matches
> for example, if we have a query `split the world`, then the interval `the split the split the` has 5 matches but only 2 unique matches (1 for `split` and 1 for `the`), whereas the interval `split of the world` has 3 matches and 3 unique matches. So the interval `split of the world` is considered better.
2) that has the minimum distance between matches
> for example, if we have a query `split the world`, then the interval `split of the world` has a distance of 3 (2 between `split` and `the`, and 1 between `the` and `world`), whereas the interval `split the world` has a distance of 2. So the interval `split the world` is considered better.
3) that has the highest count of ordered matches
> for example, if we have a query `split the world`, then the interval `the world split` has 2 ordered words, whereas the interval `split the world` has 3. So the interval `split the world` is considered better.
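
The three criteria can be sketched as a lexicographic score. This is an illustration under assumptions, not the matcher's actual code: the `Match` struct, `find_matches`, and `interval_score` are hypothetical names, and words are compared by plain string equality.

```rust
use std::collections::HashSet;

/// Hypothetical match: which query word matched, and where in the text.
#[derive(Debug, Clone, Copy)]
struct Match {
    /// Index of the word in the query (0 for `split`, 1 for `the`, ...).
    query_index: usize,
    /// Position of the word in the text, counted in words.
    word_position: usize,
}

/// Naive matching (assumption): exact string equality, in text order.
fn find_matches(query: &[&str], text: &[&str]) -> Vec<Match> {
    text.iter()
        .enumerate()
        .filter_map(|(word_position, word)| {
            query
                .iter()
                .position(|q| q == word)
                .map(|query_index| Match { query_index, word_position })
        })
        .collect()
}

/// Score of an interval as (unique matches, negated distance, ordered matches).
/// Tuples compare lexicographically, so a greater tuple is a better interval
/// under the three rules applied in priority order.
fn interval_score(matches: &[Match]) -> (usize, i64, usize) {
    let unique: HashSet<usize> = matches.iter().map(|m| m.query_index).collect();
    let mut distance = 0i64;
    let mut ordered = if matches.is_empty() { 0 } else { 1 };
    for pair in matches.windows(2) {
        distance += pair[1].word_position as i64 - pair[0].word_position as i64;
        // A match is "ordered" when it follows its predecessor in query order.
        if pair[1].query_index == pair[0].query_index + 1 {
            ordered += 1;
        }
    }
    (unique.len(), -distance, ordered)
}
```

With the query `split the world`, this scores `split of the world` as `(3, -3, 3)` and `split the world` as `(3, -2, 3)`, so the second interval wins on distance, as in the examples above.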
## Cropping around the best matches interval
Before, we cropped around the interval without checking the context.
Now we crop around words that are in the same context as the matching words.
This means that we prefer to keep words that are farther from the matching words but in the same phrase over words that are nearer but separated by a dot.
> For instance, for the matching word `Split` the text:
`Natalie risk her future. Split The World is a book written by Emily Henry. I never read it.`
will be cropped like:
`…. Split The World is a book written by Emily Henry. …`
and not like:
`Natalie risk her future. Split The World is a book …`
Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-04-19 11:42:32 +00:00
ManyTheFish
f1115e274f
Use Copy impl of FormatOption instead of cloning
2022-04-19 10:35:50 +02:00
Tamo
00f78d6b5a
Apply code suggestions
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-04-14 11:14:08 +02:00
Tamo
399fba16bb
only flatten an object if it's nested
2022-04-14 11:14:08 +02:00
Tamo
ee64f4a936
Use smartstring to store the external id in our hashmap
...
We need to store all the external id (primary key) in a hashmap
associated to their internal id during.
The smartstring removes heap allocation / memory usage and should
improve the cache locality.
2022-04-13 21:22:07 +02:00
ad hoc
dda28d7415
exclude excluded candidates from search result candidates
2022-04-13 12:10:35 +02:00
ad hoc
bbb6728d2f
add distinct attributes to cli
2022-04-13 12:10:35 +02:00
ManyTheFish
5809d3ae0d
Add first benchmarks on formatting
2022-04-12 16:31:58 +02:00
ManyTheFish
827cedcd15
Add format option structure
2022-04-12 13:42:14 +02:00
ManyTheFish
011f8210ed
Make compute_matches more rust idiomatic
2022-04-12 10:19:02 +02:00
ManyTheFish
a16de5de84
Simplify format and remove intermediate function
2022-04-08 11:20:41 +02:00
ManyTheFish
a769e09dfa
Make token_crop_bounds more rust idiomatic
2022-04-07 20:15:14 +02:00
ManyTheFish
c8ed1675a7
Add some documentation
2022-04-07 17:32:13 +02:00
ManyTheFish
b1905dfa24
Make split_best_frequency return references instead of owned data
2022-04-07 17:05:44 +02:00
Irevoire
4f3ce6d9cd
nested fields
2022-04-07 16:58:46 +02:00
ad hoc
b799f3326b
rename merge_nothing to merge_ignore_values
2022-04-05 18:44:35 +02:00
ManyTheFish
fa7d3a37c0
Make some cleaning and add comments
2022-04-05 17:48:56 +02:00
ManyTheFish
3bb1e35ada
Fix match count
2022-04-05 17:48:45 +02:00
ManyTheFish
56e0edd621
Put crop markers directly around words
2022-04-05 17:41:32 +02:00
ManyTheFish
a93cd8c61c
Fix prefix highlight with special chars
2022-04-05 17:41:32 +02:00
ManyTheFish
b3f0f39106
Make some cleaning
2022-04-05 17:41:32 +02:00
ManyTheFish
6dc345bc53
Test and Fix prefix highlight
2022-04-05 17:41:32 +02:00
ManyTheFish
bd30ee97b8
Keep separators at start of the cropped string
2022-04-05 17:41:32 +02:00
ManyTheFish
29c5f76d7f
Use new matcher in http-ui
2022-04-05 17:41:32 +02:00
ManyTheFish
734d0899d3
Publish Matcher
2022-04-05 17:41:32 +02:00
ManyTheFish
4428cb5909
Add some tests and fix some corner cases
2022-04-05 17:41:32 +02:00
ManyTheFish
844f546a8b
Add matches algorithm V1
2022-04-05 17:41:32 +02:00
ManyTheFish
3be1790803
Add crop algorithm with naive match algorithm
2022-04-05 17:41:32 +02:00
ManyTheFish
d96e72e5dc
Create formatter with some tests
2022-04-05 17:41:32 +02:00
ad hoc
201fea0fda
limit extract_word_docids memory usage
2022-04-05 14:14:15 +02:00
ad hoc
5cfd3d8407
add exact attributes documentation
2022-04-05 14:10:22 +02:00
ad hoc
b85cd4983e
remove field_id_from_position
2022-04-05 09:50:34 +02:00
ad hoc
ab185a59b5
fix infos
2022-04-05 09:46:56 +02:00
ad hoc
1810927dbd
rephrase exact_attributes doc
2022-04-04 21:04:49 +02:00
ad hoc
b7694c34f5
remove println
2022-04-04 21:00:07 +02:00
ad hoc
6cabd47c32
fix typo in comment
2022-04-04 20:59:20 +02:00
ad hoc
6b2c2509b2
fix bug in exact search
2022-04-04 20:54:03 +02:00
ad hoc
56b4f5dce2
add exact prefix to query_docids
2022-04-04 20:54:03 +02:00
ad hoc
21ae4143b1
add exact_word_prefix to Context
2022-04-04 20:54:03 +02:00
ad hoc
e8f06f6c06
extract exact_word_prefix_docids
2022-04-04 20:54:03 +02:00
ad hoc
6dd2e4ffbd
introduce exact_word_prefix database in index
2022-04-04 20:54:03 +02:00
ad hoc
ba0bb29cd8
refactor WordPrefixDocids to take dbs instead of indexes
2022-04-04 20:54:02 +02:00
ad hoc
c4c6e35352
query exact_word_docids in resolve_query_tree
2022-04-04 20:54:02 +02:00
ad hoc
8d46a5b0b5
extract exact word docids
2022-04-04 20:54:02 +02:00
ad hoc
0a77be4ec0
introduce exact_word_docids db
2022-04-04 20:54:02 +02:00
ad hoc
5f9f82757d
refactor spawn_extraction_task
2022-04-04 20:54:02 +02:00
ad hoc
f82d4b36eb
introduce exact attribute setting
2022-04-04 20:54:02 +02:00
ad hoc
c882d8daf0
add test for exact words
2022-04-04 20:54:01 +02:00
ad hoc
7e9d56a9e7
disable typos on exact words
2022-04-04 20:54:01 +02:00
ad hoc
30a2711bac
rename serde module to serde_impl module
...
needed because of issues with rustfmt
2022-04-04 20:10:55 +02:00
ad hoc
0fd55db21c
fmt
2022-04-04 20:10:55 +02:00
ad hoc
559e46be5e
fix bad rebase bug
2022-04-04 20:10:55 +02:00
ad hoc
8b1e5d9c6d
add test for exact words
2022-04-04 20:10:55 +02:00
ad hoc
774fa8f065
disable typos on exact words
2022-04-04 20:10:55 +02:00
ad hoc
9bbffb8fee
add exact words setting
2022-04-04 20:10:54 +02:00
ad hoc
853b4a520f
fmt
2022-04-04 10:41:46 +02:00
ad hoc
1941072bb2
implement Copy on Setting
2022-04-04 10:41:46 +02:00
ad hoc
fdaf45aab2
replace hardcoded value with constant in TestContext
2022-04-04 10:41:46 +02:00
ad hoc
950a740bd4
refactor typos for readability
2022-04-04 10:41:46 +02:00
ad hoc
66020cd923
rename min_word_len* to use plain letter numbers
2022-04-04 10:41:46 +02:00
ad hoc
4c4b336ecb
rename min word len for typo error
2022-04-01 11:17:03 +02:00
ad hoc
286dd7b2e4
rename min_word_len_2_typo
2022-04-01 11:17:03 +02:00
ad hoc
55af85db3c
add tests for min_word_len_for_typo
2022-04-01 11:17:02 +02:00
ad hoc
9102de5500
fix error message
2022-04-01 11:17:02 +02:00
ad hoc
a1a3a49bc9
dynamic minimum word len for typos in query tree builder
2022-04-01 11:17:02 +02:00
ad hoc
5a24e60572
introduce word len for typo setting
2022-04-01 11:17:02 +02:00
ad hoc
9fe40df960
add word derivations tests
2022-04-01 11:05:18 +02:00
ad hoc
d5ddc6b080
fix 2 typos word derivation bug
2022-04-01 10:51:22 +02:00
ad hoc
3e34981d9b
add test for authorize_typos in update
2022-03-31 14:12:00 +02:00
ad hoc
6ef3bb9d83
fmt
2022-03-31 14:06:23 +02:00
ad hoc
f782fe2062
add authorize_typo_test
2022-03-31 10:08:39 +02:00
ad hoc
c4653347fd
add authorize typo setting
2022-03-31 10:05:44 +02:00
bors[bot]
90276d9a2d
Merge #472
...
472: Remove useless variables in proximity r=Kerollmops a=ManyTheFish
I was looking at the plane sweep algorithm for inspiration and discovered that we had useless variables that went undetected because the function is recursive.
Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-03-16 15:33:11 +00:00
ManyTheFish
49d59d88c2
Remove useless variables in proximity
2022-03-16 16:12:52 +01:00
Bruno Casali
adc71742c8
Move string concat to the struct instead of in the calling
2022-03-16 10:26:12 -03:00
Bruno Casali
4822fe1beb
Add a better error message when the filterable attrs are empty
...
Fixes https://github.com/meilisearch/meilisearch/issues/2140
2022-03-15 18:13:59 -03:00
bors[bot]
ad4c982c68
Merge #439
...
439: Optimize typo criterion r=Kerollmops a=MarinPostma
This PR implements a couple of optimizations for the typo criterion:
- Clamp the max typos on concatenated query words to 1: by considering that a concatenated query word is itself a typo, we clamp the max number of typos allowed on it to 1. This is useful because we noticed that concatenated query words often introduced words with 2 typos in queries that otherwise didn't allow for 2-typo words.
- Make typos on the first letter count for 2. This change is a big performance gain: by considering the typos on the first letter to count as 2 typos, we drastically restrict the search space for 1 typo, and if we reach 2 typos, the search space is reduced as well, as we only consider: (2 typos ∩ correct first letter) ∪ (wrong first letter ∩ 1 typo) instead of 2 typos anywhere in the word.
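
The two rules can be sketched with a plain Levenshtein distance. This is a simplified illustration under assumptions, not the criterion's actual implementation: `levenshtein`, `typo_cost`, and `max_typos` are hypothetical helpers, and a first-letter mismatch is approximated by comparing the first characters of the two words.

```rust
/// Classic dynamic-programming edit distance over Unicode scalar values.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // `prev[j]` is the distance between the processed prefix of `a` and `b[..j]`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != cb);
            let deletion = prev[j + 1] + 1;
            let insertion = cur[j] + 1;
            cur.push(substitution.min(deletion).min(insertion));
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Typo cost where a wrong first letter weighs one extra typo (assumption:
/// "wrong first letter" means the first characters differ).
fn typo_cost(query: &str, candidate: &str) -> usize {
    let distance = levenshtein(query, candidate);
    let same_first = query.chars().next() == candidate.chars().next();
    if distance > 0 && !same_first {
        distance + 1
    } else {
        distance
    }
}

/// Typo budget for a query word: a concatenation is capped at 1 typo.
fn max_typos(is_concatenation: bool, allowed: u8) -> u8 {
    if is_concatenation {
        allowed.min(1)
    } else {
        allowed
    }
}
```

Under this cost, a candidate with a wrong first letter and one edit already reaches the 2-typo budget, which is the restricted set described in the second bullet.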
## benches
```
group main typo
----- ---- ----
smol-songs.csv: asc + default/Notstandskomitee 2.51 5.8±0.01ms ? ?/sec 1.00 2.3±0.01ms ? ?/sec
smol-songs.csv: asc + default/charles 2.48 3.0±0.01ms ? ?/sec 1.00 1190.9±1.29µs ? ?/sec
smol-songs.csv: asc + default/charles mingus 5.56 10.8±0.01ms ? ?/sec 1.00 1935.3±1.00µs ? ?/sec
smol-songs.csv: asc + default/david 1.65 3.9±0.00ms ? ?/sec 1.00 2.4±0.01ms ? ?/sec
smol-songs.csv: asc + default/david bowie 3.34 12.5±0.02ms ? ?/sec 1.00 3.7±0.00ms ? ?/sec
smol-songs.csv: asc + default/john 1.00 1849.7±3.74µs ? ?/sec 1.01 1875.1±4.65µs ? ?/sec
smol-songs.csv: asc + default/marcus miller 4.32 15.7±0.01ms ? ?/sec 1.00 3.6±0.01ms ? ?/sec
smol-songs.csv: asc + default/michael jackson 3.31 12.5±0.01ms ? ?/sec 1.00 3.8±0.00ms ? ?/sec
smol-songs.csv: asc + default/tamo 1.05 565.4±0.86µs ? ?/sec 1.00 539.3±1.22µs ? ?/sec
smol-songs.csv: asc + default/thelonious monk 3.49 11.5±0.01ms ? ?/sec 1.00 3.3±0.00ms ? ?/sec
smol-songs.csv: asc/Notstandskomitee 2.59 5.6±0.02ms ? ?/sec 1.00 2.2±0.01ms ? ?/sec
smol-songs.csv: asc/charles 6.05 2.1±0.00ms ? ?/sec 1.00 347.8±0.60µs ? ?/sec
smol-songs.csv: asc/charles mingus 14.46 9.4±0.01ms ? ?/sec 1.00 649.2±0.97µs ? ?/sec
smol-songs.csv: asc/david 3.87 2.4±0.00ms ? ?/sec 1.00 618.2±0.69µs ? ?/sec
smol-songs.csv: asc/david bowie 10.14 9.8±0.01ms ? ?/sec 1.00 970.8±1.55µs ? ?/sec
smol-songs.csv: asc/john 1.00 546.5±1.10µs ? ?/sec 1.00 547.1±2.11µs ? ?/sec
smol-songs.csv: asc/marcus miller 11.45 10.4±0.06ms ? ?/sec 1.00 907.9±1.37µs ? ?/sec
smol-songs.csv: asc/michael jackson 10.56 9.7±0.01ms ? ?/sec 1.00 919.6±1.03µs ? ?/sec
smol-songs.csv: asc/tamo 1.03 43.3±0.18µs ? ?/sec 1.00 42.2±0.23µs ? ?/sec
smol-songs.csv: asc/thelonious monk 4.16 10.7±0.02ms ? ?/sec 1.00 2.6±0.00ms ? ?/sec
smol-songs.csv: basic filter: <=/Notstandskomitee 1.00 95.7±0.20µs ? ?/sec 1.15 109.6±10.40µs ? ?/sec
smol-songs.csv: basic filter: <=/charles 1.00 27.8±0.15µs ? ?/sec 1.01 27.9±0.18µs ? ?/sec
smol-songs.csv: basic filter: <=/charles mingus 1.72 119.2±0.67µs ? ?/sec 1.00 69.1±0.13µs ? ?/sec
smol-songs.csv: basic filter: <=/david 1.00 22.3±0.33µs ? ?/sec 1.05 23.4±0.19µs ? ?/sec
smol-songs.csv: basic filter: <=/david bowie 1.59 86.9±0.79µs ? ?/sec 1.00 54.5±0.31µs ? ?/sec
smol-songs.csv: basic filter: <=/john 1.00 17.9±0.06µs ? ?/sec 1.06 18.9±0.15µs ? ?/sec
smol-songs.csv: basic filter: <=/marcus miller 1.65 102.7±1.63µs ? ?/sec 1.00 62.3±0.18µs ? ?/sec
smol-songs.csv: basic filter: <=/michael jackson 1.76 128.2±1.85µs ? ?/sec 1.00 72.9±0.19µs ? ?/sec
smol-songs.csv: basic filter: <=/tamo 1.00 17.9±0.13µs ? ?/sec 1.05 18.7±0.20µs ? ?/sec
smol-songs.csv: basic filter: <=/thelonious monk 1.53 157.5±2.38µs ? ?/sec 1.00 102.8±0.88µs ? ?/sec
smol-songs.csv: basic filter: TO/Notstandskomitee 1.00 100.9±4.36µs ? ?/sec 1.04 105.0±8.25µs ? ?/sec
smol-songs.csv: basic filter: TO/charles 1.00 28.4±0.36µs ? ?/sec 1.03 29.4±0.33µs ? ?/sec
smol-songs.csv: basic filter: TO/charles mingus 1.71 118.1±1.08µs ? ?/sec 1.00 68.9±0.26µs ? ?/sec
smol-songs.csv: basic filter: TO/david 1.00 24.0±0.26µs ? ?/sec 1.03 24.6±0.43µs ? ?/sec
smol-songs.csv: basic filter: TO/david bowie 1.72 95.2±0.30µs ? ?/sec 1.00 55.2±0.14µs ? ?/sec
smol-songs.csv: basic filter: TO/john 1.00 18.8±0.09µs ? ?/sec 1.06 19.8±0.17µs ? ?/sec
smol-songs.csv: basic filter: TO/marcus miller 1.61 102.4±1.65µs ? ?/sec 1.00 63.4±0.24µs ? ?/sec
smol-songs.csv: basic filter: TO/michael jackson 1.77 132.1±1.41µs ? ?/sec 1.00 74.5±0.59µs ? ?/sec
smol-songs.csv: basic filter: TO/tamo 1.00 18.2±0.14µs ? ?/sec 1.05 19.2±0.46µs ? ?/sec
smol-songs.csv: basic filter: TO/thelonious monk 1.49 150.8±1.92µs ? ?/sec 1.00 101.3±0.44µs ? ?/sec
smol-songs.csv: basic placeholder/ 1.00 27.3±0.07µs ? ?/sec 1.03 28.0±0.05µs ? ?/sec
smol-songs.csv: basic with quote/"Notstandskomitee" 1.00 122.4±0.17µs ? ?/sec 1.03 125.6±0.16µs ? ?/sec
smol-songs.csv: basic with quote/"charles" 1.00 88.8±0.30µs ? ?/sec 1.00 88.4±0.15µs ? ?/sec
smol-songs.csv: basic with quote/"charles" "mingus" 1.00 685.2±0.74µs ? ?/sec 1.01 689.4±6.07µs ? ?/sec
smol-songs.csv: basic with quote/"david" 1.00 161.6±0.42µs ? ?/sec 1.01 162.6±0.17µs ? ?/sec
smol-songs.csv: basic with quote/"david" "bowie" 1.00 731.7±0.73µs ? ?/sec 1.02 743.1±0.77µs ? ?/sec
smol-songs.csv: basic with quote/"john" 1.00 267.1±0.33µs ? ?/sec 1.01 270.9±0.33µs ? ?/sec
smol-songs.csv: basic with quote/"marcus" "miller" 1.00 138.7±0.31µs ? ?/sec 1.02 140.9±0.13µs ? ?/sec
smol-songs.csv: basic with quote/"michael" "jackson" 1.01 841.4±0.72µs ? ?/sec 1.00 833.8±0.92µs ? ?/sec
smol-songs.csv: basic with quote/"tamo" 1.01 189.2±0.26µs ? ?/sec 1.00 188.2±0.71µs ? ?/sec
smol-songs.csv: basic with quote/"thelonious" "monk" 1.00 1100.5±1.36µs ? ?/sec 1.01 1111.7±2.17µs ? ?/sec
smol-songs.csv: basic without quote/Notstandskomitee 3.40 7.9±0.02ms ? ?/sec 1.00 2.3±0.02ms ? ?/sec
smol-songs.csv: basic without quote/charles 2.57 494.4±0.89µs ? ?/sec 1.00 192.5±0.18µs ? ?/sec
smol-songs.csv: basic without quote/charles mingus 1.29 2.8±0.02ms ? ?/sec 1.00 2.1±0.01ms ? ?/sec
smol-songs.csv: basic without quote/david 1.95 623.8±0.90µs ? ?/sec 1.00 319.2±1.22µs ? ?/sec
smol-songs.csv: basic without quote/david bowie 1.12 5.9±0.00ms ? ?/sec 1.00 5.2±0.00ms ? ?/sec
smol-songs.csv: basic without quote/john 1.24 1340.9±2.25µs ? ?/sec 1.00 1084.7±7.76µs ? ?/sec
smol-songs.csv: basic without quote/marcus miller 7.97 14.6±0.01ms ? ?/sec 1.00 1826.0±6.84µs ? ?/sec
smol-songs.csv: basic without quote/michael jackson 1.19 3.9±0.00ms ? ?/sec 1.00 3.3±0.00ms ? ?/sec
smol-songs.csv: basic without quote/tamo 1.65 737.7±3.58µs ? ?/sec 1.00 446.7±0.51µs ? ?/sec
smol-songs.csv: basic without quote/thelonious monk 1.16 4.5±0.02ms ? ?/sec 1.00 3.9±0.04ms ? ?/sec
smol-songs.csv: big filter/Notstandskomitee 3.27 7.6±0.02ms ? ?/sec 1.00 2.3±0.01ms ? ?/sec
smol-songs.csv: big filter/charles 8.26 1957.5±1.37µs ? ?/sec 1.00 236.8±0.34µs ? ?/sec
smol-songs.csv: big filter/charles mingus 18.49 11.2±0.06ms ? ?/sec 1.00 607.7±3.03µs ? ?/sec
smol-songs.csv: big filter/david 3.78 2.4±0.00ms ? ?/sec 1.00 622.8±0.80µs ? ?/sec
smol-songs.csv: big filter/david bowie 9.00 12.0±0.01ms ? ?/sec 1.00 1336.0±3.17µs ? ?/sec
smol-songs.csv: big filter/john 1.00 554.2±0.95µs ? ?/sec 1.01 560.4±0.79µs ? ?/sec
smol-songs.csv: big filter/marcus miller 18.09 12.0±0.01ms ? ?/sec 1.00 664.7±0.60µs ? ?/sec
smol-songs.csv: big filter/michael jackson 8.43 12.0±0.01ms ? ?/sec 1.00 1421.6±1.37µs ? ?/sec
smol-songs.csv: big filter/tamo 1.00 86.3±0.14µs ? ?/sec 1.01 87.3±0.21µs ? ?/sec
smol-songs.csv: big filter/thelonious monk 5.55 14.3±0.02ms ? ?/sec 1.00 2.6±0.01ms ? ?/sec
smol-songs.csv: desc + default/Notstandskomitee 2.52 5.8±0.01ms ? ?/sec 1.00 2.3±0.01ms ? ?/sec
smol-songs.csv: desc + default/charles 3.04 2.7±0.01ms ? ?/sec 1.00 893.4±1.08µs ? ?/sec
smol-songs.csv: desc + default/charles mingus 6.77 10.3±0.01ms ? ?/sec 1.00 1520.8±1.90µs ? ?/sec
smol-songs.csv: desc + default/david 1.39 5.7±0.00ms ? ?/sec 1.00 4.1±0.00ms ? ?/sec
smol-songs.csv: desc + default/david bowie 2.34 15.8±0.02ms ? ?/sec 1.00 6.7±0.01ms ? ?/sec
smol-songs.csv: desc + default/john 1.00 2.5±0.00ms ? ?/sec 1.02 2.6±0.01ms ? ?/sec
smol-songs.csv: desc + default/marcus miller 5.06 14.5±0.02ms ? ?/sec 1.00 2.9±0.01ms ? ?/sec
smol-songs.csv: desc + default/michael jackson 2.64 14.1±0.05ms ? ?/sec 1.00 5.4±0.00ms ? ?/sec
smol-songs.csv: desc + default/tamo 1.00 567.0±0.65µs ? ?/sec 1.00 565.7±0.97µs ? ?/sec
smol-songs.csv: desc + default/thelonious monk 3.55 11.6±0.02ms ? ?/sec 1.00 3.3±0.00ms ? ?/sec
smol-songs.csv: desc/Notstandskomitee 2.58 5.6±0.02ms ? ?/sec 1.00 2.2±0.02ms ? ?/sec
smol-songs.csv: desc/charles 6.04 2.1±0.00ms ? ?/sec 1.00 348.1±0.57µs ? ?/sec
smol-songs.csv: desc/charles mingus 14.51 9.4±0.01ms ? ?/sec 1.00 646.7±0.99µs ? ?/sec
smol-songs.csv: desc/david 3.86 2.4±0.00ms ? ?/sec 1.00 620.7±2.46µs ? ?/sec
smol-songs.csv: desc/david bowie 10.10 9.8±0.01ms ? ?/sec 1.00 973.9±3.31µs ? ?/sec
smol-songs.csv: desc/john 1.00 545.5±0.78µs ? ?/sec 1.00 547.2±0.48µs ? ?/sec
smol-songs.csv: desc/marcus miller 11.39 10.3±0.01ms ? ?/sec 1.00 903.7±0.95µs ? ?/sec
smol-songs.csv: desc/michael jackson 10.51 9.7±0.01ms ? ?/sec 1.00 924.7±2.02µs ? ?/sec
smol-songs.csv: desc/tamo 1.01 43.2±0.33µs ? ?/sec 1.00 42.6±0.35µs ? ?/sec
smol-songs.csv: desc/thelonious monk 4.19 10.8±0.03ms ? ?/sec 1.00 2.6±0.00ms ? ?/sec
smol-songs.csv: prefix search/a 1.00 1008.7±1.00µs ? ?/sec 1.00 1005.5±0.91µs ? ?/sec
smol-songs.csv: prefix search/b 1.00 885.0±0.70µs ? ?/sec 1.01 890.6±1.11µs ? ?/sec
smol-songs.csv: prefix search/i 1.00 1051.8±1.25µs ? ?/sec 1.00 1056.6±4.12µs ? ?/sec
smol-songs.csv: prefix search/s 1.00 724.7±1.77µs ? ?/sec 1.00 721.6±0.59µs ? ?/sec
smol-songs.csv: prefix search/x 1.01 212.4±0.21µs ? ?/sec 1.00 210.9±0.38µs ? ?/sec
smol-songs.csv: proximity/7000 Danses Un Jour Dans Notre Vie 18.55 48.5±0.09ms ? ?/sec 1.00 2.6±0.03ms ? ?/sec
smol-songs.csv: proximity/The Disneyland Sing-Along Chorus 8.41 56.7±0.45ms ? ?/sec 1.00 6.7±0.05ms ? ?/sec
smol-songs.csv: proximity/Under Great Northern Lights 15.74 38.9±0.14ms ? ?/sec 1.00 2.5±0.00ms ? ?/sec
smol-songs.csv: proximity/black saint sinner lady 11.82 40.1±0.13ms ? ?/sec 1.00 3.4±0.02ms ? ?/sec
smol-songs.csv: proximity/les dangeureuses 1960 6.90 26.1±0.13ms ? ?/sec 1.00 3.8±0.04ms ? ?/sec
smol-songs.csv: typo/Arethla Franklin 14.93 5.8±0.01ms ? ?/sec 1.00 390.1±1.89µs ? ?/sec
smol-songs.csv: typo/Disnaylande 3.18 7.3±0.01ms ? ?/sec 1.00 2.3±0.00ms ? ?/sec
smol-songs.csv: typo/dire straights 5.55 15.2±0.02ms ? ?/sec 1.00 2.7±0.00ms ? ?/sec
smol-songs.csv: typo/fear of the duck 28.03 20.0±0.03ms ? ?/sec 1.00 713.3±1.54µs ? ?/sec
smol-songs.csv: typo/indochie 19.25 1851.4±2.38µs ? ?/sec 1.00 96.2±0.13µs ? ?/sec
smol-songs.csv: typo/indochien 14.66 1887.7±3.18µs ? ?/sec 1.00 128.8±0.18µs ? ?/sec
smol-songs.csv: typo/klub des loopers 37.73 18.0±0.02ms ? ?/sec 1.00 476.7±0.73µs ? ?/sec
smol-songs.csv: typo/michel depech 10.17 5.8±0.01ms ? ?/sec 1.00 565.8±1.16µs ? ?/sec
smol-songs.csv: typo/mongus 15.33 1897.4±3.44µs ? ?/sec 1.00 123.8±0.13µs ? ?/sec
smol-songs.csv: typo/stromal 14.63 1859.3±2.40µs ? ?/sec 1.00 127.1±0.29µs ? ?/sec
smol-songs.csv: typo/the white striper 10.83 9.4±0.01ms ? ?/sec 1.00 866.0±0.98µs ? ?/sec
smol-songs.csv: typo/thelonius monk 14.40 3.8±0.00ms ? ?/sec 1.00 261.5±1.30µs ? ?/sec
smol-songs.csv: words/7000 Danses / Le Baiser / je me trompe de mots 5.54 70.8±0.09ms ? ?/sec 1.00 12.8±0.03ms ? ?/sec
smol-songs.csv: words/Bring Your Daughter To The Slaughter but now this is not part of the title 3.48 119.8±0.14ms ? ?/sec 1.00 34.4±0.04ms ? ?/sec
smol-songs.csv: words/The Disneyland Children's Sing-Alone song 8.98 71.9±0.12ms ? ?/sec 1.00 8.0±0.01ms ? ?/sec
smol-songs.csv: words/les liaisons dangeureuses 1793 11.88 37.4±0.07ms ? ?/sec 1.00 3.1±0.01ms ? ?/sec
smol-songs.csv: words/seven nation mummy 22.86 23.4±0.04ms ? ?/sec 1.00 1024.8±1.57µs ? ?/sec
smol-songs.csv: words/the black saint and the sinner lady and the good doggo 2.76 124.4±0.15ms ? ?/sec 1.00 45.1±0.09ms ? ?/sec
smol-songs.csv: words/whathavenotnsuchforth and a good amount of words to pop to match the first one 2.52 107.0±0.23ms ? ?/sec 1.00 42.4±0.66ms ? ?/sec
group main-wiki typo-wiki
----- --------- ---------
smol-wiki-articles.csv: basic placeholder/ 1.02 13.7±0.02µs ? ?/sec 1.00 13.4±0.03µs ? ?/sec
smol-wiki-articles.csv: basic with quote/"film" 1.02 409.8±0.67µs ? ?/sec 1.00 402.6±0.48µs ? ?/sec
smol-wiki-articles.csv: basic with quote/"france" 1.00 325.9±0.91µs ? ?/sec 1.00 326.4±0.49µs ? ?/sec
smol-wiki-articles.csv: basic with quote/"japan" 1.00 218.4±0.26µs ? ?/sec 1.01 220.5±0.20µs ? ?/sec
smol-wiki-articles.csv: basic with quote/"machine" 1.00 143.0±0.12µs ? ?/sec 1.04 148.8±0.21µs ? ?/sec
smol-wiki-articles.csv: basic with quote/"miles" "davis" 1.00 11.7±0.06ms ? ?/sec 1.00 11.8±0.01ms ? ?/sec
smol-wiki-articles.csv: basic with quote/"mingus" 1.00 4.4±0.03ms ? ?/sec 1.00 4.4±0.00ms ? ?/sec
smol-wiki-articles.csv: basic with quote/"rock" "and" "roll" 1.00 43.5±0.08ms ? ?/sec 1.01 43.8±0.06ms ? ?/sec
smol-wiki-articles.csv: basic with quote/"spain" 1.00 137.3±0.35µs ? ?/sec 1.05 144.4±0.23µs ? ?/sec
smol-wiki-articles.csv: basic without quote/film 1.00 125.3±0.30µs ? ?/sec 1.06 133.1±0.37µs ? ?/sec
smol-wiki-articles.csv: basic without quote/france 1.21 1782.6±1.65µs ? ?/sec 1.00 1477.0±1.39µs ? ?/sec
smol-wiki-articles.csv: basic without quote/japan 1.28 1363.9±0.80µs ? ?/sec 1.00 1064.3±1.79µs ? ?/sec
smol-wiki-articles.csv: basic without quote/machine 1.73 760.3±0.81µs ? ?/sec 1.00 439.6±0.75µs ? ?/sec
smol-wiki-articles.csv: basic without quote/miles davis 1.03 17.0±0.03ms ? ?/sec 1.00 16.5±0.02ms ? ?/sec
smol-wiki-articles.csv: basic without quote/mingus 1.07 5.3±0.01ms ? ?/sec 1.00 5.0±0.00ms ? ?/sec
smol-wiki-articles.csv: basic without quote/rock and roll 1.01 63.9±0.18ms ? ?/sec 1.00 63.0±0.07ms ? ?/sec
smol-wiki-articles.csv: basic without quote/spain 2.07 667.4±0.93µs ? ?/sec 1.00 322.8±0.29µs ? ?/sec
smol-wiki-articles.csv: prefix search/c 1.00 343.1±0.47µs ? ?/sec 1.00 344.0±0.34µs ? ?/sec
smol-wiki-articles.csv: prefix search/g 1.00 374.4±3.42µs ? ?/sec 1.00 374.1±0.44µs ? ?/sec
smol-wiki-articles.csv: prefix search/j 1.00 359.9±0.31µs ? ?/sec 1.00 361.2±0.79µs ? ?/sec
smol-wiki-articles.csv: prefix search/q 1.01 102.0±0.12µs ? ?/sec 1.00 101.4±0.32µs ? ?/sec
smol-wiki-articles.csv: prefix search/t 1.00 536.7±1.39µs ? ?/sec 1.00 534.3±0.84µs ? ?/sec
smol-wiki-articles.csv: prefix search/x 1.00 400.9±1.00µs ? ?/sec 1.00 399.5±0.45µs ? ?/sec
smol-wiki-articles.csv: proximity/april paris 3.86 14.4±0.01ms ? ?/sec 1.00 3.7±0.01ms ? ?/sec
smol-wiki-articles.csv: proximity/diesel engine 12.98 10.4±0.01ms ? ?/sec 1.00 803.5±1.13µs ? ?/sec
smol-wiki-articles.csv: proximity/herald sings 1.00 12.7±0.06ms ? ?/sec 5.29 67.1±0.09ms ? ?/sec
smol-wiki-articles.csv: proximity/tea two 6.48 1452.1±2.78µs ? ?/sec 1.00 224.1±0.38µs ? ?/sec
smol-wiki-articles.csv: typo/Disnaylande 3.89 8.5±0.01ms ? ?/sec 1.00 2.2±0.01ms ? ?/sec
smol-wiki-articles.csv: typo/aritmetric 3.78 10.3±0.01ms ? ?/sec 1.00 2.7±0.00ms ? ?/sec
smol-wiki-articles.csv: typo/linax 8.91 1426.7±0.97µs ? ?/sec 1.00 160.1±0.18µs ? ?/sec
smol-wiki-articles.csv: typo/migrosoft 7.48 1417.3±5.84µs ? ?/sec 1.00 189.5±0.88µs ? ?/sec
smol-wiki-articles.csv: typo/nympalidea 3.96 7.2±0.01ms ? ?/sec 1.00 1810.1±2.03µs ? ?/sec
smol-wiki-articles.csv: typo/phytogropher 3.71 7.2±0.01ms ? ?/sec 1.00 1934.3±6.51µs ? ?/sec
smol-wiki-articles.csv: typo/sisan 6.44 1497.2±1.38µs ? ?/sec 1.00 232.7±0.94µs ? ?/sec
smol-wiki-articles.csv: typo/the fronce 6.92 2.9±0.00ms ? ?/sec 1.00 418.0±1.76µs ? ?/sec
smol-wiki-articles.csv: words/Abraham machin 16.63 10.8±0.01ms ? ?/sec 1.00 649.7±1.08µs ? ?/sec
smol-wiki-articles.csv: words/Idaho Bellevue pizza 27.15 25.6±0.03ms ? ?/sec 1.00 944.2±5.07µs ? ?/sec
smol-wiki-articles.csv: words/Kameya Tokujirō mingus monk 26.87 40.7±0.05ms ? ?/sec 1.00 1515.3±2.73µs ? ?/sec
smol-wiki-articles.csv: words/Ulrich Hensel meilisearch milli 11.99 48.8±0.10ms ? ?/sec 1.00 4.1±0.02ms ? ?/sec
smol-wiki-articles.csv: words/the black saint and the sinner lady and the good doggo 4.90 110.0±0.15ms ? ?/sec 1.00 22.4±0.03ms ? ?/sec
```
Co-authored-by: mpostma <postma.marin@protonmail.com>
Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-03-15 16:43:36 +00:00
ad hoc
3f24555c3d
custom fst automatons
2022-03-15 17:38:35 +01:00
ad hoc
628c835a22
fix tests
2022-03-15 17:38:34 +01:00
bors[bot]
8efac33b53
Merge #467
...
467: optimize prefix database r=Kerollmops a=MarinPostma
This PR introduces two optimizations that greatly improve the speed of computing prefix databases.
- The time that it takes to create the prefix FST has been divided by 5 by inverting the way we iterated over the words FST.
- We unconditionally and needlessly checked for documents to remove in `word_prefix_pair`, which caused an iteration over the whole database.
Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-03-15 16:14:35 +00:00
ad hoc
d127c57f2d
review edits
2022-03-15 17:12:48 +01:00
ad hoc
d633ac5b9d
optimize word prefix pair
2022-03-15 16:37:22 +01:00
ad hoc
d68fe2b3c7
optimize word prefix fst
2022-03-15 16:36:48 +01:00
Clément Renault
0c5f4ed7de
Apply suggestions
...
Co-authored-by: Many <many@meilisearch.com>
2022-03-15 14:18:29 +01:00
Kerollmops
21ec334dcc
Fix the compilation error of the dependency versions
2022-03-15 11:17:45 +01:00
psvnl sai kumar
5e08fac729
fixes for rustfmt pass
2022-03-14 19:22:41 +05:30
psvnl sai kumar
92e2e09434
exporting heed to avoid having different versions of Heed in Meilisearch
2022-03-14 01:01:58 +05:30
Kerollmops
1ae13c1374
Avoid iterating on big databases when useless
2022-03-09 15:43:54 +01:00
Bruno Casali
66c6d5e1ef
Add a new error message when the valid_fields
is empty
...
> "Attribute `{}` is not sortable. This index doesn't have configured sortable attributes."
> "Attribute `{}` is not sortable. Available sortable attributes are: `{}`."
coexist in the error handling
2022-03-05 10:38:18 -03:00
Kerollmops
d5b8b5a2f8
Replace the ugly unwraps by clean if let Somes
2022-02-28 16:31:33 +01:00
Kerollmops
8d26f3040c
Remove a useless grenad file merging
2022-02-28 16:31:33 +01:00
Clément Renault
04b1bbf932
Reintroduce appending sorted entries when possible
2022-02-24 14:50:45 +01:00
bors[bot]
25123af3b8
Merge #436
...
436: Speed up the word prefix databases computation time r=Kerollmops a=Kerollmops
This PR depends on the fixes done in #431 and must be merged after it.
In this PR we will bring the `WordPrefixPairProximityDocids`, `WordPrefixDocids`, and `WordPrefixPositionDocids` update structures to a new era, a better era, where computing the word prefix pair proximities costs far fewer CPU cycles, an era where this update structure can use the previously computed set of new word docids from the newly indexed batch of documents.
---
The `WordPrefixPairProximityDocids` is an update structure, which means that it is an object that we feed with some parameters and which modifies the LMDB database of an index when asked to. This structure specifically computes the list of word prefix pair proximities, which correspond to a list of pairs of words associated with a proximity (the distance between both words) where the second word is not a word but a prefix, e.g. `s`, `se`, `a`. Each word prefix pair proximity is associated with the list of document ids that contain the word and the prefix at the given proximity.
The origin of the performance issue that this struct brings is that it starts its job from the beginning: it clears the LMDB database before rewriting everything from scratch, using the other LMDB databases to achieve that. I hope you understand that this is absolutely not an optimized way of doing things.
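To make the data shape described above concrete, here is a minimal sketch of how word prefix pair proximities can be derived from plain word pair proximities. It is not the actual milli implementation: the function name is hypothetical, a `BTreeSet<u32>` stands in for the bitmap of document ids, and in-memory maps stand in for the LMDB databases.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Document ids; a stand-in (assumption) for the bitmap type used by the index.
type DocIds = BTreeSet<u32>;

/// Hypothetical helper: builds `(word, prefix, proximity) -> docids`
/// from `(word1, word2, proximity) -> docids` and a list of prefixes.
fn word_prefix_pair_proximity(
    word_pairs: &BTreeMap<(String, String, u8), DocIds>,
    prefixes: &[&str],
) -> BTreeMap<(String, String, u8), DocIds> {
    let mut out: BTreeMap<(String, String, u8), DocIds> = BTreeMap::new();
    for ((w1, w2, proximity), docids) in word_pairs {
        for prefix in prefixes {
            // The second member of the pair is matched as a prefix, not as a word.
            if w2.starts_with(*prefix) {
                out.entry((w1.clone(), (*prefix).to_string(), *proximity))
                    .or_default()
                    .extend(docids.iter().copied());
            }
        }
    }
    out
}
```

Restricting `word_pairs` to the pairs produced by the newly indexed batch, rather than the whole database, is the kind of saving this PR is after.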
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-02-16 15:41:14 +00:00
Clément Renault
ff8d7a810d
Change the behavior of the as_cloneable_grenad by taking a ref
2022-02-16 15:40:08 +01:00
Clément Renault
f367cc2e75
Finally bump grenad to v0.4.1
2022-02-16 15:28:48 +01:00
Irevoire
48542ac8fd
get rid of chrono in favor of time
2022-02-15 11:41:55 +01:00
bors[bot]
5d58cb7449
Merge #442
...
442: fix phrase search r=curquiza a=MarinPostma
Run the exact match search on 7-word windows instead of only two words. This makes false positives very unlikely, and impossible on phrase queries that are shorter than seven words.
Co-authored-by: ad hoc <postma.marin@protonmail.com>
2022-02-07 16:18:20 +00:00
ad hoc
bd2262ceea
allow null values in csv
2022-02-03 16:03:01 +01:00
ad hoc
13de251047
rewrite word pair distance gathering
2022-02-03 15:57:20 +01:00
Many
d59bcea749
Revert "Revert "Change chunk size to 4MiB to fit more the end user usage""
2022-02-02 17:01:13 +01:00
mpostma
7541ab99cd
review changes
2022-02-02 12:59:01 +01:00
mpostma
d0aabde502
optimize 2 typos case
2022-02-02 12:56:09 +01:00
mpostma
55e6cb9c7b
typos on first letter counts as 2
2022-02-02 12:56:09 +01:00
mpostma
642c01d0dc
set max typos on ngram to 1
2022-02-02 12:56:08 +01:00
ad hoc
d852dc0d2b
fix phrase search
2022-02-01 20:21:33 +01:00
Kerollmops
fb79c32430
Compute the new, common and, deleted prefix words fst once
2022-01-27 11:00:18 +01:00
Clément Renault
51d1e64b23
Remove, now useless, the WriteMethod enum
2022-01-27 10:08:35 +01:00
Clément Renault
e9c02173cf
Rework the WordsPrefixPositionDocids update to compute a subset of the database
2022-01-27 10:08:35 +01:00
Clément Renault
dbba5fd461
Create a function to simplify the word prefix pair proximity docids compute
2022-01-27 10:08:35 +01:00
Clément Renault
e760e02737
Fix the computation of the newly added and common prefix pair proximity words
2022-01-27 10:08:35 +01:00
Clément Renault
d59e559317
Fix the computation of the newly added and common prefix words
2022-01-27 10:08:34 +01:00
Clément Renault
2ec8542105
Rework the WordPrefixDocids update to compute a subset of the database
2022-01-27 10:08:34 +01:00
Clément Renault
28692f65be
Rework the WordPrefixDocids update to compute a subset of the database
2022-01-27 10:08:34 +01:00
Clément Renault
5404bc02dd
Move the fst_stream_into_hashset method in the helper methods
2022-01-27 10:06:00 +01:00
Clément Renault
c90fa95f93
Only compute the word prefix pairs on the created word pair proximities
2022-01-27 10:06:00 +01:00
Clément Renault
822f67e9ad
Bring the newly created word pair proximity docids
2022-01-27 10:06:00 +01:00
Clément Renault
d28f18658e
Retrieve the previous version of the words prefixes FST
2022-01-27 10:05:59 +01:00
Clément Renault
f9b214f34e
Apply suggestions from code review
...
Co-authored-by: Many <legendre.maxime.isn@gmail.com>
2022-01-26 11:28:11 +01:00
Clément Renault
f04cd19886
Introduce a max prefix length parameter to the word prefix pair proximity update
2022-01-25 17:04:23 +01:00
Clément Renault
1514dfa1b7
Introduce a max proximity parameter to the word prefix pair proximity update
2022-01-25 17:04:23 +01:00
Clément Renault
23ea3ad738
Remove the useless threshold when computing the word prefix pair proximity
2022-01-25 17:04:23 +01:00
Clément Renault
e3c34684c6
Fix a bug where we were skipping most of the prefix pairs
2022-01-25 17:04:23 +01:00
bors[bot]
fd177b63f8
Merge #423
...
423: Remove an unused file r=irevoire a=irevoire
This empty file is not included anywhere
Co-authored-by: Tamo <tamo@meilisearch.com>
2022-01-19 14:18:05 +00:00
Marin Postma
0c84a40298
document batch support
...
reusable transform
rework update api
add indexer config
fix tests
review changes
Co-authored-by: Clément Renault <clement@meilisearch.com>
fmt
2022-01-19 12:40:20 +01:00
Tamo
01968d7ca7
ensure we get no documents and no error when filtering on an empty db
2022-01-18 11:40:30 +01:00
bors[bot]
8f4499090b
Merge #433
...
433: fix(filter): Fix two bugs. r=Kerollmops a=irevoire
- Stop lowercasing the field when looking in the field id map
- When a field id does not exist it means there are currently zero
documents containing this field, thus we return an empty RoaringBitmap
instead of throwing an internal error
Will fix https://github.com/meilisearch/MeiliSearch/issues/2082 once meilisearch is released
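A minimal sketch of the two behaviours listed above, under stated assumptions: `ToyIndex` and its fields are hypothetical stand-ins for the real index, and a `BTreeSet<u32>` replaces the RoaringBitmap so the example needs only the standard library.

```rust
use std::collections::{BTreeSet, HashMap};

/// Stand-in (assumption) for the RoaringBitmap of document ids.
type DocIds = BTreeSet<u32>;

/// Hypothetical, simplified index: field name -> field id -> docids.
struct ToyIndex {
    fields_ids_map: HashMap<String, u16>,
    facet_docids: HashMap<u16, DocIds>,
}

impl ToyIndex {
    fn docids_with_field(&self, field: &str) -> DocIds {
        // The field name is looked up as-is, without lowercasing it.
        match self.fields_ids_map.get(field) {
            Some(fid) => self.facet_docids.get(fid).cloned().unwrap_or_default(),
            // Unknown field id: no document contains the field yet,
            // so the result is empty rather than an internal error.
            None => DocIds::new(),
        }
    }
}
```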
Co-authored-by: Tamo <tamo@meilisearch.com>
2022-01-17 14:06:53 +00:00
Tamo
d1ac40ea14
fix(filter): Fix two bugs.
...
- Stop lowercasing the field when looking in the field id map
- When a field id does not exist it means there are currently zero
documents containing this field, thus we return an empty RoaringBitmap
instead of throwing an internal error
2022-01-17 13:51:46 +01:00
Samyak S Sarnayak
2d7607734e
Run cargo fmt on matching_words.rs
2022-01-17 13:04:33 +05:30
Samyak S Sarnayak
5ab505be33
Fix highlight by replacing num_graphemes_from_bytes
...
num_graphemes_from_bytes has been renamed in the tokenizer to
num_chars_from_bytes.
Highlight now works correctly!
2022-01-17 13:02:55 +05:30
Samyak S Sarnayak
e752bd06f7
Fix matching_words tests to compile successfully
...
The tests still fail due to a bug in https://github.com/meilisearch/tokenizer/pull/59
2022-01-17 11:37:45 +05:30
Samyak S Sarnayak
30247d70cd
Fix search highlight for non-unicode chars
...
The `matching_bytes` function takes a `&Token` now and:
- gets the number of bytes to highlight (unchanged).
- uses `Token.num_graphemes_from_bytes` to get the number of grapheme
clusters to highlight.
In essence, the `matching_bytes` function returns the number of matching
grapheme clusters instead of bytes. Should this function be renamed
then?
Added proper highlighting in the HTTP UI:
- requires dependency on `unicode-segmentation` to extract grapheme
clusters from tokens
- `<mark>` tag is put around only the matched part
- before this change, the entire word was highlighted even if only a
part of it matched
2022-01-17 11:37:44 +05:30
Tamo
98a365aaae
store the geopoint in three dimensions
2021-12-14 12:21:24 +01:00
Tamo
d671d6f0f1
remove an unused file
2021-12-13 19:27:34 +01:00
Clément Renault
25faef67d0
Remove the database setup in the filter_depth test
2021-12-09 11:57:53 +01:00
Clément Renault
65519bc04b
Test that empty filters return a None
2021-12-09 11:57:53 +01:00
Clément Renault
ef59762d8e
Prefer returning None instead of the Empty Filter state
2021-12-09 11:57:52 +01:00
Clément Renault
ee856a7a46
Limit the max filter depth to 2000
2021-12-07 17:36:45 +01:00
Clément Renault
32bd9f091f
Detect the filters that are too deep and return an error
2021-12-07 17:20:11 +01:00
Clément Renault
90f49eab6d
Check the filter max depth limit and reject the invalid ones
2021-12-07 16:32:48 +01:00
many
8970246bc4
Sort positions before iterating over them during word pair proximity extraction
2021-11-22 18:16:54 +01:00
Marin Postma
6e977dd8e8
change visibility of DocumentDeletionResult
2021-11-22 15:44:44 +01:00
many
35f9499638
Export tokenizer from milli
2021-11-18 16:57:12 +01:00
Marin Postma
6eb47ab792
remove update_id in UpdateBuilder
2021-11-16 13:07:04 +01:00
Marin Postma
09b4281cff
improve document addition returned meta
2021-11-10 14:08:36 +01:00
Marin Postma
721fc294be
improve document deletion returned meta
...
returns both the remaining number of documents and the number of deleted
documents.
2021-11-10 14:08:18 +01:00
Irevoire
0ea0146e04
implement deref &str on the tokens
2021-11-09 11:34:10 +01:00
Tamo
7483c7513a
fix the filterable fields
2021-11-07 01:52:19 +01:00
Tamo
e5af3ac65c
rename the filter_condition.rs to filter.rs
2021-11-06 16:37:55 +01:00
Tamo
6831c23449
merge with main
2021-11-06 16:34:30 +01:00
Tamo
b249989bef
fix most of the tests
2021-11-06 01:32:12 +01:00
Tamo
27a6a26b4b
makes the parse function part of the filter_parser
2021-11-05 10:46:54 +01:00
Tamo
76d961cc77
implements the last errors
2021-11-04 17:42:06 +01:00
Tamo
8234f9fdf3
recreate most filter error except for the geosearch
2021-11-04 17:24:55 +01:00
Tamo
07a5ffb04c
update http-ui
2021-11-04 15:52:22 +01:00
Tamo
a58bc5bebb
update milli with the new parser_filter
2021-11-04 15:02:36 +01:00
many
7b3bac46a0
Change Attribute and Ranking rules errors
2021-11-04 13:19:32 +01:00
many
0c0038488c
Change last error messages
2021-11-03 11:24:06 +01:00
Tamo
76a2adb7c3
re-enable the tests in the parser and start the creation of an error type
2021-11-02 17:35:17 +01:00
bors[bot]
08ae47e475
Merge #405
...
405: Change some error messages r=ManyTheFish a=ManyTheFish
Co-authored-by: many <maxime@meilisearch.com>
2021-10-28 13:35:55 +00:00
many
9f1e0d2a49
Refine asc/desc error messages
2021-10-28 14:47:17 +02:00
many
ed6db19681
Fix PR comments
2021-10-28 11:18:32 +02:00
marin postma
183d3dada7
return document count from builder
2021-10-28 10:33:04 +02:00
many
2be755ce75
Lower error check, already check in meilisearch
2021-10-27 19:50:41 +02:00
many
3599df77f0
Change some error messages
2021-10-27 19:33:01 +02:00
bors[bot]
d7943fe225
Merge #402
...
402: Optimize document transform r=MarinPostma a=MarinPostma
This PR optimizes the transform of document additions into the obkv format. Instead of accepting any serializable object, we treat JSON and CSV specifically:
- For JSON, we build a serde `Visitor` that transforms the JSON straight into obkv without an intermediate representation.
- For CSV, we directly write the lines into the obkv, applying other optimizations as well.
Co-authored-by: marin postma <postma.marin@protonmail.com>
2021-10-26 09:55:28 +00:00
marin postma
baddd80069
implement review suggestions
2021-10-25 18:29:12 +02:00
marin postma
f9445c1d90
return float parsing error context in csv
2021-10-25 17:27:10 +02:00
Clémentine Urquizar
208903ddde
Revert "Replacing pest with nom "
2021-10-25 11:58:00 +02:00
marin postma
3fcccc31b5
add document builder example
2021-10-25 10:26:43 +02:00
marin postma
430e9b13d3
add csv builder tests
2021-10-25 10:26:43 +02:00
marin postma
53c79e85f2
document errors
2021-10-25 10:26:43 +02:00
marin postma
2e62925a6e
fix tests
2021-10-25 10:26:42 +02:00
marin postma
0f86d6b28f
implement csv serialization
2021-10-25 10:26:42 +02:00
marin postma
8d70b01714
optimize document deserialization
2021-10-25 10:26:42 +02:00
Tamo
1327807caa
add some error messages
2021-10-22 19:00:33 +02:00
Tamo
c8d03046bf
add a check on the fid in the geosearch
2021-10-22 18:08:18 +02:00
Tamo
3942b3732f
re-implement the geosearch
2021-10-22 18:03:39 +02:00
Tamo
7cd9109e2f
lowercase value extracted from Token
2021-10-22 17:50:15 +02:00
Tamo
e25ca9776f
start updating the exposed function to makes other modules happy
2021-10-22 17:23:22 +02:00
Tamo
6c9165b6a8
provide a helper to parse the token but to not handle the errors
2021-10-22 16:52:13 +02:00
Tamo
efb2f8b325
convert the errors
2021-10-22 16:38:35 +02:00
Tamo
c27870e765
integrate a first version without any error handling
2021-10-22 14:33:18 +02:00
Tamo
01dedde1c9
update some names and move some parser out of the lib.rs
2021-10-22 01:59:38 +02:00
Tamo
c634d43ac5
add a simple test on the filters with an integer
2021-10-21 17:10:27 +02:00
Tamo
6c15f50899
rewrite the parser logic
2021-10-21 16:45:42 +02:00
Tamo
e1d81342cf
add test on the or and and operator
2021-10-21 13:01:25 +02:00
Tamo
423baac08b
fix the tests
2021-10-21 12:45:40 +02:00
Tamo
36281a653f
write all the simple tests
2021-10-21 12:40:11 +02:00
Tamo
661bc21af5
Fix the filter parser
...
And add a bunch of tests on the filter::from_array
2021-10-21 11:45:03 +02:00
bors[bot]
59cc59e93e
Merge #358
...
358: Replacing pest with nom r=Kerollmops a=CNLHC
Co-authored-by: 刘瀚骋 <cn_lhc@qq.com>
2021-10-16 20:44:38 +00:00
刘瀚骋
7666e4f34a
follow the suggestions
2021-10-14 21:37:59 +08:00
刘瀚骋
2ea2f7570c
use nightly cargo to format the code
2021-10-14 16:46:13 +08:00
刘瀚骋
e750465e15
check logic for geolocation.
2021-10-14 16:12:00 +08:00
bors[bot]
aa5e099718
Merge #390
...
390: Add helper methods on the settings r=Kerollmops a=irevoire
This would be a good addition to look at the content of a setting without consuming it.
It’s useful for analytics.
Co-authored-by: Irevoire <tamo@meilisearch.com>
2021-10-13 20:36:30 +00:00
bors[bot]
c7db4176f3
Merge #384
...
384: Replace memmap with memmap2 r=Kerollmops a=palfrey
[memmap is unmaintained](https://rustsec.org/advisories/RUSTSEC-2020-0077.html) and needs replacing. memmap2 is a drop-in replacement fork that's well maintained. Note that the version numbers got reset on fork, hence the lower values.
Co-authored-by: Tom Parker-Shemilt <palfrey@tevp.net>
2021-10-13 13:47:23 +00:00
Irevoire
a3e7c468cd
add helper methods on the settings
2021-10-13 13:05:07 +02:00
刘瀚骋
cd359cd96e
WIP: extract the error trait bound to new trait.
2021-10-13 18:04:15 +08:00
刘瀚骋
5de5dd80a3
WIP: remove '_nom' suffix/redundant error enum/...
2021-10-13 11:06:15 +08:00
刘瀚骋
2c65781d91
format
2021-10-12 22:20:22 +08:00
bors[bot]
6e3b869e6a
Merge #388
...
388: fix primary key inference r=MarinPostma a=MarinPostma
The primary key was inferred from a hashtable index of the fields. For this reason, the order in which the fields were iterated over was not deterministic, and the primary key was chosen from the first field containing "id".
This fix sorts the index by field_id when inferring the primary key.
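The idea can be sketched as follows. This is only an illustration: `infer_primary_key` and the `name -> field id` map are hypothetical, and the exact matching rule used by the real code may differ from a plain `contains("id")`.

```rust
use std::collections::HashMap;

/// Hypothetical helper: picks the first field (in field id order) whose name contains "id".
fn infer_primary_key(fields_index: &HashMap<String, u16>) -> Option<&str> {
    // HashMap iteration order is unspecified, so collect and sort first.
    let mut fields: Vec<(&str, u16)> = fields_index
        .iter()
        .map(|(name, id)| (name.as_str(), *id))
        .collect();
    fields.sort_by_key(|(_, id)| *id);

    // After sorting, "first field containing id" is deterministic.
    fields
        .into_iter()
        .map(|(name, _)| name)
        .find(|name| name.contains("id"))
}
```

With fields `title` (0), `uuid` (1) and `object_id` (2), the sketch always picks `uuid`, whatever order the map happens to iterate in.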
Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-10-12 09:25:16 +00:00
mpostma
86ead92ed5
infer primary key on sorted fields
2021-10-12 11:15:11 +02:00
mpostma
9a266a531b
test correct primary key inference
2021-10-12 11:08:53 +02:00
many
c5a6075484
Make max_position_per_attributes changeable
2021-10-12 10:10:50 +02:00
many
360c5ff3df
Remove limit of 1000 position per attribute
...
Instead of using an arbitrary limit we encode the absolute position in a u32
using one strong u16 for the field id and a weak u16 for the relative position in the attribute.
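A minimal sketch of this encoding, assuming "strong" means the high 16 bits and "weak" the low 16 bits; the function names are illustrative and may not match the ones in the code base.

```rust
/// Packs a field id (high 16 bits) and a relative position (low 16 bits) into a u32.
fn absolute_from_relative_position(field_id: u16, relative: u16) -> u32 {
    ((field_id as u32) << 16) | (relative as u32)
}

/// Splits an absolute position back into (field id, relative position).
fn relative_from_absolute_position(absolute: u32) -> (u16, u16) {
    ((absolute >> 16) as u16, (absolute & 0xffff) as u16)
}
```

Because both halves are full `u16` values, every relative position up to 65535 is representable without an arbitrary cap.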
2021-10-12 10:10:50 +02:00
刘瀚骋
d323e35001
add a test case
2021-10-12 13:30:40 +08:00
刘瀚骋
70f576d5d3
error handling
2021-10-12 13:30:40 +08:00
刘瀚骋
28f9be8d7c
support syntax
2021-10-12 13:30:40 +08:00
刘瀚骋
469d92c569
tweak error handling
2021-10-12 13:30:40 +08:00
刘瀚骋
7a90a101ee
reorganize parser logic
2021-10-12 13:30:40 +08:00
刘瀚骋
f7796edc7e
remove everything about pest
2021-10-12 13:30:40 +08:00
刘瀚骋
ac1df9d9d7
fix typo and remove pest
2021-10-12 13:30:40 +08:00
刘瀚骋
50ad750ec1
enhance error handling
2021-10-12 13:30:40 +08:00
刘瀚骋
8748df2ca4
draft without error handling
2021-10-12 13:30:40 +08:00
mpostma
99889a0ed0
add obkv document serialization test
2021-10-11 15:13:17 +02:00
mpostma
799f3d43c8
fix serialization to obkv format
2021-10-11 15:04:47 +02:00
Tom Parker-Shemilt
2dfe24f067
memmap -> memmap2
2021-10-10 22:47:12 +01:00
Irevoire
b65aa7b5ac
Apply suggestions from code review
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-10-07 17:51:52 +02:00
Tamo
11dfe38761
Update the check on the latitude and longitude
...
Latitudes are not supposed to go beyond 90 degrees or below -90.
The same goes for longitudes with 180 or -180.
This was badly implemented in the filters, and was not implemented for the AscDesc rules.
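The check itself can be sketched like this; `check_geo_point` and its `String` error are hypothetical, and the real code returns its own error types for filters and AscDesc rules.

```rust
/// Hypothetical validation helper: latitude in [-90, 90], longitude in [-180, 180].
fn check_geo_point(lat: f64, lng: f64) -> Result<(), String> {
    // `contains` is false for NaN, so NaN coordinates are rejected too.
    if !(-90.0..=90.0).contains(&lat) {
        return Err(format!("bad latitude `{lat}`: must be between -90 and 90 degrees"));
    }
    if !(-180.0..=180.0).contains(&lng) {
        return Err(format!("bad longitude `{lng}`: must be between -180 and 180 degrees"));
    }
    Ok(())
}
```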
2021-10-07 16:10:43 +02:00
many
085bc6440c
Apply PR comments
2021-10-06 11:12:26 +02:00
many
1bd15d849b
Reduce candidates threshold
2021-10-05 18:52:14 +02:00
many
ea4bd29d14
Apply PR comments
2021-10-05 17:35:07 +02:00
many
3296bb243c
Simplify word level position DB into a word position DB
2021-10-05 12:15:02 +02:00
many
75d341d928
Re-implement set based algorithm for attribute criterion
2021-10-05 12:14:50 +02:00
Tamo
d9eba9d145
improve and test the sort error message
2021-09-30 14:38:27 +02:00
Tamo
0ee67bb7d1
improve the reserved keyword error message for the filters
2021-09-30 14:38:27 +02:00
bors[bot]
22551d0941
Merge #379
...
379: Revert "Change chunk size to 4MiB to fit more the end user usage" r=curquiza a=ManyTheFish
Reverts meilisearch/milli#370
Co-authored-by: Many <legendre.maxime.isn@gmail.com>
2021-09-29 13:20:53 +00:00
Many
26b5dad042
Revert "Change chunk size to 4MiB to fit more the end user usage"
2021-09-29 15:08:39 +02:00
Many
2e49230ca2
Update milli/src/search/criteria/attribute.rs
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-29 14:49:45 +02:00
Many
7ad0214089
Update milli/src/search/criteria/attribute.rs
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-29 14:49:41 +02:00
many
1df5b8712b
Hotfix meilisearch#1707
2021-09-29 14:41:56 +02:00
Tamo
f65153ad64
stop casting integer docids to string
2021-09-28 18:35:54 +02:00
Vishnu Gt
785c1372f2
Change "settings" to "setting"
...
Co-authored-by: Clément Renault <renault.cle@gmail.com>
2021-09-28 20:11:32 +05:30
Vishnu Ganesan
3580b2d803
Fixes #365
2021-09-28 19:30:23 +05:30
bors[bot]
3a12f5887e
Merge #373
...
373: Improve error message for bad sort syntax with geosearch r=Kerollmops a=irevoire
`@Kerollmops` This should be the last PR for the geosearch and error handling, sorry for doing it in so many steps 😬
Co-authored-by: Tamo <tamo@meilisearch.com>
2021-09-28 12:39:32 +00:00
Tamo
a80dcfd4a3
improve error message for bad sort syntax with geosearch
2021-09-28 14:32:24 +02:00
bors[bot]
b2a332599e
Merge #372
...
372: Fix Meilisearch 1714 r=Kerollmops a=ManyTheFish
The bug comes from the typo tolerance: to know how many typos are accepted, we were counting bytes instead of characters in a word.
On Chinese script characters, we were allowing 2 typos on 3-character words.
We now count the number of chars instead of bytes to assign the typo tolerance.
Related to [Meilisearch#1714](https://github.com/meilisearch/MeiliSearch/issues/1714)
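A small sketch of the difference between the two ways of counting. The thresholds (1 typo from 5 units, 2 typos from 9 units) are an assumption made for illustration, and both function names are hypothetical.

```rust
/// Assumed thresholds: fewer than 5 units -> 0 typos, 5 to 8 -> 1 typo, 9 or more -> 2 typos.
fn typos_for_length(len: usize) -> u8 {
    match len {
        0..=4 => 0,
        5..=8 => 1,
        _ => 2,
    }
}

/// Old behaviour (buggy): the length is the number of UTF-8 bytes.
fn max_typos_by_bytes(word: &str) -> u8 {
    typos_for_length(word.len())
}

/// New behaviour: the length is the number of `char`s.
fn max_typos_by_chars(word: &str) -> u8 {
    typos_for_length(word.chars().count())
}
```

Under these assumed thresholds, a three-character word such as `中文字` is 9 bytes long, so the byte-based version grants it 2 typos while the char-based version grants none.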
Co-authored-by: many <maxime@meilisearch.com>
2021-09-28 11:59:45 +00:00
many
8046ae4bd5
Count the number of char instead of counting bytes to assign the typo tolerance
2021-09-28 12:10:43 +02:00
many
1988416295
Add failing test related to Meilisearch#1714
2021-09-28 12:05:11 +02:00
Tamo
c7cb816ae1
simplify the error handling of the sort syntax for meilisearch
2021-09-27 19:07:22 +02:00
many
b188063869
Change chunk size to 4MiB to fit more the end user usage
2021-09-27 14:26:21 +02:00
many
551df0cb77
Add test checking the bug reported in meilisearch issue 1716
2021-09-23 15:55:39 +02:00
Irevoire
218f0a6661
Apply suggestions from code review
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-22 17:00:27 +02:00
Tamo
47ee93b0bd
return an error when _geoPoint is used but _geo is not sortable
2021-09-22 16:37:41 +02:00
Tamo
1e5e3d57e2
auto convert AscDescError into CriterionError
2021-09-22 16:37:41 +02:00
Tamo
023446ecf3
create a smaller and easier to maintain CriterionError type
2021-09-22 16:37:41 +02:00
Tamo
86e272856a
create an asc_desc error type that is never supposed to be returned to the end user
2021-09-22 16:37:41 +02:00
Tamo
257e621d40
create an asc_desc module
2021-09-22 16:37:41 +02:00
Tamo
113a061bee
fix the error handling on the criterion side
2021-09-22 15:09:07 +02:00
Tamo
78b0bce9a1
fix the returned error when asc desc fails to be parsed
2021-09-22 11:37:05 +02:00
mpostma
aa6c5df0bc
Implement documents format
...
document reader transform
remove update format
support document sequences
fix document transform
clean transform
improve error handling
add documents! macro
fix transform bug
fix tests
remove csv dependency
Add comments on the transform process
replace search cli
fmt
review edits
fix http ui
fix clippy warnings
Revert "fix clippy warnings"
This reverts commit a1ce3cd96e603633dbf43e9e0b12b2453c9c5620.
fix review comments
remove smallvec in transform loop
review edits
2021-09-21 16:58:33 +02:00
bors[bot]
31c8de1cca
Merge #322
...
322: Geosearch r=ManyTheFish a=irevoire
This PR introduces [basic geo-search functionalities](https://github.com/meilisearch/specifications/pull/59). It makes the engine able to index, filter, and sort by geo-point. We decided to use [the rstar library](https://docs.rs/rstar) and to save the points in [an RTree](https://docs.rs/rstar/0.9.1/rstar/struct.RTree.html) that we de/serialize in the index database [by using serde](https://serde.rs/) with [bincode](https://docs.rs/bincode). This is not an efficient way to query this tree as it will consume a lot of CPU and memory when a search is made, but at least it is an easy first way to do so.
### What we will have to do on the indexing part:
- [x] Index the `_geo` fields from the documents.
- [x] Create a new module with an extractor in the `extract` module that takes the `obkv_documents` and retrieves the latitude and longitude coordinates, outputting them in a `grenad::Reader` for further processing.
- [x] Call the extractor in the `extract::extract_documents_data` function and send the result to the `TypedChunk` module.
- [x] Get the `grenad::Reader` in the `typed_chunk::write_typed_chunk_into_index` function and store all the points in the `rtree`
- [x] Delete the documents from the `RTree` when deleting documents from the database. All this can be done in the `delete_documents.rs` file by getting the data structure, removing the points from it, and inserting it back after the modification.
- [x] Clear the `RTree` entirely when we clear the documents from the database; everything happens in the `clear_documents.rs` file.
- [x] Save a Roaring bitmap of all documents containing the `_geo` field.
### What we will have to do on the query part:
- [x] Filter the documents within a certain distance of a point. This is done by [collecting the documents from the searched point](https://docs.rs/rstar/0.9.1/rstar/struct.RTree.html#method.nearest_neighbor_iter) while they are in range.
- [x] We must introduce new `geoLowerThan` and `geoGreaterThan` variants to the `Operator` filter enum.
- [x] Implement the `negative` method on both variants where the `geoGreaterThan` variant is implemented by executing the `geoLowerThan` and removing the results found from the whole list of geo faceted documents.
- [x] Add the `_geoRadius` function in the pest parser.
- [x] Introduce a `_geo` ascending ranking function that takes a point as a parameter. ~~This function must keep the iterator on the `RTree` and make it peekable.~~ This was not possible for now, so we had to collect the whole iterator. Only the documents that are part of the candidates must be sent too!
- [x] This ascending ranking rule will only be active if the search is set up with the `_geoPoint` parameter that indicates the center point of the ascending ranking rule.
-----------
- On the Meilisearch side: we must introduce a new concept, returning the documents with a new `_geoDistance` field when they have gone through the `_geo` ranking rule; this has never been done before. We could maybe just do it afterward, once the documents have been retrieved from the database, by computing the distance between the `_geoPoint` and each of the documents to be returned.
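
The radius filter and its negation described in the query part can be sketched as follows. This is only an illustration under stated assumptions: it uses a linear scan and a hand-written haversine formula instead of the `RTree` and the distance crate used by the PR, and the function names `geo_lower_than` and `geo_greater_than` are hypothetical stand-ins for the `geoLowerThan` and `geoGreaterThan` variants.

```rust
use std::collections::BTreeSet;

/// Mean Earth radius in meters (an assumed constant for this sketch).
const EARTH_RADIUS_M: f64 = 6_371_000.0;

/// Great-circle distance in meters between two (lat, lng) points given in degrees.
fn haversine_m(a: (f64, f64), b: (f64, f64)) -> f64 {
    let (lat1, lng1) = (a.0.to_radians(), a.1.to_radians());
    let (lat2, lng2) = (b.0.to_radians(), b.1.to_radians());
    let dlat = lat2 - lat1;
    let dlng = lng2 - lng1;
    let h = (dlat / 2.0).sin().powi(2) + lat1.cos() * lat2.cos() * (dlng / 2.0).sin().powi(2);
    2.0 * EARTH_RADIUS_M * h.sqrt().asin()
}

/// Hypothetical stand-in for `geoLowerThan`: ids of the documents whose
/// point lies within `radius_m` meters of `center`.
fn geo_lower_than(
    points: &[(u32, (f64, f64))],
    center: (f64, f64),
    radius_m: f64,
) -> BTreeSet<u32> {
    points
        .iter()
        .filter(|(_, point)| haversine_m(center, *point) <= radius_m)
        .map(|(id, _)| *id)
        .collect()
}

/// Hypothetical stand-in for `geoGreaterThan`: run `geo_lower_than` and
/// remove its results from the whole set of geo-faceted documents.
fn geo_greater_than(
    points: &[(u32, (f64, f64))],
    center: (f64, f64),
    radius_m: f64,
) -> BTreeSet<u32> {
    let all: BTreeSet<u32> = points.iter().map(|(id, _)| *id).collect();
    let inside = geo_lower_than(points, center, radius_m);
    all.difference(&inside).copied().collect()
}
```

The radius is expressed in meters, matching the unit used in the filters. Computing the negation as a set difference keeps both operators consistent with each other by construction.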
Co-authored-by: Irevoire <tamo@meilisearch.com>
Co-authored-by: cvermand <33010418+bidoubiwa@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
2021-09-20 19:04:57 +00:00
Irevoire
0d104a0fce
Update milli/src/criterion.rs
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-20 18:13:17 +02:00
Tamo
f4b8e5675d
move the reserved keyword logic for the criterion and sort + add test
2021-09-20 17:21:02 +02:00
Irevoire
3b7a2cdbce
fix typo
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-20 16:10:39 +02:00
Tamo
c695a1ffd2
add the possibility to sort by descending order on geoPoint
2021-09-15 11:49:58 +02:00
Tamo
91ce4d1721
Stop iterating through the whole list of points
...
We stop when there are no possible candidates left
2021-09-15 11:49:58 +02:00
Tamo
cfc62a1c15
use geoutils instead of haversine
2021-09-09 18:11:38 +02:00
many
26deeb45a3
Add lacking parameter to word level position builder
2021-09-09 17:49:04 +02:00
Tamo
3fc145c254
if we have no rtree we return all other provided documents
2021-09-09 17:44:09 +02:00
Irevoire
a84f3a8b31
Apply suggestions from code review
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2021-09-09 15:09:35 +02:00
Tamo
c81ff22c5b
delete the invalid criterion name error in favor of invalid ranking rule name
2021-09-08 19:17:00 +02:00
Tamo
bad8ea47d5
edit the last two TODO comments
2021-09-08 18:24:09 +02:00
Tamo
b15c77ebc4
return an error in case a user tries to sort with :desc
2021-09-08 18:24:09 +02:00
Tamo
e5ef0cad9a
use meters in the filters
2021-09-08 18:24:09 +02:00
Tamo
4f69b190bc
remove the distance from the search, the computation of the distance will be made on meilisearch side
2021-09-08 18:24:09 +02:00
Tamo
7ae2a7341c
introduce the reserved keywords in the filters
2021-09-08 18:24:09 +02:00
Tamo
6d5762a6c8
handle the case where you entirely forgot the parentheses
2021-09-08 18:24:09 +02:00
Tamo
ebf82ac28c
improve the error messages and add tests for the filters
2021-09-08 18:24:09 +02:00
Tamo
bd4c248292
improve the error handling in general and introduce the concept of reserved keywords
2021-09-08 18:24:09 +02:00
Tamo
e8c093c1d0
fix the error handling in the filters
2021-09-08 18:24:09 +02:00
Tamo
f0b74637dc
fix all the tests
2021-09-08 18:24:09 +02:00
Tamo
b1bf7d4f40
reformat
2021-09-08 18:24:09 +02:00
Tamo
aca707413c
remove the memory leak
2021-09-08 18:24:09 +02:00
Tamo
a8a1f5bd55
move the geosearch criteria out of asc_desc.rs
2021-09-08 18:24:09 +02:00
Tamo
dc84ecc40b
fix a bug
2021-09-08 18:24:09 +02:00
Tamo
4820ac71a6
allow spaces in a geoRadius
2021-09-08 18:24:09 +02:00
Tamo
13c78e5aa2
Implement the _geoPoint in the sortable
2021-09-08 18:24:09 +02:00
Tamo
5bb175fc90
only index _geo if it's set as sortable OR filterable
...
and only allow the filters if geo was set to filterable
2021-09-08 17:51:08 +02:00
Tamo
f73273d71c
only call the extractor if needed
2021-09-08 17:51:08 +02:00
Irevoire
ea2f2ecf96
create a new database containing all the documents that were geo-faceted
2021-09-08 17:51:08 +02:00
Irevoire
4b459768a0
create the _geoRadius filter
2021-09-08 17:51:07 +02:00
Irevoire
6d70978edc
update the facet filter grammar
2021-09-08 17:51:07 +02:00
Irevoire
216a8aa3b2
add a test for the indexing of the geosearch
2021-09-08 17:51:07 +02:00
Irevoire
a21c854790
handle errors
2021-09-08 17:51:07 +02:00
Irevoire
70ab2c37c5
remove multiple bugs
2021-09-08 17:51:07 +02:00
Irevoire
b4b6ba6d82
rename all the ’long’ into ’lng’ as written in the specification
2021-09-08 17:51:07 +02:00
Irevoire
3b9f1db061
implement the clear of the rtree
2021-09-08 17:51:07 +02:00
Irevoire
d344489c12
implement the deletion of geo points
2021-09-08 17:51:07 +02:00
Irevoire
44d6b6ae9e
Index the geo points
2021-09-08 17:51:07 +02:00
Irevoire
8d9c2c4425
create a new db with getters and setters
2021-09-08 17:51:07 +02:00
bors[bot]
b22aac92ac
Merge #342
...
342: Let the caller decide what kind of error they want to return when parsing `AscDesc` r=Kerollmops a=irevoire
This is one possible fix for #339
We would then need to patch these lines https://github.com/meilisearch/MeiliSearch/blob/main/meilisearch-http/src/index/search.rs#L110-L114 to return the error we want.
Another solution would be to add a parameter to the `from_str` to specify which context we are in.
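
The approach taken here can be sketched as follows: the parser returns a context-free error, and each caller maps it to the user-facing error that fits its context. This is a minimal sketch, not the actual milli code. The `AscDescError` type, the `UserError` variants, and the `parse_sort` and `parse_criterion` helpers are hypothetical names chosen for illustration, and the `field:asc` / `field:desc` syntax is assumed.

```rust
use std::str::FromStr;

#[derive(Debug, PartialEq)]
enum AscDesc {
    Asc(String),
    Desc(String),
}

/// Context-free parsing error: it only records the offending input.
#[derive(Debug, PartialEq)]
struct AscDescError {
    name: String,
}

impl FromStr for AscDesc {
    type Err = AscDescError;

    fn from_str(text: &str) -> Result<Self, Self::Err> {
        // Split on the last colon so that field names may contain colons.
        match text.rsplit_once(':') {
            Some((field, "asc")) => Ok(AscDesc::Asc(field.to_string())),
            Some((field, "desc")) => Ok(AscDesc::Desc(field.to_string())),
            _ => Err(AscDescError { name: text.to_string() }),
        }
    }
}

/// Hypothetical user-facing errors, one per calling context.
#[derive(Debug, PartialEq)]
enum UserError {
    InvalidSortName { name: String },
    InvalidCriterionName { name: String },
}

/// Caller in the "sort parameter" context.
fn parse_sort(text: &str) -> Result<AscDesc, UserError> {
    text.parse::<AscDesc>()
        .map_err(|e| UserError::InvalidSortName { name: e.name })
}

/// Caller in the "ranking rule / criterion" context.
fn parse_criterion(text: &str) -> Result<AscDesc, UserError> {
    text.parse::<AscDesc>()
        .map_err(|e| UserError::InvalidCriterionName { name: e.name })
}
```

Compared with adding a context parameter to `from_str`, this keeps the parser unaware of where it is called from, at the cost of one `map_err` per call site.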
Co-authored-by: Tamo <tamo@meilisearch.com>
2021-09-08 14:18:57 +00:00
Tamo
932998f5cc
let the caller decide if they want to return an invalidSortName or an
...
invalidCriterionName error
2021-09-08 16:17:31 +02:00
many
e54280fbfc
Skip empty normalized words
2021-09-08 15:25:23 +02:00
many
d18ee58ab9
Check that keys are not empty in validator
2021-09-08 15:25:23 +02:00
many
9961b78b06
Drop sorter before creating a new one
2021-09-08 13:30:26 +02:00
bors[bot]
48d211b8b0
Merge #344
...
344: Move the sort ranking rule before the exactness ranking rule r=ManyTheFish a=Kerollmops
This PR moves the sort ranking rule to the 5th position by default, right before the exactness one.
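
As an illustration, the resulting default order could look like the list below. Only the position of `sort` (5th, right before `exactness`) comes from this PR; the other names and their order are an assumption based on the usual default ranking rules, and the constant and helper names are hypothetical.

```rust
/// Assumed default ranking rule order after this change, as plain strings.
const DEFAULT_RANKING_RULES: [&str; 6] =
    ["words", "typo", "proximity", "attribute", "sort", "exactness"];

/// Returns the 1-based position of a rule in the default order, if present.
fn position_of(rule: &str) -> Option<usize> {
    DEFAULT_RANKING_RULES
        .iter()
        .position(|r| *r == rule)
        .map(|index| index + 1)
}
```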
Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-09-07 15:47:15 +00:00
Kerollmops
e2cefc9b4f
Move the sort ranking rule before the exactness ranking rule
2021-09-07 16:41:33 +02:00
Kerollmops
fd3daa4423
Throw a query time error when a sort param is used but sort ranking rule is missing
2021-09-07 11:02:00 +02:00
Kerollmops
8dca36433c
Introduce the new SortRankingRuleMissing user error variant
2021-09-07 11:01:59 +02:00
Alexey Shekhirin
c2517e7d5f
fix(facet): string fields sorting
2021-09-03 11:58:26 +03:00
bors[bot]
5cbe879325
Merge #308
...
308: Implement a better parallel indexer r=Kerollmops a=ManyTheFish
Rewrite the indexer:
- enhance memory consumption control
- optimize parallelism using rayon and crossbeam channel
- factorize the different parts and make new DB implementation easier
- optimize and fix prefix databases
Co-authored-by: many <maxime@meilisearch.com>
2021-09-02 15:03:52 +00:00
many
741a4444a9
Remove log in chunk generator
2021-09-02 16:57:46 +02:00
many
7f7fafb857
Make document_chunk_size settable from update builder
2021-09-02 15:25:39 +02:00
many
db0c681bae
Fix PR comments
2021-09-02 15:17:52 +02:00
many
4860fd4529
Ignore empty facet values
2021-09-01 16:48:40 +02:00
many
b3a22f31f6
Fix memory consumption in word pair proximity extractor
2021-09-01 16:48:40 +02:00
many
9452fabfb2
Optimize cbo roaring bitmaps merge
2021-09-01 16:48:40 +02:00
many
8f702828ca
Ignore errors coming from crossbeam channel senders
2021-09-01 16:48:40 +02:00
many
e09eec37bc
Handle distance addition with hard separators
2021-09-01 16:48:40 +02:00
many
fc7cc770d4
Add logging timers
2021-09-01 16:48:40 +02:00
many
a2f59a28f7
Remove unwrap sending errors in channel
2021-09-01 16:48:40 +02:00
many
5c962c03dd
Fix and optimize word_prefix_pair_proximity_docids database
2021-09-01 16:48:40 +02:00
many
2d1727697d
Take stop words into account
2021-09-01 16:48:40 +02:00
many
823da19745
Fix test and use progress callback
2021-09-01 16:48:39 +02:00
many
1d314328f0
Plug new indexer
2021-09-01 16:48:36 +02:00
many
3aaf1d62f3
Publish grenad CompressionType type in milli
2021-09-01 16:42:08 +02:00
Alexey Shekhirin
0e379558a1
fix(search): get sortable_fields only if criteria present
2021-08-31 21:35:41 +03:00
bors[bot]
d6bba0663a
Merge #334
...
334: Wrap long values into BStr for warn logs r=Kerollmops a=shekhirin
Resolves https://github.com/meilisearch/milli/issues/263
Co-authored-by: Alexey Shekhirin <a.shekhirin@gmail.com>
2021-08-31 17:38:54 +00:00
Alexey Shekhirin
0b02eb456c
chore(update): wrap long values into BStr for warn logs
2021-08-31 20:28:16 +03:00
Kerollmops
f230ae6fd5
Introduce the reset_sortable_fields Settings method
2021-08-25 17:44:16 +02:00
Kerollmops
af65485ba7
Reexport the grenad CompressionType from milli
2021-08-24 18:15:31 +02:00
Clément Renault
89d0758713
Revert "Revert "Sort at query time""
2021-08-24 11:55:16 +02:00
Clément Renault
c084f7f731
Fix the facet string docids filterable deletion bug
2021-08-23 10:50:39 +02:00
Clémentine Urquizar
922f9fd4d5
Revert "Sort at query time"
2021-08-20 18:09:17 +02:00
many
d1df0d20f9
Add integration test of SortBy criterion
2021-08-18 16:21:51 +02:00
Kerollmops
1b7f6ea1e7
Return a new error when the sort criterion is not sortable
2021-08-18 15:04:07 +02:00
Kerollmops
71602e0f1b
Add the sortable fields into the settings and in the index
2021-08-18 15:04:07 +02:00
Kerollmops
407f53872a
Add a sort_criteria method to the Search builder struct
2021-08-18 15:04:07 +02:00
Kerollmops
687cd2e205
Introduce the new Sort criterion and AscDesc enum
2021-08-18 15:04:07 +02:00
Kerollmops
5b88df508e
Use the new Asc/Desc syntax everywhere
2021-08-17 14:15:22 +02:00
Kerollmops
fcedff95e8
Change the Asc/Desc criterion syntax to use a colon (:)
2021-08-17 14:03:21 +02:00
Kerollmops
e9ada44509
AscDesc criterion returns documents ordered by numbers then by strings
2021-08-17 13:21:31 +02:00
Kerollmops
110bf6b778
Make the FacetStringIter work in both ascending and descending orders
2021-08-17 11:18:40 +02:00
Kerollmops
22ebd2658f
Introduce the EitherString/RevRange private aliases
2021-08-17 10:47:15 +02:00
Kerollmops
7a5889bc5a
Introduce the highest_reverse_iter private method
2021-08-17 10:45:26 +02:00
Kerollmops
ad0d311f8a
Introduce the FacetStringLevelZeroRevRange struct
2021-08-17 10:44:43 +02:00
Kerollmops
6214c38da9
Introduce the FacetStringGroupRevRange struct
2021-08-17 10:44:27 +02:00
Kerollmops
1c604de158
Introduce the highest_iter private method on the FacetStringIter struct
2021-08-17 10:41:11 +02:00
Kerollmops
64df159057
Introduce the new_reducing constructor on the FacetStringIter struct
2021-08-17 10:35:06 +02:00
Kerollmops
01a4052828
Move the FacetStringIter creation logic into a private new method
2021-08-17 10:29:43 +02:00
many
7dbefae1e3
Make facet string iterator non reducing
2021-08-12 17:23:39 +02:00
many
8fdf860c17
Remove max values by facet limit for facet distribution
2021-08-12 11:29:20 +02:00
bors[bot]
89b9b61840
Merge #300
...
300: Fix prefix level position docids database r=curquiza a=ManyTheFish
The prefix search was inverted when we generated the DB.
Instead of checking whether a word had a prefix in the prefix fst,
we were checking whether the word was a prefix of a prefix contained in the prefix fst.
The indexer now iterates over the prefixes contained in the fst
and searches for them by prefix in the word-level-position-docids database,
aggregating the matches in a sorter.
Fix #299
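
The corrected direction of the lookup can be sketched as follows. This is a simplified illustration, not the actual indexer: a `BTreeSet<String>` stands in for the prefix fst, a `BTreeMap` keyed by word stands in for the word-level-position-docids database, a plain `BTreeSet<u32>` replaces the roaring bitmaps, and the function name `prefix_docids` is hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::ops::Bound;

/// For every prefix, merge the document ids of all the words that START WITH
/// that prefix. The inverted (buggy) check would instead have asked whether a
/// word is itself a prefix of one of the prefixes.
fn prefix_docids(
    prefixes: &BTreeSet<String>,
    word_docids: &BTreeMap<String, BTreeSet<u32>>,
) -> BTreeMap<String, BTreeSet<u32>> {
    let mut output = BTreeMap::new();
    for prefix in prefixes {
        let mut merged = BTreeSet::new();
        // Keys are sorted, so all the words sharing the prefix form one
        // contiguous run starting at the first key >= prefix.
        let range = (Bound::Included(prefix.as_str()), Bound::<&str>::Unbounded);
        for (_word, docids) in word_docids
            .range::<str, _>(range)
            .take_while(|(word, _)| word.starts_with(prefix.as_str()))
        {
            merged.extend(docids.iter().copied());
        }
        if !merged.is_empty() {
            output.insert(prefix.clone(), merged);
        }
    }
    output
}
```

With the prefixes `he` and `wo` and the words `h`, `hello`, `help`, and `world`, the entry for `he` only aggregates `hello` and `help`. The word `h` is ignored even though it is a prefix of `he`, which is the case the inverted check would have matched.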
Co-authored-by: many <maxime@meilisearch.com>
2021-08-04 16:52:09 +00:00
many
cdeb07f0fd
Fix prefix level position docids database
...
The prefix search was inverted when we generated the DB.
Instead of checking whether a word had a prefix in the prefix fst,
we were checking whether the word was a prefix of a prefix contained in the prefix fst.
The indexer now iterates over the prefixes contained in the fst
and searches for them by prefix in the word-level-position-docids database,
aggregating the matches in a sorter.
Fix #299
2021-08-04 14:11:49 +02:00
Kerollmops
90514e03d1
Fix invalid faceted documents ids buffer size
2021-07-29 15:49:23 +02:00
bors[bot]
200e98c211
Merge #293
...
293: Make sure that the relevancy is not impacted by other settings r=Kerollmops a=Kerollmops
Fix https://github.com/meilisearch/meilisearch/issues/1505.
fix https://github.com/meilisearch/MeiliSearch/issues/1529
Co-authored-by: Kerollmops <clement@meilisearch.com>
2021-07-27 16:04:52 +00:00