Louis Dureuil
c8af572697
Add tests for exact words and exact attributes
2023-04-26 16:13:01 +02:00
Loïc Lecrenier
b448aca49c
Add more tests for exactness rr
2023-04-26 11:04:18 +02:00
Loïc Lecrenier
55bad07c16
Fix bug in exact_attribute rr implementation
2023-04-26 10:40:05 +02:00
Loïc Lecrenier
3421125a55
Prevent the exactness ranking rule from removing random words
...
Make it strictly follow the term matching strategy
2023-04-26 09:09:19 +02:00
Clément Renault
14293f6c8f
Make rustfmt happy
2023-04-25 16:55:39 +02:00
Loïc Lecrenier
d3a94e8b25
Fix bugs and add tests to exactness ranking rule
2023-04-25 16:49:08 +02:00
Clément Renault
cfd1b2cc97
Fix the clippy warnings
2023-04-25 16:40:32 +02:00
Loïc Lecrenier
8f2e971879
Add tests for "exactness" rr, make correct universe computation
2023-04-24 16:57:34 +02:00
Loïc Lecrenier
d1fdbb63da
Make all search tests pass, fix distinctAttribute bug
2023-04-24 12:12:08 +02:00
Loïc Lecrenier
84d9c731f8
Fix bug in encoding of word_position_docids and word_fid_docids
2023-04-24 09:59:30 +02:00
Loïc Lecrenier
bd9aba4d77
Add "position" part of the attribute ranking rule
2023-04-13 10:46:09 +02:00
Loïc Lecrenier
8edad8291b
Add logger to attribute rr, fix a bug
2023-04-13 10:25:00 +02:00
Kerollmops
d9cebff61c
Add a simple test to check that attributes are ranking correctly
2023-04-13 08:27:09 +02:00
Loïc Lecrenier
30f7bd03f6
Fix compiler warning/errors caused by previous merge
2023-04-13 08:27:09 +02:00
Kerollmops
df0d9bb878
Introduce the attribute ranking rule in the list of ranking rules
2023-04-13 08:27:09 +02:00
Kerollmops
5230ddb3ea
Resolve the attribute ranking rule conditions
2023-04-13 08:27:09 +02:00
Kerollmops
d6a7c28e4d
Implement the attribute ranking rule edge computation
2023-04-13 08:27:09 +02:00
Kerollmops
e55efc419e
Introduce a new cache for the words fids
2023-04-13 08:27:09 +02:00
Loïc Lecrenier
644e136aee
Merge branch 'search-refactor-typo-attributes' into search-refactor
2023-04-13 08:26:56 +02:00
Louis Dureuil
38b7b31beb
Decide to use prefix DB if the word is not an ngram
2023-04-12 16:45:38 +02:00
Louis Dureuil
7a01f20df7
Use word_prefix_docids, make get_word_prefix_docids private
2023-04-12 16:45:38 +02:00
Louis Dureuil
c20c38a7fa
Add SearchContext::word_prefix_docids() method
2023-04-12 16:44:43 +02:00
Louis Dureuil
5ab46324c4
Everyone uses the SearchContext::word_docids instead of get_db_word_docids
...
make get_db_word_docids private
2023-04-12 16:44:43 +02:00
Louis Dureuil
325f17488a
Add SearchContext::word_docids() method
2023-04-12 16:37:05 +02:00
Louis Dureuil
e7ff987c46
Update call sites
2023-04-12 16:36:38 +02:00
Louis Dureuil
244003e36f
Refactor DB cache to return Roaring Bitmaps directly instead of byte slices
2023-04-12 16:35:48 +02:00
Loïc Lecrenier
1f813a6f3b
Simplify implementation of the detailed (=visual) logger
2023-04-12 16:32:53 +02:00
Loïc Lecrenier
96183e804a
Simplify the logger
2023-04-12 16:32:53 +02:00
Loïc Lecrenier
7ab48ed8c7
Matching words fixes
2023-04-12 16:21:43 +02:00
Loïc Lecrenier
e7bb8c940f
Merge branch 'search-refactor-highlighter' into search-refactor-highlighter-merged
2023-04-11 12:22:34 +02:00
Loïc Lecrenier
d0e9d65025
Fix distinct attribute bugs
2023-04-07 11:09:01 +02:00
Loïc Lecrenier
a81165f0d8
Merge remote-tracking branch 'origin/main' into search-refactor
2023-04-07 10:15:55 +02:00
Loïc Lecrenier
d6585eb10b
Avoid splitting ngrams into their original component words
2023-04-07 10:13:49 +02:00
Loïc Lecrenier
f7d90ad19f
Merge remote-tracking branch 'origin/search-refactor-tests-doc' into search-refactor
2023-04-07 10:13:18 +02:00
Louis Dureuil
31630c85d0
exactness graph rr: Add important TODO/FIXME after review
2023-04-06 17:50:39 +02:00
Louis Dureuil
ab09dc0167
exact_attributes: Add TODOs and additional check after review
2023-04-06 17:50:39 +02:00
Louis Dureuil
618c54915d
exact_attribute: dedup nodes after sorting them
2023-04-06 17:50:39 +02:00
Louis Dureuil
90a6c01495
Use correct codec in proximity
2023-04-06 17:50:39 +02:00
Louis Dureuil
e58426109a
Fix panics and issues in exactness graph ranking rule
2023-04-06 17:50:39 +02:00
Louis Dureuil
f513cf930a
Exact attribute with state
2023-04-06 17:50:39 +02:00
Louis Dureuil
8a13ed7e3f
Add exactness ranking rules
2023-04-06 17:50:39 +02:00
Louis Dureuil
1b8e4d0301
Add ExactTerm and helper method
2023-04-06 17:50:39 +02:00
Louis Dureuil
996619b22a
Increase position by 8 on hard separator when building query terms
2023-04-06 17:50:39 +02:00
Louis Dureuil
2c9822a337
Rename `is_multiple_words` to `is_ngram` and `zero_typo` to `exact`
2023-04-06 17:50:39 +02:00
Louis Dureuil
7276deee0a
Add new db caches
2023-04-06 17:50:39 +02:00
ManyTheFish
f7e7f438f8
Patch prefix match
2023-04-06 17:22:31 +02:00
ManyTheFish
ba8dcc2d78
Fix clippy
2023-04-06 15:50:47 +02:00
Loïc Lecrenier
7ca91ebb71
Merge branch 'search-refactor-exactness' into search-refactor-tests-doc
2023-04-06 15:16:35 +02:00
ManyTheFish
47f6a3ad3d
Take into account that a logger needs the search context
2023-04-06 15:02:23 +02:00
bors[bot]
b4c01581cd
Merge #3641
...
3641: Bring back changes from `release v1.1.0` into `main` after v1.1.0 release r=curquiza a=curquiza
Replace https://github.com/meilisearch/meilisearch/pull/3637 since we don't want to pull commits from `main` into `release-v1.1.0` when fixing git conflicts
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
Co-authored-by: Charlotte Vermandel <charlottevermandel@gmail.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
Co-authored-by: curquiza <clementine@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
2023-04-06 12:37:54 +00:00
ManyTheFish
ae17c62e24
Remove warnings
2023-04-06 14:07:18 +02:00
ManyTheFish
a1148c09c2
remove old matcher
2023-04-06 14:00:21 +02:00
ManyTheFish
9c5f64769a
Integrate the new Highlighter in the search
2023-04-06 13:58:56 +02:00
ManyTheFish
ebe23b04c9
Make the matcher consume the search context
2023-04-06 12:28:28 +02:00
ManyTheFish
13b7c826c1
add new highlighter
2023-04-06 12:15:37 +02:00
Louis Dureuil
d1ddaa223d
Use correct codec in proximity
2023-04-05 18:14:00 +02:00
Louis Dureuil
f7ecea142e
Fix panics and issues in exactness graph ranking rule
2023-04-05 18:13:46 +02:00
Louis Dureuil
337e75b0e4
Exact attribute with state
2023-04-05 18:12:46 +02:00
Loïc Lecrenier
b5691802a3
Add new tests and fix construction of query graph from paths
2023-04-05 16:31:10 +02:00
Loïc Lecrenier
6e50f23896
Add more search tests
2023-04-05 13:33:23 +02:00
Loïc Lecrenier
4c8a0179ba
Add more search tests
2023-04-05 11:30:49 +02:00
Loïc Lecrenier
c69cbec64a
Add more search tests
2023-04-05 11:20:04 +02:00
Loïc Lecrenier
ce328c329d
Move bucket sort function to its own module and fix a bug
2023-04-04 18:03:08 +02:00
Loïc Lecrenier
959e4607bb
Add more search tests
2023-04-04 18:02:46 +02:00
Louis Dureuil
4b4ffb8ec9
Add exactness ranking rules
2023-04-04 17:12:07 +02:00
Louis Dureuil
3951fe22ab
Add ExactTerm and helper method
2023-04-04 17:09:32 +02:00
Louis Dureuil
4d5bc9df4c
Increase position by 8 on hard separator when building query terms
2023-04-04 17:07:26 +02:00
Louis Dureuil
ec2f8e8040
Rename `is_multiple_words` to `is_ngram` and `zero_typo` to `exact`
2023-04-04 17:06:07 +02:00
Louis Dureuil
406b8bd248
Add new db caches
2023-04-04 17:04:46 +02:00
Loïc Lecrenier
62b9c6fbee
Add search tests
2023-04-04 16:18:22 +02:00
Loïc Lecrenier
b439d36807
Split query_term module into multiple submodules
2023-04-04 15:38:30 +02:00
Loïc Lecrenier
faceb661e3
Add note that a part of the code needs fixing
2023-04-04 15:02:01 +02:00
Loïc Lecrenier
4129d657e2
Simplify query_term module a bit
2023-04-04 15:01:42 +02:00
Filip Bachul
1e6fe71a67
fix clippy warning
2023-04-03 20:18:26 +02:00
Filip Bachul
fddfb37f1f
remove unnecessary FilterError:ReservedGeo and FilterError:ReservedGeo
2023-04-03 20:18:26 +02:00
Loïc Lecrenier
3f13608002
Fix computation of ngram derivations
2023-04-03 15:27:49 +02:00
Loïc Lecrenier
4708d9b016
Fix compiler warnings/errors
2023-04-03 10:09:27 +02:00
Clément Renault
0d2e7bcc13
Implement the previous way for the exhaustive distinct candidates
2023-04-03 10:08:10 +02:00
Loïc Lecrenier
55fbfb6124
Merge branch 'search-refactor-located-query-terms' into search-refactor
2023-04-03 10:04:36 +02:00
Loïc Lecrenier
58fe260c72
Allow removing all the terms from a query if it contains a phrase
2023-04-03 09:18:02 +02:00
Loïc Lecrenier
24e5f6f7a9
Don't remove phrases with "last" term matching strategy
2023-04-03 09:17:33 +02:00
Louis Dureuil
9b87c36200
Limit the number of derivations for a single word.
2023-03-31 09:19:18 +02:00
Loïc Lecrenier
12b26cd54e
Don't remove phrases from the query with term matching strategy Last
2023-03-30 14:54:08 +02:00
Loïc Lecrenier
061b1e6d7c
Tiny refactor of query graph remove_nodes method
2023-03-30 14:49:25 +02:00
Loïc Lecrenier
0d6e8b5c31
Fix phrase search bug when the phrase has only one word
2023-03-30 14:48:12 +02:00
Loïc Lecrenier
d48cdc67a0
Fix term matching strategy bugs
2023-03-30 14:01:52 +02:00
Loïc Lecrenier
35c16ad047
Use new term matching strategy logic in words ranking rule
2023-03-30 13:15:43 +02:00
Loïc Lecrenier
2997d1f186
Use new term matching strategy logic in resolve_maximally_reduced_...
2023-03-30 13:12:51 +02:00
Loïc Lecrenier
2a5997fb20
Avoid expensive assert! in bucket sort function
2023-03-30 13:07:17 +02:00
Loïc Lecrenier
ee8a9e0bad
Remove outdated sentence in documentation
2023-03-30 12:22:24 +02:00
Loïc Lecrenier
3b0737a092
Fix detailed logger
2023-03-30 12:20:44 +02:00
Loïc Lecrenier
fdd02105ac
Graph-based ranking rule + term matching strategy support
2023-03-30 12:19:21 +02:00
Loïc Lecrenier
aa9592455c
Refactor the paths_of_cost algorithm
...
Support conditions that require certain nodes to be skipped
2023-03-30 12:11:11 +02:00
Loïc Lecrenier
01e24dd630
Rewrite proximity ranking rule
2023-03-30 11:59:06 +02:00
Loïc Lecrenier
ae6bb1ce17
Update the ConditionDocidsCache after change to RankingRuleGraphTrait
2023-03-30 11:41:20 +02:00
Loïc Lecrenier
5fd28620cd
Build ranking rule graph correctly after changes to trait definition
2023-03-30 11:32:55 +02:00
Loïc Lecrenier
728710d63a
Update typo ranking rule to use new query term structure
2023-03-30 11:32:19 +02:00
Loïc Lecrenier
fa81381865
Update the trait requirements of ranking-rule graphs
2023-03-30 11:19:45 +02:00
Loïc Lecrenier
b96a682f16
Update resolve_graph module to work with lazy query terms
2023-03-30 11:10:38 +02:00
Loïc Lecrenier
d0f048c068
Simplify the API of the DatabaseCache
2023-03-30 11:08:17 +02:00
Loïc Lecrenier
223e82a10d
Update QueryGraph to use new lazy query terms + build from paths
2023-03-30 11:06:02 +02:00
Loïc Lecrenier
9507ff5e31
Update query term structure to allow for laziness
2023-03-30 11:06:02 +02:00
Louis Dureuil
c2b025946a
`located_query_terms_from_string`: use u16 for positions, hard limit number of iterated tokens.
...
- Refactor phrase logic to reduce number of possible states
2023-03-30 11:04:14 +02:00
Loïc Lecrenier
3a818c5e87
Add more functionality to interners
2023-03-30 09:56:23 +02:00
Louis Dureuil
d74134ce3a
Check sort criteria
2023-03-29 15:21:54 +02:00
Louis Dureuil
5ac129bfa1
Mark geosearch as currently unimplemented for sort rule
2023-03-29 15:20:42 +02:00
ManyTheFish
efea1e5837
Fix facet normalization
2023-03-29 12:02:24 +02:00
Louis Dureuil
abb4522f76
Small comment on ignored rules for placeholder search
2023-03-29 09:11:06 +02:00
Louis Dureuil
ef084ef042
SmallBitmap: Consistently panic on incoherent universe lengths
2023-03-29 08:45:38 +02:00
Louis Dureuil
3524bd1257
SmallBitmap: Add documentation
2023-03-29 08:44:11 +02:00
Tamo
a50b058557
update the geoBoundingBox feature
...
Now, instead of using the (top_left, bottom_right) corners of the bounding box, it uses the (top_right, bottom_left) corners.
2023-03-28 18:26:18 +02:00
Louis Dureuil
d4f6216966
Resolve rule time sort criteria
2023-03-28 16:42:02 +02:00
Louis Dureuil
77acafe534
Resolve search time sort criteria for placeholder search
2023-03-28 16:41:03 +02:00
Louis Dureuil
abb19d368d
Initialize query time ranking rule for query search
2023-03-28 12:40:52 +02:00
Louis Dureuil
b4a52a622e
BoxRankingRule
2023-03-28 12:39:42 +02:00
Louis Dureuil
e9eb271499
Remove empty attribute_rule mod
2023-03-27 11:08:03 +02:00
Louis Dureuil
3281a88d08
SmallBitmap: don't expose internal items
2023-03-27 11:04:43 +02:00
Louis Dureuil
5a644054ab
Removed unused search impl
2023-03-27 11:04:27 +02:00
Louis Dureuil
16fefd364e
Add TODO notes
2023-03-27 11:04:04 +02:00
Loïc Lecrenier
00bad8c716
Add comments suggesting performance improvements
2023-03-23 10:18:24 +01:00
Loïc Lecrenier
862714a18b
Remove criterion_implementation_strategy param of Search
2023-03-23 09:44:12 +01:00
Loïc Lecrenier
d18ebe4f3a
Remove more warnings
2023-03-23 09:41:18 +01:00
Loïc Lecrenier
7169d85115
Remove old query_tree code and make clippy happy
2023-03-23 09:39:16 +01:00
Loïc Lecrenier
f5f5f03ec0
Remove old criteria code
2023-03-23 09:35:53 +01:00
Loïc Lecrenier
9b2653427d
Split position DB into fid and relative position DB
2023-03-23 09:22:01 +01:00
Loïc Lecrenier
56b7209f26
Make clippy happy
2023-03-23 09:16:17 +01:00
Loïc Lecrenier
9b1f439a91
WIP
2023-03-23 09:12:35 +01:00
Loïc Lecrenier
a86aeba411
WIP
2023-03-22 14:43:08 +01:00
Loïc Lecrenier
384fdc2df4
Fix two bugs in proximity ranking rule
2023-03-21 11:43:25 +01:00
Loïc Lecrenier
83e5b4ed0d
Compute edges of proximity graph lazily
2023-03-21 10:44:40 +01:00
Loïc Lecrenier
272cd7ebbd
Small cleanup
2023-03-20 13:39:19 +01:00
Loïc Lecrenier
c63c7377e6
Switch order of MappedInterner generic params
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
5b50e49522
cargo fmt
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
65474c8de5
Update new sort ranking rule after rebasing
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
fbb1ba3de0
Cargo fmt
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
a59ca28e2c
Add forgotten file
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
825f742000
Simplify graph-based ranking rule impl
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
dd491320e5
Simplify graph-based ranking rule impl
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
c6ff97a220
Rewrite the dead-ends cache to detect more dead-ends
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
49240c367a
Fix bug in cost of typo conditions
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
1e6e624078
Fix bug in SmallBitmap
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
8b4e07e1a3
WIP
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
2853009987
Renaming Edge -> Condition
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
aa59c3bc2c
Replace EdgeCondition with an Option<..> + other code cleanup
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
7b1d8f4c6d
Make PathSet strongly typed
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
a49ddec9df
Prune the query graph after executing a ranking rule
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
05fe856e6e
Merge forward and backward proximity conditions in proximity graph
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
c0cdaf9f53
Fix bug in the proximity ranking rule for queries with ngrams
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
e9cf58d584
Refactor of the Interner
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
31628c5cd4
Merge Phrase and WordDerivations into one structure
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
3004e281d7
Support ngram typos + splitwords and splitwords+synonyms in proximity
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
14e8d0aaa2
Rename lifetime
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
1c58cf8426
Intern ranking rule graph edge conditions as well
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
5155fd2bf1
Reorganise initialisation of ranking rules + rename PathsMap -> PathSet
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
9ec9c204d3
Small code cleanup
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
78b9304d52
Implement distinct attribute
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
0465ba4a05
Intern more values
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
2099991dd1
Continue documenting and cleaning up the code
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
c232cdabf5
Add documentation
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
4e266211bf
Small code reorganisation
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
57fa689131
Cargo fmt
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
10626dddfc
Add a few more optimisations to new search algorithms
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
9051065c22
Apply a few optimisations for graph-based ranking rules
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
e8c76cf7bf
Intern all strings and phrases in the search logic
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
3f1729a17f
Update new search test
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
cab2b6bcda
Fix: computation of initial universe, code organisation
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
c4979a2fda
Fix code visibility issue + unimplemented detail in proximity rule
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
23931f8a4f
Fix small bug in visual logger of search algo
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
aa414565bb
Fix proximity graph edge builder to include all proximities
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
1db152046e
WIP on split words and synonyms support
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
c27ea2677f
Rewrite cheapest path algorithm and empty path cache
...
It is now much simpler and has much better performance.
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
caa1e1b923
Add typo ranking rule to new search impl
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
71f18e4379
Add sort ranking rule to new search impl
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
600e3dd1c5
Remove warnings
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
362eb0de86
Add support for filters
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
998d46ac10
Add support for search offset and limit
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
6c85c0d95e
Fix more bugs + visual empty path cache logging
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
0e1fbbf7c6
Fix bugs in query graph's "remove word" and "cheapest paths" algos
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
6806640ef0
Fix d2 description of paths map
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
173e37584c
Improve the visual/detailed search logger
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
6ba4d5e987
Add a search logger
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
dd12d44134
Support swapped word pairs in new proximity ranking rule impl
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
c8e251bf24
Remove noise in codebase
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
a938fbde4a
Use a cache when resolving the query graph
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
dcf3f1d18a
Remove EdgeIndex and NodeIndex types, prefer u32 instead
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
66d0c63694
Add some documentation and use bitmaps instead of hashmaps when possible
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
132191360b
Introduce the sort ranking rule working with the new search structures
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
345c99d5bd
Introduce the words ranking rule working with the new search structures
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
89d696c1e3
Introduce the proximity ranking rule as a graph-based ranking rule
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
c645853529
Introduce a generic graph-based ranking rule
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
a70ab8b072
Introduce a function to find the K shortest paths in a graph
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
48aae76b15
Introduce a function to find the docids of a set of paths in a graph
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
23bf572dea
Introduce cache structures used with ranking rule graphs
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
864f6410ed
Introduce a structure to represent a set of graph paths efficiently
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
c9bf6bb2fa
Introduce a structure to implement ranking rules with graph algorithms
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
46249ea901
Implement a function to find a QueryGraph's docids
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
ce0d1e0e13
Introduce a common way to manage the coordination between ranking rules
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
5065d8b0c1
Introduce a DatabaseCache to memorize the addresses of LMDB values
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
a83007c013
Introduce structure to represent search queries as graphs
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
79e0a6dd4e
Introduce a new search module, eventually meant to replace the old one
...
The code here does not compile, because I am merely splitting one giant
commit into smaller ones where each commit explains a single file.
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
2d88089129
Remove unused term matching strategies
2023-03-20 09:41:55 +01:00
Clément Renault
ea016d97af
Implementing an IS EMPTY filter
2023-03-15 14:12:34 +01:00
Clément Renault
175e8a8495
Fix a diacritic issue
2023-03-09 14:57:47 +01:00
Clément Renault
7c0cd7172d
Introduce the NULL and NOT value NULL operator
2023-03-08 17:14:34 +01:00
bors[bot]
4f1ccbc495
Merge #3525
...
3525: Fix phrase search containing stop words r=ManyTheFish a=ManyTheFish
# Summary
A search with a phrase containing only stop words was returning an HTTP 500 error.
This PR drops phrases containing only stop words before the search starts. A query with a phrase containing only stop words now behaves like a placeholder search.
fixes https://github.com/meilisearch/meilisearch/issues/3521
related v1.0.2 PR on milli: https://github.com/meilisearch/milli/pull/779
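The filtering described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `QueryPart`, `drop_stop_word_only_phrases` and `is_placeholder_search` are hypothetical names, and the sketch assumes that a query left with no parts is what gets handled as a placeholder search.

```rust
use std::collections::HashSet;

/// A parsed piece of a query: a loose word or a quoted phrase (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
enum QueryPart {
    Word(String),
    Phrase(Vec<String>),
}

/// Drops every phrase whose words are all stop words; loose words are kept as-is.
fn drop_stop_word_only_phrases(
    parts: Vec<QueryPart>,
    stop_words: &HashSet<&str>,
) -> Vec<QueryPart> {
    parts
        .into_iter()
        .filter(|part| match part {
            // A phrase survives only if at least one word is not a stop word.
            QueryPart::Phrase(words) => {
                !words.iter().all(|w| stop_words.contains(w.as_str()))
            }
            QueryPart::Word(_) => true,
        })
        .collect()
}

/// Assumption: a query with no remaining parts is treated as a placeholder search.
fn is_placeholder_search(parts: &[QueryPart]) -> bool {
    parts.is_empty()
}
```

With stop words `the` and `of`, the query `"the of"` is reduced to an empty list of parts, while `"the moon"` is kept untouched.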
Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-03-02 10:55:37 +00:00
ManyTheFish
37489fd495
Return an internal error when a matching word is invalid
2023-03-01 19:05:16 +01:00
bors[bot]
ac5a1e4c4b
Merge #3423
...
3423: Add min and max facet stats r=dureuill a=dureuill
# Pull Request
## Related issue
Fixes #3426
## What does this PR do?
### User standpoint
- When using a `facets` parameter in search, the facets that have numeric values are displayed in a new section of the response called `facetStats` that contains, per facet, the numeric min and max value of the hits returned by the search.
<details>
<summary>
Sample request/response
</summary>
```json
❯ curl \
-X POST 'http://localhost:7700/indexes/meteorites/search?facets=mass ' \
-H 'Content-Type: application/json' \
--data-binary '{ "q": "LL6", "facets":["mass", "recclass"], "limit": 5 }' | jsonxf
{
"hits": [
{
"name": "Niger (LL6)",
"id": "16975",
"nametype": "Valid",
"recclass": "LL6",
"mass": 3.3,
"fall": "Fell"
},
{
"name": "Appley Bridge",
"id": "2318",
"nametype": "Valid",
"recclass": "LL6",
"mass": 15000,
"fall": "Fell",
"_geo": {
"lat": 53.58333,
"lng": -2.71667
}
},
{
"name": "Athens",
"id": "4885",
"nametype": "Valid",
"recclass": "LL6",
"mass": 265,
"fall": "Fell",
"_geo": {
"lat": 34.75,
"lng": -87.0
}
},
{
"name": "Bandong",
"id": "4935",
"nametype": "Valid",
"recclass": "LL6",
"mass": 11500,
"fall": "Fell",
"_geo": {
"lat": -6.91667,
"lng": 107.6
}
},
{
"name": "Benguerir",
"id": "30443",
"nametype": "Valid",
"recclass": "LL6",
"mass": 25000,
"fall": "Fell",
"_geo": {
"lat": 32.25,
"lng": -8.15
}
}
],
"query": "LL6",
"processingTimeMs": 15,
"limit": 5,
"offset": 0,
"estimatedTotalHits": 42,
"facetDistribution": {
"mass": {
"110000": 1,
"11500": 1,
"1161": 1,
"12000": 1,
"1215.5": 1,
"127000": 1,
"15000": 1,
"1676": 1,
"1700": 1,
"1710.5": 1,
"18000": 1,
"19000": 1,
"220000": 1,
"2220": 1,
"22300": 1,
"25000": 2,
"265": 1,
"271000": 1,
"2840": 1,
"3.3": 1,
"3000": 1,
"303": 1,
"32000": 1,
"34000": 1,
"36.1": 1,
"45000": 1,
"460": 1,
"478": 1,
"483": 1,
"5500": 2,
"600": 1,
"6000": 1,
"67.8": 1,
"678": 1,
"680.5": 1,
"6930": 1,
"8": 1,
"8300": 1,
"840": 1,
"8400": 1
},
"recclass": {
"L/LL6": 3,
"LL6": 39
}
},
"facetStats": {
"mass": {
"min": 3.3,
"max": 271000.0
}
}
}
```
</details>
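As a rough illustration of what `facetStats` reports, the sketch below computes the min and max over the numeric values of one facet. It is an assumption-laden simplification: `facet_stats` is a hypothetical helper working on a plain slice, whereas the actual implementation reads values from the facet databases.

```rust
/// Returns the (min, max) of the numeric values of a facet,
/// or `None` when the facet has no numeric value among the hits.
fn facet_stats(values: &[f64]) -> Option<(f64, f64)> {
    let mut iter = values.iter().copied();
    // Without a first value there is nothing to report for this facet.
    let first = iter.next()?;
    Some(iter.fold((first, first), |(min, max), v| (min.min(v), max.max(v))))
}
```

Called on a few of the `mass` values from the sample response, for example `[15000.0, 3.3, 271000.0, 265.0]`, it yields the same bounds as the `facetStats` section shown there.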
## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-02-22 13:06:43 +00:00
ManyTheFish
900bae3d9d
keep phrases that have at least one word
2023-02-21 18:16:51 +01:00
ManyTheFish
8aa808d51b
Merge branch 'main' into enhance-language-detection
2023-02-20 18:14:34 +01:00
Many the fish
119e6d8811
Update milli/src/search/mod.rs
...
Co-authored-by: Tamo <tamo@meilisearch.com>
2023-02-20 15:33:10 +01:00
Louis Dureuil
eb28d4c525
add facet test
2023-02-20 13:52:28 +01:00
Louis Dureuil
9ac981d025
Remove some clippy type complexity warns by deboxing iters
2023-02-20 13:52:27 +01:00
Louis Dureuil
74859ecd61
Add min and max facet stats
2023-02-20 13:52:27 +01:00
Louis Dureuil
8ae441a4db
Update usage of iterators
2023-02-20 13:52:27 +01:00
Louis Dureuil
042d86cbb3
facet sort ascending/descending now also returns the values
2023-02-20 13:52:27 +01:00
bors[bot]
143e3cf948
Merge #3490
...
3490: Fix attributes set candidates r=curquiza a=ManyTheFish
# Pull Request
Fix attributes set candidates for v1.1.0
## details
The attribute criterion was not returning the remaining candidates when its internal algorithm had been exhausted.
We had a loss of candidates by the attribute criterion leading to the bug reported in the issue linked below.
After some investigation, it seems that it was the only criterion that had this behavior.
We are now returning the remaining candidates instead of an empty bitmap.
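The behaviour change can be pictured with the following sketch. It is not the attribute criterion itself: `BucketedCriterion` is a hypothetical type, the buckets are assumed to be precomputed, and a `BTreeSet<u32>` stands in for the Roaring bitmaps used by the real code.

```rust
use std::collections::BTreeSet;

/// Stand-in for a document-id bitmap (the real code uses Roaring bitmaps).
type DocIds = BTreeSet<u32>;

/// Hypothetical criterion that yields buckets of documents, best bucket first.
struct BucketedCriterion {
    buckets: Vec<DocIds>, // buckets produced by the internal algorithm, in ranking order
    next_bucket: usize,   // index of the next bucket to try
    remaining: DocIds,    // candidates that have not been returned yet
}

impl BucketedCriterion {
    fn new(candidates: DocIds, buckets: Vec<DocIds>) -> Self {
        Self { buckets, next_bucket: 0, remaining: candidates }
    }

    fn next_candidates(&mut self) -> Option<DocIds> {
        while self.next_bucket < self.buckets.len() {
            let bucket: DocIds = self.buckets[self.next_bucket]
                .intersection(&self.remaining)
                .copied()
                .collect();
            self.next_bucket += 1;
            if !bucket.is_empty() {
                for id in &bucket {
                    self.remaining.remove(id);
                }
                return Some(bucket);
            }
        }
        // Internal algorithm exhausted: hand over the leftover candidates
        // once, instead of an empty set, so that no document is lost.
        if self.remaining.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.remaining))
        }
    }
}
```

With candidates `1..=5` and buckets `{1, 2}` and `{3}`, the last non-empty answer is `{4, 5}`: the documents the buckets never mentioned.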
## Related issue
Fixes #3483
PR on milli for v1.0.1: https://github.com/meilisearch/milli/pull/777
Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-02-15 17:38:07 +00:00
Filip Bachul
a53536836b
fmt
2023-02-14 17:04:22 +01:00
Filip Bachul
d7ad39ad77
fix: clippy error
2023-02-14 00:15:35 +01:00
Filip Bachul
7481559e8b
move BadGeo to FilterError
2023-02-14 00:15:35 +01:00
Filip Bachul
83c765ce6c
implement From<ParseGeoError> for FilterError
2023-02-14 00:15:35 +01:00
Filip Bachul
825923f6fc
export ParseGeoError
2023-02-14 00:15:35 +01:00
Filip Bachul
e405702733
chore: introduce new error ParseGeoError type
2023-02-14 00:15:35 +01:00
ManyTheFish
6fa877efb0
Fix attributes set candidates
2023-02-13 17:49:52 +01:00
bors[bot]
c88c3637b4
Merge #3461
...
3461: Bring v1 changes into main r=curquiza a=Kerollmops
Also bring back changes in milli (the remote repository) into main done during the pre-release
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
Co-authored-by: curquiza <curquiza@users.noreply.github.com>
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Philipp Ahlner <philipp@ahlner.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2023-02-07 11:27:27 +00:00
Tamo
7a38fe624f
throw an error if the top left corner is found below the bottom right corner
2023-02-06 17:50:47 +01:00
Tamo
1b005f697d
update the syntax of the geoboundingbox filter to use brackets instead of parens around lat and lng
2023-02-06 16:50:27 +01:00
Kerollmops
fbec48f56e
Merge remote-tracking branch 'milli/main' into bring-v1-changes
2023-02-06 16:48:10 +01:00
Tamo
3ebc99473f
Apply suggestions from code review
...
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-02-06 13:29:37 +01:00
Tamo
d27007005e
comment the geoboundingbox + forbid the usage of the lexeme method which could introduce bugs
2023-02-06 11:36:49 +01:00
Tamo
fcb09ccc3d
add tests on the geoBoundingBox
2023-02-02 18:19:56 +01:00
Louis Dureuil
ae8660e585
Add Token::original_span rather than making Token::span pub
2023-02-02 15:03:34 +01:00
Guillaume Mourier
0d71c80ba6
add tests
2023-02-02 12:31:27 +01:00
Guillaume Mourier
b078477d80
Add error handling and earth lap collision with bounding box
2023-02-02 12:17:38 +01:00
ManyTheFish
0bc1a18f52
Use Languages list detected during indexing at search time
2023-02-01 18:57:43 +01:00
ManyTheFish
643d99e0f9
Add expectancy test
2023-02-01 18:39:54 +01:00
Louis Dureuil
20f05efb3c
clippy: needless_lifetimes
2023-01-31 11:12:59 +01:00
Louis Dureuil
3296cf7ae6
clippy: remove needless lifetimes
2023-01-31 09:32:40 +01:00
Louis Dureuil
4fd6fd9bef
Indicate filterable attributes when the user set a non filterable attribute in facet distributions
2023-01-19 12:25:18 +01:00
Clément Renault
1d507c84b2
Fix the formatting
2023-01-17 18:25:55 +01:00
Clément Renault
1b78231e18
Make clippy happy
2023-01-17 18:25:54 +01:00
Loïc Lecrenier
02fd06ea0b
Integrate deserr
2023-01-11 13:56:47 +01:00
bors[bot]
c3f4835e8e
Merge #733
...
733: Avoid a prefix-related worst-case scenario in the proximity criterion r=loiclec a=loiclec
# Pull Request
## Related issue
Somewhat fixes (until merged into meilisearch) https://github.com/meilisearch/meilisearch/issues/3118
## What does this PR do?
When a query ends with a word and a prefix, such as:
```
word pre
```
Then we first determine whether `pre` *could possibly* be in the proximity prefix database before querying it. There are then three possibilities:
1. `pre` is not in any prefix cache because it is not the prefix of many words. We don't query the proximity prefix database. Instead, we list all the word derivations of `pre` through the FST and query the regular proximity databases.
2. `pre` is in the prefix cache but cannot be found in the proximity prefix databases. **In this case, we partially disable the proximity ranking rule for the pair `word pre`.** This is done as follows:
    1. Only find the documents where `word` is in proximity to `pre` **exactly** (no derivations)
    2. Otherwise, assume that their proximity in all the documents in which they coexist is >= 8
3. `pre` is in the prefix cache and can be found in the proximity prefix databases. In this case we simply query the proximity prefix databases.
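The three possibilities listed above amount to a small decision table. The sketch below only encodes that table; `PrefixStrategy` and `choose_prefix_strategy` are hypothetical names, and the two booleans are assumed to have been computed beforehand from the prefix cache and the proximity prefix databases.

```rust
/// How the pair `word prefix` is resolved (hypothetical enum).
#[derive(Debug, PartialEq)]
enum PrefixStrategy {
    /// Case 1: list the word derivations of the prefix through the FST
    /// and query the regular proximity databases.
    ExpandDerivations,
    /// Case 2: only look for the prefix as an exact word next to `word`;
    /// otherwise assume a proximity >= 8 wherever both coexist.
    ExactOnlyThenAssumeFar,
    /// Case 3: query the proximity prefix databases directly.
    QueryProximityPrefixDb,
}

fn choose_prefix_strategy(
    in_prefix_cache: bool,
    in_proximity_prefix_db: bool,
) -> PrefixStrategy {
    match (in_prefix_cache, in_proximity_prefix_db) {
        (false, _) => PrefixStrategy::ExpandDerivations,
        (true, false) => PrefixStrategy::ExactOnlyThenAssumeFar,
        (true, true) => PrefixStrategy::QueryProximityPrefixDb,
    }
}
```

Only the second branch changes relevancy, since it is the one that partially disables the proximity ranking rule for the pair.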
Note that if a prefix is longer than 2 bytes, then it cannot be in the proximity prefix databases. Also, proximities larger than 4 are not present in these databases either. Therefore, the impact on relevancy is:
1. For common prefixes of one or two letters: we no longer distinguish between proximities from 4 to 8
2. For common prefixes of more than two letters: we no longer distinguish between any proximities
3. For uncommon prefixes: nothing changes
Regarding (1), it means that these two documents would be considered equally relevant according to the proximity rule for the query `heard pr` (IF `pr` is the prefix of more than 200 words in the dataset):
```json
[
{ "text": "I heard there is a faster proximity criterion" },
{ "text": "I heard there is a faster but less relevant proximity criterion" }
]
```
Regarding (2), it means that two documents would be considered equally relevant according to the proximity rule for the query "faster pro":
```json
[
{ "text": "I heard there is a faster but less relevant proximity criterion" },
{ "text": "I heard there is a faster proximity criterion" }
]
```
But the following document would be considered more relevant than the two documents above:
```json
{ "text": "I heard there is a faster swimmer who is competing in the pro section of the competition" }
```
Note, however, that this change of behaviour only occurs when using the set-based version of the proximity criterion. In cases where there are fewer than 1000 candidate documents when the proximity criterion is called, this PR does not change anything.
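A minimal sketch of the choice between the two implementations follows. The threshold of 1000 comes from the sentence above and the `dynamic` label from the benchmark notes in this PR; the names `ProximityImpl` and `proximity_impl_for`, and the exact comparison at the boundary, are assumptions.

```rust
/// Assumption: candidate count at which the set-based version takes over.
const SET_BASED_THRESHOLD: usize = 1000;

#[derive(Debug, PartialEq)]
enum ProximityImpl {
    /// Affected by the prefix handling described in this PR.
    SetBased,
    /// Unchanged by this PR.
    Dynamic,
}

/// Picks the implementation from the number of candidate documents.
fn proximity_impl_for(candidate_count: usize) -> ProximityImpl {
    if candidate_count < SET_BASED_THRESHOLD {
        ProximityImpl::Dynamic
    } else {
        ProximityImpl::SetBased
    }
}
```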
---
## Performance
I couldn't use the existing search benchmarks to measure the impact of the PR, but I did some manual tests with the `songs` benchmark dataset.
```
1. 10x 'a':
- 640ms ⟹ 630ms = no significant difference
2. 10x 'b':
- set-based: 4.47s ⟹ 7.42s = bad, ~2x regression
- dynamic: 1s ⟹ 870 ms = no significant difference
3. 'Someone I l':
- set-based: 250ms ⟹ 12 ms = very good, x20 speedup
- dynamic: 21ms ⟹ 11 ms = good, x2 speedup
4. 'billie e':
- set-based: 623ms ⟹ 2ms = very good, x300 speedup
- dynamic: ~4ms ⟹ 4ms = no difference
5. 'billie ei':
- set-based: 57ms ⟹ 20ms = good, ~2x speedup
- dynamic: ~4ms ⟹ ~2ms = no significant difference
6. 'i am getting o':
- set-based: 300ms ⟹ 60ms = very good, 5x speedup
- dynamic: 30ms ⟹ 6ms = very good, 5x speedup
7. 'prologue 1 a 1':
- set-based: 3.36s ⟹ 120ms = very good, 30x speedup
- dynamic: 200ms ⟹ 30ms = very good, 6x speedup
8. 'prologue 1 a 10':
- set-based: 590ms ⟹ 18ms = very good, 30x speedup
- dynamic: 82ms ⟹ 35ms = good, ~2x speedup
```
Performance is often significantly better, but there is also one regression in the set-based implementation with the query `b b b b b b b b b b`.
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2023-01-04 09:00:50 +00:00
bors[bot]
49f58b2c47
Merge #732
...
732: Interpret synonyms as phrases r=loiclec a=loiclec
# Pull Request
## Related issue
Fixes (when merged into meilisearch) https://github.com/meilisearch/meilisearch/issues/3125
## What does this PR do?
We now map multi-word synonyms to phrases instead of loose words, so that the request:
```
btw I am going to nyc soon
```
is interpreted as (when the synonym interpretation is chosen for both `btw` and `nyc`):
```
"by the way" I am going to "New York City" soon
```
instead of:
```
by the way I am going to New York City soon
```
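The rewriting shown above can be sketched as follows. This is only an illustration of the interpretation in which the synonym is chosen for every word that has one: `expand_synonyms_as_phrases` is a hypothetical helper, and the real engine keeps the original word as an alternative rather than replacing it in the query string.

```rust
use std::collections::HashMap;

/// Rewrites a query so that multi-word synonyms become quoted phrases.
/// Single-word synonyms are substituted as loose words.
fn expand_synonyms_as_phrases(query: &str, synonyms: &HashMap<&str, &str>) -> String {
    query
        .split_whitespace()
        .map(|word| match synonyms.get(word) {
            // Multi-word synonym: wrap it in quotes so it is searched as a phrase.
            Some(syn) if syn.split_whitespace().count() > 1 => format!("\"{}\"", syn),
            Some(syn) => syn.to_string(),
            None => word.to_string(),
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

With the mappings `btw -> by the way` and `nyc -> New York City`, the request `btw I am going to nyc soon` comes out with the two synonyms quoted, as in the second form above.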
This prevents queries containing multi-word synonyms from exceeding the word length limit and degrading search performance.
In terms of relevancy, there is a debate to have. I personally think this could be considered an improvement, since it would be strange for a user to search for:
```
good DIY project
```
and have a result such as:
```
{
"text": "whether it is a good project to do, you'll have to decide for yourself"
}
```
However, for synonyms such as `NYC -> New York City`, we will stop matching documents where `New York` is separated from `City`. This can be solved by adding an additional mapping: `NYC -> New York`.
## Performance
With the old behaviour, some long search requests making heavy use of synonyms could take minutes to execute. This is no longer the case: these search requests now take an average amount of time to be resolved.
Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
2023-01-04 08:34:18 +00:00
bors[bot]
6a10e85707
Merge #736
...
736: Update charabia r=curquiza a=ManyTheFish
Update Charabia to the latest version.
> We are now Romanizing Chinese characters into Pinyin.
> Note that we keep the accent because they are in fact never typed directly by the end-user, moreover, changing an accent leads to a different Chinese character, and I don't have sufficient knowledge to forecast the impact of removing accents in this context.
Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-01-03 15:44:41 +00:00
Loïc Lecrenier
b5df889dcb
Apply review suggestions: simplify implementation of exactness criterion
2023-01-02 13:11:47 +01:00
Loïc Lecrenier
8d36570958
Add explicit criterion impl strategy to proximity search tests
2023-01-02 10:37:01 +01:00
Loïc Lecrenier
32c6062e65
Optimise exactness criterion
...
1. Cache some results between calls to next()
2. Compute the combinations of exact words more efficiently
2022-12-22 12:28:45 +01:00
Loïc Lecrenier
f097aafa1c
Add unit test for prefix handling by the proximity criterion
2022-12-22 12:08:00 +01:00
Loïc Lecrenier
777b387dc4
Avoid a prefix-related worst-case scenario in the proximity criterion
2022-12-22 12:08:00 +01:00
Loïc Lecrenier
b0f3dc2c06
Interpret synonyms as phrases
2022-12-22 12:07:51 +01:00