Louis Dureuil
e9eb271499
Remove empty attribute_rule mod
2023-03-27 11:08:03 +02:00
Louis Dureuil
3281a88d08
SmallBitmap: don't expose internal items
2023-03-27 11:04:43 +02:00
Louis Dureuil
5a644054ab
Removed unused search impl
2023-03-27 11:04:27 +02:00
Louis Dureuil
16fefd364e
Add TODO notes
2023-03-27 11:04:04 +02:00
Loïc Lecrenier
00bad8c716
Add comments suggesting performance improvements
2023-03-23 10:18:24 +01:00
Loïc Lecrenier
862714a18b
Remove criterion_implementation_strategy param of Search
2023-03-23 09:44:12 +01:00
Loïc Lecrenier
d18ebe4f3a
Remove more warnings
2023-03-23 09:41:18 +01:00
Loïc Lecrenier
7169d85115
Remove old query_tree code and make clippy happy
2023-03-23 09:39:16 +01:00
Loïc Lecrenier
f5f5f03ec0
Remove old criteria code
2023-03-23 09:35:53 +01:00
Loïc Lecrenier
9b2653427d
Split position DB into fid and relative position DB
2023-03-23 09:22:01 +01:00
Loïc Lecrenier
56b7209f26
Make clippy happy
2023-03-23 09:16:17 +01:00
Loïc Lecrenier
9b1f439a91
WIP
2023-03-23 09:12:35 +01:00
Loïc Lecrenier
01c7d2de8f
Add example targets to the milli crate
2023-03-22 14:50:41 +01:00
Loïc Lecrenier
a86aeba411
WIP
2023-03-22 14:43:08 +01:00
Loïc Lecrenier
384fdc2df4
Fix two bugs in proximity ranking rule
2023-03-21 11:43:25 +01:00
Loïc Lecrenier
83e5b4ed0d
Compute edges of proximity graph lazily
2023-03-21 10:44:40 +01:00
Loïc Lecrenier
272cd7ebbd
Small cleanup
2023-03-20 13:39:19 +01:00
Loïc Lecrenier
c63c7377e6
Switch order of MappedInterner generic params
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
5b50e49522
cargo fmt
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
65474c8de5
Update new sort ranking rule after rebasing
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
fbb1ba3de0
Cargo fmt
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
a59ca28e2c
Add forgotten file
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
825f742000
Simplify graph-based ranking rule impl
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
dd491320e5
Simplify graph-based ranking rule impl
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
c6ff97a220
Rewrite the dead-ends cache to detect more dead-ends
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
49240c367a
Fix bug in cost of typo conditions
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
1e6e624078
Fix bug in SmallBitmap
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
8b4e07e1a3
WIP
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
2853009987
Renaming Edge -> Condition
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
aa59c3bc2c
Replace EdgeCondition with an Option<..> + other code cleanup
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
7b1d8f4c6d
Make PathSet strongly typed
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
a49ddec9df
Prune the query graph after executing a ranking rule
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
05fe856e6e
Merge forward and backward proximity conditions in proximity graph
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
c0cdaf9f53
Fix bug in the proximity ranking rule for queries with ngrams
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
e9cf58d584
Refactor of the Interner
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
31628c5cd4
Merge Phrase and WordDerivations into one structure
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
3004e281d7
Support ngram typos + splitwords and splitwords+synonyms in proximity
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
14e8d0aaa2
Rename lifetime
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
1c58cf8426
Intern ranking rule graph edge conditions as well
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
5155fd2bf1
Reorganise initialisation of ranking rules + rename PathsMap -> PathSet
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
9ec9c204d3
Small code cleanup
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
78b9304d52
Implement distinct attribute
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
0465ba4a05
Intern more values
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
2099991dd1
Continue documenting and cleaning up the code
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
c232cdabf5
Add documentation
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
4e266211bf
Small code reorganisation
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
57fa689131
Cargo fmt
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
10626dddfc
Add a few more optimisations to new search algorithms
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
9051065c22
Apply a few optimisations for graph-based ranking rules
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
e8c76cf7bf
Intern all strings and phrases in the search logic
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
3f1729a17f
Update new search test
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
cab2b6bcda
Fix: computation of initial universe, code organisation
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
c4979a2fda
Fix code visibility issue + unimplemented detail in proximity rule
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
23931f8a4f
Fix small bug in visual logger of search algo
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
aa414565bb
Fix proximity graph edge builder to include all proximities
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
1db152046e
WIP on split words and synonyms support
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
c27ea2677f
Rewrite cheapest path algorithm and empty path cache
...
It is now much simpler and has much better performance.
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
caa1e1b923
Add typo ranking rule to new search impl
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
71f18e4379
Add sort ranking rule to new search impl
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
600e3dd1c5
Remove warnings
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
362eb0de86
Add support for filters
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
998d46ac10
Add support for search offset and limit
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
6c85c0d95e
Fix more bugs + visual empty path cache logging
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
0e1fbbf7c6
Fix bugs in query graph's "remove word" and "cheapest paths" algos
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
6806640ef0
Fix d2 description of paths map
2023-03-20 09:41:56 +01:00
Loïc Lecrenier
173e37584c
Improve the visual/detailed search logger
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
6ba4d5e987
Add a search logger
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
dd12d44134
Support swapped word pairs in new proximity ranking rule impl
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
c8e251bf24
Remove noise in codebase
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
a938fbde4a
Use a cache when resolving the query graph
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
dcf3f1d18a
Remove EdgeIndex and NodeIndex types, prefer u32 instead
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
66d0c63694
Add some documentation and use bitmaps instead of hashmaps when possible
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
132191360b
Introduce the sort ranking rule working with the new search structures
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
345c99d5bd
Introduce the words ranking rule working with the new search structures
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
89d696c1e3
Introduce the proximity ranking rule as a graph-based ranking rule
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
c645853529
Introduce a generic graph-based ranking rule
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
a70ab8b072
Introduce a function to find the K shortest paths in a graph
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
48aae76b15
Introduce a function to find the docids of a set of paths in a graph
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
23bf572dea
Introduce cache structures used with ranking rule graphs
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
864f6410ed
Introduce a structure to represent a set of graph paths efficiently
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
c9bf6bb2fa
Introduce a structure to implement ranking rules with graph algorithms
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
46249ea901
Implement a function to find a QueryGraph's docids
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
ce0d1e0e13
Introduce a common way to manage the coordination between ranking rules
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
5065d8b0c1
Introduce a DatabaseCache to memorize the addresses of LMDB values
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
a83007c013
Introduce structure to represent search queries as graphs
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
79e0a6dd4e
Introduce a new search module, eventually meant to replace the old one
...
The code here does not compile, because I am merely splitting one giant
commit into smaller ones where each commit explains a single file.
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
2d88089129
Remove unused term matching strategies
2023-03-20 09:41:55 +01:00
Loïc Lecrenier
6c659dc12f
Use MiMalloc in milli tests
2023-03-20 09:41:37 +01:00
Tamo
0f33a65468
makes kero happy
2023-03-13 16:51:11 +01:00
Tamo
eddefb0e0f
refactor the error type of the milli::document thing
...
silence a warning
2023-03-09 13:03:14 +01:00
Tamo
c5f22be6e1
add boolean support for csv documents
2023-03-09 11:12:49 +01:00
bors[bot]
4f1ccbc495
Merge #3525
...
3525: Fix phrase search containing stop words r=ManyTheFish a=ManyTheFish
# Summary
A search with a phrase containing only stop words was returning an HTTP error 500,
this PR filters the phrase containing only stop words dropping them before the search starts, a query with a phrase containing only stop words now behaves like a placeholder search.
fixes https://github.com/meilisearch/meilisearch/issues/3521
related v1.0.2 PR on milli: https://github.com/meilisearch/milli/pull/779
Co-authored-by: ManyTheFish <many@meilisearch.com>
2023-03-02 10:55:37 +00:00
ManyTheFish
37489fd495
Return an internal error in the case of matching word is invalid
2023-03-01 19:05:16 +01:00
Louis Dureuil
5822764be9
Skip computing index budget in tests
2023-02-23 11:23:39 +01:00
bors[bot]
ac5a1e4c4b
Merge #3423
...
3423: Add min and max facet stats r=dureuill a=dureuill
# Pull Request
## Related issue
Fixes #3426
## What does this PR do?
### User standpoint
- When using a `facets` parameter in search, the facets that have numeric values are displayed in a new section of the response called `facetStats` that contains, per facet, the numeric min and max value of the hits returned by the search.
<details>
<summary>
Sample request/response
</summary>
```json
❯ curl \
-X POST 'http://localhost:7700/indexes/meteorites/search?facets=mass ' \
-H 'Content-Type: application/json' \
--data-binary '{ "q": "LL6", "facets":["mass", "recclass"], "limit": 5 }' | jsonxf
{
"hits": [
{
"name": "Niger (LL6)",
"id": "16975",
"nametype": "Valid",
"recclass": "LL6",
"mass": 3.3,
"fall": "Fell"
},
{
"name": "Appley Bridge",
"id": "2318",
"nametype": "Valid",
"recclass": "LL6",
"mass": 15000,
"fall": "Fell",
"_geo": {
"lat": 53.58333,
"lng": -2.71667
}
},
{
"name": "Athens",
"id": "4885",
"nametype": "Valid",
"recclass": "LL6",
"mass": 265,
"fall": "Fell",
"_geo": {
"lat": 34.75,
"lng": -87.0
}
},
{
"name": "Bandong",
"id": "4935",
"nametype": "Valid",
"recclass": "LL6",
"mass": 11500,
"fall": "Fell",
"_geo": {
"lat": -6.91667,
"lng": 107.6
}
},
{
"name": "Benguerir",
"id": "30443",
"nametype": "Valid",
"recclass": "LL6",
"mass": 25000,
"fall": "Fell",
"_geo": {
"lat": 32.25,
"lng": -8.15
}
}
],
"query": "LL6",
"processingTimeMs": 15,
"limit": 5,
"offset": 0,
"estimatedTotalHits": 42,
"facetDistribution": {
"mass": {
"110000": 1,
"11500": 1,
"1161": 1,
"12000": 1,
"1215.5": 1,
"127000": 1,
"15000": 1,
"1676": 1,
"1700": 1,
"1710.5": 1,
"18000": 1,
"19000": 1,
"220000": 1,
"2220": 1,
"22300": 1,
"25000": 2,
"265": 1,
"271000": 1,
"2840": 1,
"3.3": 1,
"3000": 1,
"303": 1,
"32000": 1,
"34000": 1,
"36.1": 1,
"45000": 1,
"460": 1,
"478": 1,
"483": 1,
"5500": 2,
"600": 1,
"6000": 1,
"67.8": 1,
"678": 1,
"680.5": 1,
"6930": 1,
"8": 1,
"8300": 1,
"840": 1,
"8400": 1
},
"recclass": {
"L/LL6": 3,
"LL6": 39
}
},
"facetStats": {
"mass": {
"min": 3.3,
"max": 271000.0
}
}
}
```
</details>
## PR checklist
Please check if your PR fulfills the following requirements:
- [ ] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [ ] Have you read the contributing guidelines?
- [ ] Have you made sure that the title is accurate and descriptive of the changes?
Thank you so much for contributing to Meilisearch!
Co-authored-by: Louis Dureuil <louis@meilisearch.com>
2023-02-22 13:06:43 +00:00
ManyTheFish
900bae3d9d
keep phrases that has at least one word
2023-02-21 18:16:51 +01:00
ManyTheFish
28b7d73d4a
Remove an unefficient part of a test on milli
2023-02-21 18:16:51 +01:00
bors[bot]
39407885c2
Merge #3347
...
3347: Enhance language detection r=irevoire a=ManyTheFish
## Summary
Some completely unrelated Languages can share the same characters, in Meilisearch we detect the Languages using `whatlang`, which works well on large texts but fails on small search queries leading to a bad segmentation and normalization of the query.
This PR now stores the Languages detected during the indexing in order to reduce the Languages list that can be detected during the search.
## Detail
- Create a 19th database mapping the scripts and the Languages detected with the documents where the Language is detected
- Fill the newly created database during indexing
- Create an allow-list with this database and pass it to Charabia
- Add a test ensuring that a Japanese request containing kanjis only is detected as Japanese and not Chinese
## Related issues
Fixes #2403
Fixes #3513
Co-authored-by: f3r10 <frledesma@outlook.com>
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
2023-02-21 10:52:13 +00:00
ManyTheFish
bbecab8948
fix clippy
2023-02-21 10:18:44 +01:00
ManyTheFish
8aa808d51b
Merge branch 'main' into enhance-language-detection
2023-02-20 18:14:34 +01:00