Loïc Lecrenier
392472f4bb
Apply suggestions from code review
...
Co-authored-by: Tamo <tamo@meilisearch.com>
2022-07-19 10:07:33 +02:00
Loïc Lecrenier
453d593ce8
Add a database containing the docids where each field exists
2022-07-19 10:07:33 +02:00
Loïc Lecrenier
fc9f3f31e7
Change DocumentsBatchReader to access cursor and index at same time
...
Otherwise it is not possible to iterate over all documents while
using the fields index at the same time.
2022-07-18 16:08:14 +02:00
Loïc Lecrenier
ab1571cdec
Simplify Transform::read_documents, enabled by enriched documents reader
2022-07-18 12:45:47 +02:00
Kerollmops
448114cc1c
Fix the benchmarks with the new indexation API
2022-07-12 15:22:09 +02:00
Kerollmops
25e768f31c
Fix another issue with the nested primary key selector
2022-07-12 15:14:07 +02:00
Kerollmops
192793ee38
Add some tests to check for the nested documents ids
2022-07-12 15:14:07 +02:00
Kerollmops
dc61105554
Fix the nested document id fetching function
2022-07-12 15:14:06 +02:00
Kerollmops
2eec290424
Check the validity of the latitute and longitude numbers
2022-07-12 15:14:06 +02:00
Kerollmops
5d149d631f
Remove tests for a function that no more exists
2022-07-12 15:14:06 +02:00
Kerollmops
0bbcc7b180
Expose the DocumentId
struct to be sure to inject the generated ids
2022-07-12 15:14:06 +02:00
Kerollmops
d1a4da9812
Generate a real UUIDv4 when ids are auto-generated
2022-07-12 15:14:06 +02:00
Kerollmops
c8ebf0de47
Rename the validate function as an enriching function
2022-07-12 15:14:06 +02:00
Kerollmops
905af2a2e9
Use the primary key and external id in the transform
2022-07-12 15:14:05 +02:00
Kerollmops
742543091e
Constify the default primary key name
2022-07-12 14:55:52 +02:00
Kerollmops
5f1bfb73ee
Extract the primary key name and make it accessible
2022-07-12 14:55:52 +02:00
Kerollmops
6a0a0ae94f
Make the Transform read from an EnrichedDocumentsBatchReader
2022-07-12 14:55:52 +02:00
Kerollmops
8ebf5eed0d
Make the nested primary key work
2022-07-12 14:55:52 +02:00
Kerollmops
19eb3b4708
Make sur that we do not accept floats as documents ids
2022-07-12 14:55:52 +02:00
Kerollmops
2ceeb51c37
Support the auto-generated ids when validating documents
2022-07-12 14:55:51 +02:00
Kerollmops
399eec5c01
Fix the indexation tests
2022-07-12 14:55:51 +02:00
Kerollmops
fcfc4caf8c
Move the Object type in the lib.rs file and use it everywhere
2022-07-12 14:55:51 +02:00
Kerollmops
0146175fe6
Introduce the validate_documents_batch function
2022-07-12 14:55:51 +02:00
Kerollmops
bdc4263883
Introduce the validate_documents_batch function
2022-07-12 14:55:51 +02:00
Kerollmops
e8297ad27e
Fix the tests for the new DocumentsBatchBuilder/Reader
2022-07-12 14:52:56 +02:00
bors[bot]
ebddfdb9a3
Merge #578
...
578: Bump uuid to 1.1.2 r=ManyTheFish a=Kerollmops
Just to [align the version with Meilisearch](https://github.com/meilisearch/meilisearch/pull/2584 ).
Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-07-05 14:56:08 +00:00
Kerollmops
1bfdcfc84f
Bump uuid to 1.1.2
2022-07-05 16:23:36 +02:00
Tamo
eaf28b0628
Apply review suggestions
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-07-05 15:30:33 +02:00
Tamo
3b309f654a
Fasten the document deletion
...
When a document deletion occurs, instead of deleting the document we mark it as deleted
in the new “soft deleted” bitmap. It is then removed from the search, and all the other
endpoints.
2022-07-05 15:30:33 +02:00
Tamo
d0aaa7ff00
Fix wrong internal ids assignments
2022-06-07 15:49:33 +02:00
ad hoc
31776fdc3f
add failing test
2022-06-07 15:49:33 +02:00
ManyTheFish
86ac8568e6
Use Charabia in milli
2022-06-02 16:59:11 +02:00
bors[bot]
08c6d50cd1
Merge #531
...
531: fix the mixed dataset geosearch indexing bug r=Kerollmops a=irevoire
port #529 to main
Co-authored-by: Tamo <tamo@meilisearch.com>
2022-05-16 16:06:36 +00:00
Tamo
0af399a6d7
fix the mixed dataset geosearch indexing bug
2022-05-16 17:37:45 +02:00
Tamo
f586028f9a
fix the searchable fields bug when a field is nested
...
Update milli/src/index.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-05-16 17:24:36 +02:00
bors[bot]
65e6aa0de2
Merge #523
...
523: Improve geosearch error messages r=irevoire a=irevoire
Improve the geosearch error messages (#488 ).
And try to parse the string as specified in https://github.com/meilisearch/meilisearch/issues/2354
Co-authored-by: Tamo <tamo@meilisearch.com>
2022-05-04 13:36:11 +00:00
Tamo
c55368ddd4
apply code suggestion
...
Co-authored-by: Kerollmops <kero@meilisearch.com>
2022-05-04 14:11:03 +02:00
Kerollmops
211c8763b9
Make sure that we do not generate too long keys
2022-05-03 10:03:15 +02:00
Kerollmops
7e47031bdc
Add a test for long keys in LMDB
2022-05-03 10:03:13 +02:00
Tamo
3cb1f6d0a1
improve geosearch error messages
2022-05-02 19:20:47 +02:00
Tamo
f19d2dc548
Only flatten the required fields
...
apply review comments
Co-authored-by: Kerollmops <kero@meilisearch.com>
2022-04-26 12:33:46 +02:00
Clément Renault
eb5830aa40
Add a test to make sure that long words are handled
2022-04-21 13:45:28 +02:00
Tamo
00f78d6b5a
Apply code suggestions
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-04-14 11:14:08 +02:00
Tamo
399fba16bb
only flatten an object if it's nested
2022-04-14 11:14:08 +02:00
Tamo
ee64f4a936
Use smartstring to store the external id in our hashmap
...
We need to store all the external id (primary key) in a hashmap
associated to their internal id during.
The smartstring remove heap allocation / memory usage and should
improve the cache locality.
2022-04-13 21:22:07 +02:00
Irevoire
4f3ce6d9cd
nested fields
2022-04-07 16:58:46 +02:00
ad hoc
b799f3326b
rename merge_nothing to merge_ignore_values
2022-04-05 18:44:35 +02:00
ad hoc
201fea0fda
limit extract_word_docids memory usage
2022-04-05 14:14:15 +02:00
ad hoc
b85cd4983e
remove field_id_from_position
2022-04-05 09:50:34 +02:00
ad hoc
b7694c34f5
remove println
2022-04-04 21:00:07 +02:00
ad hoc
6cabd47c32
fix typo in comment
2022-04-04 20:59:20 +02:00
ad hoc
6b2c2509b2
fix bug in exact search
2022-04-04 20:54:03 +02:00
ad hoc
e8f06f6c06
extract exact_word_prefix_docids
2022-04-04 20:54:03 +02:00
ad hoc
ba0bb29cd8
refactor WordPrefixDocids to take dbs instead of indexes
2022-04-04 20:54:02 +02:00
ad hoc
c4c6e35352
query exact_word_docids in resolve_query_tree
2022-04-04 20:54:02 +02:00
ad hoc
8d46a5b0b5
extract exact word docids
2022-04-04 20:54:02 +02:00
ad hoc
0a77be4ec0
introduce exact_word_docids db
2022-04-04 20:54:02 +02:00
ad hoc
5f9f82757d
refactor spawn_extraction_task
2022-04-04 20:54:02 +02:00
Kerollmops
d5b8b5a2f8
Replace the ugly unwraps by clean if let Somes
2022-02-28 16:31:33 +01:00
Kerollmops
8d26f3040c
Remove a useless grenad file merging
2022-02-28 16:31:33 +01:00
Clément Renault
04b1bbf932
Reintroduce appending sorted entries when possible
2022-02-24 14:50:45 +01:00
bors[bot]
25123af3b8
Merge #436
...
436: Speed up the word prefix databases computation time r=Kerollmops a=Kerollmops
This PR depends on the fixes done in #431 and must be merged after it.
In this PR we will bring the `WordPrefixPairProximityDocids`, `WordPrefixDocids` and, `WordPrefixPositionDocids` update structures to a new era, a better era, where computing the word prefix pair proximities costs much fewer CPU cycles, an era where this update structure can use the, previously computed, set of new word docids from the newly indexed batch of documents.
---
The `WordPrefixPairProximityDocids` is an update structure, which means that it is an object that we feed with some parameters and which modifies the LMDB database of an index when asked for. This structure specifically computes the list of word prefix pair proximities, which correspond to a list of pairs of words associated with a proximity (the distance between both words) where the second word is not a word but a prefix e.g. `s`, `se`, `a`. This word prefix pair proximity is associated with the list of documents ids which contains the pair of words and prefix at the given proximity.
The origin of the performances issue that this struct brings is related to the fact that it starts its job from the beginning, it clears the LMDB database before rewriting everything from scratch, using the other LMDB databases to achieve that. I hope you understand that this is absolutely not an optimized way of doing things.
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-02-16 15:41:14 +00:00
Clément Renault
ff8d7a810d
Change the behavior of the as_cloneable_grenad by taking a ref
2022-02-16 15:40:08 +01:00
Clément Renault
f367cc2e75
Finally bump grenad to v0.4.1
2022-02-16 15:28:48 +01:00
Many
d59bcea749
Revert "Revert "Change chunk size to 4MiB to fit more the end user usage""
2022-02-02 17:01:13 +01:00
Kerollmops
fb79c32430
Compute the new, common and, deleted prefix words fst once
2022-01-27 11:00:18 +01:00
Clément Renault
51d1e64b23
Remove, now useless, the WriteMethod enum
2022-01-27 10:08:35 +01:00
Clément Renault
e9c02173cf
Rework the WordsPrefixPositionDocids update to compute a subset of the database
2022-01-27 10:08:35 +01:00
Clément Renault
d59e559317
Fix the computation of the newly added and common prefix words
2022-01-27 10:08:34 +01:00
Clément Renault
28692f65be
Rework the WordPrefixDocids update to compute a subset of the database
2022-01-27 10:08:34 +01:00
Clément Renault
5404bc02dd
Move the fst_stream_into_hashset method in the helper methods
2022-01-27 10:06:00 +01:00
Clément Renault
822f67e9ad
Bring the newly created word pair proximity docids
2022-01-27 10:06:00 +01:00
Clément Renault
d28f18658e
Retrieve the previous version of the words prefixes FST
2022-01-27 10:05:59 +01:00
bors[bot]
fd177b63f8
Merge #423
...
423: Remove an unused file r=irevoire a=irevoire
This empty file is not included anywhere
Co-authored-by: Tamo <tamo@meilisearch.com>
2022-01-19 14:18:05 +00:00
Marin Postma
0c84a40298
document batch support
...
reusable transform
rework update api
add indexer config
fix tests
review changes
Co-authored-by: Clément Renault <clement@meilisearch.com>
fmt
2022-01-19 12:40:20 +01:00
Tamo
98a365aaae
store the geopoint in three dimensions
2021-12-14 12:21:24 +01:00
Tamo
d671d6f0f1
remove an unused file
2021-12-13 19:27:34 +01:00
many
8970246bc4
Sort positions before iterating over them during word pair proximity extraction
2021-11-22 18:16:54 +01:00
Marin Postma
6eb47ab792
remove update_id in UpdateBuilder
2021-11-16 13:07:04 +01:00
Marin Postma
09b4281cff
improve document addition returned metaimprove document addition
...
returned metaimprove document addition returned metaimprove document
addition returned metaimprove document addition returned metaimprove
document addition returned metaimprove document addition returned
metaimprove document addition returned meta
2021-11-10 14:08:36 +01:00
many
3599df77f0
Change some error messages
2021-10-27 19:33:01 +02:00
marin postma
baddd80069
implement review suggestions
2021-10-25 18:29:12 +02:00
marin postma
430e9b13d3
add csv builder tests
2021-10-25 10:26:43 +02:00
marin postma
2e62925a6e
fix tests
2021-10-25 10:26:42 +02:00
marin postma
0f86d6b28f
implement csv serialization
2021-10-25 10:26:42 +02:00
marin postma
8d70b01714
optimize document deserialization
2021-10-25 10:26:42 +02:00
bors[bot]
c7db4176f3
Merge #384
...
384: Replace memmap with memmap2 r=Kerollmops a=palfrey
[memmap is unmaintained](https://rustsec.org/advisories/RUSTSEC-2020-0077.html ) and needs replacing. memmap2 is a drop-in replacement fork that's well maintained. Note that the version numbers got reset on fork, hence the lower values.
Co-authored-by: Tom Parker-Shemilt <palfrey@tevp.net>
2021-10-13 13:47:23 +00:00
bors[bot]
6e3b869e6a
Merge #388
...
388: fix primary key inference r=MarinPostma a=MarinPostma
The primary key is was infered from a hashtable index of the field. For this reason the order in which the fields were interated upon was not deterministic, and the primary key was chosed ffrom the first field containing "id".
This fix sorts the the index by field_id when infering the primary key.
Co-authored-by: mpostma <postma.marin@protonmail.com>
2021-10-12 09:25:16 +00:00
mpostma
86ead92ed5
infer primary key on sorted fields
2021-10-12 11:15:11 +02:00
mpostma
9a266a531b
test correct primary key inference
2021-10-12 11:08:53 +02:00
many
c5a6075484
Make max_position_per_attributes changable
2021-10-12 10:10:50 +02:00
many
360c5ff3df
Remove limit of 1000 position per attribute
...
Instead of using an arbitrary limit we encode the absolute position in a u32
using one strong u16 for the field id and a weak u16 for the relative position in the attribute.
2021-10-12 10:10:50 +02:00
Tom Parker-Shemilt
2dfe24f067
memmap -> memmap2
2021-10-10 22:47:12 +01:00
many
3296bb243c
Simplify word level position DB into a word position DB
2021-10-05 12:15:02 +02:00
Many
26b5dad042
Revert "Change chunk size to 4MiB to fit more the end user usage"
2021-09-29 15:08:39 +02:00
Tamo
f65153ad64
stop casting integer docids to string
2021-09-28 18:35:54 +02:00
many
1988416295
Add failing test related to Meilisearch#1714
2021-09-28 12:05:11 +02:00
many
b188063869
Change chunk size to 4MiB to fit more the end user usage
2021-09-27 14:26:21 +02:00
many
551df0cb77
Add test checking the bug reported in meilisearch issue 1716
2021-09-23 15:55:39 +02:00
mpostma
aa6c5df0bc
Implement documents format
...
document reader transform
remove update format
support document sequences
fix document transform
clean transform
improve error handling
add documents! macro
fix transform bug
fix tests
remove csv dependency
Add comments on the transform process
replace search cli
fmt
review edits
fix http ui
fix clippy warnings
Revert "fix clippy warnings"
This reverts commit a1ce3cd96e603633dbf43e9e0b12b2453c9c5620.
fix review comments
remove smallvec in transform loop
review edits
2021-09-21 16:58:33 +02:00