Clément Renault
5606679c53
Use the obkv and grenad crates.io versions
2024-11-25 16:24:59 +01:00
Clément Renault
a3103f347e
Fix the facet f64 database name
2024-11-25 16:05:31 +01:00
Clément Renault
25aac45fc7
Expose better error messages
2024-11-25 15:54:43 +01:00
Louis Dureuil
323ecbb885
Add span on document operation
2024-11-21 17:01:10 +01:00
Louis Dureuil
dcc3caef0d
Remove TopLevelMap
2024-11-21 16:56:46 +01:00
Louis Dureuil
221e547e86
Slight changes
2024-11-21 16:47:44 +01:00
Clément Renault
61d0615253
Document the geo point extractor
2024-11-21 16:47:08 +01:00
Clément Renault
5727e00374
Remove useless geo skipped
2024-11-21 16:47:08 +01:00
Clément Renault
9b60843831
Remove commented lines
2024-11-21 16:47:07 +01:00
ManyTheFish
36962b943b
First batch of PR comment
2024-11-21 16:38:11 +01:00
Louis Dureuil
32bcacefd5
Changes Document::len to Document::top_level_fields_count
2024-11-21 15:01:07 +01:00
Louis Dureuil
4ed195426c
remove unused stuff in global.rs
2024-11-21 15:01:07 +01:00
ManyTheFish
94b260fd25
Remove orphan span
2024-11-21 12:12:07 +01:00
Clément Renault
ab2c83f868
Use the disk less when computing prefixes
2024-11-21 10:45:37 +01:00
Louis Dureuil
6e6acfcf1b
Merge branch 'main' into indexer-edition-2024
2024-11-20 16:59:58 +01:00
Louis Dureuil
e0864f1b21
Separate side effect and debug asserts
2024-11-20 16:25:17 +01:00
Clément Renault
a38344acb3
Replace eprintlns by tracing
2024-11-20 15:29:51 +01:00
ManyTheFish
4d616f8794
Parse every attribute and filter before tokenization
2024-11-20 15:15:25 +01:00
Louis Dureuil
ff9c92c409
rename documents -> substep
2024-11-20 15:12:02 +01:00
Clément Renault
8380ddbdcd
Fix progress of into_changes
2024-11-20 15:10:09 +01:00
Louis Dureuil
867138f166
Add SP to into_changes
2024-11-20 15:07:05 +01:00
Clément Renault
567bd4538b
Fix the into_changes stop processing
2024-11-20 14:58:25 +01:00
Louis Dureuil
84600a10d1
Add MSP to document_update.into_changes()
2024-11-20 14:53:37 +01:00
Louis Dureuil
7d64e8dbd3
Fix Windows compilation
2024-11-20 14:40:38 +01:00
Louis Dureuil
cae8c89467
"fix" last warnings
2024-11-20 14:03:52 +01:00
Clément Renault
7cb8732b45
Introduce a new bincode internal error
2024-11-20 13:23:11 +01:00
ManyTheFish
fe5d50969a
Fix field selector in extractors
2024-11-20 13:16:44 +01:00
Clément Renault
56c7c5d5f0
Fix comments
2024-11-20 13:16:44 +01:00
Louis Dureuil
4cdfdddd6d
Fix one more
2024-11-20 13:16:43 +01:00
Louis Dureuil
2afa33011a
Fix tokenize_document
2024-11-20 13:16:43 +01:00
Louis Dureuil
61feca1f41
More tests pass
2024-11-20 13:16:43 +01:00
Louis Dureuil
f893b5153e
Don't mark [""] as empty facet
2024-11-20 13:16:42 +01:00
Louis Dureuil
ca779c21f9
facets: Handle boolean and skip empty strings
2024-11-20 13:16:42 +01:00
Louis Dureuil
477077bdc2
Remove `_vectors` from fid map when there are no vectors in sight
2024-11-20 13:16:42 +01:00
ManyTheFish
b1f8aec348
Fix index_documents_check_exists_database
2024-11-20 13:16:41 +01:00
ManyTheFish
ba7f091db3
Use tokenizer on numbers and booleans
2024-11-20 13:16:41 +01:00
Louis Dureuil
8049df125b
Add depth to facet extraction so that null inside an array doesn't mark the entire field as null
2024-11-20 13:16:40 +01:00
Clément Renault
50d1bd01df
We no longer index geo lat and lng
2024-11-20 13:16:40 +01:00
Louis Dureuil
a28d4f5d0c
Fix setup_search_index_with_criteria
2024-11-20 13:16:40 +01:00
Louis Dureuil
fc14f4bc66
Attempt to fix setup_search_index_with_criteria
2024-11-20 13:16:39 +01:00
Clément Renault
5f8a82d6f5
Improve test
2024-11-20 13:16:39 +01:00
Clément Renault
fe04e51a49
One more
2024-11-20 13:16:38 +01:00
Clément Renault
01b27e40ad
Fix a bit of the placeholder search tests
2024-11-20 13:16:38 +01:00
Louis Dureuil
8076d98544
Fix stats_should_not_return_deleted_documents
2024-11-20 13:16:37 +01:00
Louis Dureuil
9e951baad5
One more test passing
2024-11-20 13:16:37 +01:00
Louis Dureuil
52f2fc4c46
Fail in case of user error in tests
2024-11-20 13:16:37 +01:00
Clément Renault
3957917e0b
Correctly count indexed documents
2024-11-20 13:16:36 +01:00
Louis Dureuil
651c30899e
Allow fetching embedders from inside tests
2024-11-20 13:16:36 +01:00
Clément Renault
2c7a7fe4e8
Count the number of documents correctly
2024-11-20 13:16:35 +01:00
Clément Renault
23f0c2c29b
Generate internal ids only when needed
2024-11-20 13:16:35 +01:00
Louis Dureuil
6641c3f59b
Remove all autogenerated tests
2024-11-20 13:16:34 +01:00
Louis Dureuil
07a72824b7
Subfields of `_vectors` are no longer part of the fid map
2024-11-20 13:16:34 +01:00
Louis Dureuil
000eb55c4e
fix one
2024-11-20 13:16:34 +01:00
Louis Dureuil
1aef0e4037
documents! macro accepts a single object again
2024-11-20 13:16:33 +01:00
Clément Renault
32d0e50a75
Fix all the benchmark compilation errors
2024-11-20 13:16:32 +01:00
Louis Dureuil
df5884b0c1
Fix settings test
2024-11-20 13:16:32 +01:00
Louis Dureuil
9e0eb5ebb0
Removed some warnings
2024-11-20 13:16:32 +01:00
Clément Renault
3cf1352ae1
Fix the benchmark tests
2024-11-20 13:16:31 +01:00
Clément Renault
aba8a0e9e0
Fix some tests but not all of them
2024-11-20 13:16:31 +01:00
Clément Renault
670aff5553
Remove useless Transform methods
2024-11-20 13:16:08 +01:00
Tamo
229fa0f902
implements the batch details
2024-11-20 10:51:06 +01:00
Lukas Kalbertodt
057fcb3993
Add `indices` field to `_matchesPosition` to specify where in an array a match comes from (#5005)
...
* Remove unreachable code
* Add `indices` field to `MatchBounds`
For matches inside arrays, this field holds the indices of the array
elements that matched. For example, searching for `cat` inside
`{ "a": ["dog", "cat", "fox"] }` would return `indices: [1]`. For nested
arrays, this contains multiple indices, starting with the one for the
top-most array. For matches in fields without arrays, `indices` is not
serialized (does not exist) to save space.
2024-11-20 01:00:43 +01:00
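The `indices` behaviour described in the commit above can be illustrated with a minimal sketch. It is not the Meilisearch implementation: the `Value` enum and the `match_indices` function are hypothetical names introduced here, and a plain substring test stands in for real match detection. The sketch only shows how an index path, outermost array first, could be collected for each matching element.

```rust
/// Hypothetical stand-in for a JSON field value (not Meilisearch's real type).
#[derive(Debug)]
enum Value {
    Str(String),
    Array(Vec<Value>),
}

/// For every string containing `needle`, collects the array indices that lead
/// to it, starting with the index in the top-most array.
/// (Assumption: substring matching replaces the real tokenizer-based matching.)
fn match_indices(value: &Value, needle: &str) -> Vec<Vec<usize>> {
    fn walk(value: &Value, needle: &str, path: &mut Vec<usize>, out: &mut Vec<Vec<usize>>) {
        match value {
            Value::Str(s) => {
                if s.contains(needle) {
                    // Record the path of array indices that led to this match.
                    out.push(path.clone());
                }
            }
            Value::Array(items) => {
                for (i, item) in items.iter().enumerate() {
                    path.push(i);
                    walk(item, needle, path, out);
                    path.pop();
                }
            }
        }
    }

    let mut out = Vec::new();
    walk(value, needle, &mut Vec::new(), &mut out);
    out
}

fn main() {
    // Mirrors the example from the commit message: { "a": ["dog", "cat", "fox"] }
    let field = Value::Array(vec![
        Value::Str("dog".into()),
        Value::Str("cat".into()),
        Value::Str("fox".into()),
    ]);
    println!("{:?}", match_indices(&field, "cat")); // prints [[1]]
}
```

A match in a field without arrays yields an empty path in this sketch, which corresponds to the case where the commit message says `indices` is not serialized.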
ManyTheFish
41dbdd2d18
Fix filtered_placeholder_search_should_not_return_deleted_documents and word_scale_set_and_reset
2024-11-19 16:08:25 +01:00
Louis Dureuil
c782c09208
Move step to a dedicated mod and replace it with an enum
2024-11-18 18:22:13 +01:00
Louis Dureuil
75943a5a9b
Add TODO to remember replacing steps with an enum
2024-11-18 17:40:51 +01:00
Louis Dureuil
04c38220ca
Move MostlySend, ThreadLocal, FullySend to their own commit
2024-11-18 16:43:05 +01:00
Louis Dureuil
5f93651cef
fixes
2024-11-18 16:23:11 +01:00
ManyTheFish
510ca99996
Fixes #4974
2024-11-18 16:08:55 +01:00
ManyTheFish
e0c3f3d560
Fix #4984
2024-11-18 16:08:53 +01:00
Louis Dureuil
0a21d9bfb3
Fix double borrow of new fields id map
2024-11-18 15:56:01 +01:00
Louis Dureuil
e736a74729
Remove infinite loop in import_vectors
2024-11-18 12:50:56 +01:00
Louis Dureuil
e9d17136b2
Add deadline of 3 seconds to embedding requests made in the context of hybrid search
2024-11-18 12:15:11 +01:00
ManyTheFish
cd796b0f4b
Fix SDK test
2024-11-18 11:46:00 +01:00
Louis Dureuil
6570da3bcb
Retry in case where the JSON deserialization fails
2024-11-18 11:33:09 +01:00
Clément Renault
5b4c06c24c
Plug the grenad max memory parameter
2024-11-18 11:28:04 +01:00
Louis Dureuil
3a8051866a
Use `return_keyword_results` function instead of returning raw keyword results when the embedder is broken
2024-11-18 11:17:15 +01:00
Louis Dureuil
c202f3dbe2
fix tests and revert change in behavior when primary_key_from_op != primary_key_from_db && index.is_empty()
2024-11-18 10:59:05 +01:00
Clément Renault
677d7293f5
Fix a lot of primary key related tests
2024-11-18 10:59:05 +01:00
Clément Renault
83865d2ebd
Expose intermediate errors when processing batches
2024-11-18 10:59:05 +01:00
ManyTheFish
4ff2b3c2ee
Fix test on locales
2024-11-14 15:45:04 +01:00
ManyTheFish
91c58cfa38
Fix positional databases
2024-11-14 11:40:12 +01:00
Clément Renault
9e8367f1e6
Move the rayon thread pool outside the extract method
2024-11-14 10:40:32 +01:00
Louis Dureuil
0e3c5d91ab
Document deletion test passes
2024-11-14 08:42:56 +01:00
Louis Dureuil
695c2c6b99
Cosmetic fix
2024-11-14 08:42:39 +01:00
Louis Dureuil
40dd25d6b2
Fix issue with Replace document method when adding and deleting a document in the same batch
2024-11-13 22:10:00 +01:00
Clément Renault
8e5b1a3ec1
Compute the field distribution and convert _geo into f64s
2024-11-13 17:44:05 +01:00
ManyTheFish
e627e182ce
Fix facet strings
2024-11-13 17:43:02 +01:00
ManyTheFish
51b6293738
Add linear facet databases
2024-11-13 17:43:02 +01:00
Clément Renault
b17896d899
Finalize the GeoExtractor
2024-11-13 17:43:02 +01:00
Louis Dureuil
7accfea624
Don't short circuit when we encounter a semantic error while extracting fields and external docid
2024-11-13 10:33:59 +01:00
Louis Dureuil
3b0cb5b487
Fix vector error messages
2024-11-12 23:26:16 +01:00
Louis Dureuil
bfdcd1cf33
Space changes
2024-11-12 22:52:45 +01:00
Louis Dureuil
c4e9f761e9
Emit better error messages when parsing vectors
2024-11-12 22:49:22 +01:00
Louis Dureuil
8a6e61c77f
InvalidVectorsEmbedderConf error takes a String rather than a deserr error
2024-11-12 22:47:57 +01:00
Louis Dureuil
980921e078
Vector fixes
2024-11-12 16:31:22 +01:00
Louis Dureuil
6094bb299a
Fix user_provided vectors
2024-11-12 10:15:55 +01:00
Louis Dureuil
bef8fc6cf1
Fix hf embedder
2024-11-08 13:10:17 +01:00
Louis Dureuil
5185aa21b8
Know if your vectors are implicit when writing them back in documents + don't write empty _vectors
2024-11-08 00:05:36 +01:00
Louis Dureuil
8a314ab81d
Fix primary key fid order
2024-11-08 00:05:12 +01:00
Louis Dureuil
4706a0eb49
Fix vector parsing
2024-11-07 23:26:20 +01:00