1861 Commits

Author SHA1 Message Date
Loïc Lecrenier
fc9f3f31e7 Change DocumentsBatchReader to access cursor and index at same time
Otherwise it is not possible to iterate over all documents while
using the fields index at the same time.
2022-07-18 16:08:14 +02:00
Loïc Lecrenier
ab1571cdec Simplify Transform::read_documents, enabled by enriched documents reader 2022-07-18 12:45:47 +02:00
Kerollmops
448114cc1c
Fix the benchmarks with the new indexation API 2022-07-12 15:22:09 +02:00
Kerollmops
25e768f31c
Fix another issue with the nested primary key selector 2022-07-12 15:14:07 +02:00
Kerollmops
192793ee38
Add some tests to check for the nested documents ids 2022-07-12 15:14:07 +02:00
Kerollmops
a892a4a79c
Introduce a function to extend from a JSON array of objects 2022-07-12 15:14:06 +02:00
Kerollmops
dc61105554
Fix the nested document id fetching function 2022-07-12 15:14:06 +02:00
Kerollmops
2eec290424
Check the validity of the latitude and longitude numbers 2022-07-12 15:14:06 +02:00
Kerollmops
5d149d631f
Remove tests for a function that no longer exists 2022-07-12 15:14:06 +02:00
Kerollmops
0bbcc7b180
Expose the DocumentId struct to be sure to inject the generated ids 2022-07-12 15:14:06 +02:00
Kerollmops
d1a4da9812
Generate a real UUIDv4 when ids are auto-generated 2022-07-12 15:14:06 +02:00
Kerollmops
c8ebf0de47
Rename the validate function as an enriching function 2022-07-12 15:14:06 +02:00
Kerollmops
905af2a2e9
Use the primary key and external id in the transform 2022-07-12 15:14:05 +02:00
Kerollmops
742543091e
Constify the default primary key name 2022-07-12 14:55:52 +02:00
Kerollmops
5f1bfb73ee
Extract the primary key name and make it accessible 2022-07-12 14:55:52 +02:00
Kerollmops
6a0a0ae94f
Make the Transform read from an EnrichedDocumentsBatchReader 2022-07-12 14:55:52 +02:00
Kerollmops
ea852200bb
Fix the format used for a geo deleting benchmark 2022-07-12 14:55:52 +02:00
Kerollmops
dc3f092d07
Do not leak an internal grenad Error 2022-07-12 14:55:52 +02:00
Kerollmops
8ebf5eed0d
Make the nested primary key work 2022-07-12 14:55:52 +02:00
Kerollmops
19eb3b4708
Make sure that we do not accept floats as document ids 2022-07-12 14:55:52 +02:00
Kerollmops
2ceeb51c37
Support the auto-generated ids when validating documents 2022-07-12 14:55:51 +02:00
Kerollmops
399eec5c01
Fix the indexation tests 2022-07-12 14:55:51 +02:00
Kerollmops
fcfc4caf8c
Move the Object type in the lib.rs file and use it everywhere 2022-07-12 14:55:51 +02:00
Kerollmops
0146175fe6
Introduce the validate_documents_batch function 2022-07-12 14:55:51 +02:00
Kerollmops
cefffde9af
Improve the .gitignore of the fuzz crate 2022-07-12 14:55:51 +02:00
Kerollmops
bdc4263883
Introduce the validate_documents_batch function 2022-07-12 14:55:51 +02:00
Kerollmops
a97d4d63b9
Fix the benchmarks 2022-07-12 14:55:50 +02:00
Kerollmops
f29114f94a
Fix http-ui to fit with the new DocumentsBatchBuilder/Reader structs 2022-07-12 14:52:56 +02:00
Kerollmops
a4ceef9624
Fix the cli for the new DocumentsBatchBuilder/Reader structs 2022-07-12 14:52:56 +02:00
Kerollmops
6d0498df24
Fix the fuzz tests 2022-07-12 14:52:56 +02:00
Kerollmops
e8297ad27e
Fix the tests for the new DocumentsBatchBuilder/Reader 2022-07-12 14:52:56 +02:00
Kerollmops
419ce3966c
Rework the DocumentsBatchBuilder/Reader to use grenad 2022-07-12 14:52:55 +02:00
Kerollmops
eb63af1f10
Update grenad to 0.4.2 2022-07-12 14:52:55 +02:00
Kerollmops
048e174efb
Do not allocate when parsing CSV headers 2022-07-12 14:52:55 +02:00
bors[bot]
ce90fc628a
Merge #583
583: Use BufReader to read datasets in benchmarks r=ManyTheFish a=loiclec

## What does this PR do?
Ensure that the datasets used by the benchmarks are read efficiently by using a `BufReader`.

## Why?
Using a `BufReader` is more representative of how `meilisearch` works. It will also make performance comparisons between different branches of `milli` more accurate.
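
For illustration, a minimal sketch of the change described above (assumed shape, not the actual patch): the dataset file is wrapped in a `BufReader` before being handed to the parser. The helper name is hypothetical.

```rust
use std::fs::File;
use std::io::BufReader;

// Hypothetical helper: open a benchmark dataset with buffered reads so the
// parser pulls large chunks from disk instead of one syscall per tiny read.
fn open_dataset(path: &str) -> std::io::Result<BufReader<File>> {
    Ok(BufReader::new(File::open(path)?))
}
```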

Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2022-07-07 08:13:07 +00:00
Loïc Lecrenier
aae03356cb Use BufReader to read datasets in benchmarks 2022-07-06 18:20:15 +02:00
bors[bot]
ebddfdb9a3
Merge #578
578: Bump uuid to 1.1.2 r=ManyTheFish a=Kerollmops

Just to [align the version with Meilisearch](https://github.com/meilisearch/meilisearch/pull/2584).

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-07-05 14:56:08 +00:00
bors[bot]
eeba196053
Merge #572
572: Add reindexing benchmarks r=Kerollmops a=irevoire

With #557 coming, we should add benchmarks that measure our impact on the reindexing process.

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-07-05 14:43:01 +00:00
Kerollmops
1bfdcfc84f
Bump uuid to 1.1.2 2022-07-05 16:23:36 +02:00
bors[bot]
dd1e606f13
Merge #557
557: Fasten documents deletion and update r=Kerollmops a=irevoire

When a document deletion occurs, instead of deleting the document we mark it as deleted in the new “soft deleted” bitmap. It is then excluded from the search results and from all the other endpoints.

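For illustration only, a minimal sketch of the soft-deletion idea, assuming a `RoaringBitmap` of soft-deleted ids (the struct and method names here are hypothetical, not milli's actual API):

```rust
use roaring::RoaringBitmap;

struct Index {
    documents: RoaringBitmap,    // ids of all stored documents
    soft_deleted: RoaringBitmap, // ids flagged as deleted but not yet purged
}

impl Index {
    /// "Delete" documents by only flagging them in a bitmap, instead of
    /// rewriting every inverted-index entry immediately.
    fn delete_documents(&mut self, ids: &RoaringBitmap) {
        self.soft_deleted |= ids;
    }

    /// Every read path filters the soft-deleted ids out.
    fn searchable_documents(&self) -> RoaringBitmap {
        &self.documents - &self.soft_deleted
    }
}
```
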
I ran the benchmarks against main:
```
% ./compare.sh indexing_main_83ad1aaf.json indexing_fasten-document-deletion_abab51fb.json
group                                                                     indexing_fasten-document-deletion_abab51fb    indexing_main_83ad1aaf
-----                                                                     ------------------------------------------    ----------------------
indexing/-geo-delete-facetedNumber-facetedGeo-searchable-                 1.05      2.0±0.40ms        ? ?/sec           1.00  1904.9±190.00µs        ? ?/sec
indexing/-movies-delete-facetedString-facetedNumber-searchable-           1.00     10.3±2.64ms        ? ?/sec           961.61      9.9±0.12s        ? ?/sec
indexing/-movies-delete-facetedString-facetedNumber-searchable-nested-    1.00     15.1±3.90ms        ? ?/sec           554.63      8.4±0.12s        ? ?/sec
indexing/-songs-delete-facetedString-facetedNumber-searchable-            1.00     45.1±7.53ms        ? ?/sec           710.15     32.0±0.10s        ? ?/sec
indexing/-wiki-delete-searchable-                                         1.00    277.8±7.97ms        ? ?/sec           1946.57    540.8±3.15s        ? ?/sec
indexing/Indexing geo_point                                               1.00      12.0±0.20s        ? ?/sec           1.03      12.4±0.19s        ? ?/sec
indexing/Indexing movies in three batches                                 1.00      19.3±0.30s        ? ?/sec           1.01      19.4±0.16s        ? ?/sec
indexing/Indexing movies with default settings                            1.00      18.8±0.09s        ? ?/sec           1.00      18.9±0.10s        ? ?/sec
indexing/Indexing nested movies with default settings                     1.00      25.9±0.19s        ? ?/sec           1.00      25.9±0.12s        ? ?/sec
indexing/Indexing nested movies without any facets                        1.00      24.8±0.17s        ? ?/sec           1.00      24.8±0.18s        ? ?/sec
indexing/Indexing songs in three batches with default settings            1.00      65.9±0.96s        ? ?/sec           1.03      67.8±0.82s        ? ?/sec
indexing/Indexing songs with default settings                             1.00      58.8±1.11s        ? ?/sec           1.02      59.9±2.09s        ? ?/sec
indexing/Indexing songs without any facets                                1.00      53.4±0.72s        ? ?/sec           1.01      54.2±0.88s        ? ?/sec
indexing/Indexing songs without faceted numbers                           1.00      57.9±1.17s        ? ?/sec           1.01      58.3±1.20s        ? ?/sec
indexing/Indexing wiki                                                    1.00   1065.2±13.26s        ? ?/sec           1.00   1065.8±12.66s        ? ?/sec
indexing/Indexing wiki in three batches                                   1.00    1182.4±6.20s        ? ?/sec           1.01    1190.8±8.48s        ? ?/sec
```

Most results do not change: we lost about 0.1 ms on the geo_point indexing (I don't get why), and we are between 500 and 1900 times faster when deleting documents.

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-07-05 14:14:38 +00:00
Tamo
250be9fe6c
put the threshold back to 10k 2022-07-05 15:57:44 +02:00
bors[bot]
62692c171d
Merge #577
577: Fix deserialisation of NDJson documents in benchmarks r=irevoire a=loiclec

Previously, the first document in the NDJson file was read over and over again. So the `geo_point` benchmark was not working properly: it only indexed one document.
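
Not the actual patch, but a minimal sketch of the intended behaviour: stream the NDJSON file so each iteration yields the next document instead of re-parsing the first one, assuming `serde_json`'s streaming deserializer. The function name is hypothetical.

```rust
use std::fs::File;
use std::io::BufReader;

use serde_json::{Deserializer, Map, Value};

fn read_ndjson(path: &str) -> Result<Vec<Map<String, Value>>, Box<dyn std::error::Error>> {
    let reader = BufReader::new(File::open(path)?);
    // The streaming deserializer advances through the reader, yielding one
    // JSON object per line rather than re-reading the first document.
    let mut documents = Vec::new();
    for document in Deserializer::from_reader(reader).into_iter::<Map<String, Value>>() {
        documents.push(document?);
    }
    Ok(documents)
}
```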

Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2022-07-05 13:54:47 +00:00
Loïc Lecrenier
9bc7627e27 Fix deserialisation of NDJson documents in benchmarks 2022-07-05 15:51:06 +02:00
Tamo
b61efd09fc
Makes the internal soft deleted error a UserError 2022-07-05 15:34:45 +02:00
Tamo
eaf28b0628
Apply review suggestions
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-07-05 15:30:33 +02:00
Tamo
3b309f654a
Fasten the document deletion
When a document deletion occurs, instead of deleting the document we mark it as deleted
in the new “soft deleted” bitmap. It is then excluded from the search results and from
all the other endpoints.
2022-07-05 15:30:33 +02:00
Tamo
2700d8dc67
Add reindexing benchmarks 2022-07-05 14:46:46 +02:00
bors[bot]
77c837fc1b
Merge #575
575: Bump charabia r=loiclec a=irevoire

This fixes #573

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-07-05 11:53:57 +00:00
Tamo
446439e8be
bump charabia 2022-07-05 12:19:30 +02:00
bors[bot]
c6f4775fde
Merge #568
568: Fix not equal filter when field contains both number and strings r=Kerollmops a=GraDKh

Related to https://github.com/meilisearch/meilisearch/issues/2516
Looks like the issue should be moved to this repo, but I'm not sure what the right procedure for that is.

Co-authored-by: Dmytro Gordon <dmytro@bigstream.co>
2022-06-28 08:46:23 +00:00