Commit Graph

1840 Commits

Author SHA1 Message Date
Loïc Lecrenier 453d593ce8 Add a database containing the docids where each field exists 2022-07-19 10:07:33 +02:00
bors[bot] 5704235521
Merge #584
584: Chores: Enhance smart-crop code comments r=curquiza a=ManyTheFish

Enhance explanation around smart crop algorithms

Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
2022-07-19 07:08:14 +00:00
bors[bot] f6415b679f
Merge #588
588: Fix name of "release_date" facet in movies benchmarks r=ManyTheFish a=loiclec

## What does this PR do?
The `movies.json` file in the benchmark datasets contains a filterable field called "release_date", but the indexing benchmarks wrongly called the field "released_date" instead. This PR fixes that.


Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2022-07-18 15:51:09 +00:00
Many the fish 2d79720f5d
Update milli/src/search/matches/mod.rs 2022-07-18 17:48:04 +02:00
Many the fish 8ddb4e750b
Update milli/src/search/matches/mod.rs 2022-07-18 17:47:39 +02:00
Many the fish a277daa1f2
Update milli/src/search/matches/mod.rs 2022-07-18 17:47:13 +02:00
Many the fish fb794c6b5e
Update milli/src/search/matches/mod.rs 2022-07-18 17:46:00 +02:00
Many the fish 1237cfc249
Update milli/src/search/matches/mod.rs 2022-07-18 17:45:37 +02:00
Many the fish d7fd5c58cd
Update milli/src/search/matches/mod.rs 2022-07-18 17:45:06 +02:00
Loïc Lecrenier 8270e2b768 Fix name of "release_date" facet in movies benchmarks 2022-07-18 10:34:12 +02:00
Many the fish e261ef64d7
Update milli/src/search/matches/mod.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-07-18 10:18:51 +02:00
Many the fish 1da4ab5918
Update milli/src/search/matches/mod.rs
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-07-18 10:18:03 +02:00
ManyTheFish 5d79617a56 Chores: Enhance smart-crop code comments 2022-07-07 16:28:09 +02:00
bors[bot] ce90fc628a
Merge #583
583: Use BufReader to read datasets in benchmarks r=ManyTheFish a=loiclec

## What does this PR do?
Ensure that the datasets used by the benchmarks are read efficiently by using a `BufReader`.

## Why?
Using a `BufReader` is more representative of how `meilisearch` works. It will also make performance comparisons between different branches of `milli` more  accurate.




Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2022-07-07 08:13:07 +00:00
Loïc Lecrenier aae03356cb Use BufReader to read datasets in benchmarks 2022-07-06 18:20:15 +02:00
bors[bot] ebddfdb9a3
Merge #578
578: Bump uuid to 1.1.2 r=ManyTheFish a=Kerollmops

Just to [align the version with Meilisearch](https://github.com/meilisearch/meilisearch/pull/2584).

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-07-05 14:56:08 +00:00
bors[bot] eeba196053
Merge #572
572: Add reindexing benchmarks r=Kerollmops a=irevoire

With #557 coming, we should add benchmarks that measure our impact on the reindexing process.

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-07-05 14:43:01 +00:00
Kerollmops 1bfdcfc84f
Bump uuid to 1.1.2 2022-07-05 16:23:36 +02:00
bors[bot] dd1e606f13
Merge #557
557: Fasten documents deletion and update r=Kerollmops a=irevoire

When a document deletion occurs, instead of deleting the document we mark it as deleted in the new “soft deleted” bitmap. It is then removed from the search and all the other endpoints.

I ran the benchmarks against main;
```
% ./compare.sh indexing_main_83ad1aaf.json indexing_fasten-document-deletion_abab51fb.json
group                                                                     indexing_fasten-document-deletion_abab51fb    indexing_main_83ad1aaf
-----                                                                     ------------------------------------------    ----------------------
indexing/-geo-delete-facetedNumber-facetedGeo-searchable-                 1.05      2.0±0.40ms        ? ?/sec           1.00  1904.9±190.00µs        ? ?/sec
indexing/-movies-delete-facetedString-facetedNumber-searchable-           1.00     10.3±2.64ms        ? ?/sec           961.61      9.9±0.12s        ? ?/sec
indexing/-movies-delete-facetedString-facetedNumber-searchable-nested-    1.00     15.1±3.90ms        ? ?/sec           554.63      8.4±0.12s        ? ?/sec
indexing/-songs-delete-facetedString-facetedNumber-searchable-            1.00     45.1±7.53ms        ? ?/sec           710.15     32.0±0.10s        ? ?/sec
indexing/-wiki-delete-searchable-                                         1.00    277.8±7.97ms        ? ?/sec           1946.57    540.8±3.15s        ? ?/sec
indexing/Indexing geo_point                                               1.00      12.0±0.20s        ? ?/sec           1.03      12.4±0.19s        ? ?/sec
indexing/Indexing movies in three batches                                 1.00      19.3±0.30s        ? ?/sec           1.01      19.4±0.16s        ? ?/sec
indexing/Indexing movies with default settings                            1.00      18.8±0.09s        ? ?/sec           1.00      18.9±0.10s        ? ?/sec
indexing/Indexing nested movies with default settings                     1.00      25.9±0.19s        ? ?/sec           1.00      25.9±0.12s        ? ?/sec
indexing/Indexing nested movies without any facets                        1.00      24.8±0.17s        ? ?/sec           1.00      24.8±0.18s        ? ?/sec
indexing/Indexing songs in three batches with default settings            1.00      65.9±0.96s        ? ?/sec           1.03      67.8±0.82s        ? ?/sec
indexing/Indexing songs with default settings                             1.00      58.8±1.11s        ? ?/sec           1.02      59.9±2.09s        ? ?/sec
indexing/Indexing songs without any facets                                1.00      53.4±0.72s        ? ?/sec           1.01      54.2±0.88s        ? ?/sec
indexing/Indexing songs without faceted numbers                           1.00      57.9±1.17s        ? ?/sec           1.01      58.3±1.20s        ? ?/sec
indexing/Indexing wiki                                                    1.00   1065.2±13.26s        ? ?/sec           1.00   1065.8±12.66s        ? ?/sec
indexing/Indexing wiki in three batches                                   1.00    1182.4±6.20s        ? ?/sec           1.01    1190.8±8.48s        ? ?/sec
```

Most things do not change, we lost 0.1ms on the indexing of geo point (I don’t get why), and then we are between 500 and 1900 times faster when we delete documents.


Co-authored-by: Tamo <tamo@meilisearch.com>
2022-07-05 14:14:38 +00:00
Tamo 250be9fe6c
put the threshold back to 10k 2022-07-05 15:57:44 +02:00
bors[bot] 62692c171d
Merge #577
577: Fix deserialisation of NDJson documents in benchmarks r=irevoire a=loiclec

Previously, the first document in the NDJson file was read over and over again. So the `geo_point` benchmark was not working properly: it only indexed one document.

Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2022-07-05 13:54:47 +00:00
Loïc Lecrenier 9bc7627e27 Fix deserialisation of NDJson documents in benchmarks 2022-07-05 15:51:06 +02:00
Tamo b61efd09fc
Makes the internal soft deleted error a UserError 2022-07-05 15:34:45 +02:00
Tamo eaf28b0628
Apply review suggestions
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-07-05 15:30:33 +02:00
Tamo 3b309f654a
Fasten the document deletion
When a document deletion occurs, instead of deleting the document we mark it as deleted
in the new “soft deleted” bitmap. It is then removed from the search, and all the other
endpoints.
2022-07-05 15:30:33 +02:00
Tamo 2700d8dc67
Add reindexing benchmarks 2022-07-05 14:46:46 +02:00
bors[bot] 77c837fc1b
Merge #575
575: Bump charabia r=loiclec a=irevoire

This fix #573

Co-authored-by: Tamo <tamo@meilisearch.com>
2022-07-05 11:53:57 +00:00
Tamo 446439e8be
bump charabia 2022-07-05 12:19:30 +02:00
bors[bot] c6f4775fde
Merge #568
568: Fix not equal filter when field contains both number and strings r=Kerollmops a=GraDKh

Related to https://github.com/meilisearch/meilisearch/issues/2516
Looks like the issue should be moved to this repo, but I'm not sure what the right procedure for it.

Co-authored-by: Dmytro Gordon <dmytro@bigstream.co>
2022-06-28 08:46:23 +00:00
Dmytro Gordon 3ff03a3f5f Fix not equal filter when field contains both number and strings 2022-06-27 15:55:17 +03:00
bors[bot] 83ad1aaf05
Merge #567
567: Bump the milli version to 0.31.1 r=curquiza a=Kerollmops



Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-06-22 15:07:03 +00:00
Kerollmops cc48992e79
Bump the milli version to 0.31.1 2022-06-22 17:05:51 +02:00
bors[bot] 68bb170732
Merge #566
566: Introduce the copy_to_path method on the Index r=irevoire a=Kerollmops

Meilisearch needs this method to do snapshots.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-06-22 14:52:19 +00:00
Kerollmops 238692a8e7
Introduce the copy_to_path method on the Index 2022-06-22 16:49:47 +02:00
bors[bot] 290a40b7a5
Merge #564
564: Rename the limitedTo parameter into maxTotalHits r=curquiza a=Kerollmops

This PR is related to https://github.com/meilisearch/meilisearch/issues/2542, it renames the `limitedTo` parameter into `maxTotalHits`.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-06-22 13:48:33 +00:00
bors[bot] d546f6f40e
Merge #563
563: Improve the `estimatedNbHits` when a `distinctAttribute` is specified r=irevoire a=Kerollmops

This PR is related to https://github.com/meilisearch/meilisearch/issues/2532 but it doesn't fix it entirely. It improves it by computing the excluded documents (the ones with an already-seen distinct value) before stopping the loop, I think it was a mistake and should always have been this way.

The reason it doesn't fix the issue is that Meilisearch is lazy, just to be sure not to compute too many things and answer by taking too much time. When we deduplicate the documents by their distinct value we must do it along the water, everytime we see a new document we check that its distinct value of it doesn't collide with an already returned document. 

The reason we can see the correct result when enough documents are fetched is that we were lucky to see all of the different distinct values possible in the dataset and all of the deduplication was done, no document can be returned.

If we wanted to implement that to have a correct `extimatedNbHits` every time we should have done a pass on the whole set of possible distinct values for the distinct attribute and do a big intersection, this could cost a lot of CPU cycles.

Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-06-22 12:39:44 +00:00
bors[bot] 38a8d3cae1
Merge #565
565: Bump the milli version to 0.31.0 r=curquiza a=Kerollmops



Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-06-22 10:09:41 +00:00
Kerollmops f5c3b951bc
Bump the milli version to 0.31.0 2022-06-22 12:08:16 +02:00
Kerollmops d7c248042b
Rename the limitedTo parameter into maxTotalHits 2022-06-22 12:00:48 +02:00
Kerollmops d2f84a9d9e
Improve the estimatedNbHits when distinct is enabled 2022-06-22 11:39:21 +02:00
bors[bot] 4f547eff02
Merge #560
560: Update version for next release (v0.30.0) r=curquiza a=curquiza



Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2022-06-20 12:37:01 +00:00
bors[bot] 64b833410c
Merge #559
559: Avoid having an ending separator before crop marker r=irevoire a=ManyTheFish

related to https://github.com/meilisearch/meilisearch/issues/2528


Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-06-20 11:06:52 +00:00
Clémentine Urquizar 31f749b5d8
Update version for next release (v0.30.0) 2022-06-20 12:09:57 +02:00
ManyTheFish a0ab90a4d7 Avoid having an ending separator before crop marker 2022-06-16 18:23:57 +02:00
bors[bot] a59ae19842
Merge #558
558: Deletion benchmarks r=ManyTheFish a=ManyTheFish

Add benchmarks on the deletion and start rethinking benchmark names.

Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-06-16 09:34:37 +00:00
ManyTheFish 2652310f2a Change delete benchmark names 2022-06-16 10:32:58 +02:00
ManyTheFish adbb0ff318 Add deletion benchmarks 2022-06-16 10:17:58 +02:00
bors[bot] 0a5d1a445e
Merge #554
554: Enhance tests for soft deletetion r=irevoire a=ManyTheFish

#### tests: (skip in changelog)
- [x] placeholder search shouldn’t return soft deleted
- [x] search shouldn’t return soft deleted
- [x] filtered placeholder search shouldn’t return soft deleted
- [x] geo-filtered placeholder search shouldn’t return soft deleted
- [x] documents list/get shouldn’t return soft deleted
- [x] stats shouldn’t count soft deleted

#### other: (API breaking)
- [x] ensure that Index methods are not bypassed by Meilisearch


Poke `@irevoire,` we may merge this into your branch.

Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-06-14 09:49:37 +00:00
ManyTheFish 447195a27a Replace format by to_string 2022-06-14 10:32:44 +02:00
ManyTheFish 177154828c Extends deletion tests 2022-06-13 17:34:16 +02:00