bors[bot]
50f6524ff2
Merge #579
...
579: Stop reindexing already indexed documents r=ManyTheFish a=irevoire
```
% ./compare.sh indexing_stop-reindexing-unchanged-documents_cb5a1669.json indexing_main_eeba1960.json
group                                                                   indexing_main_eeba1960            indexing_stop-reindexing-unchanged-documents_cb5a1669
-----                                                                   ----------------------            -----------------------------------------------------
indexing/-geo-delete-facetedNumber-facetedGeo-searchable-               1.03  2.0±0.22ms       ? ?/sec    1.00  1955.4±336.24µs  ? ?/sec
indexing/-movies-delete-facetedString-facetedNumber-searchable-         1.08  11.0±2.93ms      ? ?/sec    1.00  10.2±4.04ms      ? ?/sec
indexing/-movies-delete-facetedString-facetedNumber-searchable-nested-  1.00  15.1±3.89ms      ? ?/sec    1.14  17.1±5.18ms      ? ?/sec
indexing/-songs-delete-facetedString-facetedNumber-searchable-          1.26  59.2±12.01ms     ? ?/sec    1.00  47.1±8.52ms      ? ?/sec
indexing/-wiki-delete-searchable-                                       1.08  316.6±31.53ms    ? ?/sec    1.00  293.6±17.00ms    ? ?/sec
indexing/Indexing geo_point                                             1.01  60.9±0.31s       ? ?/sec    1.00  60.6±0.36s       ? ?/sec
indexing/Indexing movies in three batches                               1.04  20.0±0.30s       ? ?/sec    1.00  19.2±0.25s       ? ?/sec
indexing/Indexing movies with default settings                          1.02  19.1±0.18s       ? ?/sec    1.00  18.7±0.24s       ? ?/sec
indexing/Indexing nested movies with default settings                   1.02  26.2±0.29s       ? ?/sec    1.00  25.9±0.22s       ? ?/sec
indexing/Indexing nested movies without any facets                      1.02  25.3±0.32s       ? ?/sec    1.00  24.7±0.26s       ? ?/sec
indexing/Indexing songs in three batches with default settings          1.00  66.7±0.41s       ? ?/sec    1.01  67.1±0.86s       ? ?/sec
indexing/Indexing songs with default settings                           1.00  58.3±0.90s       ? ?/sec    1.01  58.8±1.32s       ? ?/sec
indexing/Indexing songs without any facets                              1.00  54.5±1.43s       ? ?/sec    1.01  55.2±1.29s       ? ?/sec
indexing/Indexing songs without faceted numbers                         1.00  57.9±1.20s       ? ?/sec    1.01  58.4±0.93s       ? ?/sec
indexing/Indexing wiki                                                  1.00  1052.0±10.95s    ? ?/sec    1.02  1069.4±20.38s    ? ?/sec
indexing/Indexing wiki in three batches                                 1.00  1193.1±8.83s     ? ?/sec    1.00  1189.5±9.40s     ? ?/sec
indexing/Reindexing geo_point                                           3.22  67.5±0.73s       ? ?/sec    1.00  21.0±0.16s       ? ?/sec
indexing/Reindexing movies with default settings                        3.75  19.4±0.28s       ? ?/sec    1.00  5.2±0.05s        ? ?/sec
indexing/Reindexing songs with default settings                         8.90  61.4±0.91s       ? ?/sec    1.00  6.9±0.07s        ? ?/sec
indexing/Reindexing wiki                                                1.00  1748.2±35.68s    ? ?/sec    1.00  1750.5±18.53s    ? ?/sec
```
TL;DR: We do not lose any performance on the normal indexing benchmarks, and the geo_point, movies, and songs reindexing benchmarks run roughly 3 to 9 times faster (wiki reindexing is unchanged) 👍
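
A minimal sketch of the idea behind this change, not milli's actual implementation: an incoming document that is identical to the version already stored is skipped instead of being indexed again. The `Document` alias, the `documents_to_index` function, and the map-based storage are hypothetical stand-ins chosen for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical document model: field name mapped to its raw JSON text.
type Document = BTreeMap<String, String>;

/// Returns the external ids of the documents that need to be (re)indexed.
///
/// `stored` holds the documents already in the index, keyed by external id.
/// Incoming documents equal to the stored version are skipped; new or
/// modified documents replace the stored version and are reported.
fn documents_to_index(
    stored: &mut BTreeMap<String, Document>,
    incoming: Vec<(String, Document)>,
) -> Vec<String> {
    let mut to_index = Vec::new();
    for (id, doc) in incoming {
        if stored.get(&id) == Some(&doc) {
            // Unchanged document: nothing to extract, nothing to write.
            continue;
        }
        stored.insert(id.clone(), doc);
        to_index.push(id);
    }
    to_index
}
```

Sending the same batch twice returns every id the first time and an empty list the second time, which matches the kind of workload the "Reindexing" benchmarks exercise.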
Co-authored-by: Tamo <tamo@meilisearch.com>
2022-08-04 08:10:37 +00:00
bors[bot]
e8987cf5aa
Merge #599
...
599: fix: Remove whitespace trimming during document id validation r=ManyTheFish a=ManyTheFish
fix #592
related to https://github.com/meilisearch/meilisearch/issues/2640
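
To illustrate why trimming matters, here is a hedged sketch of an id check that does not trim its input. The function name is hypothetical, and the accepted character set (ASCII alphanumerics, `-`, `_`) is an assumption based on Meilisearch's documented rule for string ids, not something taken from this commit.

```rust
/// Hypothetical validator: accepts only non-empty ids made of ASCII
/// alphanumerics, hyphens, and underscores. The input is checked as-is,
/// so leading or trailing whitespace makes the id invalid instead of
/// being silently removed.
fn is_valid_document_id(id: &str) -> bool {
    !id.is_empty()
        && id
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || b == b'-' || b == b'_')
}
```

With trimming, `" abc"` and `"abc"` would both be accepted and could end up referring to the same document; without it, the first is simply rejected.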
Co-authored-by: ManyTheFish <many@meilisearch.com>
2022-08-03 14:55:25 +00:00
ManyTheFish
d6f9a60a32
fix: Remove whitespace trimming during document id validation
...
fix #592
2022-08-03 11:38:40 +02:00
Tamo
7fc35c5586
remove the useless prints
2022-08-02 10:31:22 +02:00
Tamo
f156d7dd3b
Stop reindexing already indexed documents
2022-08-02 10:31:20 +02:00
Loïc Lecrenier
1fe224f2c6
Update filter-parser/fuzz/.gitignore
...
Co-authored-by: Many the fish <many@meilisearch.com>
2022-07-21 16:12:01 +02:00
Loïc Lecrenier
07003704a8
Merge branch 'filter/field-exist'
2022-07-21 14:51:41 +02:00
bors[bot]
e1bc610d27
Merge #595
...
595: Update version for next release (v0.32.0) r=ManyTheFish a=curquiza
In order to release on `main` (for v0.29.0, not v0.28.1)
<img width="1014" alt="Capture d’écran 2022-07-21 à 13 20 35" src="https://user-images.githubusercontent.com/20380692/180178725-381fbdf1-c0fb-4fa9-9954-452aec5a1574.png ">
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
2022-07-21 11:07:42 +00:00
Clémentine Urquizar
d5e9b7305b
Update version for next release (v0.32.0)
2022-07-21 13:20:02 +04:00
ManyTheFish
cbb3b25459
Fix(Search): Fix phrase search candidates computation
...
This is an old bug that was hidden by the proximity criterion:
phrase searches were always returning an empty candidates list.
Before the fix, we were trying to find any words[n] near words[n]
instead of finding any words[n] near words[n+1]. For example,
for the phrase search '"Hello world"' we were searching for "hello" near "hello" first, instead of "hello" near "world".
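
The corrected pairing can be sketched as follows. This is an illustration only, with a hypothetical function name, and it is not the code touched by this commit: each word is paired with the word that follows it, never with itself.

```rust
/// Pairs each word of a phrase with its successor: words[n] with words[n+1].
/// A single-word phrase yields no pair at all.
fn adjacent_pairs<'a>(words: &[&'a str]) -> Vec<(&'a str, &'a str)> {
    words.windows(2).map(|w| (w[0], w[1])).collect()
}
```

For the phrase `"hello world"` this produces the single pair `("hello", "world")`, which is the proximity lookup the message describes as the intended one.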
2022-07-21 10:04:30 +02:00
bors[bot]
941af58239
Merge #561
...
561: Enriched documents batch reader r=curquiza a=Kerollmops
~This PR is based on #555 and must be rebased on main after it has been merged to ease the review.~
This PR contains the work in #555 and can be merged on main as soon as reviewed and approved.
- [x] Create an `EnrichedDocumentsBatchReader` that contains the external documents id.
- [x] Extract the primary key name and make it accessible in the `EnrichedDocumentsBatchReader`.
- [x] Use the external id from the `EnrichedDocumentsBatchReader` in the `Transform::read_documents`.
- [x] Remove the `update_primary_key` from the _transform.rs_ file.
- [x] Really generate the auto-generated documents ids.
- [x] Insert the (auto-generated) document ids in the document while processing it in `Transform::read_documents`.
Co-authored-by: Kerollmops <clement@meilisearch.com>
2022-07-21 07:08:50 +00:00
Loïc Lecrenier
41a0ce07cb
Add a code comment, as suggested in PR review
...
Co-authored-by: Many the fish <many@meilisearch.com>
2022-07-20 16:20:35 +02:00
Loïc Lecrenier
1506683705
Avoid using too much memory when indexing facet-exists-docids
2022-07-19 14:42:35 +02:00
Loïc Lecrenier
d0eee5ff7a
Fix compiler error
2022-07-19 13:54:30 +02:00
Loïc Lecrenier
aed8c69bcb
Refactor indexation of the "facet-id-exists-docids" database
...
The idea is to directly create a sorted and merged list of bitmaps
in the form of a BTreeMap<FieldId, RoaringBitmap> instead of creating
a grenad::Reader where the keys are field_id and the values are docids.
Then we send that BTreeMap to the thing that handles TypedChunks, which
inserts its content into the database.
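
A rough sketch of that data structure, under stated assumptions: `BTreeSet<u32>` stands in for `RoaringBitmap` so the example needs no external crate, the `u16`/`u32` aliases are assumed widths for field and document ids, and `facet_exists_docids` is a hypothetical name rather than the function in the codebase.

```rust
use std::collections::{BTreeMap, BTreeSet};

type FieldId = u16; // assumed width
type DocumentId = u32; // assumed width
/// Stand-in for `RoaringBitmap`.
type DocIds = BTreeSet<DocumentId>;

/// Builds, for every field id, the set of documents in which it appears.
/// The `BTreeMap` keeps the keys sorted, and inserting into the per-field
/// set merges the docids, so no separate sort-and-merge pass is needed.
fn facet_exists_docids(
    documents: &[(DocumentId, Vec<FieldId>)],
) -> BTreeMap<FieldId, DocIds> {
    let mut exists: BTreeMap<FieldId, DocIds> = BTreeMap::new();
    for (docid, fields) in documents {
        for field_id in fields {
            exists.entry(*field_id).or_default().insert(*docid);
        }
    }
    exists
}
```

Iterating over the resulting map yields the field ids in ascending order, which is what makes it cheap to write the entries to a key-ordered database afterwards.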
2022-07-19 10:07:33 +02:00
Loïc Lecrenier
1eb1e73bb3
Add integration tests for the EXISTS filter
2022-07-19 10:07:33 +02:00
Loïc Lecrenier
4f0bd317df
Remove custom implementation of BytesEncode/Decode for the FieldId
2022-07-19 10:07:33 +02:00
Loïc Lecrenier
80b962b4f4
Run cargo fmt
2022-07-19 10:07:33 +02:00
Loïc Lecrenier
ea0642c32d
Make filter parser more strict regarding spacing around operators
...
OR, AND, NOT, TO must now be followed by spaces
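
As a hedged illustration of the stricter rule (not the parser's actual code, and with a hypothetical helper name), a keyword is only recognised when at least one whitespace character follows it:

```rust
/// Tries to consume `keyword` at the start of `input`, requiring at least
/// one whitespace character right after it. Returns the remaining input
/// with that whitespace skipped, or `None` if the rule is not satisfied.
fn keyword_then_space<'a>(input: &'a str, keyword: &str) -> Option<&'a str> {
    let rest = input.strip_prefix(keyword)?;
    let trimmed = rest.trim_start();
    if trimmed.len() == rest.len() {
        // Nothing was trimmed, so no whitespace followed the keyword.
        None
    } else {
        Some(trimmed)
    }
}
```

Under this rule `AND b = 2` is accepted, while `ANDb = 2` is rejected instead of being read as `AND` followed by `b`.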
2022-07-19 10:07:33 +02:00
Loïc Lecrenier
c17d616250
Refactor index_documents_check_exists_database tests
2022-07-19 10:07:33 +02:00
Loïc Lecrenier
30bd4db0fc
Simplify indexing task for facet_exists_docids database
2022-07-19 10:07:33 +02:00
Loïc Lecrenier
392472f4bb
Apply suggestions from code review
...
Co-authored-by: Tamo <tamo@meilisearch.com>
2022-07-19 10:07:33 +02:00
Loïc Lecrenier
bd15f5625a
Fix compiler warning
2022-07-19 10:07:33 +02:00
Loïc Lecrenier
722db7b088
Ignore target directory of filter-parser/fuzz crate
2022-07-19 10:07:33 +02:00
Loïc Lecrenier
a5c9162250
Improve parser for NOT EXISTS filter
...
Allow multiple spaces between NOT and EXISTS
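
A small sketch of the accepted syntax, assuming a simplified grammar where the field is a single bare token. The `ExistsCondition` type and `parse_exists` function are hypothetical and only illustrate that any amount of whitespace may separate `NOT` from `EXISTS`.

```rust
#[derive(Debug, PartialEq)]
enum ExistsCondition<'a> {
    Exists(&'a str),
    NotExists(&'a str),
}

/// Parses `field EXISTS` or `field NOT EXISTS`. Splitting on whitespace
/// means one or several spaces between the tokens are treated alike.
fn parse_exists(input: &str) -> Option<ExistsCondition<'_>> {
    let mut tokens = input.split_whitespace();
    let field = tokens.next()?;
    let condition = match (tokens.next()?, tokens.next()) {
        ("EXISTS", None) => ExistsCondition::Exists(field),
        ("NOT", Some("EXISTS")) => ExistsCondition::NotExists(field),
        _ => return None,
    };
    // Reject trailing tokens such as `field NOT EXISTS extra`.
    if tokens.next().is_some() {
        return None;
    }
    Some(condition)
}
```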
2022-07-19 10:07:33 +02:00
Loïc Lecrenier
0388b2d463
Run cargo fmt
2022-07-19 10:07:33 +02:00
Loïc Lecrenier
dc64170a69
Improve syntax of EXISTS filter, allow “value NOT EXISTS”
2022-07-19 10:07:33 +02:00
Loïc Lecrenier
72452f0cb2
Implement the EXISTS filter operator
2022-07-19 10:07:33 +02:00
Loïc Lecrenier
a8641b42a7
Modify flatten_serde_json to keep dummy value for all object keys
...
Example:
```json
{
  "id": 0,
  "colour": { "green": 1 }
}
```
becomes:
```json
{
  "id": 0,
  "colour": [],
  "colour.green": 1
}
```
to retain the information that the key "colour" exists in the original
JSON value.
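
The transformation above can be sketched as follows. This is a simplified illustration, not the `flatten_serde_json` code: the `Json` enum is a hypothetical stand-in for a real JSON value type, and arrays containing objects are deliberately left untouched.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Number(i64),
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
}

/// Flattens nested objects into dotted keys, keeping an empty array as a
/// dummy value under each object key so the key itself is not lost.
fn flatten(object: &[(String, Json)]) -> Vec<(String, Json)> {
    let mut out = Vec::new();
    flatten_into(object, None, &mut out);
    out
}

fn flatten_into(
    object: &[(String, Json)],
    prefix: Option<&str>,
    out: &mut Vec<(String, Json)>,
) {
    for (key, value) in object {
        let full_key = match prefix {
            Some(p) => format!("{}.{}", p, key),
            None => key.clone(),
        };
        match value {
            Json::Object(inner) => {
                // Dummy value: records that `full_key` existed in the input.
                out.push((full_key.clone(), Json::Array(Vec::new())));
                flatten_into(inner, Some(&full_key), out);
            }
            other => out.push((full_key, other.clone())),
        }
    }
}
```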
2022-07-19 10:07:33 +02:00
Loïc Lecrenier
453d593ce8
Add a database containing the docids where each field exists
2022-07-19 10:07:33 +02:00
bors[bot]
5704235521
Merge #584
...
584: Chores: Enhance smart-crop code comments r=curquiza a=ManyTheFish
Enhance explanation around smart crop algorithms
Co-authored-by: ManyTheFish <many@meilisearch.com>
Co-authored-by: Many the fish <many@meilisearch.com>
2022-07-19 07:08:14 +00:00
bors[bot]
f6415b679f
Merge #588
...
588: Fix name of "release_date" facet in movies benchmarks r=ManyTheFish a=loiclec
## What does this PR do?
The `movies.json` file in the benchmark datasets contains a filterable field called "release_date", but the indexing benchmarks wrongly called the field "released_date" instead. This PR fixes that.
Co-authored-by: Loïc Lecrenier <loic@meilisearch.com>
2022-07-18 15:51:09 +00:00
Many the fish
2d79720f5d
Update milli/src/search/matches/mod.rs
2022-07-18 17:48:04 +02:00
Many the fish
8ddb4e750b
Update milli/src/search/matches/mod.rs
2022-07-18 17:47:39 +02:00
Many the fish
a277daa1f2
Update milli/src/search/matches/mod.rs
2022-07-18 17:47:13 +02:00
Many the fish
fb794c6b5e
Update milli/src/search/matches/mod.rs
2022-07-18 17:46:00 +02:00
Many the fish
1237cfc249
Update milli/src/search/matches/mod.rs
2022-07-18 17:45:37 +02:00
Many the fish
d7fd5c58cd
Update milli/src/search/matches/mod.rs
2022-07-18 17:45:06 +02:00
Loïc Lecrenier
fc9f3f31e7
Change DocumentsBatchReader to access cursor and index at same time
...
Otherwise it is not possible to iterate over all documents while
using the fields index at the same time.
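
One way to picture the change, as a sketch with hypothetical types and method names rather than the real `DocumentsBatchReader` API: the reader is consumed into two independent values, so the cursor can be advanced mutably while the fields index is read.

```rust
/// Hypothetical fields index: maps a field id to its name.
struct FieldsIndex(Vec<String>);

impl FieldsIndex {
    fn name(&self, id: usize) -> Option<&str> {
        self.0.get(id).map(String::as_str)
    }
}

/// Hypothetical cursor over documents stored as (field id, value) pairs.
struct Cursor {
    rows: std::vec::IntoIter<Vec<(usize, String)>>,
}

impl Cursor {
    fn next_document(&mut self) -> Option<Vec<(usize, String)>> {
        self.rows.next()
    }
}

struct BatchReader {
    index: FieldsIndex,
    rows: Vec<Vec<(usize, String)>>,
}

impl BatchReader {
    /// Splits the reader so that both parts can be used at the same time.
    fn into_cursor_and_fields_index(self) -> (Cursor, FieldsIndex) {
        (Cursor { rows: self.rows.into_iter() }, self.index)
    }
}

/// Iterates over all documents while resolving field names via the index.
fn named_documents(reader: BatchReader) -> Vec<Vec<(String, String)>> {
    let (mut cursor, index) = reader.into_cursor_and_fields_index();
    let mut out = Vec::new();
    while let Some(doc) = cursor.next_document() {
        out.push(
            doc.into_iter()
                .filter_map(|(id, value)| index.name(id).map(|n| (n.to_string(), value)))
                .collect(),
        );
    }
    out
}
```

If the index were only reachable through a borrow of the cursor, the mutable borrow needed by `next_document` would conflict with it; handing out two owned values sidesteps that.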
2022-07-18 16:08:14 +02:00
Loïc Lecrenier
ab1571cdec
Simplify Transform::read_documents, enabled by enriched documents reader
2022-07-18 12:45:47 +02:00
Loïc Lecrenier
8270e2b768
Fix name of "release_date" facet in movies benchmarks
2022-07-18 10:34:12 +02:00
Many the fish
e261ef64d7
Update milli/src/search/matches/mod.rs
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-07-18 10:18:51 +02:00
Many the fish
1da4ab5918
Update milli/src/search/matches/mod.rs
...
Co-authored-by: Clément Renault <clement@meilisearch.com>
2022-07-18 10:18:03 +02:00
Kerollmops
448114cc1c
Fix the benchmarks with the new indexation API
2022-07-12 15:22:09 +02:00
Kerollmops
25e768f31c
Fix another issue with the nested primary key selector
2022-07-12 15:14:07 +02:00
Kerollmops
192793ee38
Add some tests to check for the nested documents ids
2022-07-12 15:14:07 +02:00
Kerollmops
a892a4a79c
Introduce a function to extend from a JSON array of objects
2022-07-12 15:14:06 +02:00
Kerollmops
dc61105554
Fix the nested document id fetching function
2022-07-12 15:14:06 +02:00
Kerollmops
2eec290424
Check the validity of the latitude and longitude numbers
2022-07-12 15:14:06 +02:00
Kerollmops
5d149d631f
Remove tests for a function that no longer exists
2022-07-12 15:14:06 +02:00