187: Fix fields distribution after documents merge r=Kerollmops a=shekhirin
Resolves https://github.com/meilisearch/milli/issues/174
The fields distribution was computed before `output_from_sorter()` merged the documents. As a result, importing two documents with the same primary key value made the fields distribution count them as two documents, even though `output_from_sorter()` merges them into one.
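To make the ordering issue concrete, here is a minimal sketch, not the actual milli code. The `Doc` type and both function names are hypothetical, and it assumes the last document with a given primary key wins when duplicates are merged. It only illustrates why counting fields before the merge inflates the distribution.

```rust
use std::collections::BTreeMap;

/// A document reduced to its primary key and field names
/// (hypothetical simplification, not milli's real representation).
type Doc = (&'static str, Vec<&'static str>);

/// Buggy order: count fields for every incoming document, before deduplication.
fn distribution_before_merge(docs: &[Doc]) -> BTreeMap<&'static str, u64> {
    let mut dist = BTreeMap::new();
    for (_, fields) in docs {
        for field in fields {
            *dist.entry(*field).or_insert(0) += 1;
        }
    }
    dist
}

/// Fixed order: merge documents by primary key first, then count fields.
fn distribution_after_merge(docs: &[Doc]) -> BTreeMap<&'static str, u64> {
    // Assumption: a later document with the same key replaces the earlier one.
    let mut merged: BTreeMap<&'static str, Vec<&'static str>> = BTreeMap::new();
    for (id, fields) in docs {
        merged.insert(*id, fields.clone());
    }
    let mut dist = BTreeMap::new();
    for fields in merged.values() {
        for field in fields {
            *dist.entry(*field).or_insert(0) += 1;
        }
    }
    dist
}

fn main() {
    let fields = vec!["id", "title", "poster", "overview", "release_date", "genres"];
    // Two documents sharing the primary key "47474".
    let docs: Vec<Doc> = vec![("47474", fields.clone()), ("47474", fields)];

    println!("before merge: {:?}", distribution_before_merge(&docs));
    println!("after merge:  {:?}", distribution_after_merge(&docs));
}
```

With two documents sharing one primary key, the first function counts every field twice while the second counts it once.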
---
```console
➜ Downloads cat short_movies.json
[
{"id":"47474","title":"The Serpent's Egg","poster":"https://image.tmdb.org/t/p/w500/n7z0doFkXHcvo8QQWHLFnkEPXRU.jpg","overview":"The Serpent's Egg follows a week in the life of Abel Rosenberg, an out-of-work American circus acrobat living in poverty-stricken Berlin following Germany's defeat in World War I.","release_date":246844800,"genres":["Thriller","Drama","Mystery"]},
{"id":"47474","title":"The Serpent's Egg","poster":"https://image.tmdb.org/t/p/w500/n7z0doFkXHcvo8QQWHLFnkEPXRU.jpg","overview":"The Serpent's Egg follows a week in the life of Abel Rosenberg, an out-of-work American circus acrobat living in poverty-stricken Berlin following Germany's defeat in World War I.","release_date":246844800,"genres":["Thriller","Drama","Mystery"]}
]
➜ Downloads curl -X POST -H "Content-Type: text/json" --data-binary @short_movies.json 127.0.0.1:7700/indexes/movies/documents
{"updateId":0}
```
## Before
```console
➜ Downloads curl -s 127.0.0.1:7700/indexes/movies/stats | jq
{
"numberOfDocuments": 1,
"isIndexing": false,
"fieldsDistribution": {
"release_date": 2,
"poster": 2,
"title": 2,
"overview": 2,
"genres": 2,
"id": 2
}
}
```
## After
```console
➜ Downloads curl -s 127.0.0.1:7700/indexes/movies/stats | jq
{
"numberOfDocuments": 1,
"isIndexing": false,
"fieldsDistribution": {
"poster": 1,
"release_date": 1,
"title": 1,
"genres": 1,
"id": 1,
"overview": 1
}
}
```
Co-authored-by: Alexey Shekhirin <a.shekhirin@gmail.com>
183: remove tests on main r=Kerollmops a=MarinPostma
Remove testing on main, since we now use bors for merging.
Co-authored-by: Marin Postma <postma.marin@protonmail.com>
- pass excluded documents to criteria to remove them in higher levels of the bucket-sort
- merge already returned documents with excluded documents to avoid duplicates
Related to #125 and #112
Fix #170
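The two points above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual milli implementation: `Criterion`, `next_bucket`, and `collect_results` are hypothetical names, and plain `BTreeSet<u32>` values stand in for whatever document-id sets the real code uses.

```rust
use std::collections::BTreeSet;

/// Hypothetical criterion yielding buckets of document ids, best bucket first.
struct Criterion {
    buckets: Vec<BTreeSet<u32>>,
    position: usize,
}

impl Criterion {
    /// Returns the next bucket with every excluded id removed,
    /// or `None` once all buckets have been consumed.
    fn next_bucket(&mut self, excluded: &BTreeSet<u32>) -> Option<BTreeSet<u32>> {
        let bucket: BTreeSet<u32> = self
            .buckets
            .get(self.position)?
            .difference(excluded)
            .copied()
            .collect();
        self.position += 1;
        Some(bucket)
    }
}

/// Collects up to `limit` ids, merging each returned id into the excluded set
/// so that later buckets cannot yield it again.
fn collect_results(
    mut criterion: Criterion,
    initially_excluded: BTreeSet<u32>,
    limit: usize,
) -> Vec<u32> {
    let mut excluded = initially_excluded;
    let mut results = Vec::new();
    while results.len() < limit {
        let bucket = match criterion.next_bucket(&excluded) {
            Some(bucket) => bucket,
            None => break,
        };
        for id in bucket {
            if results.len() == limit {
                break;
            }
            results.push(id);
            excluded.insert(id);
        }
    }
    results
}

fn main() {
    let criterion = Criterion {
        buckets: vec![
            BTreeSet::from([1, 2]),
            BTreeSet::from([2, 3]),
            BTreeSet::from([3, 4]),
        ],
        position: 0,
    };
    // Document 4 is excluded up front; 2 and 3 appear in several buckets.
    let results = collect_results(criterion, BTreeSet::from([4]), 10);
    println!("{:?}", results);
}
```

Because the excluded set grows with every returned document, an id that appears in several buckets is emitted only once.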