1045: Revert "Merge #1037" r=MarinPostma a=MarinPostma
This reverts commit 257f9fb2b2, reversing
changes made to 9bae7a35bf.
The reason for this is that de-unicoding is not always desirable (for example, in the case of CJK documents). This cannot be handled correctly for now and will require work on the tokenizer.
Co-authored-by: mpostma <postma.marin@protonmail.com>
1037: Synonym unidecode r=Kerollmops a=MarinPostma
fix #964
- unidecodes all synonyms before adding them to the synonyms fst
- stores a copy of the original synonyms (in their Unicode form) for later retrieval
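
The two steps above can be sketched as follows. This is a minimal illustration in Python, not the code from this PR: `fold` and `prepare_synonyms` are hypothetical names, and `fold` strips combining marks after NFKD normalization as a stand-in for a real unidecode transliteration. Unlike a real unidecode, this stand-in leaves CJK characters untouched.

```python
import unicodedata


def fold(text: str) -> str:
    """Stand-in for unidecode: decompose (NFKD), then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def prepare_synonyms(synonyms: dict) -> tuple:
    """Return (folded, original).

    `folded` is what would be indexed (here a plain dict with sorted keys,
    standing in for the synonyms fst); `original` is an untouched copy kept
    so the synonyms can be returned later exactly as the user entered them.
    """
    folded = {}
    for word, alternatives in synonyms.items():
        key = fold(word.lower())
        # Two different spellings may fold to the same key, so merge them.
        folded.setdefault(key, set()).update(fold(a.lower()) for a in alternatives)
    folded_sorted = {key: sorted(values) for key, values in sorted(folded.items())}
    original = {word: list(alternatives) for word, alternatives in synonyms.items()}
    return folded_sorted, original
```

Keeping the original map alongside the folded one is what lets a settings read-back show the synonyms in their original form, at the cost of storing them twice.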
Co-authored-by: mpostma <postma.marin@protonmail.com>
1032: Remove not maintained csv movies dataset r=MarinPostma a=bidoubiwa
Remove `movies.csv` from the dataset folder because it is no longer updated and cannot be used with MeiliSearch without first converting it to JSON.
Co-authored-by: Charlotte Vermandel <charlottevermandel@gmail.com>
This allows us to:
- Transform a CSV, a JSON or a JSON lines data type into the same Grenad x Obkv streamable data type and create the new FieldsIdsMap.
- Extract all the documents' user ids in advance, to be able to delete the existing documents before re-indexing them.
- Keep only the last document for each user id, avoiding duplicates in the same request.
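
As a rough illustration of the list above, the sketch below reads the three input formats into one stream of documents, collects the user ids up front, and keeps only the last document per id. It is written in Python with hypothetical names (`read_documents`, `dedupe_keep_last`, the `primary_key` parameter) and uses plain dicts instead of the Grenad x Obkv format, so it shows the idea rather than the actual implementation.

```python
import csv
import io
import json


def read_documents(payload: str, fmt: str):
    """Yield documents as dicts from a CSV, JSON or JSON lines payload."""
    if fmt == "csv":
        # Every CSV value is read as a string.
        yield from csv.DictReader(io.StringIO(payload))
    elif fmt == "json":
        # Assumes the JSON payload is an array of objects.
        yield from json.loads(payload)
    elif fmt == "jsonl":
        for line in payload.splitlines():
            if line.strip():
                yield json.loads(line)
    else:
        raise ValueError(f"unsupported format: {fmt}")


def dedupe_keep_last(documents, primary_key="id"):
    """Return (user_ids, documents), keeping the last document per user id."""
    latest = {}
    for doc in documents:
        # A later document with the same id overwrites the earlier one.
        latest[doc[primary_key]] = doc
    return list(latest.keys()), list(latest.values())
```

The returned id list is what would be used to delete already-indexed documents before the deduplicated batch is indexed again.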
1040: Update movie posters r=Kerollmops a=bidoubiwa
This PR resolves 3 issues:
1. Update poster URLs that changed
2. Point all posters to a smaller image (roughly 20 kB instead of roughly 500 kB) by changing the width from 1280 px to 500 px
3. Remove films that are not in the TMDB database
Co-authored-by: Charlotte Vermandel <charlottevermandel@gmail.com>
1038: Add Sandbox section to README.md r=LegendreM a=eskombro
This PR adds a link to [MeiliSearch Sandbox](https://sandbox.meilisearch.com/) in the README.md
Co-authored-by: Samuel Jimenez <sjimenezre@gmail.com>