Introduce the Transform struct transforming CSVs

This allows us to:
  - Transform CSV, JSON or JSON Lines input into the same Grenad x Obkv
    streamable data type and create the new FieldsIdsMap.
  - Extract all the documents' user ids in advance, so that the existing
    documents can be deleted before being re-indexed.
  - Keep only the last document for each user id, avoiding duplicates
    within the same request.
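The three steps above can be sketched with only the standard library. This is a simplified illustration under stated assumptions, not the commit's code: `FieldsIdsMap` here is a hypothetical stand-in for the real type, the `(field id, value)` vector stands in for an obkv document, an in-memory `BTreeMap` replaces the grenad stream, only CSV-like input is shown, and the `id` primary key column is an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for FieldsIdsMap: assigns a small integer id
/// to each field name, in order of first appearance.
#[derive(Debug, Default)]
struct FieldsIdsMap {
    ids: BTreeMap<String, u8>,
    next: u8,
}

impl FieldsIdsMap {
    /// Returns the existing id of `name`, or assigns the next free one.
    fn insert(&mut self, name: &str) -> u8 {
        if let Some(&id) = self.ids.get(name) {
            return id;
        }
        let id = self.next;
        self.next += 1;
        self.ids.insert(name.to_string(), id);
        id
    }
}

/// Hypothetical simplified document encoding: (field id, value) pairs
/// sorted by field id, standing in for an obkv entry.
type EncodedDoc = Vec<(u8, String)>;

/// Transforms CSV-like input (a header row plus records) into documents
/// keyed by user id. A later record with the same user id replaces the
/// earlier one, so only the last one is kept.
fn transform(
    headers: &[&str],
    records: &[Vec<&str>],
    primary_key: &str,
) -> (FieldsIdsMap, BTreeMap<String, EncodedDoc>) {
    let mut fields = FieldsIdsMap::default();
    // Build the fields ids map from the header row.
    let field_ids: Vec<u8> = headers.iter().map(|h| fields.insert(h)).collect();
    let pk_pos = headers
        .iter()
        .position(|h| *h == primary_key)
        .expect("primary key column missing");

    let mut docs = BTreeMap::new();
    for record in records {
        let user_id = record[pk_pos].to_string();
        let mut doc: EncodedDoc = field_ids
            .iter()
            .zip(record.iter())
            .map(|(&id, value)| (id, value.to_string()))
            .collect();
        doc.sort_by_key(|(id, _)| *id);
        // `insert` overwrites any earlier document with the same user id.
        docs.insert(user_id, doc);
    }
    (fields, docs)
}

fn main() {
    let headers = ["id", "title"];
    let records = vec![
        vec!["1", "hello"],
        vec!["2", "world"],
        vec!["1", "hello again"],
    ];
    let (fields, docs) = transform(&headers, &records, "id");
    // All user ids are known before indexing starts, so the existing
    // documents with these ids could be deleted first.
    let user_ids: Vec<&String> = docs.keys().collect();
    println!("fields: {:?}", fields.ids);
    println!("user ids to delete first: {:?}", user_ids);
    println!("doc 1: {:?}", docs["1"]);
}
```

Keying the map by user id makes "keep the last document" fall out of `BTreeMap::insert`, which replaces the previous value, and the map's sorted keys give the list of user ids to delete before re-indexing.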
Clément Renault 2020-10-23 14:11:00 +02:00
parent 8d82e37ec0
commit 656a851830
GPG key ID: 92ADA4E935E71FA4
3 changed files with 158 additions and 1 deletion

@@ -21,8 +21,9 @@ use self::merge_function::{
     docid_word_positions_merge, documents_merge,
 };
 
-mod store;
 mod merge_function;
+mod store;
+mod transform;
 
 #[derive(Debug, Clone, StructOpt)]
 pub struct IndexerOpt {