From 7992cd0d51c3b858f36e74abd76ceef986b51df8 Mon Sep 17 00:00:00 2001
From: jvoisin
Date: Sun, 1 Apr 2018 15:36:45 +0200
Subject: [PATCH] Add some documentation

---
 doc/implementation_notes.md | 33 ++++++++++++++
 doc/threat_model.md         | 85 +++++++++++++++++++++++++++++++++++++
 2 files changed, 118 insertions(+)
 create mode 100644 doc/implementation_notes.md
 create mode 100644 doc/threat_model.md

diff --git a/doc/implementation_notes.md b/doc/implementation_notes.md
new file mode 100644
index 0000000..bc83671
--- /dev/null
+++ b/doc/implementation_notes.md
@@ -0,0 +1,33 @@
+Implementation notes
+====================
+
+Symlink attacks
+---------------
+
+MAT2 outputs predictable filenames (like yourfile.jpg.cleaned).
+This may lead to symlink attacks. Please check whether your OS protects
+against them.
+
+Archives handling
+-----------------
+
+MAT2 doesn't support archives yet, because we haven't found a usable way to ask the user
+what to do when non-supported files are encountered.
+
+PDF handling
+------------
+
+MAT rendered PDF files on a cairo surface, then
+printed the result to a file. This kept the text selectable, but unfortunately, it
+didn't remove any *deep metadata*, like that in embedded pictures. This was
+one of the reasons MAT was abandoned: the absence of a satisfying solution to
+handle PDFs. But apparently, people are OK with [pdf redact
+tools](https://github.com/firstlookmedia/pdf-redact-tools), which simply
+transform the PDF into images. So this is what MAT2 does too.
+
+Images handling
+---------------
+
+When possible, images are handled like PDFs: rendered on a surface, then saved
+to the filesystem. This ensures that all metadata is removed.
+
diff --git a/doc/threat_model.md b/doc/threat_model.md
new file mode 100644
index 0000000..6d14ca6
--- /dev/null
+++ b/doc/threat_model.md
@@ -0,0 +1,85 @@
+Threat Model
+============
+The Metadata Anonymisation Toolkit 2 adversary has a number
+of goals, capabilities, and counter-attack types that can be
+used to guide us towards a set of requirements for MAT2.
+
+This is an overhaul of the threat model of MAT (the first iteration of the software).
+
+Warnings
+--------
+
+MAT only removes standard metadata from your files; it does _not_:
+
+  - anonymise their content
+  - handle watermarking
+  - handle steganography
+  - handle any non-standard metadata field/system
+
+If you really want to be anonymous, use a format that does not contain any
+metadata, or better: use plain text. And as usual, think before clicking.
+
+
+Adversary
+------------
+
+* Goals:
+
+    - Identify the source of the document, since a document
+      always has one. Who/where/when/how was a picture
+      taken, where was the document leaked from and by
+      whom, ...
+
+    - Identify the author; in some cases documents may be
+      anonymously authored or created. In these cases,
+      identifying the author is the goal.
+
+    - Identify the equipment/software used. If the attacker fails
+      to directly identify the author and/or source, their next
+      goal is to determine the source of the equipment used
+      to produce, copy, and transmit the document. This can
+      include the model of camera used to take a photo, or
+      which software was used to produce an office document.
+
+
+* Adversary Capabilities - Positioning
+    - The adversary created the document specifically for this
+      user. This is the strongest position for the adversary to
+      have. In this case, the adversary is capable of inserting
+      arbitrary, custom watermarks specifically for tracking
+      the user. In general, MAT cannot defend against this
+      adversary, but we list it for completeness.
+
+    - The adversary created the document for a group of users.
+      In this case, the adversary knows that they attempted to
+      limit distribution to a specific group of users. They may
+      or may not have watermarked the document for these
+      users, but they certainly know the format used.
+
+    - The adversary did not create the document. This is the weakest
+      position for the adversary to have. The file format is (most of the time)
+      standard and nothing custom is added, so MAT
+      should be able to remove all meta-information from the
+      file.
+
+Requirements
+---------------
+
+* Processing
+    - MAT2 *should* avoid interacting with the content of files.
+      Its goal is to remove metadata, and the user is solely
+      responsible for the information in the file.
+
+    - MAT2 *must* warn when encountering an unknown
+      format. For example, in a zipfile, if MAT encounters an
+      unknown format, it should warn the user, and ask if the
+      file should be added to the anonymised archive that is
+      produced.
+
+    - MAT2 *must* not add metadata, since its purpose is to
+      anonymise files: every added item of metadata decreases
+      anonymity.
+
+    - MAT2 *should* handle unknown/hidden metadata fields,
+      like proprietary extensions of open formats.
+