Add some documentation
parent 9e7a4bd217
commit 7992cd0d51
33
doc/implementation_notes.md
Normal file
@@ -0,0 +1,33 @@
Implementation notes
====================

Symlink attacks
---------------

MAT2 writes its output to predictable filenames (like yourfile.jpg.cleaned).
This may lead to symlink attacks: someone able to write to the same directory
could create that path in advance as a symlink to another file. Please check
whether your OS protects against them.

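One common mitigation, sketched here under the assumption of a POSIX-like
system, is to create the output file with `O_CREAT | O_EXCL`, so that the call
fails if anything (including a symlink) already exists at the predictable
path. The helper name `write_cleaned_file` is hypothetical and is not part of
MAT2.

```python
import os


def write_cleaned_file(path: str, data: bytes) -> None:
    """Write `data` to `path`, refusing to reuse or follow an existing path.

    Hypothetical helper for illustration; not MAT2's actual code.
    """
    # O_EXCL together with O_CREAT makes open() fail if `path` already exists,
    # even when it is a (possibly dangling) symlink.
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    # O_NOFOLLOW is not defined on every platform, so add it only if present.
    flags |= getattr(os, "O_NOFOLLOW", 0)
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "wb") as out:
        out.write(data)
```

With this approach, a pre-existing `yourfile.jpg.cleaned` raises
`FileExistsError` instead of being followed or overwritten; the caller still
has to decide how to report that to the user.
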
Archives handling
-----------------

MAT2 doesn't support archives yet, because we haven't found a usable way to ask
the user what to do when unsupported files are encountered.

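To make the open question concrete, here is a minimal sketch of one possible
shape for it: split the members of a zip file into supported and unsupported
ones, and leave the decision about the latter to a caller-supplied callback.
The extension set, the function names and the callback are all assumptions
made for illustration; none of this exists in MAT2.

```python
import os
import zipfile
from typing import Callable, List, Tuple

# Hypothetical allow-list; the real set of supported formats is not given here.
SUPPORTED_EXTENSIONS = {".jpg", ".png", ".pdf"}


def split_members(archive_path: str) -> Tuple[List[str], List[str]]:
    """Return (supported, unsupported) member names, in archive order."""
    supported, unsupported = [], []
    with zipfile.ZipFile(archive_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directories carry no file content to clean
            extension = os.path.splitext(info.filename)[1].lower()
            if extension in SUPPORTED_EXTENSIONS:
                supported.append(info.filename)
            else:
                unsupported.append(info.filename)
    return supported, unsupported


def members_to_keep(archive_path: str,
                    keep_unsupported: Callable[[str], bool]) -> List[str]:
    """Ask `keep_unsupported` about each unsupported member, one by one."""
    supported, unsupported = split_members(archive_path)
    kept = list(supported)
    for name in unsupported:
        if keep_unsupported(name):
            kept.append(name)
    return kept
```

A command-line front end could pass a callback that prompts on the terminal,
while a batch run could pass `lambda name: False` to drop every unsupported
file. Whether either behaviour is acceptable is exactly the unresolved point.
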
PDF handling
------------

MAT rendered PDF files on a cairo surface, then printed the result to a file.
This kept the text selectable but, unfortunately, it didn't remove any
*deep metadata*, like the metadata in embedded pictures. This was one of the
reasons MAT was abandoned: the absence of a satisfying solution for handling
PDF. But apparently, people are OK with [pdf redact
tools](https://github.com/firstlookmedia/pdf-redact-tools), which simply
transform the PDF into images. So this is what MAT2 does too.

Images handling
---------------

When possible, images are handled like PDF: rendered on a surface, then saved
to the filesystem. This ensures that all metadata is removed.

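One way to spot-check that claim on a PNG output is to list the chunk types
left in the file. The sketch below does not render anything and is not how
MAT2 processes images; it only walks the PNG chunk layout (length, type, data,
CRC) with the standard library. The function names and the set of chunk types
treated as metadata are illustrative assumptions, and the set is not
exhaustive.

```python
import struct
from typing import List

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Ancillary chunk types that commonly carry metadata (illustrative only).
METADATA_CHUNKS = {b"tEXt", b"zTXt", b"iTXt", b"tIME", b"eXIf"}


def png_chunk_types(data: bytes) -> List[bytes]:
    """Return the type of every chunk in a PNG byte string, in file order."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    types = []
    offset = len(PNG_SIGNATURE)
    while offset + 8 <= len(data):
        # Each chunk starts with a 4-byte big-endian length and a 4-byte type.
        length, chunk_type = struct.unpack(">I4s", data[offset:offset + 8])
        types.append(chunk_type)
        offset += 8 + length + 4  # chunk header + data + CRC
    return types


def metadata_chunks(data: bytes) -> List[bytes]:
    """Return the chunk types in `data` that belong to METADATA_CHUNKS."""
    return [t for t in png_chunk_types(data) if t in METADATA_CHUNKS]
```

An empty list from `metadata_chunks` only means that none of the listed chunk
types is present; it says nothing about other formats or about information
visible in the picture itself.
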
85
doc/threat_model.md
Normal file
@@ -0,0 +1,85 @@
Threat Model
============

The Metadata Anonymisation Toolkit 2 adversary has a number
of goals, capabilities, and counter-attack types that can be
used to guide us towards a set of requirements for MAT2.

This is an overhaul of the threat model of MAT, the first iteration of the
software.

Warnings
--------

MAT only removes standard metadata from your files; it does _not_:

- anonymise their content
- handle watermarking
- handle steganography
- handle any non-standard metadata field/system

If you really want to be anonymous, use a format that does not contain any
metadata, or better: use plain text. And as usual, think before clicking.


Adversary
---------

* Goals:

    - Identify the source of the document, since a document
      always has one. Who/where/when/how was a picture
      taken, where was the document leaked from and by
      whom, ...

    - Identify the author; in some cases documents may be
      anonymously authored or created. In these cases,
      identifying the author is the goal.

    - Identify the equipment/software used. If the attacker fails
      to directly identify the author and/or source, their next
      goal is to determine the source of the equipment used
      to produce, copy, and transmit the document. This can
      include the model of camera used to take a photo, or
      which software was used to produce an office document.


* Adversary Capabilities - Positioning

    - The adversary created the document specifically for this
      user. This is the strongest position for the adversary to
      have. In this case, the adversary is capable of inserting
      arbitrary, custom watermarks specifically for tracking
      the user. In general, MAT cannot defend against this
      adversary, but we list it for completeness.

    - The adversary created the document for a group of users.
      In this case, the adversary knows that they attempted to
      limit distribution to a specific group of users. They may
      or may not have watermarked the document for these
      users, but they certainly know the format used.

    - The adversary did not create the document. This is the weakest
      position for the adversary to have. The file format is (most of
      the time) standard and nothing custom is added: MAT
      should be able to remove all meta-information from the
      file.

Requirements
------------

* Processing

    - MAT2 *should* avoid interacting with the information in a file.
      Its goal is to remove metadata, and the user is solely
      responsible for the information the file contains.

    - MAT2 *must* warn when encountering an unknown
      format. For example, in a zipfile, if MAT2 encounters an
      unknown format, it should warn the user, and ask if the
      file should be added to the anonymised archive that is
      produced.

    - MAT2 *must not* add metadata, since its purpose is to
      anonymise files: every added item of metadata decreases
      anonymity.

    - MAT2 *should* handle unknown/hidden metadata fields,
      like proprietary extensions of open formats.

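As an illustration of the "must not add metadata" requirement, here is a
minimal sketch of writing zip members without recording the current time or
the host system. It assumes a hypothetical archive writer (MAT2 itself does
not handle archives yet); the names `FIXED_DATE`,
`write_member_without_new_metadata` and `build_archive` are made up for this
example.

```python
import io
import zipfile
from typing import Dict

# 1980-01-01 is the earliest timestamp the ZIP format can store.
FIXED_DATE = (1980, 1, 1, 0, 0, 0)


def write_member_without_new_metadata(archive: zipfile.ZipFile,
                                      name: str, data: bytes) -> None:
    """Add `data` as `name` with constant, non-identifying header fields."""
    info = zipfile.ZipInfo(filename=name, date_time=FIXED_DATE)
    info.compress_type = zipfile.ZIP_DEFLATED
    info.create_system = 3  # constant value, so the host OS is not recorded
    info.comment = b""
    info.extra = b""
    archive.writestr(info, data)


def build_archive(members: Dict[str, bytes]) -> bytes:
    """Build a zip in memory; members are written in sorted name order."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in sorted(members):
            write_member_without_new_metadata(archive, name, members[name])
    return buffer.getvalue()
```

Because nothing in the output depends on when or where it was produced,
building the same members twice yields identical bytes, which is an easy
property to check in a test.
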