Previously, encountering an unknown member meant that any parser of this type would abort. Now, the user can set parser.unknown_member_policy to either 'omit' or 'keep' if they don't want the current action of 'abort'. Note that this causes pylint to complain about branching depth for remove_all() because of the nuanced error handling, so I've disabled this check.
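The three policies can be illustrated with a small standalone sketch. It does not reproduce the project's parser code: `filter_members`, `VALID_POLICIES` and the member names are hypothetical, and only the policy values 'abort', 'omit' and 'keep' come from the description above.

```python
from typing import Iterable, List, Set

# Hypothetical illustration only: these names are not taken from libmat2.
VALID_POLICIES = {"abort", "omit", "keep"}


def filter_members(members: Iterable[str], known: Set[str],
                   policy: str = "abort") -> List[str]:
    """Return the members to write out, applying `policy` to unknown ones."""
    if policy not in VALID_POLICIES:
        raise ValueError("unknown policy: %r" % policy)
    kept = []
    for name in members:
        if name in known:
            kept.append(name)  # supported member: always processed
        elif policy == "abort":
            raise ValueError("unsupported member: %s" % name)
        elif policy == "keep":
            kept.append(name)  # unknown member copied through untouched
        # policy == "omit": the unknown member is silently dropped
    return kept
```

With 'keep', unknown members survive and may still carry metadata; with 'omit', they are dropped, which may break the resulting file. Which trade-off is acceptable depends on the file at hand.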
 _____ _____ _____ ___
|     |  _  |_   _|_  |  Keep your data,
| | | |     | | | |  _|     trash your meta!
|_|_|_|__|__| |_| |___|
This software is currently in beta; please don't use it for anything critical.
Metadata and privacy
Metadata consist of information that characterizes data. Metadata are used to provide documentation for data products. In essence, metadata answer who, what, when, where, why, and how about every facet of the data that are being documented.
Metadata within a file can tell a lot about you. Cameras record data about when a picture was taken and what camera was used. Office software automatically adds author and company information to documents and spreadsheets, and PDF files carry similar fields. Maybe you don't want to disclose that information on the web.
This is precisely the job of MAT2: getting rid, as much as possible, of metadata.
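As an illustration of how easily such information can be read back, the sketch below builds a tiny stand-in for an Office Open XML document (a zip archive with a `docProps/core.xml` member) and extracts the author from it. The sample content and the helper names `make_sample_document` and `read_author` are made up for this example; it is not how MAT2 itself processes files.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Dublin Core namespace used for the author field in docProps/core.xml.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Made-up sample metadata; a real document contains many more members.
CORE_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<cp:coreProperties '
    'xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<dc:creator>Alice Example</dc:creator>'
    '</cp:coreProperties>'
)


def make_sample_document() -> bytes:
    """Build an in-memory zip archive holding only the metadata member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("docProps/core.xml", CORE_XML)
    return buf.getvalue()


def read_author(document: bytes):
    """Return the dc:creator value stored in the archive, or None."""
    with zipfile.ZipFile(io.BytesIO(document)) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    node = root.find("{%s}creator" % DC_NS)
    return None if node is None else node.text
```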
Requirements
- python3-mutagen for audio support
- python3-gi-cairo and gir1.2-poppler-0.18 for PDF support
- gir1.2-gdkpixbuf-2.0 for images support
- libimage-exiftool-perl for everything else
Please note that MAT2 requires at least Python 3.5, meaning that it doesn't run on Debian Jessie.
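A quick way to see which of these requirements are present on a machine is sketched below. The mapping from Debian package names to what is actually checked is an assumption: `mutagen` and `gi` are the Python modules those packages are expected to provide, and `exiftool` is the expected command name. The Poppler and GdkPixbuf typelibs are not checked here, and `check_requirements` is a hypothetical helper, not part of MAT2.

```python
import importlib.util
import shutil
import sys


def check_requirements() -> dict:
    """Report, as booleans, which requirements appear to be available."""
    return {
        # MAT2 needs at least Python 3.5.
        "python >= 3.5": sys.version_info >= (3, 5),
        # Assumed module name for python3-mutagen (audio support).
        "mutagen": importlib.util.find_spec("mutagen") is not None,
        # Assumed module name behind python3-gi-cairo (PDF and image support).
        "gi": importlib.util.find_spec("gi") is not None,
        # Assumed command installed by libimage-exiftool-perl.
        "exiftool": shutil.which("exiftool") is not None,
    }


if __name__ == "__main__":
    for name, available in sorted(check_requirements().items()):
        print("%-15s %s" % (name, "ok" if available else "MISSING"))
```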
Running the test suite
$ python3 -m unittest discover -v
How to use MAT2
usage: mat2 [-h] [-v] [-l] [-s | -L] [files [files ...]]

Metadata anonymisation toolkit 2

positional arguments:
  files

optional arguments:
  -h, --help         show this help message and exit
  -v, --version      show program's version number and exit
  -l, --list         list all supported fileformats
  -s, --show         list all the harmful metadata of a file without removing
                     them
  -L, --lightweight  remove SOME metadata
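The usage line shows that -s and -L are mutually exclusive, while the other flags can be combined freely. A minimal argparse sketch that yields an interface of this shape is given below. It is reconstructed from the help text only and is not the actual MAT2 source; `build_parser` and the placeholder version string are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare a command line matching the help text above."""
    parser = argparse.ArgumentParser(
        prog="mat2", description="Metadata anonymisation toolkit 2")
    parser.add_argument("files", nargs="*")
    # Placeholder version string: the real version number is not stated here.
    parser.add_argument("-v", "--version", action="version",
                        version="%(prog)s")
    parser.add_argument("-l", "--list", action="store_true",
                        help="list all supported fileformats")
    # -s and -L cannot be given together, hence the [-s | -L] in the usage.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-s", "--show", action="store_true",
                       help="list all the harmful metadata of a file "
                            "without removing them")
    group.add_argument("-L", "--lightweight", action="store_true",
                       help="remove SOME metadata")
    return parser
```

Parsing `["-s", "picture.jpg"]` with this parser sets `show` to true and `files` to a one-element list, while passing both `-s` and `-L` makes argparse exit with an error.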
Notes about detecting metadata
While MAT2 is doing its very best to display metadata when the --show flag is passed, it doesn't mean that a file is clean from any metadata if MAT2 doesn't show any. There is no reliable way to detect every single possible metadata for complex file formats.
This is why you shouldn't rely on metadata's presence to decide if your file must be cleaned or not.
Related software
- The first iteration of MAT
- Exiftool
- pdf-redact-tools, that tries to deal with printer dots too.
- pdfparanoia, that removes watermarks from PDF.
Contact
If possible, use the issues system or the mailing list.
Should a more private contact be needed (e.g. for reporting security issues), you can email Julien (jvoisin) Voisin at julien.voisin+mat@dustri.org, using the GPG key 9FCDEE9E1A381F311EA62A7404D041E8171901CC.
License
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with this program. If not, see http://www.gnu.org/licenses/.
Copyright 2018 Julien (jvoisin) Voisin julien.voisin+mat2@dustri.org
Copyright 2016 Marie Rose for MAT2's logo
Thanks
MAT2 wouldn't exist without:
- the Google Summer of Code;
- the fine people from Tails;
- friends
Many thanks to them!