```
 _____ _____ _____ ___
|     |  _  |_   _|_  |  Keep your data,
| | | |     | | | |  _|     trash your meta!
|_|_|_|__|__| |_| |___|

```

This software is currently in **beta**; please don't use it for anything
critical.

# Metadata and privacy

Metadata consist of information that characterizes data.
Metadata are used to provide documentation for data products.
In essence, metadata answer who, what, when, where, why, and how about
every facet of the data that are being documented.

Metadata within a file can tell a lot about you.
Cameras record data about when a picture was taken and what
camera was used. Office documents such as PDF or Office files automatically
include author and company information.
You might not want to disclose that information on the web.

This is precisely the job of MAT2: getting rid of as much metadata as
possible.

# Requirements

- `python3-mutagen` for audio support
- `python3-gi-cairo` and `gir1.2-poppler-0.18` for PDF support
- `gir1.2-gdkpixbuf-2.0` for image support
- `FFmpeg`, optionally, for video support
- `libimage-exiftool-perl` for everything else
- `bubblewrap`, optionally, for sandboxing

Please note that MAT2 requires at least Python 3.5, meaning that it
doesn't run on [Debian Jessie](https://packages.debian.org/jessie/python3).
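
As a rough pre-flight check, the requirements above could be verified along the following lines. This is only a sketch and not MAT2's own `--check-dependencies` logic: the executable names `exiftool`, `ffmpeg` and `bwrap` are assumptions about what the listed packages install, and `python_is_recent_enough` and `missing_tools` are hypothetical helpers.

```python
import shutil
import sys

# Assumed executable names for the packages listed above (not taken from MAT2).
REQUIRED_TOOLS = ["exiftool"]
OPTIONAL_TOOLS = ["ffmpeg", "bwrap"]


def python_is_recent_enough(version_info=sys.version_info):
    """Return True if the interpreter is at least Python 3.5."""
    return tuple(version_info[:2]) >= (3, 5)


def missing_tools(names, which=shutil.which):
    """Return the names in `names` that cannot be found on the PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    if not python_is_recent_enough():
        print("Python 3.5 or newer is required")
    for tool in missing_tools(REQUIRED_TOOLS):
        print("missing required tool:", tool)
    for tool in missing_tools(OPTIONAL_TOOLS):
        print("missing optional tool:", tool)
```

The Python bindings in the list (for example `python3-mutagen`) are not covered by this sketch; it only looks for executables on the `PATH`.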

# Running the test suite

```bash
$ python3 -m unittest discover -v
```

To see the test coverage:

```bash
$ python3-coverage run --branch -m unittest discover -s tests/
$ python3-coverage report -m --include 'libmat2/*'
```

# How to use MAT2

```bash
usage: mat2 [-h] [-v] [-l] [--check-dependencies] [-V]
            [--unknown-members policy] [-s | -L]
            [files [files ...]]

Metadata anonymisation toolkit 2

positional arguments:
  files                 the files to process

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -l, --list            list all supported fileformats
  --check-dependencies  check if MAT2 has all the dependencies it needs
  -V, --verbose         show more verbose status information
  --unknown-members policy
                        how to handle unknown members of archive-style files
                        (policy should be one of: abort, omit, keep)
  -s, --show            list harmful metadata detectable by MAT2 without
                        removing them
  -L, --lightweight     remove SOME metadata
```
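
For scripted use, the options above can be assembled into an argument list before handing it to `subprocess`. The helper below is a hypothetical sketch, not part of MAT2; it only uses flags that appear in the usage text, and it rejects combining `--show` with `--lightweight` because the usage line lists them as `[-s | -L]`.

```python
def build_mat2_command(files, show=False, lightweight=False, unknown_members=None):
    """Build an argument list for the mat2 command line shown above.

    Hypothetical helper: the flag names come from the usage text, the
    function itself is not part of MAT2.
    """
    if show and lightweight:
        # The usage line lists these as `[-s | -L]`, i.e. mutually exclusive.
        raise ValueError("--show and --lightweight cannot be combined")
    if unknown_members not in (None, "abort", "omit", "keep"):
        raise ValueError("policy should be one of: abort, omit, keep")

    command = ["mat2"]
    if show:
        command.append("--show")
    if lightweight:
        command.append("--lightweight")
    if unknown_members is not None:
        command += ["--unknown-members", unknown_members]
    return command + list(files)
```

Assuming `mat2` is installed and on the `PATH`, the result could be run with `subprocess.run(build_mat2_command(["myfile.png"], show=True), check=True)`.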

Note that MAT2 **will not** clean files in-place. Instead, it produces a
cleaned copy: for example, a file named "myfile.png" yields a cleaned version
named "myfile.cleaned.png".
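
The naming convention from the example above can be sketched as follows. `cleaned_name` is a hypothetical helper rather than MAT2's own code, and only the "myfile.png" case is documented here; how MAT2 treats files with no extension or with several extensions is an assumption of this sketch.

```python
import os.path


def cleaned_name(path):
    """Insert `.cleaned` before the last extension of `path`.

    Illustration of the documented example only; not MAT2's implementation.
    """
    root, extension = os.path.splitext(path)
    return root + ".cleaned" + extension


if __name__ == "__main__":
    print(cleaned_name("myfile.png"))
```

A script that post-processes MAT2's output could use such a helper to locate the cleaned copy next to the original file.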

# Notes about detecting metadata

While MAT2 does its very best to display metadata when the `--show` flag is
passed, a file is not necessarily free of metadata just because MAT2 doesn't
show any. There is no reliable way to detect every possible piece of metadata
in complex file formats.

This is why you shouldn't rely on the presence of detected metadata to decide
whether your file must be cleaned.

# Notes about the lightweight mode

By default, mat2 might slightly alter the data of your files in order to remove
as much metadata as possible. For example, text in a PDF might no longer be
selectable, compressed images might get compressed again, and so on.
Since some users might be willing to accept some remaining metadata in exchange
for the guarantee that mat2 won't modify the data of their files, the `-L` flag
does precisely that.

# Related software

- The first iteration of [MAT](https://mat.boum.org)
- [Exiftool](https://sno.phy.queensu.ca/~phil/exiftool/mat)
- [pdf-redact-tools](https://github.com/firstlookmedia/pdf-redact-tools), which
  tries to deal with *printer dots* too.
- [pdfparanoia](https://github.com/kanzure/pdfparanoia), which removes
  watermarks from PDFs.
- [Scrambled Exif](https://f-droid.org/packages/com.jarsilio.android.scrambledeggsif/),
  an open-source Android application to remove metadata from pictures.

# Contact

If possible, use the [issues system](https://0xacab.org/jvoisin/mat2/issues)
or the [mailing list](https://mailman.boum.org/listinfo/mat-dev).
Should a more private contact be needed (e.g. for reporting security issues),
you can email Julien (jvoisin) Voisin at `julien.voisin+mat2@dustri.org`,
using the gpg key `9FCDEE9E1A381F311EA62A7404D041E8171901CC`.

# License

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.

Copyright 2018 Julien (jvoisin) Voisin <julien.voisin+mat2@dustri.org>
Copyright 2016 Marie Rose for MAT2's logo

# Thanks

MAT2 wouldn't exist without:

- the [Google Summer of Code](https://summerofcode.withgoogle.com/);
- the fine people from [Tails](https://tails.boum.org);
- friends

Many thanks to them!