```
 _____ _____ _____ ___
|     |  _  |_   _|_  |  Keep your data,
| | | | |_| | | | |  _|     trash your meta!
|_|_|_|_| |_| |_| |___|
```
# Metadata and privacy
Metadata consist of information that characterizes data.
Metadata are used to provide documentation for data products.
In essence, metadata answer who, what, when, where, why, and how about
every facet of the data that are being documented.

Metadata within a file can tell a lot about you.
Cameras record data about when a picture was taken and what
camera was used. Office documents such as PDF or Office files automatically add
author and company information to documents and spreadsheets.
Maybe you don't want to disclose that information.

This is precisely the job of mat2: getting rid of as much metadata as
possible.

mat2 provides:
- a library called `libmat2`;
- a command line tool called `mat2`;
- a service menu for Dolphin, KDE's default file manager.

If you prefer a regular graphical user interface, you might be interested in
[Metadata Cleaner](https://metadatacleaner.romainvigier.fr/), which uses
`mat2` under the hood.
# Requirements
- `python3-mutagen` for audio support
- `python3-gi-cairo` and `gir1.2-poppler-0.18` for PDF support
- `gir1.2-gdkpixbuf-2.0` for image support
- `gir1.2-rsvg-2.0` for SVG support
- `FFmpeg`, optionally, for video support
- `libimage-exiftool-perl` for everything else
- `bubblewrap`, optionally, for sandboxing

Please note that mat2 requires at least Python 3.5.
# Requirements setup on macOS (OS X) using [Homebrew](https://brew.sh/)
```bash
brew install exiftool cairo pygobject3 poppler gdk-pixbuf librsvg ffmpeg
```
# Running the test suite
```bash
$ python3 -m unittest discover -v
```
And if you want to see the coverage:
```bash
$ python3-coverage run --branch -m unittest discover -s tests/
$ python3-coverage report -m --include '*/libmat2/*'
```
# How to use mat2
```
usage: mat2 [-h] [-V] [--unknown-members policy] [--inplace] [--no-sandbox]
            [-v] [-l] [--check-dependencies] [-L | -s]
            [files [files ...]]

Metadata anonymisation toolkit 2

positional arguments:
  files                 the files to process

optional arguments:
  -h, --help            show this help message and exit
  -V, --verbose         show more verbose status information
  --unknown-members policy
                        how to handle unknown members of archive-style files
                        (policy should be one of: abort, omit, keep) [Default:
                        abort]
  --inplace             clean in place, without backup
  --no-sandbox          Disable bubblewrap's sandboxing
  -v, --version         show program's version number and exit
  -l, --list            list all supported fileformats
  --check-dependencies  check if mat2 has all the dependencies it needs
  -L, --lightweight     remove SOME metadata
  -s, --show            list harmful metadata detectable by mat2 without
                        removing them
```
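
As a rough illustration of how these flags combine, the sketch below builds an argument list for `mat2` from Python. The helper name `build_mat2_command` and its parameters are hypothetical and not part of mat2. It only uses flags listed in the usage text above, and it treats `-L` and `-s` as mutually exclusive, as the `[-L | -s]` notation indicates.

```python
from typing import List, Optional, Sequence


def build_mat2_command(
    files: Sequence[str],
    lightweight: bool = False,
    show: bool = False,
    inplace: bool = False,
    unknown_members: Optional[str] = None,
) -> List[str]:
    """Build an argv list for the mat2 command line tool.

    Hypothetical helper for illustration only; it is not part of mat2.
    """
    if lightweight and show:
        # The usage text lists these two as alternatives: [-L | -s].
        raise ValueError("--lightweight and --show are mutually exclusive")
    if unknown_members not in (None, "abort", "omit", "keep"):
        # Allowed policies are taken from the help text.
        raise ValueError("policy should be one of: abort, omit, keep")

    cmd = ["mat2"]
    if unknown_members is not None:
        cmd += ["--unknown-members", unknown_members]
    if inplace:
        cmd.append("--inplace")
    if lightweight:
        cmd.append("--lightweight")
    if show:
        cmd.append("--show")
    cmd.extend(files)
    return cmd
```

If `mat2` is installed, the resulting list could be handed to `subprocess.run(cmd, check=True)`. Building a list rather than a shell string avoids quoting problems with unusual file names.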

Note that, unless `--inplace` is passed, mat2 **will not** clean files in-place:
given a file named "myfile.png", for example, it produces a cleaned version named
"myfile.cleaned.png".
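
The naming rule in this example can be sketched in a few lines of Python. `cleaned_name` is a hypothetical helper, not libmat2's API, and it only covers the simple single-extension case shown here. How mat2 names files with compound extensions such as `.tar.gz` is not addressed.

```python
import os.path


def cleaned_name(path: str) -> str:
    """Insert ".cleaned" before the file extension (illustrative only)."""
    root, ext = os.path.splitext(path)
    return root + ".cleaned" + ext


print(cleaned_name("myfile.png"))  # myfile.cleaned.png
```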
## Web interface
It's possible to run mat2 as a web service, via
[mat2-web](https://0xacab.org/jvoisin/mat2-web).
If you're using WordPress, you might be interested in [wp-mat](https://git.autistici.org/noblogs/wp-mat)
and [wp-mat-server](https://git.autistici.org/noblogs/wp-mat-server).
## Desktop GUI
For GNU/Linux desktops, it's possible to use the
[Metadata Cleaner](https://gitlab.com/rmnvgr/metadata-cleaner) GTK application.
# Supported formats
The following formats are supported: avi, bmp, css, epub/ncx, flac, gif, jpeg,
m4a/mp2/mp3/…, mp4, odc/odf/odg/odi/odp/ods/odt/…, ogg/opus/oga/spx/…, pdf,
png, ppm, pptx/xlsx/docx/…, svg/svgz/…, tar/tar.gz/tar.bz2/tar.xz/…, tiff,
torrent, wav, wmv, zip, …
# Notes about detecting metadata
While mat2 does its best to display metadata when the `--show` flag is
passed, a file is not necessarily free of metadata just because mat2 doesn't
show any. There is no reliable way to detect every possible piece of metadata in
complex file formats.

This is why you shouldn't rely on whether metadata is detected to decide if your file must
be cleaned or not.
# Notes about the lightweight mode
By default, mat2 might slightly alter the data of your files in order to remove
as much metadata as possible. For example, text in PDFs might no longer be selectable,
compressed images might get compressed again, …

Since some users might be willing to keep some metadata in exchange
for the guarantee that mat2 won't modify the data of their files, there is the
`-L` flag, which does precisely that.
# Related software
- The first iteration of [MAT](https://mat.boum.org)
- [Exiftool](https://sno.phy.queensu.ca/~phil/exiftool/mat)
- [pdf-redact-tools](https://github.com/firstlookmedia/pdf-redact-tools), which
  tries to deal with *printer dots* too.
- [pdfparanoia](https://github.com/kanzure/pdfparanoia), which removes
  watermarks from PDFs.
- [Scrambled Exif](https://f-droid.org/packages/com.jarsilio.android.scrambledeggsif/),
  an open-source Android application to remove metadata from pictures.
- [Dangerzone](https://dangerzone.rocks/), designed to sanitize harmful documents
  into harmless ones.
# Contact
If possible, use the [issues system](https://0xacab.org/jvoisin/mat2/issues)
or the [mailing list](https://www.autistici.org/mailman/listinfo/mat-dev).

Should a more private contact be needed (e.g. for reporting security issues),
you can email Julien (jvoisin) Voisin at `julien.voisin+mat2@dustri.org`,
using the gpg key `9FCDEE9E1A381F311EA62A7404D041E8171901CC`.
# Donations
If you want to donate some money, please give it to [Tails](https://tails.boum.org/donate/?r=contribute).
# License
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.

Copyright 2018 Julien (jvoisin) Voisin <julien.voisin+mat2@dustri.org>

Copyright 2016 Marie-Rose for mat2's logo

The `tests/data/dirty_with_nsid.docx` file is licensed under GPLv3,
and was borrowed from the Calibre project: https://calibre-ebook.com/downloads/demos/demo.docx

The `narrated_powerpoint_presentation.pptx` file is in the public domain.
# Thanks
mat2 wouldn't exist without:

- the [Google Summer of Code](https://summerofcode.withgoogle.com/);
- the fine people from [Tails](https://tails.boum.org);
- friends

Many thanks to them!