jvoisin
e3d817f57e
Split office and archives
2018-09-06 11:34:14 +02:00
jvoisin
120b204988
Slightly change the previous commit
2018-09-06 11:13:11 +02:00
Daniel Kahn Gillmor
f3cef319b9
Unknown Members: make policy use an Enum
...
Closes #60
Note: this changeset also ensures that clean.cleaned.docx is removed
once the pytest run is over.
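
A minimal sketch of what an Enum-based policy can look like. The class name, member names, and the `parse_policy` helper are illustrative assumptions, not necessarily the names used in this changeset; only the three values ('abort', 'omit', 'keep') come from the commit messages in this log.

```python
import enum


class UnknownMemberPolicy(enum.Enum):
    """Hypothetical enum holding the three policy values named in this log."""
    ABORT = 'abort'
    OMIT = 'omit'
    KEEP = 'keep'


def parse_policy(value: str) -> UnknownMemberPolicy:
    """Convert a user-supplied string into a policy, rejecting anything else."""
    try:
        return UnknownMemberPolicy(value.lower())
    except ValueError:
        valid = ', '.join(p.value for p in UnknownMemberPolicy)
        raise ValueError('unknown policy %r (expected one of: %s)'
                         % (value, valid)) from None
```

Compared with bare strings, an Enum makes a misspelled policy fail at the point where it is parsed rather than silently falling through a chain of string comparisons.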
2018-09-05 18:59:33 -04:00
jvoisin
072ee1814d
Remove defusedxml support and document why
2018-09-05 18:41:08 +02:00
jvoisin
46bb1b83ea
Improve the previous commit
2018-09-05 17:26:09 +02:00
Daniel Kahn Gillmor
1d7e374e5b
office: try all members, even when one fails
...
the end result will be the same -- an abort -- but the user will get
to see all the warnings for a particular file, instead of getting them
one at a time.
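
The idea can be sketched as follows, assuming a hypothetical `clean_member` callback that returns False when a member cannot be handled. Neither function name comes from the mat2 code base.

```python
import logging
from typing import Callable, Iterable, List


def clean_all_members(members: Iterable[str],
                      clean_member: Callable[[str], bool]) -> List[str]:
    """Try every member and return the names that could not be cleaned."""
    failed = []
    for name in members:
        if not clean_member(name):
            # Warn right away, but keep going so that every problem is reported.
            logging.warning('unable to clean member %s', name)
            failed.append(name)
    return failed
```

A caller would still abort when the returned list is non-empty. The difference is that all warnings are emitted in a single run.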
2018-09-04 18:28:04 -04:00
Daniel Kahn Gillmor
915dc634c4
document all unknown/unhandlable files even on abort
...
This makes it easy to get a list of all files that mat2 doesn't know
how to handle, without having to choose -u keep or -u omit.
2018-09-04 18:28:04 -04:00
Daniel Kahn Gillmor
4192a2daa3
office: create policy for what to do about unknown members
...
Previously, encountering an unknown member meant that any parser of
this type would abort.
Now, the user can set parser.unknown_member_policy to either 'omit' or
'keep' if they don't want the current action of 'abort'.
Note that this causes pylint to complain about branching depth for
remove_all() because of the nuanced error handling. I've disabled
this check.
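
A rough sketch of how the three policies could differ when walking an archive's members. `select_members` and `is_known` are hypothetical names, and the real `remove_all()` does considerably more than this.

```python
from typing import Callable, Iterable, List, Optional


def select_members(members: Iterable[str],
                   is_known: Callable[[str], bool],
                   policy: str = 'abort') -> Optional[List[str]]:
    """Return the members to write out, or None if processing must abort."""
    if policy not in ('abort', 'omit', 'keep'):
        raise ValueError('invalid policy: %r' % policy)
    kept = []
    abort = False
    for name in members:
        if is_known(name):
            kept.append(name)
        elif policy == 'keep':
            kept.append(name)   # unknown member is passed through
        elif policy == 'omit':
            continue            # unknown member is dropped from the output
        else:
            abort = True        # 'abort': the whole file is rejected
    return None if abort else kept
```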
2018-09-04 16:13:33 -04:00
jvoisin
907fc591cc
Bump the coverage back to 100%
2018-09-01 16:58:34 +02:00
Daniel Kahn Gillmor
3e2890eb9e
three minor spelling fixes
2018-09-01 06:47:22 -07:00
jvoisin
91e80527fc
Add archlinux to the CI
2018-09-01 15:41:22 +02:00
jvoisin
7877ba0da5
Fix a minor formatting issue
2018-09-01 14:16:55 +02:00
dkg
e2634f7a50
Logging cleanup
2018-09-01 05:14:32 -07:00
jvoisin
1c72448e58
Improve the detection of unsupported extensions in uppercase
2018-08-23 21:28:37 +02:00
Antoine Tenart
f068621628
libmat2: images: fix handling of .JPG files
...
Pixbuf only supports .jpeg files, not .jpg, so libmat2 looks for such an
extension and converts it if necessary. As this check is case sensitive,
processing .JPG files does not work.
Fixes #47.
Signed-off-by: Antoine Tenart <antoine.tenart@ack.tf>
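
A minimal sketch of a case-insensitive version of that check. The helper name `pixbuf_format_for` is an assumption made for illustration.

```python
import os


def pixbuf_format_for(filename: str) -> str:
    """Return the format name to hand to Pixbuf for a given file name."""
    _, extension = os.path.splitext(filename)
    extension = extension.lower()   # '.JPG' and '.jpg' are treated alike
    if extension == '.jpg':
        extension = '.jpeg'         # Pixbuf knows 'jpeg', not 'jpg'
    return extension[1:]            # drop the leading dot
```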
2018-08-23 20:43:27 +02:00
georg
71b1ced842
AbstractParser: Fix typos
2018-07-21 00:46:48 +00:00
jvoisin
942859601d
Improve the code's documentation
2018-07-19 23:10:27 +02:00
jvoisin
565cb66d14
Minor simplification in how we're handling xml for office files
2018-07-19 22:55:08 +02:00
jvoisin
84d50f97c0
Add a check for a missed dependency in ./mat2 -c
2018-07-15 17:00:01 +02:00
jvoisin
5a7c7f35f7
Remove print from libmat, and use the logging module instead
...
This should close #28
2018-07-10 21:30:38 +02:00
jvoisin
d5861e4653
Implement a check for dependencies in mat2
...
Example use:
```
$ mat2 -c
Dependencies required for MAT2 0.1.3:
- Cairo: yes
- Exiftool: yes
- GdkPixbuf from PyGobject: yes
- Mutagen: yes
- Poppler from PyGobject: yes
- PyGobject: yes
```
This should close #35
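
One possible way to build such a report, sketched with the standard library only. The function names and the label-to-name mappings are assumptions; the actual check in mat2 may probe its dependencies differently.

```python
import importlib.util
import shutil
from typing import Dict


def check_dependencies(modules: Dict[str, str],
                       executables: Dict[str, str]) -> Dict[str, bool]:
    """Map each label to True if its module or executable can be found."""
    status = {}
    for label, module_name in modules.items():
        # find_spec returns None when a top-level module is not installed.
        status[label] = importlib.util.find_spec(module_name) is not None
    for label, executable in executables.items():
        status[label] = shutil.which(executable) is not None
    return status


def format_report(status: Dict[str, bool]) -> str:
    """Render one '- label: yes/no' line per dependency, sorted by label."""
    return '\n'.join('- %s: %s' % (label, 'yes' if found else 'no')
                     for label, found in sorted(status.items()))
```

It could be called as `check_dependencies({'Mutagen': 'mutagen'}, {'Exiftool': 'exiftool'})`, where both mappings are hypothetical.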
2018-07-10 21:24:26 +02:00
jvoisin
080d6769ca
Make pylint even happier
2018-07-09 01:11:44 +02:00
jvoisin
8c21006e6c
Fix some pep8 issues spotted by pyflakes
2018-07-08 22:40:36 +02:00
jvoisin
f49aa5cab7
Achieve 100% coverage!
2018-07-08 22:27:37 +02:00
jvoisin
ad3e7ccee8
Bump coverage for office files and fix some related crashes
2018-07-08 21:35:45 +02:00
jvoisin
ca01484126
Silence a stupid mypy warning
2018-07-08 17:12:17 +02:00
jvoisin
f9bc022c96
Add defusedxml as an (optional) way to prevent XML-based attacks
...
Those attacks are DoS-only.
2018-07-08 17:07:26 +02:00
jvoisin
72e1fda18d
Remove a leftover print
2018-07-08 15:19:18 +02:00
jvoisin
3cd4f9111f
Bump coverage for torrent handling
2018-07-08 15:13:03 +02:00
jvoisin
b5fcddd6a6
Simplify how torrent files are handled
...
- Rework the torrent test suite
- Fail at parser instantiation on a corrupted torrent,
  instead of during a `get_meta` or `remove_all` call
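
To illustrate the "fail early" approach, here is a self-contained sketch with a small bencode decoder. The class `TorrentSketch` and the helper `_decode` are hypothetical and much simpler than a real torrent parser.

```python
def _decode(data: bytes, i: int = 0):
    """Decode one bencoded value at offset i; return (value, next_offset)."""
    if i >= len(data):
        raise ValueError('unexpected end of data')
    kind = data[i:i + 1]
    if kind == b'i':                              # integer: i<digits>e
        end = data.index(b'e', i)
        return int(data[i + 1:end]), end + 1
    if kind in (b'l', b'd'):                      # list or dictionary
        items = []
        i += 1
        while data[i:i + 1] != b'e':
            item, i = _decode(data, i)
            items.append(item)
        if kind == b'l':
            return items, i + 1
        keys, values = items[0::2], items[1::2]
        if len(keys) != len(values) or not all(isinstance(k, bytes) for k in keys):
            raise ValueError('malformed dictionary')
        return dict(zip(keys, values)), i + 1
    if kind.isdigit():                            # string: <length>:<bytes>
        colon = data.index(b':', i)
        length = int(data[i:colon])
        start = colon + 1
        if start + length > len(data):
            raise ValueError('truncated string')
        return data[start:start + length], start + length
    raise ValueError('invalid bencode at offset %d' % i)


class TorrentSketch:
    def __init__(self, data: bytes) -> None:
        # Parse once, up front: a corrupted file raises ValueError here,
        # so later method calls can rely on self.meta being valid.
        value, end = _decode(data)
        if end != len(data) or not isinstance(value, dict):
            raise ValueError('not a valid torrent')
        self.meta = value
```

Validating in the constructor means callers only need to handle the error in one place.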
2018-07-08 13:49:11 +02:00
jvoisin
7ea362d908
Bump the coverage for pdf
2018-07-07 18:12:33 +02:00
jvoisin
85455a4419
Fix a mistake in office file revisions handling
2018-07-07 18:05:54 +02:00
jvoisin
3d80f97524
Simplify BMP handling
2018-07-06 00:49:17 +02:00
jvoisin
53271495f7
Add support for .txt files
2018-07-06 00:42:09 +02:00
jvoisin
893f58554a
Slightly improve the formatting of the code thanks to pyflakes3
2018-07-02 00:22:05 +02:00
jvoisin
bee56a57ce
Remove docx revisions
2018-07-01 23:16:14 +02:00
jvoisin
02f7605ac1
MAT2 is now cleaning revisions from odt files!
2018-07-01 21:09:20 +02:00
jvoisin
80fc4ffb40
Remove the thumbnails from libreoffice files
2018-07-01 17:29:05 +02:00
jvoisin
177184ac67
Massively simplify how we're cleaning office files
2018-06-27 21:48:46 +02:00
jvoisin
f44769df41
Ensure Poppler's minimal version
...
We're using methods that aren't available in Poppler
below 0.46, so we're checking for this upon import.
This commit is based on ideas from @LogicalDash ♥
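
An import-time version check along these lines can be sketched as below. The helper names are assumptions, and how the installed Poppler version string is obtained is deliberately left out.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '0.46.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split('.'))


def ensure_min_version(name: str, found: str, minimum: Tuple[int, ...]) -> None:
    """Raise ImportError if the version found is older than the minimum."""
    if parse_version(found) < minimum:
        wanted = '.'.join(str(n) for n in minimum)
        raise ImportError('%s %s is too old, %s or newer is required'
                          % (name, found, wanted))
```

Calling `ensure_min_version('Poppler', version_string, (0, 46))` at module level makes the import itself fail, so the problem surfaces before any file is processed.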
2018-06-24 22:40:57 +02:00
jvoisin
74f2d50433
Split the testsuite a bit and add more tests
2018-06-22 21:16:55 +02:00
jvoisin
b4ef0c9622
Improve reliability against corrupted image files
2018-06-22 20:38:29 +02:00
jvoisin
5b38bd7ccd
Improve the reliability of the office parser
2018-06-21 23:18:59 +02:00
jvoisin
846a261465
Fix some linter warnings
2018-06-21 23:07:21 +02:00
jvoisin
09e748fa4c
Refactor how office files are handled
...
- xml files are no longer considered harmless
- Factorization of the `remove_all` method for office files
- Explicit whitelists are used
- Blacklists are used to skip files completely
- Non-blacklisted files are _still cleaned_
- Unsupported files still trigger an error
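
The rules above can be sketched as a small classifier. The function name, the regular-expression lists, the order of the checks, and the assumption that whitelisted members are copied unchanged are all illustrative and not taken from the commit itself.

```python
import os
import re
from typing import Iterable, Set


def classify_member(name: str,
                    whitelist: Iterable[str],
                    blacklist: Iterable[str],
                    supported_extensions: Set[str]) -> str:
    """Return 'skip', 'copy', 'clean' or 'error' for one archive member."""
    if any(re.search(pattern, name) for pattern in blacklist):
        return 'skip'     # blacklisted: left out of the output entirely
    if any(re.search(pattern, name) for pattern in whitelist):
        return 'copy'     # assumption: whitelisted members are kept as-is
    extension = os.path.splitext(name)[1].lower()
    if extension in supported_extensions:
        return 'clean'    # everything else is still cleaned
    return 'error'        # no parser available for this member
```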
2018-06-21 23:02:41 +02:00
jvoisin
a89dae054a
Minor simplification of the office-related code
2018-06-21 21:24:53 +02:00
Antoine Tenart
cce5de82e5
libmat2: harmless: add the text/xml mime type
...
Fedora defines the 'text/xml' mime type for xml files. Adds this mime
type to the harmless parser.
Fixes #36.
Signed-off-by: Antoine Tenart <antoine.tenart@ack.tf>
2018-06-12 21:34:47 +02:00
Antoine Tenart
484e26dd9c
libmat2: audio: add the audio/x-flac mime type
...
The FLAC parser looks for the 'audio/flac' mime type, but Fedora
defines 'audio/x-flac' in /etc/mime.types for FLAC files. Add this mime
type to the audio parser.
Fixes #36.
Signed-off-by: Antoine Tenart <antoine.tenart@ack.tf>
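
A sketch of how a parser can advertise several MIME types for the same format. The class and function names are hypothetical; only the two MIME type strings come from the commit message.

```python
from typing import Iterable, Optional, Set, Type


class ParserSketch:
    """Hypothetical base class: each parser lists the MIME types it handles."""
    mimetypes: Set[str] = set()


class FLACParserSketch(ParserSketch):
    # Both spellings are accepted, since distributions disagree on the name.
    mimetypes = {'audio/flac', 'audio/x-flac'}


def find_parser(mime: str,
                parsers: Iterable[Type[ParserSketch]]) -> Optional[Type[ParserSketch]]:
    """Return the first parser class that declares the given MIME type."""
    for parser in parsers:
        if mime in parser.mimetypes:
            return parser
    return None
```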
2018-06-12 21:34:47 +02:00
jvoisin
545887af98
Minor code simplification
2018-06-10 20:20:32 +02:00
jvoisin
7dad77a785
Make the parsing of office formats' metadata more robust
2018-06-10 20:20:00 +02:00