mirror of synced 2024-12-22 20:59:58 +01:00

115 Commits

Author SHA1 Message Date
jvoisin
545dccc352 In archive-based formats, the mimetype file comes first
This should improve epub compatibility,
along with other formats as a side-effect
2019-02-24 23:32:32 +01:00
jvoisin
02ff21b158 Implement epub support 2019-02-20 16:28:11 -08:00
jvoisin
6cc034e81b Add support for html files 2019-02-08 23:05:18 +01:00
jvoisin
e1dd439fc8 Use the archive refactoring for the office documents too 2019-02-07 22:19:37 +01:00
jvoisin
b9a62d798a Refactor a bit office get_meta handling
This should make it easier to get more metadata from
archive-based file formats.
2019-02-04 00:31:26 +01:00
jvoisin
433609f8ea Implement .gif support 2019-02-03 21:01:58 +01:00
jvoisin
8e84ba547a Add support for wmv 2019-02-02 19:19:36 +01:00
jvoisin
dc35ef56c8 Add a missing file :/ 2018-11-07 22:20:31 +01:00
jvoisin
3aa76cc58e Prove that the previous commit is working 2018-11-07 22:13:36 +01:00
jvoisin
8ff57c5803 Do not display control characters in output
Kudos to Sherry Taylor for reporting this issue ♥
2018-11-07 22:07:46 +01:00
jvoisin
04bb8c8ccf Add mp4 support 2018-10-28 07:41:04 -07:00
jvoisin
3a070b0ab7 Add support for zip files 2018-10-25 11:56:46 +02:00
jvoisin
513d897ea0 Implement get_meta() for archives 2018-10-25 11:29:50 +02:00
jvoisin
5a08f5b7bf Add a test for tiff lightweight cleaning 2018-10-24 20:19:36 +02:00
jvoisin
fe885babee Implement lightweight cleaning for jpg 2018-10-24 19:35:07 +02:00
jvoisin
f1a071d460 Implement lightweight cleaning for png and tiff 2018-10-23 16:22:11 +02:00
jvoisin
38df679a88 Optimize the handling of problematic files 2018-10-23 13:49:58 +02:00
jvoisin
44f267a596 Improve problematic filenames support 2018-10-22 16:56:05 +02:00
jvoisin
5bc88faedf Fix the testsuite on fedora 2018-10-22 13:55:09 +02:00
jvoisin
83389a63e9 Test mat2's reliability wrt. corrupted video files 2018-10-22 13:42:04 +02:00
jvoisin
e70ea811c9 Implement support for .avi files, via ffmpeg
- This commit introduces optional dependencies (namely ffmpeg):
  mat2 will spit a warning when trying to process an .avi file
  if ffmpeg isn't installed.
- Since metadata are obtained via exiftool, this commit
  also refactors our exiftool wrapper a bit.
2018-10-22 12:58:01 +02:00
jvoisin
d4c050a738 wtf python 2018-10-18 20:29:50 +02:00
jvoisin
f04d4b28fc Fix the tests on Debian? 2018-10-18 20:23:00 +02:00
jvoisin
da88d30689 Fix the CI on debian 2018-10-14 10:59:50 +02:00
jvoisin
b832a59414 Refactor lightweight mode implementation 2018-10-12 11:49:24 +02:00
jvoisin
b9dbd12ef9 Implement recursive metadata for FLAC files
Since FLAC files can contain covers, it makes sense
to parse their metadata
2018-10-11 19:52:47 +02:00
jvoisin
b2e153b69c Delete pictures of FLAC files 2018-10-11 18:15:11 +02:00
jvoisin
8675706c93 Improve the display of mat2 when no metadata are found
This should close #74
2018-10-05 12:35:35 +02:00
jvoisin
df252fd71a Remove a superfluous import 2018-10-04 16:19:38 +02:00
jvoisin
a1c39104fc Make the testsuite runnable on the installed MAT2 2018-10-04 16:16:52 +02:00
jvoisin
84e302ac93 Remove file left behind by the testsuite 2018-10-03 16:38:05 +02:00
jvoisin
c67bbafb2c Use [Content_Types].xml to improve MS Office coverage 2018-10-02 11:55:42 -07:00
jvoisin
156e81fb4c Check that cleaning twice doesn't break the file 2018-10-02 16:05:51 +02:00
jvoisin
9578e4b4ee Silence a bit the testsuite 2018-10-02 15:26:13 +02:00
jvoisin
e342671ead Remove dangling references in MS Office's [Content_Types].xml 2018-09-30 19:53:18 +02:00
jvoisin
719cdf20fa Second pass of minor formatting 2018-09-24 20:15:07 +02:00
jvoisin
2e243355f5 Fix some minor formatting issues 2018-09-24 19:50:24 +02:00
jvoisin
174d4a0ac0 Implement rsid stripping for office files
MS Office XML rsid is a "unique identifier used to track the editing session
when the physical character representing this section mark was last formatted."

See the following links for details:
- https://msdn.microsoft.com/en-us/library/office/documentformat.openxml.wordprocessing.previoussectionproperties.rsidrpr.aspx
- https://blogs.msdn.microsoft.com/brian_jones/2006/12/11/whats-up-with-all-those-rsids/.
2018-09-24 18:03:59 +02:00
jvoisin
9826de3526 Add a test for zip ordering 2018-09-20 14:04:46 +02:00
jvoisin
ab71c29a28 Make pyflakes happy 2018-09-20 01:19:22 +02:00
jvoisin
3d2842802c Split the tests 2018-09-20 01:13:59 +02:00
Yoann Lamouroux
0a2a398c9c Trivial modification of all shebangs.
`/usr/bin/python3` -> `/usr/bin/env python3`

It's always better to trust the environment-defined path to bin/python, as
virtualenvs have become the way to go.
2018-09-12 14:58:27 +02:00
jvoisin
2e9adab86a Improve a cli test resilience 2018-09-06 11:32:29 +02:00
Daniel Kahn Gillmor
f3cef319b9 Unknown Members: make policy use an Enum
Closes #60

Note: this changeset also ensures that clean.cleaned.docx is cleaned
up after the pytest run is over.
2018-09-05 18:59:33 -04:00
jvoisin
3649c0ccaf Remove short version of dangerous/advanced options 2018-09-05 17:48:14 +02:00
jvoisin
46bb1b83ea Improve the previous commit 2018-09-05 17:26:09 +02:00
Daniel Kahn Gillmor
10d60bd398 add --unknown-members argument to mat2
This allows the user to make use of parser.unknown_member_policy for
archive formats.

At the suggestion of @jvoisin, it also prints a scary warning if the
user explicitly chooses 'keep'.
2018-09-04 18:28:04 -04:00
dkg
e2634f7a50 Logging cleanup 2018-09-01 05:14:32 -07:00
jvoisin
b5a9520a60 Add a cli-related test 2018-07-30 22:54:41 +02:00
jvoisin
a1257c538b Add some tests about pathological files 2018-07-30 22:36:36 +02:00