1
0
Fork 0
Commit Graph

97 Commits

Author SHA1 Message Date
jvoisin 9a81b3adfd Improve type annotation coverage 2018-10-23 16:32:28 +02:00
jvoisin f1a071d460 Implement lightweight cleaning for png and tiff 2018-10-23 16:22:11 +02:00
jvoisin 38df679a88 Optimize the handling of problematic files 2018-10-23 13:49:58 +02:00
jvoisin 44f267a596 Improve problematic filenames support 2018-10-22 16:56:05 +02:00
jvoisin 83389a63e9 Test mat2's reliability wrt. corrupted video files 2018-10-22 13:42:04 +02:00
jvoisin e70ea811c9 Implement support for .avi files, via ffmpeg
- This commit introduces optional dependencies (namely ffmpeg):
  mat2 will spit a warning when trying to process an .avi file
  if ffmpeg isn't installed.
- Since metadata are obtained via exiftool, this commit
  also refactors a bit our exfitool wrapper.
2018-10-22 12:58:01 +02:00
jvoisin 2ba38dd2a1 Bump mypy typing coverage 2018-10-12 14:32:09 +02:00
jvoisin b832a59414 Refactor lightweight mode implementation 2018-10-12 11:49:24 +02:00
jvoisin b9dbd12ef9 Implement recursive metadata for FLAC files
Since FLAC files can contain covers, it makes sense
to parse their metadata
2018-10-11 19:52:47 +02:00
jvoisin b2e153b69c Delete pictures of FLAC files 2018-10-11 18:15:11 +02:00
jvoisin 0d25b18d26 Improve both the typing and the comments 2018-10-05 17:07:58 +02:00
jvoisin d0f3534eff Hide unsupported extensions in `mat2 -l` 2018-10-05 12:43:21 +02:00
jvoisin 8e98593b02 Trash word/people.xml in office files 2018-10-04 16:28:20 +02:00
georg 34fbd633fd
libmat2: fix shebang
Relates 0a2a398c9c
2018-10-03 18:38:28 +00:00
jvoisin 5a5c642a46 Don't break office files for MS Office
We didn't take the whitelist into account while
removing dangling files from [Content_types].xml
2018-10-03 16:38:05 +02:00
jvoisin 1b356b8c6f Improve mat2's cli reliability
- Replace some class members by instance members
- Don't thread the cleaning process anymore for now
2018-10-03 15:22:36 +02:00
jvoisin c67bbafb2c Use [Content_Types].xml to improve MS Office coverage 2018-10-02 11:55:42 -07:00
georg 5b606f939d
fix typo 2018-10-02 16:01:24 +00:00
jvoisin 652b8e519f Files processed via MAT2 are now accepted without warnings by MS Office 2018-10-01 12:25:37 -07:00
jvoisin 81a3881aa4 Please mypy 2018-09-30 19:55:17 +02:00
jvoisin e342671ead Remove dangling references in MS Office's [Content_types].xml 2018-09-30 19:53:18 +02:00
jvoisin 719cdf20fa Second pass of minor formatting 2018-09-24 20:15:07 +02:00
jvoisin 2e243355f5 Fix some minor formatting issues 2018-09-24 19:50:24 +02:00
jvoisin 174d4a0ac0 Implement rsid stripping for office files
MS Office XML rsid is a "unique identifier used to track the editing session
when the physical character representing this section mark was last formatted."

See the following links for details:
- https://msdn.microsoft.com/en-us/library/office/documentformat.openxml.wordprocessing.previoussectionproperties.rsidrpr.aspx
- https://blogs.msdn.microsoft.com/brian_jones/2006/12/11/whats-up-with-all-those-rsids/.
2018-09-24 18:03:59 +02:00
jvoisin fbcf68c280 Lexicographical sort on xml attributes for office files
In XML, the order of the attributes shouldn't be meaningful,
however, MS Office sorts attributes for a given XML tag
differently than LibreOffice.
2018-09-24 17:45:09 +02:00
jvoisin a1a06d023e Insert archive members in lexicographic order 2018-09-18 22:44:21 +02:00
jvoisin 5cf94bd256 Bump coverage back to 100% 2018-09-12 14:54:54 +02:00
jvoisin de65f4f4d4 Improve the resilience of MAT2 wrt. corrupted PNG 2018-09-09 19:09:05 +02:00
jvoisin 9fe6f1023b Make pylint happy 2018-09-06 11:36:04 +02:00
jvoisin e3d817f57e Split office and archives 2018-09-06 11:34:14 +02:00
jvoisin 120b204988 Change a bit the previous commit 2018-09-06 11:13:11 +02:00
Daniel Kahn Gillmor f3cef319b9 Unknown Members: make policy use an Enum
Closes #60

Note: this changeset also ensures that clean.cleaned.docx is removed
up after the pytest is over.
2018-09-05 18:59:33 -04:00
jvoisin 072ee1814d Remove defusedxml support and document why 2018-09-05 18:41:08 +02:00
jvoisin 46bb1b83ea Improve the previous commit 2018-09-05 17:26:09 +02:00
Daniel Kahn Gillmor 1d7e374e5b office: try all members, even when one fails
the end result will be the same -- an abort -- but the user will get
to see all the warnings for a particular file, instead of getting them
one at a time.
2018-09-04 18:28:04 -04:00
Daniel Kahn Gillmor 915dc634c4 document all unknown/unhandlable files even on abort
This makes it easy to get a list of all files that mat2 doesn't know
how to handle, without having to choose -u keep or -u omit.
2018-09-04 18:28:04 -04:00
Daniel Kahn Gillmor 4192a2daa3 office: create policy for what to do about unknown members
previously, encountering an unknown member meant that any parser of
this type would abort.

now, the user can set parser.unknown_member_policy to either 'omit' or
'keep' if they don't want the current action of 'abort'

note that this causes pylint to complain about branching depth for
remove_all() because of the nuanced error-handling.  I've disabled
this check.
2018-09-04 16:13:33 -04:00
jvoisin 907fc591cc Bump the coverage back to 100% 2018-09-01 16:58:34 +02:00
Daniel Kahn Gillmor 3e2890eb9e three minor spelling fixes 2018-09-01 06:47:22 -07:00
jvoisin 91e80527fc Add archlinux to the CI 2018-09-01 15:41:22 +02:00
jvoisin 7877ba0da5 Fix a minor formatting issue 2018-09-01 14:16:55 +02:00
dkg e2634f7a50 Logging cleanup 2018-09-01 05:14:32 -07:00
jvoisin 1c72448e58 Improve the detection of unsupported extensions in uppercase 2018-08-23 21:28:37 +02:00
Antoine Tenart f068621628 libmat2: images: fix handling of .JPG files
Pixbuf only supports .jpeg files, not .jpg, so libmat2 looks for such an
extension and converts it if necessary. As this check is case sensitive,
processing .JPG files does not work.

Fixes #47.

Signed-off-by: Antoine Tenart <antoine.tenart@ack.tf>
2018-08-23 20:43:27 +02:00
georg 71b1ced842
AbstractParser: Fix typos 2018-07-21 00:46:48 +00:00
jvoisin 942859601d Improve the code's documentation 2018-07-19 23:10:27 +02:00
jvoisin 565cb66d14 Minor simplification in how we're handling xml for office files 2018-07-19 22:55:08 +02:00
jvoisin 84d50f97c0 Add a check for a missed dependency in `./mat2 -c` 2018-07-15 17:00:01 +02:00
jvoisin 5a7c7f35f7 Remove `print` from libmat, and use the `logging` module instead
This should close #28
2018-07-10 21:30:38 +02:00
jvoisin d5861e4653 Implement a check for dependencies in mat2
Example use:

```
$ mat2 -c
Dependencies required for MAT2 0.1.3:
- Cairo: yes
- Exiftool: yes
- GdkPixbuf from PyGobject: yes
- Mutagen: yes
- Poppler from PyGobject: yes
- PyGobject: yes
```

This should close #35
2018-07-10 21:24:26 +02:00