jvoisin
df252fd71a
Remove a superfluous import
2018-10-04 16:19:38 +02:00
jvoisin
a1c39104fc
Make the testsuite runnable on the installed MAT2
2018-10-04 16:16:52 +02:00
georg
34fbd633fd
libmat2: fix shebang
...
Relates 0a2a398c9c
2018-10-03 18:38:28 +00:00
jvoisin
f1ceed13b5
Bump the changelog
2018-10-03 16:38:05 +02:00
jvoisin
5a5c642a46
Don't break office files for MS Office
...
We didn't take the whitelist into account while
removing dangling files from [Content_Types].xml
2018-10-03 16:38:05 +02:00
jvoisin
84e302ac93
Remove file left behind by the testsuite
2018-10-03 16:38:05 +02:00
jvoisin
7901fdef2e
Fix the testsuite
2018-10-03 15:29:46 +02:00
jvoisin
1b356b8c6f
Improve mat2's cli reliability
...
- Replace some class members by instance members
- Don't thread the cleaning process anymore for now
2018-10-03 15:22:36 +02:00
jvoisin
c67bbafb2c
Use [Content_Types].xml to improve MS Office coverage
2018-10-02 11:55:42 -07:00
georg
5b606f939d
fix typo
2018-10-02 16:01:24 +00:00
jvoisin
156e81fb4c
Check that cleaning twice doesn't break the file
2018-10-02 16:05:51 +02:00
jvoisin
9578e4b4ee
Silence the testsuite a bit
2018-10-02 15:26:13 +02:00
jvoisin
a46a7eb6fa
Update the CONTRIBUTING.md file wrt. the previous commit
2018-10-02 11:12:50 +02:00
georg
a24c59b208
manpage: this is about mat2, not mat
2018-10-01 21:26:59 +00:00
jvoisin
652b8e519f
Files processed via MAT2 are now accepted without warnings by MS Office
2018-10-01 12:25:37 -07:00
jvoisin
c14be47f95
Fix a typo in the README spotted by @georg
2018-10-01 15:51:22 +02:00
jvoisin
81a3881aa4
Please mypy
2018-09-30 19:55:17 +02:00
jvoisin
e342671ead
Remove dangling references in MS Office's [Content_Types].xml
2018-09-30 19:53:18 +02:00
jvoisin
212d9c472c
Document mat2's output scheme in the manpage as well
2018-09-26 00:13:44 +02:00
jvoisin
a88107c9ca
Document the output scheme in the README
2018-09-26 00:11:16 +02:00
jvoisin
7f629ed2e3
Run the testsuite exclusively on Whitewhale for now
...
This should fix the intermittent failures, thanks
to @pollo for the tip
2018-09-25 17:09:04 +02:00
jvoisin
719cdf20fa
Second pass of minor formatting
2018-09-24 20:15:07 +02:00
jvoisin
2e243355f5
Fix some minor formatting issues
2018-09-24 19:50:24 +02:00
jvoisin
174d4a0ac0
Implement rsid stripping for office files
...
MS Office XML rsid is a "unique identifier used to track the editing session
when the physical character representing this section mark was last formatted."
See the following links for details:
- https://msdn.microsoft.com/en-us/library/office/documentformat.openxml.wordprocessing.previoussectionproperties.rsidrpr.aspx
- https://blogs.msdn.microsoft.com/brian_jones/2006/12/11/whats-up-with-all-those-rsids/
2018-09-24 18:03:59 +02:00
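The rsid stripping described in the commit above can be illustrated with a minimal sketch. This is not mat2's actual implementation: the function name `strip_rsid` and the sample document are assumptions made for illustration, and the sketch only removes `rsid*` attributes (it leaves any dedicated rsid elements untouched).

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace, used here only to build a sample document.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def strip_rsid(xml_text: str) -> str:
    """Return xml_text with every attribute whose local name starts with 'rsid' removed.

    Hypothetical helper: the name and behaviour are assumptions, not mat2's API.
    """
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Copy the keys first, since we delete from the dict while looping.
        for name in list(element.attrib):
            # Namespaced attributes look like "{uri}rsidR" once parsed.
            local_name = name.rsplit("}", 1)[-1]
            if local_name.startswith("rsid"):
                del element.attrib[name]
    return ET.tostring(root, encoding="unicode")


sample = (
    f'<w:document xmlns:w="{W_NS}">'
    '<w:p w:rsidR="00A1B2C3" w:rsidRDefault="00D4E5F6">'
    "<w:r><w:t>Hello</w:t></w:r>"
    "</w:p>"
    "</w:document>"
)
cleaned = strip_rsid(sample)
```

The text content survives, while the editing-session identifiers no longer appear in the serialized output.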
jvoisin
fbcf68c280
Lexicographical sort on xml attributes for office files
...
In XML, the order of the attributes shouldn't be meaningful,
however, MS Office sorts attributes for a given XML tag
differently than LibreOffice.
2018-09-24 17:45:09 +02:00
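As a rough illustration of the attribute sorting mentioned in the commit above, the sketch below reorders the attributes of every element so that documents produced by different office suites serialize the same way. The helper name `sort_attributes` is hypothetical, and the approach relies on recent Python versions preserving attribute order when serializing.

```python
import xml.etree.ElementTree as ET


def sort_attributes(root: ET.Element) -> None:
    """Reorder the attributes of root and all its descendants lexicographically.

    Hypothetical helper for illustration; not taken from mat2.
    """
    for element in root.iter():
        sorted_items = sorted(element.attrib.items())
        # Rebuild the attribute dict in place so insertion order is sorted.
        element.attrib.clear()
        element.attrib.update(sorted_items)


root = ET.fromstring('<item zeta="1" alpha="2" mid="3"/>')
sort_attributes(root)
attribute_order = list(root.attrib)
```

After the call, `attribute_order` lists the attribute names in sorted order regardless of how the source document wrote them.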
jvoisin
9826de3526
Add a test for zip ordering
2018-09-20 14:04:46 +02:00
jvoisin
ab71c29a28
Make pyflakes happy
2018-09-20 01:19:22 +02:00
jvoisin
3d2842802c
Split the tests
2018-09-20 01:13:59 +02:00
jvoisin
a1a06d023e
Insert archive members in lexicographic order
2018-09-18 22:44:21 +02:00
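One way to picture inserting archive members in lexicographic order is the sketch below, which rewrites a zip archive so that its members are stored sorted by name. The function name `rewrite_sorted` and the sample member names are assumptions for illustration; the actual change in mat2 may differ.

```python
import io
import zipfile


def rewrite_sorted(source: bytes) -> bytes:
    """Return a new zip archive holding the same members, sorted by file name.

    Hypothetical helper for illustration; not taken from mat2.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(source)) as zin, zipfile.ZipFile(out, "w") as zout:
        for info in sorted(zin.infolist(), key=lambda item: item.filename):
            # Only the name and the content are carried over.
            zout.writestr(info.filename, zin.read(info))
    return out.getvalue()


# Build a small archive whose members are deliberately out of order.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for name in ("word/document.xml", "[Content_Types].xml", "docProps/app.xml"):
        archive.writestr(name, b"data")

with zipfile.ZipFile(io.BytesIO(rewrite_sorted(buffer.getvalue()))) as result:
    sorted_names = result.namelist()
```

A fixed member order avoids leaking the order in which the producing application happened to write the files.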
jvoisin
9275d64be5
Add a link to the gentoo overlay
2018-09-17 21:11:48 +02:00
Yoann Lamouroux
0a2a398c9c
Trivial modification of all shebangs.
...
`/usr/bin/python3` -> `/usr/bin/env python3`
It's always better to trust the environment-defined path to bin/python, as
virtualenvs become the way to go.
2018-09-12 14:58:27 +02:00
jvoisin
5cf94bd256
Bump coverage back to 100%
2018-09-12 14:54:54 +02:00
jvoisin
de65f4f4d4
Improve the resilience of MAT2 wrt. corrupted PNG
2018-09-09 19:09:05 +02:00
jvoisin
759efa03ee
Fix a setuptools-related warning
2018-09-06 11:42:07 +02:00
jvoisin
9fe6f1023b
Make pylint happy
2018-09-06 11:36:04 +02:00
jvoisin
e3d817f57e
Split office and archives
2018-09-06 11:34:14 +02:00
jvoisin
2e9adab86a
Improve the resilience of a cli test
2018-09-06 11:32:29 +02:00
jvoisin
c8c27dcf38
Mention "scrambled exif" as a related software
2018-09-06 11:20:08 +02:00
jvoisin
120b204988
Change the previous commit a bit
2018-09-06 11:13:11 +02:00
Daniel Kahn Gillmor
f3cef319b9
Unknown Members: make policy use an Enum
...
Closes #60
Note: this changeset also ensures that clean.cleaned.docx is cleaned
up after the pytest run is over.
2018-09-05 18:59:33 -04:00
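Turning a string-valued policy into an Enum, as the commit above describes, could look roughly like the following. The class name `UnknownMemberPolicy` and the helper `parse_policy` are assumptions for illustration; only the three values 'abort', 'omit' and 'keep' come from the commit messages in this log.

```python
import enum


class UnknownMemberPolicy(enum.Enum):
    """Hypothetical enum of the three policies named in the log."""

    ABORT = "abort"
    OMIT = "omit"
    KEEP = "keep"


def parse_policy(value: str) -> UnknownMemberPolicy:
    """Convert a user-supplied string to a policy; raises ValueError if unknown."""
    return UnknownMemberPolicy(value)


policy = parse_policy("omit")
```

Compared with bare strings, an Enum rejects misspelled values at the point of conversion rather than deep inside the cleaning code.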
Daniel Kahn Gillmor
2d9ba81a84
spelling correction.
...
while mat2 has both a thread model (a thread pool that strips metadata
in parallel) and a threat model (a list of malicious adversaries and
their capabilities that we are trying to defeat), i think this
paragraph is talking about the latter.
2018-09-05 13:00:28 -04:00
jvoisin
072ee1814d
Remove defusedxml support and document why
2018-09-05 18:41:08 +02:00
jvoisin
3649c0ccaf
Remove short version of dangerous/advanced options
2018-09-05 17:48:14 +02:00
Christian
119085f28d
Add missing dependencies for the Nautilus extension to INSTALL.md
2018-09-05 17:42:39 +02:00
Christian
e515d907d7
Make sure target directory exists, assume MAT2 is in parent directory
2018-09-05 17:42:13 +02:00
jvoisin
46bb1b83ea
Improve the previous commit
2018-09-05 17:26:09 +02:00
Daniel Kahn Gillmor
1d7e374e5b
office: try all members, even when one fails
...
the end result will be the same -- an abort -- but the user will get
to see all the warnings for a particular file, instead of getting them
one at a time.
2018-09-04 18:28:04 -04:00
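The idea in the commit above (inspect every member before aborting, so the user sees all warnings at once) can be sketched as follows. The function `find_unsupported`, the `is_supported` callback and the sample member names are hypothetical and not taken from mat2.

```python
import logging
from typing import Callable, Iterable, List


def find_unsupported(members: Iterable[str],
                     is_supported: Callable[[str], bool]) -> List[str]:
    """Warn about every unsupported member instead of stopping at the first one."""
    unsupported = []
    for name in members:
        if not is_supported(name):
            logging.warning("no handler for archive member %s", name)
            unsupported.append(name)
    return unsupported


members = ["word/document.xml", "word/media/image1.bin", "customXml/item1.dat"]
# Assumption for the example: only .xml members are considered supported.
unsupported = find_unsupported(members, lambda name: name.endswith(".xml"))
# The caller still aborts, but only after all warnings have been emitted.
should_abort = bool(unsupported)
```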
Daniel Kahn Gillmor
915dc634c4
document all unknown/unhandlable files even on abort
...
This makes it easy to get a list of all files that mat2 doesn't know
how to handle, without having to choose -u keep or -u omit.
2018-09-04 18:28:04 -04:00
Daniel Kahn Gillmor
10d60bd398
add --unknown-members argument to mat2
...
This allows the user to make use of parser.unknown_member_policy for
archive formats.
At the suggestion of @jvoisin, it also prints a scary warning if the
user explicitly chooses 'keep'.
2018-09-04 18:28:04 -04:00
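A command-line option of the kind described in the commit above might be wired up as in this sketch. The program name, help text and warning wording are assumptions; only the `--unknown-members` option name, its three values and the warning on 'keep' come from the log.

```python
import argparse
import sys
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing a hypothetical --unknown-members option."""
    parser = argparse.ArgumentParser(prog="example")
    parser.add_argument(
        "--unknown-members",
        choices=["abort", "omit", "keep"],
        default="abort",  # 'abort' is described as the existing behaviour
        help="what to do with archive members that cannot be handled",
    )
    return parser


def parse(argv: List[str]) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    if args.unknown_members == "keep":
        # Keeping unknown members may leave metadata behind, so warn loudly.
        print("warning: unknown members will be kept as-is", file=sys.stderr)
    return args


chosen = parse(["--unknown-members", "omit"]).unknown_members
default_choice = parse([]).unknown_members
```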
Daniel Kahn Gillmor
4192a2daa3
office: create policy for what to do about unknown members
...
Previously, encountering an unknown member meant that any parser of
this type would abort.
Now, the user can set parser.unknown_member_policy to either 'omit' or
'keep' if they don't want the current action of 'abort'.
Note that this causes pylint to complain about branching depth for
remove_all() because of the nuanced error-handling. I've disabled
this check.
2018-09-04 16:13:33 -04:00