georg
c3f097a82b
fix typo
2019-03-01 22:00:23 +00:00
jvoisin
55214206b5
Improve the previous commit
...
- More tests
- More documentation
- Minor code cleanup
2019-02-27 23:53:07 +01:00
jvoisin
73d2966e8c
Improve epub support
2019-02-27 23:04:38 +01:00
jvoisin
eb2e702f37
Document the previous commit
2019-02-25 15:37:44 +01:00
jvoisin
545dccc352
In archive-based formats, the mimetype
file comes first
...
This should improve epub compatibility,
along with other formats as a side-effect
2019-02-24 23:32:32 +01:00
jvoisin
524bae5972
<title> is also an html metadata
2019-02-23 20:47:26 +01:00
jvoisin
c757a9b7ef
Fix a bug in css cleaning
...
It's not mandatory to actually have a comment inside
comment delimiters, as in `/**/`.
2019-02-23 20:21:11 +01:00
jvoisin
02ff21b158
Implement epub support
2019-02-20 16:28:11 -08:00
jvoisin
a81b7658a8
Make the mandatory metadata warning generic
...
This should close #95.
2019-02-10 21:46:13 +01:00
jvoisin
6e63e03b86
Streamline the previous commit a bit
2019-02-09 15:23:16 +01:00
Poncho
a71488d459
bind mount /etc/ld.so.cache to the sandbox
...
without /etc/ld.so.cache available in the sandbox, tests fail on gentoo with:
/usr/bin/ffmpeg: error while loading shared libraries: libstdc++.so.6:
cannot open shared object file: No such file or directory
2019-02-09 09:49:51 +01:00
jvoisin
6ef6aaa222
Improve get_meta a bit for LibreOffice files
2019-02-08 23:23:56 +01:00
jvoisin
6cc034e81b
Add support for html files
2019-02-08 23:05:18 +01:00
jvoisin
e1dd439fc8
Use the archive refactoring for office documents too
2019-02-07 22:19:37 +01:00
jvoisin
b9a62d798a
Refactor office get_meta handling a bit
...
This should make it easier to get more metadata from
archive-based file formats.
2019-02-04 00:31:26 +01:00
jvoisin
433609f8ea
Implement .gif support
2019-02-03 21:01:58 +01:00
intrigeri
e8c1bb0e3c
Whenever possible, use bwrap for subprocesses
...
This should close #90
2019-02-03 19:18:41 +01:00
jvoisin
8e84ba547a
Add support for wmv
2019-02-02 19:19:36 +01:00
jvoisin
04bb8c8ccf
Add mp4 support
2018-10-28 07:41:04 -07:00
jvoisin
3a070b0ab7
Add support for zip files
2018-10-25 11:56:46 +02:00
jvoisin
283e5e5787
Improve archive-based parser's robustness against corrupted embedded files
2018-10-25 11:56:12 +02:00
jvoisin
513d897ea0
Implement get_meta() for archives
2018-10-25 11:29:50 +02:00
jvoisin
5a9dc388ad
Minor refactorisation of how we're checking for exiftool's presence
2018-10-25 11:05:06 +02:00
jvoisin
fe885babee
Implement lightweight cleaning for jpg
2018-10-24 19:35:07 +02:00
jvoisin
9a81b3adfd
Improve type annotation coverage
2018-10-23 16:32:28 +02:00
jvoisin
f1a071d460
Implement lightweight cleaning for png and tiff
2018-10-23 16:22:11 +02:00
jvoisin
38df679a88
Optimize the handling of problematic files
2018-10-23 13:49:58 +02:00
jvoisin
44f267a596
Improve problematic filenames support
2018-10-22 16:56:05 +02:00
jvoisin
83389a63e9
Test mat2's reliability wrt. corrupted video files
2018-10-22 13:42:04 +02:00
jvoisin
e70ea811c9
Implement support for .avi files, via ffmpeg
...
- This commit introduces optional dependencies (namely ffmpeg):
mat2 will emit a warning when trying to process an .avi file
if ffmpeg isn't installed.
- Since metadata are obtained via exiftool, this commit
also refactors our exiftool wrapper a bit.
2018-10-22 12:58:01 +02:00
jvoisin
2ba38dd2a1
Bump mypy typing coverage
2018-10-12 14:32:09 +02:00
jvoisin
b832a59414
Refactor lightweight mode implementation
2018-10-12 11:49:24 +02:00
jvoisin
b9dbd12ef9
Implement recursive metadata for FLAC files
...
Since FLAC files can contain covers, it makes sense
to parse their metadata
2018-10-11 19:52:47 +02:00
jvoisin
b2e153b69c
Delete pictures of FLAC files
2018-10-11 18:15:11 +02:00
jvoisin
0d25b18d26
Improve both the typing and the comments
2018-10-05 17:07:58 +02:00
jvoisin
d0f3534eff
Hide unsupported extensions in mat2 -l
2018-10-05 12:43:21 +02:00
jvoisin
8e98593b02
Trash word/people.xml in office files
2018-10-04 16:28:20 +02:00
georg
34fbd633fd
libmat2: fix shebang
...
Relates to 0a2a398c9c
2018-10-03 18:38:28 +00:00
jvoisin
5a5c642a46
Don't break office files for MS Office
...
We didn't take the whitelist into account while
removing dangling files from [Content_Types].xml
2018-10-03 16:38:05 +02:00
jvoisin
1b356b8c6f
Improve mat2's cli reliability
...
- Replace some class members by instance members
- Don't thread the cleaning process anymore for now
2018-10-03 15:22:36 +02:00
jvoisin
c67bbafb2c
Use [Content_Types].xml to improve MS Office coverage
2018-10-02 11:55:42 -07:00
georg
5b606f939d
fix typo
2018-10-02 16:01:24 +00:00
jvoisin
652b8e519f
Files processed via MAT2 are now accepted without warnings by MS Office
2018-10-01 12:25:37 -07:00
jvoisin
81a3881aa4
Please mypy
2018-09-30 19:55:17 +02:00
jvoisin
e342671ead
Remove dangling references in MS Office's [Content_Types].xml
2018-09-30 19:53:18 +02:00
jvoisin
719cdf20fa
Second pass of minor formatting
2018-09-24 20:15:07 +02:00
jvoisin
2e243355f5
Fix some minor formatting issues
2018-09-24 19:50:24 +02:00
jvoisin
174d4a0ac0
Implement rsid stripping for office files
...
MS Office XML rsid is a "unique identifier used to track the editing session
when the physical character representing this section mark was last formatted."
See the following links for details:
- https://msdn.microsoft.com/en-us/library/office/documentformat.openxml.wordprocessing.previoussectionproperties.rsidrpr.aspx
- https://blogs.msdn.microsoft.com/brian_jones/2006/12/11/whats-up-with-all-those-rsids/
2018-09-24 18:03:59 +02:00
jvoisin
fbcf68c280
Lexicographical sort on xml attributes for office files
...
In XML, the order of the attributes shouldn't be meaningful;
however, MS Office sorts the attributes of a given XML tag
differently from LibreOffice.
2018-09-24 17:45:09 +02:00
jvoisin
a1a06d023e
Insert archive members in lexicographic order
2018-09-18 22:44:21 +02:00