jvoisin
b84f73c5c3
Handle multiple namespaces in MSOffice's content types
2020-11-06 15:29:42 +01:00
jvoisin
96e639dfd3
Fix a regexp for xlsx files
...
This should slightly improve compatibility with Excel files
2020-11-06 15:26:30 +01:00
jvoisin
46b3ae1672
Fix a crash affecting some mp3 files
2020-07-22 15:47:35 +02:00
jvoisin
d8b68ef68e
Improve Microsoft Word support a bit
2020-05-17 16:53:36 +02:00
jvoisin
c8dc020dc5
Improve xlsx support
2020-04-06 20:47:32 +02:00
jvoisin
599909a760
Improve xlsx support
2020-04-02 20:58:10 +02:00
jvoisin
d7a03d907b
Vastly improve ppt compatibility
2020-03-08 14:06:27 +01:00
jvoisin
a23dc001cd
Improve compatibility with MS Office of cleaned ppt
2020-03-07 14:34:07 +01:00
jvoisin
f93df85d03
Improve a bit ppt support
2020-03-07 05:22:36 -08:00
jvoisin
e5b1068ed6
Improve a bit the support of ppt files
2020-03-07 12:49:45 +01:00
tguinot
56d2c4aa5f
Add which pathfinding for executables
2020-02-11 17:23:11 +01:00
jvoisin
5270071b94
Remove a couple of residual metadata in pdf
...
This commit takes care of removing residual metadata
added by mat2 during the cleaning of pdf.
2020-02-08 17:00:37 +01:00
jvoisin
ee704db2ff
Add support for wav files
2020-01-01 19:47:46 +01:00
jvoisin
693408f1a6
Please mypy
...
Mypy doesn't like some annotations in web.py;
this commit aims at pleasing it.
2019-12-29 15:20:48 +01:00
Ivy Fay
b2efffdaa4
sandbox: stop mounting new filesystem on /tmp
...
Mounting a new, empty filesystem on /tmp makes it impossible to use mat2 to manipulate files stored there. In particular, it breaks running the tests while building a package with /tmp as the temporary build directory, which is a common setup on Arch Linux:
https://aur.archlinux.org/packages/mat2/#comment-721221
2019-12-18 02:23:43 -08:00
jvoisin
7465cedee7
Handle tiff images with a .tif extension
2019-12-16 14:55:35 -08:00
jvoisin
f5aef1b391
Improve the reliability of Exiftool-based parsers
2019-12-15 09:04:51 -08:00
jvoisin
2e3496d3d4
Improve the reliability of Gdk-based parsers
2019-12-15 07:05:53 -08:00
jvoisin
be24c681ff
Improve the reliability of PNG parsing
2019-12-15 06:57:32 -08:00
jvoisin
efa525c102
Improve the robustness of the HTML parser
2019-12-15 06:50:54 -08:00
jvoisin
f67cd9d7dc
Improve the robustness of the CSS parser
2019-12-15 06:44:21 -08:00
jvoisin
e4114af3b5
Improve a bit ppt support
2019-11-30 11:38:22 +01:00
jvoisin
d56f83bed1
Improve a bit odt handling
2019-11-30 10:25:24 +01:00
georg
697cb36b81
This is mat2, not MAT2
...
Closes #131
2019-11-30 01:14:41 -08:00
jvoisin
df1eb98a40
Please the new version of pylint
2019-11-26 22:12:56 +01:00
jvoisin
655c19d17d
Improve a bit the support for ppt files
2019-10-17 23:02:17 +02:00
jvoisin
5f0b3beb46
Add a way to disable the sandbox
...
Due to bubblewrap's pickiness, mat2 can now be run
without a sandbox, even if bubblewrap is installed.
2019-10-12 16:13:49 -07:00
jvoisin
3cef7fe7fc
Refactor tests
2019-10-12 13:32:04 -07:00
jvoisin
12489bb682
Remove a useless \
2019-10-12 21:36:28 +02:00
jvoisin
bb903ec309
Remove useless parentheses
2019-10-12 21:36:19 +02:00
jvoisin
4483c06f19
Replace abstractstaticmethod with abstractmethod
...
Apparently, abstractstaticmethod has been deprecated
since Python 3.3.
2019-10-12 21:28:27 +02:00
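A note on the commit above: since Python 3.3, the documented replacement for `abc.abstractstaticmethod` is to stack `@staticmethod` on top of `@abc.abstractmethod`. The sketch below only illustrates that pattern; the class and method names are hypothetical and are not taken from the mat2 code base, and the commit itself may have done it differently.

```python
import abc


class AbstractParser(abc.ABC):
    # Hypothetical base class, for illustration only.
    @staticmethod
    @abc.abstractmethod
    def supported_mimetypes() -> set:
        """Subclasses must say which mimetypes they handle."""


class PngParser(AbstractParser):
    # Concrete subclass: overriding the abstract static method
    # makes the class instantiable.
    @staticmethod
    def supported_mimetypes() -> set:
        return {"image/png"}
```

Instantiating `AbstractParser` directly raises `TypeError`, while `PngParser.supported_mimetypes()` can be called without an instance.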
madaidan
58773088ac
Mount a new tmpfs on /tmp and drop all capabilities
...
This mounts a new tmpfs on /tmp so any files residing there would be hidden
from the sandbox. Many programs store some files in there that might be useful
to an attacker. It also drops all capabilities in case it is ever run with
extra capabilities for whatever reason.
2019-10-05 15:21:40 +02:00
jvoisin
3714553185
Fix bubblewrap
...
On some machines (like mine), `/proc` has to be mounted. Also, since
sandboxing with bubblewrap is best effort and assumes that an attacker doesn't
have control outside of the file to clean, it's safe to __try__ to enable some
bubblewrap features, and to silently fail otherwise.
2019-09-21 14:14:39 +02:00
jvoisin
1678d37856
Mark a comment as FP
2019-09-01 19:01:33 +02:00
jvoisin
397a18b0cc
Add support for ppm
2019-09-01 09:28:46 -07:00
jvoisin
0170f0e37e
Improve a bit the comments in the code
...
This is related to the previous commit
2019-09-01 13:52:02 +02:00
jvoisin
0cf0541ad9
Remove nsid fields from MSOffice documents
...
nsids are random identifiers, usually used to ease merging
between documents, and can trivially be used for fingerprinting.
2019-09-01 13:52:02 +02:00
jvoisin
0c75cd15dc
Remove a mypy workaround to bump coverage back to 100%
2019-07-22 23:28:51 +02:00
jvoisin
5280b6c2b3
Add a test for svg namespace
2019-07-22 23:21:06 +02:00
georg
8bb2826f7a
CI: Add job to run codespell, a spell checking software
2019-07-22 13:31:40 -07:00
jvoisin
5c33b290ae
Fix mypy
2019-07-20 16:05:55 +02:00
jvoisin
dc5603eb1d
Please mypy
2019-07-13 23:25:44 +02:00
jvoisin
4999209f9c
Add support for svg
2019-07-13 21:26:05 +02:00
jvoisin
bdd5581033
Compress cleaned zip archives by default
2019-07-13 15:04:43 +02:00
jvoisin
47f9cb33bf
Please mypy
2019-07-13 15:03:40 +02:00
jvoisin
35d550d229
Use memoization for the _get_*_path() functions
...
This shouldn't make a big difference in the CLI/extension
usage, but might improve the performance of long-running
instances, or people misusing the API.
2019-05-16 00:31:40 +02:00
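A minimal sketch of the memoization idea described in the commit above, assuming the path lookup is a pure function of the executable name. The function name `find_executable` and the use of `shutil.which` are assumptions made for illustration, not the actual mat2 implementation.

```python
import functools
import shutil


@functools.lru_cache(maxsize=None)
def find_executable(name: str) -> str:
    """Return the path of `name`; the lookup runs only once per name."""
    path = shutil.which(name)
    if path is None:
        # Exceptions are not cached by lru_cache, so a later call
        # retries the lookup (e.g. after the tool gets installed).
        raise RuntimeError(f"Unable to find {name}")
    return path
```

A short-lived CLI run calls such a function once or twice, so the cache barely matters there; a long-running process that cleans many files avoids repeating the filesystem search on every call.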
jvoisin
aa52a5c91c
Please mypy wrt. the last two commits
2019-05-14 00:50:17 +02:00
Antoine Tenart
f19f6ed8b6
Rework the dependency checks to distinguish required/optional ones
...
Rework the dependencies definition to include a 'required' flag, which
is passed by the check_dependencies helper to the callers, so that they
can distinguish between required and optional dependencies.
This helps in two ways:
- The unit test for the dependencies was now failing when an optional
one was missing, due to a previous rework.
- Mat2's --check-dependencies was referring to "required dependencies"
and was misleading for the user as some of them could be optional.
Signed-off-by: Antoine Tenart <antoine.tenart@ack.tf>
2019-05-13 23:35:26 +02:00
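The 'required' flag described in the commit above could look roughly like the following sketch. The table contents, the returned structure and the `missing_required` helper are hypothetical: the standard-library `json` module stands in for a required dependency and a deliberately missing module stands in for an optional one, so that the example runs anywhere.

```python
import importlib.util
from typing import Dict, List

# Hypothetical dependency table, for illustration only.
DEPENDENCIES = {
    "Json": {"module": "json", "required": True},
    "Optional tool": {"module": "surely_not_installed_xyz", "required": False},
}


def check_dependencies() -> Dict[str, Dict[str, bool]]:
    """Report whether each dependency was found and whether it is required."""
    status = {}
    for name, dep in DEPENDENCIES.items():
        found = importlib.util.find_spec(dep["module"]) is not None
        status[name] = {"found": found, "required": dep["required"]}
    return status


def missing_required(status: Dict[str, Dict[str, bool]]) -> List[str]:
    """Only a missing *required* dependency should count as a failure."""
    return [n for n, s in status.items() if s["required"] and not s["found"]]
```

With this shape, a caller such as a unit test or a `--check-dependencies` option can print every entry while failing only on the names returned by `missing_required`.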
jvoisin
97abafdc58
Minor code cleanup
2019-05-09 09:41:05 +02:00
fuzzy
7e031c9757
typo
2019-05-03 02:39:15 -07:00
jvoisin
9516990693
Add some verification for "dangerous" tarfiles
2019-05-01 17:55:35 +02:00
jvoisin
a7ebb587e1
Handle weird permissions in tar archives
2019-04-27 22:48:40 +02:00
jvoisin
14a4cddb8b
Improve the display of tarfile's members mtime
2019-04-27 21:15:06 +02:00
jvoisin
8e41b098d6
Add support for compressed tar files
2019-04-27 06:03:09 -07:00
jvoisin
82cc822a1d
Add tar archive support
2019-04-27 04:05:36 -07:00
jvoisin
05f429b197
Add support for xhtml files
2019-04-14 20:36:33 +02:00
jvoisin
1e325c5b5b
Please mypy
...
Apparently, mypy isn't able (yet?) to deal
with variables that are changing their types
at runtime.
Python is wonderful.
2019-03-30 10:33:16 +01:00
Antoine Tenart
d454ef5b8e
libmat2: fix dependency checks for cmd line utilities
...
The checks for command line utilities are done by trying to
access the executables and by throwing an exception when not found. This
led to:
- The mat2 cmd line --check-dependencies option failing.
- The ffmpeg unit tests failing when ffmpeg isn't installed (even though
it's an optional dependency).
This patch fixes it.
Signed-off-by: Antoine Tenart <antoine.tenart@ack.tf>
2019-03-29 19:29:28 +01:00
Antoine Tenart
c824a68dd8
libmat2: reshape the dependencies list
...
Invert the keys and values in DEPENDENCIES. It seems more natural to use
the key as a key in check_dependencies(), and the value as the value.
This also helps prepare for reworking the check_dependencies()
helper.
Signed-off-by: Antoine Tenart <antoine.tenart@ack.tf>
2019-03-29 19:29:28 +01:00
jvoisin
b8c92fec09
Fix the testsuite
2019-03-23 00:41:23 +01:00
Antoine Tenart
0e3c2c9b1b
libmat2: audio: not all id3 types have a text attribute
...
Not all id3 types have a text attribute (such as mutagen.id3.APIC or
mutagen.id3.UFID). This causes the get_meta helper to crash when
trying to access the text attribute of an object which does not have it.
Fix it by checking that the text attribute is available before accessing
it.
Signed-off-by: Antoine Tenart <antoine.tenart@ack.tf>
2019-03-23 00:32:44 +01:00
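The guard described in the commit above can be sketched as follows. To keep the example self-contained, `TextFrame` and `PictureFrame` are hypothetical stand-ins for mutagen frame objects, and skipping frames without text is an assumption about the desired behaviour, not a copy of the actual fix.

```python
class TextFrame:
    """Stand-in for an ID3 frame that carries text (e.g. a title)."""
    def __init__(self, text):
        self.text = text


class PictureFrame:
    """Stand-in for a frame without a `text` attribute (e.g. a cover)."""
    def __init__(self, data: bytes):
        self.data = data


def get_meta(frames: dict) -> dict:
    """Collect textual metadata, ignoring frames that have no text."""
    meta = {}
    for key, frame in frames.items():
        if not hasattr(frame, "text"):
            continue  # accessing frame.text here would raise AttributeError
        meta[key] = ", ".join(str(item) for item in frame.text)
    return meta
```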
Brolf
5ac91cd4f9
Refactor {black,white}list into {block,allow}list
...
Closes #96
2019-03-05 23:13:42 +00:00
georg
c3f097a82b
fix typo
2019-03-01 22:00:23 +00:00
jvoisin
55214206b5
Improve the previous commit
...
- More tests
- More documentation
- Minor code cleanup
2019-02-27 23:53:07 +01:00
jvoisin
73d2966e8c
Improve epub support
2019-02-27 23:04:38 +01:00
jvoisin
eb2e702f37
Document the previous commit
2019-02-25 15:37:44 +01:00
jvoisin
545dccc352
In archive-based formats, the mimetype
file comes first
...
This should improve epub compatibility,
along with other formats as a side-effect
2019-02-24 23:32:32 +01:00
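For context on the commit above: the EPUB container format expects the `mimetype` entry to be the first member of the zip archive and to be stored uncompressed. The sketch below shows one way to enforce that ordering with the standard `zipfile` module; the `write_archive` helper is hypothetical and is not the mat2 implementation.

```python
import io
import zipfile


def write_archive(members: dict) -> bytes:
    """Write `members` (name -> bytes) to a zip, with `mimetype` first."""
    # Sort so that "mimetype" comes first, then the rest alphabetically.
    ordered = sorted(members, key=lambda name: (name != "mimetype", name))
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zout:
        for name in ordered:
            # The mimetype entry is stored as-is; other members are deflated.
            if name == "mimetype":
                method = zipfile.ZIP_STORED
            else:
                method = zipfile.ZIP_DEFLATED
            zout.writestr(name, members[name], compress_type=method)
    return buf.getvalue()
```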
jvoisin
524bae5972
<title> is also an html metadata
2019-02-23 20:47:26 +01:00
jvoisin
c757a9b7ef
Fix a bug in css cleaning
...
Comment delimiters don't have to contain any text:
an empty comment like `/**/` is valid.
2019-02-23 20:21:11 +01:00
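One way an empty comment such as `/**/` can trip up a cleaner is a pattern that requires at least one character between the delimiters. The two regular expressions below are illustrative assumptions, not the patterns used in mat2 before or after the commit above.

```python
import re

# Requires at least one character inside the comment, so on `/**/`
# it keeps scanning until the *next* `*/` and swallows real CSS.
STRICT = re.compile(r"/\*.+?\*/", re.DOTALL)

# Allows zero characters inside the comment, so `/**/` matches by itself.
LENIENT = re.compile(r"/\*.*?\*/", re.DOTALL)

css = "a { color: red; /**/ } /* author: someone */"

strict_result = STRICT.sub("", css)    # the closing brace is lost
lenient_result = LENIENT.sub("", css)  # only the two comments are removed
```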
jvoisin
02ff21b158
Implement epub support
2019-02-20 16:28:11 -08:00
jvoisin
a81b7658a8
Make the mandatory metadata warning generic
...
This should close #95.
2019-02-10 21:46:13 +01:00
jvoisin
6e63e03b86
Streamline a bit the previous commit
2019-02-09 15:23:16 +01:00
Poncho
a71488d459
bind mount /etc/ld.so.cache to the sandbox
...
Without /etc/ld.so.cache available in the sandbox, tests fail on Gentoo with:
/usr/bin/ffmpeg: error while loading shared libraries: libstdc++.so.6:
cannot open shared object file: No such file or directory
2019-02-09 09:49:51 +01:00
jvoisin
6ef6aaa222
Improve a bit get_meta for libreoffice files
2019-02-08 23:23:56 +01:00
jvoisin
6cc034e81b
Add support for html files
2019-02-08 23:05:18 +01:00
jvoisin
e1dd439fc8
Use the archive refactoring for office documents too
2019-02-07 22:19:37 +01:00
jvoisin
b9a62d798a
Refactor a bit office get_meta handling
...
This should make it easier to get more metadata from
archive-based file formats.
2019-02-04 00:31:26 +01:00
jvoisin
433609f8ea
Implement .gif support
2019-02-03 21:01:58 +01:00
intrigeri
e8c1bb0e3c
Whenever possible, use bwrap for subprocesses
...
This should close #90
2019-02-03 19:18:41 +01:00
jvoisin
8e84ba547a
Add support for wmv
2019-02-02 19:19:36 +01:00
jvoisin
04bb8c8ccf
Add mp4 support
2018-10-28 07:41:04 -07:00
jvoisin
3a070b0ab7
Add support for zip files
2018-10-25 11:56:46 +02:00
jvoisin
283e5e5787
Improve archive-based parser's robustness against corrupted embedded files
2018-10-25 11:56:12 +02:00
jvoisin
513d897ea0
Implement get_meta() for archives
2018-10-25 11:29:50 +02:00
jvoisin
5a9dc388ad
Minor refactoring of how we check for exiftool's presence
2018-10-25 11:05:06 +02:00
jvoisin
fe885babee
Implement lightweight cleaning for jpg
2018-10-24 19:35:07 +02:00
jvoisin
9a81b3adfd
Improve type annotation coverage
2018-10-23 16:32:28 +02:00
jvoisin
f1a071d460
Implement lightweight cleaning for png and tiff
2018-10-23 16:22:11 +02:00
jvoisin
38df679a88
Optimize the handling of problematic files
2018-10-23 13:49:58 +02:00
jvoisin
44f267a596
Improve problematic filenames support
2018-10-22 16:56:05 +02:00
jvoisin
83389a63e9
Test mat2's reliability wrt. corrupted video files
2018-10-22 13:42:04 +02:00
jvoisin
e70ea811c9
Implement support for .avi files, via ffmpeg
...
- This commit introduces optional dependencies (namely ffmpeg):
mat2 will emit a warning when trying to process an .avi file
if ffmpeg isn't installed.
- Since metadata are obtained via exiftool, this commit
also refactors our exiftool wrapper a bit.
2018-10-22 12:58:01 +02:00
jvoisin
2ba38dd2a1
Bump mypy typing coverage
2018-10-12 14:32:09 +02:00
jvoisin
b832a59414
Refactor lightweight mode implementation
2018-10-12 11:49:24 +02:00
jvoisin
b9dbd12ef9
Implement recursive metadata for FLAC files
...
Since FLAC files can contain covers, it makes sense
to parse their metadata
2018-10-11 19:52:47 +02:00
jvoisin
b2e153b69c
Delete pictures of FLAC files
2018-10-11 18:15:11 +02:00
jvoisin
0d25b18d26
Improve both the typing and the comments
2018-10-05 17:07:58 +02:00
jvoisin
d0f3534eff
Hide unsupported extensions in mat2 -l
2018-10-05 12:43:21 +02:00
jvoisin
8e98593b02
Trash word/people.xml in office files
2018-10-04 16:28:20 +02:00
georg
34fbd633fd
libmat2: fix shebang
...
Relates to 0a2a398c9c
2018-10-03 18:38:28 +00:00