mirror of synced 2024-11-24 18:24:23 +01:00
Commit Graph

250 Commits

Author SHA1 Message Date
jvoisin
bf0c777cb9 Improve support for xlsx files 2021-05-20 18:16:28 +02:00
jvoisin
85c08c5b68 Add support for AIFF files
This should close #151
2021-04-24 17:26:38 +02:00
jvoisin
d00ca800b2 Keep sharedStrings.xml when processing MSOffice sheets 2021-03-14 14:41:40 +01:00
jvoisin
8b42b28b70 Don't keep [trash] files when processing MS Office files 2021-03-14 14:35:29 +01:00
jvoisin
e2362b8620 Improve epub support
Warn when there are encrypted fonts in an epub file
2021-03-07 17:50:25 +01:00
jvoisin
626669f95f Add some typing to epub.py 2021-03-07 17:50:17 +01:00
jvoisin
497f5f71fc Improve epub compatibility 2021-03-07 16:59:18 +01:00
jvoisin
cd5f2eb71c Add a missing comma
This should improve epub support
2021-03-07 16:42:38 +01:00
jvoisin
ec082d6483 Improve a bit the support of epub 2021-02-07 17:24:50 +01:00
jvoisin
f8111547ae Improve epub compatibility 2021-01-30 16:24:42 +01:00
jvoisin
a517f8d36e Please pylint 2020-11-30 18:52:07 +01:00
jvoisin
61dce89fbd Raise a ValueError explicitly 2020-11-30 18:52:07 +01:00
jvoisin
148bcbba52 Bump coverage 2020-11-13 17:27:23 +01:00
jvoisin
b3def8b5de Mount /etc/alternatives inside bubblewrap
This is now required by ffmpeg
2020-11-13 17:18:20 +01:00
jvoisin
77dde8a049 Please pylint 2020-11-13 12:09:25 +01:00
Romain Vigier
1b361ec27e Don't set a default value when retrieving Xmlns key for SVG metadata 2020-11-12 22:46:14 +01:00
jvoisin
f638168033 Better handling of malformed pdf 2020-11-06 16:05:24 +01:00
jvoisin
b84f73c5c3 Handle multiple namespaces in MSOffice's content types 2020-11-06 15:29:42 +01:00
jvoisin
96e639dfd3 Fix a regexp for xlsx files
This should slightly improve compatibility with Excel files
2020-11-06 15:26:30 +01:00
jvoisin
46b3ae1672 Fix a crash affecting some mp3 files 2020-07-22 15:47:35 +02:00
jvoisin
d8b68ef68e Improve a bit Microsoft word support 2020-05-17 16:53:36 +02:00
jvoisin
c8dc020dc5 Improve xlsx support 2020-04-06 20:47:32 +02:00
jvoisin
599909a760 Improve xlsx support 2020-04-02 20:58:10 +02:00
jvoisin
d7a03d907b Vastly improve ppt compatibility 2020-03-08 14:06:27 +01:00
jvoisin
a23dc001cd Improve compatibility with MS Office of cleaned ppt 2020-03-07 14:34:07 +01:00
jvoisin
f93df85d03 Improve a bit ppt support 2020-03-07 05:22:36 -08:00
jvoisin
e5b1068ed6 Improve a bit the support of ppt files 2020-03-07 12:49:45 +01:00
tguinot
56d2c4aa5f Add which pathfinding for executables 2020-02-11 17:23:11 +01:00
jvoisin
5270071b94 Remove a couple of residual metadata in pdf
This commit takes care of removing residual metadata
added by mat2 during the cleaning of pdf.
2020-02-08 17:00:37 +01:00
jvoisin
ee704db2ff Add support for wav files 2020-01-01 19:47:46 +01:00
jvoisin
693408f1a6 Please mypy
Mypy doesn't like some annotations in web.py;
this commit aims at pleasing it.
2019-12-29 15:20:48 +01:00
Ivy Fay
b2efffdaa4 sandbox: stop mounting new filesystem on /tmp
Mounting a new, empty filesystem on /tmp makes it impossible to use mat2 for manipulating files stored there. In particular, it breaks running tests while building a package with /tmp as the temporary build directory, which is a common setup on Arch Linux:
https://aur.archlinux.org/packages/mat2/#comment-721221
2019-12-18 02:23:43 -08:00
jvoisin
7465cedee7 Handle tiff images with a .tif extension 2019-12-16 14:55:35 -08:00
jvoisin
f5aef1b391 Improve the reliability of Exiftool-base parsers 2019-12-15 09:04:51 -08:00
jvoisin
2e3496d3d4 Improve the reliability of Gdk-based parsers 2019-12-15 07:05:53 -08:00
jvoisin
be24c681ff Improve the reliability of PNG parsing 2019-12-15 06:57:32 -08:00
jvoisin
efa525c102 Improve the robustness of the HTML parser 2019-12-15 06:50:54 -08:00
jvoisin
f67cd9d7dc Improve the robustness of the CSS parser 2019-12-15 06:44:21 -08:00
jvoisin
e4114af3b5 Improve a bit ppt support 2019-11-30 11:38:22 +01:00
jvoisin
d56f83bed1 Improve a bit odt handling 2019-11-30 10:25:24 +01:00
georg
697cb36b81 This is mat2, not MAT2
Closes #131
2019-11-30 01:14:41 -08:00
jvoisin
df1eb98a40 Please the new version of pylint 2019-11-26 22:12:56 +01:00
jvoisin
655c19d17d Improve a bit the support for ppt files 2019-10-17 23:02:17 +02:00
jvoisin
5f0b3beb46 Add a way to disable the sandbox
Due to bubblewrap's pickiness, mat2 can now be run
without a sandbox, even if bubblewrap is installed.
2019-10-12 16:13:49 -07:00
jvoisin
3cef7fe7fc Refactor tests 2019-10-12 13:32:04 -07:00
jvoisin
12489bb682 Remove a useless \ 2019-10-12 21:36:28 +02:00
jvoisin
bb903ec309 Remove useless parenthesis 2019-10-12 21:36:19 +02:00
jvoisin
4483c06f19 Replace abstractstaticmethod with abstractmethod
Apparently, abstractstaticmethod is deprecated
since python3.3.
2019-10-12 21:28:27 +02:00
madaidan
58773088ac Mount a new tmpfs on /tmp and drop all capabilities
This mounts a new tmpfs on /tmp so any files residing there would be hidden
from the sandbox. Many programs store some files in there that might be useful
to an attacker. It also drops all capabilities in case it is ever run with
extra capabilities for whatever reason.
2019-10-05 15:21:40 +02:00
jvoisin
3714553185 Fix bubblewrap
On some machines (like mine), `/proc` has to be mounted.  Also, since
sandboxing with bubblewrap is best effort and assumes that an attacker doesn't
have control outside of the file to clean, it's safe to __try__ to enable some
bubblewrap features, and to silently fail otherwise.
2019-09-21 14:14:39 +02:00
jvoisin
1678d37856 Mark a comment as FP 2019-09-01 19:01:33 +02:00
jvoisin
397a18b0cc Add support for ppm 2019-09-01 09:28:46 -07:00
jvoisin
0170f0e37e Improve a bit the comments in the code
This is related to the previous commit
2019-09-01 13:52:02 +02:00
jvoisin
0cf0541ad9 Remove nsid fields from MSOffice documents
nsids are random identifiers, usually used to ease merging
between documents, and can trivially be used for fingerprinting.
2019-09-01 13:52:02 +02:00
jvoisin
0c75cd15dc Remove a mypy workaround to bump coverage back to 100% 2019-07-22 23:28:51 +02:00
jvoisin
5280b6c2b3 Add a test for svg namespace 2019-07-22 23:21:06 +02:00
georg
8bb2826f7a CI: Add job to run codespell, a spell checking software 2019-07-22 13:31:40 -07:00
jvoisin
5c33b290ae Fix mypy 2019-07-20 16:05:55 +02:00
jvoisin
dc5603eb1d Please mypy 2019-07-13 23:25:44 +02:00
jvoisin
4999209f9c Add support for svg 2019-07-13 21:26:05 +02:00
jvoisin
bdd5581033 Compress cleaned zip archives by default 2019-07-13 15:04:43 +02:00
jvoisin
47f9cb33bf Please mypy 2019-07-13 15:03:40 +02:00
jvoisin
35d550d229 Use memoization get _*_path() functions
This shouldn't make a big difference in the CLI/extension
usage, but might improve the performance of long-running
instances, or people misusing the API.
2019-05-16 00:31:40 +02:00
jvoisin
aa52a5c91c Please mypy wrt. the last two commits 2019-05-14 00:50:17 +02:00
Antoine Tenart
f19f6ed8b6 Rework the dependency checks to distinguish required/optional ones
Rework the dependencies definition to include a 'required' flag, which
is passed by the check_dependencies helper to the callers, so that they
can distinguish between required and optional dependencies.

This helps in two ways:
- The unit test for the dependencies was now failing when an optional
  one was missing, due to a previous rework.
- Mat2's --check-dependencies was referring to "required dependencies"
  and was misleading for the user as some of them could be optional.

Signed-off-by: Antoine Tenart <antoine.tenart@ack.tf>
2019-05-13 23:35:26 +02:00
jvoisin
97abafdc58 Minor code cleanup 2019-05-09 09:41:05 +02:00
fuzzy
7e031c9757 typo 2019-05-03 02:39:15 -07:00
jvoisin
9516990693 Add some verification for "dangerous" tarfiles 2019-05-01 17:55:35 +02:00
jvoisin
a7ebb587e1 Handle weird permissions in tar archives 2019-04-27 22:48:40 +02:00
jvoisin
14a4cddb8b Improve the display of tarfile's members mtime 2019-04-27 21:15:06 +02:00
jvoisin
8e41b098d6 Add support for compressed tar files 2019-04-27 06:03:09 -07:00
jvoisin
82cc822a1d Add tar archive support 2019-04-27 04:05:36 -07:00
jvoisin
05f429b197 Add support for xhtml files 2019-04-14 20:36:33 +02:00
jvoisin
1e325c5b5b Please mypy
Apparently, mypy isn't able (yet?) to deal
with variables that are changing their types
at runtime.

Python is wonderful.
2019-03-30 10:33:16 +01:00
Antoine Tenart
d454ef5b8e libmat2: fix dependency checks for cmd line utilities
The command line checks for command line utilities are done by trying to
access the executables and by throwing an exception when not found. This
leads to:
- The mat2 cmd line --check-dependencies option failing.
- The ffmpeg unit tests failing when ffmpeg isn't installed (even though
  it's an optional dependency).

This patch fixes it.

Signed-off-by: Antoine Tenart <antoine.tenart@ack.tf>
2019-03-29 19:29:28 +01:00
Antoine Tenart
c824a68dd8 libmat2: reshape the dependencies list
Invert the keys and values in DEPENDENCIES. It seems more natural to use
the key as a key in check_dependencies(), and the value as the value.
This also helps prepare for reworking the check_dependencies()
helper.

Signed-off-by: Antoine Tenart <antoine.tenart@ack.tf>
2019-03-29 19:29:28 +01:00
jvoisin
b8c92fec09 Fix the testsuite 2019-03-23 00:41:23 +01:00
Antoine Tenart
0e3c2c9b1b libmat2: audio: not all id3 types have a text attribute
Not all id3 types have a text attribute (such as mutagen.id3.APIC or
mutagen.id3.UFID). This causes the get_meta helper to crash when
trying to access the text attribute of an object which does not have it.
Fix it by checking that the text attribute is available before accessing
it.

Signed-off-by: Antoine Tenart <antoine.tenart@ack.tf>
2019-03-23 00:32:44 +01:00
Brolf
5ac91cd4f9 Refactor {black,white}list into {block,allow}list
Closes #96
2019-03-05 23:13:42 +00:00
georg
c3f097a82b fix typo 2019-03-01 22:00:23 +00:00
jvoisin
55214206b5 Improve the previous commit
- More tests
- More documentation
- Minor code cleanup
2019-02-27 23:53:07 +01:00
jvoisin
73d2966e8c Improve epub support 2019-02-27 23:04:38 +01:00
jvoisin
eb2e702f37 Document the previous commit 2019-02-25 15:37:44 +01:00
jvoisin
545dccc352 In archive-based formats, the mimetype file comes first
This should improve epub compatibility,
along with other formats as a side-effect
2019-02-24 23:32:32 +01:00
jvoisin
524bae5972 <title> is also an html metadata 2019-02-23 20:47:26 +01:00
jvoisin
c757a9b7ef Fix a bug in css cleaning
It's not mandatory to actually have a comment inside
comment delimiters, like `/**/`.
2019-02-23 20:21:11 +01:00
jvoisin
02ff21b158 Implement epub support 2019-02-20 16:28:11 -08:00
jvoisin
a81b7658a8 Make the mandatory metadata warning generic
This should close #95.
2019-02-10 21:46:13 +01:00
jvoisin
6e63e03b86 Streamline a bit the previous commit 2019-02-09 15:23:16 +01:00
Poncho
a71488d459 bind mount /etc/ld.so.cache to the sandbox
without /etc/ld.so.cache available in the sandbox, tests fail on gentoo with:
/usr/bin/ffmpeg: error while loading shared libraries: libstdc++.so.6:
    cannot open shared object file: No such file or directory
2019-02-09 09:49:51 +01:00
jvoisin
6ef6aaa222 Improve a bit get_meta for libreoffice files 2019-02-08 23:23:56 +01:00
jvoisin
6cc034e81b Add support for html files 2019-02-08 23:05:18 +01:00
jvoisin
e1dd439fc8 Use of the archive refactoring for the office documents too 2019-02-07 22:19:37 +01:00
jvoisin
b9a62d798a Refactor a bit office get_meta handling
This should make it easier to get more metadata from
archive-based file formats.
2019-02-04 00:31:26 +01:00
jvoisin
433609f8ea Implement .gif support 2019-02-03 21:01:58 +01:00
intrigeri
e8c1bb0e3c Whenever possible, use bwrap for subprocesses
This should close #90
2019-02-03 19:18:41 +01:00
jvoisin
8e84ba547a Add support for wmv 2019-02-02 19:19:36 +01:00
jvoisin
04bb8c8ccf Add mp4 support 2018-10-28 07:41:04 -07:00
jvoisin
3a070b0ab7 Add support for zip files 2018-10-25 11:56:46 +02:00
jvoisin
283e5e5787 Improve archive-based parser's robustness against corrupted embedded files 2018-10-25 11:56:12 +02:00