Commit Graph

145 Commits

Author SHA1 Message Date
jvoisin e70ea811c9 Implement support for .avi files, via ffmpeg
- This commit introduces optional dependencies (namely ffmpeg):
  mat2 will spit a warning when trying to process an .avi file
  if ffmpeg isn't installed.
- Since metadata are obtained via exiftool, this commit
  also refactors our exiftool wrapper a bit.
2018-10-22 12:58:01 +02:00
jvoisin d4c050a738 wtf python 2018-10-18 20:29:50 +02:00
jvoisin f04d4b28fc Fix the tests on Debian? 2018-10-18 20:23:00 +02:00
jvoisin da88d30689 Fix the CI on debian 2018-10-14 10:59:50 +02:00
jvoisin b832a59414 Refactor lightweight mode implementation 2018-10-12 11:49:24 +02:00
jvoisin b9dbd12ef9 Implement recursive metadata for FLAC files
Since FLAC files can contain covers, it makes sense
to parse their metadata
2018-10-11 19:52:47 +02:00
jvoisin b2e153b69c Delete pictures of FLAC files 2018-10-11 18:15:11 +02:00
jvoisin 8675706c93 Improve the display of mat2 when no metadata are found
This should close #74
2018-10-05 12:35:35 +02:00
jvoisin df252fd71a Remove a superfluous import 2018-10-04 16:19:38 +02:00
jvoisin a1c39104fc Make the testsuite runnable on the installed MAT2 2018-10-04 16:16:52 +02:00
jvoisin 84e302ac93 Remove file left behind by the testsuite 2018-10-03 16:38:05 +02:00
jvoisin c67bbafb2c Use [Content_Types].xml to improve MS Office coverage 2018-10-02 11:55:42 -07:00
jvoisin 156e81fb4c Check that cleaning twice doesn't break the file 2018-10-02 16:05:51 +02:00
jvoisin 9578e4b4ee Silence a bit the testsuite 2018-10-02 15:26:13 +02:00
jvoisin e342671ead Remove dangling references in MS Office's [Content_Types].xml 2018-09-30 19:53:18 +02:00
jvoisin 719cdf20fa Second pass of minor formatting 2018-09-24 20:15:07 +02:00
jvoisin 2e243355f5 Fix some minor formatting issues 2018-09-24 19:50:24 +02:00
jvoisin 174d4a0ac0 Implement rsid stripping for office files
MS Office XML rsid is a "unique identifier used to track the editing session
when the physical character representing this section mark was last formatted."

See the following links for details:
- https://msdn.microsoft.com/en-us/library/office/documentformat.openxml.wordprocessing.previoussectionproperties.rsidrpr.aspx
- https://blogs.msdn.microsoft.com/brian_jones/2006/12/11/whats-up-with-all-those-rsids/.
2018-09-24 18:03:59 +02:00
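As a rough illustration of what rsid stripping involves, the sketch below removes every attribute whose local name starts with `rsid` from a WordprocessingML fragment. It is not taken from mat2's source: the function name `strip_rsid` is hypothetical, and the sketch assumes that dropping the attributes is enough for the purpose of the example.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part of .docx files.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def strip_rsid(xml_text: str) -> str:
    """Return xml_text with every rsid* attribute removed (hypothetical helper)."""
    # Keep the conventional "w" prefix when serializing the tree again.
    ET.register_namespace("w", W_NS)
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Attribute keys look like "{namespace}rsidR"; compare the local name only.
        for key in list(element.attrib):
            if key.rpartition("}")[2].startswith("rsid"):
                del element.attrib[key]
    return ET.tostring(root, encoding="unicode")
```

Text content and non-rsid attributes are left untouched, so the document still renders the same way after the pass.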
jvoisin 9826de3526 Add a test for zip ordering 2018-09-20 14:04:46 +02:00
jvoisin ab71c29a28 Make pyflakes happy 2018-09-20 01:19:22 +02:00
jvoisin 3d2842802c Split the tests 2018-09-20 01:13:59 +02:00
Yoann Lamouroux 0a2a398c9c Trivial modification of all shebangs.
`/usr/bin/python3` -> `/usr/bin/env python3`

It's always better to trust the environment-defined path to bin/python, as
virtualenvs become the way to go.
2018-09-12 14:58:27 +02:00
jvoisin 2e9adab86a Improve a cli test resilience 2018-09-06 11:32:29 +02:00
Daniel Kahn Gillmor f3cef319b9 Unknown Members: make policy use an Enum
Closes #60

Note: this changeset also ensures that clean.cleaned.docx is cleaned
up after the pytest run is over.
2018-09-05 18:59:33 -04:00
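A minimal sketch of what an Enum-backed policy could look like follows. Only the `keep` value is named in this log; the class name, the other two members, and the helper `policy_from_string` are assumptions made for illustration, not mat2's confirmed API.

```python
import enum


class UnknownMemberPolicy(enum.Enum):
    # Only "keep" appears in the commit log; the other members are assumed.
    ABORT = "abort"
    OMIT = "omit"
    KEEP = "keep"


def policy_from_string(value: str) -> UnknownMemberPolicy:
    """Convert a user-supplied string into a policy member (hypothetical helper)."""
    try:
        return UnknownMemberPolicy(value)
    except ValueError:
        valid = ", ".join(policy.value for policy in UnknownMemberPolicy)
        raise ValueError(f"unknown policy {value!r}, expected one of: {valid}") from None
```

Using an Enum instead of bare strings means a misspelled policy fails at the conversion step rather than being silently ignored later.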
jvoisin 3649c0ccaf Remove short version of dangerous/advanced options 2018-09-05 17:48:14 +02:00
jvoisin 46bb1b83ea Improve the previous commit 2018-09-05 17:26:09 +02:00
Daniel Kahn Gillmor 10d60bd398 add --unknown-members argument to mat2
This allows the user to make use of parser.unknown_member_policy for
archive formats.

At the suggestion of @jvoisin, it also prints a scary warning if the
user explicitly chooses 'keep'.
2018-09-04 18:28:04 -04:00
dkg e2634f7a50 Logging cleanup 2018-09-01 05:14:32 -07:00
jvoisin b5a9520a60 Add a cli-related test 2018-07-30 22:54:41 +02:00
jvoisin a1257c538b Add some tests about pathological files 2018-07-30 22:36:36 +02:00
jvoisin 5a7c7f35f7 Remove `print` from libmat, and use the `logging` module instead
This should close #28
2018-07-10 21:30:38 +02:00
jvoisin d5861e4653 Implement a check for dependencies in mat2
Example use:

```
$ mat2 -c
Dependencies required for MAT2 0.1.3:
- Cairo: yes
- Exiftool: yes
- GdkPixbuf from PyGobject: yes
- Mutagen: yes
- Poppler from PyGobject: yes
- PyGobject: yes
```

This should close #35
2018-07-10 21:24:26 +02:00
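One way such a dependency check could be built, sketched under assumptions: Python modules are probed with `importlib.util.find_spec` and external binaries with `shutil.which`. The function names and the module and binary names in the tables below are hypothetical and cover only part of the list shown above.

```python
import importlib.util
import shutil


def check_dependencies() -> dict:
    """Map each dependency label to True if it appears to be available."""
    # Module and binary names are assumptions made for this sketch.
    modules = {"Cairo": "cairo", "Mutagen": "mutagen", "PyGobject": "gi"}
    binaries = {"Exiftool": "exiftool"}
    status = {}
    for label, module in modules.items():
        # find_spec returns None when a top-level module cannot be located.
        status[label] = importlib.util.find_spec(module) is not None
    for label, binary in binaries.items():
        # which returns None when the executable is not on PATH.
        status[label] = shutil.which(binary) is not None
    return status


def format_report(status: dict) -> str:
    """Render the status mapping as one '- name: yes/no' line per dependency."""
    lines = [f"- {label}: {'yes' if ok else 'no'}" for label, ok in sorted(status.items())]
    return "\n".join(lines)
```

Probing with `find_spec` avoids importing the modules themselves, so the check stays cheap even when a dependency is heavy to load.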
jvoisin bd357b85f8 Remove a useless option that was never implemented anyway 2018-07-09 00:13:16 +02:00
jvoisin f49aa5cab7 Achieve 100% coverage! 2018-07-08 22:27:37 +02:00
jvoisin 52a2c800b7 Bump coverage again 2018-07-08 21:50:52 +02:00
jvoisin ad3e7ccee8 Bump coverage for office files and fix some related crashes 2018-07-08 21:35:45 +02:00
jvoisin 3cd4f9111f Bump coverage for torrent handling 2018-07-08 15:13:03 +02:00
jvoisin b5fcddd6a6 Simplify how torrent files are handled
- Rework the testsuite wrt. torrent
- fail at parser's instantiation on corrupted torrent,
  instead of during `get_meta` or `remove_all` call
2018-07-08 13:49:11 +02:00
jvoisin 9f631a1bb1 Bump a bit the coverage 2018-07-07 18:02:53 +02:00
jvoisin 3d80f97524 Simplify BMP handling 2018-07-06 00:49:17 +02:00
jvoisin 53271495f7 Add support for .txt files 2018-07-06 00:42:09 +02:00
jvoisin bee56a57ce Remove docx revisions 2018-07-01 23:16:14 +02:00
jvoisin 02f7605ac1 MAT2 is now cleaning revisions from odt files! 2018-07-01 21:09:20 +02:00
jvoisin 80fc4ffb40 Remove the thumbnails from libreoffice files 2018-07-01 17:29:05 +02:00
jvoisin 74f2d50433 Split the testsuite a bit and add more tests 2018-06-22 21:16:55 +02:00
jvoisin b4ef0c9622 Improve reliability against corrupted image files 2018-06-22 20:38:29 +02:00
jvoisin 5b38bd7ccd Improve the reliability of the office parser 2018-06-21 23:18:59 +02:00
jvoisin 4600ce3490 Improve a bit the coverage 2018-06-10 20:20:45 +02:00
jvoisin 8c7979aae3 Add some tests for non-supported embedded fileformats 2018-06-10 20:19:35 +02:00
jvoisin 87bdcd1a95 Improve a bit our coverage wrt. torrent files handling 2018-06-10 00:56:55 +02:00
jvoisin e81ce6cd1a Fix and add a test for explicitly non-supported formats 2018-06-10 00:28:43 +02:00
jvoisin 6a832a4104 Prevent exiftool-based parameter-injection 2018-06-06 23:50:25 +02:00
jvoisin 6a1b0b31f0 Add more typing and use mypy in the CI 2018-06-04 23:20:30 +02:00
jvoisin 8cf9aeeb67 Rename mat2.py to mat2 2018-05-21 22:49:40 +02:00
jvoisin 38fae60b8b Rename some files to simplify packaging
- the `src` folder is now `libmat2`
- the `main.py` script is now `mat2.py`
2018-05-18 23:52:40 +02:00
jvoisin 0354c3b7e3 Add a test about unsupported files 2018-05-16 22:10:47 +02:00
jvoisin be6d32afa8 Some arguments are mutually exclusive 2018-05-16 00:07:04 +02:00
jvoisin c037e265c6 Add a `--version` option 2018-05-14 22:44:31 +02:00
jvoisin b02d72887a Test for faulty files, and document how MAT2 is behaving wrt. them 2018-05-06 21:58:31 +02:00
jvoisin 09930391c4 Clean up after the testsuite 2018-04-30 23:51:59 +02:00
jvoisin 23bc7e8f5f Rework the way we're outputting files 2018-04-30 23:46:37 +02:00
jvoisin d2b2a54a72 MAT2's cli now uses meaningful return codes
- Simplify the multiprocessing by using a Pool
- Use some functional (♥) constructions to exit
  with a return code
- Add some tests to prove that we're doing things
  that are working correctly
2018-04-29 22:59:23 +02:00
jvoisin cfc3a58550 Add a test for odg 2018-04-23 00:28:36 +02:00
jvoisin 0fa184cb6f Test .odf support 2018-04-23 00:25:06 +02:00
jvoisin 57bf89e035 Add support for torrent files cleaning 2018-04-22 22:02:00 +02:00
jvoisin ecb199b4a6 Add a cli-related test
Since I didn't notice that it was broken
until c5f5134502,
it's a good idea to have some tests for this ;)
2018-04-16 23:20:21 +02:00
jvoisin e34bc19f71 Add support for BMP
To be completely honest, BMP have no metadata,
but we still add it, just in case™
2018-04-16 22:27:29 +02:00
jvoisin 96299c6a53 Add lightweight processing for PDF 2018-04-14 21:23:31 +02:00
jvoisin 7ec1eff96e Improve the way we parse/display pdf metadata 2018-04-11 23:20:59 +02:00
jvoisin 0239ab3b6a Add some white lines to make the code more compliant 2018-04-04 23:21:48 +02:00
jvoisin 9fa76c4c20 Remove some unused imports 2018-04-04 23:18:38 +02:00
jvoisin d3b1eabe07 Add a test for when main.py is called without any args 2018-04-04 23:14:43 +02:00
jvoisin 4ee091d833 Improve get_meta in various ways
- Normalize the case
- Strip \00, \r, space and \n
- Flatten metadata lists
- Add tests for audio files
2018-04-04 21:59:46 +02:00
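The normalization steps listed in this commit can be sketched as below. This is an illustration rather than mat2's actual `get_meta`: the function name `normalize_meta` is hypothetical, and it assumes that "normalize the case" means lowercasing keys and that lists are flattened by joining their items with commas.

```python
# Characters stripped from both ends of every value: NUL, CR, LF and space.
STRIP_CHARS = "\x00\r\n "


def normalize_meta(meta: dict) -> dict:
    """Return a copy of meta with lowercase keys and cleaned, flattened values."""
    cleaned = {}
    for key, value in meta.items():
        if isinstance(value, (list, tuple)):
            # Flatten a list of values into one comma-separated string.
            value = ", ".join(str(item).strip(STRIP_CHARS) for item in value)
        cleaned[key.lower()] = str(value).strip(STRIP_CHARS)
    return cleaned
```

For example, `normalize_meta({"Artist": ["jvoisin\x00"], "TITLE": " hello\r\n"})` returns `{"artist": "jvoisin", "title": "hello"}`.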
jvoisin 6c19e43e5d Add even more tests for the cli 2018-04-04 00:37:55 +02:00
jvoisin 6398befe14 Add a first test for the CLI 2018-04-04 00:22:00 +02:00
jvoisin ccf16d7489 Add a test for an issue highlighted by 76f25212d1 2018-04-03 23:29:34 +02:00
jvoisin 6868f20065 `parser_factory` now returns the mtype too 2018-04-02 17:36:26 +02:00
jvoisin 27beda354d Move every image-related parser into a single file 2018-04-01 12:30:00 +02:00
jvoisin eac51dbc99 Refactor office document handling 2018-04-01 01:04:06 +02:00
jvoisin 2d7c703c52 Add support for .tiff files 2018-04-01 00:43:36 +02:00
jvoisin c186fc4292 Clean deep metadata for zip files 2018-04-01 00:17:06 +02:00
jvoisin 6d506b8757 Add a deep check for office/libreoffice files 2018-03-31 23:09:54 +02:00
jvoisin 12b3b39d4d Add support for .odt 2018-03-31 21:20:21 +02:00
jvoisin 1ee936420c Display docx metadata 2018-03-31 21:16:02 +02:00
jvoisin 865ad181ae Add support for docx 2018-03-31 15:47:06 +02:00
jvoisin f391c9603c Change a bit the source code organisation 2018-03-31 15:46:17 +02:00
jvoisin 2eb68928d5 FLAC support 2018-03-25 16:20:45 +02:00
jvoisin 19a8fd97aa Implement mp3 and ogg support 2018-03-25 16:17:41 +02:00
jvoisin d4d6f31655 Add support for jpeg 2018-03-25 15:09:12 +02:00
jvoisin 7ad9ff08ad Add a test for PNG files 2018-03-20 23:35:02 +01:00
jvoisin acb9b2d14e Clean metadata 2018-03-18 23:48:14 +01:00
jvoisin df3c27d79d Improve the testsuite 2018-03-18 21:42:12 +01:00
jvoisin 069765376d Remove a useless file 2018-03-13 01:01:18 +01:00
jvoisin 67ce0f739f Add a working test 2018-03-13 01:01:07 +01:00
jvoisin 13d2507d60 First commit 2018-03-06 23:20:18 +01:00