mirror of synced 2024-11-24 18:24:23 +01:00
Commit Graph

421 Commits

Author SHA1 Message Date
jvoisin
96299c6a53 Add lightweight processing for PDF 2018-04-14 21:23:31 +02:00
jvoisin
6f4ed2490f Thread the cleaning process 2018-04-14 16:13:51 +02:00
jvoisin
cef5068fe9 Silence the apt process of the CI 2018-04-14 16:12:32 +02:00
jvoisin
bbde340e8a Silence a bit the CI 2018-04-11 23:40:35 +02:00
jvoisin
7ec1eff96e Improve the way we parse/display pdf metadata 2018-04-11 23:20:59 +02:00
jvoisin
0239ab3b6a Add some white lines to make the code more compliant 2018-04-04 23:21:48 +02:00
jvoisin
9fa76c4c20 Remove some unused imports 2018-04-04 23:18:38 +02:00
jvoisin
d830760d4f Oups, fix the build 2018-04-04 23:18:32 +02:00
jvoisin
972de8469e main.py is now correctly handling folders 2018-04-04 23:15:00 +02:00
jvoisin
d3b1eabe07 Add a test for when main.py is called without any args 2018-04-04 23:14:43 +02:00
jvoisin
4ee091d833 Improve get_meta in various ways
- Normalize the case
- Strip \00, \r, space and \n
- Flatten metadata lists
- Add tests for audio files
2018-04-04 21:59:46 +02:00
jvoisin
1ad817566d Fix the ci 2018-04-04 01:06:35 +02:00
jvoisin
6c19e43e5d Add even more tests for the cli 2018-04-04 00:37:55 +02:00
jvoisin
6398befe14 Add a first test for the CLI 2018-04-04 00:22:00 +02:00
jvoisin
e3a7c3b9c4 main.py is an executable script 2018-04-04 00:21:39 +02:00
jvoisin
463c0b62a1 Fix a typo spotted by @doobry in get_meta for zip-based files 2018-04-03 23:59:02 +02:00
jvoisin
afeb3753a8 Improve the cli
- Implement the `-l` option
- The help is now more awesome
2018-04-03 23:57:13 +02:00
jvoisin
1d6559596d Apparently, image/jpg isn't correct, image/jpeg is 2018-04-03 23:56:39 +02:00
jvoisin
ccf16d7489 Add a test for an issue highlighted by 76f25212d1 2018-04-03 23:29:34 +02:00
jvoisin
cd8f1a55b1 Add a note about why we do clean PDF in a completely overkill way 2018-04-03 21:45:05 +02:00
jvoisin
e8e3ab6c86 Add some related softwares 2018-04-03 21:37:46 +02:00
jvoisin
2a51ae03df Add more details to the warnings, thanks to @pabs 2018-04-03 21:34:45 +02:00
Loic Dachary
76f25212d1 get_parse needs to explore subclasses recursively 2018-04-03 21:27:38 +02:00
jvoisin
04a0032e9f Add some comments 2018-04-02 23:40:08 +02:00
jvoisin
b5a5535e3f Add some more type hinting 2018-04-02 23:40:00 +02:00
jvoisin
f5753dec40 Clean up the code for PDF handling 2018-04-02 23:36:56 +02:00
jvoisin
721ee78d15 Fix a mistake wrt. office handling 2018-04-02 23:35:03 +02:00
jvoisin
0cc7e1e680 Improve the main.py file 2018-04-02 19:12:10 +02:00
jvoisin
23bd22b305 Add more typing hints 2018-04-02 19:11:59 +02:00
jvoisin
6868f20065 parser_factory now returns the mtype too 2018-04-02 17:36:26 +02:00
jvoisin
6c29e0eae2 Improve a bit the main.py file 2018-04-01 17:13:34 +02:00
jvoisin
7992cd0d51 Add some documentation 2018-04-01 15:36:45 +02:00
jvoisin
9e7a4bd217 Implement support in get_meta for deep meta in office-related files 2018-04-01 15:08:38 +02:00
jvoisin
27beda354d Move every image-related parser into a single file 2018-04-01 12:30:00 +02:00
jvoisin
711347c87f AbstractParser is an abstract class 2018-04-01 12:06:50 +02:00
jvoisin
da5cef8c90 Add a bla about requirements 2018-04-01 01:06:56 +02:00
jvoisin
eac51dbc99 Refactor office document handling 2018-04-01 01:04:06 +02:00
jvoisin
2d7c703c52 Add support for .tiff files 2018-04-01 00:43:36 +02:00
jvoisin
c186fc4292 Clean deep metadata for zip files 2018-04-01 00:17:06 +02:00
jvoisin
6d506b8757 Add a deep check for office/libreoffice files 2018-03-31 23:09:54 +02:00
jvoisin
fb5956bd6b Add application/rdf+xml to harmless mimetypes 2018-03-31 23:09:15 +02:00
jvoisin
6aeffe6823 Add a logo 2018-03-31 22:01:41 +02:00
jvoisin
88fcd4071d Support even more libreoffice files 2018-03-31 21:22:16 +02:00
jvoisin
12b3b39d4d Add support for .odt 2018-03-31 21:20:21 +02:00
jvoisin
0bbafc4cc5 Improve resilience of main.py 2018-03-31 21:16:02 +02:00
jvoisin
1ee936420c Display docx metadata 2018-03-31 21:16:02 +02:00
jvoisin
e4d2506d6a Add LICENSE 2018-03-31 07:00:14 -07:00
jvoisin
8bd083ce51 Support python3.5 2018-03-31 15:57:14 +02:00
jvoisin
4da974edaf Add a .gitignore 2018-03-31 15:47:46 +02:00
jvoisin
865ad181ae Add support for docx 2018-03-31 15:47:06 +02:00