Alex Marchant
d61fb7f77a
Wait to remove elements until they are all processed
2024-09-13 14:28:57 +02:00
jvoisin
a47ac01eb6
Remove a duplicate function
...
This is a leftover from today's best-effort merges.
2024-04-05 19:51:14 +02:00
Alex Marchant
156855ab7e
Remove dangling references from document.xml.rels
...
The file `word/_rels/document.xml.rels` is similar to `[Content_Types].xml` and
has references to other files in the archive. If those references aren't
removed, Word refuses to open the document.
2024-04-05 18:45:58 +02:00
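The commit above concerns stale entries in `word/_rels/document.xml.rels`. A minimal sketch of that idea follows, assuming the set of removed archive members is already known. The function name `drop_dangling_relationships` and its parameters are hypothetical and not taken from the project's code.

```python
import xml.etree.ElementTree as ET
from typing import Set

# Namespace used by OPC relationship parts (*.rels files).
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def drop_dangling_relationships(rels_xml: str, removed_targets: Set[str]) -> str:
    """Return rels_xml without <Relationship> entries whose Target was removed.

    Hypothetical helper: `removed_targets` holds Target values exactly as
    they are written in the .rels file (for example "comments.xml").
    """
    # Keep the default namespace unprefixed when serializing.
    ET.register_namespace("", RELS_NS)
    root = ET.fromstring(rels_xml)

    # Collect first, then remove, so the tree is not mutated while iterating.
    dangling = [
        rel
        for rel in root.findall(f"{{{RELS_NS}}}Relationship")
        if rel.get("Target") in removed_targets
    ]
    for rel in dangling:
        root.remove(rel)

    return ET.tostring(root, encoding="unicode")
```

Collecting the matches into a list before removing them avoids skipping siblings, which can happen when children are removed during iteration.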
jvoisin
09672a2dcc
Merge branch 'alexmarchant-utf-8-encode-all'
2024-04-05 18:33:30 +02:00
Alex Marchant
f2c898c92d
Strip comment references from document.xml
2024-04-05 18:31:49 +02:00
Alex Marchant
f931a0ecee
Make utf-8 explicit in all tree.write calls
2024-04-03 15:27:48 -04:00
Alex Marchant
61f39c4bd0
Strip comment references from document.xml
2024-04-03 15:20:00 -04:00
Alex Marchant
17e76ab6f0
Update comments file regex
2024-04-03 14:49:39 -04:00
Jason Smalls
8c26020f67
Add more files to ignore for MSOffice documents
2023-07-11 21:38:22 +02:00
jvoisin
1b9608aecf
Use proper type annotations instead of comments
2023-05-03 22:28:02 +02:00
jvoisin
3cb3f58084
Another typing pass
2023-01-28 17:22:26 +01:00
jvoisin
39fb254e01
Fix the type annotations
2023-01-28 15:57:20 +00:00
jvoisin
62a45c29df
Improve xlsx support
2022-12-25 18:05:13 +01:00
jvoisin
180ea24e5a
Remove pyflakes
...
It's borderline useless compared to mypy and pylint
2022-11-21 19:57:38 +01:00
jvoisin
cc5be8608b
Simplify the typing annotations
2022-08-28 22:29:06 +02:00
jvoisin
3378f3ab8c
Please pylint by iterating on dict directly, instead of calling .keys()
2021-12-26 15:23:26 +01:00
jvoisin
0b094b594b
Improve xlsx support
...
This should close #156
2021-07-14 23:34:02 +02:00
jvoisin
bf0c777cb9
Improve support for xlsx files
2021-05-20 18:16:28 +02:00
jvoisin
d00ca800b2
Keep sharedStrings.xml when processing MSOffice sheets
2021-03-14 14:41:40 +01:00
jvoisin
8b42b28b70
Don't keep [trash] files when processing MS Office files
2021-03-14 14:35:29 +01:00
jvoisin
148bcbba52
Bump coverage
2020-11-13 17:27:23 +01:00
jvoisin
b84f73c5c3
Handle multiple namespaces in MSOffice's content types
2020-11-06 15:29:42 +01:00
jvoisin
96e639dfd3
Fix a regexp for xlsx files
...
This should slightly increase compatibility with Excel files
2020-11-06 15:26:30 +01:00
jvoisin
d8b68ef68e
Improve a bit Microsoft word support
2020-05-17 16:53:36 +02:00
jvoisin
c8dc020dc5
Improve xlsx support
2020-04-06 20:47:32 +02:00
jvoisin
599909a760
Improve xlsx support
2020-04-02 20:58:10 +02:00
jvoisin
d7a03d907b
Vastly improve ppt compatibility
2020-03-08 14:06:27 +01:00
jvoisin
a23dc001cd
Improve compatibility with MS Office of cleaned ppt
2020-03-07 14:34:07 +01:00
jvoisin
f93df85d03
Improve a bit ppt support
2020-03-07 05:22:36 -08:00
jvoisin
e5b1068ed6
Improve a bit the support of ppt files
2020-03-07 12:49:45 +01:00
jvoisin
e4114af3b5
Improve a bit ppt support
2019-11-30 11:38:22 +01:00
jvoisin
d56f83bed1
Improve a bit odt handling
2019-11-30 10:25:24 +01:00
jvoisin
655c19d17d
Improve a bit the support for ppt files
2019-10-17 23:02:17 +02:00
jvoisin
0170f0e37e
Improve a bit the comments in the code
...
This is related to the previous commit
2019-09-01 13:52:02 +02:00
jvoisin
0cf0541ad9
Remove nsid fields from MSOffice documents
...
nsids are random identifiers, usually used to ease merging
between documents, and can trivially be used for fingerprinting.
2019-09-01 13:52:02 +02:00
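The nsid removal described in the commit above can be sketched as follows. This is an illustration only, assuming the identifiers appear as `w:nsid` elements in the WordprocessingML namespace. The function name `strip_nsid` is hypothetical and is not the project's implementation.

```python
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace, conventionally bound to the "w" prefix.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def strip_nsid(xml_text: str) -> str:
    """Return xml_text with every w:nsid element removed (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    nsid_tag = f"{{{W_NS}}}nsid"

    # ElementTree elements have no parent pointer, so visit each potential
    # parent and drop matching children. Both iterations run over snapshots
    # (lists) so the tree is never modified while it is being walked.
    for parent in list(root.iter()):
        for child in [c for c in parent if c.tag == nsid_tag]:
            parent.remove(child)

    return ET.tostring(root, encoding="unicode")
```

ElementTree may rewrite namespace prefixes when serializing (for example `ns0:` instead of `w:`); the output stays namespace-equivalent, though a real cleaner would likely want to preserve the original prefixes.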
jvoisin
82cc822a1d
Add tar archive support
2019-04-27 04:05:36 -07:00
Brolf
5ac91cd4f9
Refactor {black,white}list into {block,allow}list
...
Closes #96
2019-03-05 23:13:42 +00:00
jvoisin
6ef6aaa222
Improve a bit get_meta for libreoffice files
2019-02-08 23:23:56 +01:00
jvoisin
e1dd439fc8
Use the archive refactoring for office documents too
2019-02-07 22:19:37 +01:00
jvoisin
b9a62d798a
Refactor a bit office get_meta handling
...
This should make it easier to get more metadata from
archive-based file formats.
2019-02-04 00:31:26 +01:00
intrigeri
e8c1bb0e3c
Whenever possible, use bwrap for subprocesses
...
This should close #90
2019-02-03 19:18:41 +01:00
jvoisin
513d897ea0
Implement get_meta() for archives
2018-10-25 11:29:50 +02:00
jvoisin
2ba38dd2a1
Bump mypy typing coverage
2018-10-12 14:32:09 +02:00
jvoisin
0d25b18d26
Improve both the typing and the comments
2018-10-05 17:07:58 +02:00
jvoisin
8e98593b02
Trash word/people.xml in office files
2018-10-04 16:28:20 +02:00
jvoisin
5a5c642a46
Don't break office files for MS Office
...
We didn't take the whitelist into account while
removing dangling files from [Content_Types].xml
2018-10-03 16:38:05 +02:00
jvoisin
1b356b8c6f
Improve mat2's cli reliability
...
- Replace some class members by instance members
- Don't thread the cleaning process anymore for now
2018-10-03 15:22:36 +02:00
jvoisin
c67bbafb2c
Use [Content_Types].xml to improve MS Office coverage
2018-10-02 11:55:42 -07:00
georg
5b606f939d
fix typo
2018-10-02 16:01:24 +00:00
jvoisin
652b8e519f
Files processed via MAT2 are now accepted without warnings by MS Office
2018-10-01 12:25:37 -07:00