17 Commits

Author SHA1 Message Date
Ashwini Purohit 5669e4e289 fixes python3 installation error 2017-06-05 06:30:56 +05:30
    Fixes invalid syntax error
vi e95374ec04 getobj can raise PDFObjectNotFound 2013-12-07 07:23:55 +08:00
vi 380bc289b3 Adapt to PDFMiner's breaking interface changes (#37). 2013-12-07 07:23:34 +08:00
Bryan Bishop 28bf8f5825 fix another syntax error in sciencemag 2013-09-16 15:11:42 -05:00
    How were these missed??
Bryan Bishop cc7d14d173 WIP of "AdBlock for Science" 2013-07-19 21:31:30 -05:00
    The purpose of adblock for science is to remove nasty ads from papers,
    which at the moment means only papers from Science Magazine as published
    by the American Association for the Advancement of Science (AAAS).

    I am really annoyed that I have to write an ad blocker... for science
    papers.
Bryan Bishop 528eae7e46 minor py3k-compat changes 2013-07-19 21:26:12 -05:00
Donncha O'Cearbhaill c673d77ec6 Check PDF is from the RSC before cleaning 2013-05-13 21:01:52 +01:00
Donncha O'Cearbhaill 18140d838d Adding support for PDF's from pubs.rsc.org 2013-05-13 20:28:35 +01:00
Zooko O'Whielacronx 503b8aead5 add -v -v mode which prints out the details (potentially sensitive, potentially bulky) 2013-02-13 21:08:49 +00:00
    remove spie, which appears to do nothing
Zooko O'Whielacronx 9204b2e17e fix up verbose printouts, don't print out large data 2013-02-13 20:56:33 +00:00
Zooko O'Whielacronx 56cc7719da add a "--verbose" option that writes to stderr if it finds anything to omit 2013-02-13 19:58:47 +00:00
    Also cleaned up some flakes noticed by pyflakes, and make the scrub() be @classmethod instead of @staticmethod so I could use the class for the verbose output.

    caveats:

    * there are no unit tests of this patch
    * now your logs of your stderr have potentially sensitive information in them
    * the implementation of arg parsing is very low-tech; (a *good* way to do arg parsing is the "argparse" module)
Bryan Bishop caed396870 SPIE watermark removal 2013-02-11 23:52:59 -06:00
    This is slightly broken because the SPIE plugin removes more than just
    watermarks. For some reason it seems to also remove images and large
    blocks of text from the paper. However, the object that is being removed
    is tiny. In the unit testing sample, the removed object is pdf stream
    55.

    For now, SPIE is partially disabled until this is fixed. The problem
    does not originate from the other plugins.

    fixes #20
Bryan Bishop b7b5a4ef65 jstor watermark removal 2013-02-06 17:33:00 -06:00
    fixes #1
Bryan Bishop f78aad78ef AIP: better false-positives check 2013-02-05 17:20:11 -06:00
Bryan Bishop d276954bfa IEEE: remove print statement (oops) 2013-02-05 17:19:37 -06:00
Bryan Bishop 14f1439c76 ieee watermark removal 2013-02-05 04:49:56 -06:00
Bryan Bishop d8fc6c1d8f initial commit 2013-02-05 03:10:14 -06:00