The purpose of adblock for science is to remove nasty ads from papers,
which at the moment means only papers from Science Magazine as published
by the American Association for the Advancement of Science (AAAS).
I am really annoyed that I have to write an ad blocker... for science
papers.
As Python 2.6 was already noted as a potential environment, there seemed little
reason not to use argparse rather than a sys.argv popping system; argparse offers
automatically generated usage documentation and can report useful errors when input
is incorrect.
The "with" context statement is also highly excellent and should be used wherever
legacy support for old-timers using 2.6 is not needed.
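
A minimal sketch of the two points above, not the actual script: the option
names (-i/--input, -v/--verbose) and the helper names build_parser() and
read_input() are assumptions made up for illustration.

```python
import argparse


def build_parser():
    # Hypothetical option names; the real script's flags may differ.
    parser = argparse.ArgumentParser(
        description="Remove watermarks from a PDF.")
    parser.add_argument("-i", "--input", required=True,
                        help="path of the PDF to read")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="report progress on stderr")
    return parser


def read_input(path):
    # "with" closes the file even if read() raises.
    with open(path, "rb") as handle:
        return handle.read()
```

Calling build_parser().parse_args() with no arguments prints the generated
usage text and exits with an error, which is the behaviour a hand-rolled
sys.argv loop would have to reimplement.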
Also cleaned up some flakes noticed by pyflakes, and made scrub() a @classmethod
instead of a @staticmethod so I could use the class for the verbose output.
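
Roughly why @classmethod helps here, as a sketch: the class names Plugin and
ScienceMagazine, the verbose flag, and the message format are hypothetical,
not the real plugin code.

```python
import sys


class Plugin(object):
    # Hypothetical base class; names are illustrative only.
    @classmethod
    def scrub(cls, content, verbose=False):
        if verbose:
            # cls is the class the method was called on, so subclasses
            # report their own name without overriding scrub().
            sys.stderr.write("%s: scrubbing %d bytes\n"
                             % (cls.__name__, len(content)))
        return content


class ScienceMagazine(Plugin):
    pass
```

With @staticmethod there is no cls argument, so the verbose message would
have to hard-code the plugin name in every subclass.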
caveats:
* there are no unit tests of this patch
* your stderr logs may now contain potentially sensitive information
* the implementation of arg parsing is very low-tech (a *good* way to do arg parsing is the "argparse" module)
This is slightly broken because the SPIE plugin removes more than just
watermarks. For some reason it seems to also remove images and large
blocks of text from the paper. However, the object that is being removed
is tiny. In the unit testing sample, the removed object is pdf stream
55.
For now, SPIE is partially disabled until this is fixed. The problem
does not originate from the other plugins.
Fixes #20
The deflate function expands some of the FlateDecode streams in a pdf
file. The output of the deflate function is not always correct and it is
very buggy. Still, this is a useful tool to poke around in foreign pdfs
under investigation.
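
For reference, a naive sketch of the general idea using zlib. This is not
the repository's deflate function; inflate_streams() and STREAM_RE are
names invented here, and the regex assumes the stream and endstream
keywords never occur inside the compressed data.

```python
import re
import zlib

# Capture everything between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(br"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf_bytes):
    """Yield the decompressed body of each stream that zlib can decode."""
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj() tolerates the newline left before "endstream".
            yield zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            # Not zlib data (another filter, or no filter); skip it.
            continue
```

Streams that use predictors, filter chains, or encryption will come out
wrong or be skipped, which matches the "not always correct" warning above.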
Some publishers generate pdfs with the watermarks inside the text of a
page, in which case the object needs to be replaced. This deflates the
object and uses plaintext instead. While this increases the size of the
pdf, it is also effective for removing watermarks from the stream.