pdfparanoia/tests/diff
Scott Morrison 2ec1ca21a6 hash bang 2013-05-02 23:29:07 +10:00
..
README.md fixing README 2013-05-02 23:27:49 +10:00
comparediffs hash bang 2013-05-02 23:29:07 +10:00
urls merging urls.denied back into urls 2013-05-02 23:18:11 +10:00

README.md

comparediffs provides a tool which

  • downloads a PDF from two different sources,
  • runs pdfparanoia on both files, and
  • compares the outputs byte-for-byte.

Typical usage is to first establish two ssh tunnels to hosts with access to the literature, e.g. via ssh -D 1080 host1 and ssh -D 1081 host2. You can then invoke comparediffs via

./comparediffs localhost:1080 localhost:1081 < urls

where urls is a file containing one URL per line (e.g. the example file in this directory).

comparediffs creates a subdirectory pdf/, in which is stores PDFs. It won't try to download the same PDF twice, so if you make changes to pdfparanoia you'll want to clean out some or all of this subdirectory.

It's easy to see which PDFs pdfparanoia failed on afterwards, as it leaves copies of the scrubbed files with suffixes .1.cleaned.pdf and .2.cleaned.pdf. When pdfparanoia succeeds (or isn't even needed, because the downloaded files were identical), the scrubbed files are removed.