fixing README

Scott Morrison 2013-05-02 23:27:49 +10:00
parent 702f2e2895
commit 11b59bd544
1 changed file with 9 additions and 7 deletions


@@ -1,15 +1,17 @@
`comparediffs` provides a tool which
* downloads a PDF from two different sources,
* runs `pdfparanoia` on both files, and
* compares the outputs byte-for-byte.
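The last step, a byte-for-byte comparison, is the kind of check Python's standard `filecmp` module performs. Below is a minimal sketch, assuming the two scrubbed outputs are already on disk. `same_bytes` is a hypothetical helper name and is not part of `comparediffs`.

```python
import filecmp


def same_bytes(path_a: str, path_b: str) -> bool:
    """Return True if the two files have identical contents, byte for byte."""
    # shallow=False forces a real content comparison. With the default
    # shallow=True, files whose os.stat() signatures (type, size, mtime)
    # match are reported equal without their contents being read.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

The `shallow=False` argument matters here. Two downloads of the same PDF can easily share a size, and only their contents reveal whether they differ.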

Typical usage is to first establish two `ssh` tunnels to hosts with access to the literature, e.g. via
`ssh -D 1080 host1` and `ssh -D 1081 host2`, which open SOCKS proxies on local ports 1080 and 1081. You can then invoke `comparediffs` via

    ./comparediffs localhost:1080 localhost:1081 < urls

where `urls` is a file containing one URL per line (e.g. the example file in this directory).
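The invocation above takes two `host:port` proxy arguments and reads URLs from standard input. Below is a minimal Python sketch of that input handling. The names `parse_proxy` and `read_urls` are hypothetical, and the choice to ignore blank lines in `urls` is an assumption, not documented behaviour of `comparediffs`.

```python
from typing import Iterable, List, Tuple


def parse_proxy(spec: str) -> Tuple[str, int]:
    """Split a 'host:port' argument such as 'localhost:1080'."""
    host, sep, port = spec.rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected host:port, got {spec!r}")
    return host, int(port)


def read_urls(lines: Iterable[str]) -> List[str]:
    """Return one URL per line, skipping blank lines (an assumption)."""
    return [line.strip() for line in lines if line.strip()]
```

A script could call `parse_proxy` on its two arguments and pass `sys.stdin` to `read_urls`. Actually fetching through the tunnels needs a SOCKS-capable HTTP client. One option is `requests` with its optional `requests[socks]` extra, using for instance `proxies={"https": "socks5h://localhost:1080"}`. The real tool may well do this differently.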

`comparediffs` creates a subdirectory `pdf/`, in which it stores the downloaded PDFs. It won't try to download the same PDF twice, so if you make changes to `pdfparanoia` you'll
want to clean out some or all of this subdirectory.
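The "never download twice" behaviour amounts to a file-existence check in `pdf/`. Below is a minimal sketch of such a cache. The function name, the injected `fetch` callable and the hash-based file naming are all hypothetical. They are not how `comparediffs` names its files.

```python
import hashlib
import os
from typing import Callable


def cached_download(url: str, source: int, fetch: Callable[[str], bytes],
                    cache_dir: str = "pdf") -> str:
    """Return the path of the cached PDF for (url, source), fetching it only once."""
    os.makedirs(cache_dir, exist_ok=True)
    # Hypothetical naming scheme: a hash of the URL plus the source number,
    # so the copies fetched through the two tunnels don't overwrite each other.
    key = hashlib.sha1(url.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, f"{key}.{source}.pdf")
    if not os.path.exists(path):
        data = fetch(url)  # only reached when no cached copy exists
        with open(path, "wb") as fh:
            fh.write(data)
    return path
```

Deleting a cached file is therefore enough to force a fresh download of that PDF on the next run.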

It's easy to see afterwards which PDFs `pdfparanoia` failed on (that is, where the two scrubbed files still differ), as their scrubbed copies are left behind with suffixes `.1.cleaned.pdf` and `.2.cleaned.pdf`.
When `pdfparanoia` succeeds (or isn't even needed, because the downloaded files were identical), the scrubbed files are removed.
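The keep-on-failure, remove-on-success rule above can be sketched as follows. The scrubber is passed in as a plain `bytes -> bytes` function so the sketch runs without `pdfparanoia`. `check_pair` and `stem` are hypothetical names. Unlike the behaviour described above, this version only writes the cleaned files when they differ, but it leaves the same files behind.

```python
import os
from typing import Callable

Scrubber = Callable[[bytes], bytes]


def check_pair(raw1: bytes, raw2: bytes, scrub: Scrubber, stem: str) -> bool:
    """Compare two downloads of the same PDF; return True if they end up identical."""
    cleaned = [f"{stem}.1.cleaned.pdf", f"{stem}.2.cleaned.pdf"]
    if raw1 == raw2:
        ok = True  # identical downloads: scrubbing isn't even needed
    else:
        out1, out2 = scrub(raw1), scrub(raw2)
        ok = out1 == out2
        if not ok:
            # Scrubbing failed to make them match: keep both for inspection.
            for path, data in zip(cleaned, (out1, out2)):
                with open(path, "wb") as fh:
                    fh.write(data)
    if ok:
        # Success: make sure no cleaned copies (e.g. from an earlier run) remain.
        for path in cleaned:
            if os.path.exists(path):
                os.remove(path)
    return ok
```

With `pdfparanoia` installed, `scrub` might be `lambda data: pdfparanoia.scrub(io.BytesIO(data))`. That assumes `pdfparanoia.scrub` accepts a file-like object and returns bytes, as its own README example suggests.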