pdfparanoia/README.md

65 lines
1.2 KiB
Markdown

# pdfparanoia
pdfparanoia is a PDF watermark removal library for academic papers. Some publishers include private information like institution names, personal names, ip addresses, timestamps and other identifying information in watermarks on each page.
## Installing
Simple.
``` bash
sudo pip install pdfparanoia
```
or,
``` bash
sudo python setup.py install
```
pdfparanoia is written for python2.7+ or python 3.
You will also need to manually install "pdfminer" if you do not use pip to install pdfparanoia.
## Usage
``` python
import pdfparanoia
pdf = pdfparanoia.scrub(open("nmat91417.pdf", "rb"))
with open("output.pdf", "wb") as file_handler:
file_handler.write(pdf)
```
or from the shell,
``` bash
pdfparanoia --verbose input.pdf -o output.pdf
```
and,
``` bash
cat input.pdf | pdfparanoia > output.pdf
```
## Supported
* AIP
* IEEE
* JSTOR
* RSC
* SPIE (sort of)
## Changelog
* 0.0.13 - RSC
* 0.0.12 - SPIE
* 0.0.11 - pdfparanoia command-line interface. Use it by either piping in pdf data, or specifying a path to a pdf in the first argv slot.
* 0.0.10 - JSTOR
* 0.0.9 - AIP: better checks for false-positives; IEEE: remove stdout garbage.
* 0.0.8 - IEEE
## License
BSD.