# pdfparanoia
pdfparanoia is a PDF watermark removal library for academic papers. Some publishers embed private information such as institution names, personal names, IP addresses, timestamps and other identifying details in watermarks on each page.
## Installing
Simple.
``` bash
sudo pip install pdfparanoia
```
or,
``` bash
sudo python setup.py install
```
pdfparanoia is written for Python 2.7+ or Python 3.

If you do not use pip to install pdfparanoia, you will also need to install "pdfminer" manually.
## Usage
``` python
import pdfparanoia

# scrub() is given the watermarked PDF opened in binary mode
pdf = pdfparanoia.scrub(open("nmat91417.pdf", "rb"))

# write the scrubbed PDF data to a new file
with open("output.pdf", "wb") as file_handler:
    file_handler.write(pdf)
```
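
To clean a whole folder of papers, the same `scrub` call can be wrapped in a loop. The following is only a sketch, not part of pdfparanoia: `scrub_directory` and its `scrub` parameter are hypothetical names, and it assumes (as in the example above) that `pdfparanoia.scrub` accepts an open binary file and returns data that can be written in binary mode.

``` python
import os


def scrub_directory(src_dir, dst_dir, scrub=None):
    """Scrub every .pdf in src_dir into dst_dir and return the written paths.

    Hypothetical helper: `scrub` defaults to pdfparanoia.scrub, but any
    callable taking an open binary file and returning bytes can be passed.
    """
    if scrub is None:
        # Imported lazily so the helper can be tried with a stand-in callable.
        import pdfparanoia
        scrub = pdfparanoia.scrub

    os.makedirs(dst_dir, exist_ok=True)
    written = []
    for name in sorted(os.listdir(src_dir)):
        if not name.lower().endswith(".pdf"):
            continue  # skip anything that is not a PDF
        with open(os.path.join(src_dir, name), "rb") as source:
            cleaned = scrub(source)
        out_path = os.path.join(dst_dir, name)
        with open(out_path, "wb") as target:
            target.write(cleaned)
        written.append(out_path)
    return written
```

Calling `scrub_directory("papers", "papers_clean")` would write one scrubbed file per input PDF, keeping the original file names and leaving the source folder untouched.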
or from the shell,
``` bash
pdfparanoia --verbose input.pdf -o output.pdf
```
or by piping PDF data through it,
``` bash
cat input.pdf | pdfparanoia > output.pdf
```
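
The piped form can also be driven from another Python program. This is a rough sketch under a few assumptions: it needs Python 3, it assumes the `pdfparanoia` executable is on your `PATH`, and `scrub_via_cli` with its `command` parameter is a hypothetical helper, not something pdfparanoia provides.

``` python
import subprocess


def scrub_via_cli(pdf_bytes, command=("pdfparanoia",)):
    """Pipe PDF bytes through a command and return what it writes to stdout.

    Hypothetical helper mirroring `cat input.pdf | pdfparanoia > output.pdf`.
    `command` is a parameter only so a different executable can be swapped in.
    """
    result = subprocess.run(
        list(command),
        input=pdf_bytes,         # sent to the command's stdin
        stdout=subprocess.PIPE,  # capture the scrubbed PDF
        check=True,              # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

A caller would read `input.pdf` in binary mode, pass the bytes to `scrub_via_cli`, and write the returned bytes to `output.pdf`.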
## Supported
* AIP
* IEEE
* JSTOR
* RSC
* SPIE (sort of)
## Changelog
* 0.0.13 - RSC
* 0.0.12 - SPIE
* 0.0.11 - pdfparanoia command-line interface. Use it by either piping in PDF data or passing the path to a PDF as the first argument.
* 0.0.10 - JSTOR
* 0.0.9 - AIP: better checks for false-positives; IEEE: remove stdout garbage.
* 0.0.8 - IEEE
## License
BSD.