# pdfparanoia
pdfparanoia is a PDF watermark removal library for academic papers. Some
publishers include private information such as institution names, personal names,
IP addresses, timestamps and other identifying information in watermarks on
each page.
## Installing
Simple.
``` bash
sudo pip install pdfparanoia
```
or,
``` bash
sudo python setup.py install
```
pdfparanoia is written for Python 2.7+ or Python 3.
You will also need to manually install "pdfminer" if you do not use pip to install pdfparanoia.
For Python versions prior to Python 3, use "pdfminer" from the Python Package Index (http://pypi.python.org). For Python 3, use pdfminer3k instead.
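When installing dependencies by hand, the choice between the two pdfminer distributions depends only on the major Python version. A minimal sketch of that rule, using a hypothetical helper `pdfminer_package_name` that is not part of pdfparanoia:

``` python
import sys


def pdfminer_package_name(version_info=sys.version_info):
    """Return the PyPI name of the pdfminer distribution to install.

    Hypothetical helper (not part of pdfparanoia): Python 2 uses
    "pdfminer", Python 3 uses "pdfminer3k".
    """
    major = version_info[0]
    return "pdfminer" if major < 3 else "pdfminer3k"


if __name__ == "__main__":
    # Print the pip command matching the running interpreter.
    print("pip install " + pdfminer_package_name())
```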
## Usage
``` python
import pdfparanoia

# Read the watermarked PDF and scrub it; scrub() takes a binary file object.
with open("nmat91417.pdf", "rb") as input_file:
    pdf = pdfparanoia.scrub(input_file)

# Write the cleaned PDF to disk.
with open("output.pdf", "wb") as file_handler:
    file_handler.write(pdf)
```
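To clean a whole folder of papers, the same `scrub` call can be applied file by file. The sketch below assumes, as in the example above, that `scrub` takes a binary file object and returns the cleaned PDF data. `scrub_directory` and its parameters are hypothetical, and `scrub` is passed in as an argument so the helper can be exercised without pdfparanoia installed.

``` python
import os


def scrub_directory(src_dir, dst_dir, scrub):
    """Scrub every .pdf file in src_dir and write the result to dst_dir.

    Hypothetical helper: `scrub` is assumed to behave like
    pdfparanoia.scrub (binary file object in, PDF bytes out).
    Returns the list of output paths that were written.
    """
    if not os.path.isdir(dst_dir):
        os.makedirs(dst_dir)
    written = []
    for name in sorted(os.listdir(src_dir)):
        if not name.lower().endswith(".pdf"):
            continue  # skip anything that is not a PDF
        src_path = os.path.join(src_dir, name)
        dst_path = os.path.join(dst_dir, name)
        with open(src_path, "rb") as input_file:
            cleaned = scrub(input_file)
        with open(dst_path, "wb") as output_file:
            output_file.write(cleaned)
        written.append(dst_path)
    return written
```

For example, `scrub_directory("papers", "papers-clean", pdfparanoia.scrub)` would write cleaned copies of every PDF in `papers` to `papers-clean`.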
or from the shell,
``` bash
pdfparanoia --verbose input.pdf -o output.pdf
```
and,
``` bash
cat input.pdf | pdfparanoia > output.pdf
```
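The two shell forms differ only in where the PDF comes from: a path on the command line, or data piped to standard input. A minimal sketch of that dispatch, using a hypothetical `open_pdf_input` helper (this is not pdfparanoia's actual command-line code, and it ignores options such as `--verbose` and `-o`):

``` python
import sys


def open_pdf_input(paths, stdin=None):
    """Return a binary file object holding the input PDF.

    Hypothetical helper: if a path was given, open that file; otherwise
    fall back to standard input, as in `cat input.pdf | pdfparanoia`.
    """
    if paths:
        return open(paths[0], "rb")
    if stdin is None:
        # Python 3 exposes raw bytes via sys.stdin.buffer; on Python 2,
        # sys.stdin already yields bytes.
        stdin = getattr(sys.stdin, "buffer", sys.stdin)
    return stdin
```

The returned object can then be handed to `pdfparanoia.scrub` exactly as in the Python example.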
## Supported
* AIP
* IEEE
* JSTOR
* RSC
* SPIE (sort of)
## Changelog
* 0.0.13 - RSC
* 0.0.12 - SPIE
* 0.0.11 - pdfparanoia command-line interface. Use it either by piping in PDF data or by specifying the path to a PDF in the first argv slot.
* 0.0.10 - JSTOR
* 0.0.9 - AIP: better checks for false-positives; IEEE: remove stdout garbage.
* 0.0.8 - IEEE
## License
BSD.