@wecacuee
Last active August 26, 2024 07:42
Anonymize pdf annotations and metadata
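#!/usr/bin/env python
"""Blank the metadata and annotation author names of a PDF."""
import sys

try:
    import fitz  # import name of the PyMuPDF package
except ImportError:
    import subprocess
    print('Trying to install PyMuPDF with pip')
    subprocess.call([sys.executable, '-m', 'pip', 'install', 'PyMuPDF'])
    print('If the import still fails, run: pip install PyMuPDF')
    raise  # rerun the script once the package is installed

if len(sys.argv) < 2:
    raise ValueError("Please provide a pdf to anonymize")
# Read the input name first: the default output name is derived from it.
filename = sys.argv[1]
if len(sys.argv) < 3:
    outfilename = filename.replace('.pdf', '.anon.pdf')
else:
    outfilename = sys.argv[2]

doc = fitz.open(filename)

# Blank every metadata field except the ones listed here.
metadata = doc.metadata
for k, v in metadata.items():
    if k not in ['format']:  # retain some metadata
        metadata[k] = ''
doc.set_metadata(metadata)

# The annotation 'title' field holds the author name shown by PDF viewers.
for page in doc:
    for annot in page.annots():
        info = annot.info
        info['title'] = ''
        annot.set_info(info)
        annot.update()

doc.save(outfilename)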
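
The script builds its default output name with `filename.replace('.pdf', '.anon.pdf')`. That call rewrites every `.pdf` occurrence in the path and leaves names such as `scan.PDF` unchanged, so the output name would then equal the input name. A minimal sketch of a stricter alternative follows. The helper name `default_output_name` is hypothetical and not part of the gist.

```python
from pathlib import Path


def default_output_name(filename: str) -> str:
    """Return '<stem>.anon<suffix>' in the same directory (hypothetical helper)."""
    path = Path(filename)
    # with_name swaps only the final path component, so a directory whose
    # name happens to contain ".pdf" is left untouched.
    return str(path.with_name(f"{path.stem}.anon{path.suffix}"))
```

With this helper, `outfilename = default_output_name(filename)` could stand in for the `replace` call. Whether an uppercase extension should be kept as-is is a design choice; this sketch keeps it.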
@jellepoland

On Linux (Fedora), the script does not work as expected.

  • Both the frontend and fitz packages had to be installed manually.
  • The code does remove the names from annotations made inside a PDF document viewer.
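
For install problems like the one reported above, a quick check of which modules Python can actually find may help before running the script. This is a minimal sketch using only the standard library. The helper name `missing_modules` is hypothetical, and it assumes top-level module names only (no dotted names).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that Python cannot find for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "fitz" is the import name the script uses; PyMuPDF is the pip package.
    print(missing_modules(["fitz"]))
```

An empty list means the import should at least be found. It does not confirm that the module found is the one PyMuPDF provides.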
