  • Save januz/01e46431b94dee83c977 to your computer and use it in GitHub Desktop.
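#!/usr/bin/env python3
# -*- encoding:utf-8 -*-
"""Print the first DOI found on the first page of a PDF."""
import re
import sys

# Requires PyPDF2 >= 2.0; older releases spell these PdfFileReader/getPage/extractText.
from PyPDF2 import PdfReader

pdf_file = sys.argv[1]
# Raw string with an escaped dot; a match stops at whitespace, quotes, angle brackets, or parentheses.
doi_re = re.compile(r'10\.\d+/[^\s"<>()]+')
reader = PdfReader(pdf_file)
text = reader.pages[0].extract_text() or ""
m = doi_re.search(text)
if m is None:
    sys.exit("No DOI found on the first page of " + pdf_file)
print(m.group(0))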

lqcata commented Apr 16, 2019

Hi, thanks for sharing the code.
Unfortunately, this method does not work cleanly for PDF files downloaded from the journal Acc. Chem. Res. (https://pubs.acs.org/journal/achre4): the journal abbreviation ends up appended to the extracted DOI.

See:
qli@lhz:/Desktop/pdf_python$ python test.py acs.accounts.8b00502.pdf
10.1021/acs.accounts.8b00502Acc.Chem.Res.
qli@lhz:/Desktop/pdf_python$
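
A possible workaround for output like the above, where the extracted text has no whitespace between the DOI and the journal abbreviation, is to trim the match at the point where the DOI appears to end. The sketch below is only a heuristic and not part of the original script: the helper `find_doi` and the pattern `SEAM_RE` are hypothetical names, and the assumption that a digit directly followed by a capitalized word marks the end of the DOI may wrongly truncate DOIs that legitimately contain such a sequence.

```python
import re

# Same character exclusions as the script's pattern: stop at whitespace,
# quotes, angle brackets, and parentheses.
DOI_RE = re.compile(r'10\.\d+/[^\s"<>()]+')

# Assumption (heuristic, not a DOI rule): a digit immediately followed by
# an uppercase letter and then a lowercase letter, as in "...00502Acc.",
# is the seam between the DOI and unrelated running text.
SEAM_RE = re.compile(r'(?<=\d)(?=[A-Z][a-z])')


def find_doi(text):
    """Return the first DOI-like string in text, trimmed at the seam, or None."""
    m = DOI_RE.search(text)
    if m is None:
        return None
    candidate = m.group(0)
    seam = SEAM_RE.search(candidate)
    # Keep everything before the seam; if no seam is found, keep the whole match.
    return candidate[:seam.start()] if seam else candidate


if __name__ == "__main__":
    print(find_doi("10.1021/acs.accounts.8b00502Acc.Chem.Res."))
```

In the script, this would replace the direct `search` call on the extracted page text. For publishers whose DOIs do not end in a digit, a different cut-off rule would be needed.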
