@asadamatic
Created March 19, 2020 11:49
Python program to extract all the text from a PDF file and save it as a text file. It uses the PyPDF2 library.
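# Note: PdfFileReader, numPages, getPage and extractText are the legacy PyPDF2 names (pre-3.0)
from PyPDF2 import PdfFileReader

fileName = ''  # Enter the name of the PDF file here, without the .pdf extension

with open(fileName + '.pdf', 'rb') as pdf:
    pdfReader = PdfFileReader(pdf)
    totalPages = pdfReader.numPages
    # Open the output file once so every page is written, not just the last one
    with open(fileName + '.txt', 'w', encoding='utf-8') as textFile:
        for page in range(totalPages):
            # Extract the text of the current page and write it out
            text = pdfReader.getPage(page).extractText()
            textFile.write(text + '\n')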
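The legacy PyPDF2 names used in this gist were removed in PyPDF2 3.0, so the script may fail on a recent install. Below is a minimal sketch of the same idea using the successor package `pypdf`, assuming it is installed (`pip install pypdf`). The function names `join_pages` and `pdf_to_text` are hypothetical helpers introduced for illustration and are not part of the original gist.

```python
from pathlib import Path


def join_pages(page_texts):
    """Join per-page strings with newlines, treating None as an empty page."""
    return "\n".join(text or "" for text in page_texts)


def pdf_to_text(pdf_path, txt_path=None):
    """Extract the text of every page in pdf_path and write it to a .txt file."""
    # Imported here so join_pages can be used without pypdf installed.
    from pypdf import PdfReader  # assumption: the pypdf package is available

    pdf_path = Path(pdf_path)
    # Default output: same name as the PDF, with a .txt extension.
    txt_path = Path(txt_path) if txt_path else pdf_path.with_suffix(".txt")

    reader = PdfReader(str(pdf_path))
    text = join_pages(page.extract_text() for page in reader.pages)

    txt_path.write_text(text, encoding="utf-8")
    return txt_path
```

Calling `pdf_to_text('report.pdf')` (with a hypothetical `report.pdf` in the working directory) would write `report.txt` next to it. Collecting all pages before writing keeps the output file from being overwritten page by page. Scanned or image-only pages typically yield little or no text, since no OCR is performed.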