Skip to content

Instantly share code, notes, and snippets.

@mhbeals
Last active April 18, 2018 00:55
Show Gist options
  • Save mhbeals/7d64a32a6e1ce7ebf33b9e26011b8f3f to your computer and use it in GitHub Desktop.
Save mhbeals/7d64a32a6e1ce7ebf33b9e26011b8f3f to your computer and use it in GitHub Desktop.
A script to download all (as of April 2018) newspaper articles from Papers Past (NZ)
import requests
number = 1
# Note, this will take a long time!
while number < 1318493:
# Make sure to change your API key at the end of the URL
urltext = "http://api.digitalnz.org/v3/records.xml?api_key=################&and[collection][]=Papers+Past&sort=date&text=+&and[category][]=Newspapers&direction=asc&page=" + str(number)
response = requests.get(urltext)
newtext = response.text
data = newtext.encode('ascii', 'ignore').decode('ascii')
with open('ppnewspapers\\' + str(number) + '.xml', 'w') as f:
f.write(data)
print(str((number*20)-19) + "-" + str(number*20) + " out of " + str(26369857) + " collected\n")
number = number+1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment