@JoeThunyathep
Last active August 2, 2020 04:29
Python Script to Download Springer Textbooks
import os
import requests
import wget
import pandas as pd

df = pd.read_excel("Free+English+textbooks.xlsx")
# make sure the target folder exists before anything is written to it
os.makedirs("./download", exist_ok=True)
for index, row in df.iterrows():
    # loop through the Excel list
    file_name = f"{row.loc['Book Title']}_{row.loc['Edition']}".replace('/','-').replace(':','-')
    url = f"{row.loc['OpenURL']}"
    # follow the redirect to the book's landing page
    r = requests.get(url)
    # derive the direct PDF link from the landing-page URL
    download_url = f"{r.url.replace('book','content/pdf')}.pdf"
    wget.download(download_url, f"./download/{file_name}.pdf")
    print(f"downloading {file_name}.pdf Complete ....")
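
A note on the URL rewrite in the script: str.replace('book', 'content/pdf') rewrites every occurrence of "book" in the resolved URL, not only the path segment. The following is a minimal sketch of a narrower rewrite, assuming the resolved URL has the form https://link.springer.com/book/<DOI>. The helper name book_url_to_pdf_url and the example URL are hypothetical and are not part of the original gist.

```python
def book_url_to_pdf_url(book_url: str) -> str:
    """Turn a resolved book landing-page URL into a direct PDF URL.

    Only the first '/book/' path segment is rewritten, so a host or DOI
    that happens to contain the letters 'book' is left untouched.
    """
    return book_url.replace("/book/", "/content/pdf/", 1) + ".pdf"


# Placeholder URL for illustration only; it is not taken from the spreadsheet.
example = "https://link.springer.com/book/10.1007/000-0-000-00000-0"
print(book_url_to_pdf_url(example))
```

In the script, this would take the place of the r.url.replace(...) expression; whether the server then actually returns a PDF is a separate question.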

Lech69 commented May 3, 2020 via email


HAKO411 commented May 7, 2020

Hi, thank you for explaining your code. But I want to ask: how can we get the "Free+English+textbooks.xlsx" file?


Lech69 commented May 7, 2020 via email


HAKO411 commented May 7, 2020

I found this:
"Check this complete list of books from Springer: here."
But when I click the link behind "here", nothing appears as expected. The only thing I see is this message:
"{"projectVersion":"2.245.0-54832206c2d4e90345d71a4427e7542e623e43bf-2020-04-23_08:48:48.0010-local-1","requestUri":"https://resource-cms.springernature.com/springer-cms/rest/v1/content/17858272/data/v4","message":"com.springer.cms.service.ContentNotFoundException: No Content found for Version: 4 with Content-Id: coremedia:///cap/content/17858272","responseCode":404}"


Lech69 commented May 7, 2020 via email


HAKO411 commented May 7, 2020

I tried with Safari and Chrome but got the same result. I hope you can check it again when you have time.

@JoeThunyathep
Author


Lech69 commented May 7, 2020 via email


Lech69 commented May 7, 2020 via email


pristanna commented May 19, 2020

Hi Joe, thanks for the code. I tried it and it just downloaded all the titles as 13 KB PDFs that won’t open. If I download a file manually, it works. If I download it using wget in bash, it also works. Is there an explanation for why it does not work with Python's wget.download? Thanks a lot in advance.
The small "PDF" file is actually this text:

{Skip to main content}
This service is more advanced with JavaScript available, learn more at {http://activatejavascript.org}
{[SpringerLink] }
Search SpringerLink
{Search }

  • {Home }
  • {Log in }
    You're almost there...
    Over 10 million scientific documents at your fingertips
  • {Home}
  • {Impressum}
  • {Legal information}
  • {Privacy statement}
  • {How we use cookies}
  • {Cookie settings}
  • {Accessibility}
  • {Contact us}
    {Springer Nature }
    © 2020 Springer Nature Switzerland AG. Part of {Springer Nature}.
    Not logged in Not affiliated
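
The text above shows that the saved file is an HTML page rather than a PDF. A minimal sketch of how the script could detect this before writing a file, assuming the response body is already in memory as bytes (for example from requests.get(download_url).content). The helper names looks_like_pdf and save_if_pdf are hypothetical. This only detects the problem; it does not explain or work around why the server returns the HTML page.

```python
from pathlib import Path

# Every PDF file begins with this signature.
PDF_SIGNATURE = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the payload starts with the PDF file signature."""
    return data.startswith(PDF_SIGNATURE)


def save_if_pdf(data: bytes, dest_path: str) -> bool:
    """Write data to dest_path only when it is a PDF.

    Returns False, and writes nothing, when the payload is something else,
    such as an HTML login or landing page.
    """
    if not looks_like_pdf(data):
        return False
    Path(dest_path).write_bytes(data)
    return True
```

With a check like this, the loop could report which titles came back as HTML instead of leaving small unreadable files in ./download.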


0xOneBeing commented May 21, 2020

This is a great project.
But after running it, I get this error:

image

Here is the code setup in my Sublime Text 3 editor:

image

I installed all the necessary packages beforehand using the pip install ... command. Please, what could be wrong?

UPDATE
I found out that there is a /break character in some cells in the Edition column of Free+English+textbooks.xlsx.
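
Stray characters in the Edition column can be handled when the file name is built, instead of editing the spreadsheet. A minimal sketch, assuming the offending characters are line breaks or path separators. The helper name safe_file_name is hypothetical, and it also replaces backslashes, which the original script does not do.

```python
import re


def safe_file_name(title: str, edition: str) -> str:
    """Build a file name from title and edition without unsafe characters."""
    raw = f"{title}_{edition}"
    # Replace path separators and colons with '-', as the original script
    # does for '/' and ':' (the backslash is an addition in this sketch).
    cleaned = re.sub(r"[/\\:]", "-", raw)
    # Collapse any run of whitespace, including line breaks, into one space.
    cleaned = re.sub(r"\s+", " ", cleaned)
    return cleaned.strip()


print(safe_file_name("A/B: C", "1st ed.\n2020"))
```

In the script, file_name = safe_file_name(str(row.loc['Book Title']), str(row.loc['Edition'])) would replace the chained .replace() calls.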

The problem now is that it's downloading all files as <100 KB PDFs (which is unusual).

image

Please, what could be wrong this time?


beardedherring commented Aug 2, 2020

Please, what could be wrong this time?

reCAPTCHA, unfortunately :(
