@rbf
Created August 17, 2017 04:58
Batch-download the country information sheets (Fichas País) published on the Spanish Ministry of Foreign Affairs website, and list each PDF's last-update date.
#!/bin/bash
# SOURCE: http://www.exteriores.gob.es/Portal/es/SalaDePrensa/Paginas/FichasPais.aspx
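# Fail fast: exit on the first error (-e), also when any stage of a pipeline
# fails (-o pipefail), and trace each command to stderr (-x)
set -exo pipefail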
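# Download the PDFs: split the page's HTML on double quotes so each attribute
# value (e.g. an href) sits on its own line, keep http links containing ".pdf"
# (dot matched literally), drop duplicates, and fetch them 4 at a time, saving
# each under its remote file name (-O). -f makes curl fail on HTTP errors
# instead of saving the server's error page as a .pdf
curl -sSL 'http://www.exteriores.gob.es/Portal/es/SalaDePrensa/Paginas/FichasPais.aspx' \
  | tr '"' '\n' \
  | grep http \
  | grep -F '.pdf' \
  | sort -u \
  | xargs -n1 -P4 curl -fsSLO --compressed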
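# Only print URLs and their last update date: pair each PDF URL with the first
# dd/mm/yyyy date that follows it in the page. Portable replacements for the
# \d and non-greedy .*? patterns, which POSIX grep -E does not support
# ("Última modificación" is Spanish for "last modified")
curl -sSL 'http://www.exteriores.gob.es/Portal/es/SalaDePrensa/Paginas/FichasPais.aspx' \
  | grep -v marcadoresJSON \
  | tr '"><' '\n' \
  | grep -Ei 'pais\.pdf|[0-9]{2}/[0-9]{2}/20[0-9]{2}' \
  | awk '
      # Remember the latest PDF URL, trimmed to start at "www"
      tolower($0) ~ /www.*pais\.pdf/ { match(tolower($0), /www/); url = substr($0, RSTART); next }
      # Print it with the first date that follows, then wait for the next URL
      url != "" && match($0, /[0-9][0-9]\/[0-9][0-9]\/20[0-9][0-9]/) {
        print url " Última modificación => " substr($0, RSTART, RLENGTH)
        url = ""
      }'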
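The link-extraction half of the download pipeline can be tried offline by feeding it a local HTML snippet instead of the live page. This is a minimal sketch, not part of the original gist: the helper name `extract_pdf_links` and the `example.org` sample page are hypothetical, and it uses `grep -F '.pdf'` so the dot is matched literally.

```shell
#!/usr/bin/env bash
set -o pipefail

# Hypothetical helper: print the unique absolute .pdf links in the HTML read
# from stdin. Splitting on double quotes puts every attribute value (such as an
# href) on its own line, which is the idea the gist's download pipeline uses.
extract_pdf_links() {
  tr '"' '\n' | grep -F 'http' | grep -F '.pdf' | sort -u
}

# Made-up sample page standing in for FichasPais.aspx (not real data).
sample_html='<a href="http://example.org/fichas/ARGENTINA_FICHA_PAIS.pdf">Argentina</a>
<a href="http://example.org/fichas/ALEMANIA_FICHA_PAIS.pdf">Alemania</a>
<a href="http://example.org/fichas/ALEMANIA_FICHA_PAIS.pdf"><img src="pdf.png"></a>
<a href="/fichas/relative_only.pdf">relative link, skipped</a>
<a href="http://example.org/index.html">not a PDF, skipped</a>'

links=$(printf '%s\n' "$sample_html" | extract_pdf_links)
printf '%s\n' "$links"

# Downloading them 4 at a time, as the gist does, would then be:
#   printf '%s\n' "$links" | xargs -n1 -P4 curl -fsSLO --compressed
```

Keeping the extraction in a function makes it easy to check against a saved copy of the page before launching any parallel downloads; the duplicate ALEMANIA link collapses to one line thanks to `sort -u`.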
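The `pipefail` option in the script's `set` line is what lets the fail-fast header cover the `curl | tr | grep ...` pipelines. This short, self-contained demonstration (independent of the gist's URLs) shows how a pipeline's exit status changes with and without it:

```shell
#!/usr/bin/env bash
# Without pipefail, a pipeline's status is that of its LAST command,
# so a failure in an earlier stage goes unnoticed.
status_without=0
false | true || status_without=$?

# With pipefail, the status is that of the rightmost command that failed.
set -o pipefail
status_with=0
false | true || status_with=$?
set +o pipefail

echo "without pipefail: $status_without, with pipefail: $status_with"
# prints: without pipefail: 0, with pipefail: 1
```

Combined with `-e`, that non-zero status aborts the script once the pipeline finishes, so a failed `curl` at the head of a pipe is not silently masked by the commands after it.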