@izmailoff
Created February 4, 2016 06:38
Generates Unix commands (wget, youtube-dl) to download the files linked from a web page.
#!/bin/sh
exec scala "$0" "$@"
!#
import scala.io.Source

// Page to scan for download links.
val url = "https://work.caltech.edu/lectures.html"
val html = Source.fromURL(url).mkString
// Matches only anchors whose sole attribute is a double-quoted href.
val href = """<a href="\s*(.+?)\s*">""".r
val links = href.findAllMatchIn(html).map(_.group(1)).toList
// Sets drop duplicates; the resulting output order is arbitrary.
val slides = links.filter(_.endsWith("pdf")).toSet
// Strip the "#fragment" from YouTube URLs before deduplicating.
val videos = links.filter(_.contains("youtube.com")).map(_.takeWhile(_ != '#')).toSet
slides.foreach(x => println(s"wget '$x'"))
videos.foreach(x => println(s"youtube-dl '$x'"))

Prints something like this:

wget 'http://work.caltech.edu/slides/slides07.pdf'
wget 'http://work.caltech.edu/slides/slides09.pdf'
wget 'http://work.caltech.edu/slides/slides16.pdf'
wget 'http://work.caltech.edu/slides/slides12.pdf'
wget 'http://work.caltech.edu/slides/slides06.pdf'
wget 'http://work.caltech.edu/slides/slides02.pdf'
wget 'http://work.caltech.edu/slides/slides01.pdf'
wget 'http://work.caltech.edu/slides/slides11.pdf'
wget 'http://work.caltech.edu/slides/slides17.pdf'
wget 'http://work.caltech.edu/slides/slides13.pdf'
wget 'http://work.caltech.edu/slides/slides14.pdf'
wget 'http://work.caltech.edu/slides/slides15.pdf'
wget 'http://work.caltech.edu/slides/slides08.pdf'
wget 'http://work.caltech.edu/slides/slides18.pdf'
wget 'http://work.caltech.edu/slides/slides05.pdf'
wget 'http://work.caltech.edu/slides/slides04.pdf'
wget 'http://work.caltech.edu/slides/slides10.pdf'
wget 'http://work.caltech.edu/slides/slides03.pdf'
youtube-dl 'http://www.youtube.com/watch?v=eHsErlPJWUU&hd=1'
youtube-dl 'http://www.youtube.com/watch?v=O8CfrnOPtLc&hd=1'
youtube-dl 'http://www.youtube.com/watch?v=XUj5JbQihlU&hd=1'
youtube-dl 'http://www.youtube.com/watch?v=I-VfYXzC5ro&hd=1'
youtube-dl 'http://www.youtube.com/watch?v=MEG35RDD7RA&hd=1'
youtube-dl 'http://www.youtube.com/watch?v=ihLwJPHkMRY&hd=1'
youtube-dl 'http://www.youtube.com/watch?v=6FWRijsmLtE&hd=1'
youtube-dl 'http://www.youtube.com/playlist?list=PLD63A284B7615313A'
youtube-dl 'http://www.youtube.com/watch?v=zrEyxfl2-a8&hd=1'
youtube-dl 'http://www.youtube.com/watch?v=o7zzaKd0Lkk&hd=1'
youtube-dl 'http://www.youtube.com/watch?v=L_0efNkdGMc&hd=1'
youtube-dl 'http://www.youtube.com/watch?v=mbyG85GZ0PI&hd=1'
youtube-dl 'http://www.youtube.com/watch?v=EZBUDG12Nr0&hd=1'
youtube-dl 'http://www.youtube.com/watch?v=EQWr3GGCdzw&hd=1'
youtube-dl 'http://www.youtube.com/watch?v=FIbVs5GbBlQ&hd=1'
youtube-dl 'http://www.youtube.com/watch?v=SEYAnnLazMU&hd=1'
youtube-dl 'http://www.youtube.com/watch?v=Dc0sr0kdBVI&hd=1'
youtube-dl 'http://www.youtube.com/watch?v=qSTHZvN8hzs&hd=1'
youtube-dl 'http://www.youtube.com/watch?v=Ih5Mr93E-2c&hd=1'
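
The commands above come out in no particular order because the script collects links into a Set, which does not preserve ordering. Below is a minimal sketch of a variant that deduplicates and sorts instead, run against an inline HTML string so it needs no network access. The names `LinkCommands`, `links`, `commands`, and `sample` are hypothetical, and the sample HTML is invented for illustration; it is not taken from the real page.

```scala
object LinkCommands {
  // Same pattern as the script: anchors whose only attribute is a double-quoted href.
  private val Href = """<a href="\s*(.+?)\s*">""".r

  /** Extracts href targets from `html` in document order. */
  def links(html: String): List[String] =
    Href.findAllMatchIn(html).map(_.group(1)).toList

  /** Builds download commands, deduplicated and sorted: slides first, then videos. */
  def commands(html: String): List[String] = {
    val all = links(html)
    val slides = all.filter(_.endsWith("pdf")).distinct.sorted
    val videos = all
      .filter(_.contains("youtube.com"))
      .map(_.takeWhile(_ != '#')) // drop the "#fragment" part
      .distinct
      .sorted
    slides.map(u => s"wget '$u'") ++ videos.map(u => s"youtube-dl '$u'")
  }

  // Hypothetical sample input (assumption): two slides, one video linked twice, one other page.
  val sample: String =
    """<a href="http://example.com/slides/b.pdf">B</a>
      |<a href=" http://example.com/slides/a.pdf ">A</a>
      |<a href="http://www.youtube.com/watch?v=abc#t=10">video</a>
      |<a href="http://www.youtube.com/watch?v=abc#t=99">same video</a>
      |<a href="http://example.com/about.html">about</a>""".stripMargin

  def main(args: Array[String]): Unit =
    commands(sample).foreach(println)
}
```

With the sample input this prints the two `wget` lines (a.pdf, then b.pdf) followed by a single `youtube-dl` line, since both video links collapse to the same URL once the fragment is removed. Sorting makes repeated runs produce identical output, which is convenient when diffing or resuming a download list.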
