@kirushik
Created April 25, 2018 21:13
Fetches all ST:TNG episode summaries from Wikipedia
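#!/usr/bin/env ruby
require 'rubygems'
require 'mechanize'
require 'uri'

a = Mechanize.new
root = URI.parse('https://en.wikipedia.org/wiki/List_of_Star_Trek:_The_Next_Generation_episodes')

# Collect the absolute URL of every episode article linked from the season tables.
urls = []
a.get(root) do |page|
  page.search('.wikiepisodetable').each do |table|
    table.search('.summary a').each do |cell|
      # hrefs on the list page are relative, so resolve them against the root URL.
      urls << root.merge(cell['href'])
    end
  end
end

# Fetch each episode page and write the paragraphs of its "Plot" section to episodes.txt.
File.open('episodes.txt', 'w') do |file|
  urls.lazy.map { |url| a.get(url) }.each do |page|
    page.search('.mw-parser-output').first
        .search("//span[@id='Plot']/parent::h2/following-sibling::*")
        .take_while { |n| n.name == 'p' }.each do |par|
      # Only the run of <p> elements directly after the heading is kept.
      file.puts par.text
    end
  end.to_a
end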
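
The two less obvious steps in the script are resolving relative links with `URI#merge` and keeping only the paragraphs that directly follow the "Plot" heading with `take_while`. Below is a minimal sketch of both, using only the standard library. The `Node` struct, the helper names `absolute_url` and `leading_paragraphs`, and the sample href are assumptions made for illustration; the script itself works on Nokogiri nodes returned by Mechanize.

```ruby
require 'uri'

# Hypothetical stand-in for a parsed HTML node (the script uses Nokogiri nodes).
Node = Struct.new(:name, :text)

ROOT = URI.parse('https://en.wikipedia.org/wiki/List_of_Star_Trek:_The_Next_Generation_episodes')

# Resolve a relative href against the list page, as root.merge does in the script.
def absolute_url(href)
  ROOT.merge(href)
end

# Keep the text of the leading run of <p> nodes and stop at the first non-<p> node.
def leading_paragraphs(siblings)
  siblings.take_while { |n| n.name == 'p' }.map(&:text)
end

# Illustrative href; an absolute path replaces the whole path of ROOT.
puts absolute_url('/wiki/Encounter_at_Farpoint')

# Made-up siblings: two plot paragraphs, then the next section heading.
siblings = [
  Node.new('p', 'First plot paragraph.'),
  Node.new('p', 'Second plot paragraph.'),
  Node.new('h2', 'Production'),
  Node.new('p', 'Not part of the plot.')
]
puts leading_paragraphs(siblings)
```

Because `take_while` stops at the first element that is not a `<p>`, a plot section that is interrupted by an image, list, or subheading would be cut short at that point.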