@ixtgorilla
Last active August 18, 2018 03:57
TwitterProfileCrawler
require 'net/http'
require 'uri'

module Requestor
  module Https
    # Thin wrapper around Net::HTTP for HTTPS GET requests.
    #
    # @param [String] url
    # @param [Hash] attributes request headers to set (header name => value)
    # @return [Net::HTTPResponse]
    def self.execute(url, attributes = {})
      uri = URI.parse(url)
      https = Net::HTTP.new(uri.host, uri.port)
      https.use_ssl = true
      request = Net::HTTP::Get.new(uri.request_uri)
      attributes.each do |key, value|
        request[key] = value
      end
      https.request(request)
    end
  end
end
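
The `attributes` hash in `Requestor::Https.execute` is copied onto the request as HTTP headers. The following is a minimal offline sketch of that step only; `build_get` is a hypothetical helper that does not appear in the gist, and the header value is made up for illustration. It builds the request without sending it, so the resulting headers can be inspected.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not part of the gist): builds the same GET request
# that Requestor::Https.execute builds, but returns it instead of sending it.
def build_get(url, attributes = {})
  uri = URI.parse(url)
  request = Net::HTTP::Get.new(uri.request_uri)
  # Each key/value pair becomes one request header.
  attributes.each { |key, value| request[key] = value }
  request
end

request = build_get('https://twitter.com/ixtgorilla?lang=en',
                    'User-Agent' => 'example-agent')
puts request.path          # path plus query string
puts request['User-Agent'] # header lookup is case-insensitive
```

Header names are matched case-insensitively by `Net::HTTPHeader`, so `request['user-agent']` would return the same value.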
require 'nokogiri'

# Scrapes public profile fields from a Twitter profile page.
class TwitterCrawler
  def initialize(url)
    @url = url
  end

  def execute
    {
      screen_name: screen_name,
      twitter_screen_name: twitter_screen_name,
      icon_image_url_cache: twitter_icon_image_url,
      cover_image_url_cache: twitter_cover_image_url,
      bio: twitter_bio
    }
  end

  private

  def twitter_cover_image_url
    twitter_xpath
      .xpath('//*[@id="page-container"]/div[1]/div/div[1]/div[1]/img')[0]
      .attributes['src']
      .value
  end

  def twitter_icon_image_url
    twitter_xpath
      .xpath('//*[@id="page-container"]/div[1]/div/div[1]/div[2]/div[1]/div/a/img')[0]
      .attributes['src']
      .value
  end

  # Display name shown in the profile header.
  def screen_name
    twitter_xpath
      .xpath('//*[@id="page-container"]/div[2]/div/div/div[1]/div/div/div/div[1]/h1/a')[0]
      .children[0]
      .text
  end

  # Account handle (the part after the @).
  def twitter_screen_name
    twitter_xpath
      .xpath('//*[@id="page-container"]/div[2]/div/div/div[1]/div/div/div/div[1]/h2/a/span/b')[0]
      .children[0]
      .text
  end

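  # Flattens the bio paragraph to plain text. Emoji are rendered as <img>
  # tags, so their alt text is used in place of the image.
  def twitter_bio
    bio_tags = twitter_xpath
               .xpath('//*[@id="page-container"]/div[2]/div/div/div[1]/div/div/div/div[1]/p')[0]
               .children
    bio_tags.map do |elem|
      if elem.name == 'img'
        elem['alt'].to_s
      else
        elem.text
      end
    end.join
  end
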
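  # Parsed document, memoized so the page is parsed only once.
  def twitter_xpath
    @twitter_xpath ||= Nokogiri::HTML.parse(twitter_html)
  end

  # Raw HTML of the profile page, memoized so it is fetched only once.
  def twitter_html
    @twitter_html ||= Requestor::Https.execute(@url).body
  end
end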
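
The bio flattening in `twitter_bio` can be tried without Nokogiri or a network call. The sketch below is only an illustration under stated assumptions: `BioNode` and `flatten_bio` are hypothetical stand-ins that do not appear in the gist, and the sample nodes are made up. `BioNode` mimics the two things the method reads from each child node (its tag name and either its text or its `alt` value).

```ruby
# Hypothetical stand-in for a parsed child node of the bio paragraph.
BioNode = Struct.new(:name, :text, :alt)

# Same map-and-join logic as twitter_bio: use the alt text for <img> nodes
# (emoji) and the plain text for every other node.
def flatten_bio(nodes)
  nodes.map do |node|
    node.name == 'img' ? node.alt.to_s : node.text.to_s
  end.join
end

nodes = [
  BioNode.new('text', 'Ruby ', nil),
  BioNode.new('img', '', '🦍'),
  BioNode.new('a', ' @ixtgorilla', nil)
]

puts flatten_bio(nodes)
```

Calling `to_s` on both branches keeps a missing `alt` attribute from raising; the node simply contributes an empty string.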