@YanhaoYang
Created May 27, 2019 14:16
Create a crawler with Capybara
require 'capybara'
require 'capybara/dsl'
require 'selenium-webdriver'
# For the remote mode below, start a Selenium standalone server first, e.g.:
#   java -jar selenium-server-standalone-3.14.0.jar
if ENV["SELENIUM_REMOTE_HOST"]
  # Remote mode: drive Chrome through the Selenium server on SELENIUM_REMOTE_HOST.
  Capybara.register_driver :selenium_chrome_remote do |app|
    args = [
      "--disable-infobars",
      "--window-size=1024,768",
    ]
    prefs = {download: {default_directory: "~/Downloads", prompt_for_download: false}}
    caps = Selenium::WebDriver::Remote::Capabilities.chrome("chromeOptions" => { "args" => args, "prefs" => prefs })
    Capybara::Selenium::Driver.new(
      app,
      browser: :remote,
      url: "http://#{ENV["SELENIUM_REMOTE_HOST"]}:4444/wd/hub",
      desired_capabilities: caps
    )
  end
  puts "Using remote Chrome from #{ENV["SELENIUM_REMOTE_HOST"]}..."
else
  # Local mode: headless Chrome. The driver is registered under the same name
  # as the remote one, so the rest of the script selects it the same way.
  Capybara.register_driver :selenium_chrome_remote do |app|
    Capybara::Selenium::Driver.load_selenium
    browser_options = ::Selenium::WebDriver::Chrome::Options.new
    browser_options.args << "--headless"
    browser_options.args << "--no-sandbox"
    browser_options.args << "--window-size=1024,768"
    browser_options.args << "--disable-gpu" if Gem.win_platform?
    # Allow slow pages up to 120 seconds before the HTTP client gives up.
    client = Selenium::WebDriver::Remote::Http::Default.new
    client.read_timeout = 120
    options = {
      browser: :chrome,
      http_client: client,
      options: browser_options
    }
    Capybara::Selenium::Driver.new(app, options)
  end
end
Capybara.run_server = false
Capybara.current_driver = :selenium_chrome_remote
Capybara.app_host = "https://google.com"
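class Crawler
  include Capybara::DSL

  # Visits the root of Capybara.app_host and returns the page's HTML.
  # NOTE: params is accepted but not used yet.
  def query(params)
    visit("/")
    page.html
  end
end

crawler = Crawler.new
html = crawler.query({})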
# Notes:
# List the cookies of the current browser session with:
#   Capybara.current_session.driver.browser.manage.all_cookies
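
The cookie note above can be taken one step further. This is a minimal sketch, assuming `all_cookies` returns an array of hashes with `:name` and `:value` keys; `cookie_header` is a hypothetical helper, not part of Capybara, selenium-webdriver, or the original gist, and `sample` is stand-in data rather than real browser output.

```ruby
# Hypothetical helper (not part of Capybara or selenium-webdriver).
# Joins cookie hashes into a value suitable for a "Cookie" request header.
def cookie_header(cookies)
  cookies.map { |c| "#{c[:name]}=#{c[:value]}" }.join("; ")
end

# Stand-in data shaped like the hashes assumed to come from all_cookies.
sample = [
  {name: "session", value: "abc123", path: "/", secure: true},
  {name: "lang", value: "en", path: "/", secure: false},
]

puts cookie_header(sample)
```

With a live session, the argument would be `Capybara.current_session.driver.browser.manage.all_cookies`, and the resulting string could be handed to a plain HTTP client to reuse the browser's session.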