jamezpolley / example_ruby_chrome_headless_scraper

Example scraper showing how to use Chrome headless from a Ruby scraper


This is a simple scraper showing you how to use Chrome headless with Ruby. Here's what it does:

  1. Visits the morph.io home page
  2. Clicks the button that opens the navigation menu. The headless browser runs in a small window, so the search box is hidden until the menu is open
  3. It enters a search for "planningalerts" into the search box and clicks the submit button
  4. After waiting for the results to appear (see the important gotcha in the code comments!) it outputs the full names of all the scrapers on the search results page
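The four steps above can be sketched roughly as follows. This is a minimal sketch, not the contents of the real scraper.rb: the method names build_session and scraper_names are made up for illustration, and every CSS selector and the "q" field name are assumptions about the morph.io markup. The :selenium_chrome_headless driver is one that Capybara registers itself (2.15 and later).

```ruby
# Sketch only. Selectors and the "q" field name are guesses, not taken
# from the real scraper.rb.

# Builds a headless Chrome session. The gems are required lazily so this
# file can be loaded even where they are not installed.
def build_session
  require 'capybara'
  require 'selenium-webdriver'
  Capybara::Session.new(:selenium_chrome_headless)
end

# Runs the search and returns the full names shown on the results page.
# `session` is a Capybara::Session (or anything that responds to the
# same visit/find/fill_in/all calls).
def scraper_names(session, query)
  # Step 1: visit the home page.
  session.visit('https://morph.io/')

  # Step 2: open the collapsed navigation menu (assumed selector).
  session.find('button.navbar-toggle').click

  # Step 3: type the query and submit (assumed field name and selector).
  session.fill_in('q', with: query)
  session.find('button[type="submit"]').click

  # Step 4: `find` blocks until the element appears or
  # Capybara.default_max_wait_time elapses, so it is used here to wait
  # for the results container (assumed selector) before reading names.
  session.find('.search-results')
  session.all('.full_name').map(&:text)
end

# Usage, with the gems and Chrome installed:
#   puts scraper_names(build_session, 'planningalerts')
```

Passing the session in as an argument keeps the page logic separate from browser start-up, which is the part that failed in the run shown below.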

Any questions? Hit up the help forum.

Contributors mlandauer

Last run failed with status code 1.

Console output of last run

Injecting configuration and compiling...
-----> Ruby app detected
-----> Compiling Ruby/Rack
-----> Using Ruby version: ruby-2.5.0
-----> Installing dependencies using bundler version 1.15.2
       Running: bundle install --without development:test --path vendor/bundle --binstubs vendor/bundle/bin -j4 --deployment
       Warning: the running version of Bundler (1.15.2) is older than the version that created the lockfile (1.16.1). We suggest you upgrade to the latest version of Bundler by running `gem install bundler`.
       Fetching gem metadata from https://rubygems.org/........
       Fetching version metadata from https://rubygems.org/.
       Using bundler 1.15.2
       Fetching public_suffix 3.0.2
       Fetching mini_mime 1.0.0
       Fetching mini_portile2 2.3.0
       Installing mini_portile2 2.3.0
       Installing mini_mime 1.0.0
       Installing public_suffix 3.0.2
       Fetching rack 2.0.4
       Fetching ffi 1.9.23
       Fetching rubyzip 1.2.1
       Installing rubyzip 1.2.1
       Installing ffi 1.9.23 with native extensions
       Installing rack 2.0.4
       Fetching nokogiri 1.8.2
       Fetching addressable 2.5.2
       Installing addressable 2.5.2
       Fetching rack-test 1.0.0
       Installing rack-test 1.0.0
       Installing nokogiri 1.8.2 with native extensions
       Fetching childprocess 0.9.0
       Installing childprocess 0.9.0
       Fetching selenium-webdriver 3.11.0
       Installing selenium-webdriver 3.11.0
       Fetching xpath 3.0.0
       Installing xpath 3.0.0
       Fetching capybara 2.18.0
       Installing capybara 2.18.0
       Bundle complete! 2 Gemfile dependencies, 14 gems now installed.
       Gems in the groups development and test were not installed.
       Bundled gems are installed into ./vendor/bundle.
       Bundle completed (25.01s)
       Cleaning up the bundler cache.
       Warning: the running version of Bundler (1.15.2) is older than the version that created the lockfile (1.16.1). We suggest you upgrade to the latest version of Bundler by running `gem install bundler`.
-----> Detecting rake tasks
-----> Discovering process types
       Procfile declares types -> scraper
Injecting scraper and running...
/app/vendor/ruby-2.5.0/lib/ruby/2.5.0/net/protocol.rb:181:in `rbuf_fill': Net::ReadTimeout (Net::ReadTimeout)
        from /app/vendor/ruby-2.5.0/lib/ruby/2.5.0/net/protocol.rb:157:in `readuntil'
        from /app/vendor/ruby-2.5.0/lib/ruby/2.5.0/net/protocol.rb:167:in `readline'
        from /app/vendor/ruby-2.5.0/lib/ruby/2.5.0/net/http/response.rb:40:in `read_status_line'
        from /app/vendor/ruby-2.5.0/lib/ruby/2.5.0/net/http/response.rb:29:in `read_new'
        from /app/vendor/ruby-2.5.0/lib/ruby/2.5.0/net/http.rb:1494:in `block in transport_request'
        from /app/vendor/ruby-2.5.0/lib/ruby/2.5.0/net/http.rb:1491:in `catch'
        from /app/vendor/ruby-2.5.0/lib/ruby/2.5.0/net/http.rb:1491:in `transport_request'
        from /app/vendor/ruby-2.5.0/lib/ruby/2.5.0/net/http.rb:1464:in `request'
        from /app/vendor/ruby-2.5.0/lib/ruby/2.5.0/net/http.rb:1457:in `block in request'
        from /app/vendor/ruby-2.5.0/lib/ruby/2.5.0/net/http.rb:910:in `start'
        from /app/vendor/ruby-2.5.0/lib/ruby/2.5.0/net/http.rb:1455:in `request'
        from /app/vendor/bundle/ruby/2.5.0/gems/selenium-webdriver-3.11.0/lib/selenium/webdriver/remote/http/default.rb:121:in `response_for'
        from /app/vendor/bundle/ruby/2.5.0/gems/selenium-webdriver-3.11.0/lib/selenium/webdriver/remote/http/default.rb:76:in `request'
        from /app/vendor/bundle/ruby/2.5.0/gems/selenium-webdriver-3.11.0/lib/selenium/webdriver/remote/http/common.rb:59:in `call'
        from /app/vendor/bundle/ruby/2.5.0/gems/selenium-webdriver-3.11.0/lib/selenium/webdriver/remote/bridge.rb:164:in `execute'
        from /app/vendor/bundle/ruby/2.5.0/gems/selenium-webdriver-3.11.0/lib/selenium/webdriver/remote/bridge.rb:97:in `create_session'
        from /app/vendor/bundle/ruby/2.5.0/gems/selenium-webdriver-3.11.0/lib/selenium/webdriver/remote/bridge.rb:53:in `handshake'
        from /app/vendor/bundle/ruby/2.5.0/gems/selenium-webdriver-3.11.0/lib/selenium/webdriver/chrome/driver.rb:47:in `initialize'
        from /app/vendor/bundle/ruby/2.5.0/gems/selenium-webdriver-3.11.0/lib/selenium/webdriver/common/driver.rb:44:in `new'
        from /app/vendor/bundle/ruby/2.5.0/gems/selenium-webdriver-3.11.0/lib/selenium/webdriver/common/driver.rb:44:in `for'
        from /app/vendor/bundle/ruby/2.5.0/gems/selenium-webdriver-3.11.0/lib/selenium/webdriver.rb:85:in `for'
        from /app/vendor/bundle/ruby/2.5.0/gems/capybara-2.18.0/lib/capybara/selenium/driver.rb:23:in `browser'
        from /app/vendor/bundle/ruby/2.5.0/gems/capybara-2.18.0/lib/capybara/selenium/driver.rb:49:in `visit'
        from /app/vendor/bundle/ruby/2.5.0/gems/capybara-2.18.0/lib/capybara/session.rb:274:in `visit'
        from scraper.rb:8:in `<main>'
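The trace shows Net::ReadTimeout being raised while selenium-webdriver was creating the Chrome session (create_session), reached from the visit call at scraper.rb:8. The log does not say why the session took too long to start. If the timeout were intermittent, one generic way to cope would be to retry the call a few times. The helper below is a hypothetical sketch of that idea (with_retries is a made-up name, not part of Capybara or selenium-webdriver), and whether a retry would have helped this particular run is unknown.

```ruby
require 'net/protocol' # defines Net::ReadTimeout

# Hypothetical helper: runs the block, retrying when one of the listed
# exceptions is raised, and re-raises once `attempts` tries are used up.
# The block receives the current attempt number, starting at 1.
def with_retries(attempts: 3, wait: 0, on: [Net::ReadTimeout])
  tries = 0
  begin
    tries += 1
    yield tries
  rescue *on
    raise if tries >= attempts # give up: propagate the last error
    sleep(wait)                # optional pause between attempts
    retry
  end
end

# Usage, assuming `session` is an existing Capybara::Session:
#   with_retries(attempts: 3, wait: 5) { session.visit('https://morph.io/') }
```

A retry only masks a slow browser start-up. If the timeout happens on every run, the cause is more likely in the environment, such as Chrome failing to launch, than in the scraper code.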

Statistics

Total run time: 2 minutes

Total cpu time used: less than 5 seconds

Total disk space used: 26.6 KB

History

  • Manually ran revision f65c09f0 and failed.
  • Created on morph.io