You can do this by using a headless browser. On morph.io two different headless browsers come pre-installed and ready to use: Google Chrome and PhantomJS. We recommend Google Chrome, as PhantomJS is now deprecated. However, you will notice that the documentation below for using Google Chrome from the different scraping languages is still pretty sparse. If you would like to contribute to the documentation, that would be amazing!
The Google Chrome binary is installed at /usr/bin/google-chrome on every morph.io container as part of the build process.
You can use Google Chrome directly by running it with google-chrome --headless --disable-gpu, or you can control it via WebDriver with ChromeDriver, which is also installed on morph.io. To control it from Ruby with Capybara, add these gems to your scraper's Gemfile:

```ruby
gem 'capybara'
gem 'selenium-webdriver'
```
Then in your scraper start a Capybara session using the headless Chrome driver:

```ruby
require "capybara"
require "selenium-webdriver"

capybara = Capybara::Session.new(:selenium_chrome_headless)

# Start scraping
capybara.visit("https://morph.io/")
puts capybara.find("#banner h2").text
```
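For quick one-off fetches you don't need WebDriver at all: you can invoke the headless Chrome binary directly and capture the rendered DOM from its standard output. Here is a minimal Python sketch of that approach; the helper names are illustrative, and it assumes the /usr/bin/google-chrome path mentioned above, so it only actually runs Chrome when that binary is present:

```python
import shutil
import subprocess

CHROME = "/usr/bin/google-chrome"  # path on morph.io containers

def chrome_dump_dom_cmd(url, chrome=CHROME):
    # Flags from the docs: --headless renders without a window,
    # --disable-gpu skips GPU initialisation, and --dump-dom prints
    # the rendered DOM to stdout.
    return [chrome, "--headless", "--disable-gpu", "--dump-dom", url]

def dump_dom(url, chrome=CHROME, timeout=30):
    """Return the rendered DOM of a page as a string."""
    result = subprocess.run(
        chrome_dump_dom_cmd(url, chrome),
        capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout

# Only try to launch Chrome if it is actually installed here.
if shutil.which(CHROME):
    print(dump_dom("https://morph.io/")[:200])
```

This is handy for pages that need JavaScript to render but don't require clicking or form filling; for interaction, use WebDriver as above.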
The PhantomJS binary, /usr/bin/phantomjs, is installed on morph.io as part of the phantomjs Ubuntu package, which is included in every scraper container as part of the build process.
On your own machine, you’ll need to download and install the binary yourself before you can run a scraper using PhantomJS.
To install it, add poltergeist to your scraper Gemfile:

```ruby
gem 'poltergeist'
```
Then in your scraper start a Capybara session using Poltergeist:

```ruby
require "capybara/poltergeist"

capybara = Capybara::Session.new(:poltergeist)

# Start scraping
capybara.visit("https://morph.io/")
puts capybara.find("#banner h2").text
```
To install it, add splinter to your requirements.txt:

```
splinter
```
Then in your scraper:

```python
from splinter import Browser

with Browser("phantomjs") as browser:
    # Optional, but make sure it's large enough that responsive pages
    # don't hide elements on you...
    browser.driver.set_window_size(1280, 1024)

    # Open the page you want...
    browser.visit("https://morph.io")

    # Submit the search form...
    browser.fill("q", "parliament")
    button = browser.find_by_css("button[type='submit']")
    button.click()

    # Scrape the data you like...
    links = browser.find_by_css(".search-results .list-group-item")
    for link in links:
        print(link["href"])
```
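Note that hrefs scraped like this may be relative to the page they came from. If you want absolute URLs to store or to feed back into browser.visit, you can resolve them with Python's standard library; a small sketch (the function name and base URL are just for illustration):

```python
from urllib.parse import urljoin

BASE_URL = "https://morph.io"

def absolutise(hrefs, base=BASE_URL):
    """Resolve possibly-relative hrefs against the page's base URL."""
    # urljoin leaves already-absolute URLs untouched.
    return [urljoin(base, href) for href in hrefs]

print(absolutise(["/scrapers", "https://example.com/page"]))
```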
Sometimes there's nothing like seeing a real-life example. If you have a scraper you would like to add to this list, please let us know.