alisonkeen / SA_Parl_sitting_dates

Sitting dates for both houses of the SA Parliament for which Hansard has been published (Legislative Council and House of Assembly)



The beginnings of a Mechanize scraper to read Hansard dates

The source page may need to change, because the official page is generated with JavaScript.

This is a scraper that runs on Morph. To get started, see the documentation.

Contributors alisonkeen

Last run completed successfully.

Console output of last run

Injecting configuration and compiling...
-----> Ruby app detected
-----> Compiling Ruby
-----> Using Ruby version: ruby-2.0.0
-----> Installing dependencies using bundler 1.11.2
       Running: bundle install --without development:test --path vendor/bundle --binstubs vendor/bundle/bin -j4 --deployment
       Fetching gem metadata from
       Fetching version metadata from
       Fetching
       Rubygems is not threadsafe, so your gems will be installed one at a time. Upgrade to Rubygems 2.1.0 or higher to enable parallel gem installation.
       Installing unf_ext with native extensions
       Installing httpclient
       Installing mime-types 2.5
       Installing net-http-digest_auth 1.4
       Installing net-http-persistent 2.9.4
       Installing mini_portile 0.6.2
       Installing ntlm-http 0.1.1
       Installing webrobots 0.1.1
       Installing sqlite3 1.3.10 with native extensions
       Using bundler 1.11.2
       Installing unf 0.1.4
       Installing nokogiri with native extensions
       Installing sqlite_magic 0.0.3
       Installing domain_name 0.5.24
       Using scraperwiki 3.0.1 from (at morph_defaults@fc50176)
       Installing http-cookie 1.0.2
       Installing mechanize 2.7.3
       Bundle complete! 2 Gemfile dependencies, 17 gems now installed.
       Gems in the groups development and test were not installed.
       Bundled gems are installed into ./vendor/bundle.
       Bundle completed (24.33s)
       Cleaning up the bundler cache.
-----> Discovering process types
       Procfile declares types -> scraper
Injecting scraper and running...
<div class="scheduler">
  <div id="hansard-scheduler-decade" class="hansard-scheduler-year" style="display: inline-block;"> </div>
  <div class="yearWrapper">
    <div class="month0"></div>
    <div class="month1"></div>
    <div class="month2"></div>
    <div class="month3"></div>
    <div class="month4"></div>
    <div class="month5"></div>
    <div class="month6"></div>
    <div class="month7"></div>
    <div class="month8"></div>
    <div class="month9"></div>
    <div class="month10"></div>
    <div class="month11"></div>
  </div>
</div>
The problem here is that each month in the calendar has the dates populated dynamically using JavaScript on load. I don't know how to scrape dynamically loaded pages, yet...
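The empty month containers in the output above can be detected in code, so a run can report that the calendar was never filled in instead of silently saving nothing. What follows is only a minimal sketch, not the contents of scraper.rb: the helper names month_divs, empty_months and calendar_unpopulated? are hypothetical, and the markup is rebuilt from the console output rather than fetched. It uses REXML only because this particular snippet happens to be well-formed; a real page would more likely be parsed with Nokogiri, which Mechanize already installs.

```ruby
require 'rexml/document'

# The markup printed by the last run, rebuilt here so the example is
# self-contained: twelve month containers with nothing inside them.
MONTH_DIVS = (0..11).map { |i| %(<div class="month#{i}"></div>) }.join
CALENDAR_HTML = <<~HTML
  <div class="scheduler">
    <div id="hansard-scheduler-decade" class="hansard-scheduler-year" style="display: inline-block;"> </div>
    <div class="yearWrapper">#{MONTH_DIVS}</div>
  </div>
HTML

# Hypothetical helper: the <div> elements directly inside the yearWrapper.
def month_divs(html)
  doc = REXML::Document.new(html)
  REXML::XPath.match(doc, "//div[@class='yearWrapper']/div")
end

# Hypothetical helper: class names of the month containers that hold
# neither child elements nor non-whitespace text.
def empty_months(html)
  month_divs(html)
    .select { |m| !m.has_elements? && m.texts.all? { |t| t.value.strip.empty? } }
    .map { |m| m.attributes['class'] }
end

# Hypothetical helper: true when month containers exist and every one of
# them is empty, which suggests the dates are filled in by JavaScript
# after the page loads.
def calendar_unpopulated?(html)
  months = month_divs(html)
  !months.empty? && empty_months(html).length == months.length
end

if calendar_unpopulated?(CALENDAR_HTML)
  puts "Calendar is empty: #{empty_months(CALENDAR_HTML).length} months have no dates"
end
```

A check like this does not solve the underlying problem. It only confirms that the dates are not present in the HTML the server returns, which is the case when they are added by JavaScript after load.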


Average successful run time: half a minute

Total run time: half a minute

Total cpu time used: less than 5 seconds

Total disk space used: 23.4 KB


  • Manually ran revision db986a0b and completed successfully.
    nothing changed in the database
    1 page scraped
  • Created on

Scraper code


SA_Parl_sitting_dates / scraper.rb