alisonkeen / SA_Parl_sitting_dates

Dates on which Hansard has been published for both houses of the SA Parliament (Legislative Council and House of Assembly)


A scraper to read Hansard dates

The scraper is built with Poltergeist, Capybara and RSpec because the site inserts much of its content via AJAX, and that content does not appear when the page is scraped with Mechanize.
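As a rough illustration of why a headless browser helps here, the sketch below loads a page through Poltergeist and waits for a given element to appear before reading the HTML. It is not the code in scraper.rb: the method name fetch_rendered_html, the url and wait_selector arguments and the 30 second default are all hypothetical, and the gems are required inside the method only so that the file loads on a machine where they are not installed.

```ruby
# Minimal sketch, assuming the capybara and poltergeist gems plus a PhantomJS
# binary on the PATH. `url` and `wait_selector` are placeholders supplied by
# the caller; neither is taken from the real scraper.
def fetch_rendered_html(url, wait_selector, timeout: 30)
  # Required lazily so that merely loading this file does not need the gems.
  require 'capybara'
  require 'capybara/poltergeist'

  Capybara.register_driver :poltergeist do |app|
    Capybara::Poltergeist::Driver.new(app, js_errors: false, timeout: timeout)
  end

  session = Capybara::Session.new(:poltergeist)
  begin
    session.visit(url)
    # has_css? polls until the selector shows up or the wait expires, which
    # gives AJAX-inserted content time to arrive.
    unless session.has_css?(wait_selector, wait: timeout)
      raise "Timed out waiting for #{wait_selector} on #{url}"
    end
    session.html
  ensure
    # Shut down the PhantomJS process started for this session.
    session.driver.quit
  end
end
```

A plain HTTP client such as Mechanize only sees the initial HTML, so anything the page adds later with JavaScript would be missing from its response; waiting on a selector is one common way to make sure that content is present before parsing.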

Notes to self: when debugging or running the scraper on my own Ubuntu system, the prerequisite is [code]sudo gem install poltergeist[/code]. Then go to the PhantomJS website and manually wget the PhantomJS build (.tar.bz2), because the Ubuntu package is built against the wrong graphics toolkit and crashes on load.

This is a scraper that runs on Morph. To get started see the documentation

Contributors alisonkeen

Last run completed successfully.

Console output of last run

Injecting configuration and compiling...
-----> Ruby app detected
-----> Compiling Ruby/Rack
-----> Using Ruby version: ruby-2.2.6
-----> Installing dependencies using bundler 1.13.7
       Running: bundle install --without development:test --path vendor/bundle --binstubs vendor/bundle/bin -j4 --deployment
       Fetching gem metadata from https://rubygems.org/........
       Fetching version metadata from https://rubygems.org/.
       Fetching https://github.com/openaustralia/scraperwiki-ruby.git
       Installing mini_portile2 2.1.0
       Installing mime-types-data 3.2016.0521
       Installing addressable 2.4.0
       Installing cliver 0.3.2
       Installing diff-lcs 1.2.5
       Installing rack 2.0.1
       Installing httpclient 2.6.0.1
       Installing websocket-extensions 0.1.2
       Installing rspec-support 3.5.0
       Using bundler 1.13.7
       Installing sqlite3 1.3.10 with native extensions
       Installing mime-types 3.1
       Installing websocket-driver 0.6.4 with native extensions
       Installing nokogiri 1.6.8.1 with native extensions
       Installing rspec-core 3.5.2
       Installing rspec-expectations 3.5.0
       Installing rspec-mocks 3.5.0
       Installing rack-test 0.6.3
       Installing sqlite_magic 0.0.3
       Using scraperwiki 3.0.1 from https://github.com/openaustralia/scraperwiki-ruby.git (at morph_defaults@fc50176)
       Installing rspec 3.5.0
       Installing xpath 2.0.0
       Installing capybara 2.10.1
       Installing poltergeist 1.11.0
       Bundle complete! 5 Gemfile dependencies, 24 gems now installed.
       Gems in the groups development and test were not installed.
       Bundled gems are installed into ./vendor/bundle.
       Bundle completed (20.26s)
       Cleaning up the bundler cache.
-----> Detecting rake tasks
###### WARNING:
       You have not declared a Ruby version in your Gemfile.
       To set your Ruby version add this line to your Gemfile:
       ruby '2.2.6'
       # See https://devcenter.heroku.com/articles/ruby-versions for more information.
-----> Discovering process types
       Procfile declares types -> scraper
Injecting scraper and running...
You're running an old version of PhantomJS, update to >= 2.1.1 for a better experience.
waiting...
all set!
Found: 2017.2.14 - Legislative Council
Found: 2017.2.14 - House of Assembly
Found: 2017.2.15 - House of Assembly
Found: 2017.2.15 - Legislative Council
Found: 2017.2.16 - Legislative Council
Found: 2017.2.16 - House of Assembly
Found: 2017.2.28 - Legislative Council
Found: 2017.2.28 - House of Assembly
Found: 2017.3.1 - Legislative Council
Found: 2017.3.1 - House of Assembly
Found: 2017.3.2 - Legislative Council
Found: 2017.3.2 - House of Assembly
Found: 2017.3.28 - Legislative Council
Found: 2017.3.28 - House of Assembly
Found: 2017.3.29 - House of Assembly
Found: 2017.3.29 - Legislative Council
Found: 2017.3.30 - Legislative Council
Found: 2017.3.30 - House of Assembly
Found: 2017.4.11 - House of Assembly
Found: 2017.4.11 - Legislative Council
Found: 2017.4.12 - Legislative Council
Found: 2017.4.12 - House of Assembly
Found: 2017.4.13 - Legislative Council
Found: 2017.4.13 - House of Assembly
Found: 2017.5.9 - Legislative Council
Found: 2017.5.9 - House of Assembly
Found: 2017.5.10 - Legislative Council
Found: 2017.5.10 - House of Assembly
Found: 2017.5.11 - Legislative Council
Found: 2017.5.11 - House of Assembly
Found: 2017.5.16 - Legislative Council
Found: 2017.5.16 - House of Assembly
Found: 2017.5.17 - House of Assembly
Found: 2017.5.17 - Legislative Council
Found: 2017.5.18 - Legislative Council
Found: 2017.5.18 - House of Assembly
Found: 2017.5.30 - House of Assembly
Found: 2017.5.30 - Legislative Council
Found: 2017.5.31 - House of Assembly
Found: 2017.5.31 - Legislative Council
Found: 2017.6.1 - House of Assembly
Found: 2017.6.1 - Legislative Council
Found: 2017.6.20 - House of Assembly
Found: 2017.6.20 - Legislative Council
Found: 2017.6.21 - House of Assembly
Found: 2017.6.21 - Legislative Council
Found: 2017.6.22 - House of Assembly
Found: 2017.6.22 - Legislative Council
Found: 2017.7.4 - Legislative Council
Found: 2017.7.4 - House of Assembly
Found: 2017.7.5 - House of Assembly
Found: 2017.7.5 - Legislative Council
Found: 2017.7.6 - House of Assembly
Found: 2017.7.6 - Legislative Council
Found: 2017.8.2 - Legislative Council
Found: 2017.8.2 - House of Assembly
Found: 2017.8.3 - Legislative Council
Found: 2017.8.3 - House of Assembly
Found: 2017.8.8 - Legislative Council
Found: 2017.8.8 - House of Assembly
Found: 2017.8.9 - House of Assembly
Found: 2017.8.9 - Legislative Council
Found: 2017.8.10 - Legislative Council
Found: 2017.8.10 - House of Assembly
Found: 2017.9.26 - Legislative Council
Found: 2017.9.26 - House of Assembly
Found: 2017.9.27 - House of Assembly
Found: 2017.9.27 - Legislative Council
Found: 2017.9.28 - House of Assembly
Found: 2017.9.28 - Legislative Council
Found: 2017.10.17 - Legislative Council
Found: 2017.10.17 - House of Assembly
Found: 2017.10.18 - Legislative Council
Found: 2017.10.18 - House of Assembly
74 date-ha-lc sitting days found.
0 date-ha sitting days found.
0 date-lc sitting days found.
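The "Found:" lines in the console output can be tallied per date to check which chambers sat on which days. The following is a minimal sketch, not the scraper's own logic: the names FOUND_LINE, chambers_by_date and classify_dates are hypothetical, and the sample input is illustrative rather than copied from the log. It counts distinct dates, whereas the 74 reported in the log matches the number of "Found:" lines, so the log's own counters are evidently computed differently.

```ruby
require 'set'

# Matches lines such as "Found: 2017.2.14 - Legislative Council".
FOUND_LINE = /\AFound: (\d{4}\.\d{1,2}\.\d{1,2}) - (.+)\z/

# Returns a Hash mapping each date string to the Set of chambers seen for it.
# Lines that do not look like "Found:" lines are ignored.
def chambers_by_date(log_lines)
  log_lines.each_with_object(Hash.new { |h, k| h[k] = Set.new }) do |line, acc|
    match = FOUND_LINE.match(line.strip)
    acc[match[1]] << match[2] if match
  end
end

# Counts dates on which both chambers, only the House of Assembly, or only
# the Legislative Council appear.
def classify_dates(by_date)
  counts = { both: 0, ha_only: 0, lc_only: 0 }
  by_date.each_value do |chambers|
    ha = chambers.include?('House of Assembly')
    lc = chambers.include?('Legislative Council')
    if ha && lc
      counts[:both] += 1
    elsif ha
      counts[:ha_only] += 1
    elsif lc
      counts[:lc_only] += 1
    end
  end
  counts
end

# Illustrative input only: one date with both chambers, one with a single chamber.
sample = [
  'Found: 2017.2.14 - Legislative Council',
  'Found: 2017.2.14 - House of Assembly',
  'Found: 2017.2.15 - House of Assembly',
  'waiting...'
]
p classify_dates(chambers_by_date(sample))
```

Using a Set per date means a chamber that is reported twice for the same day is only counted once.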

Data

Downloaded 1 time by alisonkeen

rows 10 / 164

id                date       xml_file_url  type_of_transcript
HANSARD-11-22043  2016.2.9                 House of Assembly
HANSARD-10-17350  2016.2.9                 Legislative Council
HANSARD-10-17406  2016.2.10                Legislative Council
HANSARD-11-22097  2016.2.10                House of Assembly
HANSARD-10-17453  2016.2.11                Legislative Council
HANSARD-11-22163  2016.2.11                House of Assembly
HANSARD-11-22239  2016.2.23                House of Assembly
HANSARD-10-17498  2016.2.23                Legislative Council
HANSARD-11-22303  2016.2.24                House of Assembly
HANSARD-10-17553  2016.2.24                Legislative Council
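The date column stores values such as 2016.2.9 without zero padding, so sorting the raw strings puts 2016.2.10 before 2016.2.9. A minimal sketch of converting these values to Date objects before sorting or exporting follows; the helper name parse_sitting_date is hypothetical and is not part of the scraper.

```ruby
require 'date'

# Converts a "YYYY.M.D" string, as shown in the date column, into a Date.
# Raises ArgumentError if the text does not have exactly three dot-separated
# integer parts or does not name a valid calendar date.
def parse_sitting_date(text)
  parts = text.strip.split('.')
  raise ArgumentError, "expected YYYY.M.D, got #{text.inspect}" unless parts.size == 3

  year, month, day = parts.map { |part| Integer(part, 10) }
  Date.new(year, month, day)
end

puts parse_sitting_date('2016.2.9').iso8601 # prints 2016-02-09
```

Sorting with sort_by { |d| parse_sitting_date(d) } then orders the rows chronologically, and iso8601 gives a zero-padded form that also sorts correctly as plain text.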

Statistics

Average successful run time: half a minute

Total run time: 8 minutes

Total cpu time used: less than a minute

Total disk space used: 79.1 KB

History

  • Manually ran revision c14f02c4 and completed successfully.
    74 records added in the database
  • Manually ran revision c14f02c4 and completed successfully.
    nothing changed in the database
    40 pages scraped
  • Manually ran revision c14f02c4 and completed successfully.
    90 records added in the database
    39 pages scraped
  • Manually ran revision c14f02c4 and completed successfully.
    45 records added, 45 records removed in the database
    79 pages scraped
  • Manually ran revision 183fa21c and completed successfully.
    45 records added in the database
    36 pages scraped
  • Manually ran revision 6776986a and completed successfully.
    45 records added in the database
    40 pages scraped
  • Manually ran revision 6ccfdc35 and failed.
    nothing changed in the database
  • Manually ran revision 2422c5c3 and failed.
    nothing changed in the database
  • Manually ran revision ac4b1427 and completed successfully.
    nothing changed in the database
    26 pages scraped
  • Manually ran revision 85b2f8cb and completed successfully.
    nothing changed in the database
  • Manually ran revision ced4a7bf and completed successfully.
    nothing changed in the database
    30 pages scraped
  • Manually ran revision ab3cf8c6 and completed successfully.
    nothing changed in the database
    24 pages scraped
  • Manually ran revision c9b1e260 and failed.
    nothing changed in the database
  • Manually ran revision db986a0b and completed successfully.
    nothing changed in the database
    1 page scraped
  • Created on morph.io

Scraper code

Ruby

SA_Parl_sitting_dates / scraper.rb