alisonkeen / SA_Parl_sitting_dates

Sitting dates for both houses of the SA Parliament (Legislative Council and House of Assembly) for which Hansard has been published

Scrapes hansardpublic.parliament.sa.gov.au, www.google-analytics.com, stats.g.doubleclick.net, and 1 other domain


A scraper to read Hansard dates

The scraper is built with Poltergeist, Capybara and RSpec because the site inserts much of its content via AJAX, and that content does not appear when the page is scraped with Mechanize.
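As a rough illustration of why a headless browser helps here, the sketch below waits for AJAX-inserted elements before reading them. It is not taken from scraper.rb: the helper names (new_poltergeist_session, ajax_texts), the CSS selector, the URL scheme and the timeout values are all assumptions made for the example.

```ruby
# Sketch only: the selector, URL scheme and timeouts are placeholders,
# not values taken from the actual scraper.rb.

# Build a headless-browser session backed by PhantomJS. The gems are
# required lazily so the helper below can be used without them loaded.
def new_poltergeist_session
  require 'capybara'
  require 'capybara/poltergeist'

  Capybara.register_driver :poltergeist do |app|
    # js_errors: false stops page-side JavaScript errors aborting the run.
    Capybara::Poltergeist::Driver.new(app, js_errors: false, timeout: 60)
  end
  Capybara::Session.new(:poltergeist)
end

# Wait until elements matching +selector+ have been inserted into the page,
# then return their text. Returns [] if nothing appears within +wait+ seconds.
def ajax_texts(session, selector, wait: 30)
  return [] unless session.has_css?(selector, wait: wait)
  session.all(selector).map(&:text)
end

# Possible usage (needs the poltergeist gem and a PhantomJS binary):
#   session = new_poltergeist_session
#   session.visit('https://hansardpublic.parliament.sa.gov.au/')
#   puts ajax_texts(session, '.hypothetical-sitting-day')
```

Mechanize only sees the HTML returned by the server, whereas has_css? with a wait polls the live page until the script-inserted nodes exist, which is the behaviour the scraper relies on.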

Notes to self: when debugging or running the scraper on my own Ubuntu system, the prerequisites are [code] sudo gem install poltergeist [/code] Then go to the PhantomJS website and download the PhantomJS build (.tar.bz2) manually with wget, because the Ubuntu package is built against the wrong graphics toolkit and crashes on load.

This is a scraper that runs on Morph. To get started see the documentation

Contributors alisonkeen

Last run completed successfully.

Console output of last run

Injecting configuration and compiling...
Error removing intermediate container 1a418eb8cffe: Driver aufs failed to remove root filesystem 1a418eb8cffe64035638ea060583fd1bccfd8099eea4804474d01a9facb6ed1e: rename /var/lib/docker/aufs/mnt/53d1709d003599373a4bf50b0306f8c16c209c5446021e8a4ca8c466f818b0e2 /var/lib/docker/aufs/mnt/53d1709d003599373a4bf50b0306f8c16c209c5446021e8a4ca8c466f818b0e2-removing: device or resource busy
Error removing intermediate container 1a418eb8cffe: No such container: 1a418eb8cffe64035638ea060583fd1bccfd8099eea4804474d01a9facb6ed1e
-----> Ruby app detected
-----> Compiling Ruby/Rack
-----> Using Ruby version: ruby-2.2.4
-----> Installing dependencies using bundler 1.11.2
       Running: bundle install --without development:test --path vendor/bundle --binstubs vendor/bundle/bin -j4 --deployment
       Fetching gem metadata from https://rubygems.org/.........
       Fetching version metadata from https://rubygems.org/...
       Fetching dependency metadata from https://rubygems.org/..
       Fetching https://github.com/openaustralia/scraperwiki-ruby.git
       Installing mini_portile2 2.1.0
       Installing mime-types-data 3.2016.0521
       Installing addressable 2.4.0
       Installing cliver 0.3.2
       Installing rack 2.0.1
       Installing diff-lcs 1.2.5
       Installing httpclient 2.6.0.1
       Installing websocket-extensions 0.1.2
       Installing rspec-support 3.5.0
       Using bundler 1.11.2
       Installing sqlite3 1.3.10 with native extensions
       Installing mime-types 3.1
       Installing websocket-driver 0.6.4 with native extensions
       Installing nokogiri 1.6.8.1 with native extensions
       Installing rspec-core 3.5.2
       Installing rspec-expectations 3.5.0
       Installing rspec-mocks 3.5.0
       Installing rack-test 0.6.3
       Installing sqlite_magic 0.0.3
       Using scraperwiki 3.0.1 from https://github.com/openaustralia/scraperwiki-ruby.git (at morph_defaults@fc50176)
       Installing rspec 3.5.0
       Installing xpath 2.0.0
       Installing capybara 2.10.1
       Installing poltergeist 1.11.0
       Bundle complete! 5 Gemfile dependencies, 24 gems now installed.
       Gems in the groups development and test were not installed.
       Bundled gems are installed into ./vendor/bundle.
       Bundle completed (18.07s)
       Cleaning up the bundler cache.
###### WARNING:
       You have not declared a Ruby version in your Gemfile.
       To set your Ruby version add this line to your Gemfile:
       ruby '2.2.4'
       # See https://devcenter.heroku.com/articles/ruby-versions for more information.
-----> Discovering process types
       Procfile declares types -> scraper
Error removing intermediate container 1a418eb8cffe: No such container: 1a418eb8cffe64035638ea060583fd1bccfd8099eea4804474d01a9facb6ed1e
Injecting scraper and running...
You're running an old version of PhantomJS, update to >= 2.1.1 for a better experience.
waiting... all set!
Found: 2016.2.9 - House of Assembly
Found: 2016.2.9 - Legislative Council
Found: 2016.2.10 - Legislative Council
Found: 2016.2.10 - House of Assembly
Found: 2016.2.11 - Legislative Council
Found: 2016.2.11 - House of Assembly
Found: 2016.2.23 - House of Assembly
Found: 2016.2.23 - Legislative Council
Found: 2016.2.24 - House of Assembly
Found: 2016.2.24 - Legislative Council
Found: 2016.2.25 - House of Assembly
Found: 2016.2.25 - Legislative Council
Found: 2016.3.8 - House of Assembly
Found: 2016.3.8 - Legislative Council
Found: 2016.3.9 - House of Assembly
Found: 2016.3.9 - Legislative Council
Found: 2016.3.10 - Legislative Council
Found: 2016.3.10 - House of Assembly
Found: 2016.3.22 - House of Assembly
Found: 2016.3.22 - Legislative Council
Found: 2016.3.23 - House of Assembly
Found: 2016.3.23 - Legislative Council
Found: 2016.3.24 - Legislative Council
Found: 2016.3.24 - House of Assembly
Found: 2016.4.12 - Legislative Council
Found: 2016.4.12 - House of Assembly
Found: 2016.4.13 - House of Assembly
Found: 2016.4.13 - Legislative Council
Found: 2016.4.14 - Legislative Council
Found: 2016.4.14 - House of Assembly
Found: 2016.5.17 - House of Assembly
Found: 2016.5.17 - Legislative Council
Found: 2016.5.18 - House of Assembly
Found: 2016.5.18 - Legislative Council
Found: 2016.5.19 - Legislative Council
Found: 2016.5.19 - House of Assembly
Found: 2016.5.24 - House of Assembly
Found: 2016.5.24 - Legislative Council
Found: 2016.5.25 - House of Assembly
Found: 2016.5.25 - Legislative Council
Found: 2016.5.26 - Legislative Council
Found: 2016.5.26 - House of Assembly
Found: 2016.6.7 - House of Assembly
Found: 2016.6.7 - Legislative Council
Found: 2016.6.8 - House of Assembly
Found: 2016.6.8 - Legislative Council
Found: 2016.6.9 - Legislative Council
Found: 2016.6.9 - House of Assembly
Found: 2016.6.21 - Legislative Council
Found: 2016.6.21 - House of Assembly
Found: 2016.6.22 - Legislative Council
Found: 2016.6.22 - House of Assembly
Found: 2016.6.23 - House of Assembly
Found: 2016.6.23 - Legislative Council
Found: 2016.7.5 - House of Assembly
Found: 2016.7.5 - Legislative Council
Found: 2016.7.6 - House of Assembly
Found: 2016.7.6 - Legislative Council
Found: 2016.7.7 - House of Assembly
Found: 2016.7.7 - Legislative Council
Found: 2016.7.26 - Legislative Council
Found: 2016.7.26 - House of Assembly
Found: 2016.7.27 - House of Assembly
Found: 2016.7.27 - Legislative Council
Found: 2016.8.4 - Legislative Council
Found: 2016.8.4 - House of Assembly
Found: 2016.9.20 - House of Assembly
Found: 2016.9.20 - Legislative Council
Found: 2016.9.21 - House of Assembly
Found: 2016.9.21 - Legislative Council
Found: 2016.9.22 - Legislative Council
Found: 2016.9.22 - House of Assembly
Found: 2016.9.27 - House of Assembly
Found: 2016.9.27 - Legislative Council
Found: 2016.9.28 - Legislative Council
Found: 2016.9.28 - House of Assembly
Found: 2016.9.29 - Legislative Council
Found: 2016.9.29 - House of Assembly
Found: 2016.10.18 - House of Assembly
Found: 2016.10.18 - Legislative Council
Found: 2016.10.19 - House of Assembly
Found: 2016.10.19 - Legislative Council
Found: 2016.10.20 - Legislative Council
Found: 2016.10.20 - House of Assembly
Found: 2016.11.1 - House of Assembly
Found: 2016.11.1 - Legislative Council
Found: 2016.11.2 - House of Assembly
Found: 2016.11.2 - Legislative Council
Found: 2016.11.3 - Legislative Council
Found: 2016.11.3 - House of Assembly
Found: 2016.11.15 - Legislative Council
Found: 2016.11.15 - House of Assembly
Found: 2016.11.16 - Legislative Council
Found: 2016.11.16 - House of Assembly
Found: 2016.11.17 - House of Assembly
Found: 2016.11.17 - Legislative Council
96 date-ha-lc sitting days found.
0 date-ha sitting days found.
0 date-lc sitting days found.

Data

Downloaded 1 time by alisonkeen

rows 10 / 90

id                date       xml_file_url  type_of_transcript
HANSARD-11-22043  2016.2.9                 House of Assembly
HANSARD-10-17350  2016.2.9                 Legislative Council
HANSARD-10-17406  2016.2.10                Legislative Council
HANSARD-11-22097  2016.2.10                House of Assembly
HANSARD-10-17453  2016.2.11                Legislative Council
HANSARD-11-22163  2016.2.11                House of Assembly
HANSARD-11-22239  2016.2.23                House of Assembly
HANSARD-10-17498  2016.2.23                Legislative Council
HANSARD-11-22303  2016.2.24                House of Assembly
HANSARD-10-17553  2016.2.24                Legislative Council
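
The date column stores unpadded, dot-separated values such as 2016.2.9, which do not sort correctly as plain strings (for example, "2016.10.18" sorts before "2016.2.9"). A minimal sketch of converting them to real dates follows, using only the Ruby standard library. The helper names parse_sitting_date and parse_found_line are hypothetical and are not part of scraper.rb.

```ruby
require 'date'

# Convert a dotted, unpadded date such as "2016.2.9" into a Date.
# Raises ArgumentError if the string is not three dot-separated integers
# or does not name a real calendar day.
def parse_sitting_date(text)
  parts = text.strip.split('.')
  raise ArgumentError, "unexpected date format: #{text.inspect}" unless parts.size == 3
  year, month, day = parts.map { |part| Integer(part, 10) }
  Date.new(year, month, day)
end

# Split one console line of the form "Found: <date> - <chamber>".
# Returns [Date, String], or nil if the line is not a "Found:" line.
def parse_found_line(line)
  match = /\AFound: (\d+\.\d+\.\d+) - (.+)\z/.match(line.strip)
  return nil unless match
  [parse_sitting_date(match[1]), match[2]]
end

date, chamber = parse_found_line('Found: 2016.2.9 - House of Assembly')
puts date.iso8601   # 2016-02-09
puts chamber        # House of Assembly
```

Storing or comparing the ISO 8601 form (2016-02-09) keeps string order and chronological order in agreement.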

Statistics

Average successful run time: half a minute

Total run time: 7 minutes

Total cpu time used: less than a minute

Total disk space used: 94.8 KB

History

  • Manually ran revision c14f02c4 and completed successfully.
    nothing changed in the database
    40 pages scraped
  • Manually ran revision c14f02c4 and completed successfully.
    90 records added in the database
    39 pages scraped
  • Manually ran revision c14f02c4 and completed successfully.
    45 records added, 45 records removed in the database
    79 pages scraped
  • Manually ran revision 183fa21c and completed successfully.
    45 records added in the database
    36 pages scraped
  • Manually ran revision 6776986a and completed successfully.
    45 records added in the database
    40 pages scraped
  • Manually ran revision 6ccfdc35 and failed.
    nothing changed in the database
  • Manually ran revision 2422c5c3 and failed.
    nothing changed in the database
  • Manually ran revision ac4b1427 and completed successfully.
    nothing changed in the database
    26 pages scraped
  • Manually ran revision 85b2f8cb and completed successfully.
    nothing changed in the database
  • Manually ran revision ced4a7bf and completed successfully.
    nothing changed in the database
    30 pages scraped
  • Manually ran revision ab3cf8c6 and completed successfully.
    nothing changed in the database
    24 pages scraped
  • Manually ran revision c9b1e260 and failed.
    nothing changed in the database
  • Manually ran revision db986a0b and completed successfully.
    nothing changed in the database
    1 page scraped
  • Created on morph.io

Scraper code

Ruby

SA_Parl_sitting_dates / scraper.rb