alisonkeen / SA_Parl_sitting_dates

Sitting dates for both houses of the SA Parliament (Legislative Council and House of Assembly) for which Hansard has been published

Scrapes hansardpublic.parliament.sa.gov.au, www.google-analytics.com, www.parliament.sa.gov.au, and 1 other domain


A scraper to read Hansard dates

The scraper is built using Poltergeist / Capybara / RSpec because the site inserts much of its content via AJAX, and that content does not appear when the page is scraped with Mechanize.
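As a rough illustration of why a headless browser helps here, the following is a minimal sketch (not the code in scraper.rb) of opening a Poltergeist-backed Capybara session and waiting for AJAX-inserted elements before reading them. The URL scheme, the `.sitting-date` selector, and the helper names `new_poltergeist_session` and `rendered_texts` are assumptions made for the example; the real page markup is not shown in this document.

```ruby
# Hypothetical values: the scheme and the CSS selector are placeholders,
# not taken from the real scraper or the real site markup.
HANSARD_URL   = 'https://hansardpublic.parliament.sa.gov.au/'
DATE_SELECTOR = '.sitting-date'

# Builds a Capybara session driven by PhantomJS via Poltergeist.
# The gem is required lazily so the helper below can be exercised
# without Poltergeist or PhantomJS installed.
def new_poltergeist_session(timeout: 60)
  require 'capybara/poltergeist'
  Capybara.register_driver(:poltergeist) do |app|
    Capybara::Poltergeist::Driver.new(app, js_errors: false, timeout: timeout)
  end
  Capybara::Session.new(:poltergeist)
end

# Visits the page, waits up to `wait` seconds for the selector to appear
# (i.e. for the AJAX content to be inserted), then returns the text of
# every matching element. Returns an empty array if nothing shows up.
def rendered_texts(session, url, selector, wait: 15)
  session.visit(url)
  return [] unless session.has_css?(selector, wait: wait)
  session.all(selector).map(&:text)
end

# Usage (needs the poltergeist gem and a PhantomJS binary on the PATH):
#   session = new_poltergeist_session
#   puts rendered_texts(session, HANSARD_URL, DATE_SELECTOR)
```

Mechanize only sees the initial HTML response, whereas `has_css?` with a `wait:` value lets Capybara poll the rendered DOM until the scripted content arrives or the wait expires.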

Notes to self: when debugging or running the scraper on my own Ubuntu system, the prerequisite is [code] sudo gem install poltergeist [/code] Then go to the PhantomJS website and manually download (wget) the PhantomJS build .tar.bz2, because the Ubuntu package is built against the wrong graphics toolkit and crashes on load.
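After installing PhantomJS by hand, it can be useful to confirm which build is actually on the PATH. The sketch below is only an illustration: the helper names are made up for this example, and the 2.1.1 threshold is taken from the Poltergeist warning that appears in the console output of the last run.

```ruby
require 'open3'

# Minimum version suggested by the Poltergeist warning in the run log.
MIN_PHANTOMJS = Gem::Version.new('2.1.1')

# True when the given version string (e.g. "2.1.1\n") meets the minimum.
def phantomjs_new_enough?(version_string)
  Gem::Version.new(version_string.strip) >= MIN_PHANTOMJS
end

# Asks the binary for its version; returns nil if it is missing or fails.
def installed_phantomjs_version(binary = 'phantomjs')
  output, status = Open3.capture2(binary, '--version')
  status.success? ? output.strip : nil
rescue Errno::ENOENT
  nil
end

# Usage:
#   version = installed_phantomjs_version
#   if version.nil?
#     warn 'phantomjs not found on PATH'
#   elsif !phantomjs_new_enough?(version)
#     warn "phantomjs #{version} is older than #{MIN_PHANTOMJS}"
#   end
```

If the manually downloaded binary lives outside the PATH, Poltergeist's driver also accepts a `phantomjs:` option giving the path to the executable.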

This is a scraper that runs on Morph. To get started, see the documentation.

Contributors alisonkeen

Last run completed successfully.

Console output of last run

Injecting configuration and compiling...
-----> Ruby app detected
-----> Compiling Ruby/Rack
-----> Using Ruby version: ruby-2.3.4
-----> Installing dependencies using bundler 1.15.2
       Running: bundle install --without development:test --path vendor/bundle --binstubs vendor/bundle/bin -j4 --deployment
       Fetching gem metadata from https://rubygems.org/........
       Fetching version metadata from https://rubygems.org/.
       Fetching https://github.com/openaustralia/scraperwiki-ruby.git
       Fetching addressable 2.4.0
       Using bundler 1.15.2
       Fetching mime-types-data 3.2016.0521
       Fetching mini_portile2 2.1.0
       Installing mini_portile2 2.1.0
       Fetching rack 2.0.1
       Installing addressable 2.4.0
       Installing mime-types-data 3.2016.0521
       Installing rack 2.0.1
       Fetching cliver 0.3.2
       Fetching diff-lcs 1.2.5
       Installing cliver 0.3.2
       Installing diff-lcs 1.2.5
       Fetching httpclient 2.6.0.1
       Installing httpclient 2.6.0.1
       Fetching websocket-extensions 0.1.2
       Installing websocket-extensions 0.1.2
       Fetching rspec-support 3.5.0
       Fetching sqlite3 1.3.10
       Installing rspec-support 3.5.0
       Installing sqlite3 1.3.10 with native extensions
       Fetching nokogiri 1.6.8.1
       Fetching mime-types 3.1
       Installing mime-types 3.1
       Fetching websocket-driver 0.6.4
       Installing websocket-driver 0.6.4 with native extensions
       Installing nokogiri 1.6.8.1 with native extensions
       Fetching rack-test 0.6.3
       Installing rack-test 0.6.3
       Fetching rspec-core 3.5.2
       Installing rspec-core 3.5.2
       Fetching rspec-expectations 3.5.0
       Installing rspec-expectations 3.5.0
       Fetching rspec-mocks 3.5.0
       Installing rspec-mocks 3.5.0
       Fetching sqlite_magic 0.0.3
       Installing sqlite_magic 0.0.3
       Using scraperwiki 3.0.1 from https://github.com/openaustralia/scraperwiki-ruby.git (at morph_defaults@fc50176)
       Fetching rspec 3.5.0
       Installing rspec 3.5.0
       Fetching xpath 2.0.0
       Installing xpath 2.0.0
       Fetching capybara 2.10.1
       Installing capybara 2.10.1
       Fetching poltergeist 1.11.0
       Installing poltergeist 1.11.0
       Bundle complete! 5 Gemfile dependencies, 24 gems now installed.
       Gems in the groups development and test were not installed.
       Bundled gems are installed into ./vendor/bundle.
       Bundle completed (19.87s)
       Cleaning up the bundler cache.
-----> Detecting rake tasks
###### WARNING:
       You have not declared a Ruby version in your Gemfile.
       To set your Ruby version add this line to your Gemfile:
       ruby '2.3.4'
       # See https://devcenter.heroku.com/articles/ruby-versions for more information.
-----> Discovering process types
       Procfile declares types -> scraper
Injecting scraper and running...
You're running an old version of PhantomJS, update to >= 2.1.1 for a better experience.
waiting... all set!
0 date-ha-lc sitting days found.
0 date-ha sitting days found.
0 date-lc sitting days found.

Data

Downloaded 1 time by alisonkeen

rows 10 / 164

id | date | xml_file_url | type_of_transcript
HANSARD-11-22043 | 2016.2.9 | | House of Assembly
HANSARD-10-17350 | 2016.2.9 | | Legislative Council
HANSARD-10-17406 | 2016.2.10 | | Legislative Council
HANSARD-11-22097 | 2016.2.10 | | House of Assembly
HANSARD-10-17453 | 2016.2.11 | | Legislative Council
HANSARD-11-22163 | 2016.2.11 | | House of Assembly
HANSARD-11-22239 | 2016.2.23 | | House of Assembly
HANSARD-10-17498 | 2016.2.23 | | Legislative Council
HANSARD-11-22303 | 2016.2.24 | | House of Assembly
HANSARD-10-17553 | 2016.2.24 | | Legislative Council
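For anyone consuming this table, the date column uses a year.month.day form without zero-padding, which does not sort correctly as text. The sketch below shows one way to turn it into proper dates. The prefix-to-chamber mapping is only an inference from the ten sample rows shown here (HANSARD-10 appearing with Legislative Council, HANSARD-11 with House of Assembly) and is not documented behaviour; `parse_sitting_date` and `chamber_for` are hypothetical helper names.

```ruby
require 'date'

# Assumption: inferred from the sample rows only, not from any documentation.
CHAMBER_BY_PREFIX = {
  'HANSARD-10' => 'Legislative Council',
  'HANSARD-11' => 'House of Assembly'
}.freeze

# Converts "2016.2.9" into a Date. Raises ArgumentError on malformed input.
def parse_sitting_date(text)
  year, month, day = text.split('.').map { |part| Integer(part, 10) }
  Date.new(year, month, day)
end

# Looks up the chamber from the id prefix; returns nil for unknown prefixes.
def chamber_for(id)
  CHAMBER_BY_PREFIX[id[/\AHANSARD-\d+/]]
end

puts parse_sitting_date('2016.2.9').iso8601   # prints 2016-02-09
puts chamber_for('HANSARD-11-22043')          # prints House of Assembly
```

Storing the ISO 8601 form (2016-02-09) would make the column sort chronologically in SQLite without further conversion.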

Statistics

Average successful run time: half a minute

Total run time: 9 minutes

Total cpu time used: less than a minute

Total disk space used: 79.2 KB

History

  • Manually ran revision c14f02c4 and completed successfully.
    nothing changed in the database
    38 pages scraped
  • Manually ran revision c14f02c4 and completed successfully.
    74 records added in the database
  • Manually ran revision c14f02c4 and completed successfully.
    nothing changed in the database
    40 pages scraped
  • Manually ran revision c14f02c4 and completed successfully.
    90 records added in the database
    39 pages scraped
  • Manually ran revision c14f02c4 and completed successfully.
    45 records added, 45 records removed in the database
    79 pages scraped
  • ...
  • Created on morph.io

Scraper code

Ruby

SA_Parl_sitting_dates / scraper.rb