otherchirps / nsw_gov_docs

morph.io scraper for tabled NSW Parliament documents


This is a scraper that runs on Morph. To get started, see the documentation.

This scraper

Aims to collect the list of tabled documents listed on the NSW Parliament site.

Why?

The Hitchhiker’s Guide to the Galaxy says it best:

“But Mr Dent, the plans have been available in the local planning office for the last nine months.”

“Oh yes, well as soon as I heard I went straight round to see them, yesterday afternoon. You hadn’t exactly gone out of your way to call attention to them, had you? I mean, like actually telling anybody or anything.”

“But the plans were on display ...”

“On display? I eventually had to go down to the cellar to find them.”

“That’s the display department.”

“With a flashlight.”

“Ah, well the lights had probably gone.”

“So had the stairs.”

“But look, you found the notice didn’t you?”

“Yes,” said Arthur, “yes I did. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying ‘Beware of the Leopard’.”

Contributors otherchirps

Last run completed successfully.

Console output of last run

Injecting configuration and compiling...
Injecting scraper and running...
2016-06-20 01:25:15+0000 [legislative_assembly_tabled_docs] INFO: Closing spider (finished)
2016-06-20 01:25:15+0000 [legislative_assembly_tabled_docs] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 502,
 'downloader/request_count': 2,
 'downloader/request_method_count/GET': 2,
 'downloader/response_bytes': 1244,
 'downloader/response_count': 2,
 'downloader/response_status_count/302': 1,
 'downloader/response_status_count/404': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2016, 6, 20, 1, 25, 15, 179038),
 'response_received_count': 1,
 'scheduler/dequeued': 2,
 'scheduler/dequeued/memory': 2,
 'scheduler/enqueued': 2,
 'scheduler/enqueued/memory': 2,
 'start_time': datetime.datetime(2016, 6, 20, 1, 25, 12, 966797)}
2016-06-20 01:25:15+0000 [legislative_assembly_tabled_docs] INFO: Spider closed (finished)
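The stats in this log record one 302 and one 404 response and contain no 'item_scraped_count' entry, which Scrapy normally adds once a spider yields items. A small helper along the following lines could make that easier to spot. This is only a sketch: summarise_run and LAST_RUN_STATS are hypothetical names, not part of the scraper, and the dictionary is a hand-copied subset of the stats shown above.

```python
# Sketch only: summarise_run and LAST_RUN_STATS are illustrative names,
# not taken from scraper.py.

STATUS_PREFIX = "downloader/response_status_count/"

# A subset of the stats printed in the console output above.
LAST_RUN_STATS = {
    "downloader/request_count": 2,
    "downloader/response_count": 2,
    "downloader/response_status_count/302": 1,
    "downloader/response_status_count/404": 1,
    "finish_reason": "finished",
    "response_received_count": 1,
}


def summarise_run(stats):
    """Pull the finish reason, item count and HTTP status counts out of a stats dict."""
    status_counts = {
        key[len(STATUS_PREFIX):]: value
        for key, value in stats.items()
        if key.startswith(STATUS_PREFIX)
    }
    return {
        "finish_reason": stats.get("finish_reason"),
        # Treat a missing 'item_scraped_count' key as zero items.
        "items_scraped": stats.get("item_scraped_count", 0),
        "status_counts": status_counts,
    }


if __name__ == "__main__":
    print(summarise_run(LAST_RUN_STATS))
```

A run can therefore report 'finished' while returning no items, which may be worth checking alongside the "completed successfully" status.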

Data

Downloaded 73 times by otherchirps MikeRalphson rustyb

Download table (as CSV) Download SQLite database (14 MB) Use the API

rows 10 / 38890

| url | laid_by | type | paper_id | date_tabled | session_id | title |
|-----|---------|------|----------|-------------|------------|-------|
|  | Clerk | Annual Report | 408 | 2015-05-28 | 561 | Report of the University of Wollongong for 2014 (Volumes One and Two) |
|  | Clerk | Annual Report | 407 | 2015-05-28 | 561 | Report of the University of Western Sydney for 2014 (Volumes One and Two) |
|  | Clerk | Annual Report | 406 | 2015-05-28 | 561 | Report of the University of Technology, Sydney for 2014 (Volumes One and Two) |
|  | Clerk | Annual Report | 405 | 2015-05-28 | 561 | Report of the University of Sydney for 2014 (including the Financial Statements for 2014 of Bandwidth Foundry International Pty Limited; SydneyLearning Pty Limited; Sydney Talent Pty Limited; The Warren Centre for Advanced Engineering Limited; Wayahead Pty Limited; and the Financial Statement of Westmead IVF Pty Limited for the six month period ended 31 December 2014) |
|  | Clerk | Annual Report | 404 | 2015-05-28 | 561 | Report of the University of Newcastle for 2014 (Volumes One and Two) |
|  | Clerk | Annual Report | 403 | 2015-05-28 | 561 | Report of the University of New South Wales for 2014 (Volumes One and Two) |
|  | Clerk | Annual Report | 400 | 2015-05-28 | 561 | Report of Macquarie University for 2014 (Volumes One and Two) |
|  | Grant | Annual Report | 549 | 2015-06-24 | 561 | Assumed Identities reports for the year ended 30 June 2014 of – Australian Crime Commission; Australian Customs and Border Protection Services; Australian Federal Police; Australian Taxation Office; Independent Commission Against Corruption; and the Police Integrity Commission. |
|  | Clerk | Statutory Report | 944 | 2015-10-08 | 561 | Report of the section 430 investigation into Strathfield Municipal Council, dated September 2015 (including appendices 1 to 46) |
|  | Speaker | Statutory Report | 943 | 2015-10-13 | 561 | Register of Disclosures by Members of the Legislative Assembly as at 30 June 2015 (Volumes One and Two) |
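
Once the data is downloaded as SQLite, the columns shown in this sample can be queried directly. The sketch below builds a small in-memory database from four of the sample rows so that it runs without the real file. The table name data, the column types, and the helper names are assumptions made for illustration and should be checked against the downloaded database.

```python
import sqlite3

# Four rows copied from the sample; url is empty there, so it is empty here.
SAMPLE_ROWS = [
    ("", "Clerk", "Annual Report", 408, "2015-05-28", 561,
     "Report of the University of Wollongong for 2014 (Volumes One and Two)"),
    ("", "Clerk", "Annual Report", 400, "2015-05-28", 561,
     "Report of Macquarie University for 2014 (Volumes One and Two)"),
    ("", "Clerk", "Statutory Report", 944, "2015-10-08", 561,
     "Report of the section 430 investigation into Strathfield Municipal "
     "Council, dated September 2015 (including appendices 1 to 46)"),
    ("", "Speaker", "Statutory Report", 943, "2015-10-13", 561,
     "Register of Disclosures by Members of the Legislative Assembly "
     "as at 30 June 2015 (Volumes One and Two)"),
]


def build_sample_db():
    """Create an in-memory database with an assumed 'data' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data (url TEXT, laid_by TEXT, type TEXT, "
        "paper_id INTEGER, date_tabled TEXT, session_id INTEGER, title TEXT)"
    )
    conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?, ?, ?, ?)", SAMPLE_ROWS)
    return conn


def count_by_type(conn):
    """Return (type, number of papers) pairs, sorted by type."""
    query = "SELECT type, COUNT(*) FROM data GROUP BY type ORDER BY type"
    return conn.execute(query).fetchall()


def tabled_between(conn, start, end):
    """Return (paper_id, title) for papers tabled in an inclusive ISO date range."""
    query = (
        "SELECT paper_id, title FROM data "
        "WHERE date_tabled BETWEEN ? AND ? ORDER BY date_tabled, paper_id"
    )
    return conn.execute(query, (start, end)).fetchall()


if __name__ == "__main__":
    conn = build_sample_db()
    print(count_by_type(conn))
    print(tabled_between(conn, "2015-10-01", "2015-10-31"))
```

To run the same queries against the real download, replace build_sample_db() with sqlite3.connect() on the path of the downloaded file. Because date_tabled is stored in ISO format (YYYY-MM-DD), plain string comparison is enough for date ranges.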

Statistics

Average successful run time: 3 minutes

Total run time: about 17 hours

Total cpu time used: about 7 hours

Total disk space used: 14.1 MB

History

  • Auto ran revision 97c8157a and completed successfully.
    nothing changed in the database
  • Auto ran revision 97c8157a and completed successfully.
    nothing changed in the database
  • Auto ran revision 97c8157a and completed successfully.
    nothing changed in the database
  • Auto ran revision 97c8157a and completed successfully.
    nothing changed in the database
  • Auto ran revision 97c8157a and completed successfully.
    nothing changed in the database
  • ...
  • Created on morph.io

Scraper code

Python

nsw_gov_docs / scraper.py