hutershvili / lobbyscraper

Scraper for the Austrian lobbying register


lobbyscraper: Getting Data from the Austrian Lobbying Register

This repository contains a Python script that scrapes the Austrian lobbying register. The scraper was written for the Gute Taten für gute Daten project of Open Knowledge Austria and is available under the MIT open source license.

This repository provides the code and keeps track of bugs as well as feature requests.

DOCUMENTATION

Some information about the Austrian lobbying register:

Lobbying Register

Types of lobbying organisations
A1 Lobbying-corporations or lobbyists (Lobbying-Unternehmen bzw. Lobbyisten)
A2 Areas of activity of lobbying corporations (not public) (Aufgabenbereiche der Lobbying-Unternehmen (nicht öffentlich))
B Companies or company-/(in-house-)lobbyists (Unternehmen bzw. Unternehmens-/(In-House-)Lobbyisten)
C Self-governing bodies (Selbstverwaltungskörper)
D Interest groups (Interessenverbände)

Scraper

  • Python with modules urllib2 and BeautifulSoup.

To run the Python script, enter the following in a terminal from the root folder of the repository:

    cd code
    python lobbyscraper.py

To reduce load on the server, download the HTML files only once and then work locally. To do this, comment out the FetchHtmlList() and FetchHtmlOrganisations() calls in the main section after the first download, and set the ts variable to the name of the directory that contains the downloaded HTML files.
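The download-once advice above can also be expressed as a small cache check. The following is a minimal sketch in Python 3, not the code in lobbyscraper.py (which uses urllib2 and therefore targets Python 2). The names load_html, cache_dir and fetch are hypothetical; fetch stands in for whatever function actually downloads a page.

```python
import os


def load_html(name, cache_dir, fetch):
    """Return the HTML for `name`, downloading it only if no local copy exists.

    `fetch` is a hypothetical callable that takes `name` and returns the page
    as a string. It is an assumption made for this sketch, not a function
    from the repository.
    """
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, name + ".html")

    if os.path.exists(path):
        # A local copy exists: read it and leave the server alone.
        with open(path, encoding="utf-8") as fh:
            return fh.read()

    # No local copy yet: download once and store the result for later runs.
    html = fetch(name)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(html)
    return html
```

With a helper like this, repeated runs against the same directory only touch the server for pages that have not been stored yet, so no lines need to be commented in or out between runs.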

Computational chain

  1. Fetch the website and store the HTML locally. After the download, pack the files into a tarball and delete the HTML files.
  2. Extract facts from the HTML and store them in a JSON file.
  3. Compare the current data with the previous data.
  4. Update the previous data to the new state.
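The packing, comparison and update steps of this chain can be sketched as follows. This is an illustration in Python 3 using only the standard library, under the assumption that the extracted data is a JSON object keyed by some record identifier. The function names pack_html, compare and update_state are hypothetical and do not come from lobbyscraper.py, and the HTML extraction step itself is not shown.

```python
import json
import os
import tarfile


def pack_html(html_dir, tar_path):
    """Pack every .html file in html_dir into a gzipped tarball, then delete it."""
    names = sorted(n for n in os.listdir(html_dir) if n.endswith(".html"))
    with tarfile.open(tar_path, "w:gz") as tar:
        for name in names:
            tar.add(os.path.join(html_dir, name), arcname=name)
    # Delete the originals only after the archive has been written and closed.
    for name in names:
        os.remove(os.path.join(html_dir, name))
    return names


def compare(past, current):
    """Return the keys added, removed and changed between two {id: record} dicts."""
    shared = set(past) & set(current)
    return {
        "added": sorted(set(current) - set(past)),
        "removed": sorted(set(past) - set(current)),
        "changed": sorted(k for k in shared if past[k] != current[k]),
    }


def update_state(state_path, current):
    """Compare `current` with the stored state, then overwrite the stored state."""
    past = {}
    if os.path.exists(state_path):
        with open(state_path, encoding="utf-8") as fh:
            past = json.load(fh)
    diff = compare(past, current)
    with open(state_path, "w", encoding="utf-8") as fh:
        json.dump(current, fh, ensure_ascii=False, indent=2, sort_keys=True)
    return diff
```

Because JSON object keys are always strings, the record identifiers should be strings as well; otherwise a record loaded from the state file would never match its in-memory counterpart.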

Contribution

In the spirit of free software, everyone is encouraged to help improve this project.

Here are some ways you can contribute:

  • by reporting bugs
  • by suggesting new features
  • by translating to a new language
  • by writing or editing documentation
  • by visualizing the data
  • by writing code (no pull request is too small: fix typos in the user interface, add code comments, clean up inconsistent whitespace)
  • by refactoring code
  • by closing issues
  • by reviewing pull requests
  • by enriching the data with other data sources

When you are ready, submit a pull request.

Submitting an Issue

We use the GitHub issue tracker to track bugs and features. Before submitting a bug report or feature request, check to make sure it hasn't already been submitted. When submitting a bug report, please try to provide a screenshot that demonstrates the problem.

License

This program is free software: you can redistribute it and/or modify it under the terms of the MIT License.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Visit http://opensource.org/licenses/MIT to learn more about the MIT License.

STRUCTURE

Contributors julianaus skasberger hutershvili

Last run failed with status code 1.

Console output of last run

Injecting configuration and compiling...
Injecting scraper and running...
Traceback (most recent call last):
  File "scraper.py", line 15, in <module>
    import pandas as pd
ImportError: No module named pandas

Statistics

Total run time: less than 20 seconds

Total cpu time used: less than 5 seconds

Total disk space used: 68.8 KB

History

  • Manually ran revision f1b1b19c and failed.
    Nothing changed in the database.
  • Manually ran revision 723766e1 and failed.
    Nothing changed in the database.
  • Manually ran revision 723766e1 and failed.
    Nothing changed in the database.
  • Manually ran revision 0c9eb374 and failed.
    Nothing changed in the database.
  • Created on morph.io

Scraper code

Python

lobbyscraper / scraper.py