This is a scraper that runs on Morph. To get started, see the documentation.

Contributors: andylolz, chrismytton

Last run failed with status code 1.

Console output of last run

Injecting configuration and compiling...
-----> Python app detected
 !     The latest version of Python 2 is python-2.7.14 (you are using python-2.7.9, which is unsupported).
 !     We recommend upgrading by specifying the latest version (python-2.7.14).
       Learn More: https://devcenter.heroku.com/articles/python-runtimes
-----> Installing python-2.7.9
-----> Installing pip
-----> Installing requirements with pip
       DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7. More details about Python 2 support in pip, can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support
       Obtaining scraperwiki from git+http://github.com/openaustralia/scraperwiki-python.git@morph_defaults#egg=scraperwiki (from -r /tmp/build/requirements.txt (line 6))
       Cloning http://github.com/openaustralia/scraperwiki-python.git (to revision morph_defaults) to /app/.heroku/src/scraperwiki
       Running command git clone -q http://github.com/openaustralia/scraperwiki-python.git /app/.heroku/src/scraperwiki
       Running command git checkout -b morph_defaults --track origin/morph_defaults
       Switched to a new branch 'morph_defaults'
       Branch morph_defaults set up to track remote branch morph_defaults from origin.
       Collecting beautifulsoup4==4.4.0
       Downloading https://files.pythonhosted.org/packages/9d/c8/cd70aabb46af8f30ed83c15287c3d8b1455ba7ee923b03870ee0cdb6ec4f/beautifulsoup4-4.4.0-py2-none-any.whl (81kB)
       Collecting dumptruck>=0.1.2
       Downloading https://files.pythonhosted.org/packages/15/27/3330a343de80d6849545b6c7723f8c9a08b4b104de964ac366e7e6b318df/dumptruck-0.1.6.tar.gz
       Collecting requests
       Downloading https://files.pythonhosted.org/packages/51/bd/23c926cd341ea6b7dd0b2a00aba99ae0f828be89d72b2190f27c11d4b7fb/requests-2.22.0-py2.py3-none-any.whl (57kB)
       Collecting certifi>=2017.4.17
       Downloading https://files.pythonhosted.org/packages/b9/63/df50cac98ea0d5b006c55a399c3bf1db9da7b5a24de7890bc9cfd5dd9e99/certifi-2019.11.28-py2.py3-none-any.whl (156kB)
       Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1
       Downloading https://files.pythonhosted.org/packages/b4/40/a9837291310ee1ccc242ceb6ebfd9eb21539649f193a7c8c86ba15b98539/urllib3-1.25.7-py2.py3-none-any.whl (125kB)
       Collecting idna<2.9,>=2.5
       Downloading https://files.pythonhosted.org/packages/14/2c/cd551d81dbe15200be1cf41cd03869a46fe7226e7450af7a6545bfc474c9/idna-2.8-py2.py3-none-any.whl (58kB)
       Collecting chardet<3.1.0,>=3.0.2
       Downloading https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl (133kB)
       Building wheels for collected packages: dumptruck
       Building wheel for dumptruck (setup.py): started
       Building wheel for dumptruck (setup.py): finished with status 'done'
       Created wheel for dumptruck: filename=dumptruck-0.1.6-cp27-none-any.whl size=11845 sha256=8e2445990f6693884b31bd5225c77fb62cce6a714d669419e8744d2ff247762b
       Stored in directory: /tmp/pip-ephem-wheel-cache-eF9qSy/wheels/57/df/83/32654ae89119876c7a7db66829bbdb646caa151589dbaf226e
       Successfully built dumptruck
       Installing collected packages: dumptruck, certifi, urllib3, idna, chardet, requests, scraperwiki, beautifulsoup4
       Running setup.py develop for scraperwiki
       Successfully installed beautifulsoup4-4.4.0 certifi-2019.11.28 chardet-3.0.4 dumptruck-0.1.6 idna-2.8 requests-2.22.0 scraperwiki urllib3-1.25.7
       DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7. More details about Python 2 support in pip, can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support
-----> Discovering process types
       Procfile declares types -> scraper
Injecting scraper and running...
Traceback (most recent call last):
  File "scraper.py", line 141, in <module>
    data += scrape_list(gender)
  File "scraper.py", line 110, in scrape_list
    member = scrape_person(url, id_)
  File "scraper.py", line 38, in scrape_person
    r = fetch_url(url, "member-{}.html".format(id_))
  File "scraper.py", line 27, in fetch_url
    r = requests.get(url).text
  File "/app/.heroku/python/lib/python2.7/site-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/app/.heroku/python/lib/python2.7/site-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/app/.heroku/python/lib/python2.7/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/app/.heroku/python/lib/python2.7/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/app/.heroku/python/lib/python2.7/site-packages/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', error(104, 'Connessione interrotta dal corrispondente'))
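
The traceback ends in requests.exceptions.ConnectionError with errno 104, which is a connection reset by the peer (the Italian message says the same). A single reset currently aborts the whole run at fetch_url. One possible mitigation, shown here only as a sketch and not as the scraper's actual code, is to retry the request a few times with exponential backoff. The names fetch_with_retries and fetch_text, the attempt count, the delays, and the 30-second timeout are all assumptions made for illustration; the sketch is written for Python 3, whereas the scraper above runs on Python 2.7.

```python
import time


def fetch_with_retries(fetch, url, attempts=3, backoff=1.0,
                       retry_on=(IOError,), sleep=time.sleep):
    """Call fetch(url), retrying on transient errors with exponential backoff.

    Hypothetical helper. requests' exceptions, including ConnectionError,
    derive from IOError, so the default retry_on covers them.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            # Wait backoff, 2 * backoff, 4 * backoff, ... between attempts.
            sleep(backoff * 2 ** (attempt - 1))


def fetch_text(url):
    """Download a page as text (hypothetical stand-in for fetch_url)."""
    # Imported lazily so fetch_with_retries can be used without requests.
    import requests
    return requests.get(url, timeout=30).text
```

Under these assumptions, the call inside fetch_url could become fetch_with_retries(fetch_text, url). Whether retrying is enough depends on why the server resets the connection; if it is rate limiting, a pause between successive member pages may also be needed.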

Data

Downloaded 1122 times by everypolitician, chrismytton


Download table (as CSV) Download SQLite database (563 KB) Use the API

rows 10 / 1094

end_date | election_list | id | birth_date | start_date | name | source | group | area | gender | term | email | area_id | image
 |  | 306683 |  |  | MINNUCCI Emiliano |  |  |  | male | 17 |  |  |
2013-04-03 | FRATELLI D'ITALIA | 302757 | 1963-11-01 |  | CORSARO Massimo Enrico |  | MISTO - non iscritto ad alcuna componente politica | LOMBARDIA 1 | male | 17 | CORSARO_M@CAMERA.IT | III |
 |  | 302946 |  |  | CUOMO Antonio |  |  |  | male | 17 |  |  |
 |  | 306687 |  |  | CAMANI Vanessa |  |  |  | female | 17 |  |  |
 |  | 306681 |  |  | CIRACI' Nicola |  |  |  | male | 17 |  |  |
 |  | 301565 |  |  | VICO Ludovico |  |  |  | male | 17 |  |  |
2015-11-06 | SINISTRA ECOLOGIA LIBERTA' | 33040 | 1957-04-15 | 2014-11-20 | FAVA Claudio |  | MISTO - PARTITO SOCIALISTA ITALIANO (PSI) - LIBERALI PER L'ITALIA (PLI) | LOMBARDIA 1 | male | 17 |  | III |
2016-10-27 | MOVIMENTO 5 STELLE BEPPEGRILLO.IT | 305528 | 1986-10-17 | 2016-02-12 | CATALANO Ivan |  | MISTO - non iscritto ad alcuna componente politica | LOMBARDIA 2 | male | 17 |  | IV |
 |  | 307083 |  |  | LA TERZA Valentina |  |  |  | female | 17 |  |  |
 |  | 306743 |  |  | ALTIERI Trifone |  |  |  | male | 17 |  |  |

Statistics

Average successful run time: about 2 hours

Total run time: about 2 months

Total cpu time used: about 12 hours

Total disk space used: 601 KB

History

  • Auto ran revision 2ddffc6f and failed.
    nothing changed in the database
  • Auto ran revision 2ddffc6f and failed.
    nothing changed in the database
  • Auto ran revision 2ddffc6f and failed.
    nothing changed in the database
  • Auto ran revision 2ddffc6f and failed.
    nothing changed in the database
  • Auto ran revision 2ddffc6f and failed.
    nothing changed in the database
  • ...
  • Created on morph.io


Scraper code

Python

italy-camera / scraper.py