wfdd / colombia-senado-scraper

A morph.io scraper for the members of the Colombian Senate.

Contributors: wfdd

Last run completed successfully.

Console output of last run

Injecting configuration and compiling...
-----> Python app detected
 !     The latest version of Python 3 is python-3.6.2 (you are using python-3.6.0, which is unsupported).
 !     We recommend upgrading by specifying the latest version (python-3.6.2).
       Learn More: https://devcenter.heroku.com/articles/python-runtimes
-----> Installing python-3.6.0
-----> Installing pip
-----> Installing requirements with pip
       Collecting aiohttp==1.1.5 (from -r /tmp/build/requirements.txt (line 1))
         Downloading https://files.pythonhosted.org/packages/5f/60/afb29b5712ade524efdce339e2a6a0cb69c44115804ab5d4e976bf3f1983/aiohttp-1.1.5.tar.gz (510kB)
       Collecting async-timeout==1.1.0 (from -r /tmp/build/requirements.txt (line 2))
         Downloading https://files.pythonhosted.org/packages/58/18/0747349c48d690f7d78fc1824e27a071534828023d005a4dd3308d9448f0/async_timeout-1.1.0-py3-none-any.whl
       Collecting chardet==2.3.0 (from -r /tmp/build/requirements.txt (line 3))
         Downloading https://files.pythonhosted.org/packages/7e/5c/605ca2daa5cf21c87690d8fe6ab05a6f2278c451f4ede6456dd26453f4bd/chardet-2.3.0-py2.py3-none-any.whl (180kB)
       Collecting lxml==3.6.4 (from -r /tmp/build/requirements.txt (line 4))
         Downloading https://files.pythonhosted.org/packages/4f/3f/cf6daac551fc36cddafa1a71ed48ea5fd642e5feabd3a0d83b8c3dfd0cb4/lxml-3.6.4.tar.gz (3.7MB)
       Collecting multidict==2.1.2 (from -r /tmp/build/requirements.txt (line 5))
         Downloading https://files.pythonhosted.org/packages/8b/99/a32210e82198db00d071aa207432b898ddd8061000d00d3841a63a734d31/multidict-2.1.2.tar.gz (91kB)
       Collecting uvloop==0.6.5 (from -r /tmp/build/requirements.txt (line 6))
         Downloading https://files.pythonhosted.org/packages/78/f7/9a1ee6b9576e29296355147c60e641263b4c12c2e6146cc032a5cc1b0861/uvloop-0.6.5.tar.gz (2.0MB)
       Collecting yarl==0.7.1 (from -r /tmp/build/requirements.txt (line 7))
         Downloading https://files.pythonhosted.org/packages/e6/d1/0de27bc2350e679ac0f241f6641f75229e5b88ae8a7a0f2b8f79c871afca/yarl-0.7.1.tar.gz (117kB)
       Installing collected packages: chardet, multidict, async-timeout, yarl, aiohttp, lxml, uvloop
         Running setup.py install for multidict: started
         Running setup.py install for multidict: finished with status 'done'
         Running setup.py install for yarl: started
         Running setup.py install for yarl: finished with status 'done'
         Running setup.py install for aiohttp: started
         Running setup.py install for aiohttp: finished with status 'done'
         Running setup.py install for lxml: started
         Running setup.py install for lxml: still running...
         Running setup.py install for lxml: finished with status 'done'
         Running setup.py install for uvloop: started
         Running setup.py install for uvloop: finished with status 'done'
       Successfully installed aiohttp-1.1.5 async-timeout-1.1.0 chardet-2.3.0 lxml-3.6.4 multidict-2.1.2 uvloop-0.6.5 yarl-0.7.1
-----> Discovering process types
       Procfile declares types -> scraper
Injecting scraper and running...
extract_emails Couldn't find email in http://www.secretariasenado.gov.co/?Itemid=109&id=192%3Aaguilar-hurtado-nerthink-mauricio&option=com_content&view=article
extract_emails Couldn't find email in http://www.secretariasenado.gov.co/?Itemid=109&id=1177%3Auribe-velez-alvaro&option=com_content&view=article
extract_photo Discarding http://www.secretariasenado.gov.co/images/banners/Jose%2520David%2520Name%2520Cardozo.jpg in http://www.secretariasenado.gov.co/?Itemid=109&id=141%3Aname-cardozo-jose-david&option=com_content&view=article. Received error code 404
extract_emails Couldn't find email in http://www.secretariasenado.gov.co/?Itemid=109&id=1130%3Aprieto-riveros-jorge&option=com_content&view=article
extract_photo Discarding http://www.secretariasenado.gov.co/images/banners/Juan%2520Carlos%2520Restepo%2520Escobar.jpg in http://www.secretariasenado.gov.co/?Itemid=109&id=80%3Arestrepo-escobar-juan-carlos&option=com_content&view=article. Received error code 404
extract_photo Discarding http://www.secretariasenado.gov.co/images/banners/Lidio%2520Arturo%2520Garcia%2520Turbay.jpg in http://www.secretariasenado.gov.co/?Itemid=109&id=117%3Agarcia-turbay-lidio-arturo&option=com_content&view=article. Received error code 404
extract_emails Couldn't find email in http://www.secretariasenado.gov.co/?Itemid=109&id=1143%3Acorrea-borrero-susana&option=com_content&view=article
extract_website Email found in http://www.secretariasenado.gov.co/?Itemid=109&id=1173%3Aramos-maya-alfredo&option=com_content&view=article; skipping
extract_photo Discarding http://www.secretariasenado.gov.co/images/banners/Luis%2520Emilio%2520Sierra%2520Grajales.jpg in http://www.secretariasenado.gov.co/?Itemid=109&id=176%3Asierra-grajales-luis-emilio&option=com_content&view=article. Received error code 404
extract_photo Discarding http://www.secretariasenado.gov.co/images/banners/Jaime%2520Enrique%2520Duran%2520Barrera.jpg in http://www.secretariasenado.gov.co/?Itemid=109&id=107%3Aduran-barrera-jaime-enrique&option=com_content&view=article. Received error code 404
extract_website Email found in http://www.secretariasenado.gov.co/?Itemid=109&id=1144%3Acristo-bustos-andres&option=com_content&view=article; skipping
extract_photo Discarding http://www.secretariasenado.gov.co/images/banners/CARLOS%2520EDUARDO%2520ENRRIQUEZ%2520MAYA.jpg in http://www.secretariasenado.gov.co/?Itemid=109&id=109%3Aenriquez-maya-carlos-eduardo&option=com_content&view=article. Received error code 404
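
Every photo URL discarded in the console output contains `%2520`, which is what a space becomes when it is percent-encoded twice (first to `%20`, then the `%` itself to `%25`). Whether this double encoding is what causes the 404 responses is an assumption; the log does not confirm it. The sketch below only shows how such a URL can arise and how one encoding layer could be undone. `fix_double_encoding` is a hypothetical helper and is not taken from the scraper's code.

```python
from urllib.parse import quote, unquote

# File name as it would appear with plain spaces.
name = "Jose David Name Cardozo.jpg"

once = quote(name)    # each space becomes %20
twice = quote(once)   # the % of each %20 becomes %25, giving %2520


def fix_double_encoding(url):
    """Hypothetical helper: undo one layer of percent-encoding when the
    URL contains the double-encoded space marker %2520."""
    return unquote(url) if "%2520" in url else url


broken = "http://www.secretariasenado.gov.co/images/banners/" + twice
repaired = fix_double_encoding(broken)

print(broken)
print(repaired)
```

The helper leaves a correctly encoded URL untouched, so applying it before every photo request would be harmless if the assumption turns out to be wrong.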

Data

Downloaded 207 times by everypolitician and wfdd

Showing 10 of 101 rows

Columns: id, name, image, group, term, email, website, phone, facebook, twitter, place_of_birth, source
Fields not listed for a record are empty in this view.

id:             1331:acuna-diaz-laureano-augusto
name:           ACUÑA DIAZ LAUREANO AUGUSTO
group:          PARTIDO CONSERVADOR
term:           2014
email:          utl.laureanoacuna@senado.gov.co
phone:          3823262 - 3823263

id:             192:aguilar-hurtado-nerthink-mauricio
name:           NERTHINK MAURICIO AGUILAR HURTADO
group:          PARTIDO DE INTEGRACIÓN NACIONAL
term:           2014
phone:          3823000 EXT 3335-3334-4398
twitter:        SenadorAguilar
place_of_birth: Bucaramanga - Santander

id:             1127:alvarez-montenegro-javier-tato
name:           ÁLVAREZ MONTENEGRO JAVIER TATO
group:          PARTIDO LIBERAL
term:           2014
email:          tomachch@yahoo.es
phone:          3823000 EXT 3531-3032
place_of_birth: Pasto - Nariño

id:             1126:amin-escaf-miguel
name:           AMÍN ESCAF MIGUEL
group:          PARTIDO DE LA U
term:           2014
email:          Sabalzakm@hotmail.com
phone:          3823000 EXT 3740
place_of_birth: Barranquilla - Atlántico

id:             1128:amin-hernandez-jaime-alejandro
name:           AMIN HERNANDEZ JAIME ALEJANDRO
group:          PARTIDO CENTRO DEMOCRATICO
term:           2014
email:          leidycasallas_27@hotmail.com
phone:          3823000 EXT 4444-4443
place_of_birth: Barranquilla - Atlantico

id:             1129:andrade-casama-luis-evelis
name:           ANDRADE CASAMA LUIS EVELIS
group:          PARTIDO MOVIMIENTO ALTERNATIVO INDIGENA Y SOCIAL “MAIS”
term:           2014
email:          utlsenadorleac@gmail.com
phone:          3823000 EXT 4347-4349
twitter:        Luis_Evelis
place_of_birth: Rio Sucio -  Choco

id:             194:andrade-serrano-hernan-francisco
name:           HERNÁN FRANCISCO ANDRADE SERRANO
group:          PARTIDO CONSERVADOR
term:           2014
email:          hector.alfonso.lopez@senado.gov.co
phone:          3823000 EXT 3162-3163
twitter:        AndradeSenador
place_of_birth: Neiva - Huila

id:             1131:araujo-rumie-fernando-nicolas
name:           ARAUJO RUMIE FERNANDO NICOLAS
group:          PARTIDO CENTRO DEMOCRÁTICO
term:           2014
email:          nicolas.araujo@senado.gov.co;caronader@hotmail.com
phone:          3823000 EXT 3358-3359
facebook:       https://www.facebook.com/Senador Fernando Nicolás Araujo
twitter:        FNAraujoR
place_of_birth: Cartagena - Bolívar

id:             71:ashton-giraldo-alvaro-antonio
name:           ÁLVARO ANTONIO ASTHON GIRALDO
group:          PARTIDO LIBERAL
term:           2014
email:          alvaroashton11@gmail.com
phone:          3823000 EXT 3345-3346-3407
twitter:        ALVAROASHTON
place_of_birth: Barranquilla - Atlántico

id:             196:avirama-marco-anibal
name:           MARCO ANIBAL AVIRAMA AVIRAMA
group:          PARTIDO ALIANZA SOCIAL INDEPENDIENTE
term:           2014
email:          marco.avirama.avirama@senado.gov.co
phone:          3823000 EXT 4012-4013-4045
place_of_birth: Puracé - Cauca
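
Some values in the rows shown here pack several items into one field: one email value holds two addresses separated by `;`, and phone values either list two numbers or a main number followed by `EXT` and hyphen-separated extensions. The sketch below shows one way a consumer of the data might split these fields. Reading `EXT` as an extension marker is an assumption drawn from the sample values, and `split_emails` and `parse_phone` are hypothetical helpers that are not part of the scraper.

```python
import re


def split_emails(raw):
    """Split a semicolon-separated email field into a list of addresses."""
    return [part.strip() for part in raw.split(";") if part.strip()]


def parse_phone(raw):
    """Separate main numbers from extensions.

    Assumes everything before "EXT" is one or more main numbers and
    everything after it is a list of extensions.
    """
    main, _, ext = raw.partition("EXT")
    return {
        "numbers": re.findall(r"\d+", main),
        "extensions": re.findall(r"\d+", ext),
    }


# Sample values copied from the rows shown on this page.
print(split_emails("nicolas.araujo@senado.gov.co;caronader@hotmail.com"))
print(parse_phone("3823000 EXT 3335-3334-4398"))
print(parse_phone("3823262 - 3823263"))
```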

Statistics

Average successful run time: 3 minutes

Total run time: 1 day

Total cpu time used: 12 minutes

Total disk space used: 101 KB

History

  • Auto ran revision 665fde6b and completed successfully.
    101 records added, 101 records removed in the database
  • Auto ran revision 665fde6b and completed successfully.
    101 records added, 101 records removed in the database
  • Auto ran revision 665fde6b and completed successfully.
    101 records added, 101 records removed in the database
  • Auto ran revision 665fde6b and completed successfully.
    101 records added, 101 records removed in the database
    202 pages scraped
  • Auto ran revision 665fde6b and completed successfully.
    101 records added, 101 records removed in the database
    202 pages scraped
  • ...
  • Created on morph.io

Scraper code

Python

colombia-senado-scraper / scraper.py