soit-sk / slovakia_parliament

Slovak parliament session transcripts database


The scraper was created at the OpenScraper Challenge 2014 and improved in 2015.

Dependencies

The scraper has only a few dependencies, all listed in the requirements.txt file.

Scraper

This is a scraper that runs on morph.io. To get started, see the documentation.

Contributors: mnagy, katkad, ricco386, lkundrak

Last run completed successfully.

Console output of last run

    Injecting configuration and compiling...
    -----> Python app detected
    -----> Installing python-2.7.12
    $ pip install -r requirements.txt
    Collecting requests==2.8.0 (from -r /tmp/build/requirements.txt (line 1))
      Downloading requests-2.8.0-py2.py3-none-any.whl (476kB)
    Collecting scraperwiki==0.5.1 (from -r /tmp/build/requirements.txt (line 2))
      Downloading scraperwiki-0.5.1.tar.gz
    Collecting beautifulsoup4==4.4.1 (from -r /tmp/build/requirements.txt (line 3))
      Downloading beautifulsoup4-4.4.1-py2-none-any.whl (81kB)
    Collecting six (from scraperwiki==0.5.1->-r /tmp/build/requirements.txt (line 2))
      Downloading six-1.11.0-py2.py3-none-any.whl
    Collecting sqlalchemy (from scraperwiki==0.5.1->-r /tmp/build/requirements.txt (line 2))
      Downloading SQLAlchemy-1.1.14.tar.gz (5.2MB)
    Collecting alembic (from scraperwiki==0.5.1->-r /tmp/build/requirements.txt (line 2))
      Downloading alembic-0.9.5.tar.gz (990kB)
    Collecting Mako (from alembic->scraperwiki==0.5.1->-r /tmp/build/requirements.txt (line 2))
      Downloading Mako-1.0.7.tar.gz (564kB)
    Collecting python-editor>=0.3 (from alembic->scraperwiki==0.5.1->-r /tmp/build/requirements.txt (line 2))
      Downloading python-editor-1.0.3.tar.gz
    Collecting python-dateutil (from alembic->scraperwiki==0.5.1->-r /tmp/build/requirements.txt (line 2))
      Downloading python_dateutil-2.6.1-py2.py3-none-any.whl (194kB)
    Collecting MarkupSafe>=0.9.2 (from Mako->alembic->scraperwiki==0.5.1->-r /tmp/build/requirements.txt (line 2))
      Downloading MarkupSafe-1.0.tar.gz
    Installing collected packages: requests, six, sqlalchemy, MarkupSafe, Mako, python-editor, python-dateutil, alembic, scraperwiki, beautifulsoup4
      Running setup.py install for sqlalchemy: started
      Running setup.py install for sqlalchemy: finished with status 'done'
      Running setup.py install for MarkupSafe: started
      Running setup.py install for MarkupSafe: finished with status 'done'
      Running setup.py install for Mako: started
      Running setup.py install for Mako: finished with status 'done'
      Running setup.py install for python-editor: started
      Running setup.py install for python-editor: finished with status 'done'
      Running setup.py install for alembic: started
      Running setup.py install for alembic: finished with status 'done'
      Running setup.py install for scraperwiki: started
      Running setup.py install for scraperwiki: finished with status 'done'
    Successfully installed Mako-1.0.7 MarkupSafe-1.0 alembic-0.9.5 beautifulsoup4-4.4.1 python-dateutil-2.6.1 python-editor-1.0.3 requests-2.8.0 scraperwiki-0.5.1 six-1.11.0 sqlalchemy-1.1.14
    -----> Discovering process types
    Procfile declares types -> scraper
    Injecting scraper and running...
    Got 2 rows for page #1076
    [{'term_nr': 7, 'proceedings_video': u'http://tv.nrsr.sk/archiv/schodza/7/19', 'transcript': u'http://tv.nrsr.sk/transcript?id=183685', 'meeting_number': 19, 'time_from': datetime.datetime(2017, 9, 19, 11, 36, 41), 'time_to': datetime.datetime(2017, 9, 19, 11, 36, 55), 'member': u'Bug\xe1r, B\xe9la', 'speech_video': u'http://tv.nrsr.sk/archiv/schodza/7/19?id=183685'},
     {'term_nr': 7, 'proceedings_video': u'http://tv.nrsr.sk/archiv/schodza/7/19', 'transcript': u'http://tv.nrsr.sk/transcript?id=183680', 'meeting_number': 19, 'time_from': datetime.datetime(2017, 9, 19, 11, 36, 40), 'time_to': datetime.datetime(2017, 9, 19, 11, 36, 54), 'member': u'\u0160ebej, Franti\u0161ek', 'speech_video': u'http://tv.nrsr.sk/archiv/schodza/7/19?id=183680'}]
    Got 0 rows for page #1077
    []
    No data for page #1077, ending
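The log shows the scraper walking numbered result pages and stopping at the first page that returns no rows ("No data for page #1077, ending"). A minimal Python 3 sketch of that loop follows; `fetch_page` is a hypothetical stand-in for whatever downloads and parses one page, the start page of 1 is an assumption, and this is not the scraper's actual code (which runs on Python 2.7 according to the log).

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]


def scrape_pages(fetch_page: Callable[[int], List[Row]], start: int = 1) -> Iterator[Row]:
    """Yield rows page by page, stopping at the first empty page.

    fetch_page is a hypothetical callable that downloads and parses one
    listing page; it is not a function from the real scraper.
    """
    page = start
    while True:
        rows = fetch_page(page)
        print("Got %d rows for page #%d" % (len(rows), page))
        if not rows:
            # An empty page is taken to mean the archive has been exhausted.
            print("No data for page #%d, ending" % page)
            return
        yield from rows
        page += 1


# Usage with a stand-in fetcher: two non-empty pages, then an empty one.
fake_pages = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}]}
for row in scrape_pages(lambda p: fake_pages.get(p, [])):
    print(row)
```

Stopping on the first empty page keeps the loop independent of knowing the total page count in advance.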

Data

Downloaded 8 times by jakubcevela loren Protesuiq MiroJanosik MikeRalphson


Available as a table (CSV), as an SQLite database (30.1 MB), or through the API.
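As a sketch of API access, assuming morph.io's usual `https://api.morph.io/<owner>/<scraper>/data.<format>?key=...&query=...` URL scheme and a table named `data` (both assumptions to verify against the morph.io documentation and the downloaded database), a query URL for this scraper could be built as follows. `YOUR_API_KEY` is a placeholder.

```python
from urllib.parse import urlencode

# Owner/scraper path taken from this page; the URL scheme is assumed, not confirmed here.
API_BASE = "https://api.morph.io/soit-sk/slovakia_parliament/data"


def morph_api_url(query: str, api_key: str, fmt: str = "json") -> str:
    """Build an API URL that runs an SQL query against this scraper's data."""
    # urlencode escapes the SQL (spaces become '+', commas '%2C').
    return "%s.%s?%s" % (API_BASE, fmt, urlencode({"key": api_key, "query": query}))


# The table name "data" is an assumption; check it in the SQLite download.
url = morph_api_url("SELECT member, time_from, time_to FROM data LIMIT 10", "YOUR_API_KEY")
print(url)
```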

Showing 10 of 48,809 rows:

meeting_number  time_from                  member           term_nr  time_to
39              2014-10-29T15:58:47+00:00  Vážny, Ľubomír   6        2014-10-29T16:02:29+00:00
42              2014-11-08T22:32:27+00:00  Laššáková, Jana  6        2014-11-08T22:42:16+00:00
42              2014-11-08T22:42:16+00:00  Laššáková, Jana  6        2014-11-08T22:44:15+00:00
42              2014-11-08T22:44:15+00:00  Laššáková, Jana  6        2014-11-08T22:46:21+00:00
42              2014-11-08T22:47:24+00:00  Laššáková, Jana  6        2014-11-08T22:48:26+00:00
42              2014-11-08T22:48:26+00:00  Laššáková, Jana  6        2014-11-08T22:50:30+00:00
42              2014-11-08T22:50:30+00:00  Laššáková, Jana  6        2014-11-08T22:52:17+00:00
42              2014-11-08T22:52:17+00:00  Laššáková, Jana  6        2014-11-08T22:54:02+00:00
42              2014-11-08T22:54:02+00:00  Laššáková, Jana  6        2014-11-08T22:56:06+00:00
42              2014-11-08T22:56:06+00:00  Laššáková, Jana  6        2014-11-08T22:58:11+00:00

The proceedings_video, speech_video and transcript columns hold URLs and are not reproduced here.
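Each row covers one speech from time_from to time_to, so per-member speaking time can be derived from the data. A minimal Python 3 sketch, assuming the timestamps are stored as ISO 8601 strings in a table named `data` (the table name is an assumption), run here on the ten sample rows above loaded into an in-memory database:

```python
import sqlite3
from collections import defaultdict
from datetime import datetime


def speaking_time_by_member(conn, table="data"):
    """Sum (time_to - time_from) in whole seconds per member.

    Assumes ISO 8601 timestamp strings; the table name is trusted input
    and its default "data" is an assumption about the downloaded file.
    """
    totals = defaultdict(int)
    query = "SELECT member, time_from, time_to FROM %s" % table
    for member, t_from, t_to in conn.execute(query):
        delta = datetime.fromisoformat(t_to) - datetime.fromisoformat(t_from)
        totals[member] += int(delta.total_seconds())
    return dict(totals)


# The ten sample rows shown above (member, time_from, time_to).
ROWS = [
    ("Vážny, Ľubomír", "2014-10-29T15:58:47+00:00", "2014-10-29T16:02:29+00:00"),
    ("Laššáková, Jana", "2014-11-08T22:32:27+00:00", "2014-11-08T22:42:16+00:00"),
    ("Laššáková, Jana", "2014-11-08T22:42:16+00:00", "2014-11-08T22:44:15+00:00"),
    ("Laššáková, Jana", "2014-11-08T22:44:15+00:00", "2014-11-08T22:46:21+00:00"),
    ("Laššáková, Jana", "2014-11-08T22:47:24+00:00", "2014-11-08T22:48:26+00:00"),
    ("Laššáková, Jana", "2014-11-08T22:48:26+00:00", "2014-11-08T22:50:30+00:00"),
    ("Laššáková, Jana", "2014-11-08T22:50:30+00:00", "2014-11-08T22:52:17+00:00"),
    ("Laššáková, Jana", "2014-11-08T22:52:17+00:00", "2014-11-08T22:54:02+00:00"),
    ("Laššáková, Jana", "2014-11-08T22:54:02+00:00", "2014-11-08T22:56:06+00:00"),
    ("Laššáková, Jana", "2014-11-08T22:56:06+00:00", "2014-11-08T22:58:11+00:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (member TEXT, time_from TEXT, time_to TEXT)")
conn.executemany("INSERT INTO data VALUES (?, ?, ?)", ROWS)
# Expected totals: Vážny, Ľubomír 222 s; Laššáková, Jana 1481 s.
print(speaking_time_by_member(conn))
```

Computing the difference in Python rather than in SQL keeps the timezone suffix handling in `datetime.fromisoformat` (Python 3.7+).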

Statistics

Average successful run time: about 2 hours

Total run time: 3 days

Total cpu time used: about 1 hour

Total disk space used: 30.2 MB

History

  • Manually ran revision 252b9645 and completed successfully.
    nothing changed in the database
  • Manually ran revision 252b9645 and completed successfully.
    nothing changed in the database
  • Manually ran revision 1b8d366d and completed successfully.
    nothing changed in the database
    578 pages scraped
  • Manually ran revision 1b8d366d and completed successfully.
    nothing changed in the database
    362 pages scraped
  • Manually ran revision 1b8d366d and completed successfully.
    nothing changed in the database
    3033 pages scraped
  • ...
  • Created on morph.io
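Re-runs that scrape hundreds of pages yet report "nothing changed in the database" are consistent with saving rows under a unique key, so re-scraped records overwrite existing ones instead of duplicating them. The scraperwiki library the scraper depends on offers this through `scraperwiki.sqlite.save` and its `unique_keys` argument; the sketch below shows the underlying idea with the standard library's sqlite3 instead. Using transcript as the key is a hypothetical choice, not taken from the scraper.

```python
import sqlite3


def save_rows(conn, rows):
    """Upsert rows keyed on the transcript URL (hypothetical key choice)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS data ("
        "transcript TEXT PRIMARY KEY, member TEXT, time_from TEXT, time_to TEXT)"
    )
    # INSERT OR REPLACE overwrites a row whose transcript already exists,
    # so re-scraping the same speeches does not add duplicate rows.
    conn.executemany(
        "INSERT OR REPLACE INTO data (transcript, member, time_from, time_to) "
        "VALUES (:transcript, :member, :time_from, :time_to)",
        rows,
    )
    conn.commit()


# Two records from the console output of the last run.
rows = [
    {"transcript": "http://tv.nrsr.sk/transcript?id=183685", "member": "Bugár, Béla",
     "time_from": "2017-09-19T11:36:41", "time_to": "2017-09-19T11:36:55"},
    {"transcript": "http://tv.nrsr.sk/transcript?id=183680", "member": "Šebej, František",
     "time_from": "2017-09-19T11:36:40", "time_to": "2017-09-19T11:36:54"},
]
conn = sqlite3.connect(":memory:")
save_rows(conn, rows)
save_rows(conn, rows)  # a second, identical run
count = conn.execute("SELECT COUNT(*) FROM data").fetchone()[0]
print(count)  # 2
```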


Scraper code

slovakia_parliament/scraper.py (Python)