petrbouchal / GovJobsCZ

Scraper for Czech civil service jobs

Scrapes www.mpo.cz, www.mpsv.cz, www.vlada.cz, and 5 other domains

The website of the Ministry of Industry and Trade offers visitors comprehensive information about the work of the ministry, whose portfolio spans a wide range of issues affecting the economy of the Czech Republic.



Vacancies in the civil service

  • a script for downloading job vacancies from the websites of Czech ministries
  • very rough version
  • optimised for morph.io
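The core step the script performs for each ministry (read the vacancy page, pull out the advertised positions, report how many were found) can be sketched with the standard library alone. This is a hypothetical illustration, not the code in scraper.py, which according to the console log uses BeautifulSoup 4.4.1 on Python 2.7. The assumption that each vacancy is an `<a>` link carrying a given CSS class, and the names `JobLinkParser`, `extract_jobs` and `link_class`, are invented for the example; real ministry pages each need their own selectors. The printed message mirrors the log's "Nalezeno N pozic na DEPT" ("Found N positions at DEPT").

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class JobLinkParser(HTMLParser):
    """Collect (title, absolute URL) for every <a> whose class list contains
    `link_class`. The markup convention is a hypothetical assumption."""

    def __init__(self, base_url, link_class):
        super().__init__()
        self.base_url = base_url
        self.link_class = link_class
        self._href = None   # href of the job link currently open, if any
        self._text = []     # text fragments seen inside that link
        self.jobs = []      # collected (title, url) pairs

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self.link_class in classes and attrs.get("href"):
            # Resolve relative links against the page URL.
            self._href = urljoin(self.base_url, attrs["href"])
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse line breaks and repeated spaces inside the title.
            title = " ".join("".join(self._text).split())
            self.jobs.append((title, self._href))
            self._href = None


def extract_jobs(html, base_url, link_class, dept):
    """Parse one vacancy page and print a count in the style of the log."""
    parser = JobLinkParser(base_url, link_class)
    parser.feed(html)
    parser.close()
    print("Nalezeno %d pozic na %s" % (len(parser.jobs), dept))
    return parser.jobs


# Usage with a made-up fragment (not real MPO markup):
SAMPLE = """<ul>
  <li><a class="job" href="/cz/volna-mista/1">Referent/referentka
      do oddělení investic</a></li>
  <li><a class="nav" href="/cz/kontakty">Kontakty</a></li>
</ul>"""
jobs = extract_jobs(SAMPLE, "http://www.mpo.cz/", "job", "MPO")
# prints: Nalezeno 1 pozic na MPO
```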

Documentation

See wiki


Contributors: petrbouchal

Last run failed with status code 1.

Console output of last run

Injecting configuration and compiling...
-----> Python app detected
 ! The latest version of Python 2 is python-2.7.14 (you are using python-2.7.6, which is unsupported).
 ! We recommend upgrading by specifying the latest version (python-2.7.14).
 Learn More: https://devcenter.heroku.com/articles/python-runtimes
-----> Installing python-2.7.6
-----> Installing pip
-----> Installing requirements with pip
 Collecting beautifulsoup4==4.4.1 (from -r /tmp/build/requirements.txt (line 1))
 /app/.heroku/python/lib/python2.7/site-packages/pip/_vendor/urllib3/util/ssl_.py:339: SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name Indication) extension to TLS is not available on this platform. This may cause the server to present an incorrect TLS certificate, which can cause validation failures. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
 SNIMissingWarning
 /app/.heroku/python/lib/python2.7/site-packages/pip/_vendor/urllib3/util/ssl_.py:137: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
 InsecurePlatformWarning
 /app/.heroku/python/lib/python2.7/site-packages/pip/_vendor/urllib3/util/ssl_.py:137: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
 InsecurePlatformWarning
 Downloading https://files.pythonhosted.org/packages/33/62/f3e97eaa87fc4de0cb9b8c51d253cf0df621c6de6b25164dcbab203e5ff7/beautifulsoup4-4.4.1-py2-none-any.whl (81kB)
 Installing collected packages: beautifulsoup4
 Successfully installed beautifulsoup4-4.4.1

 ! Hello! It looks like your application is using an outdated version of Python.
 ! This caused the security warning you saw above during the 'pip install' step.
 ! We recommend 'python-3.6.2', which you can specify in a 'runtime.txt' file.
 ! -- Much Love, Heroku.

-----> Discovering process types
 Procfile declares types -> scraper
Injecting scraper and running...
Starting scraper...
4.4.1
http://www.mpo.cz/cz/rozcestnik/ministerstvo/pracovni-prilezitosti/
Nalezeno 2 pozic na MPO
http://www.mpsv.cz/cs/22466
Nalezeno 16 pozic na MPSV
http://www.vlada.cz/scripts/detail.php?pgid=445
Nalezeno 4 pozic na ÚV
http://www.msmt.cz/ministerstvo/volna-mista
http://www.msmt.cz/modules/marwel/index.php?rewrite=ministerstvo%2Fvolna-mista&str=2
http://www.msmt.cz/modules/marwel/index.php?rewrite=ministerstvo%2Fvolna-mista&str=3
Nalezeno 31 pozic na MŠMT
http://mmr.cz/cs/Neobsazena-mista/Pracovni-mista
Nalezeno 0 pozic na MMR
http://mmr.cz/cs/Neobsazena-mista/Sluzebni-mista
Nalezeno 0 pozic na MMR
http://www.mvcr.cz/nabidka-mist.aspx
Nalezeno 1 pozic na MV
http://eagri.cz/public/web/mze/ministerstvo-zemedelstvi/volna-pracovni-mista/?pageSize=50
Nalezeno 7 pozic na MZe
http://www.mocr.army.cz/ministr-a-ministerstvo/kariera-vzdelavani/pracovni-prilezitosti/default.htm
Error opening page
Traceback (most recent call last):
  File "scraper.py", line 30, in <module>
    jobsallbodies = jobsallbodies + scrapepages(now, minparameters[dept])
  File "/app/lib_minscrapers.py", line 119, in scrapepages
    thispagejoblist = scrapejobs(timestamp, newbodydata)
  File "/app/lib_minscrapers.py", line 53, in scrapejobs
    page = open_checksnag(bodydata['jobsurl'])
  File "/app/lib_minscrapers.py", line 18, in open_checksnag
    response = open_withcookies(urltouse)
  File "/app/lib_minscrapers.py", line 12, in open_withcookies
    r = opener.open(request)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 410, in open
    response = meth(req, response)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found
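The traceback shows a single `HTTP Error 404` from the Ministry of Defence page (www.mocr.army.cz) propagating from `open_withcookies` up through `scrapepages` to line 30 of scraper.py. Because nothing catches it, the whole run ends with status code 1 even though seven other ministries had already been scraped. One way to contain this is to catch fetch errors per department and carry on. The sketch below is written for Python 3's `urllib` rather than the Python 2.7 `urllib2` seen in the log; `fetch`, `scrape_all`, the `fetch_page` callback and the `departments` mapping are hypothetical stand-ins for the repository's `open_withcookies`, `scrapepages` and `minparameters`, not its actual code.

```python
import urllib.error
import urllib.request


def fetch(url, timeout=30):
    """Download a page and return it as text (Python 3 counterpart of urllib2)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def scrape_all(departments, scrape_page, fetch_page=fetch):
    """Scrape each department in turn without letting one failure abort the run.

    departments: dict mapping a department code to its vacancies URL
                 (hypothetical stand-in for the original `minparameters`).
    scrape_page: callable(html, dept) -> list of job records.
    Returns (jobs, failures); failures maps dept -> error message.
    """
    jobs, failures = [], {}
    for dept, url in departments.items():
        try:
            html = fetch_page(url)
        except urllib.error.HTTPError as exc:
            # HTTPError subclasses URLError, so it must be caught first.
            failures[dept] = str(exc)         # e.g. "HTTP Error 404: Not Found"
            continue
        except urllib.error.URLError as exc:
            failures[dept] = str(exc.reason)  # DNS failure, refused connection, ...
            continue
        jobs.extend(scrape_page(html, dept))
    return jobs, failures
```

Returning the failures instead of only printing them leaves the decision to the caller: it can still save the partial results, then exit non-zero so that morph.io keeps flagging the broken MO address.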

Data

Downloaded 13,821 times by petrbouchal and MikeRalphson


The data can be downloaded as a CSV table or as an SQLite database (67.8 MB), or accessed through the API.
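With the SQLite database downloaded, vacancy counts per department can be computed using Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the table name defaults to `data`, which is a guess rather than something this page states (list the real names with `SELECT name FROM sqlite_master WHERE type = 'table'`), and the columns are taken to be the four shown in the data preview: `jobtitle`, `joburl`, `dept` and `datetime`. The function name `vacancies_per_dept` is hypothetical.

```python
import sqlite3


def vacancies_per_dept(conn, table="data"):
    """Return [(dept, rows, distinct_titles)] ordered by row count, largest first.

    Assumptions: the table name ("data" is a guess) and the columns
    jobtitle, joburl, dept, datetime seen in the data preview.
    """
    # Identifiers cannot be bound as "?" parameters, so quote the name safely.
    quoted = '"' + table.replace('"', '""') + '"'
    sql = (
        "SELECT dept, COUNT(*) AS n, COUNT(DISTINCT jobtitle) "
        "FROM " + quoted + " GROUP BY dept ORDER BY n DESC, dept"
    )
    return conn.execute(sql).fetchall()
```

For example, `vacancies_per_dept(sqlite3.connect("data.sqlite"))`, where `data.sqlite` is a hypothetical local name for the downloaded file. All ten preview rows share one timestamp, which suggests each run stores a snapshot of every listed vacancy; if that holds, the distinct-title count is the better proxy for how many separate positions a department advertised.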

Preview: 10 of 245,938 rows

jobtitle | joburl | dept | datetime
Referent/referentka do oddělení liniových staveb |  | MPO | 2014-08-31 00:06:42 +1000
Referent/referentka do oddělení společné obchodní politiky, WTO a ostatních mezinárodních ekonomických organizací |  | MPO | 2014-08-31 00:06:42 +1000
Referent/referentka do oddělení energetických úspor |  | MPO | 2014-08-31 00:06:42 +1000
Referenta/referentky do oddělení investic |  | MPO | 2014-08-31 00:06:42 +1000
Referent/referentka do oddělení monitoringu a vzdělávání |  | MPO | 2014-08-31 00:06:42 +1000
Odborný lékař – přísedící při jednání posudkové komise MPSV |  | MPSV | 2014-08-31 00:06:42 +1000
Projektový/á manažer/ka se zaměřením na OP LZZ v oddělení realizace programů – zaměstnanost |  | MPSV | 2014-08-31 00:06:42 +1000
referent/ka v oddělení financí a rozpočtu |  | MPSV | 2014-08-31 00:06:42 +1000
Odbor Rady pro výzkum, vývoj a inovace, hledá vedoucí/ho Oddělení informačního systému výzkumu a experimentálního vývoje a inovací |  | ÚV | 2014-08-31 00:06:42 +1000
Odbor evropských fondů - oddělení monitoringu, kontroly a technické pomoci |  | MZd | 2014-08-31 00:06:42 +1000
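Each `datetime` value carries an explicit UTC offset (`+1000` in every preview row), so it can be parsed into a timezone-aware timestamp and normalised to UTC before rows from different runs are compared. A minimal standard-library sketch; the format string is inferred from the ten preview rows only, and `parse_stamp` is a hypothetical helper name.

```python
from datetime import datetime, timezone

# Format inferred from preview values such as "2014-08-31 00:06:42 +1000".
STAMP_FORMAT = "%Y-%m-%d %H:%M:%S %z"


def parse_stamp(value):
    """Parse a `datetime` cell into an aware datetime normalised to UTC."""
    return datetime.strptime(value, STAMP_FORMAT).astimezone(timezone.utc)


utc = parse_stamp("2014-08-31 00:06:42 +1000")
print(utc.isoformat())  # 2014-08-30T14:06:42+00:00
```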

Statistics

Average successful run time: 4 minutes

Total run time: about 1 month

Total cpu time used: about 2 hours

Total disk space used: 67.9 MB

History

  • Auto ran revision 625eecfb and failed.
    nothing changed in the database
    30 pages scraped
  • Auto ran revision 625eecfb and failed.
    nothing changed in the database
    30 pages scraped
  • Auto ran revision 625eecfb and failed.
    nothing changed in the database
    25 pages scraped
  • Auto ran revision 625eecfb and failed.
    nothing changed in the database
    25 pages scraped
  • Auto ran revision 625eecfb and failed.
    nothing changed in the database
    25 pages scraped
  • ...
  • Created on morph.io


Scraper code

Python

GovJobsCZ / scraper.py