howawong / hong_kong_primary_school_vacancies

Vacancy Situation of Public Sector Primary Schools


This is a scraper that runs on Morph. To get started, see the documentation.

Contributors: howawong

Last run failed with status code 1.

Console output of last run

Injecting configuration and compiling...
-----> Python app detected
 !     The latest version of Python 2 is python-2.7.14 (you are using python-2.7.9, which is unsupported).
 !     We recommend upgrading by specifying the latest version (python-2.7.14).
       Learn More: https://devcenter.heroku.com/articles/python-runtimes
-----> Installing python-2.7.9
-----> Installing pip
-----> Installing requirements with pip
       Obtaining scraperwiki from git+http://github.com/openaustralia/scraperwiki-python.git@morph_defaults#egg=scraperwiki (from -r /tmp/build/requirements.txt (line 6))
         Cloning http://github.com/openaustralia/scraperwiki-python.git (to revision morph_defaults) to /app/.heroku/src/scraperwiki
       Collecting lxml==3.4.4 (from -r /tmp/build/requirements.txt (line 8))
         Downloading https://files.pythonhosted.org/packages/63/c7/4f2a2a4ad6c6fa99b14be6b3c1cece9142e2d915aa7c43c908677afc8fa4/lxml-3.4.4.tar.gz (3.5MB)
       Collecting cssselect==0.9.1 (from -r /tmp/build/requirements.txt (line 9))
         Downloading https://files.pythonhosted.org/packages/aa/e5/9ee1460d485b94a6d55732eb7ad5b6c084caf73dd6f9cb0bb7d2a78fafe8/cssselect-0.9.1.tar.gz
       Collecting dumptruck>=0.1.2 (from scraperwiki->-r /tmp/build/requirements.txt (line 6))
         Downloading https://files.pythonhosted.org/packages/15/27/3330a343de80d6849545b6c7723f8c9a08b4b104de964ac366e7e6b318df/dumptruck-0.1.6.tar.gz
       Collecting requests (from scraperwiki->-r /tmp/build/requirements.txt (line 6))
         Downloading https://files.pythonhosted.org/packages/65/47/7e02164a2a3db50ed6d8a6ab1d6d60b69c4c3fdf57a284257925dfc12bda/requests-2.19.1-py2.py3-none-any.whl (91kB)
       Collecting idna<2.8,>=2.5 (from requests->scraperwiki->-r /tmp/build/requirements.txt (line 6))
         Downloading https://files.pythonhosted.org/packages/4b/2a/0276479a4b3caeb8a8c1af2f8e4355746a97fab05a372e4a2c6a6b876165/idna-2.7-py2.py3-none-any.whl (58kB)
       Collecting certifi>=2017.4.17 (from requests->scraperwiki->-r /tmp/build/requirements.txt (line 6))
         Downloading https://files.pythonhosted.org/packages/7c/e6/92ad559b7192d846975fc916b65f667c7b8c3a32bea7372340bfe9a15fa5/certifi-2018.4.16-py2.py3-none-any.whl (150kB)
       Collecting urllib3<1.24,>=1.21.1 (from requests->scraperwiki->-r /tmp/build/requirements.txt (line 6))
         Downloading https://files.pythonhosted.org/packages/bd/c9/6fdd990019071a4a32a5e7cb78a1d92c53851ef4f56f62a3486e6a7d8ffb/urllib3-1.23-py2.py3-none-any.whl (133kB)
       Collecting chardet<3.1.0,>=3.0.2 (from requests->scraperwiki->-r /tmp/build/requirements.txt (line 6))
         Downloading https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl (133kB)
       Installing collected packages: dumptruck, idna, certifi, urllib3, chardet, requests, scraperwiki, lxml, cssselect
         Running setup.py install for dumptruck: started
         Running setup.py install for dumptruck: finished with status 'done'
         Running setup.py develop for scraperwiki
         Running setup.py install for lxml: started
         Running setup.py install for lxml: still running...
         Running setup.py install for lxml: finished with status 'done'
         Running setup.py install for cssselect: started
         Running setup.py install for cssselect: finished with status 'done'
       Successfully installed certifi-2018.4.16 chardet-3.0.4 cssselect-0.9.1 dumptruck-0.1.6 idna-2.7 lxml-3.4.4 requests-2.19.1 scraperwiki urllib3-1.23

-----> Discovering process types
       Procfile declares types -> scraper
Injecting scraper and running...
Traceback (most recent call last):
  File "scraper.py", line 52, in <module>
    fetch_record('http://www.edb.gov.hk/attachment/en/student-parents/sch-info/sch-vacancy-situation/primary-sch/Primary_E.pdf')
  File "scraper.py", line 14, in fetch_record
    root = lxml.etree.fromstring(pdf)
  File "lxml.etree.pyx", line 3103, in lxml.etree.fromstring (src/lxml/lxml.etree.c:70569)
  File "parser.pxi", line 1828, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:106403)
  File "parser.pxi", line 1716, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:105194)
  File "parser.pxi", line 1086, in lxml.etree._BaseParser._parseDoc (src/lxml/lxml.etree.c:99876)
  File "parser.pxi", line 580, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:94350)
  File "parser.pxi", line 690, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:95786)
  File "parser.pxi", line 631, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:95065)
lxml.etree.XMLSyntaxError: None
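
The traceback shows lxml.etree.fromstring raising XMLSyntaxError at line 14 of scraper.py. The source of scraper.py is not shown here, so the actual cause is unknown. One possible guard, sketched below, is to treat an empty or malformed payload as "no data" rather than letting the exception end the run. This sketch uses the standard library's xml.etree.ElementTree in place of lxml so that it runs without third-party packages, and parse_xml_or_none is a hypothetical helper name, not part of the scraper. With lxml, the exception to catch would be lxml.etree.XMLSyntaxError, as named in the traceback.

```python
import xml.etree.ElementTree as ET


def parse_xml_or_none(payload):
    """Return the parsed root element, or None if payload is empty or not well-formed XML.

    Hypothetical helper for illustration; accepts str or bytes.
    """
    # An empty or whitespace-only payload cannot be a valid XML document.
    if not payload or not payload.strip():
        return None
    try:
        return ET.fromstring(payload)
    except ET.ParseError:
        # Raw PDF bytes, an HTML error page, or truncated XML end up here.
        return None


if __name__ == "__main__":
    for sample in (b"%PDF-1.4 raw pdf bytes", "", "<pdf2xml><page/></pdf2xml>"):
        root = parse_xml_or_none(sample)
        print("parsed" if root is not None else "skipped")
```

A caller could log the skipped URL and exit cleanly, so that a failed fetch is distinguishable from a crash in the run history.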

Data

Downloaded 4 times by howawong

Showing 10 of 111 rows

district            p4   p6   p1   p2   p3  month  region  year  p5  total
Central & Western   32  127   39   25   18      4  HK      2016  60    301
Hong Kong East      97  175   68   63   99      4  HK      2016  88    590
Islands             83  118  115   72   54      4  HK      2016  91    533
Southern            54  107   74   51   43      4  HK      2016  57    386
Wan Chai            33   83   80   60   33      4  HK      2016  48    337
Kowloon City         4   77   59   35   15      4  KLN     2016  21    211
Kwun Tong            0    3   74   47    0      4  KLN     2016  26    150
Sai Kung           112   94  137  100   62      4  KLN     2016  96    601
Sham Shui Po         7   40   31   24   19      4  KLN     2016   2    123
Wong Tai Sin        10   38  146   56   14      4  KLN     2016  41    305
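
In the rows shown above, the total column appears to equal the sum of the six grade columns p1 to p6. The page itself does not state this, so it is an observation from the data rather than a documented rule. The following minimal sketch checks that relationship on three of the listed rows; the tuple layout and the name mismatched_totals are assumptions made for illustration.

```python
# Rows copied from the table above, reordered as:
# (district, p1, p2, p3, p4, p5, p6, total)
ROWS = [
    ("Central & Western", 39, 25, 18, 32, 60, 127, 301),
    ("Kwun Tong", 74, 47, 0, 0, 26, 3, 150),
    ("Sai Kung", 137, 100, 62, 112, 96, 94, 601),
]


def mismatched_totals(rows):
    """Return the districts whose p1..p6 sum differs from the stated total."""
    return [row[0] for row in rows if sum(row[1:7]) != row[7]]


if __name__ == "__main__":
    print(mismatched_totals(ROWS))  # prints [] for the three rows above
```

A check like this could run before saving each record, so that a column shift in the parsed PDF is caught early.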

Statistics

Average successful run time: half a minute

Total run time: about 1 month

Total cpu time used: 4 minutes

Total disk space used: 34.2 KB

History

  • Auto ran revision 34333489 and failed.
    nothing changed in the database
  • Auto ran revision 34333489 and failed.
    nothing changed in the database
  • Auto ran revision 34333489 and failed.
    nothing changed in the database
    1 page scraped
  • Auto ran revision 34333489 and failed.
    nothing changed in the database
    1 page scraped
  • Auto ran revision 34333489 and failed.
    nothing changed in the database
    1 page scraped
  • ...
  • Created on morph.io