howawong / hong_kong_current_consultation_pages

Current Consultation Papers from http://www.gov.hk/en/residents/government/publication/consultation/current.htm


This is a scraper that runs on morph.io. To get started, see the documentation.

Contributors: howawong

Last run completed successfully.

Console output of last run

Injecting configuration and compiling...
-----> Python app detected
 !     The latest version of Python 2 is python-2.7.14 (you are using python-2.7.9, which is unsupported).
 !     We recommend upgrading by specifying the latest version (python-2.7.14).
       Learn More: https://devcenter.heroku.com/articles/python-runtimes
-----> Installing python-2.7.9
-----> Installing pip
-----> Installing requirements with pip
       Obtaining scraperwiki from git+http://github.com/openaustralia/scraperwiki-python.git@morph_defaults#egg=scraperwiki (from -r /tmp/build/requirements.txt (line 6))
         Cloning http://github.com/openaustralia/scraperwiki-python.git (to revision morph_defaults) to /app/.heroku/src/scraperwiki
       Collecting lxml==3.4.4 (from -r /tmp/build/requirements.txt (line 8))
         Downloading https://files.pythonhosted.org/packages/63/c7/4f2a2a4ad6c6fa99b14be6b3c1cece9142e2d915aa7c43c908677afc8fa4/lxml-3.4.4.tar.gz (3.5MB)
       Collecting cssselect==0.9.1 (from -r /tmp/build/requirements.txt (line 9))
         Downloading https://files.pythonhosted.org/packages/aa/e5/9ee1460d485b94a6d55732eb7ad5b6c084caf73dd6f9cb0bb7d2a78fafe8/cssselect-0.9.1.tar.gz
       Collecting Scrapy==1.0.3 (from -r /tmp/build/requirements.txt (line 10))
         Downloading https://files.pythonhosted.org/packages/ae/09/2c4395d5d0f0a2e6610d70d13cb6e2037050a61b79acd4180ccc9b5bf350/Scrapy-1.0.3-py2-none-any.whl (290kB)
       Collecting dumptruck>=0.1.2 (from scraperwiki->-r /tmp/build/requirements.txt (line 6))
         Downloading https://files.pythonhosted.org/packages/15/27/3330a343de80d6849545b6c7723f8c9a08b4b104de964ac366e7e6b318df/dumptruck-0.1.6.tar.gz
       Collecting requests (from scraperwiki->-r /tmp/build/requirements.txt (line 6))
         Downloading https://files.pythonhosted.org/packages/49/df/50aa1999ab9bde74656c2919d9c0c085fd2b3775fd3eca826012bef76d8c/requests-2.18.4-py2.py3-none-any.whl (88kB)
       Collecting service-identity (from Scrapy==1.0.3->-r /tmp/build/requirements.txt (line 10))
         Downloading https://files.pythonhosted.org/packages/29/fa/995e364220979e577e7ca232440961db0bf996b6edaf586a7d1bd14d81f1/service_identity-17.0.0-py2.py3-none-any.whl
       Collecting pyOpenSSL (from Scrapy==1.0.3->-r /tmp/build/requirements.txt (line 10))
         Downloading https://files.pythonhosted.org/packages/79/db/7c0cfe4aa8341a5fab4638952520d8db6ab85ff84505e12c00ea311c3516/pyOpenSSL-17.5.0-py2.py3-none-any.whl (53kB)
       Collecting queuelib (from Scrapy==1.0.3->-r /tmp/build/requirements.txt (line 10))
         Downloading https://files.pythonhosted.org/packages/4c/85/ae64e9145f39dd6d14f8af3fa809a270ef3729f3b90b3c0cf5aa242ab0d4/queuelib-1.5.0-py2.py3-none-any.whl
       Collecting six>=1.5.2 (from Scrapy==1.0.3->-r /tmp/build/requirements.txt (line 10))
         Downloading https://files.pythonhosted.org/packages/67/4b/141a581104b1f6397bfa78ac9d43d8ad29a7ca43ea90a2d863fe3056e86a/six-1.11.0-py2.py3-none-any.whl
       Collecting Twisted>=10.0.0 (from Scrapy==1.0.3->-r /tmp/build/requirements.txt (line 10))
         Downloading https://files.pythonhosted.org/packages/a2/37/298f9547606c45d75aa9792369302cc63aa4bbcf7b5f607560180dd099d2/Twisted-17.9.0.tar.bz2 (3.0MB)
       Collecting w3lib>=1.8.0 (from Scrapy==1.0.3->-r /tmp/build/requirements.txt (line 10))
         Downloading https://files.pythonhosted.org/packages/37/94/40c93ad0cadac0f8cb729e1668823c71532fd4a7361b141aec535acb68e3/w3lib-1.19.0-py2.py3-none-any.whl
       Collecting idna<2.7,>=2.5 (from requests->scraperwiki->-r /tmp/build/requirements.txt (line 6))
         Downloading https://files.pythonhosted.org/packages/27/cc/6dd9a3869f15c2edfab863b992838277279ce92663d334df9ecf5106f5c6/idna-2.6-py2.py3-none-any.whl (56kB)
       Collecting urllib3<1.23,>=1.21.1 (from requests->scraperwiki->-r /tmp/build/requirements.txt (line 6))
         Downloading https://files.pythonhosted.org/packages/63/cb/6965947c13a94236f6d4b8223e21beb4d576dc72e8130bd7880f600839b8/urllib3-1.22-py2.py3-none-any.whl (132kB)
       Collecting certifi>=2017.4.17 (from requests->scraperwiki->-r /tmp/build/requirements.txt (line 6))
         Downloading https://files.pythonhosted.org/packages/7c/e6/92ad559b7192d846975fc916b65f667c7b8c3a32bea7372340bfe9a15fa5/certifi-2018.4.16-py2.py3-none-any.whl (150kB)
       Collecting chardet<3.1.0,>=3.0.2 (from requests->scraperwiki->-r /tmp/build/requirements.txt (line 6))
         Downloading https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl (133kB)
       Collecting pyasn1 (from service-identity->Scrapy==1.0.3->-r /tmp/build/requirements.txt (line 10))
         Downloading https://files.pythonhosted.org/packages/ba/fe/02e3e2ee243966b143657fb8bd6bc97595841163b6d8c26820944acaec4d/pyasn1-0.4.2-py2.py3-none-any.whl (71kB)
       Collecting pyasn1-modules (from service-identity->Scrapy==1.0.3->-r /tmp/build/requirements.txt (line 10))
         Downloading https://files.pythonhosted.org/packages/e9/51/bcd96bf6231d4b2cc5e023c511bee86637ba375c44a6f9d1b4b7ad1ce4b9/pyasn1_modules-0.2.1-py2.py3-none-any.whl (60kB)
       Collecting attrs (from service-identity->Scrapy==1.0.3->-r /tmp/build/requirements.txt (line 10))
         Downloading https://files.pythonhosted.org/packages/b5/60/4e178c1e790fd60f1229a9b3cb2f8bc2f4cc6ff2c8838054c142c70b5adc/attrs-17.4.0-py2.py3-none-any.whl
       Collecting cryptography>=2.1.4 (from pyOpenSSL->Scrapy==1.0.3->-r /tmp/build/requirements.txt (line 10))
         Downloading https://files.pythonhosted.org/packages/b8/d2/34f54bf9459446965d0a4939ac872d6f82495cf16f48efc224af5de7f985/cryptography-2.2.2-cp27-cp27m-manylinux1_x86_64.whl (2.2MB)
       Collecting zope.interface>=3.6.0 (from Twisted>=10.0.0->Scrapy==1.0.3->-r /tmp/build/requirements.txt (line 10))
         Downloading https://files.pythonhosted.org/packages/ac/8a/657532df378c2cd2a1fe6b12be3b4097521570769d4852ec02c24bd3594e/zope.interface-4.5.0.tar.gz (151kB)
       Collecting constantly>=15.1 (from Twisted>=10.0.0->Scrapy==1.0.3->-r /tmp/build/requirements.txt (line 10))
         Downloading https://files.pythonhosted.org/packages/b9/65/48c1909d0c0aeae6c10213340ce682db01b48ea900a7d9fce7a7910ff318/constantly-15.1.0-py2.py3-none-any.whl
       Collecting incremental>=16.10.1 (from Twisted>=10.0.0->Scrapy==1.0.3->-r /tmp/build/requirements.txt (line 10))
         Downloading https://files.pythonhosted.org/packages/f5/1d/c98a587dc06e107115cf4a58b49de20b19222c83d75335a192052af4c4b7/incremental-17.5.0-py2.py3-none-any.whl
       Collecting Automat>=0.3.0 (from Twisted>=10.0.0->Scrapy==1.0.3->-r /tmp/build/requirements.txt (line 10))
         Downloading https://files.pythonhosted.org/packages/17/6a/1baf488c2015ecafda48c03ca984cf0c48c254622668eb1732dbe2eae118/Automat-0.6.0-py2.py3-none-any.whl
       Collecting hyperlink>=17.1.1 (from Twisted>=10.0.0->Scrapy==1.0.3->-r /tmp/build/requirements.txt (line 10))
         Downloading https://files.pythonhosted.org/packages/a7/b6/84d0c863ff81e8e7de87cff3bd8fd8f1054c227ce09af1b679a8b17a9274/hyperlink-18.0.0-py2.py3-none-any.whl
       Collecting cffi>=1.7; platform_python_implementation != "PyPy" (from cryptography>=2.1.4->pyOpenSSL->Scrapy==1.0.3->-r /tmp/build/requirements.txt (line 10))
         Downloading https://files.pythonhosted.org/packages/5d/a7/348bf05f004e7534012dc533ee29650d88fb25bf013988518e0acf6961fa/cffi-1.11.5-cp27-cp27m-manylinux1_x86_64.whl (407kB)
       Collecting enum34; python_version < "3" (from cryptography>=2.1.4->pyOpenSSL->Scrapy==1.0.3->-r /tmp/build/requirements.txt (line 10))
         Downloading https://files.pythonhosted.org/packages/c5/db/e56e6b4bbac7c4a06de1c50de6fe1ef3810018ae11732a50f15f62c7d050/enum34-1.1.6-py2-none-any.whl
       Collecting asn1crypto>=0.21.0 (from cryptography>=2.1.4->pyOpenSSL->Scrapy==1.0.3->-r /tmp/build/requirements.txt (line 10))
         Downloading https://files.pythonhosted.org/packages/ea/cd/35485615f45f30a510576f1a56d1e0a7ad7bd8ab5ed7cdc600ef7cd06222/asn1crypto-0.24.0-py2.py3-none-any.whl (101kB)
       Collecting ipaddress; python_version < "3" (from cryptography>=2.1.4->pyOpenSSL->Scrapy==1.0.3->-r /tmp/build/requirements.txt (line 10))
         Downloading https://files.pythonhosted.org/packages/fc/d0/7fc3a811e011d4b388be48a0e381db8d990042df54aa4ef4599a31d39853/ipaddress-1.0.22-py2.py3-none-any.whl
       Collecting pycparser (from cffi>=1.7; platform_python_implementation != "PyPy"->cryptography>=2.1.4->pyOpenSSL->Scrapy==1.0.3->-r /tmp/build/requirements.txt (line 10))
         Downloading https://files.pythonhosted.org/packages/8c/2d/aad7f16146f4197a11f8e91fb81df177adcc2073d36a17b1491fd09df6ed/pycparser-2.18.tar.gz (245kB)
       Installing collected packages: dumptruck, idna, urllib3, certifi, chardet, requests, scraperwiki, lxml, cssselect, pyasn1, pyasn1-modules, six, pycparser, cffi, enum34, asn1crypto, ipaddress, cryptography, pyOpenSSL, attrs, service-identity, queuelib, zope.interface, constantly, incremental, Automat, hyperlink, Twisted, w3lib, Scrapy
         Running setup.py install for dumptruck: started
         Running setup.py install for dumptruck: finished with status 'done'
         Running setup.py develop for scraperwiki
         Running setup.py install for lxml: started
         Running setup.py install for lxml: still running...
         Running setup.py install for lxml: finished with status 'done'
         Running setup.py install for cssselect: started
         Running setup.py install for cssselect: finished with status 'done'
         Running setup.py install for pycparser: started
         Running setup.py install for pycparser: finished with status 'done'
         Running setup.py install for zope.interface: started
         Running setup.py install for zope.interface: finished with status 'done'
         Running setup.py install for Twisted: started
         Running setup.py install for Twisted: finished with status 'done'
       Successfully installed Automat-0.6.0 Scrapy-1.0.3 Twisted-17.9.0 asn1crypto-0.24.0 attrs-17.4.0 certifi-2018.4.16 cffi-1.11.5 chardet-3.0.4 constantly-15.1.0 cryptography-2.2.2 cssselect-0.9.1 dumptruck-0.1.6 enum34-1.1.6 hyperlink-18.0.0 idna-2.6 incremental-17.5.0 ipaddress-1.0.22 lxml-3.4.4 pyOpenSSL-17.5.0 pyasn1-0.4.2 pyasn1-modules-0.2.1 pycparser-2.18 queuelib-1.5.0 requests-2.18.4 scraperwiki service-identity-17.0.0 six-1.11.0 urllib3-1.22 w3lib-1.19.0 zope.interface-4.5.0
-----> Discovering process types
       Procfile declares types -> scraper
Injecting scraper and running...
2018-04-20 22:42:44 [scrapy] INFO: Scrapy 1.0.3 started (bot: scrapybot)
2018-04-20 22:42:44 [scrapy] INFO: Optional features available: ssl, http11
2018-04-20 22:42:44 [scrapy] INFO: Overridden settings: {}
2018-04-20 22:42:44 [scrapy] INFO: Enabled extensions: LogStats, TelnetConsole, CoreStats, SpiderState, CloseSpider
2018-04-20 22:42:44 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2018-04-20 22:42:44 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2018-04-20 22:42:44 [scrapy] INFO: Enabled item pipelines:
2018-04-20 22:42:44 [scrapy] INFO: Spider opened
2018-04-20 22:42:44 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-04-20 22:42:44 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-04-20 22:42:45 [scrapy] ERROR: Error downloading <GET http://www.gov.hk/en/residents/government/publication/consultation/current.htm>
Traceback (most recent call last):
  File "/app/.heroku/python/lib/python2.7/site-packages/scrapy/utils/defer.py", line 45, in mustbe_deferred
    result = f(*args, **kw)
  File "/app/.heroku/python/lib/python2.7/site-packages/scrapy/core/downloader/handlers/__init__.py", line 41, in download_request
    return handler(request, spider)
  File "/app/.heroku/python/lib/python2.7/site-packages/scrapy/core/downloader/handlers/http11.py", line 44, in download_request
    return agent.download_request(request)
  File "/app/.heroku/python/lib/python2.7/site-packages/scrapy/core/downloader/handlers/http11.py", line 211, in download_request
    d = agent.request(method, url, headers, bodyproducer)
  File "/app/.heroku/python/lib/python2.7/site-packages/twisted/web/client.py", line 1655, in request
    parsedURI.originForm)
  File "/app/.heroku/python/lib/python2.7/site-packages/twisted/web/client.py", line 1432, in _requestWithEndpoint
    d = self._pool.getConnection(key, endpoint)
  File "/app/.heroku/python/lib/python2.7/site-packages/twisted/web/client.py", line 1318, in getConnection
    return self._newConnection(key, endpoint)
  File "/app/.heroku/python/lib/python2.7/site-packages/twisted/web/client.py", line 1330, in _newConnection
    return endpoint.connect(factory)
  File "/app/.heroku/python/lib/python2.7/site-packages/twisted/internet/endpoints.py", line 903, in connect
    EndpointReceiver, self._hostText, portNumber=self._port
  File "/app/.heroku/python/lib/python2.7/site-packages/twisted/internet/_resolver.py", line 189, in resolveHostName
    onAddress = self._simpleResolver.getHostByName(hostName)
  File "/app/.heroku/python/lib/python2.7/site-packages/scrapy/resolver.py", line 21, in getHostByName
    d = super(CachingThreadedResolver, self).getHostByName(name, timeout)
  File "/app/.heroku/python/lib/python2.7/site-packages/twisted/internet/base.py", line 276, in getHostByName
    timeoutDelay = sum(timeout)
TypeError: 'float' object is not iterable
2018-04-20 22:42:45 [scrapy] ERROR: Error downloading <GET http://www.gov.hk/tc/residents/government/publication/consultation/current.htm>
Traceback (most recent call last):
  [identical to the traceback above]
TypeError: 'float' object is not iterable
2018-04-20 22:42:45 [scrapy] ERROR: Error downloading <GET http://www.gov.hk/sc/residents/government/publication/consultation/current.htm>
Traceback (most recent call last):
  [identical to the traceback above]
TypeError: 'float' object is not iterable
2018-04-20 22:42:45 [scrapy] INFO: Closing spider (finished)
2018-04-20 22:42:45 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 3,
 'downloader/exception_type_count/exceptions.TypeError': 3,
 'downloader/request_bytes': 804,
 'downloader/request_count': 3,
 'downloader/request_method_count/GET': 3,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2018, 4, 20, 22, 42, 45, 113271),
 'log_count/DEBUG': 1,
 'log_count/ERROR': 3,
 'log_count/INFO': 7,
 'scheduler/dequeued': 3,
 'scheduler/dequeued/memory': 3,
 'scheduler/enqueued': 3,
 'scheduler/enqueued/memory': 3,
 'start_time': datetime.datetime(2018, 4, 20, 22, 42, 44, 802178)}
2018-04-20 22:42:45 [scrapy] INFO: Spider closed (finished)
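The three download errors all come from the same version mismatch: pip resolved Scrapy's loose `Twisted>=10.0.0` requirement to Twisted 17.9.0, but Scrapy 1.0.3's `CachingThreadedResolver` passes a single float timeout to `getHostByName`, while Twisted 17.x computes the total delay as `sum(timeout)` and therefore expects an iterable. A minimal sketch of the mismatch (the function below is an illustrative stand-in, not the actual Twisted code):

```python
def get_host_by_name(name, timeout):
    # Newer Twisted totals the per-attempt timeouts with sum(),
    # so `timeout` must be an iterable of floats.
    return sum(timeout)

# Scrapy 1.0.3 passes a bare float, which raises TypeError,
# exactly as seen in the traceback above.
try:
    get_host_by_name("www.gov.hk", 60.0)
    raised = False
except TypeError:
    raised = True
assert raised

# Later Scrapy releases wrap the value in a tuple instead:
assert get_host_by_name("www.gov.hk", (60.0,)) == 60.0
```

Pinning an older Twisted in requirements.txt (for example `Twisted==16.6.0`, an assumption; any pre-17 release with the float-accepting resolver should do) or upgrading Scrapy past 1.0.x would likely avoid the crash.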

Data

Downloaded 142 times by howawong


Download the table (as CSV) or the SQLite database (25 KB), or use the API.

Showing 10 of 64 rows

link | title | lang | date
     | Review of Statutory Minimum Wage Rate Public Consultation | en | 2016-05-22T00:00:00+00:00
     | 法定最低工資水平檢討公眾諮詢 | tc | 2016-05-22T00:00:00+00:00
     | 法定最低工资水平检讨公众咨询 | sc | 2016-05-22T00:00:00+00:00
     | 選舉委員會界別分組選舉活動建議指引公眾諮詢 | tc | 2016-06-09T00:00:00+00:00
     | 香港將與格魯吉亞和馬爾代夫談判自由貿易協定 | tc | 2016-06-06T00:00:00+00:00
     | Public consultation on the Proposed Guidelines on Election-related Activities in respect of the Election Committee Subsector Elections | en | 2016-06-09T00:00:00+00:00
     | Hong Kong to Negotiate Free Trade Agreements with Georgia and Maldives | en | 2016-06-06T00:00:00+00:00
     | 选举委员会界别分组选举活动建议指引公众咨询 | sc | 2016-06-09T00:00:00+00:00
     | 香港将与格鲁吉亚和马尔代夫谈判自由贸易协定 | sc | 2016-06-06T00:00:00+00:00
     | 《職業介紹所實務守則》草擬本諮詢 | tc | 2016-06-17T00:00:00+00:00
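Morph.io scrapers conventionally persist rows like these with `scraperwiki.sqlite.save(unique_keys, data)`, which upserts by key so reruns don't duplicate rows. A minimal stdlib sketch of the same idea for this table's columns (the choice of `(link, lang)` as the unique key and the example URL are assumptions, not taken from the scraper's code):

```python
import sqlite3

def save_row(conn, row):
    """Upsert one consultation row, keyed on (link, lang)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS data
           (link TEXT, title TEXT, lang TEXT, date TEXT,
            PRIMARY KEY (link, lang))"""
    )
    # INSERT OR REPLACE mimics scraperwiki's upsert-by-unique-key behaviour.
    conn.execute(
        "INSERT OR REPLACE INTO data (link, title, lang, date) VALUES (?, ?, ?, ?)",
        (row["link"], row["title"], row["lang"], row["date"]),
    )

conn = sqlite3.connect(":memory:")
row = {
    "link": "http://example.org/minimum-wage",  # hypothetical URL
    "title": "Review of Statutory Minimum Wage Rate Public Consultation",
    "lang": "en",
    "date": "2016-05-22T00:00:00+00:00",
}
save_row(conn, row)
save_row(conn, row)  # re-saving the same key updates rather than duplicates
count = conn.execute("SELECT COUNT(*) FROM data").fetchone()[0]
assert count == 1
```

This is why repeated runs of the scraper can report "nothing changed in the database", as the history below shows.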

Statistics

Average successful run time: 2 minutes

Total run time: 17 days

Total cpu time used: 9 minutes

Total disk space used: 46.9 KB

History

  • Auto ran revision 053233cb and completed successfully.
    nothing changed in the database
  • Auto ran revision 053233cb and completed successfully.
    nothing changed in the database
  • Auto ran revision 053233cb and completed successfully.
    nothing changed in the database
  • Auto ran revision 053233cb and completed successfully.
    nothing changed in the database
  • Auto ran revision 053233cb and completed successfully.
    nothing changed in the database
  • ...
  • Created on morph.io
