Functionality:

  • Two levels of pages
  • £ signs in URLs
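
A pound sign in a URL usually has to be percent-encoded before the URL can be requested. The following is a minimal sketch of one way to do that, not the scraper's actual code: it uses Python 3's `urllib.parse.quote` (the scraper itself runs on Python 2.7, where the equivalent is `urllib.quote`), and both the `encode_url` helper and the example URL are hypothetical.

```python
from urllib.parse import quote


def encode_url(url):
    """Percent-encode non-ASCII characters (such as '£') and spaces in a URL."""
    # Leave the usual URL delimiters and existing %-escapes untouched;
    # every other unsafe character is UTF-8 encoded and percent-escaped.
    return quote(url, safe=":/?&=#%")


# Hypothetical URL, for illustration only.
print(encode_url("http://example.org/files/spend over £500.csv"))
# prints http://example.org/files/spend%20over%20%C2%A3500.csv
```

Keeping `%` in the safe set means a URL that is already encoded is not encoded a second time.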

Contributors: woodbine, blablupcom

Last run failed with status code 1.

Console output of last run

Injecting configuration and compiling...
Injecting scraper and running...
Traceback (most recent call last):
  File "scraper.py", line 98, in <module>
    html = urllib2.urlopen(url)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 410, in open
    response = meth(req, response)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 442, in error
    result = self._call_chain(*args)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 629, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 410, in open
    response = meth(req, response)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 442, in error
    result = self._call_chain(*args)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 629, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 410, in open
    response = meth(req, response)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: 404 File Not Found
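
The traceback shows the request following two 302 redirects before the 404 is raised, uncaught, from line 98 of scraper.py. One possible way to keep a run alive when a single page is missing is to catch the error and skip that URL. This is a sketch under assumptions, not the scraper's own code: it uses the Python 3 names `urllib.request.urlopen` and `urllib.error.HTTPError` (on Python 2.7 both live in `urllib2`), and the `fetch` helper and its `opener` parameter are hypothetical.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch(url, opener=urlopen):
    """Return the response body, or None if the server answers with an HTTP error."""
    try:
        return opener(url).read()
    except HTTPError as err:
        # err.code holds the HTTP status, e.g. 404 as in the log above.
        print("Skipping {}: HTTP {}".format(url, err.code))
        return None
```

A caller can then test the result for `None` and move on to the next link instead of aborting the whole run. The `opener` parameter exists only so the function can be exercised without network access.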

Data

Downloaded 814 times by SimKennedy and MikeRalphson

rows 10 / 179

d                           l    f
2015-05-15 07:02:58.295141       E5018_LLBC_gov_2015_03
2015-05-15 07:02:58.300715       E5018_LLBC_gov_2015_02
2015-05-15 07:02:58.310633       E5018_LLBC_gov_2015_01
2015-05-15 07:02:58.313278       E5018_LLBC_gov_2014_12
2015-05-15 07:02:58.315565       E5018_LLBC_gov_2014_11
2015-05-15 07:02:58.320978       E5018_LLBC_gov_2014_10
2015-05-15 07:02:58.323608       E5018_LLBC_gov_2014_09
2015-05-15 07:02:58.325831       E5018_LLBC_gov_2014_08
2015-05-15 07:02:58.328893       E5018_LLBC_gov_2014_07
2015-05-15 07:02:58.332727       E5018_LLBC_gov_2014_06

Statistics

Average successful run time: 5 minutes

Total run time: 4 days

Total cpu time used: about 2 hours

Total disk space used: 116 KB

History

  • Auto ran revision 046ca757 and failed.
    Nothing changed in the database.
  • Auto ran revision 046ca757 and failed.
    Nothing changed in the database.
  • Auto ran revision 046ca757 and failed.
    Nothing changed in the database.
  • Auto ran revision 046ca757 and failed.
    Nothing changed in the database.
  • Auto ran revision 046ca757 and failed.
    Nothing changed in the database.
  • ...
  • Created on morph.io

Scraper code

Python

sp_E5018_LLBC_gov / scraper.py