blablupcom / manta

Scrapes www.gov.uk

GOV.UK - The place to find government services and information - Simpler, clearer, faster


This is a scraper that runs on Morph. To get started, see the documentation.

Contributors blablupcom

Last run failed with status code 1.

Console output of last run

Injecting configuration and compiling...
Injecting scraper and running...
/app/.heroku/python/lib/python2.7/site-packages/bs4/__init__.py:166: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently. To get rid of this warning, change this:
 BeautifulSoup([your markup])
to this:
 BeautifulSoup([your markup], "html.parser")
  markup_type=markup_type))
<!DOCTYPE html> <html> <head> <title>Pardon Our Interruption</title> <link href="//cdn.distilnetworks.com/css/distil.css" media="all" rel="stylesheet" type="text/css"> <meta content="text/html; charset=utf-8" http-equiv="content-type"/> <meta content="width=1000" name="viewport"/> <meta content="noindex, nofollow" name="robots"> <meta content="max-age=0" http-equiv="cache-control"/> <meta content="no-cache" http-equiv="cache-control"/> <meta content="0" http-equiv="expires"/> <meta content="Tue, 01 Jan 1980 1:00:00 GMT" http-equiv="expires"/> <meta content="no-cache" http-equiv="pragma"/> </meta></link></head> <body class="block-page"> <div class="container"> <div class="row"> <div class="sidebar col-lg-4 col-sm-5"> <img alt="0" src="//cdn.distilnetworks.com/images/anomaly-detected.png"> </img></div> <div class="content col-lg-8 col-sm-7"> <h1>Pardon Our Interruption...</h1> <p> As you were browsing <strong>http://www.manta.com</strong> something about your browser made us think you were a bot. There are a few reasons this might happen: </p> <ul> <li>You're a power user moving through this website with super-human speed.</li> <li>You've disabled JavaScript in your web browser.</li> <li>A third-party browser plugin, such as Ghostery or NoScript, is preventing JavaScript from running. 
Additional information is available in this <a href="https://support.distilnetworks.com/customer/portal/articles/1842381-third-party-browser-plugins-that-block-javascript" target="_blank" title="Third party browser plugins that block javascript">support article</a>.</li> </ul> <p> To request an unblock, please fill out the form below and we will review it as soon as possible. </p> <form action="axcvadyqeqqbscxvrfaxcva.html" id="bayeqqswaabxsf" method="POST" style="display:none"><label>Ignore: <input name="name" type="text"/></label><label>Ignore: <input name="email" type="text"/></label><label>Ignore: <input type="submit" value="Submit"/></label></form><form action="http://verify.distil.it/distil_blocked.php" id="demoForm" method="post"> <div class="form-group"> <label for="first_name">First Name</label> <input class="form-control" id="first_name" name="first_name" type="text" value=""> </input></div> <div class="form-group"> <label for="last_name">Last Name</label> <input class="form-control" id="last_name" name="last_name" type="text" value=""> </input></div> <div class="form-group"> <label for="email">E-mail</label> <input class="form-control" id="email" name="email" type="text" value=""> </input></div> <div class="form-group hide"> <label for="city">City</label> <input class="form-control hide" id="city" name="city" type="text" value=""> </input></div> <input name="B" type="hidden" value="2514:50.116.3.88:1E9B0FF7-9E1F-379F-A90E-F22277DBECF9"/> <input name="P" type="hidden" value="1E9B0FF7-9E1F-379F-A90E-F22277DBECF9"/> <input name="I" type="hidden" value=""/> <input name="U" type="hidden" value=""/> <input name="V" type="hidden" value="9"/> <input name="O" type="hidden" value=""/> <input name="D" type="hidden" value="2514"/> <input name="A" type="hidden" value="589"/> <input name="H" type="hidden" value="www.manta.com"/> <input name="LOADED" type="hidden" value="2015-08-02 13:14:06"/> <input id="distil_block_identity_info" name="PB" type="hidden" value=""/> 
<hr> <button class="btn btn-primary btn-lg" type="submit">Request Unblock</button> </hr></form> <p id="extraUnblock"> <small style="font-size: 8pt"> You reached this page when attempting to access http://www.manta.com/world/Oceania/Australia/ from 50.116.3.88 on 2015-08-02 13:14:06 GMT.<br/> Trace: 5A88AEB8-3918-11E5-B143-FCFD419E1EA2 via 94fab34c-ba88-4194-bb82-95bc7f22bea4 </small> </p> </div> </div> </div> </body> </html>
Traceback (most recent call last):
  File "scraper.py", line 8, in <module>
    title = s.find('span', attrs={'itemprop':'title'}).text
AttributeError: 'NoneType' object has no attribute 'text'
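
The traceback above suggests that the request to manta.com returned a "Pardon Our Interruption" bot-block page, so `find()` matched nothing and returned `None`. The following is a minimal sketch of one way to guard against that, not the code from scraper.py. The helper names (`is_block_page`, `text_or_none`, `parse_title`) and the marker strings are assumptions made for illustration; only the `find('span', attrs={'itemprop': 'title'})` lookup and the `"html.parser"` argument come from the log.

```python
# Sketch only: the helper names below are hypothetical and do not come from
# the original scraper.py.

# Strings that appear in the block page shown in the console output.
BLOCK_MARKERS = ("Pardon Our Interruption", "distilnetworks.com")


def is_block_page(html):
    """Return True if the HTML looks like the bot-block page from the log."""
    return any(marker in html for marker in BLOCK_MARKERS)


def text_or_none(node):
    """Return the stripped text of a node, or None if find() matched nothing."""
    return node.text.strip() if node is not None else None


def parse_title(html):
    """Return the itemprop="title" span text, or None if blocked or missing."""
    # Imported here so the two helpers above can be used without bs4 installed.
    from bs4 import BeautifulSoup

    if is_block_page(html):
        return None
    # Naming the parser explicitly avoids the UserWarning seen in the log.
    soup = BeautifulSoup(html, "html.parser")
    return text_or_none(soup.find("span", attrs={"itemprop": "title"}))
```

With a guard like this, a blocked response yields `None` instead of raising `AttributeError`, so the caller can decide whether to skip the page, retry, or stop the run.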

Statistics

Total run time: 7 minutes

Total cpu time used: less than 10 seconds

Total disk space used: 33.3 KB

History

  • Manually ran revision 16256ade and failed.
    nothing changed in the database
    2 pages scraped
  • Manually ran revision e84bc476 and failed.
    nothing changed in the database
    1 page scraped
  • Manually ran revision 3cce7be8 and failed.
    nothing changed in the database
  • Manually ran revision 6d8e0fc0 and failed.
    nothing changed in the database
  • Manually ran revision 472f7558 and failed.
    nothing changed in the database
  • Manually ran revision 5be68eea and failed.
    nothing changed in the database
  • Manually ran revision d8ee74f2 and failed.
    nothing changed in the database
  • Manually ran revision ddb7d1de and failed.
    nothing changed in the database
  • Manually ran revision e07c9fdc and failed.
    nothing changed in the database
  • Manually ran revision a57549e7 and failed.
    nothing changed in the database
  • Manually ran revision 320a5f97 and failed.
    nothing changed in the database
  • Manually ran revision e7b22744 and failed.
    nothing changed in the database
  • Manually ran revision eee86de8 and failed.
    nothing changed in the database
  • Manually ran revision 8ebd043f and failed.
    nothing changed in the database
  • Created on morph.io

Scraper code

Python

manta / scraper.py