bry0n969 / dsfsD

adsf

Scrapes www.kopavogur.is and www.rbht.nhs.uk

Kópavogsbær | Kópavogur.is


Contributors bry0n969

Last run completed successfully.

Console output of last run

Injecting configuration and compiling...
-----> PHP app detected
-----> Bootstrapping...
-----> Installing platform packages...
       NOTICE: No runtime required in composer.lock; using PHP ^5.5.17
       - php (5.6.30)
       - ext-gd (bundled with php)
       - ext-mbstring (bundled with php)
       - ext-pdo_sqlite (bundled with php)
       - ext-sqlite3 (bundled with php)
       - apache (2.4.20)
       - nginx (1.8.1)
-----> Installing dependencies...
       Composer version 1.1.3 2016-06-26 15:42:08
       Loading composer repositories with package information
       Installing dependencies from lock file
       - Installing openaustralia/scraperwiki (dev-morph_defaults e996fe0)
       Cloning e996fe0253bb50330690f5d2bafb66f094dbacb8

       Generating optimized autoload files
-----> Preparing runtime environment...
-----> Checking for additional extensions to install...

-----> Discovering process types
       Procfile declares types -> scraper
Injecting scraper and running...
import scraperwiki

# Blank Python
import scraperwiki
import lxml.html

urls = ["http://www.ebay.com/sch/m.html?_nkw=&_armrs=1&_from=&_ssn=offroadbelts&_pgn=2&_skc=200&rt=nc"]
max_pages = 10000

for wurl in urls:
    curr_url = wurl
    page_idx = 1
    while page_idx <= max_pages :
        error = True
        while error:
            try:
                html = scraperwiki.scrape(curr_url)
                root = lxml.html.fromstring(html)
                for tr in root.cssselect("div[class='ittl'] a"):
                    url = tr.get("href")
                    html = scraperwiki.scrape(url)
                    if html.find("channeladvisor_poweredby-en.gif") != -1 :
                        root2 = lxml.html.fromstring(html)
                        for mname in root2.cssselect("span[class='mbg-nw']"):
                            data = {
                                'url': url,
                                'merchant_name': mname.text
                            }
                            scraperwiki.sqlite.save(unique_keys=['url'],data=data)
                for next_page in root.cssselect("td[class='botpg-next'] a"):
                    print curr_url
                    curr_url = next_page.get("href")
                page_idx = page_idx +1
                error = False
            except:
                print 'error'
                error = True

Statistics

Average successful run time: half a minute

Total run time: half a minute

Total cpu time used: less than 5 seconds

Total disk space used: 23 KB

History

  • Manually ran revision 9c01b5a8 and completed successfully.
    nothing changed in the database
    4 pages scraped
  • Created on morph.io

Scraper code

dsfsD