This is a scraper that runs on Morph. To get started, see the documentation.

Contributors: paulbradshaw

Last run failed with status code 1.

Console output of last run

Injecting configuration and compiling...
Injecting scraper and running...
<html>
<head>
<title>Scrape this table of best selling albums</title>
</head>
<body>
<h1>Scrape this table of best selling albums</h1>
<table class="data">
<thead><tr class="tableizer-firstrow"><th>Artist</th><th>Album</th><th>Released</th><th>Empty column</th><th>Sales m</th></tr></thead><tbody>
<tr><td>Michael Jackson</td><td>Thriller</td><td>1982</td><td>&nbsp;</td><td>110</td></tr>
<tr><td>AC/DC</td><td>Back in Black</td><td>1980</td><td>&nbsp;</td><td>49</td></tr>
<tr><td>Pink Floyd</td><td>The Dark Side of the Moon</td><td>1973</td><td>&nbsp;</td><td>45</td></tr>
<tr><td>Whitney Houston / Various artists</td><td>The Bodyguard</td><td>1992</td><td>&nbsp;</td><td>44</td></tr>
<tr><td>Meat Loaf </td><td>Bat Out of Hell</td><td>1977</td><td>&nbsp;</td><td>43</td></tr>
<tr><td>Eagles</td><td>Their Greatest Hits (1971â€"1975)</td><td>1976</td><td>&nbsp;</td><td>42</td></tr>
</tbody></table>
<p>
<a class="next" href="/scraping-for-everyone/webpages/example_table_2.html">next page</a>
</p>
</body>
</html>
{'Sales m': '110', 'Artist': 'Michael Jackson', 'Album': 'Thriller', 'Released': '1982'}
------------
{'Sales m': '49', 'Artist': 'AC/DC', 'Album': 'Back in Black', 'Released': '1980'}
------------
{'Sales m': '45', 'Artist': 'Pink Floyd', 'Album': 'The Dark Side of the Moon', 'Released': '1973'}
------------
{'Sales m': '44', 'Artist': 'Whitney Houston / Various artists', 'Album': 'The Bodyguard', 'Released': '1992'}
------------
{'Sales m': '43', 'Artist': 'Meat Loaf ', 'Album': 'Bat Out of Hell', 'Released': '1977'}
------------
{'Sales m': '42', 'Artist': 'Eagles', 'Album': u'Their Greatest Hits (1971\xc3\xa2\xe2\x82\xac"1975)', 'Released': '1976'}
------------
[<Element a at 0x7f44345d3e10>]
/scraping-for-everyone/webpages/example_table_2.html
https://paulbradshaw.github.io/scraping-for-everyone/webpages/example_table_2.html
<html>
<head>
<title>Scrape this table of best selling albums</title>
</head>
<body>
<h1>Scrape this table of best selling albums</h1>
<table class="data">
<thead><tr class="tableizer-firstrow"><th>Artist</th><th>Album</th><th>Released</th><th>Empty column</th><th>Sales m</th></tr></thead><tbody>
<tr><td>Various artists</td><td>Dirty Dancing</td><td>1987</td><td>&nbsp;</td><td>42</td></tr>
<tr><td>Fleetwood Mac</td><td>Rumours</td><td>1977</td><td>&nbsp;</td><td>40</td></tr>
<tr><td>Backstreet Boys</td><td>Millennium</td><td>1999</td><td>&nbsp;</td><td>40</td></tr>
<tr><td>Bee Gees / Various artists</td><td>Saturday Night Fever</td><td>1977</td><td>&nbsp;</td><td>40</td></tr>
<tr><td>Shania Twain</td><td>Come On Over</td><td>1997</td><td>&nbsp;</td><td>39</td></tr>
<tr><td>Led Zeppelin</td><td>Led Zeppelin IV</td><td>1971</td><td>&nbsp;</td><td>37</td></tr>
</tbody></table>
<p>
<a class="previous" href="/scraping-for-everyone/webpages/example_table_1.html">previous page</a>
</p>
</body>
</html>
{'Sales m': '42', 'Artist': 'Various artists', 'Album': 'Dirty Dancing', 'Released': '1987'}
------------
{'Sales m': '40', 'Artist': 'Fleetwood Mac', 'Album': 'Rumours', 'Released': '1977'}
------------
{'Sales m': '40', 'Artist': 'Backstreet Boys', 'Album': 'Millennium', 'Released': '1999'}
------------
{'Sales m': '40', 'Artist': 'Bee Gees / Various artists', 'Album': 'Saturday Night Fever', 'Released': '1977'}
------------
{'Sales m': '39', 'Artist': 'Shania Twain', 'Album': 'Come On Over', 'Released': '1997'}
------------
{'Sales m': '37', 'Artist': 'Led Zeppelin', 'Album': 'Led Zeppelin IV', 'Released': '1971'}
------------
[]
Traceback (most recent call last):
  File "scraper.py", line 51, in <module>
    scrape_and_look_for_next_link(starting_url)
  File "scraper.py", line 43, in scrape_and_look_for_next_link
    scrape_and_look_for_next_link(next_url)
  File "scraper.py", line 39, in scrape_and_look_for_next_link
    print next_link[0].attrib.get('href') #print the href attribute to see what it is
IndexError: list index out of range
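
The traceback above shows the run failing on the second page: that page has only a "previous page" link, so the list of "next" links is empty and indexing it with [0] raises IndexError. The following is a minimal sketch of one way to guard against that, not the scraper's actual code. It is written for Python 3 with the standard library only (the log suggests the original runs on Python 2 with an lxml-style selector), and the names NextLinkParser, find_next_url, scrape_all_pages and BASE_URL are hypothetical. The base URL is an assumption read off the joined URL printed in the log.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: site root inferred from the joined URL printed in the console output.
BASE_URL = "https://paulbradshaw.github.io"


class NextLinkParser(HTMLParser):
    """Collect the href of every <a> tag whose class list contains "next"."""

    def __init__(self):
        super().__init__()
        self.next_hrefs = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "a" and "next" in classes and attr_map.get("href"):
            self.next_hrefs.append(attr_map["href"])


def find_next_url(html, base_url=BASE_URL):
    """Return the absolute URL of the next page, or None on the last page."""
    parser = NextLinkParser()
    parser.feed(html)
    if not parser.next_hrefs:
        # No "next" link: stop here instead of indexing an empty list.
        return None
    return urljoin(base_url, parser.next_hrefs[0])


def scrape_all_pages(start_url, fetch, handle_page):
    """Follow "next" links with a loop; fetch and handle_page are caller-supplied."""
    url = start_url
    seen = set()
    while url and url not in seen:
        seen.add(url)
        html = fetch(url)
        handle_page(html)
        url = find_next_url(html, base_url=url)
```

Passing the fetch function in as an argument keeps the sketch free of network calls, so it can be tried against saved copies of the two example pages. Using a loop with a set of visited URLs, rather than recursion, also avoids refetching a page if a "next" link ever points backwards.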

Data

Downloaded 3 times by MikeRalphson and paulbradshaw

Showing 10 of 12 rows

Sales m  Artist                             Album                            Released
110      Michael Jackson                    Thriller                         1982
49       AC/DC                              Back in Black                    1980
45       Pink Floyd                         The Dark Side of the Moon        1973
44       Whitney Houston / Various artists  The Bodyguard                    1992
43       Meat Loaf                          Bat Out of Hell                  1977
42       Eagles                             Their Greatest Hits (1971–1975)  1976
42       Various artists                    Dirty Dancing                    1987
40       Fleetwood Mac                      Rumours                          1977
40       Backstreet Boys                    Millennium                       1999
40       Bee Gees / Various artists         Saturday Night Fever             1977
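
The console output prints the Eagles album title with a garbled dash, while the stored row shows "1971–1975". That pattern often means UTF-8 bytes were decoded as Windows-1252 at some point. Below is a minimal sketch of one common repair, assuming that is the cause here. The helper name repair_mojibake is hypothetical and not part of the original scraper, and the sketch handles only a single round of mis-decoding.

```python
def repair_mojibake(text):
    """Undo one round of UTF-8-read-as-Windows-1252 damage, if present.

    Assumption: the garbling came from decoding UTF-8 bytes as cp1252.
    Text that does not fit that pattern is returned unchanged.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable in cp1252, or not valid UTF-8 afterwards: leave as is.
        return text


# An en dash (U+2013) mis-decoded as cp1252 becomes the three characters
# U+00E2, U+20AC, U+201C; the round trip below restores the single dash.
garbled = "Their Greatest Hits (1971\u00e2\u20ac\u201c1975)"
repaired = repair_mojibake(garbled)
```

Fixing the decoding step where the page is first read is usually preferable to repairing strings afterwards; the helper is meant only for rows that have already been saved in garbled form.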

Statistics

Average successful run time: less than 10 seconds

Total run time: less than a minute

Total cpu time used: less than 5 seconds

Total disk space used: 27.7 KB

History

  • Manually ran revision a3d8819a and failed.
    nothing changed in the database
    6 pages scraped
  • Manually ran and failed.
  • Manually ran revision e72ee155 and completed successfully.
    nothing changed in the database
    684 pages scraped
  • Manually ran revision 90b878d2 and completed successfully.
    nothing changed in the database
    9 pages scraped
  • Manually ran revision 90b878d2 and failed.
    nothing changed in the database
    1 page scraped
  • Manually ran revision e58fb8ee and failed.
    nothing changed in the database
    2 pages scraped
  • Manually ran revision e58fb8ee and completed successfully.
    12 records added in the database
    3 pages scraped
  • Manually ran revision 43cfb8d2 and failed.
    nothing changed in the database
    1 page scraped
  • Created on morph.io

Scraper code

Python

Scraperwiki_tutorial_3 / scraper.py