This is a scraper that runs on Morph. To get started, see the documentation.

Contributors: paulbradshaw

Last run completed successfully.

Console output of last run

Injecting configuration and compiling...
Injecting scraper and running...
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>Top-selling albums of all time</title>
<link rel="stylesheet" type="text/css" href="http://media.scraperwiki.com/css/main.css" />
</head>
<body>
<div id="divPage">
<div class="page_title">
<h1 id="hello_world">Top-selling albums of all time: Page 1</h1>
<p>Wikipedia's <a href="http://en.wikipedia.org/wiki/List_of_best-selling_albums_worldwide">list of the top-selling albums in history</a>, ready to scrape.</p>
</div>
<div class="content">
<table class="data">
<tr><th>Artist</th><th>Album</th><th>Released</th><th>Genre</th><th>Sales (millions)</th></tr>
<tr><td>Michael Jackson</td><td>Thriller</td><td>1982</td><td>Pop / R&B / Rock</td><td>110</td></tr>
<tr><td>AC/DC</td><td>Back in Black</td><td>1980</td><td>Hard rock / Heavy metal</td><td>49</td></tr>
<tr><td>Pink Floyd</td><td>The Dark Side of the Moon</td><td>1973</td><td>Progressive rock</td><td>45</td></tr>
<tr><td>Whitney Houston / Various artists</td><td>The Bodyguard</td><td>1992</td><td>Soundtrack</td><td>44</td></tr>
</table>
<a href="example_table_2.html" style="float:right;" class="next">&raquo; Page 2</a>
</div>
</div>
</body>
</html>
{'Sales m': '110', 'Artist': 'Michael Jackson', 'Album': 'Thriller', 'Released': '1982'}
------------
{'Sales m': '49', 'Artist': 'AC/DC', 'Album': 'Back in Black', 'Released': '1980'}
------------
{'Sales m': '45', 'Artist': 'Pink Floyd', 'Album': 'The Dark Side of the Moon', 'Released': '1973'}
------------
{'Sales m': '44', 'Artist': 'Whitney Houston / Various artists', 'Album': 'The Bodyguard', 'Released': '1992'}
------------
[<Element a at 0x7fbc53ab8d60>]
http://www.madingley.org/uploaded/example_table_2.html
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>Top-selling albums of all time: Page 2</title>
<link rel="stylesheet" type="text/css" href="http://media.scraperwiki.com/css/main.css" />
</head>
<body>
<div id="divPage">
<div class="page_title">
<h1 id="hello_world">Top-selling albums of all time: Page 2</h1>
<p>Wikipedia's <a href="http://en.wikipedia.org/wiki/List_of_best-selling_albums_worldwide">list of the top-selling albums in history</a>, ready to scrape.</p>
</div>
<div class="content">
<table class="data">
<tr><th>Artist</th><th>Album</th><th>Released</th><th>Genre</th><th>Sales (millions)</th></tr>
<tr><td>Meat Loaf </td><td>Bat Out of Hell</td><td>1977</td><td>Rock</td><td>43</td></tr>
<tr><td>Eagles</td><td>Their Greatest Hits (1971–1975)</td><td>1976</td><td>Rock</td><td>42</td></tr>
<tr><td>Various artists</td><td>Dirty Dancing</td><td>1987</td><td>Dance / Pop</td><td>42</td></tr>
<tr><td>Fleetwood Mac</td><td>Rumours</td><td>1977</td><td>Rock</td><td>40</td></tr>
</table>
<a href="example_table_1.html" style="float:left;" class="previous">Page 1 &laquo;</a>
<a href="example_table_3.html" style="float:right;" class="next">&raquo; Page 3</a>
</div>
</div>
</body>
</html>
{'Sales m': '43', 'Artist': 'Meat Loaf ', 'Album': 'Bat Out of Hell', 'Released': '1977'}
------------
{'Sales m': '42', 'Artist': 'Eagles', 'Album': u'Their Greatest Hits (1971\u20131975)', 'Released': '1976'}
------------
{'Sales m': '42', 'Artist': 'Various artists', 'Album': 'Dirty Dancing', 'Released': '1987'}
------------
{'Sales m': '40', 'Artist': 'Fleetwood Mac', 'Album': 'Rumours', 'Released': '1977'}
------------
[<Element a at 0x7fbc53ab8ec0>]
http://www.madingley.org/uploaded/example_table_3.html
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>Top-selling albums of all time: Page 3</title>
<link rel="stylesheet" type="text/css" href="http://media.scraperwiki.com/css/main.css" />
</head>
<body>
<div id="divPage">
<div class="page_title">
<h1 id="hello_world">Top-selling albums of all time: Page 3</h1>
<p>Wikipedia's <a href="http://en.wikipedia.org/wiki/List_of_best-selling_albums_worldwide">list of the top-selling albums in history</a>, ready to scrape.</p>
</div>
<div class="content">
<table class="data">
<tr><th>Artist</th><th>Album</th><th>Released</th><th>Genre</th><th>Sales (millions)</th></tr>
<tr><td>Backstreet Boys</td><td>Millennium</td><td>1999</td><td>Pop</td><td>40</td></tr>
<tr><td>Bee Gees / Various artists</td><td>Saturday Night Fever</td><td>1977</td><td>Disco</td><td>40</td></tr>
<tr><td>Shania Twain</td><td>Come On Over</td><td>1997</td><td>Country / Pop</td><td>39</td></tr>
<tr><td>Led Zeppelin</td><td>Led Zeppelin IV</td><td>1971</td><td>Hard rock / Heavy metal</td><td>37</td></tr>
</table>
<a href="example_table_2.html" style="float:left;" class="previous">Page 2 &laquo;</a>
</div>
</div>
</body>
</html>
{'Sales m': '40', 'Artist': 'Backstreet Boys', 'Album': 'Millennium', 'Released': '1999'}
------------
{'Sales m': '40', 'Artist': 'Bee Gees / Various artists', 'Album': 'Saturday Night Fever', 'Released': '1977'}
------------
{'Sales m': '39', 'Artist': 'Shania Twain', 'Album': 'Come On Over', 'Released': '1997'}
------------
{'Sales m': '37', 'Artist': 'Led Zeppelin', 'Album': 'Led Zeppelin IV', 'Released': '1971'}
------------
[]
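The console output suggests a loop that prints each page, turns every table row into a record, and then follows the link with class "next" until a page has none (the final empty list). The scraper's own code is not reproduced on this page, so the following is only a minimal sketch of that pattern. It swaps in the standard-library html.parser for the lxml-style elements visible in the log, and the names AlbumTableParser, scrape_page, scrape_all, and PAGES are hypothetical. PAGES holds trimmed in-memory copies of two pages so that the example runs without network access.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AlbumTableParser(HTMLParser):
    """Collects the cell text of each data row and the href of the 'next' link."""

    def __init__(self):
        super().__init__()
        self.rows = []          # one list of cell strings per <tr> that has <td> cells
        self.next_href = None   # href of <a class="next">, or None on the last page
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")
        elif tag == "a" and "next" in (attrs.get("class") or "").split():
            self.next_href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr":
            if self._row:       # header rows hold only <th> cells, so they stay empty
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell and self._row:
            self._row[-1] += data


def scrape_page(html, base_url):
    """Return (records, next_url) for one page; next_url is None when there is no next link."""
    parser = AlbumTableParser()
    parser.feed(html)
    records = [
        # The Genre column (index 3) is skipped, matching the records in the log.
        {"Artist": c[0], "Album": c[1], "Released": c[2], "Sales m": c[4]}
        for c in parser.rows
        if len(c) >= 5
    ]
    next_url = urljoin(base_url, parser.next_href) if parser.next_href else None
    return records, next_url


def scrape_all(start_url, fetch):
    """Follow 'next' links from start_url; fetch is any callable mapping URL -> HTML text."""
    records, url = [], start_url
    while url:
        page_records, url = scrape_page(fetch(url), url)
        records.extend(page_records)
    return records


HEADER = ("<tr><th>Artist</th><th>Album</th><th>Released</th>"
          "<th>Genre</th><th>Sales (millions)</th></tr>")
BASE = "http://www.madingley.org/uploaded/"
# Trimmed stand-ins for the real pages, so no HTTP request is made.
PAGES = {
    BASE + "example_table_1.html": (
        '<table class="data">' + HEADER +
        "<tr><td>AC/DC</td><td>Back in Black</td><td>1980</td>"
        "<td>Hard rock / Heavy metal</td><td>49</td></tr>"
        "<tr><td>Pink Floyd</td><td>The Dark Side of the Moon</td><td>1973</td>"
        "<td>Progressive rock</td><td>45</td></tr></table>"
        '<a href="example_table_2.html" style="float:right;" class="next">&raquo; Page 2</a>'
    ),
    BASE + "example_table_2.html": (
        '<table class="data">' + HEADER +
        "<tr><td>Fleetwood Mac</td><td>Rumours</td><td>1977</td>"
        "<td>Rock</td><td>40</td></tr></table>"
        '<a href="example_table_1.html" style="float:left;" class="previous">Page 1 &laquo;</a>'
    ),
}

records = scrape_all(BASE + "example_table_1.html", PAGES.__getitem__)
for record in records:
    print(record)
    print("------------")
```

Passing the fetch function in as a parameter keeps the pagination logic separate from the download step, so a real HTTP client could presumably be substituted for the dictionary lookup without touching the loop.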

Data

Downloaded 2 times by MikeRalphson

Download table (as CSV) Download SQLite database (3 KB) Use the API

Showing 10 of 12 rows

Sales m  Artist                             Album                            Released
110      Michael Jackson                    Thriller                         1982
49       AC/DC                              Back in Black                    1980
45       Pink Floyd                         The Dark Side of the Moon        1973
44       Whitney Houston / Various artists  The Bodyguard                    1992
43       Meat Loaf                          Bat Out of Hell                  1977
42       Eagles                             Their Greatest Hits (1971–1975)  1976
42       Various artists                    Dirty Dancing                    1987
40       Fleetwood Mac                      Rumours                          1977
40       Backstreet Boys                    Millennium                       1999
40       Bee Gees / Various artists         Saturday Night Fever             1977
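The rows above are what ends up in the scraper's SQLite database. As a rough sketch of how such records might be written and queried with Python's standard sqlite3 module: the table name data, the choice of (Artist, Album) as the unique key, and the helper save_records are assumptions for illustration, not details confirmed by this page. An in-memory database stands in for the real file, and values are stored as text because the console output shows them as strings.

```python
import sqlite3

# Two records copied from the table above; values are strings, as in the console log.
records = [
    {"Sales m": "110", "Artist": "Michael Jackson", "Album": "Thriller", "Released": "1982"},
    {"Sales m": "49", "Artist": "AC/DC", "Album": "Back in Black", "Released": "1980"},
]


def save_records(conn, rows):
    """Insert rows, replacing any existing row with the same (Artist, Album) key."""
    conn.execute(
        'CREATE TABLE IF NOT EXISTS data ('
        '"Sales m" TEXT, "Artist" TEXT, "Album" TEXT, "Released" TEXT, '
        'PRIMARY KEY ("Artist", "Album"))'   # assumed unique key
    )
    conn.executemany(
        'INSERT OR REPLACE INTO data ("Sales m", "Artist", "Album", "Released") '
        'VALUES (:sales, :artist, :album, :released)',
        [
            {"sales": r["Sales m"], "artist": r["Artist"],
             "album": r["Album"], "released": r["Released"]}
            for r in rows
        ],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")   # stand-in for the scraper's database file
save_records(conn, records)
save_records(conn, records)          # a second run replaces rows instead of duplicating them

row_count = conn.execute("SELECT COUNT(*) FROM data").fetchone()[0]
top = conn.execute(
    'SELECT "Artist", "Album" FROM data '
    'ORDER BY CAST("Sales m" AS INTEGER) DESC LIMIT 1'
).fetchone()
print(row_count, top)
```

Because sales figures are stored as text, the query casts them to integers before sorting; sorting the raw strings would place '49' above '110'.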

Statistics

Average successful run time: half a minute

Total run time: half a minute

Total cpu time used: less than 5 seconds

Total disk space used: 26.8 KB

History

  • Manually ran revision e58fb8ee and completed successfully.
    12 records added in the database
    3 pages scraped
  • Manually ran revision 43cfb8d2 and failed.
    nothing changed in the database
    1 page scraped
  • Created on morph.io

Scraper code

Python

Scraperwiki_tutorial_3 / scraper.py