tfmorris / desert_island_discs_records

BBC Desert Island Discs picks


This scraper is currently BROKEN

The BBC Radio 4 web site changed format on August 28, 2017, and this scraper has not yet been updated to handle the new format. The last show scraped was August 8, 2017.

Covers BBC Radio’s Desert Island Discs show from 1942 to August 28, 2017.

http://www.bbc.co.uk/programmes/b006qnmr/episodes/guide (was http://www.bbc.co.uk/radio4/features/desert-island-discs/find-a-castaway)

Scraper runs on morph.io where the data is available:
https://morph.io/tfmorris/desert_island_discs_records
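
As a rough sketch of how the hosted data might be fetched programmatically, the snippet below builds a query URL for the morph.io API. The endpoint pattern (`https://api.morph.io/<owner>/<scraper>/data.<format>` with `key` and `query` parameters), the table name `data`, and the helper name `build_query_url` are assumptions, not something this page confirms; check the morph.io API documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed morph.io API base and this scraper's path (not confirmed by this page).
API_BASE = "https://api.morph.io"
SCRAPER = "tfmorris/desert_island_discs_records"


def build_query_url(sql, api_key, fmt="json"):
    """Return a URL that asks the (assumed) morph.io API to run `sql`."""
    params = urlencode({"key": api_key, "query": sql})
    return "{}/{}/data.{}?{}".format(API_BASE, SCRAPER, fmt, params)


# "YOUR_API_KEY" is a placeholder; `data` is assumed to be the table name.
url = build_query_url("select * from data limit 10", "YOUR_API_KEY")
print(url)
```

The URL is only constructed here, not requested, so the sketch runs without network access or a real key.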

Record types include:

  • url – link to BBC web page for the show
  • alternate_name – name on show page if it differs from index page
  • occupation – whatever was listed as a tag line for the guest, typically their occupation or what they’re known for
  • record – one of the 8 songs/tracks they selected to bring with them
  • record_keep – their favorite song/track
  • book – the book they chose to bring with them (only for more recent shows)
  • luxury – luxury item they selected to bring with them (only for more recent shows)
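
The record types above can be used as a filter on the `type` column. The following is a minimal sketch that loads a few rows from the data preview on this page into an in-memory SQLite database and queries them; the table name `data` and the helper `picks_for` are assumptions made for illustration, and the downloaded SQLite file may be laid out differently.

```python
import sqlite3

# In-memory stand-in for the downloaded database; the table name `data` is assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (guest TEXT, type TEXT, title TEXT, date_scraped TEXT, "
    "date TEXT, performer TEXT, mb_id TEXT, composer TEXT, principal TEXT)"
)

# Rows copied from the data preview on this page (remaining columns left NULL).
rows = [
    ("Noel Gallagher", "book", "On the Road by Jack Kerouac",
     "2015-08-30T19:36:07+00:00", "2015-07-19"),
    ("Noel Gallagher", "luxury", "Guitar and plectrum",
     "2015-08-30T19:36:07+00:00", "2015-07-19"),
    ("Professor Monica Grady", "book", "Ulysses by James Joyce",
     "2015-08-30T19:36:03+00:00", "2015-07-26"),
    ("Professor Monica Grady", "luxury", "Flute",
     "2015-08-30T19:36:03+00:00", "2015-07-26"),
]
conn.executemany(
    "INSERT INTO data (guest, type, title, date_scraped, date) VALUES (?, ?, ?, ?, ?)",
    rows,
)


def picks_for(conn, guest, record_type):
    """Return the titles of one guest's rows of a given record type."""
    cur = conn.execute(
        "SELECT title FROM data WHERE guest = ? AND type = ? ORDER BY title",
        (guest, record_type),
    )
    return [r[0] for r in cur]


books = picks_for(conn, "Noel Gallagher", "book")
counts = dict(conn.execute("SELECT type, COUNT(*) FROM data GROUP BY type"))
print(books, counts)
```

To run the same queries against the real data, `sqlite3.connect(":memory:")` would be replaced with a connection to the downloaded database file, assuming the schema matches.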

Most of the column names should be self-explanatory.

  • MB_ID is the MusicBrainz ID for the composer or artist.
  • “principal” tells whether the main heading for the item, as specified by the BBC, was the composer or the performer (it varies). The MusicBrainz ID refers to this person.
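
One possible way to resolve which person an MB_ID belongs to is sketched below. It assumes that the `principal` column holds the literal strings "composer" or "performer"; this page does not show the stored values, so treat that as a guess. The helper `mb_id_owner` and the sample row are invented for illustration and are not real data.

```python
def mb_id_owner(row):
    """Return the name that the row's mb_id refers to, or None if undetermined.

    Assumes `principal` contains "composer" or "performer" (unverified).
    """
    principal = (row.get("principal") or "").strip().lower()
    if principal in ("composer", "performer"):
        return row.get(principal)
    return None


# Made-up placeholder row, not taken from the dataset.
sample = {
    "composer": "Example Composer",
    "performer": "Example Performer",
    "principal": "composer",
    "mb_id": "00000000-0000-0000-0000-000000000000",
}
print(mb_id_owner(sample))
```

Rows such as book and luxury picks have no `principal` value, so the helper returns None for them.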

Migrated from ScraperWiki when it was abandoned: Tom Morris / BBC Desert Island Discs picks
Based on (and replaces) a pair of scrapers: Francis Irving / Desert Island Disc broadcasts & Francis Irving / Desert Island Disc records

Database reset & completely refreshed 1 Sept 2015

Forked from ScraperWiki

Contributors tfmorris ROMitat

Last run failed with status code 1.

Console output of last run

Injecting configuration and compiling...
-----> Python app detected
 ! The latest version of Python 2 is python-2.7.14 (you are using python-2.7.6, which is unsupported).
 ! We recommend upgrading by specifying the latest version (python-2.7.14).
   Learn More: https://devcenter.heroku.com/articles/python-runtimes
-----> Installing python-2.7.6
-----> Installing pip
-----> Installing requirements with pip
   Obtaining scraperwiki from git+http://github.com/openaustralia/scraperwiki-python.git@morph_defaults#egg=scraperwiki (from -r /tmp/build/requirements.txt (line 3))
   Cloning http://github.com/openaustralia/scraperwiki-python.git (to revision morph_defaults) to /app/.heroku/src/scraperwiki
   Collecting requests==2.14.2 (from -r /tmp/build/requirements.txt (line 5))
   /app/.heroku/python/lib/python2.7/site-packages/pip/_vendor/urllib3/util/ssl_.py:339: SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name Indication) extension to TLS is not available on this platform. This may cause the server to present an incorrect TLS certificate, which can cause validation failures. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
   SNIMissingWarning
   /app/.heroku/python/lib/python2.7/site-packages/pip/_vendor/urllib3/util/ssl_.py:137: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
   InsecurePlatformWarning
   /app/.heroku/python/lib/python2.7/site-packages/pip/_vendor/urllib3/util/ssl_.py:137: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
   InsecurePlatformWarning
   Downloading https://files.pythonhosted.org/packages/e4/b0/286e8a936158e5cc5791d5fa3bc4b1d5a7e1ff4e5b3f3766b63d8e97708a/requests-2.14.2-py2.py3-none-any.whl (560kB)
   Collecting lxml==3.8.0 (from -r /tmp/build/requirements.txt (line 6))
   Downloading https://files.pythonhosted.org/packages/62/b7/aafdcf0c0ad0cf36a0835adde50f4a7e18241440b9897a88c80f520d0c76/lxml-3.8.0-cp27-cp27m-manylinux1_x86_64.whl (6.8MB)
   Collecting cssselect==1.0.1 (from -r /tmp/build/requirements.txt (line 7))
   Downloading https://files.pythonhosted.org/packages/1d/e5/f1d410192e34b1034dba7804de5dbcdece20a883c445ad661e5ea8226b42/cssselect-1.0.1-py2.py3-none-any.whl
   Collecting dumptruck>=0.1.2 (from scraperwiki->-r /tmp/build/requirements.txt (line 3))
   Downloading https://files.pythonhosted.org/packages/15/27/3330a343de80d6849545b6c7723f8c9a08b4b104de964ac366e7e6b318df/dumptruck-0.1.6.tar.gz
   Installing collected packages: dumptruck, requests, scraperwiki, lxml, cssselect
   Running setup.py install for dumptruck: started
   Running setup.py install for dumptruck: finished with status 'done'
   Running setup.py develop for scraperwiki
   Successfully installed cssselect-1.0.1 dumptruck-0.1.6 lxml-3.8.0 requests-2.14.2 scraperwiki
 ! Hello! It looks like your application is using an outdated version of Python.
 ! This caused the security warning you saw above during the 'pip install' step.
 ! We recommend 'python-3.6.2', which you can specify in a 'runtime.txt' file.
 ! -- Much Love, Heroku.
-----> Discovering process types
   Procfile declares types -> scraper
Injecting scraper and running...
Traceback (most recent call last):
  File "scraper.py", line 30, in <module>
    raise NotImplementedError("BBC Radio 4 site changed format August 28, 2017 and this code has not yet been updated for new layout")
NotImplementedError: BBC Radio 4 site changed format August 28, 2017 and this code has not yet been updated for new layout

Data

Downloaded 34 times by tfmorris Satchmo76 edjefferson parsleybrimbrim simonarawlings radders Wiretrip chrisdunigan mseiderer nabillia ameliaparker flerlagekr seanelvidge willglanville


rows 10 / 41917

guest | type | title | date_scraped | date | performer | mb_id | composer | principal
Dr Bill Frankland | book | The Story of San Michele by Axel Munthe | 2015-08-30T19:34:36+00:00 | 2015-08-09 | | | |
Dr Bill Frankland | luxury | Binoculars | 2015-08-30T19:34:36+00:00 | 2015-08-09 | | | |
Ruth Rogers | book | The River Cafe Classic Italian Cookbook by Rose Gray and Ruth Rogers | 2015-08-30T19:34:41+00:00 | 2015-08-02 | | | |
Ruth Rogers | luxury | A bottle of extra virgin olive oil pressed at either Felsina or Fontodi vineyards | 2015-08-30T19:34:41+00:00 | 2015-08-02 | | | |
Professor Monica Grady | book | Ulysses by James Joyce | 2015-08-30T19:36:03+00:00 | 2015-07-26 | | | |
Professor Monica Grady | luxury | Flute | 2015-08-30T19:36:03+00:00 | 2015-07-26 | | | |
Noel Gallagher | book | On the Road by Jack Kerouac | 2015-08-30T19:36:07+00:00 | 2015-07-19 | | | |
Noel Gallagher | luxury | Guitar and plectrum | 2015-08-30T19:36:07+00:00 | 2015-07-19 | | | |
Imtiaz Dharker | book | An atlas of the whole world | 2015-08-30T19:36:11+00:00 | 2015-07-12 | | | |
Imtiaz Dharker | luxury | The Victoria and Albert Museum in London | 2015-08-30T19:36:11+00:00 | 2015-07-12 | | | |

Statistics

Average successful run time: 9 minutes

Total run time: about 1 month

Total cpu time used: about 1 hour

Total disk space used: 9.58 MB

History

  • Auto ran revision 778a71be and failed.
    nothing changed in the database
  • Auto ran revision 778a71be and failed.
    nothing changed in the database
  • Auto ran revision 778a71be and failed.
    nothing changed in the database
    1 page scraped
  • Auto ran revision 778a71be and failed.
    nothing changed in the database
    1 page scraped
  • Auto ran revision 778a71be and failed.
    nothing changed in the database
    61 pages scraped
  • ...
  • Forked from ScraperWiki
