ianibo / SirsiDynixIBistroScraper

Initially targets bibliographic data from Sheffield Public Library.

Scrapes library.sheffield.gov.uk and syndetics.com


This is a scraper that runs on morph.io. To get started, see the documentation.

apt-get install libxslt-dev
apt-get install phantomjs

virtualenv --system-site-packages oaf
source oaf/bin/activate
pip install -r requirements.txt

Run the development version with:

python scraper.py
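The console output below shows the order of searches (a, aa, aaa, aaaa, aaab, aaac), and its traceback shows `scrape_a_letter(browser,''+a+b+c+d)` being called from `scrape_ibistro`. Together these suggest the scraper walks search prefixes of up to four letters depth-first, visiting each prefix before its extensions. The sketch below is a swapped-in recursive generator that reproduces that order. It is not the author's code, and `iter_prefixes` is a hypothetical name. It is written for Python 3, whereas the log shows the scraper itself running on Python 2.7.

```python
from string import ascii_lowercase


def iter_prefixes(alphabet=ascii_lowercase, max_len=4, prefix=""):
    """Yield search prefixes depth-first, each prefix before its extensions.

    With the default alphabet this yields a, aa, aaa, aaaa, aaab, aaac, ...
    which matches the order visible in the console output of the last run.
    """
    for letter in alphabet:
        current = prefix + letter
        yield current
        # Only extend prefixes that are still shorter than the limit.
        if len(current) < max_len:
            yield from iter_prefixes(alphabet, max_len, current)
```

With 26 letters and four levels, this generates 26 + 676 + 17,576 + 456,976 = 475,254 searches. That scale is one reason to record progress between runs rather than start from "a" every time.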

Contributors: ianibo, ianibbo

Last run completed successfully.

Console output of the last run

Injecting configuration and compiling...
Injecting scraper and running...
DoIt
platform Linux
Python sys.version_info(major=2, minor=7, micro=9, releaselevel='final', serial=0)
splinter 0.7.3
No version info for scraperwiki
scraping a
starting
Looking for power search button
Waiting for first item in results page to appear
Clicking button with name VIEW^1
Waiting for details page to finish loading
Got full details page
got form_type input control.. good to continue
selecting full holdings and marc tags
Got full details page
done
scraping a resource
Getting item info
Getting catalog info
Get marc_data table
Processing data = {'Title': u'A bird in the hand :chicken recipes for every day and every mood', 'hashCode': 'ad21318a40274e7b324447a9e2629ca5'}
Moving to next record
scraping a resource
Getting item info
Getting catalog info
Get marc_data table
Processing data = {'Title': u'A bend in the Nile', 'hashCode': '657304442a27cfd2bd5ebdac5642c08c'}
Moving to next record
scraping a resource
Getting item info
Getting catalog info
Get marc_data table
Processing data = {'Title': u'A bound set of 5 plans to illustrate a Town Plan lecture by Edward M Gibbs: Plan No 1: General, showing areas, rivers etc', 'hashCode': '6d30d2b2b02d4833e78bf8291e145b80'}
Moving to next record
scraping a resource
Getting item info
Getting catalog info
Get marc_data table
Processing data = {'Title': u'A[sound recording] /artist, Agnetha F\ufffdltskog ; producers, J\ufffdrgen Elofsson, Peter Nordahl.', 'hashCode': 'd92d7bf344f855065bffe32ef088cf74'}
Moving to next record
scraping a resource
Getting item info
Getting catalog info
Get marc_data table
Processing data = {'Title': u'A Bill to empower the South Yorkshire Passenger Transport Executive to develop and operate a system of light rail transit ; to authorise the construction of works and the acquisition of land for that purpose ; to confer further powers upon the Executive ; and for other purposes [South Yorkshire Light Rail Transit Bill]', 'hashCode': 'bb3daed046590cd0fa45bdbeab912cc9'}
Moving to next record
scraping a resource
Getting item info
Getting catalog info
Get marc_data table
Processing data = {'Title': u'A Better Place to Live and Work by the Year 2000 ? a Strategy for Sharrow', 'hashCode': '8a6dc80a8c23cefbadac4beab3fe048b'}
Moving to next record
scraping a resource
Getting item info
Getting catalog info
Get marc_data table
Processing data = {'Title': u'A brief history of King Arthur', 'hashCode': '03a49d8d22f5fc97a2e10b4f18b65041'}
Moving to next record
scraping a resource
Getting item info
Getting catalog info
Get marc_data table
Processing data = {'Title': u'A Bill to confer further powers upon the lord mayor, aldermen and citizens of the city of Sheffield and to make further provision for the improvement, health, local government and finances of the city; and for other purposes :in parliament - session 1971-72', 'hashCode': '0f8638fcd71d8ef298688ecf5a948552'}
Moving to next record
scraping a resource
Getting item info
Getting catalog info
Get marc_data table
Processing data = {'Title': u"A' bhean iadach", 'hashCode': '470d908bd42f0a3d7828f59b2d5fb4fa'}
Moving to next record
scraping a resource
Getting item info
Getting catalog info
Get marc_data table
Processing data = {'Title': u"The A.B.C. of Musical Handbell Ringing :Or the Hand-bell Ringers' Instructor /edited by J.j Hannon", 'hashCode': '929aded695975ce071a1fa68f5867d19'}
Moving to next record
scraping a resource
Getting item info
Getting catalog info
Get marc_data table
Processing data = {'Title': u'A.A. Milne, A Handlist of his Writings for Children, with Decorations by Ernest H. Shepard', 'hashCode': '7bd87fcaa07fd57b7ff8aff31191b700'}
Moving to next record
scraping a resource
Getting item info
Getting catalog info
Get marc_data table
Looks like we reached the end of the next page links...
scraping aa
starting
Looking for power search button
Waiting for first item in results page to appear
Clicking button with name VIEW^1
Waiting for details page to finish loading
Got full details page
got form_type input control.. good to continue
selecting full holdings and marc tags
Got full details page
done
scraping a resource
Getting item info
Getting catalog info
Get marc_data table
Processing data = {'Title': u'AA 2010 big road atlas USA[cartographic material] /from the American Automobile Association.', 'hashCode': 'e4ebe2471ef7ed40a49311a2ea7ed932'}
Moving to next record
scraping a resource
Getting item info
Getting catalog info
Get marc_data table
Processing data = {'Title': u'AA big easy read France 2011[cartographic material].', 'hashCode': 'a05d183f83e6b41fa22c49eca836c408'}
Moving to next record
scraping a resource
Getting item info
Getting catalog info
Get marc_data table
Processing data = {'Title': u'The AA 100 walks in Wales & The Marches.', 'hashCode': 'eb08b258bd92aca5d508ad75e2bf59c2'}
Moving to next record
scraping a resource
Getting item info
Getting catalog info
Get marc_data table
Processing data = {'Title': u'The AA 100 walks in Southwest England.', 'hashCode': '88f55caaf6b233b988bb86e60537b78e'}
Moving to next record
scraping a resource
Getting item info
Getting catalog info
Get marc_data table
Processing data = {'Title': u'The AA 100 walks in Southeast England.', 'hashCode': 'a60449c1ab14ed9b7a16892aaba27a2e'}
Moving to next record
scraping a resource
Getting item info
Getting catalog info
Get marc_data table
Processing data = {'Title': u'The AA 100 walks in Scotland.', 'hashCode': '40d1321c79055292e6268a68ae13a6fc'}
Moving to next record
scraping a resource
Getting item info
Getting catalog info
Get marc_data table
Processing data = {'Title': u'The AA 100 walks in Northern England.', 'hashCode': 'be7ee0c503310a929246bc6847a52736'}
Moving to next record
scraping a resource
Getting item info
Getting catalog info
Get marc_data table
Processing data = {'Title': u'The AA 100 walks in Heart of England.', 'hashCode': 'd958ae8cb0bf1512f53953f46e4962c1'}
Moving to next record
scraping a resource
Getting item info
Getting catalog info
Get marc_data table
Processing data = {'Title': u'The AA 100 walks in Eastern England.', 'hashCode': 'd62bf706d3efd4da53fe6efca9ab62ff'}
Moving to next record
scraping a resource
Getting item info
Getting catalog info
Get marc_data table
Processing data = {'Title': u'AA big road atlas :Europe', 'hashCode': '03adfb39a8d72d4ff89dfdde0ece9314'}
Moving to next record
scraping a resource
Getting item info
Getting catalog info
Get marc_data table
Processing data = {'Title': u'Bed and breakfast guide', 'hashCode': 'da7f9c5eb805698b1ddc7499ff54546e'}
Moving to next record
scraping a resource
Getting item info
Getting catalog info
Get marc_data table
Processing data = {'Title': u'The Aa :A History of the First 75 Years of the Automobile Association 1905-1980 /by Hugh Barty-king', 'hashCode': '307d8fb8dc5c27c58f7b34b1279f9723'}
Moving to next record
scraping a resource
Getting item info
Getting catalog info
Get marc_data table
Processing data = {'Title': u"AA Book of Britain's countryside /edited by Rick Morris... et al", 'hashCode': '6d125a3644fea1aa672ecd8ff9461e19'}
Moving to next record
scraping a resource
Getting item info
Getting catalog info
Get marc_data table
Looks like we reached the end of the next page links...
scraping aaa
starting
Looking for power search button
Waiting for first item in results page to appear
Clicking button with name VIEW^1
Waiting for details page to finish loading
Got full details page
got form_type input control.. good to continue
selecting full holdings and marc tags
Got full details page
done
scraping a resource
Getting item info
Getting catalog info
Get marc_data table
Processing data = {'Title': u'Aaaarrgghh, spider! /Lydia Monks.', 'hashCode': '81c2d9f01f707a0f5f454f15215c686d'}
Moving to next record
scraping a resource
Getting item info
Getting catalog info
Get marc_data table
Processing data = {'Title': u'Aaaarrgghh, spider!', 'hashCode': '6ec6df3ec2f4f49077c56bb626e65f0b'}
Moving to next record
scraping a resource
Getting item info
Getting catalog info
Get marc_data table
Looks like we reached the end of the next page links...
scraping aaaa
starting
Looking for power search button
Waiting for first item in results page to appear
Clicking button with name VIEW^1
Waiting for details page to finish loading
Got full details page
got form_type input control.. good to continue
selecting full holdings and marc tags
Got full details page
done
scraping a resource
Getting item info
Getting catalog info
Get marc_data table
Processing data = {'Title': u'Aaaarrgghh, spider! /Lydia Monks.', 'hashCode': '81c2d9f01f707a0f5f454f15215c686d'}
Moving to next record
scraping a resource
Getting item info
Getting catalog info
Get marc_data table
Looks like we reached the end of the next page links...
scraping aaab
starting
Looking for power search button
Waiting for first item in results page to appear
Possible the search returned no results.. continue
Looks like we reached the end of the next page links...
scraping aaac
Unexpected error: (<class 'urllib2.URLError'>, URLError(error(111, 'Connection refused'),), <traceback object at 0x7f3ee4ab0368>)
  File "scraper.py", line 150, in scrape_ibistro
    scrape_a_letter(browser,''+a+b+c+d)
  File "scraper.py", line 172, in scrape_a_letter
    browser.visit('http://library.sheffield.gov.uk/uhtbin/webcat')
  File "/app/.heroku/python/lib/python2.7/site-packages/splinter/driver/webdriver/__init__.py", line 184, in visit
    self.driver.get(url)
  File "/app/.heroku/python/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 208, in get
    self.execute(Command.GET, {'url': url})
  File "/app/.heroku/python/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 194, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/app/.heroku/python/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 395, in execute
    return self._request(command_info[0], url, body=data)
  File "/app/.heroku/python/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 463, in _request
    resp = opener.open(request, timeout=self._timeout)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 431, in open
    response = self._open(req, data)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 449, in _open
    '_open', req)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 1227, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 1197, in do_open
    raise URLError(err)
DoneIt
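The run stopped at prefix aaac with a "Connection refused" URLError. The traceback passes through Selenium's remote_connection module, which sends commands to the local WebDriver server (PhantomJS here). That suggests the refused connection was to the browser process rather than to library.sheffield.gov.uk, although the log alone cannot confirm this. One possible mitigation is to wrap each page visit in a retry loop with exponential backoff and an optional hook that restarts the browser. The sketch below is hypothetical (`call_with_retries` and `on_retry` are not part of splinter or the original scraper) and targets Python 3, where `urllib.error.URLError` and `ConnectionRefusedError` are both subclasses of `OSError`.

```python
import time


def call_with_retries(action, attempts=3, delay=2.0, backoff=2.0,
                      retry_on=(OSError,), on_retry=None, sleep=time.sleep):
    """Call action(); on an exception listed in retry_on, wait and try again.

    on_retry, if given, is called as on_retry(attempt_number, exception)
    before each wait, e.g. to restart a crashed browser (hypothetical hook).
    The last failure is re-raised so the caller still sees it.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    wait = delay
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on as exc:
            if attempt == attempts:
                raise  # out of attempts: propagate the original error
            if on_retry is not None:
                on_retry(attempt, exc)
            sleep(wait)
            wait *= backoff  # exponential backoff between attempts
```

In a scraper this might be called as `call_with_retries(lambda: browser.visit(url), on_retry=restart_browser)`, where `restart_browser` is a function you would write yourself to quit the old splinter `Browser` and create a new one. The `sleep` parameter is injectable so the backoff can be tested without actually waiting.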

Data

Downloaded 3 times by MikeRalphson

Download table (as CSV) Download SQLite database (12 KB) Use the API

Showing 10 of 26 rows

hashCode                          Title
ad21318a40274e7b324447a9e2629ca5  A bird in the hand :chicken recipes for every day and every mood
657304442a27cfd2bd5ebdac5642c08c  A bend in the Nile
6d30d2b2b02d4833e78bf8291e145b80  A bound set of 5 plans to illustrate a Town Plan lecture by Edward M Gibbs: Plan No 1: General, showing areas, rivers etc
d92d7bf344f855065bffe32ef088cf74  A[sound recording] /artist, Agnetha F�ltskog ; producers, J�rgen Elofsson, Peter Nordahl.
bb3daed046590cd0fa45bdbeab912cc9  A Bill to empower the South Yorkshire Passenger Transport Executive to develop and operate a system of light rail transit ; to authorise the construction of works and the acquisition of land for that purpose ; to confer further powers upon the Executive ; and for other purposes [South Yorkshire Light Rail Transit Bill]
8a6dc80a8c23cefbadac4beab3fe048b  A Better Place to Live and Work by the Year 2000 ? a Strategy for Sharrow
03a49d8d22f5fc97a2e10b4f18b65041  A brief history of King Arthur
0f8638fcd71d8ef298688ecf5a948552  A Bill to confer further powers upon the lord mayor, aldermen and citizens of the city of Sheffield and to make further provision for the improvement, health, local government and finances of the city; and for other purposes :in parliament - session 1971-72
470d908bd42f0a3d7828f59b2d5fb4fa  A' bhean iadach
929aded695975ce071a1fa68f5867d19  The A.B.C. of Musical Handbell Ringing :Or the Hand-bell Ringers' Instructor /edited by J.j Hannon
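The console output counts 27 processed records, but the table holds 26 rows. The aaaa search returned the same Lydia Monks record (hashCode 81c2d9f01f707a0f5f454f15215c686d) that the aaa search had already saved, which is consistent with rows being keyed on hashCode. Each hashCode is 32 hex digits, the length of an MD5 digest, but what the real scraper feeds into the hash is not shown here. The sketch below therefore assumes an MD5 over all record fields in sorted key order, plus an insert-or-replace keyed on that hash. The table name `records` and both function names are hypothetical.

```python
import hashlib
import sqlite3


def record_hash(record):
    """Return a 32-character hex MD5 digest of a record's fields.

    Assumption: the real scraper's hash input is unknown. Here keys are
    hashed in sorted order, so the digest does not depend on dict ordering,
    and NUL separators keep ("ab", "c") distinct from ("a", "bc").
    """
    h = hashlib.md5()
    for key in sorted(record):
        h.update(key.encode("utf-8"))
        h.update(b"\x00")
        h.update(str(record[key]).encode("utf-8"))
        h.update(b"\x00")
    return h.hexdigest()


def save_record(conn, record):
    """Insert a row, or overwrite the existing row with the same hashCode."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records (hashCode TEXT PRIMARY KEY, Title TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO records (hashCode, Title) VALUES (?, ?)",
        (record_hash(record), record["Title"]),
    )
    conn.commit()
```

Saving the same record twice, as happened with the aaa and aaaa searches, leaves a single row.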

Showing 1 of 1 row

value_blob  type  name
aaab        text  completed_prefix
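This second table stores `completed_prefix = aaab`, and the console output shows the next search, aaac, failing. This suggests the scraper records the last fully processed prefix so that a later run can resume after it. The name/value_blob/type columns look like what `scraperwiki.sqlite.save_var` writes. The self-contained sketch below uses plain `sqlite3` with a hypothetical `progress` table instead. `save_progress`, `load_progress`, and `remaining` are illustrative names, not the scraper's own.

```python
import sqlite3


def save_progress(conn, prefix):
    """Remember the last fully scraped prefix (hypothetical 'progress' table)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS progress (name TEXT PRIMARY KEY, value TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO progress (name, value) "
        "VALUES ('completed_prefix', ?)",
        (prefix,),
    )
    conn.commit()


def load_progress(conn):
    """Return the saved prefix, or None if nothing has been saved yet."""
    try:
        row = conn.execute(
            "SELECT value FROM progress WHERE name = 'completed_prefix'"
        ).fetchone()
    except sqlite3.OperationalError:  # table does not exist on a first run
        return None
    return row[0] if row else None


def remaining(prefixes, completed):
    """Return an iterator over the prefixes that come after `completed`.

    If `completed` is None, nothing is skipped. Note: if `completed` never
    appears in `prefixes`, the iterator is exhausted and nothing remains.
    """
    it = iter(prefixes)
    if completed is None:
        return it
    for prefix in it:
        if prefix == completed:
            break
    return it
```

Saving progress after each prefix, rather than only at the end, means that a crash like the one at aaac costs at most one prefix's worth of work.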

Statistics

Average successful run time: 5 minutes

Total run time: about 3 hours

Total cpu time used: 24 minutes

Total disk space used: 55.3 MB

History

  • Manually ran revision 46ed7676 and completed successfully.
    27 records added, 26 records removed in the database
    103 pages scraped
  • Manually ran revision dd55acbd and completed successfully.
    26 records added, 26 records removed in the database
    103 pages scraped
  • Manually ran revision dd55acbd and completed successfully.
    11 records added, 11 records removed in the database
    40 pages scraped
  • Manually ran revision 5a72d737 and completed successfully.
    26 records added, 3 records removed in the database
    103 pages scraped
  • Manually ran revision 6b64e86a and completed successfully.
    3 records added, 3 records removed in the database
    32 pages scraped
  • Manually ran revision 31b4e506 and completed successfully.
    3 records added, 3 records removed in the database
    32 pages scraped
  • Manually ran revision 28c8d0e2 and completed successfully.
    3 records added in the database
    32 pages scraped
  • Manually ran revision 28c8d0e2 and completed successfully.
    3 records added, 2 records removed in the database
    32 pages scraped
  • Manually ran revision 1a24a330 and completed successfully.
    3 records added in the database
    32 pages scraped
  • Manually ran revision fa9e5343 and completed successfully.
    175 records added in the database
    215 pages scraped
  • Manually ran revision c991709e and completed successfully.
    nothing changed in the database
    40 pages scraped
  • Manually ran revision 6f58c7f8 and completed successfully.
    nothing changed in the database
  • Manually ran revision a3d5bd09 and completed successfully.
    nothing changed in the database
  • Manually ran revision a3d5bd09 and completed successfully.
    nothing changed in the database
  • Manually ran revision 4dd7e840 and completed successfully.
    nothing changed in the database
  • Manually ran revision 941c1118 and completed successfully.
    nothing changed in the database
  • Manually ran revision 5bf51c4d and completed successfully.
    nothing changed in the database
    8 pages scraped
  • Manually ran revision 2b8d7888 and completed successfully.
    nothing changed in the database
    8 pages scraped
  • Manually ran revision 5b127d84 and completed successfully.
    nothing changed in the database
    8 pages scraped
  • Manually ran revision ac321f03 and completed successfully.
    nothing changed in the database
    8 pages scraped
  • Manually ran revision 03822f0e and completed successfully.
    nothing changed in the database
    8 pages scraped
  • Manually ran revision 9ac8448e and completed successfully.
    nothing changed in the database
    8 pages scraped
  • Manually ran revision a4969251 and completed successfully.
    nothing changed in the database
    8 pages scraped
  • Manually ran revision aa19f673 and completed successfully.
    nothing changed in the database
    8 pages scraped
  • Manually ran revision 946f77c7 and completed successfully.
    nothing changed in the database
    8 pages scraped
  • Manually ran revision 827060ae and completed successfully.
    nothing changed in the database
    8 pages scraped
  • Manually ran revision c8a6d733 and completed successfully.
    nothing changed in the database
    8 pages scraped
  • Manually ran revision ab379fbc and completed successfully.
    nothing changed in the database
    8 pages scraped
  • Manually ran revision 5ca36f82 and failed.
    nothing changed in the database
    8 pages scraped
  • Manually ran revision 5ca36f82 and failed.
    nothing changed in the database
    8 pages scraped
  • Manually ran revision af72fad4 and failed.
    nothing changed in the database
    8 pages scraped
  • Manually ran revision ebd8021b and failed.
    nothing changed in the database
    8 pages scraped
  • Manually ran revision a136b6f4 and failed.
    nothing changed in the database
    8 pages scraped
  • Manually ran revision 11720715 and failed.
    nothing changed in the database
    8 pages scraped
  • Manually ran revision 7cb97f56 and failed.
    nothing changed in the database
  • Manually ran revision 7de10fb8 and failed.
    nothing changed in the database
    9 pages scraped
  • Created on morph.io

Scraper code

Python

SirsiDynixIBistroScraper / scraper.py