This is a scraper that runs on Morph. To get started, see the documentation.

apt-get install libxslt-dev
apt-get install phantomjs
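
libxslt-dev is needed to build the lxml dependency from source, and phantomjs is the headless browser that splinter drives. To confirm the browser binary ended up on your PATH:

phantomjs --version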

virtualenv --system-site-packages oaf
source oaf/bin/activate
pip install -r requirements.txt
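
For reference, the build log below shows requirements.txt pinning at least these packages (scraperwiki is installed from a git checkout of its morph_defaults branch; the repository URL did not survive in the captured log):

# scraperwiki: installed from git, morph_defaults branch (URL elided in the log)
lxml==3.4.4
cssselect==0.9.1
selenium==2.47.1
splinter==0.7.3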

Run the dev version with
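The command itself is missing from this copy of the README. morph.io runs Python scrapers from a scraper.py file at the repository root, so, assuming this repo follows that convention, the development run would be:

python scraper.py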


Contributors: ianibbo, otherchirps, ianibo

Last run completed successfully.

Console output of last run

Injecting configuration and compiling...
-----> Python app detected
-----> Stack changed, re-installing runtime
-----> Installing runtime (python-2.7.9)
-----> Installing dependencies with pip
       Obtaining scraperwiki from git+ (from -r requirements.txt (line 7))
       Cloning (to morph_defaults) to ./.heroku/src/scraperwiki
       Collecting lxml==3.4.4 (from -r requirements.txt (line 9))
       Downloading lxml-3.4.4.tar.gz (3.5MB)
       Building lxml version 3.4.4.
       Building without Cython.
       Using build configuration of libxslt 1.1.28
       /app/.heroku/python/lib/python2.7/distutils/ UserWarning: Unknown distribution option: 'bugtrack_url'
       warnings.warn(msg)
       Collecting cssselect==0.9.1 (from -r requirements.txt (line 10))
       Downloading cssselect-0.9.1.tar.gz
       Collecting selenium==2.47.1 (from -r requirements.txt (line 11))
       Downloading selenium-2.47.1-py2-none-any.whl (3.0MB)
       Collecting splinter==0.7.3 (from -r requirements.txt (line 12))
       Downloading splinter-0.7.3.tar.gz
       Collecting dumptruck>=0.1.2 (from scraperwiki->-r requirements.txt (line 7))
       Downloading dumptruck-0.1.6.tar.gz
       Collecting requests (from scraperwiki->-r requirements.txt (line 7))
       Downloading requests-2.7.0-py2.py3-none-any.whl (470kB)
       Installing collected packages: requests, dumptruck, splinter, selenium, cssselect, lxml, scraperwiki
       Running install for dumptruck
       Running install for splinter
       Compiling /tmp/pip-build-986l4V/selenium/selenium/test/selenium/webdriver/
       Running install for cssselect
       Running install for lxml
       Building lxml version 3.4.4.
       Building without Cython.
       Using build configuration of libxslt 1.1.28
       /app/.heroku/python/lib/python2.7/distutils/ UserWarning: Unknown distribution option: 'bugtrack_url'
       warnings.warn(msg)
       building 'lxml.etree' extension
       gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/usr/include/libxml2 -I/tmp/pip-build-986l4V/lxml/src/lxml/includes -I/app/.heroku/python/include/python2.7 -c src/lxml/lxml.etree.c -o build/temp.linux-x86_64-2.7/src/lxml/lxml.etree.o -w
       gcc -pthread -shared build/temp.linux-x86_64-2.7/src/lxml/lxml.etree.o -lxslt -lexslt -lxml2 -lz -lm -o build/lib.linux-x86_64-2.7/lxml/
       building 'lxml.objectify' extension
       gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/usr/include/libxml2 -I/tmp/pip-build-986l4V/lxml/src/lxml/includes -I/app/.heroku/python/include/python2.7 -c src/lxml/lxml.objectify.c -o build/temp.linux-x86_64-2.7/src/lxml/lxml.objectify.o -w
       gcc -pthread -shared build/temp.linux-x86_64-2.7/src/lxml/lxml.objectify.o -lxslt -lexslt -lxml2 -lz -lm -o build/lib.linux-x86_64-2.7/lxml/
       Running develop for scraperwiki
       Creating /app/.heroku/python/lib/python2.7/site-packages/scraperwiki.egg-link (link to .)
       Adding scraperwiki 0.3.7 to easy-install.pth file
       Installed /app/.heroku/src/scraperwiki
       Successfully installed cssselect-0.9.1 dumptruck-0.1.6 lxml-3.4.4 requests-2.7.0 scraperwiki selenium-2.47.1 splinter-0.7.3
-----> Discovering process types
       Procfile declares types -> scraper
Injecting scraper and running...
DoIt
platform Linux
Python sys.version_info(major=2, minor=7, micro=9, releaselevel='final', serial=0)
splinter 0.7.3
No version info for scraperwiki
Get front page starting
Waiting for first item in results page to appear
Clicking button with name VIEW^1
Waiting for details page to finish loading
Got full details page
got form_type input control..
good to continue selecting full holdings and marc tags
Got full details page
done scraping a resource
Getting item info
Getting catalog info
Get marc_data table
Handling content am
Got tag 000 indicators value am
Handling content 150814n 000 0 eng u
Got tag 008 indicators value 150814n 000 0 eng u
Handling content A Web of Air
Got tag 245 indicators value None
Handling content 5
Got tag 596 indicators value 5
DoneIt
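
The log above narrates a splinter-driven browse of the catalogue followed by writes to a marc_data table via scraperwiki. As a rough illustration only, not the project's actual code, a minimal sketch of that flow might look like the following; CATALOG_URL and the CSS selectors are made-up placeholders, while the button name VIEW^1 and the marc_data table name are taken from the log:

import scraperwiki
from splinter import Browser

CATALOG_URL = "http://example.org/ibistro"  # placeholder, not the real catalogue URL

browser = Browser("phantomjs")  # the headless browser installed above
try:
    # "Get front page starting"
    browser.visit(CATALOG_URL)

    # "Waiting for first item in results page to appear" (selector is assumed)
    browser.is_element_present_by_css("#results .item", wait_time=30)

    # "Clicking button with name VIEW^1"
    browser.find_by_name("VIEW^1").first.click()

    # "Waiting for details page to finish loading" / "Got full details page"
    # (selector is assumed)
    browser.is_element_present_by_css("#details", wait_time=30)

    # "Get marc_data table": save one MARC field per row, as in
    # "Got tag 245 indicators value ... A Web of Air"
    row = {"tag": "245", "indicators": None, "value": "A Web of Air"}
    scraperwiki.sqlite.save(unique_keys=["tag"], data=row, table_name="marc_data")
finally:
    browser.quit()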


Average successful run time: 3 minutes

Total run time: 3 minutes

Total cpu time used: less than 5 seconds

Total disk space used: 49.5 MB


  • Manually ran revision 46896f45 and completed successfully.
    Nothing changed in the database.
    88 pages scraped.
  • Created on

Scraper code

