pdftoxml_livetutorial_pt3


Tutorial scraper for the ebook Scraping for Journalists, demonstrating how to scrape a PDF and use XPath to identify parts of the document.

Forked from ScraperWiki

Contributors Frdrkhglnd

This scraper has not yet been run

Data

Downloaded 1 time by MikeRalphson

Showing 10 of 290 rows

| date | code | location | uniquekey |
|---|---|---|---|
| Monday : 03/09/2012 | 0C18 | Tower Hill Road, Brown Lees. | 37 |
| Monday : 03/09/2012 | 0C18 | Park Lane, Knypersley. | 38 |
| Tuesday : 04/09/2012 | 007 | A34 Cannock North. | 39 |
| Tuesday : 04/09/2012 | 016 | A4601 Wedges Mills between Longford Island toward junction 11 to just before Saredon Road. | 40 |
| Tuesday : 04/09/2012 | 017 | A4601 Cannock between A34 Walsall Road junction to Longford island, A5 etc. | 41 |
| Tuesday : 04/09/2012 | 033 | A5 Cannock between A460/A4601 and A34 Bridgetown. | 42 |
| Tuesday : 04/09/2012 | 035 | A5 between Hanney Hay / Barracks Lane island to Muckley Corner island. | 43 |
| Tuesday : 04/09/2012 | 0C39 | Pye Green Road Cannock between A34 Stafford Road and the junction of Pye Green Road / Brindley Road. | 44 |
| Tuesday : 04/09/2012 | 043 | A511 Burton between Anslow Lane to island of A5121. | 45 |
| Tuesday : 04/09/2012 | 044 | A511 Burton between A5121 and Brizlincote Lane (near Derbyshire boundary). | 46 |

Statistics

Total run time: less than 5 seconds

Total cpu time used: less than 5 seconds

Total disk space used: 20.2 KB

History

Scraper code

Python

pdftoxml_livetutorial_pt3 / scraper.py
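
The contents of scraper.py are not shown on this page. As a rough illustration of the technique named in the description (converting a PDF to XML, then using XPath to pick out parts of it), the sketch below parses a small XML fragment in the style produced by `pdftohtml -xml` and selects text elements by their `left` attribute. The sample fragment, its coordinates, the column offsets, and the `extract_rows` helper are all assumptions made for illustration; they are not taken from the actual scraper.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment in the style of pdftohtml -xml output.
# The coordinates are invented; only the text values come from the data table.
SAMPLE_XML = """<pdf2xml>
<page number="1" height="1262" width="892">
<text top="100" left="60" width="150" height="14" font="0">Monday : 03/09/2012</text>
<text top="100" left="250" width="40" height="14" font="0">0C18</text>
<text top="100" left="330" width="300" height="14" font="0">Tower Hill Road, Brown Lees.</text>
<text top="120" left="60" width="150" height="14" font="0">Monday : 03/09/2012</text>
<text top="120" left="250" width="40" height="14" font="0">0C18</text>
<text top="120" left="330" width="300" height="14" font="0">Park Lane, Knypersley.</text>
</page>
</pdf2xml>"""

# Assumed horizontal offsets of the three columns (hypothetical values).
DATE_LEFT, CODE_LEFT, LOCATION_LEFT = 60, 250, 330


def element_text(element):
    """Return all text inside an element, including nested tags such as <b>."""
    return "".join(element.itertext()).strip()


def extract_rows(xml_string, first_key=1):
    """Build one dict per table row by matching <text> elements on 'left'."""
    root = ET.fromstring(xml_string)
    rows = []
    key = first_key
    for page in root.findall("page"):
        # ElementTree supports a subset of XPath, including [@attr='value'].
        dates = page.findall(f"text[@left='{DATE_LEFT}']")
        codes = page.findall(f"text[@left='{CODE_LEFT}']")
        locations = page.findall(f"text[@left='{LOCATION_LEFT}']")
        # Assumes each column has the same number of cells on a page.
        for date, code, location in zip(dates, codes, locations):
            rows.append({
                "date": element_text(date),
                "code": element_text(code),
                "location": element_text(location),
                "uniquekey": key,  # simple running counter, an assumption
            })
            key += 1
    return rows


rows = extract_rows(SAMPLE_XML, first_key=37)
```

Matching on a fixed `left` value only works when every cell in a column starts at the same horizontal position, which depends on the PDF. A real scraper would inspect the generated XML first to find suitable attribute values, and a full XPath library such as lxml would allow richer expressions than the subset used here.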