jarchive scraper


This scraper collects all of the Jeopardy! questions from http://www.j-archive.com/listseasons.php.
The unique ID scheme is a quick placeholder thrown together for the time being and could use some work.
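Judging from the data below, the current uid simply concatenates episode number, category, and dollar value (e.g. `6771NOVELS BY QUOTE$200`), which collides when a category repeats a value. One hedged alternative, a sketch rather than the scraper's actual scheme, is to hash those same fields plus the clue's order number:

```python
import hashlib

def make_uid(episode, category, dollar_value, order_number=0):
    """Build a stable, compact clue ID from its identifying fields.

    Illustrative sketch only: it hashes the fields the current uid
    concatenates, plus the clue's order number to disambiguate
    repeated dollar values within a category.
    """
    key = u"{0}|{1}|{2}|{3}".format(episode, category, dollar_value, order_number)
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]

uid = make_uid(6771, "NOVELS BY QUOTE", "$200")
```

The 12-character prefix keeps the ID short while remaining effectively collision-free at this data size.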

To do:
- slow down page requests so as not to overload the archive servers
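A minimal way to implement the throttling item above (a sketch, not the scraper's current code) is to enforce a minimum interval between fetches; the delay value here is an assumed polite default:

```python
import time

REQUEST_DELAY = 2.0  # seconds between page fetches; an assumed value, tune as needed

class Throttle(object):
    """Enforce a minimum interval between calls to wait()."""

    def __init__(self, delay):
        self.delay = delay
        self.last = 0.0

    def wait(self):
        # Sleep only for whatever portion of the delay hasn't already elapsed.
        remaining = self.delay - (time.time() - self.last)
        if remaining > 0:
            time.sleep(remaining)
        self.last = time.time()

throttle = Throttle(REQUEST_DELAY)
# Call throttle.wait() immediately before each scraperwiki.scrape(url).
```

Because the pause accounts for time already spent parsing, this adds little overhead beyond the minimum spacing.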

Forked from ScraperWiki

Contributors bensoltoff

Last run failed with status code 1.

Console output of last run

Traceback (most recent call last):
  File "/repo/scraper.py", line 88, in <module>
    scrape_season(base_url+"showseason.php?season=30")
  File "/repo/scraper.py", line 35, in scrape_season
    scrape_episode(episode['href'], ep_num, timestamp)
  File "/repo/scraper.py", line 39, in scrape_episode
    soup = BeautifulSoup(scraperwiki.scrape(url))
  File "/usr/local/lib/python2.7/dist-packages/scraperwiki-0.3.7-py2.7.egg/scraperwiki/utils.py", line 31, in scrape
    f = urllib2.urlopen(req)
  File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 406, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 519, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 444, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 527, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 500: Internal Server Error
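The failure above is a server-side HTTP 500, which is often transient. A hedged sketch of a retry wrapper (not part of the current scraper) that backs off between attempts before giving up; `fetch` stands in for `scraperwiki.scrape` or `urllib2.urlopen`, and the retry counts and delays are assumed values:

```python
import time

def fetch_with_retry(fetch, url, retries=3, backoff=2.0):
    """Call fetch(url), retrying on exceptions with exponential backoff.

    Waits backoff, 2*backoff, 4*backoff, ... seconds between attempts;
    re-raises the last error once retries are exhausted.
    """
    for attempt in range(retries):
        try:
            return fetch(url)
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries: let the HTTPError propagate
            time.sleep(backoff * (2 ** attempt))
```

Catching only `urllib2.HTTPError` with `e.code >= 500` would be stricter; the broad `except` here keeps the sketch self-contained.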

Data

Downloaded 1 time by clydeclod

Download SQLite database (1.6 MB)

rows 10 / 5926

category | episode | uid | air_date | text | dollar_value | answer | dj | order_number
NOVELS BY QUOTE | 6771 | 6771NOVELS BY QUOTE$200 | 1391990400.0 | A classic: "on the breast of her gown, in fine red cloth... appeared the letter A" | $200 | The Scarlet Letter | |
BACKING BANDS | 6771 | 6771BACKING BANDS$200 | 1391990400.0 | Bob Marley & the _____ | $200 | the Wailers | |
NAMES IN NATURE | 6771 | 6771NAMES IN NATURE$200 | 1391990400.0 | Use caution when trying to extinguish these ants that are named for their painful, burning stings | $200 | fire ants | |
CAL TECH | 6771 | 6771CAL TECH$200 | 1391990400.0 | Established in Mountain View in 1956, Shockley Semiconductor helped put this element in the name of a "valley" | $200 | silicon | |
OTHER COLLEGES | 6771 | 6771OTHER COLLEGES$200 | 1391990400.0 | Founded in 1961, hamburger university is this chain's training center | $200 | McDonald's | |
"ACA"DEMIA | 6771 | 6771"ACA"DEMIA$200 | 1391990400.0 | Kraft first boxed it up with cheese in 1937 | $200 | macaroni | |
NOVELS BY QUOTE | 6771 | 6771NOVELS BY QUOTE$600 | 1391990400.0 | A modern classic: "my guess was that Richard Parker was on the floor of the lifeboat" | $600 | The Life of Pi | |
BACKING BANDS | 6771 | 6771BACKING BANDS$600 | 1391990400.0 | Florence + the _____ | $600 | the Machine | |
NAMES IN NATURE | 6771 | 6771NAMES IN NATURE$600 | 1391990400.0 | You'll notice that something is missing on this critter named for a British isle | $600 | a Manx (cat) | |
CAL TECH | 6771 | 6771CAL TECH$600 | 1391990400.0 | Tony Hawk owns a Huntington Beach company that makes these pieces of sporting equipment | $600 | skateboards | |
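The air_date column stores Unix epoch seconds. Converting a stored value back to a calendar date, assuming the timestamps are UTC (a usage sketch, not code from the scraper):

```python
from datetime import datetime

def epoch_to_air_date(ts):
    """Render a stored epoch-seconds air_date as a YYYY-MM-DD string (UTC)."""
    return datetime.utcfromtimestamp(ts).strftime("%Y-%m-%d")

epoch_to_air_date(1391990400.0)  # → '2014-02-10'
```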

Statistics

Average successful run time: 4 minutes

Total run time: 7 minutes

Total cpu time used: 3 minutes

Total disk space used: 2.05 MB

History

  • Manually ran revision 92d52b67 and failed.
  • Manually ran revision e5b0b1f9 and completed successfully.
  • Manually ran revision 3e13f6af and failed.
  • Manually ran revision dddb3726 and failed.
  • Manually ran revision 5b769e68 and failed.
  • Forked from ScraperWiki

Scraper code

Python

jarchive_scraper / scraper.py