bensoltoff / jarchive_scraper

jarchive scraper


This scraper takes all the Jeopardy! questions from http://www.j-archive.com/listseasons.php.
The unique ID could use some work; I just threw something together for the time being.
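Judging from the data preview below, the current unique ID concatenates episode number, category, and dollar value (e.g. "6771NOVELS BY QUOTE$200"), which can collide if two clues ever share all three. A minimal sketch of a sturdier scheme, folding the round and board position into a hash (the field names here are illustrative, not the scraper's actual variables):

```python
import hashlib

def make_uid(episode, category, dollar_value, round_name, order_number):
    """Build a collision-resistant unique ID for a clue.

    Joining every distinguishing field with a delimiter and hashing
    avoids the ambiguity of plain concatenation (where, say, episode
    677 + category "1FOO" and episode 6771 + category "FOO" would
    produce the same string).
    """
    key = "|".join(str(part) for part in
                   (episode, round_name, category, dollar_value, order_number))
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:16]
```

The same inputs always yield the same ID, so re-running the scraper still upserts rather than duplicates rows.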

To do:
- slow down page requests so they don't overload the archive servers
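The to-do item above could be handled with a small throttle object, sleeping so that consecutive fetches are at least a minimum interval apart. A minimal sketch (the interval and the idea of wrapping each `scraperwiki.scrape` call with it are assumptions, not part of the current scraper):

```python
import time

class Throttle:
    """Enforce a minimum gap between consecutive requests."""

    def __init__(self, min_interval=2.0):
        self.min_interval = min_interval
        self._last = 0.0

    def wait(self):
        # Sleep only for the remainder of the interval, so pages that
        # were slow to parse don't add an unnecessary extra delay.
        elapsed = time.monotonic() - self._last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()
```

Calling `throttle.wait()` immediately before each page fetch in `scrape_season` and `scrape_episode` would keep the load on j-archive.com gentle.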

Forked from ScraperWiki

Last run failed with status code 1.

Console output of last run

Traceback (most recent call last):
  File "/repo/scraper.py", line 88, in <module>
    scrape_season(base_url+"showseason.php?season=30")
  File "/repo/scraper.py", line 35, in scrape_season
    scrape_episode(episode['href'], ep_num, timestamp)
  File "/repo/scraper.py", line 39, in scrape_episode
    soup = BeautifulSoup(scraperwiki.scrape(url))
  File "/usr/local/lib/python2.7/dist-packages/scraperwiki-0.3.7-py2.7.egg/scraperwiki/utils.py", line 31, in scrape
    f = urllib2.urlopen(req)
  File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 406, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 519, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 444, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 527, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 500: Internal Server Error
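The HTTP 500 in the traceback comes from the archive's server, so a transient error kills the whole run. A minimal retry-with-backoff sketch that could wrap the fetch (here `fetch` stands in for a callable like `scraperwiki.scrape`; the retry counts and delays are assumptions):

```python
import time

def fetch_with_retry(fetch, url, retries=3, base_delay=1.0):
    """Call fetch(url), retrying after failures with exponential backoff.

    Re-raises the last exception if the server keeps erroring, so a
    genuinely broken page still fails loudly instead of silently.
    """
    for attempt in range(retries):
        try:
            return fetch(url)
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```

With this in place, a single 500 on one episode page would cost a few seconds of waiting rather than aborting mid-season.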

Data

Downloaded 1 time by clydeclod

Download table (as CSV) Download SQLite database (1.6 MB) Use the API

rows 0 / 0

rows 0 / 0

rows 10 / 5926

category | episode | uid | air_date | text | dollar_value | answer | dj | order_number
NOVELS BY QUOTE | 6771 | 6771NOVELS BY QUOTE$200 | 1391990400.0 | A classic: "on the breast of her gown, in fine red cloth... appeared the letter A" | $200 | The Scarlet Letter | |
BACKING BANDS | 6771 | 6771BACKING BANDS$200 | 1391990400.0 | Bob Marley & the _____ | $200 | the Wailers | |
NAMES IN NATURE | 6771 | 6771NAMES IN NATURE$200 | 1391990400.0 | Use caution when trying to extinguish these ants that are named for their painful, burning stings | $200 | fire ants | |
CAL TECH | 6771 | 6771CAL TECH$200 | 1391990400.0 | Established in Mountain View in 1956, Shockley Semiconductor helped put this element in the name of a "valley" | $200 | silicon | |
OTHER COLLEGES | 6771 | 6771OTHER COLLEGES$200 | 1391990400.0 | Founded in 1961, hamburger university is this chain's training center | $200 | McDonald's | |
"ACA"DEMIA | 6771 | 6771"ACA"DEMIA$200 | 1391990400.0 | Kraft first boxed it up with cheese in 1937 | $200 | macaroni | |
NOVELS BY QUOTE | 6771 | 6771NOVELS BY QUOTE$600 | 1391990400.0 | A modern classic: "my guess was that Richard Parker was on the floor of the lifeboat" | $600 | The Life of Pi | |
BACKING BANDS | 6771 | 6771BACKING BANDS$600 | 1391990400.0 | Florence + the _____ | $600 | the Machine | |
NAMES IN NATURE | 6771 | 6771NAMES IN NATURE$600 | 1391990400.0 | You'll notice that something is missing on this critter named for a British isle | $600 | a Manx (cat) | |
CAL TECH | 6771 | 6771CAL TECH$600 | 1391990400.0 | Tony Hawk owns a Huntington Beach company that makes these pieces of sporting equipment | $600 | skateboards | |

Statistics

Average successful run time: 4 minutes

Total run time: 7 minutes

Total cpu time used: 3 minutes

Total disk space used: 2.05 MB

History

  • Manually ran revision 92d52b67 and failed.
  • Manually ran revision e5b0b1f9 and completed successfully.
  • Manually ran revision 3e13f6af and failed.
  • Manually ran revision dddb3726 and failed.
  • Manually ran revision 5b769e68 and failed.
  • Forked from ScraperWiki

Scraper code

Python

jarchive_scraper / scraper.py