corajr / jarchive_scraper

jarchive scraper

Scrapes www.j-archive.com

The fan-created archive of Jeopardy! games and players--281,659 clues and counting!


This scraper takes all the Jeopardy! questions from http://www.j-archive.com/listseasons.php. The unique ID could use some work; the current one was thrown together for the time being.
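
Judging only from the uid values in the data table, the ID appears to be the episode number, category, and dollar value concatenated together. The sketch below reproduces that pattern and, as one possible improvement, hashes the same fields together with the clue text. Both function names are hypothetical and are not taken from scraper.py.

```python
import hashlib


def simple_uid(episode, category, dollar_value):
    """Concatenate fields the way the sample uid values appear to be built."""
    return f"{episode}{category}{dollar_value}"


def hashed_uid(episode, category, dollar_value, text):
    """Hypothetical alternative: hash delimited fields plus the clue text.

    The unit-separator delimiter keeps field boundaries unambiguous, and
    including the clue text separates clues that share the other fields.
    """
    key = "\x1f".join([str(episode), category, dollar_value, text])
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:16]


print(simple_uid(7079, "PERFUME", "$200"))  # 7079PERFUME$200
print(hashed_uid(7079, "PERFUME", "$200",
                 "$76 can get you a bottle of Ralph from this designer"))
```

With plain concatenation, two clues in one episode that share a category name and dollar value would receive the same uid; the hashed variant avoids that at the cost of a less readable key.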

Forked from ScraperWiki

Contributors: corajr

Last run completed successfully.

Console output of last run

Injecting configuration and compiling...
Injecting scraper and running...

Data

Downloaded 8 times by corajr, cs2388, izaic3


Available as a CSV table, as a SQLite database (9.17 MB), or through the API.

Showing 10 of 38219 rows

| episode | air_date | text | dollar_value | answer | category | uid |
| 7079 | 1432771200.0 | The Flash doesn't need a bike to excel at this message delivery job whose name comes from the French for "run" | $200 | a courier | SUPERHERO DAY JOBS? | 7079SUPERHERO DAY JOBS?$200 |
| 7079 | 1432771200.0 | The Beat Generation Bible:"I first met Dean not long after my wife and I split up" | $200 | On the Road | BOOKS' FIRST LINES | 7079BOOKS' FIRST LINES$200 |
| 7079 | 1432771200.0 | In 2009 New Yorkers watched this major league park demolished; Yankee Stadium lasted another year | $200 | Shea Stadium | BASEBALL STADIUMS | 7079BASEBALL STADIUMS$200 |
| 7079 | 1432771200.0 | The ruins of Knossos are on this Greek island | $200 | Crete | ISLANDS IN THE "C"s | 7079ISLANDS IN THE "C"s$200 |
| 7079 | 1432771200.0 | $76 can get you a bottle of Ralph from this designer | $200 | Ralph Lauren | PERFUME | 7079PERFUME$200 |
| 7079 | 1432771200.0 | A fruit or cheese covering | $200 | rind | WORDS WITH(IN) FRIENDS | 7079WORDS WITH(IN) FRIENDS$200 |
| 7079 | 1432771200.0 | The new tower on the building named for this legendary publisher leaves many windows for Spidey to wash | $400 | (William Randolph) Hearst | SUPERHERO DAY JOBS? | 7079SUPERHERO DAY JOBS?$400 |
| 7079 | 1432771200.0 | A dystopian novel:"It was a bright cold day in April, and the clocks were striking thirteen" | $400 | 1984 | BOOKS' FIRST LINES | 7079BOOKS' FIRST LINES$400 |
| 7079 | 1432771200.0 | The clock at 20th & Blake Streets is a meeting place for Colorado Rockies fans prior to entering this field | $400 | Coors | BASEBALL STADIUMS | 7079BASEBALL STADIUMS$400 |
| 7079 | 1432771200.0 | An old song says, "26 miles across the sea" this island "is a-waitin' for me" | $400 | Catalina | ISLANDS IN THE "C"s | 7079ISLANDS IN THE "C"s$400 |
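
The air_date values look like Unix timestamps and dollar_value is stored as a string. The following minimal sketch shows one way to turn both into typed values after download. It assumes the timestamps are seconds since the epoch in UTC (the sample value falls exactly on midnight UTC), and both helper names are hypothetical.

```python
from datetime import datetime, timezone


def parse_air_date(ts):
    """Convert an air_date value such as 1432771200.0 to a calendar date.

    Assumption: the value is seconds since the Unix epoch, in UTC.
    """
    return datetime.fromtimestamp(float(ts), tz=timezone.utc).date()


def parse_dollar_value(value):
    """Convert a dollar_value string such as '$400' to an integer."""
    return int(value.replace("$", "").replace(",", ""))


print(parse_air_date("1432771200.0"))  # 2015-05-28
print(parse_dollar_value("$400"))      # 400
```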

Statistics

Average successful run time: 19 minutes

Total run time: about 2 hours

Total cpu time used: about 4 hours

Total disk space used: 9.38 MB

History

  • Manually ran revision 06646cce and completed successfully.
    38219 records added, 37576 records removed in the database
    648 pages scraped
  • Manually ran revision f8c81e77 and failed.
    77 records added, 77 records removed in the database
    4 pages scraped
  • Manually ran revision 445ffeed and failed.
    77 records added, 76 records removed in the database
    4 pages scraped
  • Manually ran revision 5fa82490 and completed successfully.
    37575 records added, 37515 records removed in the database
    671 pages scraped
  • Manually ran revision 415922bc and completed successfully.
    37515 records added, 2696 records removed in the database
    647 pages scraped
  • Manually ran revision 01097582 and failed.
    2696 records added in the database
    50 pages scraped
  • Manually ran revision 01097582 and completed successfully.
    37515 records added, 37515 records removed in the database
    4529 pages scraped
  • Manually ran revision 01097582 and completed successfully.
    37515 records added, 37515 records removed in the database
    3882 pages scraped
  • Manually ran revision 01097582 and completed successfully.
    37515 records added, 37514 records removed in the database
    3235 pages scraped
  • Manually ran revision bc89314e and failed.
    nothing changed in the database
    2 pages scraped
  • Manually ran revision 8b2727de and failed.
    nothing changed in the database
  • Forked from ScraperWiki

Scraper code

Python

jarchive_scraper / scraper.py