g0rd / nyc_school_budgets

NYC School Budgets


Made this at the #jdcny event at Columbia.

Scrapes school budgets from the NYC Department of Education site. For an example page, see http://schools.nyc.gov/AboutUs/funding/schoolbudgets/GalaxyAllocationFY2010.htm?BSSS_INPUT=M411
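As a rough sketch of how a per-school page address could be built: the example URL above carries the school ID in the BSSS_INPUT query parameter. The assumption here (not confirmed by the scraper itself) is that the "FY2010" part of the filename changes with the fiscal year; the name budget_url is hypothetical.

```python
from urllib.parse import urlencode

# Pattern taken from the example URL; the {year} placeholder is an assumption.
BASE = ("http://schools.nyc.gov/AboutUs/funding/schoolbudgets/"
        "GalaxyAllocationFY{year}.htm")


def budget_url(school_id, fiscal_year):
    """Build the budget page URL for one school and fiscal year."""
    query = urlencode({"BSSS_INPUT": school_id})
    return BASE.format(year=fiscal_year) + "?" + query


print(budget_url("M411", 2010))
```

Called with "M411" and 2010, this reproduces the example URL given above.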

The tricky parts are that the markup changed slightly each year, and getting a list of school IDs.

Thanks to Julian Todd for making category part of the row key. This gives a more useful table, with many rows and only a few columns.
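To illustrate the "category in the row key" idea, here is a minimal sketch using SQLite from the standard library. The table name budget_rows, the helper names, and the exact key (school_id, fiscal_year, category) are assumptions for illustration; the source only says that category is part of the key.

```python
import sqlite3


def make_table(conn):
    # One row per (school, year, category) instead of one column per category.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS budget_rows (
               school_id   TEXT,
               fiscal_year INTEGER,
               category    TEXT,
               val         TEXT,
               PRIMARY KEY (school_id, fiscal_year, category)
           )"""
    )


def save_row(conn, school_id, fiscal_year, category, val):
    # Re-scraping the same key overwrites the old row rather than duplicating it.
    conn.execute(
        "INSERT OR REPLACE INTO budget_rows "
        "(school_id, fiscal_year, category, val) VALUES (?, ?, ?, ?)",
        (school_id, fiscal_year, category, val),
    )


conn = sqlite3.connect(":memory:")
make_table(conn)
save_row(conn, "M110", 2012, "TL NYSTL SOFTWARE", "4,088")
save_row(conn, "M110", 2012, "TL SBST SHARED", "TBD")
save_row(conn, "M110", 2012, "TL NYSTL SOFTWARE", "4,088")  # same key again
row_count = conn.execute("SELECT COUNT(*) FROM budget_rows").fetchone()[0]
print(row_count)
```

With thousands of distinct categories, a long table like this avoids a schema with thousands of mostly empty columns.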

Needs more work to group the categories into “major” categories (there are thousands of categories).
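One possible starting point, sketched under assumptions: in the sample data, categories beginning with "TL" carry the major category "tl", and everything else shows "unknown". The prefix set and the function name major_category below are hypothetical and would need extending as more prefixes are identified.

```python
# Assumed set of recognised prefixes; only "TL" is visible in the sample data.
KNOWN_PREFIXES = {"TL"}


def major_category(category):
    """Map a raw category label to a coarse major category by its first word."""
    parts = category.split()
    first = parts[0].upper() if parts else ""
    return first.lower() if first in KNOWN_PREFIXES else "unknown"


print(major_category("TL NYSTL Library Books"))
print(major_category("IDEA SBST Shared"))
```

Comparing on the upper-cased first word also absorbs case variants such as "TL NYSTL Textbooks" and "TL NYSTL TEXTBOOKS".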

Address: Tweed Courthouse, 52 Chambers Street, New York, NY 10007, USA

Forked from ScraperWiki

Last run failed with status code -1.

Console output of last run

    Got 1716 school ids...
    Starting with K637
    []
    Unhandled exception on school None
    Traceback (most recent call last):
      File "/repo/scraper.py", line 167, in main
        scrape(first_code)
      File "/repo/scraper.py", line 70, in scrape
        totalbudget = total_budget[0].text_content().encode("utf-8").strip()
    IndexError: list index out of range

(The block from "[]" through the IndexError repeats, unchanged, several more times in the output.)
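The traceback shows total_budget[0] being indexed while the list is empty (the "[]" printed just before each exception). A minimal sketch of a guard follows, assuming total_budget is a list of lxml elements, as the text_content() call suggests. The helper first_text and the FakeNode stand-in are hypothetical and exist only to make the example self-contained.

```python
def first_text(nodes):
    """Return the stripped text of the first node, or None if nothing matched."""
    if not nodes:
        return None
    return nodes[0].text_content().strip()


class FakeNode:
    """Hypothetical stand-in for an lxml element, used only for this demo."""

    def __init__(self, text):
        self._text = text

    def text_content(self):
        return self._text


print(first_text([]))                       # empty query result -> None
print(first_text([FakeNode("  $1,234  ")]))  # surrounding whitespace removed
```

A caller could then log the school ID and skip the page when the result is None, instead of aborting on an IndexError.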

Data


Showing 10 of 546805 rows

category                               | val    | majorcategory | fiscal_year | school_name               | school_id | dollars | budget
TL NYSTL Library Books                 | 1,197  | tl            | 2012        | PS 015 ROBERTO CLEMENTE   | M015      | 1197    |
TL NYSTL Textbooks                     | 22,462 | tl            | 2012        | PS 020 ANNA SILVER        | M020      | 22462   |
TL NYSTL TEXTBOOKS                     | 3,895  | tl            | 2012        | PS 063 WILLIAM MCKINLEY   | M063      | 3895    |
TL NYSTL SOFTWARE                      | 4,088  | tl            | 2012        | PS 110 FLORENCE NIGHTING  | M110      | 4088    |
TL SBST SHARED                         | TBD    | tl            | 2012        | PS 110 FLORENCE NIGHTING  | M110      |         |
IDEA SBST Shared                       | 47,173 | unknown       | 2012        | PS 134 HENRIETTA SZOLD    | M134      | 47173   |
TL NYSTL TEXTBOOKS                     | 15,875 | tl            | 2012        | PS 184                    | M184      | 15875   |
TL Potential Mid Year Adj to be Repaid | 10,964 | tl            | 2012        | PS 188 JOHN BURROUGHS     | M188      | 10964   |
TL NYSTL Library Books                 | 1,863  | tl            | 2012        | NEIGHBORHOOD SCHL @ PS 63 | M363      | 1863    |
TL NYSTL LIBRARY BOOKS                 | 1,974  | tl            | 2012        | THE EARTH SCHOOL          | M364      | 1974    |
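In the sample rows, dollars looks like val with the thousands separators removed, and it is empty where val is "TBD". A small sketch of that conversion, assuming this is how the column is derived (the function name parse_dollars is hypothetical):

```python
def parse_dollars(val):
    """Convert a display amount such as "1,197" to an int, or None if not numeric."""
    cleaned = val.replace(",", "").strip()
    try:
        return int(cleaned)
    except ValueError:
        # Placeholders such as "TBD" have no numeric value.
        return None


print(parse_dollars("1,197"))
print(parse_dollars("TBD"))
```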

Statistics

Average successful run time: less than a minute

Total run time: 33 minutes

Total cpu time used: 3 minutes

Total disk space used: 95.2 MB

History

  • Manually ran revision d42f6e62 and failed.
    Nothing changed in the database.
  • Manually ran revision 9078b498 and failed.
    Nothing changed in the database.
  • Manually ran revision 594b3251 and failed.
    Nothing changed in the database.
  • Manually ran revision 978371fb and failed.
    Nothing changed in the database.
  • Manually ran revision 86c0d06f and failed.
    Nothing changed in the database.
  • ...
  • Forked from ScraperWiki


Scraper code

Python

nyc_school_budgets / scraper.py