austensen / nycha-outages

Web scraper for NYCHA service outages reporting data


NYCHA Outages

NYCHA regularly updates a web page that displays information about service outages for heat/water, electric, elevators, and gas that are planned, ongoing, or recently resolved. This web scraper extracts the data from all of these tables and saves the results in a SQLite database.
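As a rough illustration of the "save the results in a SQLite database" step, the sketch below appends scraped rows to a table using Python's standard `sqlite3` module. It is not the scraper's actual code: the `save_rows` helper, the table name `data`, the `TEXT` column types, and the subset of columns are all assumptions made for the example.

```python
import sqlite3

# Hypothetical subset of the columns listed in the data dictionary.
COLUMNS = ("development_name", "address", "interruptions", "planned", "status", "imported_on")


def save_rows(conn, rows, table="data"):
    """Create the table if it does not exist and append one record per row.

    `rows` is a list of dicts keyed by the names in COLUMNS.
    The table name "data" is an assumption, not confirmed by the source.
    """
    column_sql = ", ".join(f"{name} TEXT" for name in COLUMNS)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({column_sql})")
    placeholders = ", ".join(f":{name}" for name in COLUMNS)
    conn.executemany(
        f"INSERT INTO {table} ({', '.join(COLUMNS)}) VALUES ({placeholders})",
        rows,
    )
    conn.commit()


# Usage with an in-memory database and one row taken from the sample data.
example_row = {
    "development_name": "INDEPENDENCE",
    "address": "121 WILSON STREET BROOKLYN, NY 11211",
    "interruptions": "Heat",
    "planned": "Planned",
    "status": "Planned",
    "imported_on": "2020-01-26T23:35:05-05:00",
}
conn = sqlite3.connect(":memory:")
save_rows(conn, [example_row])
```

Because each run only appends rows stamped with `imported_on`, a design like this keeps a history of outages across daily runs rather than overwriting earlier snapshots.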

Thanks to Steve Giordano (@steve52) for his help on this project.

morph.io

This web scraper is set up on morph.io to run once a day and update a SQLite database that is publicly available for download and accessible via API.

NYCHA Outages on morph.io
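For readers who want to use the API rather than download the database, the following sketch builds a query URL. It assumes morph.io's usual URL pattern (`https://api.morph.io/<owner>/<scraper>/data.<format>` with `key` and `query` parameters) and a table named `data`; both should be checked against the scraper's page on morph.io. `build_query_url` is a hypothetical helper and `YOUR_API_KEY` is a placeholder.

```python
from urllib.parse import urlencode

# Assumed base URL of the morph.io API.
API_BASE = "https://api.morph.io"


def build_query_url(api_key, sql, scraper="austensen/nycha-outages", fmt="json"):
    """Return a URL that asks the API to run `sql` against the scraper's database."""
    params = urlencode({"key": api_key, "query": sql})
    return f"{API_BASE}/{scraper}/data.{fmt}?{params}"


# The table name "data" is an assumption; replace the key with your own.
url = build_query_url("YOUR_API_KEY", "SELECT * FROM data LIMIT 10")
print(url)
```

The resulting URL can be fetched with any HTTP client; the response format follows the extension chosen in `fmt`.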

Running the scraper locally

You can use Docker to get everything set up and run the scraper. From this directory run:

    docker-compose build
    docker-compose run app

Data dictionary

| column_name | description |
| --- | --- |
| development_name | Name of NYCHA development |
| building_number | Number of building within NYCHA development. If NULL, then entire development is affected |
| address | Street address of building within NYCHA development |
| gas_lines | The gas lines affected by gas service outage |
| interruptions | List of services affected by interruption (Ex. "Heat", "Hot Water", etc.) |
| planned | List of values ("Planned", "Unplanned") corresponding to services listed in interruptions column |
| reported_scheduled | The date (and time) when the service outage was reported, or for planned outages, the date the outage was scheduled for |
| gas_restored_on | The "Est. Completion" date of the gas outage. If NULL, then status is "In Progress" |
| restoration_time | The number of hours for the service to be restored |
| status | The status of the service outage. The source of values for this column depends on the table on the website that the record originates from. For the "Current" tab the "Status" column is used directly (Ex. "NYCHA Staff Assigned", "NYCHA Staff Working", etc.), for the "Restored in the last 24 hours" tab it will always be "Restored", for the "Upcoming planned outages" tab it will always be "Planned", and for the "Gas" tab the value comes from the "Est. Completion" column and will either be "In Progress" or NULL. |
| buildings_impacted | The number of buildings that are impacted by the service outage |
| units_impacted | The number of units that are impacted by the service outage |
| population_impacted | The number of people that are impacted by the service outage |
| imported_on | The datetime that the web scraper was run and the data was imported. |
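The rule for the status column can be read as a small lookup on the tab a record came from. The sketch below is one interpretation of that description, not the scraper's actual code; `derive_status` and its parameter names are hypothetical, and the handling of the "Gas" tab assumes the "Est. Completion" cell holds either the text "In Progress" or a date.

```python
def derive_status(tab, status_cell=None, est_completion=None):
    """Return the status value for a record, following the data dictionary.

    tab            -- name of the tab on the NYCHA page the record came from
    status_cell    -- text of the "Status" column (used for the "Current" tab)
    est_completion -- text of the "Est. Completion" column (used for the "Gas" tab)
    """
    if tab == "Current":
        # The page's own "Status" column is used directly.
        return status_cell
    if tab == "Restored in the last 24 hours":
        return "Restored"
    if tab == "Upcoming planned outages":
        return "Planned"
    if tab == "Gas":
        # Assumption: anything other than "In Progress" is a date, so status is NULL.
        return "In Progress" if est_completion == "In Progress" else None
    raise ValueError(f"Unknown tab: {tab!r}")


print(derive_status("Current", status_cell="NYCHA Staff Working"))
print(derive_status("Gas", est_completion="In Progress"))
```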

Last run failed with status code 255.

Console output of last run

    Injecting configuration and compiling...
    -----> Python app detected
     ! The latest version of Python 2 is python-2.7.14 (you are using python-2.7.6, which is unsupported).
     ! We recommend upgrading by specifying the latest version (python-2.7.14).
       Learn More: https://devcenter.heroku.com/articles/python-runtimes
    -----> Installing python-2.7.6
     ! Requested runtime (python-2.7.6) is not available for this stack (cedar-14).
     ! Aborting. More info: https://devcenter.heroku.com/articles/python-support

Data

Downloaded 56 times by danielcarpentergold austensen mcleonard143 sophoniemj paulaznyc prisreeni minibibiyu lblok steve52 snphillips arvindsindhwani


rows 10 / 39017

| population_impacted | units_impacted | building_number | reported_scheduled | development_name | status | address | planned | interruptions | buildings_impacted | imported_on | restoration_time | gas_lines |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 279 | 123 | 4 | 2020-01-27T10:00:00+00:00 | INDEPENDENCE | Planned | 121 WILSON STREET BROOKLYN, NY 11211 | Planned | Heat | 1 | 2020-01-26T23:35:05-05:00 | | |
| 84 | 48 | 2 | 2020-01-27T09:00:00+00:00 | PARKSIDE | Planned | 2970 BRONX PARK EAST BRONX, NY 10467 | Planned | Hot Water | 1 | 2020-01-26T23:35:05-05:00 | | |
| 912 | 418 | 1 | 2020-01-27T09:00:00+00:00 | POLO GROUNDS TOWERS | Planned | 2931 F DOUGLASS BOULEVARD NEW YORK, NY 10039 | Planned | Heat | 1 | 2020-01-26T23:35:05-05:00 | | |
| 269 | 112 | 6 | 2020-01-28T07:30:00+00:00 | RANGEL | Planned | 159-64 HARLEM RIVER DRIVE NEW YORK, NY 10039 | Planned | Water | 1 | 2020-01-26T23:35:05-05:00 | | |
| 529 | 202 | 2 | 2020-01-27T09:00:00+00:00 | TOMPKINS | Planned | 744 PARK AVENUE BROOKLYN, NY 11206 | Planned | Heat | 1 | 2020-01-26T23:35:05-05:00 | | |
| 994 | 376 | | 2020-01-25T12:00:00+00:00 | CONEY ISLAND I (SITES 4 & 5) | Restored | 2925 WEST 27TH STREET BROOKLYN, NY 11224 | Unplanned | Water | 2 | 2020-01-26T23:35:05-05:00 | 26 | |
| 337 | 135 | 18 | 2020-01-25T10:53:00+00:00 | DOUGLASS ADDITION | Restored | 868 AMSTERDAM AVENUE NEW YORK, NY 10025 | Unplanned, Unplanned | Heat, Hot Water | 1 | 2020-01-26T23:35:05-05:00 | 5 | |
| 297 | 140 | 10 | 2020-01-24T07:33:00+00:00 | FARRAGUT | Restored | 190 YORK STREET BROOKLYN, NY 11201 | Unplanned | Water | 1 | 2020-01-26T23:35:05-05:00 | 24 | |
| 108 | 42 | 3 | 2020-01-24T13:30:00+00:00 | LOWER EAST SIDE II | Restored | 620 EAST 5TH STREET NEW YORK, NY 10009 | Unplanned | Heat | 1 | 2020-01-26T23:35:05-05:00 | 11 | |
| 103 | 59 | 5 | 2020-01-24T13:45:00+00:00 | SACKWERN | Restored | 710 NOBLE AVENUE BRONX, NY 10473 | Unplanned | Hot Water | 1 | 2020-01-26T23:35:05-05:00 | 29 | |
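The data dictionary describes `planned` as a list of values corresponding to the services in `interruptions`, and the sample rows store both as comma-separated text (for example "Heat, Hot Water" alongside "Unplanned, Unplanned"). A minimal sketch of pairing the two columns follows; `pair_interruptions` is a hypothetical helper, and the comma separator is an assumption based only on the rows shown here.

```python
def pair_interruptions(interruptions, planned):
    """Pair each affected service with its "Planned"/"Unplanned" value.

    Both arguments are the raw text of the `interruptions` and `planned`
    columns, assumed to be comma-separated lists of equal length.
    """
    services = [item.strip() for item in interruptions.split(",")]
    kinds = [item.strip() for item in planned.split(",")]
    if len(services) != len(kinds):
        raise ValueError("interruptions and planned have different lengths")
    return list(zip(services, kinds))


# Values from the DOUGLASS ADDITION sample row.
pairs = pair_interruptions("Heat, Hot Water", "Unplanned, Unplanned")
print(pairs)
```

Raising an error on mismatched lengths, rather than silently truncating with `zip`, makes it easier to spot rows where the two columns have drifted out of step.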

Statistics

Average successful run time: 2 minutes

Total run time: about 20 hours

Total cpu time used: 33 minutes

Total disk space used: 13.3 MB

History

  • Auto ran revision d5a7b3a2 and failed.
    nothing changed in the database
  • Auto ran revision d5a7b3a2 and failed.
    nothing changed in the database
  • Auto ran revision d5a7b3a2 and failed.
    nothing changed in the database
  • Auto ran revision d5a7b3a2 and failed.
    nothing changed in the database
  • Auto ran revision d5a7b3a2 and failed.
    nothing changed in the database
  • ...
  • Created on morph.io

Scraper code

Python

nycha-outages / scraper.py