This is a script that pulls down information on the New York Homes and Community Renewal's list of community housing based organizations that are currently stuck in a not so useful HTML table that contains links to the organization details such as contact info, website, service area, etc.
These orgs perform work related to affordable housing, community development and civil legal services in New York State and New York City.
Requires Python with pip and Node JS with npm.
First, in terminal
cd to the folder containing this repo.
To grab python dependencies do:
pip install python-requirements.txt
To grab Node JS dependencies do:
python scraper.py to create the json file of the HCR organization list and pull down the organization details from the HCR's server. This will take a while.
node join_json.js > data_joined.json to join the JSON files of organization list and organization details and output them to a joined JSON file.
I manually converted the data from JSON to CSV format using an web data converter.
Note: there are more organizations listed in the organization details JSON file. I believe this has to do with the HCR listing organizations elsewhere on their website, some of these may relate to housing and community development but for whatever reason are not listed on the original HTML table.
Average successful run time: less than 10 seconds
Total run time: less than 20 seconds
Total cpu time used: less than 5 seconds
Total disk space used: 1.22 MB