This is a scraper that runs on Morph. To get started see the documentation.
The purpose of this scraper is to record how many followers several of our corporate Twitter accounts have and to store these counts in a database table.
The scraper.py file is in two parts.
Section A is a set-up routine which creates the database table, assigns columns, then writes in data in the new format from some lists which I generated from the followers.csv file. I chose to do this as I was unable to import and use the CSV library that I had used on my home laptop.
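As a rough illustration of what a Section A style set-up could look like, here is a minimal sketch using Python's standard sqlite3 module. It is not the code in scraper.py: the table name `data`, the function name `set_up_table` and the sample lists are assumptions made for the example, and only the column names are taken from the data table on this page.

```python
import sqlite3

# Hypothetical sample lists standing in for the ones generated from followers.csv.
dates = ["20090701", "20090801"]
accounts = ["abernet", "ACC_Business"]
counts = [1, 1]


def set_up_table(conn, dates, accounts, counts):
    """Create the table (if missing) and load one row per list position."""
    # The table name "data" is an assumption; the column names match the table shown on this page.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS data (DATE TEXT, TWITTERAC TEXT, FCOUNT INTEGER)"
    )
    conn.executemany(
        "INSERT INTO data (DATE, TWITTERAC, FCOUNT) VALUES (?, ?, ?)",
        zip(dates, accounts, counts),
    )
    conn.commit()


# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
set_up_table(conn, dates, accounts, counts)
print(conn.execute("SELECT COUNT(*) FROM data").fetchone()[0])
```

Building the rows from parallel lists with `zip` avoids needing the CSV library at run time, which is the constraint described above.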
If you want to experiment with this, uncomment Section A and comment out Section B (using ''').
Section B is the scraper itself.
It first checks the date - and only runs if it is the 1st of the month.
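The date check can be sketched as below. The function names `should_run` and `date_stamp` are made up for this example and may not match scraper.py; the YYYYMMDD stamp follows the format used in the DATE column.

```python
from datetime import date


def should_run(today):
    """Return True only on the 1st of the month."""
    return today.day == 1


def date_stamp(today):
    """Format a date as YYYYMMDD, e.g. 20140830."""
    return today.strftime("%Y%m%d")


today = date.today()
if should_run(today):
    print("Running scrape for", date_stamp(today))
else:
    print("Not the 1st of the month, nothing to do")
```

Passing the date in as an argument, rather than calling `date.today()` inside the functions, makes the check easy to test on any day.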
Then it processes a list of Twitter account names, constructing full URLs and scraping the number of followers for each from the Twitter page. The actual figure only appears as a tooltip mouse-over - and I had some help working out how to get it out of the code!
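To show the idea of pulling a figure out of a tooltip, here is a minimal sketch using only the standard library. The markup in `sample` is an assumption: it supposes the exact count sits in a `title` attribute ending in "Followers", which may not match what Twitter actually served. The names `profile_url`, `FollowerTooltipParser` and `follower_count` are hypothetical too, and the sketch parses a sample string rather than fetching a live page.

```python
from html.parser import HTMLParser

BASE_URL = "https://twitter.com/"


def profile_url(account):
    """Build the full profile URL from an account name."""
    return BASE_URL + account


class FollowerTooltipParser(HTMLParser):
    """Remember the first count found in a title attribute ending in 'Followers'."""

    def __init__(self):
        super().__init__()
        self.followers = None

    def handle_starttag(self, tag, attrs):
        title = dict(attrs).get("title") or ""
        if self.followers is None and title.endswith("Followers"):
            # Keep only the digits, so "12,345 Followers" becomes 12345.
            digits = "".join(ch for ch in title if ch.isdigit())
            if digits:
                self.followers = int(digits)


def follower_count(html):
    """Return the follower count found in the HTML, or None if there is none."""
    parser = FollowerTooltipParser()
    parser.feed(html)
    return parser.followers


# Assumed markup: the visible text is abbreviated, the tooltip holds the exact figure.
sample = '<a href="/Aberdeencc/followers" title="12,345 Followers"><span>12.3K</span></a>'
print(profile_url("Aberdeencc"), follower_count(sample))
```

Returning None when no tooltip is found lets the caller decide what to do with a page that has no usable count.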
Once it has the number of followers per account it writes these off to the database.
When I wrote this (30/08/14 - or 20140830 as the script would call it!) the code hadn't executed for real - although it has all been tested in chunks.
Since then it ran as planned in Sept and Oct 2014!
It subsequently failed in November. I traced it to a Twitter account having been closed down, which meant there was no valid page available for checking followers. So better error trapping would be good - someday!
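One possible shape for that error trapping is sketched below. It is a suggestion, not what scraper.py does: `collect_counts` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for the real page fetch so the example runs without network access.

```python
def collect_counts(accounts, fetch_count):
    """Fetch a count per account, recording failures instead of stopping the run."""
    rows, failures = [], []
    for account in accounts:
        try:
            count = fetch_count(account)
        except Exception as exc:
            # e.g. the account was closed and its page no longer exists
            failures.append((account, str(exc)))
            continue
        if count is None:
            failures.append((account, "no follower count found"))
            continue
        rows.append((account, count))
    return rows, failures


def fake_fetch(account):
    """Hypothetical stand-in for the real fetch: one closed account, one known count."""
    if account == "closed_account":
        raise LookupError("page not found")
    return {"Aberdeencc": 554}.get(account)


rows, failures = collect_counts(["Aberdeencc", "closed_account", "unknown"], fake_fetch)
print(rows)
print(failures)
```

With this structure a single closed account is logged and skipped, and the remaining accounts are still written to the database for that month.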
If you have any questions you can contact me on Twitter: @watty62
Showing 10 of 1017 rows:
DATE | TWITTERAC | FCOUNT |
---|---|---|
20090701 | abernet | 1 |
20090801 | ACC_Business | 1 |
20091001 | DanceAberdeen | 1 |
20100301 | Aberdeencc | 1 |
20100401 | Aberdeencc | 52 |
20100701 | mjs_abc | 1 |
20100901 | EventsAberdeen | 1 |
20101201 | Aberdeencc | 554 |
20110201 | AbdnArtMuseums | 1 |
20110501 | AberdeenCSP | 1 |
Average successful run time: less than a minute
Total run time: 5 days
Total cpu time used: 23 minutes
Total disk space used: 79.7 KB