This is a scraper that runs on Morph. To get started see the documentation.

The purpose of this scraper is to record how many followers each of our corporate Twitter accounts has and to store the counts in a database table.

The scraper.py file is in two parts.

Section A is a set-up routine. It creates the database table, defines its columns, then loads data in the new format from lists which I generated from the followers.csv file. I did it this way because I was unable to import and use the CSV library that I had used on my home laptop.
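A minimal sketch of what a set-up step like Section A could look like, using Python's built-in sqlite3 module. The column names come from the data table on this page; the table name "data", the in-memory database and the two seed rows are assumptions for illustration, and the real scraper.py may do this differently.

```python
import sqlite3

# Seed lists standing in for the values generated from followers.csv.
# Only two rows are shown here; they match rows in the data preview.
dates = ["20090701", "20100401"]
accounts = ["abernet", "Aberdeencc"]
counts = [1, 52]


def set_up_table(conn, dates, accounts, counts):
    """Create the table if it does not exist, then load the seed rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS data "
        "(DATE TEXT, TWITTERAC TEXT, FCOUNT INTEGER)"
    )
    # zip() pairs up the three parallel lists into (date, account, count) rows.
    conn.executemany(
        "INSERT INTO data (DATE, TWITTERAC, FCOUNT) VALUES (?, ?, ?)",
        zip(dates, accounts, counts),
    )
    conn.commit()


# Assumption: an in-memory database keeps the example side-effect free.
# A scraper on Morph would normally write to a file such as data.sqlite.
conn = sqlite3.connect(":memory:")
set_up_table(conn, dates, accounts, counts)
print(conn.execute("SELECT COUNT(*) FROM data").fetchone()[0])
```

Using parameter placeholders (?) rather than building the SQL string by hand keeps account names with unusual characters from breaking the insert.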

If you want to experiment with this, uncomment Section A and comment out Section B (using ''').

Section B is the scraper itself.

It first checks the date - and only runs if it is the 1st of the month.
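The date check can be sketched as below. The function names should_run and date_stamp are hypothetical, not taken from scraper.py; the "Not today" message matches the console output shown further down this page, and the YYYYMMDD stamp matches the DATE column.

```python
import datetime


def should_run(today=None):
    """Return True only on the 1st of the month."""
    today = today or datetime.date.today()
    return today.day == 1


def date_stamp(today=None):
    """Format a date as YYYYMMDD, the form used in the DATE column."""
    today = today or datetime.date.today()
    return today.strftime("%Y%m%d")


if should_run():
    print("Running for", date_stamp())
else:
    print("Not today")
```

Passing the date in as an optional argument makes the check easy to try out for any day without waiting for the 1st.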

Then it processes a list of Twitter account names, constructing the full URL for each and scraping the number of followers from the account's Twitter page. The actual figure only appears as a tooltip mouse-over - and I had some help working out how to get it out of the code!
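As a rough illustration of those two steps, here is a sketch that builds a profile URL and pulls an exact count out of a tooltip. The base URL, the sample markup and the regular expression are all assumptions: the real page structure is not shown in this README and Twitter's HTML has changed many times, so treat the pattern as a placeholder rather than the one scraper.py uses.

```python
import re

BASE_URL = "https://twitter.com/"  # assumed base for profile pages


def profile_url(account):
    """Build the full profile URL from an account name."""
    return BASE_URL + account


# Hypothetical markup: the visible text is abbreviated, while the exact
# figure sits in the title attribute that appears as a mouse-over tooltip.
SAMPLE_HTML = (
    '<a href="/Aberdeencc/followers" title="12,345 Followers">'
    "<span>12.3K</span></a>"
)


def follower_count(html):
    """Return the exact follower count from the tooltip, or None if absent."""
    match = re.search(r'title="([\d,]+) Followers"', html)
    if match is None:
        return None
    # Strip the thousands separators before converting to an integer.
    return int(match.group(1).replace(",", ""))


print(profile_url("Aberdeencc"))
print(follower_count(SAMPLE_HTML))
```

Returning None when the tooltip is missing lets the caller decide what to do with an account whose page does not look as expected.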

Once it has the number of followers for each account it writes them to the database.

When I wrote this (30/08/14 - or 20140830 as the script would call it!) the code hadn't executed for real - although it has all been tested in chunks.

Since then it has run as planned in September and October 2014!

It subsequently failed in November. I traced the failure to a Twitter account having been closed down, which meant there was no valid page to check for followers. So better error trapping would be good - someday!
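One possible shape for that error trapping, sketched with the standard library only. The names fetch_page and fetch_all are hypothetical and the base URL is an assumption; the idea is simply that one missing page should be skipped rather than stopping the whole run.

```python
import urllib.request


def fetch_page(url):
    """Download a page and return its text."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_all(accounts, fetch=fetch_page):
    """Fetch each account's page, skipping any that cannot be loaded.

    Returns a dict of account -> page text and a list of failed accounts.
    """
    pages, failed = {}, []
    for account in accounts:
        url = "https://twitter.com/" + account  # assumed base URL
        try:
            pages[account] = fetch(url)
        except OSError as err:
            # urllib's URLError and HTTPError (for example a 404 for a
            # closed account) are subclasses of OSError, as are timeouts.
            print("Skipping", account, "-", err)
            failed.append(account)
    return pages, failed


# Example use (performs real network requests):
# pages, failed = fetch_all(["Aberdeencc", "abernet"])
```

Keeping a list of the failed accounts means they can be reported at the end of the run instead of being lost silently.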

If you have any questions you can contact me on Twitter: @watty62

Contributors watty62

Last run completed successfully.

Console output of last run

Injecting configuration and compiling...
Injecting scraper and running...
Not today

Data

Downloaded 6 times by watty62 shamsul910 MikeRalphson

rows 10 / 893

DATE      TWITTERAC       FCOUNT
20090701  abernet         1
20090801  ACC_Business    1
20091001  DanceAberdeen   1
20100301  Aberdeencc      1
20100401  Aberdeencc      52
20100701  mjs_abc         1
20100901  EventsAberdeen  1
20101201  Aberdeencc      554
20110201  AbdnArtMuseums  1
20110501  AberdeenCSP     1

Statistics

Average successful run time: less than a minute

Total run time: 4 days

Total cpu time used: 15 minutes

Total disk space used: 76.7 KB

History

  • Auto ran revision a2d550d0 and completed successfully.
    nothing changed in the database
  • Auto ran revision a2d550d0 and completed successfully.
    nothing changed in the database
  • Auto ran revision a2d550d0 and completed successfully.
    nothing changed in the database
  • Auto ran revision a2d550d0 and completed successfully.
    nothing changed in the database
    2 pages scraped
  • Auto ran revision a2d550d0 and completed successfully.
    nothing changed in the database
  • ...
  • Created on morph.io

Scraper code

Python

Count_twitter_followers / scraper.py