amoghds / basic_twitter_scraper_3

Basic Twitter Scraper


import scraperwiki
import simplejson
import urllib2

# Change QUERY to your search term of choice.
# Examples: 'newsnight', 'from:bbcnewsnight', 'to:bbcnewsnight'
QUERY = '#opendata'
RESULTS_PER_PAGE = '100'
LANGUAGE = 'en'
NUM_PAGES = 1000

for page in range(1, NUM_PAGES + 1):
    base_url = 'http://search.twitter.com/search.json?q=%s&rpp=%s&lang=%s&page=%s' \
        % (urllib2.quote(QUERY), RESULTS_PER_PAGE, LANGUAGE, page)
    try:
        results_json = simplejson.loads(scraperwiki.scrape(base_url))
        for result in results_json['results']:
            # print result
            data = {}
            data['id'] = result['id']
            data['text'] = result['text']
            data['from_user'] = result['from_user']
            data['created_at'] = result['created_at']
            print data['from_user'], data['text']
            scraperwiki.sqlite.save(["id"], data)
    except:
        print 'Oh dear, failed to scrape %s' % base_url
        break
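
The bare except: above reports only the URL that failed (see the console output further down) and discards the underlying exception. As a minimal sketch, assuming the same Python 2 / classic ScraperWiki environment, a small helper (the name fetch_results is only illustrative) could surface the real error before the loop gives up:

import traceback
import simplejson
import scraperwiki

def fetch_results(url):
    """Fetch one search-results page and return the parsed JSON, or None on failure."""
    try:
        return simplejson.loads(scraperwiki.scrape(url))
    except Exception:
        # Print the actual cause (HTTP error, JSON decode error, ...) rather than
        # only a generic failure message, so runs like the one in the console
        # output below are easier to diagnose.
        traceback.print_exc()
        return None

The main loop could then call fetch_results(base_url) and break when it returns None, keeping the original behaviour while still logging why a page failed.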

Forked from ScraperWiki

Contributors amoghds

Last run completed successfully.

Console output of last run

Oh dear, failed to scrape http://search.twitter.com/search.json?q=%23opendata&rpp=100&lang=en&page=1

Data

Downloaded 1 time by MikeRalphson


Download table (as CSV) Download SQLite database (55.8 MB) Use the API
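
As a rough sketch of using the API mentioned above: the endpoint and parameter names below are an assumption about the classic ScraperWiki datastore API, not details taken from this page, so treat them as illustrative only.

import urllib2
import simplejson

# NOTE: this URL format (datastore/sqlite with name, query and format
# parameters) is assumed, not confirmed by this page; check the scraper's
# "Use the API" page for the exact form.
API_URL = ('https://api.scraperwiki.com/api/1.0/datastore/sqlite'
           '?format=jsondict&name=basic_twitter_scraper_3'
           '&query=' + urllib2.quote('select * from swdata limit 10'))

rows = simplejson.loads(urllib2.urlopen(API_URL).read())
for row in rows:
    print row['from_user'], row['text']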

rows 10 / 216538

text | id | from_user | created_at
first @fixmystreet fr @mysociety now @FixMyTransport http://t.co/9re4iom we need real @open311 in #seattle #opengov #opendata #transit | 108990595585949696 | MrDataFerret |
@Gov Walker of WI can't be pleased w poor audit of state contract sunshine site http://t.co/pByo2g1 #opengov #opendata #wisconsin | 108986525211037696 | MrDataFerret |
" #OpenData Geeks to the Rescue" http://t.co/osK0iNu > nice work by @dbhume & @data_bc team. good step 2ward gr8r #opengov #dbchack | 108984924614307841 | asterix |
RT @chilobbyists: Official statement from Rahm's office on the release of more lobbyist data and our collaboration: http://t.co/ENpUMjY #opengov #opendata | 108982291694493696 | billgatewood |
RT @WorldBank: Where should we take #opendata next? Tell us. Live 9/13 1800 GMT http://ow.ly/6hCvz | 108980355138199552 | Mitalikm |
RT @WorldBank: Where should we take #opendata next? Tell us. Live 9/13 1800 GMT http://ow.ly/6hCvz | 108976950365798400 | abmakulec |
RT @chilobbyists: Official statement from Rahm's office on the release of more lobbyist data and our collaboration: http://t.co/ENpUMjY #opengov #opendata | 108976121231577088 | rougeux |
RT @WorldBank: Where should we take #opendata next? Tell us. Live 9/13 1800 GMT http://ow.ly/6hCvz | 108976005695283200 | avilarenata |
RT @vinsumner: #AmsterdamSmartCity add your information http://t.co/1z5SzDI about the city , #opendata , #picnic11 | 108975904633524224 | spolliaro |
RT @worldbank: Where should we take #opendata next? Tell us. Live 9/13 1800 GMT http://t.co/h9PV2iO | 108975758717886464 | MGPSI |
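
For reference, a rough sketch of reading these rows back from inside a ScraperWiki script, assuming the classic scraperwiki Python library and its default table name:

import scraperwiki

# "swdata" is the default table that scraperwiki.sqlite.save() writes to when
# no table_name is given, as in the scraper code above.
rows = scraperwiki.sqlite.select("* from swdata order by id desc limit 10")
for row in rows:
    print row['from_user'], row['text']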

Statistics

Average successful run time: less than 5 seconds

Total run time: half a minute

Total cpu time used: less than 5 seconds

Total disk space used: 55.8 MB

History

  • Manually ran revision 91f0e089 and completed successfully.
    Nothing changed in the database.
  • Manually ran revision 91f0e089 and completed successfully.
    Nothing changed in the database.
  • Manually ran revision aa346ef1 and completed successfully.
    Nothing changed in the database.
  • Manually ran revision 3bf19285 and failed.
    Nothing changed in the database.
  • Forked from ScraperWiki

Scraper code

Python

basic_twitter_scraper_3 / scraper.py