At this stage it's only a draft, but it does run and generate data, using a fairly direct ClojureScript port of the base Node.js scraper.
What I'd like to do with it is build a repository of match report data from arsenal.com or another thorough Arsenal Football Club news source.
In this scraper, the 'scraper.js' file is compiled output. If you're looking for the actual ClojureScript code, it's found here.
Average successful run time: less than 20 seconds
Total run time: 2 minutes
Total CPU time used: less than 10 seconds
Total disk space used: 40.2 MB