mobeets / intriguing-things-scraper

scraper for Alexis Madrigal's 5 Intriguing Things


Alexis Madrigal's 5 Intriguing Things is a usually-daily newsletter containing links to things. The file scraper.py is run daily by Morph to update a (good enough) archive of those things.
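A daily update of this kind is commonly structured as a loop that follows each letter's "next" link until it reaches a page it has already stored. The sketch below is an assumption about that general shape, not the contents of scraper.py. The function name `crawl` and the injected `load` callable are hypothetical; the `(dt, contents, next_url)` return shape mirrors names visible in the console traceback on this page.

```python
def crawl(start_url, seen_urls, load):
    """Follow next-links from start_url, collecting pages not already archived.

    `load` is a hypothetical callable returning (dt, contents, next_url);
    next_url is None when there is no newer letter.
    """
    new_entries = []
    url = start_url
    while url is not None and url not in seen_urls:
        dt, contents, next_url = load(url)
        new_entries.append((url, dt, contents))
        seen_urls.add(url)  # remember the page so a later run skips it
        url = next_url
    return new_entries
```

Passing `load` in as a parameter keeps the loop testable without network access; a real run would supply a function that fetches and parses the page.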

View the archive here, or better yet, browse the results.

Contributors mobeets

Last run failed with status code 1.

Console output of last run

Injecting configuration and compiling...
-----> Python app detected
! The latest version of Python 2 is python-2.7.14 (you are using python-2.7.6, which is unsupported).
! We recommend upgrading by specifying the latest version (python-2.7.14).
Learn More: https://devcenter.heroku.com/articles/python-runtimes
-----> Installing python-2.7.6
-----> Installing pip
-----> Installing requirements with pip
DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. pip 21.0 will drop support for Python 2.7 in January 2021. More details about Python 2 support in pip can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support pip 21.0 will remove support for this functionality.
Obtaining scraperwiki from git+http://github.com/openaustralia/scraperwiki-python.git@morph_defaults#egg=scraperwiki (from -r /tmp/build/requirements.txt (line 1))
Cloning http://github.com/openaustralia/scraperwiki-python.git (to revision morph_defaults) to /app/.heroku/src/scraperwiki
/app/.heroku/python/lib/python2.7/site-packages/pip/_vendor/urllib3/util/ssl_.py:387: SNIMissingWarning: An HTTPS request has been made, but the SNI (Server Name Indication) extension to TLS is not available on this platform. This may cause the server to present an incorrect TLS certificate, which can cause validation failures. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
SNIMissingWarning,
/app/.heroku/python/lib/python2.7/site-packages/pip/_vendor/urllib3/util/ssl_.py:142: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecurePlatformWarning,
Collecting python-dateutil==2.1
/app/.heroku/python/lib/python2.7/site-packages/pip/_vendor/urllib3/util/ssl_.py:142: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecurePlatformWarning,
Downloading python-dateutil-2.1.tar.gz (152 kB)
Collecting BeautifulSoup==3.2.1
Downloading BeautifulSoup-3.2.1.tar.gz (31 kB)
Collecting unidecode==0.4.16
Downloading Unidecode-0.04.16.tar.gz (200 kB)
Collecting dumptruck>=0.1.2
Downloading dumptruck-0.1.6.tar.gz (15 kB)
Collecting requests
Downloading requests-2.24.0-py2.py3-none-any.whl (61 kB)
Collecting six
Downloading six-1.15.0-py2.py3-none-any.whl (10 kB)
Collecting idna<3,>=2.5
Downloading idna-2.10-py2.py3-none-any.whl (58 kB)
Collecting chardet<4,>=3.0.2
Downloading chardet-3.0.4-py2.py3-none-any.whl (133 kB)
Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1
Downloading urllib3-1.25.11-py2.py3-none-any.whl (127 kB)
Collecting certifi>=2017.4.17
Downloading certifi-2020.6.20-py2.py3-none-any.whl (156 kB)
Building wheels for collected packages: python-dateutil, BeautifulSoup, unidecode, dumptruck
Building wheel for python-dateutil (setup.py): started
Building wheel for python-dateutil (setup.py): finished with status 'done'
Created wheel for python-dateutil: filename=python_dateutil-2.1-py2-none-any.whl size=119901 sha256=a3366dc5293fd2b2ad01b7419e3df28e7f1c118bb1252f52eb4093d14cabd7c7
Stored in directory: /tmp/pip-ephem-wheel-cache-RjnN0_/wheels/38/1a/fa/cae02d14293ebd251e5a28dad44f626724b96e88f18a2cfe95
Building wheel for BeautifulSoup (setup.py): started
Building wheel for BeautifulSoup (setup.py): finished with status 'done'
Created wheel for BeautifulSoup: filename=BeautifulSoup-3.2.1-py2-none-any.whl size=31961 sha256=a656706776cf31c2c75fd12930db1f7b37742ec087a33b60029932d11666ac91
Stored in directory: /tmp/pip-ephem-wheel-cache-RjnN0_/wheels/4d/ca/f6/2638be1fa1df72e30c9f0264c6e4fd77b97eb0044aa8083e12
Building wheel for unidecode (setup.py): started
Building wheel for unidecode (setup.py): finished with status 'done'
Created wheel for unidecode: filename=Unidecode-0.4.16-py2-none-any.whl size=228246 sha256=87cbee5a6080422e6df28483c3b7a0277ba37250a918bff12a6677d6c25455ec
Stored in directory: /tmp/pip-ephem-wheel-cache-RjnN0_/wheels/53/57/27/3f535ae8f352375be1e7df13e5d2b583ded21333f08711ddfd
Building wheel for dumptruck (setup.py): started
Building wheel for dumptruck (setup.py): finished with status 'done'
Created wheel for dumptruck: filename=dumptruck-0.1.6-py2-none-any.whl size=11843 sha256=40649135584380bac0170213b50637e30cdb10e4f01d84aa2fbfb7d10b7a2b66
Stored in directory: /tmp/pip-ephem-wheel-cache-RjnN0_/wheels/dc/75/e9/1e61c4080c73e7bda99614549591f83b53bcc2d682f26fce62
Successfully built python-dateutil BeautifulSoup unidecode dumptruck
Installing collected packages: dumptruck, idna, chardet, urllib3, certifi, requests, scraperwiki, six, python-dateutil, BeautifulSoup, unidecode
Running setup.py develop for scraperwiki
Successfully installed BeautifulSoup-3.2.1 certifi-2020.6.20 chardet-3.0.4 dumptruck-0.1.6 idna-2.10 python-dateutil-2.1 requests-2.24.0 scraperwiki six-1.15.0 unidecode-0.4.16 urllib3-1.25.11
! Hello! It looks like your application is using an outdated version of Python.
! This caused the security warning you saw above during the 'pip install' step.
! We recommend 'python-3.6.2', which you can specify in a 'runtime.txt' file.
! -- Much Love, Heroku.
DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. pip 21.0 will drop support for Python 2.7 in January 2021. More details about Python 2 support in pip can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support pip 21.0 will remove support for this functionality.
-----> Discovering process types
Procfile declares types -> scraper
Injecting scraper and running...
Loading previous entries...
Found 1863 urls
Starting at http://tinyletter.com/realfuture/letters/from-fuld-hall-to-olden-farm-witnessed-enviously
Currently have 1863 entries
Starting at http://tinyletter.com/realfuture/letters/from-fuld-hall-to-olden-farm-witnessed-enviously
http://tinyletter.com/realfuture/letters/from-fuld-hall-to-olden-farm-witnessed-enviously
Traceback (most recent call last):
  File "scraper.py", line 144, in <module>
    main()
  File "scraper.py", line 141, in main
    io(starturl, urls, inds)
  File "scraper.py", line 106, in io
    dt, ts, new_url = load(next_url)
  File "scraper.py", line 76, in load
    dt, contents, next_url = parse(read(url))
  File "scraper.py", line 31, in read
    response = urllib2.urlopen(url)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 410, in open
    response = meth(req, response)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found
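The run above fails inside `read` (scraper.py, line 31), where `urllib2.urlopen(url)` raises `HTTP Error 404: Not Found` and the process exits with status code 1. One possible way to keep a missing letter from aborting the run is sketched below. This is not the author's code: it targets Python 3 (`urllib.request` instead of the Python 2 `urllib2` module in the log), and the `opener` parameter and the decision to return `None` on a 404 are assumptions made for illustration.

```python
import urllib.error
import urllib.request


def read(url, opener=urllib.request.urlopen):
    """Return the raw page body for `url`, or None if the server answers 404.

    `opener` is a hypothetical injection point so the function can be
    exercised without network access.
    """
    try:
        response = opener(url)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            # Assumption: a missing letter means there is nothing more to fetch.
            return None
        raise  # any other HTTP error still surfaces to the caller
    return response.read()
```

A caller such as `load` would then need to check for `None` before parsing and stop the crawl cleanly instead of raising.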

Data

Downloaded 2507 times by mobeets MikeRalphson


Download table (as CSV) Download SQLite database (2.25 MB) Use the API
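As a rough illustration of the API option, the sketch below builds a query URL for this scraper's data. The host, path layout, and `key`/`query` parameters reflect the general morph.io API pattern as best understood here and should be treated as assumptions, as should the table name `data` and the helper name `build_query_url`. The API key is a placeholder.

```python
from urllib.parse import urlencode

API_BASE = "https://api.morph.io"  # assumed morph.io API host


def build_query_url(owner, scraper, api_key, sql, fmt="json"):
    """Build a data-API URL for a SQL query (sketch; URL layout is assumed)."""
    params = urlencode({"key": api_key, "query": sql})
    return "{}/{}/{}/data.{}?{}".format(API_BASE, owner, scraper, fmt, params)


url = build_query_url(
    "mobeets",
    "intriguing-things-scraper",
    "YOUR_API_KEY",  # placeholder, not a real key
    'select * from "data" limit 10',  # table name "data" is an assumption
)
print(url)
```

Using `urlencode` keeps the SQL text safely escaped in the query string rather than concatenating it by hand.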

rows 10 / 1863

ps title url index number src_url dt

ps: <p>"But [Emory University's Gregory] Berns hopes to respond with future fMRI work, which will compare brain activity in dogs being fed by automated mechanisms with that of dogs being fed by humans."</p><p> </p>
title: People are putting dogs in MRI machines to determine if they love us like we love them. But will they love robots, too?
index: 2013-11-5.1
number: 1
dt: 2013-11-5

ps: <p>"If you want a picture of the future, imagine a robot hand playing rock paper scissors with a human hand -- forever" -- <a href="https://twitter.com/flaneur/status/397503991707598848">@MatthewOgle</a></p><p> </p>
title: You can't beat this robot at rock-paper-scissors because it detects your initial hand movement and forms its own fingers into a winning configuration before you can finish.
index: 2013-11-5.2
number: 2
dt: 2013-11-5

ps: <p>"...<span>more than offsetting losses in other divisions."</span></p><p> </p>
title: AOL's dial-up Internet business generated almost $150 million in income in the last quarter
index: 2013-11-5.3
number: 3
dt: 2013-11-5

ps: <p>"<span>So he took my computer and sort of typed a few things, and right in front of me he downloaded, like, 4,000 words from the pool of the internet."</span></p><p> </p>
title: Julian Assange helped MIA find words with T-E-N-T in them for a song about the plight of refugees
index: 2013-11-5.4
number: 4
dt: 2013-11-5

ps: <p>"<span>The camp was centered around a beautiful wild hot spring. 70 miles to the nearest phone. They erected a dome in the desert and then battled the winds while trying to erect an inflatable structure. It was Burning Man 40 years ago."</span></p><p><span><br /></span></p><p><span>As always, send feedback and ideas for inclusion to amadrigal@theatlantic.com.</span></p>
title: Whole Earth Catalog founder Steward Brand shot this footage in the desert in 1971.
index: 2013-11-5.5
number: 5
dt: 2013-11-5

ps: <p>"Mobile is eating the world."</p><p> </p>
title: Analyst Benedict Evans lays out the 73-slide case for the end of the Internet, media, and technology industries as we've known them
index: 2013-11-6.1
number: 1
dt: 2013-11-6

ps: <p>"<span>The institution with the most to gain is the Internal Revenue Service."</span></p><p> </p>
title: Steven Levy's 1994 Wired article on digital currency, including a swath of defunct BitCoin wannabes.
index: 2013-11-6.2
number: 2
dt: 2013-11-6

ps: <p>"<span>David Milarch of the <a href="http://www.ancienttreearchive.org/">Archangel Ancient Tree Archive</a>, the group cloning the trees, says the clones are living links to Muir's life."</span></p><p> </p>
title: A Michigan company successfully cloned a 130-year old sequoia that Atlantic contributor John Muir planted in his yard in the 19th century
index: 2013-11-6.3
number: 3
dt: 2013-11-6

ps: <p>"<span>The Jawbone Canyon siphon, pictured above in </span><a href="http://digitallibrary.usc.edu/cdm/singleitem/collection/p15799coll65/id/17789/rec/1" target="_blank">a photograph from the California Historical Society Collection at the USC Libraries</a><span>, is among the aqueduct's most impressive features. Workers assembled the massive steel pipe (measuring 8,095 feet in length and up to ten feet in diameter) in 36-foot, 25-ton segments, each hauled to the work site by </span><a href="http://content.cdlib.org/ark:/13030/hb2q2nb1jr/" target="_blank">a team of 52 mules</a><span>. Water falls through the tube, 850 feet to the canyon floor, generating hydraulic pressure that then forces it up and over the opposite ridge without the aid of a pump."</span></p><p> </p>
title: A reflection on the Rube Goldbergian engineering of Los Angeles' Owens Valley aqueduct.
index: 2013-11-6.4
number: 4
dt: 2013-11-6

ps: <p>* Thanks to <a href="https://twitter.com/smc90/status/397971728153862145">Sonal Chokshi</a>, <a href="https://twitter.com/NathanUnbound/status/397790865193578496">Nathan Masters</a>, <a href="https://twitter.com/MattPRD/status/397933227584655360">Matt Schlicht</a>.</p>
title: UX Archive, a site that has collected 241 "user flows," which show how people accomplish anything with their phones
index: 2013-11-6.5
number: 5
dt: 2013-11-6
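In the rows above, each index value appears to be the letter date joined to the item's position within that letter (for example `2013-11-5.1` for item 1 on 2013-11-5). A minimal sketch of building and splitting such a key follows. The helper names `make_index` and `split_index` are hypothetical, and how scraper.py actually constructs the value is not shown on this page.

```python
from datetime import date


def make_index(dt, number):
    """Combine a letter date and an item position into a key like '2013-11-5.1'."""
    # Month and day are left unpadded, matching the dt values in the table.
    return "{}-{}-{}.{}".format(dt.year, dt.month, dt.day, number)


def split_index(index):
    """Split an index string back into its (date, number) parts."""
    dt_text, number_text = index.rsplit(".", 1)
    year, month, day = (int(part) for part in dt_text.split("-"))
    return date(year, month, day), int(number_text)
```

A key of this form is unique per item as long as each letter date appears once, which makes it usable as a row identifier when re-saving entries.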

Statistics

Average successful run time: 2 minutes

Total run time: 14 days

Total cpu time used: about 1 hour

Total disk space used: 2.3 MB

History

  • Auto ran revision c1dbd724 and failed.
    nothing changed in the database
  • Auto ran revision c1dbd724 and failed.
    nothing changed in the database
  • Auto ran revision c1dbd724 and failed.
    nothing changed in the database
  • Auto ran revision c1dbd724 and failed.
    nothing changed in the database
  • Auto ran revision c1dbd724 and failed.
    nothing changed in the database
  • ...
  • Created on morph.io


Scraper code

Python

intriguing-things-scraper / scraper.py