mobeets / intriguing-things-scraper

scraper for Alexis Madrigal's 5 Intriguing Things


Alexis Madrigal's 5 Intriguing Things is a usually-daily newsletter containing links to things. The file scraper.py is run daily by Morph to update a (good enough) archive of those things.

View the archive here, or better yet, browse the results.

Contributors mobeets

Last run failed with status code 1.

Console output of last run

Injecting configuration and compiling...
-----> Python app detected
! The latest version of Python 2 is python-2.7.14 (you are using python-2.7.6, which is unsupported).
! We recommend upgrading by specifying the latest version (python-2.7.14).
Learn More: https://devcenter.heroku.com/articles/python-runtimes
-----> Installing python-2.7.6
-----> Installing pip
-----> Installing requirements with pip
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7. More details about Python 2 support in pip, can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support
Obtaining scraperwiki from git+http://github.com/openaustralia/scraperwiki-python.git@morph_defaults#egg=scraperwiki (from -r /tmp/build/requirements.txt (line 1))
Cloning http://github.com/openaustralia/scraperwiki-python.git (to revision morph_defaults) to /app/.heroku/src/scraperwiki
Running command git clone -q http://github.com/openaustralia/scraperwiki-python.git /app/.heroku/src/scraperwiki
Running command git checkout -b morph_defaults --track origin/morph_defaults
Switched to a new branch 'morph_defaults'
Branch morph_defaults set up to track remote branch morph_defaults from origin.
/app/.heroku/python/lib/python2.7/site-packages/pip/_vendor/urllib3/util/ssl_.py:380: SNIMissingWarning: An HTTPS request has been made, but the SNI (Server Name Indication) extension to TLS is not available on this platform. This may cause the server to present an incorrect TLS certificate, which can cause validation failures. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  SNIMissingWarning,
/app/.heroku/python/lib/python2.7/site-packages/pip/_vendor/urllib3/util/ssl_.py:139: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecurePlatformWarning,
Collecting python-dateutil==2.1
/app/.heroku/python/lib/python2.7/site-packages/pip/_vendor/urllib3/util/ssl_.py:139: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecurePlatformWarning,
Downloading https://files.pythonhosted.org/packages/65/52/9c18dac21f174ad31b65e22d24297864a954e6fe65876eba3f5773d2da43/python-dateutil-2.1.tar.gz (152kB)
Collecting BeautifulSoup==3.2.1
Downloading https://files.pythonhosted.org/packages/1e/ee/295988deca1a5a7accd783d0dfe14524867e31abb05b6c0eeceee49c759d/BeautifulSoup-3.2.1.tar.gz
Collecting unidecode==0.4.16
Downloading https://files.pythonhosted.org/packages/ec/d8/97c4c7ed5ad3cd2511d8896b2973b1f403110e07b38ea310f8703ba8485f/Unidecode-0.04.16.tar.gz (200kB)
Collecting dumptruck>=0.1.2
Downloading https://files.pythonhosted.org/packages/15/27/3330a343de80d6849545b6c7723f8c9a08b4b104de964ac366e7e6b318df/dumptruck-0.1.6.tar.gz
Collecting requests
Downloading https://files.pythonhosted.org/packages/51/bd/23c926cd341ea6b7dd0b2a00aba99ae0f828be89d72b2190f27c11d4b7fb/requests-2.22.0-py2.py3-none-any.whl (57kB)
Collecting six
Downloading https://files.pythonhosted.org/packages/65/26/32b8464df2a97e6dd1b656ed26b2c194606c16fe163c695a992b36c11cdf/six-1.13.0-py2.py3-none-any.whl
Collecting chardet<3.1.0,>=3.0.2
Downloading https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl (133kB)
Collecting idna<2.9,>=2.5
Downloading https://files.pythonhosted.org/packages/14/2c/cd551d81dbe15200be1cf41cd03869a46fe7226e7450af7a6545bfc474c9/idna-2.8-py2.py3-none-any.whl (58kB)
Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1
Downloading https://files.pythonhosted.org/packages/b4/40/a9837291310ee1ccc242ceb6ebfd9eb21539649f193a7c8c86ba15b98539/urllib3-1.25.7-py2.py3-none-any.whl (125kB)
Collecting certifi>=2017.4.17
Downloading https://files.pythonhosted.org/packages/b9/63/df50cac98ea0d5b006c55a399c3bf1db9da7b5a24de7890bc9cfd5dd9e99/certifi-2019.11.28-py2.py3-none-any.whl (156kB)
Building wheels for collected packages: python-dateutil, BeautifulSoup, unidecode, dumptruck
Building wheel for python-dateutil (setup.py): started
Building wheel for python-dateutil (setup.py): finished with status 'done'
Created wheel for python-dateutil: filename=python_dateutil-2.1-cp27-none-any.whl size=119902 sha256=ef5284e1ed51cf6d5f91fbaa71251835d5407ee2481525d173d8b5854e21db52
Stored in directory: /tmp/pip-ephem-wheel-cache-M8HGf7/wheels/a3/b4/9e/be446328c3728631f286e9cc832b8b00ca99480eefa1a6db4e
Building wheel for BeautifulSoup (setup.py): started
Building wheel for BeautifulSoup (setup.py): finished with status 'done'
Created wheel for BeautifulSoup: filename=BeautifulSoup-3.2.1-cp27-none-any.whl size=31960 sha256=c98eb3a589afc06ceb3b6da8bda8ef904c91de5fc9731e8031ad43d05f462a56
Stored in directory: /tmp/pip-ephem-wheel-cache-M8HGf7/wheels/74/d2/0b/8ef02aab9e15c6e5158d7aee909adab931a9c54920e99f468e
Building wheel for unidecode (setup.py): started
Building wheel for unidecode (setup.py): finished with status 'done'
Created wheel for unidecode: filename=Unidecode-0.4.16-cp27-none-any.whl size=228248 sha256=838e660aae04592e099cf1294803d32ab4156e0b80b114ed6a22e68c4ba75b70
Stored in directory: /tmp/pip-ephem-wheel-cache-M8HGf7/wheels/11/39/a6/61c25c7caa30123280d52e1064a30253b9a909a741361dbd26
Building wheel for dumptruck (setup.py): started
Building wheel for dumptruck (setup.py): finished with status 'done'
Created wheel for dumptruck: filename=dumptruck-0.1.6-cp27-none-any.whl size=11845 sha256=26ecf5f6b8d51e684721565062bdee39b90547e240719b0596aa2e696160d885
Stored in directory: /tmp/pip-ephem-wheel-cache-M8HGf7/wheels/57/df/83/32654ae89119876c7a7db66829bbdb646caa151589dbaf226e
Successfully built python-dateutil BeautifulSoup unidecode dumptruck
Installing collected packages: dumptruck, chardet, idna, urllib3, certifi, requests, scraperwiki, six, python-dateutil, BeautifulSoup, unidecode
Running setup.py develop for scraperwiki
Successfully installed BeautifulSoup-3.2.1 certifi-2019.11.28 chardet-3.0.4 dumptruck-0.1.6 idna-2.8 python-dateutil-2.1 requests-2.22.0 scraperwiki six-1.13.0 unidecode-0.4.16 urllib3-1.25.7
! Hello! It looks like your application is using an outdated version of Python.
! This caused the security warning you saw above during the 'pip install' step.
! We recommend 'python-3.6.2', which you can specify in a 'runtime.txt' file.
! -- Much Love, Heroku.
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7. More details about Python 2 support in pip, can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support
-----> Discovering process types
Procfile declares types -> scraper
Injecting scraper and running...
Loading previous entries...
Found 1863 urls
Starting at http://tinyletter.com/realfuture/letters/from-fuld-hall-to-olden-farm-witnessed-enviously
Currently have 1863 entries
Starting at http://tinyletter.com/realfuture/letters/from-fuld-hall-to-olden-farm-witnessed-enviously
http://tinyletter.com/realfuture/letters/from-fuld-hall-to-olden-farm-witnessed-enviously
Traceback (most recent call last):
  File "scraper.py", line 144, in <module>
    main()
  File "scraper.py", line 141, in main
    io(starturl, urls, inds)
  File "scraper.py", line 106, in io
    dt, ts, new_url = load(next_url)
  File "scraper.py", line 76, in load
    dt, contents, next_url = parse(read(url))
  File "scraper.py", line 31, in read
    response = urllib2.urlopen(url)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 410, in open
    response = meth(req, response)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 442, in error
    result = self._call_chain(*args)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 629, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 410, in open
    response = meth(req, response)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/app/.heroku/python/lib/python2.7/urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not found
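
The traceback above ends in an unhandled urllib2.HTTPError (404) raised from read() in scraper.py, which is why the run exits with status code 1. A minimal sketch of one way such an error could be caught, assuming a Python 3 port that uses urllib.request in place of the Python 2 urllib2 module. The names read_or_none and opener are hypothetical and are not taken from scraper.py:

```python
import urllib.error
import urllib.request


def read_or_none(url, opener=urllib.request.urlopen):
    """Return the page body as bytes, or None if the server answers with an HTTP error.

    Hypothetical helper: per the traceback, scraper.py's own read() calls
    urllib2.urlopen(url) directly; this sketch uses the Python 3 equivalent.
    The opener parameter exists only so the function can be exercised offline.
    """
    try:
        response = opener(url)
    except urllib.error.HTTPError as err:
        # For example a 404 when the letter URL no longer resolves.
        print("Skipping %s: HTTP %d" % (url, err.code))
        return None
    try:
        return response.read()
    finally:
        response.close()
```

Whether returning None is the right behaviour here is a judgment call: the traceback suggests each page supplies the next URL to visit, so a missing page may leave the scraper with nothing to continue from, and the caller would need to handle that case.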

Data

Downloaded 2240 times by mobeets MikeRalphson


Download table (as CSV) Download SQLite database (2.25 MB) Use the API

rows 10 / 1863

ps title url index number src_url dt

ps: <p>"But [Emory University's Gregory] Berns hopes to respond with future fMRI work, which will compare brain activity in dogs being fed by automated mechanisms with that of dogs being fed by humans."</p><p> </p>
title: People are putting dogs in MRI machines to determine if they love us like we love them. But will they love robots, too?
index: 2013-11-5.1
number: 1
dt: 2013-11-5

ps: <p>"If you want a picture of the future, imagine a robot hand playing rock paper scissors with a human hand -- forever" -- <a href="https://twitter.com/flaneur/status/397503991707598848">@MatthewOgle</a></p><p> </p>
title: You can't beat this robot at rock-paper-scissors because it detects your initial hand movement and forms its own fingers into a winning configuration before you can finish.
index: 2013-11-5.2
number: 2
dt: 2013-11-5

ps: <p>"...<span>more than offsetting losses in other divisions."</span></p><p> </p>
title: AOL's dial-up Internet business generated almost $150 million in income in the last quarter
index: 2013-11-5.3
number: 3
dt: 2013-11-5

ps: <p>"<span>So he took my computer and sort of typed a few things, and right in front of me he downloaded, like, 4,000 words from the pool of the internet."</span></p><p> </p>
title: Julian Assange helped MIA find words with T-E-N-T in them for a song about the plight of refugees
index: 2013-11-5.4
number: 4
dt: 2013-11-5

ps: <p>"<span>The camp was centered around a beautiful wild hot spring. 70 miles to the nearest phone. They erected a dome in the desert and then battled the winds while trying to erect an inflatable structure. It was Burning Man 40 years ago."</span></p><p><span><br /></span></p><p><span>As always, send feedback and ideas for inclusion to amadrigal@theatlantic.com.</span></p>
title: Whole Earth Catalog founder Steward Brand shot this footage in the desert in 1971.
index: 2013-11-5.5
number: 5
dt: 2013-11-5

ps: <p>"Mobile is eating the world."</p><p> </p>
title: Analyst Benedict Evans lays out the 73-slide case for the end of the Internet, media, and technology industries as we've known them
index: 2013-11-6.1
number: 1
dt: 2013-11-6

ps: <p>"<span>The institution with the most to gain is the Internal Revenue Service."</span></p><p> </p>
title: Steven Levy's 1994 Wired article on digital currency, including a swath of defunct BitCoin wannabes.
index: 2013-11-6.2
number: 2
dt: 2013-11-6

ps: <p>"<span>David Milarch of the <a href="http://www.ancienttreearchive.org/">Archangel Ancient Tree Archive</a>, the group cloning the trees, says the clones are living links to Muir's life."</span></p><p> </p>
title: A Michigan company successfully cloned a 130-year old sequoia that Atlantic contributor John Muir planted in his yard in the 19th century
index: 2013-11-6.3
number: 3
dt: 2013-11-6

ps: <p>"<span>The Jawbone Canyon siphon, pictured above in </span><a href="http://digitallibrary.usc.edu/cdm/singleitem/collection/p15799coll65/id/17789/rec/1" target="_blank">a photograph from the California Historical Society Collection at the USC Libraries</a><span>, is among the aqueduct's most impressive features. Workers assembled the massive steel pipe (measuring 8,095 feet in length and up to ten feet in diameter) in 36-foot, 25-ton segments, each hauled to the work site by </span><a href="http://content.cdlib.org/ark:/13030/hb2q2nb1jr/" target="_blank">a team of 52 mules</a><span>. Water falls through the tube, 850 feet to the canyon floor, generating hydraulic pressure that then forces it up and over the opposite ridge without the aid of a pump."</span></p><p> </p>
title: A reflection on the Rube Goldbergian engineering of Los Angeles' Owens Valley aqueduct.
index: 2013-11-6.4
number: 4
dt: 2013-11-6

ps: <p>* Thanks to <a href="https://twitter.com/smc90/status/397971728153862145">Sonal Chokshi</a>, <a href="https://twitter.com/NathanUnbound/status/397790865193578496">Nathan Masters</a>, <a href="https://twitter.com/MattPRD/status/397933227584655360">Matt Schlicht</a>.</p>
title: UX Archive, a site that has collected 241 "user flows," which show how people accomplish anything with their phones
index: 2013-11-6.5
number: 5
dt: 2013-11-6
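
In the sample rows above, each index value looks like the dt value followed by a dot and the number value (for example 2013-11-5.1). A minimal sketch of that composition, assuming the pattern holds for every row in the table. The names make_index and split_index are hypothetical and are not taken from scraper.py:

```python
def make_index(dt, number):
    """Join a date string and an item number into a key such as '2013-11-5.1'."""
    return "%s.%d" % (dt, number)


def split_index(index):
    """Split an index key back into (dt, number); the number follows the last dot."""
    dt, _, number = index.rpartition(".")
    return dt, int(number)
```

Splitting on the last dot keeps the date portion intact, since the dates shown use hyphens rather than dots.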

Statistics

Average successful run time: 2 minutes

Total run time: 14 days

Total cpu time used: about 1 hour

Total disk space used: 2.3 MB

History

  • Auto ran revision c1dbd724 and failed.
    nothing changed in the database
  • Auto ran revision c1dbd724 and failed.
    nothing changed in the database
  • Auto ran revision c1dbd724 and failed.
    nothing changed in the database
  • Auto ran revision c1dbd724 and failed.
    nothing changed in the database
  • Auto ran revision c1dbd724 and failed.
    nothing changed in the database
  • ...
  • Created on morph.io


Scraper code

Python

intriguing-things-scraper / scraper.py