askandtheanswer / data_journalism_handbook

Data Journalism Handbook

The HTML and titles of every chapter from the OKFN’s Data Journalism Handbook.

ScraperWiki’s mentioned in 7 chapters !! :-)

Forked from ScraperWiki

Contributors askandtheanswer

Last run failed with status code 1.

Console output of last run

Injecting configuration and compiling...
Injecting scraper code and running...
Traceback (most recent call last):
  File "", line 30, in <module>
    html = requests.get(baseurl)
NameError: name 'baseurl' is not defined
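The NameError in the console output means that line 30 of the scraper reads `baseurl` before anything has been assigned to it. The following is only a minimal sketch of one way to guard against that, not the scraper's actual code: the URL value is a placeholder (the real start URL does not appear in the log), and `fetch_html` is a hypothetical helper that takes the fetching function as an argument so it can be checked without network access.

```python
# Assumption: placeholder value. The scraper's real start URL is not shown in the log.
baseurl = "https://example.org/handbook/"


def fetch_html(url, get):
    """Return the body text of `url`, fetched with `get` (for example requests.get)."""
    if not url:
        # Fail with a clear message instead of a NameError deep inside the run.
        raise ValueError("start URL is empty; define baseurl before fetching")
    return get(url).text


# In the scraper this could be called as:
#     html = fetch_html(baseurl, requests.get)
```

Defining the start URL once at the top of the script, before any request is made, is the simplest fix; the helper only adds an explicit check and a readable error.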




rows 10 / 65

url chapter_number html section_number title
<div class="sect2"> <h3 id="_for_the_great_unnamed">For the Great Unnamed</h3> <div class="imageblock" id="FIG0.1"> <div class="content"> <img src="" alt="figs/incoming/00-01.jpg"></div> <div class="title">Figure 1. How it all began</div> </div> <div class="paragraph"><p>The Data Journalism Handbook was born at a 48-hour workshop at MozFest 2011 in London. It subsequently spilled over into an international, collaborative effort involving dozens of data journalism’s leading advocates and best practitioners.</p></div> <div class="paragraph"><p>In the 6 months that passed between the book’s inception and its first full release, hundreds of people have contributed in various ways. While we have done our best to keep track of them all, we have had our fair share of anonymous, pseudonymous and untraceable edits.</p></div> <div class="paragraph"><p>To all of those people who have contributed and are not listed below, we say two things. Firstly, thank you. Secondly, can you please tell us who you are so that we can give credit where credit is due?</p></div> </div>
For the Great Unnamed
<div class="sect2"> <h3 id="_contributor_list">Contributor List</h3> <div class="paragraph"><p>The following people have drafted or otherwise directly contributed to text which is in the current version of the book. The illustrations are by graphic designer Kate Hudson.</p></div> <div class="ulist"><ul><li> <p> Gregor Aisch, Open Knowledge Foundation </p> </li> <li> <p> Brigitte Alfter, </p> </li> <li> <p> David Anderton, Freelance Journalist </p> </li> <li> <p> James Ball, The Guardian </p> </li> <li> <p> Caelainn Barr, Citywire </p> </li> <li> <p> Mariana Berruezo, Hacks/Hackers Buenos Aires </p> </li> <li> <p> Michael Blastland, Freelance Journalist </p> </li> <li> <p> Mariano Blejman, Hacks/Hackers Buenos Aires </p> </li> <li> <p> John Bones, Verdens Gang </p> </li> <li> <p> Marianne Bouchart, Bloomberg News </p> </li> <li> <p> Liliana Bounegru, European Journalism Centre </p> </li> <li> <p> Brian Boyer, Chicago Tribune </p> </li> <li> <p> Paul Bradshaw, Birmingham City University </p> </li> <li> <p> Wendy Carlisle, Australian Broadcasting Corporation </p> </li> <li> <p> Lucy Chambers, Open Knowledge Foundation </p> </li> <li> <p> Sarah Cohen, Duke University </p> </li> <li> <p> Alastair Dant, The Guardian </p> </li> <li> <p> Helen Darbishire, Access Info Europe </p> </li> <li> <p> Chase Davis, Center for Investigative Reporting </p> </li> <li> <p> Steve Doig, Walter Cronkite School of Journalism of Arizona State University </p> </li> <li> <p> Lisa Evans, The Guardian </p> </li> <li> <p> Tom Fries, Bertelsmann Stiftung </p> </li> <li> <p> Duncan Geere, Wired UK </p> </li> <li> <p> Jack Gillum, Associated Press </p> </li> <li> <p> Jonathan Gray, Open Knowledge Foundation </p> </li> <li> <p> Alex Howard, O’Reilly Media </p> </li> <li> <p> Bella Hurrell, BBC </p> </li> <li> <p> Nicolas Kayser-Bril, Journalism++ </p> </li> <li> <p> John Keefe, WNYC </p> </li> <li> <p> Scott Klein, ProPublica </p> </li> <li> <p> Alexandre Léchenet, Le Monde </p> </li> <li> <p> Mark 
Lee Hunter, INSEAD </p> </li> <li> <p> Andrew Leimdorfer, BBC </p> </li> <li> <p> Friedrich Lindenberg, Open Knowledge Foundation </p> </li> <li> <p> Mike Linksvayer, Creative Commons </p> </li> <li> <p> Mirko Lorenz, Deutsche Welle </p> </li> <li> <p> Esa Mäkinen, Helsingin Sanomat </p> </li> <li> <p> Pedro Markun, Transparência Hacker </p> </li> <li> <p> Isao Matsunami, Tokyo Shimbun </p> </li> <li> <p> Lorenz Matzat, OpenDataCity </p> </li> <li> <p> Geoff McGhee, Stanford University </p> </li> <li> <p> Philip Meyer, Professor Emeritus, University of North Carolina at Chapel Hill </p> </li> <li> <p> Claire Miller, WalesOnline </p> </li> <li> <p> Cynthia O’Murchu, Financial Times </p> </li> <li> <p> Oluseun Onigbinde, BudgIT </p> </li> <li> <p> Djordje Padejski, Knight Journalism Fellow, Stanford University </p> </li> <li> <p> Jane Park, Creative Commons </p> </li> <li> <p> Angélica Peralta Ramos, La Nacion (Argentina) </p> </li> <li> <p> Cheryl Phillips, The Seattle Times </p> </li> <li> <p> Aron Pilhofer, New York Times </p> </li> <li> <p> Lulu Pinney, Freelance Infographic Designer </p> </li> <li> <p> Paul Radu, Organised Crime and Corruption Reporting Project </p> </li> <li> <p> Simon Rogers, The Guardian </p> </li> <li> <p> Martin Rosenbaum, BBC </p> </li> <li> <p> Amanda Rossi, Friends of Januária </p> </li> <li> <p> Martin Sarsale, Hacks/Hackers Buenos Aires </p> </li> <li> <p> Fabrizio Scrollini, London School of Economics and Political Science </p> </li> <li> <p> Sarah Slobin, Wall Street Journal </p> </li> <li> <p> Sergio Sorin, Hacks/Hackers Buenos Aires </p> </li> <li> <p> Jonathan Stray, The Overview Project </p> </li> <li> <p> Brian Suda, ( </p> </li> <li> <p> Chris Taggart, OpenCorporates </p> </li> <li> <p> Jer Thorp, The New York Times R&D Group </p> </li> <li> <p> Andy Tow, Hacks/Hackers Buenos Aires </p> </li> <li> <p> Luk N. 
Van Wassenhove, INSEAD </p> </li> <li> <p> Sascha Venohr, Zeit Online </p> </li> <li> <p> Jerry Vermanen, </p> </li> <li> <p> César Viana, University of Goiás </p> </li> <li> <p> Farida Vis, University of Leicester </p> </li> <li> <p> Pete Warden, Independent Data Analyst and Developer </p> </li> <li> <p> Chrys Wu, Hacks/Hackers </p> </li> </ul></div> </div>
Contributor List
<div class="sect2"> <h3 id="_what_this_book_is_and_what_it_isn_8217_t">What This Book Is (And What It Isn’t)</h3> <div class="paragraph"><p>This book is intended to be a useful resource for anyone who thinks that they might be interested in becoming a data journalist, or dabbling in data journalism.</p></div> <div class="paragraph"><p>Lots of people have contributed to writing it, and through our editing we have tried to let their different voices and views shine through. We hope that it reads like a rich and informative conversation about what data journalism is, why it is important, and how to do it.</p></div> <div class="paragraph"><p>Lamentably, the act of reading this book will not supply you with a comprehensive repertoire of all of the knowledge and skills you need to become a data journalist. This would require a vast library manned by hundreds of experts able to help answer questions on hundreds of topics. Luckily, this library exists and it is called the internet. Instead, we hope this book will give you a sense of how to get started and where to look if you want to go further. Examples and tutorials are meant to be illustrative rather than exhaustive.</p></div> <div class="paragraph"><p>We count ourselves very lucky to have had so much time, energy, and patience from all of our contributors and have tried our best to use this wisely. We hope that, in addition to being a useful reference source, the book does something to document the passion and enthusiasm, the vision and energy of a nascent movement. The book attempts to give a sense of what happens behind the scenes, the stories behind the stories.</p></div> <div class="paragraph"><p>The Data Journalism Handbook is a work in progress. If you think there is anything which needs to be amended or is conspicuously absent, then please flag it for inclusion in the next version. 
It is also freely available under a <a href="">Creative Commons Attribution-ShareAlike</a> license, and we strongly encourage you to share it with anyone that you think might be interested in reading it.</p></div> <div class="paragraph"><p><em>Jonathan Gray (<a href="">@jwyg</a>)</em><br><em>Liliana Bounegru (<a href="">@bb_liliana</a>)</em><br><em>Lucy Chambers (<a href="">@lucyfedia</a>)</em><br><em>March 2012</em></p></div> </div>
What This Book Is (And What It Isn’t)
<div class="sect2"> <h3 id="_the_handbook_at_a_glance">The Handbook At A Glance</h3> <div class="paragraph"><p>Infographic impresario Lulu Pinney created this superb poster, which gives an overview of the contents of the Data Journalism Handbook.</p></div> <div class="imageblock" id="FIG0.2"> <div class="content"> <img src="" alt="figs/incoming/00-poster.png"></div> <div class="title">Figure 2. The handbook at a glance</div> </div> </div>
The Handbook At A Glance
<div class="sect2"> <h3 id="_what_is_data_journalism">What Is Data Journalism?</h3> <div class="imageblock" id="FIG012"> <div class="content"> <img src="" alt="figs/incoming/01-01.png"></div> <div class="title">Figure 3. <em>Investigate your MP’s Expenses</em> (The Guardian)</div> </div> <div class="paragraph"><p>What is data journalism? I could answer, simply, that it is journalism done with data. But that doesn’t help much.</p></div> <div class="paragraph"><p>Both ‘data’ and ‘journalism’ are troublesome terms. Some people think of ‘data’ as any collection of numbers, most likely gathered on a spreadsheet. 20 years ago, that was pretty much the only sort of data that journalists dealt with. But we live in a digital world now, a world in which almost anything can be — and almost everything is — described with numbers.</p></div> <div class="paragraph"><p>Your career history, 300,000 confidential documents, who knows who in your circle of friends can all be (and are) described with just two numbers: zeroes, and ones. Photos, video and audio are all described with the same two numbers: zeroes and ones. Murders, disease, political votes, corruption and lies: zeroes and ones.</p></div> <div class="paragraph"><p>What makes data journalism different to the rest of journalism? 
Perhaps it is the new possibilities that open up when you combine the traditional ‘nose for news’ and ability to tell a compelling story, with the sheer scale and range of digital information now available.</p></div> <div class="paragraph"><p>And those possibilities can come at any stage of the journalist’s process: using programming to automate the process of gathering and combining information from local government, police, and other civic sources, as Adrian Holovaty did with <a href="">ChicagoCrime</a> and then <a href="">EveryBlock</a>.</p></div> <div class="paragraph"><p>Or using software to find connections between hundreds of thousands of documents, as The Telegraph did with <a href="">MPs' expenses</a>.</p></div> <div class="paragraph"><p>Data journalism can help a journalist tell a complex story through engaging infographics. Hans Rosling’s spectacular talks on visualizing world poverty with <a href="">Gapminder</a>, for example, have attracted millions of views across the world. And David McCandless’s popular work in distilling big numbers — such as putting public spending into context, or the pollution generated and prevented by the Icelandic volcano — shows the importance of clear design at <a href="">Information is Beautiful</a>.</p></div> <div class="paragraph"><p>Or it can help explain how a story relates to an individual, as the BBC and the Financial Times now routinely do with their budget interactives (where you can find out how the budget affects you, rather than ‘Joe Public’). And it can open up the news gathering process itself, as The Guardian do so successfully in sharing data, context, and questions with their <a href="">Datablog</a>.</p></div> <div class="paragraph"><p>Data can be the source of data journalism, or it can be the tool with which the story is told — or it can be both. 
Like any source, it should be treated with scepticism; and like any tool, we should be conscious of how it can shape and restrict the stories that are created with it.</p></div> <div class="paragraph"><p>— <em>Paul Bradshaw, Birmingham City University</em></p></div> </div>
What Is Data Journalism?
<div class="sect2"> <h3 id="_why_journalists_should_use_data">Why Journalists Should Use Data</h3> <div class="paragraph"><p>Journalism is under siege. In the past we, as an industry, relied on being the only ones operating a technology to multiply and distribute what had happened overnight. The printing press served as a gateway: if anybody wanted to reach the people of a city or region the next morning, they would turn to newspapers. This is over.</p></div> <div class="paragraph"><p>Today news stories flow in as they happen, from multiple sources, eyewitnesses and blogs, and what has happened is filtered through a vast network of social connections, being ranked, commented on and, more often than not, ignored.</p></div> <div class="paragraph"><p>This is why data journalism is so important. Gathering, filtering and visualizing what is happening beyond what the eye can see has a growing value. The orange juice you drink in the morning, the coffee you brew — in today’s global economy there are invisible connections between these products, other people and you. The language of this network is data: little points of information that are often not relevant in a single instance, but massively important when viewed from the right angle.</p></div> <div class="paragraph"><p>Right now, a few pioneering journalists already demonstrate how data can be used to create deeper insights into what is happening around us and how it might affect us.</p></div> <div class="paragraph"><p>Data analysis can reveal “a story’s shape” (Sarah Cohen), or provide us with a “new camera” (David McCandless). Using data, the job of journalists shifts its main focus from being the first ones to report to being the ones telling us what a certain development might actually mean. The range of topics can be far and wide. The next financial crisis that is in the making. The economics behind the products we use. 
The misuse of funds or political blunders, presented in a compelling data visualization that leaves little room to argue with it.</p></div> <div class="paragraph"><p>This is why journalists should see data as an opportunity. They can, for example, reveal how some abstract threat such as unemployment affects people based on their age, gender, education. Using data transforms something abstract into something everyone can understand and relate to.</p></div> <div class="paragraph"><p>They can create personalized calculators to help people to make decisions, be this buying a car, a house, deciding on an education or professional path in life or doing a hard check on costs to keep out of debt.</p></div> <div class="paragraph"><p>They can analyze the dynamics of a complex situation like riots or political debates, show the fallacies and help everyone to see possible solutions to complex problems.</p></div> <div class="paragraph"><p>Becoming knowledgeable in searching, cleaning, and visualizing data is transformative for the profession of information gathering, too. Journalists who master this will experience that building articles on facts and insights is a relief. Less guessing, less looking for quotes — instead, a journalist can build a strong position supported by data and this can affect the role of journalism greatly.</p></div> <div class="paragraph"><p>Additionally, getting into data journalism offers a future perspective. Today, when newsrooms cut down, most journalists hope to switch to public relations. Data journalists or data scientists though are already a sought-after group of employees, not only in the media. Companies and institutions around the world are looking for “sensemakers” and professionals, who know how to dig through data and transform it into something tangible.</p></div> <div class="paragraph"><p>There is a promise in data and this is what excites newsrooms, making them look for a new type of reporter. 
For freelancers proficiency with data provides a route to new offerings and stable pay, too. Look at it this way: instead of hiring journalists to quickly fill pages and websites with low value content the use of data could create demand for interactive packages, where spending a week on solving one question is the only way to do it. This is a welcome change in many parts of the media.</p></div> <div class="paragraph"><p>There is one barrier keeping journalists from using this potential: training in order to learn how to work with data through all the steps from a first question to a big data-driven scoop.</p></div> <div class="paragraph"><p>Working with data is like stepping into vast, unknown territory. At first look, raw data is puzzling to the eyes and to the mind. Data as such is unwieldy. It is quite hard to shape it correctly for visualization. It needs experienced journalists, who have the stamina to look at often confusing, often boring raw data and “see” the hidden stories in there.</p></div> <div class="imageblock" id="FIG013"> <div class="content"> <img src="" alt="figs/incoming/01-DD.png"></div> <div class="title">Figure 4. European Journalism Centre Survey on Training Needs</div> </div> <div class="sidebarblock"> <div class="content"> <div class="title">The Survey</div> <div class="paragraph"><p>The European Journalism Centre <a href="">conducted a survey</a> to find out more about training needs of journalists. We found there is a big willingness to get out of the comfort zone of traditional journalism and to invest time to master the new skills. The results from the survey showed us that journalists see the opportunity, but need a bit of support to cut through the initial problems keeping them from working with data. There is a confidence, that should data journalism get more adopted, the workflows, the tools and the results will improve quite quickly. 
Pioneers such as the Guardian, the New York Times, the Texas Tribune and Die Zeit continue to raise the bar with their data-driven stories.</p></div> <div class="paragraph"><p>Will data journalism remain the preserve of a small handful of pioneers, or will every news organization soon have its own dedicated data journalism team? We hope this handbook will help more journalists and newsrooms to take advantage of this emerging field.</p></div> </div></div> <div class="paragraph"><p>— <em>Mirko Lorenz, Deutsche Welle</em></p></div> </div>
Why Journalists Should Use Data
<div class="sect2"> <h3 id="_why_is_data_journalism_important">Why Is Data Journalism Important?</h3> <div class="paragraph"><p>We asked some of data journalism’s leading practitioners and proponents why they think data journalism is an important development. Here is what they said.</p></div> <div class="sect3"> <h4 id="_filtering_the_flow_of_data">Filtering the Flow of Data</h4> <div class="paragraph"><p>When information was scarce, most of our efforts were devoted to hunting and gathering. Now that information is abundant, processing is more important. We process at two levels: (1) analysis to bring sense and structure out of the never-ending flow of data and (2) presentation to get what’s important and relevant into the consumer’s head. Like science, data journalism discloses its methods and presents its findings in a way that can be verified by replication.</p></div> <div class="paragraph"><p>— <em>Philip Meyer, Professor Emeritus, University of North Carolina at Chapel Hill</em></p></div> </div> <div class="sect3"> <h4 id="_new_approaches_to_storytelling">New Approaches to Storytelling</h4> <div class="paragraph"><p>Data journalism is an umbrella term that, to my mind, encompasses an ever-growing set of tools, techniques and approaches to storytelling. It can include everything from traditional computer-assisted reporting (using data as a ‘source’) to the most cutting edge data visualization and news applications. The unifying goal is a journalistic one: providing information and analysis to help inform us all about important issues of the day.</p></div> <div class="paragraph"><p>— <em>Aron Pilhofer, New York Times</em></p></div> </div> <div class="sect3"> <h4 id="_like_photo_journalism_with_a_laptop">Like Photo Journalism with a Laptop</h4> <div class="paragraph"><p>‘Data journalism’ only differs from ‘words journalism’ in that we use a different kit. We all sniff out, report, and relate stories for a living. 
It’s like ‘photo journalism’; just swap the camera for a laptop.</p></div> <div class="paragraph"><p>— <em>Brian Boyer, Chicago Tribune</em></p></div> </div> <div class="sect3"> <h4 id="_data_journalism_is_the_future">Data Journalism is the Future</h4> <div class="paragraph"><p>Data-driven journalism is the future. Journalists need to be data-savvy. It used to be that you would get stories by chatting to people in bars, and it still might be that you’ll do it that way some times. But now it’s also going to be about poring over data and equipping yourself with the tools to analyze it and picking out what’s interesting. And keeping it in perspective, helping people out by really seeing where it all fits together, and what’s going on in the country.</p></div> <div class="paragraph"><p>— <em>Tim Berners-Lee, founder of the World Wide Web</em></p></div> </div> <div class="sect3"> <h4 id="_number_crunching_meets_word_smithing">Number-Crunching Meets Word-Smithing</h4> <div class="paragraph"><p>Data journalism is bridging the gap between stat technicians and wordsmiths. Locating outliers and identifying trends that are not just statistically significant, but relevant to de-compiling the inherently complex world of today.</p></div> <div class="paragraph"><p>— <em>David Anderton, freelance journalist</em></p></div> </div> <div class="sect3"> <h4 id="_updating_your_skills_set">Updating Your Skills Set</h4> <div class="paragraph"><p>Data journalism is a new set of skills for searching, understanding and visualizing digital sources in a time that basic skills from traditional journalism just aren’t enough. It’s not a replacement of traditional journalism, but an addition to it.</p></div> <div class="paragraph"><p>In a time where sources go digital, journalists can and have to be closer to those sources. The Internet opened up possibilities beyond our current understanding. 
Data journalism is just the beginning of evolving our past practices to adapt to the online world.</p></div> <div class="paragraph"><p>Data journalism serves two important purposes for news organizations: finding unique stories (not from news wires) and executing the watchdog function. Especially in times of financial peril, these are important goals for newspapers to achieve.</p></div> <div class="paragraph"><p>From the standpoint of a regional newspaper, data journalism is crucial. We have the saying ‘a loose tile in front of your door is considered more important than a riot in a far-away country’. It hits you in the face and impacts your life more directly. At the same time, digitisation is everywhere. Because local newspapers have this direct impact in their neighbourhood and sources become digitised, a journalist must know how to find, analyze and visualise a story from data.</p></div> <div class="paragraph"><p>— <em>Jerry Vermanen</em></p></div> </div> <div class="sect3"> <h4 id="_a_remedy_for_information_asymmetry">A Remedy for Information Asymmetry</h4> <div class="paragraph"><p>Information asymmetry — not the lack of information, but the inability to take in and process it at the speed and volume at which it comes to us — is one of the most significant problems that citizens face in making choices about how to live their lives. Information taken in from print, visual and audio media influences citizens' choices and actions. 
Good data journalism helps to combat information asymmetry.</p></div> <div class="paragraph"><p>— <em>Tom Fries, Bertelsmann Foundation</em></p></div> </div> <div class="sect3"> <h4 id="_an_answer_to_data_driven_pr">An Answer to Data-driven PR</h4> <div class="paragraph"><p>The availability of measurement tools and their decreasing prices, in a self-sustaining combination with a focus on performance and efficiency in all aspects of society, have led decision-makers to quantify the progresses of their policies, monitor trends and identify opportunities.</p></div> <div class="paragraph"><p>Companies keep coming up with new metrics showing how well they perform. Politicians love to brag about reductions in unemployment numbers and increases in GDP. The lack of journalistic insight in the Enron, Worldcom, Madoff or Solyndra affairs is proof of many a journalist’s inability to clearly see through numbers. Figures are more likely to be taken at face value than other facts as they carry an aura of seriousness, even when they are entirely fabricated.</p></div> <div class="paragraph"><p>Fluency with data will help journalists sharpen their critical sense when faced with numbers and will hopefully help them gain back some terrain in their exchanges with PR departments.</p></div> <div class="paragraph"><p>— <em>Nicolas Kayser-Bril, Journalism++</em></p></div> </div> <div class="sect3"> <h4 id="_providing_independent_interpretations_of_official_information">Providing Independent Interpretations of Official Information</h4> <div class="paragraph"><p>After the devastating earthquake and subsequent Fukushima nuclear plants disaster in 2011, the importance of data journalism has been driven home to media people in Japan, a country which is generally lagging behind in digital journalism.</p></div> <div class="paragraph"><p>We were at a loss when the government and experts had no credible data about the damage. 
When officials hid SPEEDI data (predicted diffusion of radioactive materials) from the public, we were not prepared to decode it even if it were leaked. Volunteers began to collect radioactive data by using their own devices but we were not armed with the knowledge of statistics, interpolation, visualization and so on. Journalists need to have access to raw data, and to learn not to rely on official interpretations of it.</p></div> <div class="paragraph"><p>— <em>Isao Matsunami, Chunichi/Tokyo Shimbun</em></p></div> </div> <div class="sect3"> <h4 id="_dealing_with_the_data_deluge">Dealing with the Data Deluge</h4> <div class="paragraph"><p>The challenges and opportunities presented by the digital revolution continue to disrupt journalism. In an age of information abundance, journalists and citizens alike all need better tools, whether we’re curating the samizdat of the 21st century in the Middle East, processing a late night data dump, or looking for the best way to visualise water quality for a nation of consumers. As we grapple with the consumption challenges presented by this deluge of data, new publishing platforms are also empowering everyone to gather and share data digitally, turning it into information. While reporters and editors have been the traditional vectors for information gathering and dissemination, the flattened information environment of 2012 now has news breaking first online, not on the news desk.</p></div> <div class="paragraph"><p>Around the globe, in fact, the bond between data and journalism is growing stronger. In an age of big data, the growing importance of data journalism lies in the ability of its practitioners to provide context, clarity and, perhaps most important, find truth in the expanding amount of digital content in the world. That doesn’t mean that the integrated media organizations of today don’t play a crucial role. Far from it. 
In the information age, journalists are needed more than ever to curate, verify, analyze and synthesise the wash of data. In that context, data journalism has profound importance for society.</p></div> <div class="paragraph"><p>Today, making sense of big data, particularly unstructured data, will be a central goal for data scientists around the world, whether they work in newsrooms, Wall Street or Silicon Valley. Notably, that goal will be substantially enabled by a growing set of common tools, whether they’re employed by government technologists opening Chicago, healthcare technologists or newsroom developers.</p></div> <div class="paragraph"><p>— <em>Alex Howard, O’Reilly Media</em></p></div> </div> <div class="sect3"> <h4 id="_our_lives_are_data">Our Lives are Data</h4> <div class="paragraph"><p>Good data journalism is hard, because good journalism is hard. It means figuring out how to get the data, how to understand it, and how to find the story. Sometimes there are dead ends, and sometimes there’s no great story. After all, if it were just a matter of pressing the right button, it wouldn’t be journalism. But that’s what makes it worthwhile, and — in a world where our lives are increasingly data — essential for a free and fair society.</p></div> <div class="paragraph"><p>— <em>Chris Taggart, OpenCorporates</em></p></div> </div> <div class="sect3"> <h4 id="_a_way_to_save_time">A Way to Save Time</h4> <div class="paragraph"><p>Journalists don’t have time to waste transcribing things by hand and messing around trying to get data out of PDFs, so learning a little bit of code, or knowing where to look for people who can help, is incredibly valuable.</p></div> <div class="paragraph"><p>One reporter from Folha de São Paulo was working with the local budget and called me to thank us for putting up the accounts of the municipality of São Paulo online (two days’ work from a single hacker!). 
He said he had been transcribing them by hand for the past three months, trying to build up a story. I also remember solving a ‘PDF issue’ for ‘Contas Abertas’, a parliamentary monitoring news organisation: 15 minutes and 15 lines of code saved a month’s worth of work.</p></div> <div class="paragraph"><p>— <em>Pedro Markun, Transparência Hacker</em></p></div> </div> <div class="sect3"> <h4 id="_an_essential_part_of_the_journalists_toolkit">An Essential Part of the Journalists' Toolkit</h4> <div class="paragraph"><p>I think it’s important to stress the “journalism” or reporting aspect of ‘data journalism’. The exercise should not be about just analyzing data or visualizing data for the sake of it, but about using it as a tool to get closer to the truth of what is going on in the world. I see the ability to analyze and interpret data as an essential part of today’s journalists' toolkit, rather than a separate discipline. Ultimately, it is all about good reporting, and telling stories in the most appropriate way.</p></div> <div class="paragraph"><p>Data journalism is another way to scrutinise the world and hold the powers that be to account. With an increasing amount of data available, now more than ever it is important that journalists are aware of data journalism techniques. This should be a tool in the toolkit of any journalist: whether learning how to work with data directly, or collaborating with someone who can.</p></div> <div class="paragraph"><p>Its real power is in helping you to obtain information that would otherwise be very difficult to find or to prove. A good example of this is Steve Doig’s story that analyzed damage patterns from Hurricane Andrew. He joined two different datasets: one mapping the level of destruction caused by the hurricane and one showing wind speeds. This allowed him to pinpoint areas where weakened building codes and poor construction practices contributed to the impact of the disaster. 
He won a Pulitzer Prize for the story in 1993 and it’s a great inspiration for what is possible.</p></div> <div class="paragraph"><p>Ideally you use the data to pinpoint outliers, areas of interest, or things which are surprising. In this sense data can act as a lead or a tip-off. While numbers can be interesting, just writing about the data is not enough. You still need to do the reporting to explain what it means.</p></div> <div class="paragraph"><p>— <em>Cynthia O’Murchu, Financial Times</em></p></div> </div> <div class="sect3"> <h4 id="_adapting_to_changes_in_our_information_environment">Adapting to Changes in Our Information Environment</h4> <div class="paragraph"><p>New digital technologies bring new ways of producing and disseminating knowledge in society. Data journalism can be understood as the media’s attempt to adapt and respond to the changes in our information environment — including more interactive, multi-dimensional story-telling, enabling readers to explore the sources underlying the news and encouraging them to participate in the process of creating and evaluating stories.</p></div> <div class="paragraph"><p>— <em>César Viana, University of Goiás</em></p></div> </div> <div class="sect3"> <h4 id="_a_way_to_see_things_you_might_not_otherwise_see">A Way to See Things You Might Not Otherwise See</h4> <div class="paragraph"><p>Some stories can only be understood and explained through analyzing — and sometimes visualizing — the data. Without it, connections between powerful people or entities would go unrevealed, deaths caused by drug policies would remain hidden, and environmental policies that hurt our landscape would continue unabated. But each of the above was changed because of data that journalists have obtained, analyzed and provided to readers. 
The data can be as simple as a basic spreadsheet or a log of cell phone calls, or as complex as school test scores or hospital infection data, but inside it all are stories worth telling.</p></div> <div class="paragraph"><p>— <em>Cheryl Phillips, The Seattle Times</em></p></div> </div> <div class="sect3"> <h4 id="_a_way_to_tell_richer_stories">A Way To Tell Richer Stories</h4> <div class="paragraph"><p>We can paint pictures of our entire lives with our digital trails. From what we consume and browse, to where and when we travel, to our musical preferences, our first loves, our children’s milestones, even our last wishes – it all can be tracked, digitised, stored in the cloud and disseminated. This universe of data can be surfaced to tell stories, answer questions and impart an understanding of life in ways that currently surpass even the most rigorous and careful reconstruction of anecdotes.</p></div> <div class="paragraph"><p>— <em>Sarah Slobin, Wall Street Journal</em></p></div> </div> </div>
Why Is Data Journalism Important?
<div class="sect2"> <h3 id="_some_favorite_examples">Some Favorite Examples</h3> <div class="paragraph"><p>We asked some of our contributors for their favorite examples of data journalism and what they liked about them. Here they are.</p></div> <div class="sect3"> <h4 id="_em_do_no_harm_em_in_the_las_vegas_sun"><em>Do No Harm</em> in the Las Vegas Sun</h4> <div class="imageblock" id="FIG014"> <div class="content"> <img src="" alt="figs/incoming/01-GG.png"></div> <div class="title">Figure 5. <em>Do No Harm</em> (The Las Vegas Sun)</div> </div> <div class="paragraph"><p>My favourite example is the Las Vegas Sun’s 2010 <a href="">Do No Harm</a> series on hospital care (see <a href="">Figure 5</a>). The Sun analyzed more than 2.9 million hospital billing records, which revealed more than 3600 preventable injuries, infections and surgical mistakes. They obtained data through a public records request and identified more than 300 cases in which patients died because of mistakes that could have been prevented. It contains different elements, including: <a href="">an interactive graphic</a> which allows the reader to see by hospital, where surgical injuries happened more often than would be expected; <a href="">a map</a> with a timeline that shows infections spreading hospital by hospital; and <a href="">an interactive graphic</a> that allows users to sort data by preventable injuries or by hospital to see where people are getting hurt. I like it because it is very easy to understand and navigate. Users can explore the data in a very intuitive way. Also it had a real impact: the Nevada legislature responded with <a href="">six pieces of legislation</a>. The journalists involved worked very hard to acquire and clean up the data. 
One of the journalists, Alex Richards, sent data back to hospitals and to the state <a href="">at least a dozen times</a> to get mistakes corrected.</p></div> <div class="paragraph"><p>— <em>Angélica Peralta Ramos, La Nación, Argentina</em></p></div> </div> <div class="sect3"> <h4 id="_government_employee_salary_database">Government Employee Salary Database</h4> <div class="imageblock" id="FIG015"> <div class="content"> <img src="" alt="figs/incoming/01-FF.png"></div> <div class="title">Figure 6. <em>Government Employee Salaries</em> (The Texas Tribune)</div> </div> <div class="paragraph"><p>I love the work that small independent organizations are performing every day, such as ProPublica or the Texas Tribune who have a great data reporter in Ryan Murphy. If I had to choose, I’d pick the <a href="">Government Employee Salary Database</a> project from the Texas Tribune (<a href="">Figure 6</a>). This project collects 660,000 government employee salaries into a database for users to search and help generate stories from. You can search by agency, name or salary. It’s simple, meaningful and is making inaccessible information public. It is easy to use and automatically generates stories. It is a great example of why the Texas Tribune gets most of its traffic from the data pages.</p></div> <div class="paragraph"><p>— <em>Simon Rogers, The Guardian</em></p></div> </div> <div class="sect3"> <h4 id="_full_text_visualization_of_the_iraqi_war_logs_associated_press">Full-text visualization of the Iraqi War Logs, Associated Press</h4> <div class="imageblock" id="FIG016"> <div class="content"> <img src="" alt="figs/incoming/01-YY.jpg"></div> <div class="title">Figure 7. 
Analyzing the War Logs (Associated Press)</div> </div> <div class="paragraph"><p>Jonathan Stray and Julian Burgess’ work on <a href="">Iraq War Logs</a> is an inspiring foray into text analysis and visualization using experimental techniques to gain insight into themes worth exploring further within a large textual dataset (<a href="">Figure 7</a>).</p></div> <div class="paragraph"><p>By means of text-analytics techniques and algorithms, Jonathan and Julian created a method that showed clusters of keywords contained in thousands of US-government reports on the Iraq war leaked by Wikileaks in visual form.</p></div> <div class="paragraph"><p>Though there are limitations to the methods presented and the approach is experimental, it presents an innovative approach. Rather than trying to read all the files or reviewing the War Logs with a preconceived notion of what may be found by inputting particular keywords and reviewing the output, this technique calculates and visualises topics/keywords of particular relevance.</p></div> <div class="paragraph"><p>With increasing amounts of data — both textual (emails, reports, etc.) and numeric — coming into the public domain, finding ways to pinpoint key areas of interest will become more and more important — it is an exciting sub-field of data journalism.</p></div> <div class="paragraph"><p>— <em>Cynthia O’Murchu, Financial Times</em></p></div> </div> <div class="sect3"> <h4 id="_murder_mysteries">Murder Mysteries</h4> <div class="imageblock" id="FIG017"> <div class="content"> <img src="" alt="figs/incoming/01-XX.jpg"></div> <div class="title">Figure 8. <em>Murder Mysteries</em> (Scripps Howard News Service)</div> </div> <div class="paragraph"><p>One of my favorite pieces of data journalism is the <a href="">“Murder Mysteries”</a> project by Tom Hargrove of the Scripps Howard News Service (<a href="">Figure 8</a>). 
He built from government data and public records requests a demographically-detailed database of more than 185,000 unsolved murders, and then designed an algorithm to search it for patterns suggesting the possible presence of serial killers. This project has it all: hard work gathering a database better than the government’s own, clever analysis using social science techniques, and interactive presentation of the data online so readers can explore it themselves.</p></div> <div class="paragraph"><p>— <em>Steve Doig, Walter Cronkite School of Journalism of Arizona State University</em></p></div> </div> <div class="sect3"> <h4 id="_message_machine">Message Machine</h4> <div class="imageblock" id="FIG018"> <div class="content"> <img src="" alt="figs/incoming/01-HH.png"></div> <div class="title">Figure 9. <em>Message Machine</em> (ProPublica)</div> </div> <div class="paragraph"><p>I love ProPublica’s <a href="">Message Machine</a> story and <a href="">nerd blog post</a> (<a href="">Figure 9</a>). It all got started when some Twitterers expressed their curiosity about having received different emails from the Obama campaign. The folks at ProPublica noticed, and asked their audience to forward any emails they got from the campaign. The presentation is elegant, a visual diff of several different emails that were sent out that evening. It’s awesome because they gathered their own data (admittedly a small sample, but big enough to tell the story). But it’s even more awesome because they’re telling the story of an emerging phenomenon, big data used in political campaigns to target messages to specific individuals. It is just a taste of things to come.</p></div> <div class="paragraph"><p>— <em>Brian Boyer, Chicago Tribune</em></p></div> </div> <div class="sect3"> <h4 id="_chartball">Chartball</h4> <div class="imageblock" id="FIG019"> <div class="content"> <img src="" alt="figs/incoming/01-JJ.png"></div> <div class="title">Figure 10. 
Charting victory and defeat (Chartball)</div> </div> <div class="paragraph"><p>One of my favourite data journalism projects is Andrew Garcia Phillips' work on <a href="">Chartball</a> (<a href="">Figure 10</a>). Andrew is a huge sports fan with a voracious appetite for data, a terrific eye for design and the capacity to write code. With Chartball he visualises not only the sweep of history, but also the successes and failures of individual players and teams. He provides context, he makes an inviting graphic and his work is deep and fun and interesting – and I don’t even care much for sports!</p></div> <div class="paragraph"><p>— <em>Sarah Slobin, Wall Street Journal</em></p></div> </div> </div>
Some Favorite Examples
<div class="sect2"> <h3 id="_data_journalism_in_perspective">Data Journalism in Perspective</h3> <div class="paragraph"><p>In August 2010 some colleagues and I organised what we believe was one of the <a href="">first international ‘data journalism’ conferences</a>, which took place in Amsterdam. At this time there wasn’t a great deal of discussion around this topic and there were only a couple of organizations that were widely known for their work in this area.</p></div> <div class="paragraph"><p>The way that media organizations like the Guardian and the New York Times handled the large amounts of data released by Wikileaks is one of the major steps that brought the term into prominence. Around that time the term started to enter into more widespread usage, alongside ‘computer-assisted reporting’, to describe how journalists were using data to improve their coverage and to augment in-depth investigations into a given topic.</p></div> <div class="paragraph"><p>Speaking to experienced data journalists and journalism scholars <a href="!/smfrogers/status/108238296685096961">on Twitter</a> it seems that one of the earliest formulations of what we now recognise as data journalism was in 2006 by Adrian Holovaty, founder of EveryBlock — an information service which enables users to find out what has been happening in their area, on their block. In his short essay <a href="">“A fundamental way newspaper sites need to change”</a>, he argues that journalists should publish structured, machine-readable data, alongside the traditional ‘big blob of text’:</p></div> <div class="quoteblock"> <div class="content"> <div class="paragraph"><p>For example, say a newspaper has written a story about a local fire. Being able to read that story on a cell phone is fine and dandy. Hooray, technology! 
But what I really want to be able to do is explore the raw facts of that story, one by one, with layers of attribution, and an infrastructure for comparing the details of the fire — date, time, place, victims, fire station number, distance from fire department, names and years experience of firemen on the scene, time it took for firemen to arrive — with the details of previous fires. And subsequent fires, whenever they happen.</p></div> </div> <div class="attribution"> </div></div> <div class="paragraph"><p>But what makes this distinctive from other forms of journalism which use databases or computers? How — and to what extent — is data journalism different from other forms of journalism from the past?</p></div> <div class="sect3"> <h4 id="_8216_computer_assisted_reporting_8217_and_8216_precision_journalism_8217">‘Computer-Assisted Reporting’ and ‘Precision Journalism’</h4> <div class="paragraph"><p>Using data to improve reportage and delivering structured (if not machine readable) information to the public has a long history. Perhaps most immediately relevant to what we now call data journalism is ‘computer-assisted reporting’ or ‘CAR’, which was the first organised, systematic approach to using computers to collect and analyze data to improve the news.</p></div> <div class="paragraph"><p>CAR was first used in 1952 by CBS to predict the result of the presidential election. Since the 1960s, (mainly investigative, mainly US-based) journalists have sought to independently monitor power by analyzing databases of public records with scientific methods. In what is also known as ‘public service journalism’, advocates of these computer-assisted techniques have sought to reveal trends, debunk popular knowledge and reveal injustices perpetrated by public authorities and private corporations. For example, Philip Meyer tried to debunk received readings of the 1967 riots in Detroit — to show that it was not just less-educated Southerners who were participating. 
Bill Dedman’s “The Color of Money” stories in the 1980s revealed systemic racial bias in lending policies of major financial institutions. In his “What Went Wrong” Steve Doig sought to analyze the damage patterns from Hurricane Andrew in the early 1990s, to understand the effect of flawed urban development policies and practices. Data-driven reporting has brought valuable public service, and has won journalists famous prizes.</p></div> <div class="paragraph"><p>In the early 1970s the term <a href="">‘precision journalism’</a> was coined to describe this type of news-gathering: “the application of social and behavioral science research methods to the practice of journalism.” Precision journalism was envisioned to be practiced in mainstream media institutions by professionals trained in journalism and social sciences. It was born in response to “new journalism”, a form of journalism in which fiction techniques were applied to reporting. Meyer suggests that scientific techniques of data collection and analysis rather than literary techniques are what is needed for journalism to accomplish its search for objectivity and truth.</p></div> <div class="paragraph"><p>Precision journalism can be understood as a reaction to some of journalism’s commonly cited inadequacies and weaknesses: dependence on press releases (later described as “churnalism”), bias towards authoritative sources, and so on. These are seen by Meyer as stemming from a lack of application of information science techniques and scientific methods such as polls and public records. As practiced in the 1960s, precision journalism was used to represent marginal groups and their stories. According to <a href="">Meyer</a>:</p></div> <div class="quoteblock"> <div class="content"> <div class="paragraph"><p>Precision journalism was a way to expand the tool kit of the reporter to make topics that were previously inaccessible, or only crudely accessible, subject to journalistic scrutiny. 
It was especially useful in giving a hearing to minority and dissident groups that were struggling for representation.</p></div> </div> <div class="attribution"> </div></div> <div class="paragraph"><p>An <a href="">influential article</a> published in the 1980s about the relationship between journalism and social science echoes current discourse around data journalism. The authors, two US journalism professors, suggest that in the 1970s and 1980s the public’s understanding of what news is broadened from a narrower conception of ‘news events’ to ‘situational reporting’, or reporting on social trends. By using databases of — for example — census data or survey data, journalists are able to “move beyond the reporting of specific, isolated events to providing a context which gives them meaning.”</p></div> <div class="paragraph"><p>As we might expect, the practice of using data to improve reportage goes back as far as ‘data’ has been around. As Simon Rogers <a href="">points out</a>, the first example of data journalism at the Guardian dates from 1821. It is a leaked table of schools in Manchester listing the number of students who attended each school and the costs per school. According to Rogers this helped to show for the first time the real number of students receiving free education, which was much higher than what official numbers showed.</p></div> <div class="imageblock" id="FIG0110"> <div class="content"> <img src="" alt="figs/incoming/01-LL.jpg"></div> <div class="title">Figure 11. Data Journalism in the Guardian in 1821 (The Guardian)</div> </div> <div class="paragraph"><p>Another early example in Europe is Florence Nightingale and her key report, <a href="">‘Mortality of the British Army’</a>, published in 1858. In her report to the parliament she used graphics to advocate improvements in health services for the British army. 
The most famous is her ‘coxcomb’, a spiral of sections, each representing deaths per month, which highlighted that the vast majority of deaths were from preventable diseases rather than bullets.</p></div> <div class="imageblock" id="FIG0111"> <div class="content"> <img src="" alt="figs/incoming/01-MM.jpg"></div> <div class="title">Figure 12. Mortality of the British Army by Florence Nightingale (Image from Wikipedia)</div> </div> </div> <div class="sect3"> <h4 id="_data_journalism_and_computer_assisted_reporting">Data journalism and Computer-Assisted Reporting</h4> <div class="paragraph"><p>At the moment there is a “continuity and change” debate going on around the label “data journalism” and its relationship with these previous journalistic practices which employ computational techniques to analyze datasets.</p></div> <div class="paragraph"><p>Some argue that there is a difference between CAR and data journalism. They say that CAR is a technique for gathering and analyzing data as a way of enhancing (usually investigative) reportage, whereas data journalism pays attention to the way that data sits within the whole journalistic workflow. In this sense data journalism pays as much  — and sometimes more — attention to the data itself, rather than using data simply as a means to find or enhance stories. Hence we find the Guardian Datablog or the Texas Tribune publishing datasets alongside stories, or even just datasets by themselves for people to analyze and explore.</p></div> <div class="paragraph"><p>Another difference is that in the past investigative reporters would suffer from a poverty of information relating to a question they were trying to answer or an issue that they were trying to address. While this is of course still the case, there is also an overwhelming abundance of information that journalists don’t necessarily know what to do with. They don’t know how to get value out of data. 
A recent example is the Combined Online Information System, the UK’s biggest database of spending information — which was long sought after by transparency advocates, but which baffled and stumped many journalists upon its release. As Philip Meyer recently wrote to me: “When information was scarce, most of our efforts were devoted to hunting and gathering. Now that information is abundant, processing is more important.”</p></div> <div class="paragraph"><p>On the other hand, some argue that there is no meaningful difference between data journalism and computer-assisted reporting. It is by now common sense that even the most recent media practices have histories, as well as something new in them. Rather than debating whether or not data journalism is completely novel, a more fruitful position would be to consider it as part of a longer tradition, but responding to new circumstances and conditions. Even if there might not be a difference in goals and techniques, the emergence of the label “data journalism” at the beginning of the century indicates a new phase wherein the sheer volume of data that is freely available online combined with sophisticated user-centric tools, self-publishing and crowdsourcing tools enables more people to work with more data more easily than ever before.</p></div> </div> <div class="sect3"> <h4 id="_data_journalism_is_about_mass_data_literacy">Data journalism is about mass data literacy</h4> <div class="paragraph"><p>Digital technologies and the web are fundamentally changing the way information is published. Data journalism is one part in the ecosystem of tools and practices that have sprung up around data sites and services. Quoting and sharing source materials is in the nature of the hyperlink structure of the web and the way we are accustomed to navigate information today. Going further back, the principle that sits at the foundation of the hyperlinked structure of the web is the citation principle used in academic works. 
Quoting and sharing the source materials and the data behind the story is one of the basic ways in which data journalism can improve journalism, what Wikileaks founder Julian Assange calls “scientific journalism”.</p></div> <div class="paragraph"><p>By enabling anyone to drill down into data sources and find information that is relevant to them, as well as to verify assertions and challenge commonly received assumptions, data journalism effectively represents the mass democratisation of resources, tools, techniques and methodologies that were previously used by specialists — whether investigative reporters, social scientists, statisticians, analysts or other experts. While currently quoting and linking to data sources is particular to data journalism, we are moving towards a world in which data is seamlessly integrated into the fabric of media. Data journalists have an important role in helping to lower the barriers to understanding and interrogating data, and increasing the data literacy of their readers on a mass scale.</p></div> <div class="paragraph"><p>At the moment the nascent community of people who call themselves data journalists is largely distinct from the more mature CAR community. Hopefully in the future we will see stronger ties between these two communities, in much the same way that we see new NGOs and citizen media organizations like ProPublica and the Bureau of Investigative Journalism work hand in hand with traditional news media on investigations. While the data journalism community might have more innovative ways of delivering data and presenting stories, the deeply analytical and critical approach of the CAR community is something that data journalism could certainly learn from.</p></div> <div class="paragraph"><p>— <em>Liliana Bounegru, European Journalism Centre</em></p></div> </div> </div>
Data Journalism in Perspective
<div class="sect2"> <h3 id="_the_abc_8217_s_data_journalism_play">The ABC’s Data Journalism Play</h3> <div class="paragraph"><p>Now in its 70th year the Australian Broadcasting Corporation is Australia’s national public broadcaster. Annual funding is around AUS$1bn which delivers seven radio networks, 60 local radio stations, three digital television services, a new international television service and an online platform to deliver this ever expanding offering of digital and user generated content. At last count there were in excess of 4,500 full time equivalent staff and nearly 70% of them make content.</p></div> <div class="paragraph"><p>We are a national broadcaster fiercely proud of our independence — because although funded by government — we are separated at arm’s length through law. Our traditions are independent public service journalism. The ABC is regarded as the most trusted news organization in the country.</p></div> <div class="paragraph"><p>These are exciting times and under a managing director — the former newspaper executive Mark Scott — content makers at the ABC have been encouraged to, as the corporate mantra puts it, be ‘agile’.</p></div> <div class="paragraph"><p>Of course, that’s easier said than done.</p></div> <div class="paragraph"><p>But one initiative in recent times designed to encourage this has been a competitive staff pitch for money to develop multi-platform projects.</p></div> <div class="paragraph"><p>This is how the ABC’s first ever data journalism project was conceived.</p></div> <div class="paragraph"><p>Sometime early in 2010 I wandered into the pitch session to face three senior ‘ideas’ people with my proposal.</p></div> <div class="paragraph"><p>I’d been chewing it over for some time. 
Greedily lapping up the data journalism that the now legendary Guardian data journalism blog was offering, and that was just for starters.</p></div> <div class="paragraph"><p>It was my argument that no doubt within 5 years the ABC would have its own data journalism unit. It was inevitable, I opined. But the question was how are we going to get there, and who’s going to start.</p></div> <div class="paragraph"><p>For those readers unfamiliar with the ABC, think of a vast bureaucracy built up over 70 years. Its primary offering was always radio and television. With the advent of online in the last decade this content offering unfurled into text, stills and a degree of interactivity previously unimagined. The web space was forcing the ABC to rethink how it cut the cake (money) and rethink what kind of cake it was baking (content).</p></div> <div class="paragraph"><p>It is of course a work in progress.</p></div> <div class="paragraph"><p>But something else was happening with data journalism. Government 2.0 (which as we discovered is largely observed in the breach in Australia) was starting to offer new ways of telling stories that were hitherto buried in the zero’s and dots.</p></div> <div class="paragraph"><p>All this I said to the folk during my pitch. I also said we needed to identify new skill sets and train journalists in new tools. We needed a project to hit play.</p></div> <div class="paragraph"><p>And they gave me the money.</p></div> <div class="paragraph"><p>On the 24th of November 2011 the ABC’s multi-platform project and ABC News Online went live with <a href="">‘Coal Seam Gas by the Numbers’</a>.</p></div> <div class="imageblock" id="FIG021"> <div class="content"> <img src="" alt="figs/incoming/02-01.png"></div> <div class="title">Figure 13. 
<em>Coal Seam Gas by the Numbers</em> (ABC News Online)</div> </div> <div class="paragraph"><p>It was five pages of interactive maps, data visualizations and text.</p></div> <div class="paragraph"><p>It wasn’t exclusively data journalism — but a hybrid of journalisms that was born of the mix of people on the team and the story, which to put in context is raging as one of the hottest issues in Australia.</p></div> <div class="paragraph"><p>The jewel was an interactive map showing coal seam gas wells and leases in Australia. Users could search by location and switch between modes to show leases or wells. By zooming in users could see who the explorer was, the status of the well and its drill date. Another map showed the location of coal seam gas activity compared to the location of groundwater systems in Australia.</p></div> <div class="imageblock" id="FIG023"> <div class="content"> <img src="" alt="figs/incoming/02-02.png"></div> <div class="title">Figure 14. Interactive map of gas wells and leases in Australia (ABC News Online)</div> </div> <div class="paragraph"><p>We had data visualizations which specifically addressed the issue of the waste salt and water that would be produced depending on the scenario that emerged.</p></div> <div class="paragraph"><p>Another section of the project investigated the release of chemicals into a local river system.</p></div> <div class="sect3"> <h4 id="_our_team">Our team</h4> <div class="ulist"><ul><li> <p> A web developer and designer </p> </li> <li> <p> A lead journalist </p> </li> <li> <p> A part time researcher with expertise in data extraction, Excel spreadsheets and data cleaning </p> </li> <li> <p> A part time junior journalist </p> </li> <li> <p> A consultant executive producer </p> </li> <li> <p> An academic consultant with expertise in data mining, graphic visualization and advanced research skills </p> </li> <li> <p> The services of a project manager and the administrative assistance of the ABC’s multi-platform 
unit </p> </li> <li> <p> Importantly we also had a reference group of journalists and others whom we consulted on a needs basis </p> </li> </ul></div> </div> <div class="sect3"> <h4 id="_where_did_we_get_the_data_from">Where did we get the data from?</h4> <div class="paragraph"><p>The data for the interactive maps were scraped from shapefiles (a common kind of file for geospatial data) downloaded from government websites.</p></div> <div class="paragraph"><p>Other data on salt and water were taken from a variety of reports.</p></div> <div class="paragraph"><p>The data on chemical releases was taken from environmental permits issued by the government.</p></div> </div> <div class="sect3"> <h4 id="_what_did_we_learn">What did we learn?</h4> <div class="paragraph"><p>‘Coal Seam Gas by the Numbers’ was ambitious in content and scale. Uppermost in my mind was what did we learn and how might we do it differently next time?</p></div> <div class="paragraph"><p>The data journalism project brought a lot of people into the room who do not normally meet at the ABC. In lay terms — the hacks and the hackers. Many of us did not speak the same language or even appreciate what the other does. Data journalism is disruptive!</p></div> <div class="paragraph"><p>The practical things:</p></div> <div class="ulist"><ul><li> <p> Co-location of the team is vital. Our developer and designer were off-site and came in for meetings. This is definitely not optimal! Place them in the same room as the journalists. </p> </li> <li> <p> Our consultant EP was also on another level of the building. We needed to be much closer, just for the drop-by factor. </p> </li> <li> <p> Choose a story that is solely data driven. </p> </li> </ul></div> </div> <div class="sect3"> <h4 id="_the_big_picture_some_ideas">The big picture: some ideas</h4> <div class="paragraph"><p>Big media organizations need to engage in capacity building to meet the challenges of data journalism. 
My hunch is there are a lot of geeks and hackers hiding in media technical departments desperate to get out. So we need ‘hack and hacker meets’ workshops where the secret geeks, younger journalists, web developers and designers come out to play with more experienced journalists for skill sharing and mentoring. Task: download this data set and go for it!</p></div> <div class="paragraph"><p>Ipso facto, data journalism is interdisciplinary. Data journalism teams are made of people who would not in the past have worked together. The digital space has blurred the boundaries.</p></div> <div class="paragraph"><p>We live in a fractured, distrustful body politic. The business model that formerly delivered professional independent journalism – imperfect as it is — is on the verge of collapse. We ought to ask ourselves — as many now are — what might the world look like without a viable fourth estate? The American journalist and intellectual Walter Lippmann remarked in the 1920s that “it is admitted that a sound public opinion cannot exist without access to news”. That statement is no less true now. In the 21st century everyone’s hanging out in the blogosphere. It’s hard to tell the spinners, liars, dissemblers and vested interest groups from the professional journalists. Pretty much any site or source can be made to look credible, slick and honest. The trustworthy mastheads are dying in the ditch. And in this new space of junk journalism, hyperlinks can endlessly take the reader to other more useless but brilliant looking sources that keep hyperlinking back into the digital hall of mirrors. The technical term for this is: bullshit baffles brains. In the digital space everyone’s a storyteller now — right? Wrong. If professional journalism – and by that I mean those who embrace ethical, balanced, courageous truth seeking storytelling – is to survive then the craft must reassert itself in the digital space. 
Data journalism is just another tool by which we will navigate the digital space. It’s where we will map, flip, sort, filter, extract and see the story amidst all those 0’s and 1’s. In the future we’ll be working side by side with the hackers, the developers, the designers and the coders. It’s a transition that requires serious capacity building. We need news managers who “get” the digital/journalism connection to start investing in the build.</p></div> <div class="paragraph"><p>— <em>Wendy Carlisle, Australian Broadcasting Corporation</em></p></div> </div> </div>
The ABC’s Data Journalism Play


Total run time: less than 10 seconds

Total cpu time used: less than 5 seconds

Total disk space used: 566 KB


  • Manually ran revision 2d7c083d and failed .
    nothing changed in the database
  • Forked from ScraperWiki
