Tuesday, May 10, 2011

Data Are Not Information

by Jeff Stanger

Note: This piece is cross-posted at NTEN.org

We are obsessed with data: big data, open data, linked data, personal data, government data, visualized data. And rightly so. Data carry enormous potential. But that's all they are: potential. It's important that we don't confuse "data" with "information." Data are only the raw material of information. Unlike data alone, information carries meaning, relevance and utility. So how do we get from data to information? Last summer I explored the distinction in an online piece arguing that communication is what transforms data into useful information, and that this communication process has been fundamentally altered by the latest wave of digital technology. The underlying message is even more relevant now as we seem to be moving from Big Data to Bigger Data. I'm grateful to NTEN for extending an offer to post an update.

First, research creates data

Data don't just materialize. They are the result of research — measuring something quantitatively or qualitatively, formally or informally; usually involving a question, a purpose, and by necessity a methodology and instrument. This process produces data broadly defined, whether numeric, textual, or audio-visual, structured or unstructured. (See, for example, Lucy Bernholz's "What Kind of Data are we Talking About?" and a video interview explaining "data" as anything that can be digitized.) All data depend on measurement, even data as common and seemingly objective as the weather. I could conduct research by walking outside and gathering the data points "hot" and "humid." Or I could collect the data "90 degrees" and "90% humidity" from the same set of circumstances using a thermometer and a hygrometer. Same reality, different data. Why? Different research.

Research → Data

Next, communication turns data into information

In gathering data, we've only created the potential to inform. The next step is to turn those data into meaningful, relevant and actionable information by communicating them through some channel to an audience. Information equals data plus communication.

Information results from discovering something and then telling others about it. Data un-communicated, or ineffectively communicated, amount to personal knowledge or useless raw material. My measurement of "hot" and "humid" remains my personal data/knowledge until I walk back into the house and tell my wife. But what if I tell her in a language she doesn't understand? What if I don't speak loudly enough? What if I give her a written note that she ignores in favor of a richly interactive iPad app? I have failed to inform not because of a failure of data — the data, after all, are the same in every case — but because of a failure of communication. Communication turns data into information.

Information = Data + Communication

Because data must be communicated to be informative, our methods of communication become paramount. In a digital age, these methods have been revolutionized, fundamentally altering the landscape in which data become information. This shift started fifteen years ago with the widespread adoption of the World Wide Web. It accelerated eight to ten years ago with the start of broadband proliferation and the introduction of more capable Web browsers and programming platforms. It flew off the desktop four years ago with the release of the iPhone (followed later by the SDK and App Store) and other rich-media smartphones, and most recently hit the gas again with the unveiling of the iPad and other tablets.

This rapid expansion of the possibilities in digital communication has unquestionably transformed information consumption. According to the latest survey by the Center for the Digital Future at the University of Southern California's Annenberg School for Communication & Journalism, 82% of Americans now have access to the Internet, with 78% of them rating the Net as a "very important" or "important" source of information, surpassing both television (68%) and newspapers (56%) by wide margins. In a similar finding, the Pew Research Center reported in January 2011 that among the general public the Internet has now surpassed newspapers as a primary source of news and is steadily closing the gap with television. This digital experience is now being carried around in millions of pockets. The Pew Internet & American Life Project finds that 85% of Americans have mobile phones, a majority of which will be Internet-connected multimedia smartphones by year's end. Mobile users are doing a whole lot more than making calls: 38% of them (and rising) use their devices to access the Internet.
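A quick aside on reading those survey figures: several of them are percentages of a percentage, which is itself a small case of data needing communication. The sketch below, in Python, converts the nested figures into shares of all Americans. It assumes that "of them" refers to the group named just before it (Internet users, mobile phone owners); the function and variable names are mine, for illustration only, and do not come from either survey.

```python
def share_of_total(outer_pct: float, inner_pct: float) -> float:
    """Convert 'inner_pct of the outer_pct group' into a percentage of everyone."""
    return outer_pct * inner_pct / 100.0


# Figures as quoted in the paragraph above (assumed to be nested as described).
internet_access = 82.0       # % of Americans with Internet access
rate_net_important = 78.0    # % of those users rating the Net an important source
mobile_owners = 85.0         # % of Americans with mobile phones
mobile_internet_use = 38.0   # % of those owners going online with their phones

important_all = share_of_total(internet_access, rate_net_important)
mobile_web_all = share_of_total(mobile_owners, mobile_internet_use)

print(f"Rate the Net as important: {important_all:.1f}% of all Americans")
print(f"Go online by phone: {mobile_web_all:.1f}% of all Americans")
```

Under those assumptions, roughly 64% of all Americans rate the Net as an important source of information, and roughly 32% go online from a phone. Same data, communicated differently.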

IP networking technology has blossomed into a widely available, and widely used, interactive mass medium capable of spawning entirely new and uniquely digital communication forms. The Internet's data-handling capabilities are one of its distinguishing (and revolutionary) characteristics. As a result, the conditions under which data become information look different every day, whether in journalism, government, health care, education, publishing, philanthropy, policy research, nonprofit work, advocacy or countless other fields.

Meanwhile, also due to the powerful effects of digital technology, our ability to gather and store raw data (in other words, to do research) has expanded exponentially. This is well addressed in a 2010 Economist cover story, "Data, data everywhere" [subscription required; note that the article admittedly, and incorrectly in my opinion, uses the terms data and information interchangeably]:

Quoting Kenneth Cukier: Information has gone from scarce to superabundant. That brings huge new benefits — but also big headaches.

The world contains an unimaginably vast amount of digital information which is getting vaster ever more rapidly. This makes it possible to do many things that previously could not be done... But [data] are also creating a host of new problems.

Joe Hellerstein, a computer scientist at the University of California in Berkeley, calls it 'the industrial revolution of data.' The effect is being felt everywhere, from business to science, from government to the arts. Scientists and computer engineers have coined a new term for the phenomenon: 'big data'.

Citing Hal Varian, Google's chief economist: Data, he explains, are widely available; what is scarce is the ability to extract wisdom from them.

The communication platforms for converting data into information are shifting under our feet, while the data fire hose hits us full-force. As a result, to paraphrase Varian, we are awash in data, but barely wet with information.

The opportunities and strains in transforming data into information in a digital environment are evident in the open data, open government, and Gov 2.0 movements related to government data; linked data or semantic web efforts in the Web standards community; the newspaper industry's approach to evolving digital platforms and the new commercial models that accompany them; the push for electronic health records; and elsewhere.

Although we are early in this revolution, there are some examples of data being communicated in phenomenal ways. In journalism: outfits like the New York Times and Guardian UK, along with digital newcomers ProPublica and Texas Tribune, are leaders in the field, churning out data-driven interactive information almost daily. In publishing: Al Gore's recently released iPad edition of "Our Choice" is chock-full of data-driven interactive features. In government: designers and developers are transforming government data into useful information, particularly in the health field. Even the White House is getting in on the act.

A handful of examples from just this past week:

Note: The Center for Digital Information is compiling a catalogue of examples in its Digital Information Showcase.

What's important to note here is that the data really aren't all that different than they would be in a pre-digital era (except perhaps the New York Times Bin Laden reaction data that were gathered on the Web). What's new is the communication of those data, completely transformed by the latest digital platforms and techniques. The revolution in abundance and availability of data is being matched, I think more importantly, by a digital communication revolution in how we represent and understand those data. And we are only beginning to scratch the surface of these digital-native, interactive communication forms.

These sorts of examples illustrate what is at the heart of what I call digital information — raising the information-equals-data-plus-communication equation to the technology power.

Digital Information = (Data + Communication)^Technology

Technology has transformed both the way we collect data and the way we communicate them, and has therefore rewritten the information equation. As the examples show, producing digital information can and indeed must be more than creating digital clones of constructs such as books, pages, articles, reports, white papers, and linear text stories borrowed from a pre-digital era. That old approach, digital distribution, amounts to unimaginative (read: ineffective) communication of data given today's diverse and transformative digital toolbox. Instead, successful information in a rapidly changing communication age demands that we use new mechanisms, born natively in the media of our time, utilizing their unique and powerful interactive capabilities. That growing set of tools includes interactive information graphics, data-rich Web applications, integrated multimedia, interactive maps, and smartphone and tablet applications, all baked with dynamic computation, customization, and visualization. These are things that were impossible in a pre-digital era (or even a year ago). They can and should raise our narratives to the technology power. Only then will digital data become digital information.

Comments welcome...