On data’s role in investigative reporting: “It’s all about the evidence”

by Sam Berkhead
Oct 30, 2018 in Data Journalism

It’s all about the evidence.

David Donald, a data journalist in residence at the American University School of Communication, stressed this fact during a recent talk with visiting Pakistani journalists from ICFJ’s U.S.-Pakistan Professional Partnership in Journalism.

“Everything I talk about is in the context of evidence,” Donald said. “Investigative reporting is about evidence; data journalism is about evidence.”

Donald, whose investigative work with the Center for Public Integrity (CPI) exposed the lending methods that led to the 2008 U.S. financial crisis and health care providers’ pervasive overcharging of Medicare patients for their services, emphasized that data should always play a role in effective reporting.

Here’s a look at the main points we took away from Donald’s talk:

Have a “document frame of mind”

To effectively use databases and documents as a journalist, Donald said it’s important to have a “document frame of mind.” Essentially, this means you need to work to find documents and data just as much as — if not more than — you work to find sources for quotes.

According to Donald, investigative journalists Don Barlett and Jim Steele define the “document frame of mind” as the state in which a journalist always assumes that somewhere, a document or database exists on any given topic.

“Until proven otherwise, it’s your job to figure out where that is,” he explained. “Documents and databases are often official records, so you can quote them as such. A document or database will never call you up before you’re ready to publish and say, ‘I never said that.’ A few sources try to do that, but not a document or database.”

Numbers have a cultural context

Just as writers can choose words that reflect their personal biases, numbers and data can be manipulated to support a specific viewpoint without showing the full picture.

“Just because you did the math right, doesn’t mean the number isn’t taking sides,” Donald said, emphasizing the importance of computing the median rather than the average.

Donald cited the dispute over player salaries between baseball team owners and athletes that led to the 1994-95 Major League Baseball strike. Team owners typically reported the average, or mean, player salary to the press, which was boosted into the millions of dollars thanks to a few star players with extraordinary salaries. The players, meanwhile, reported the median salary, the middle value when all salaries are ranked, which is barely affected by a handful of outliers and revealed that the typical player earned much less.

“Every baseball [reporter] back then used the average because they probably didn’t even know how to compute the median,” Donald said.
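To see how far the two figures can drift apart, here is a minimal sketch in Python. The roster below is invented purely for illustration and does not reflect actual MLB salaries; the variable names are likewise hypothetical.

```python
from statistics import mean, median

# Hypothetical roster, in US dollars. These figures are invented for
# illustration only and are NOT real MLB salary data.
salaries = [
    150_000, 175_000, 200_000, 225_000, 250_000,
    300_000, 350_000, 400_000,
    6_000_000, 7_000_000,  # two star contracts that act as outliers
]

# The mean adds everything up and divides by the head count,
# so the two star contracts pull it sharply upward.
average_salary = mean(salaries)

# The median is the middle of the ranked list (here, the midpoint of
# the 5th and 6th values), so the star contracts barely move it.
median_salary = median(salaries)

print(f"Mean:   ${average_salary:,.0f}")
print(f"Median: ${median_salary:,.0f}")
```

With this made-up roster, the mean comes out to $1,505,000 while the median is $275,000. Both numbers are computed correctly, yet they tell very different stories about what a "typical" player earns, which is the point Donald was making.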

There’s no such thing as perfect data

When using data in one’s reporting, it’s crucial to remember that all data is inherently flawed, Donald stressed.

“The problem with data is there’s no such thing as perfect data,” he said. “There’s always problems with the data, because somewhere, somehow, at some point, every piece of data has been touched by human hands, which makes for a high likelihood of errors.”

Data forces the source to be accountable

In addition to changing the way journalists write and present their information, data and evidence also profoundly affect the relationship between journalists and their sources, Donald said.

“You will have evidence that the source will have to respond to or choose not to respond to — which can make them look really foolish,” Donald said. “If you have no evidence, then what you’re discussing is very general. You’ll write down whatever the source wants to tell you, and don’t know what to ask other than to say ‘What do you think about this?’ or ‘What’s your opinion on this?’

“When you have evidence, you get to say things like ‘I found this evidence here, can you tell me your reaction?’ — and all of a sudden, things change profoundly as a reporter.”

Image CC-licensed on Flickr via romanlily.