From infections to mobility: Tips for finding and analyzing COVID-19 data

by Rowan Philp
Jul 8, 2020 in COVID-19 Reporting

Organized crime is changing its trafficking routes, under the cover of COVID-19. Timber smugglers in the Amazon are boosting their trade in the absence of supervision. Unemployment and alcohol dependency rates are jumping, and climate change continues unabated.

The world is changing rapidly — and at almost every level — in the shadow of the coronavirus pandemic, says Giannina Segnini, director of the Data Journalism Program at Columbia University in the United States.

But Segnini says data and tools are available to investigate and analyze these changes, and that reporters can track many of these shifts in real time, including changes in behavior.

Data ideas for the post-COVID-19 landscape. Image: Columbia University and CLIP


“Listen, your grandchildren will be talking about the pre- and post-COVID-19 era. We are definitely witnessing a historical situation here,” she said in a June 18 webinar, part of GIJN’s series Investigating the Pandemic. “This thing is just starting. There are so many things happening across borders that are not being monitored. But there is data to monitor what is happening, and never before has data been a better tool to make sense of the world around us.”

Segnini, a co-founder of The Latin American Center for Investigative Journalism (CLIP, for its acronym in Spanish), and her data scientist colleague at CLIP, Rigoberto Carvajal, shared insights on finding new data sources to investigate this new world.

Beyond the direct health threats of COVID-19 and its fallout, Segnini said newsrooms could create data dashboards showing changes to ordinary life in their communities.

“Using automated data integration and standardized scales, you can imagine dashboards that reflect changes in variables like, say, traffic tickets, arrests, food prices, evictions,” she said. “All these changes are going to happen right away in society. And bad actors are taking advantage of the fact that we are all distracted by [the] coronavirus. Human trafficking and corruption is still happening, but they are changing routes and methods. There is destruction of supply chains. There are dramatic changes in shipping and airlines, with the travel restrictions.”

Although the numbers are fictitious in this mock-up, Segnini said newsrooms could create their own dashboards with data from “ordinary life” overlaid on COVID-19 case data to show the broader impacts. Image: Columbia University and CLIP
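One way to read the "standardized scales" Segnini mentions is to rescale each series so its starting period equals 100, which lets very different measures share one chart. The sketch below is an illustration under that assumption, not the method used in the mock-up; the function name, variable names, and all figures are hypothetical.

```python
def index_to_baseline(series, baseline_periods=1):
    """Rescale a series so the mean of its first `baseline_periods` values equals 100."""
    if baseline_periods < 1 or baseline_periods > len(series):
        raise ValueError("baseline_periods must be between 1 and len(series)")
    base = sum(series[:baseline_periods]) / baseline_periods
    if base == 0:
        raise ValueError("baseline average is zero; cannot index")
    return [round(100 * value / base, 1) for value in series]


# Hypothetical weekly counts for two "ordinary life" variables.
weekly = {
    "traffic_tickets": [400, 300, 100],
    "evictions": [20, 25, 30],
}

# After indexing, both series start at 100 and can be compared directly.
indexed = {name: index_to_baseline(values) for name, values in weekly.items()}
```

Indexed this way, a drop in traffic tickets and a rise in evictions can be plotted on the same axis as percentage changes from the first week.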


Segnini was previously head of the investigations unit at Costa Rica’s La Nación, and her team’s work led to the prosecution of more than 50 public figures, including three former presidents.

In his previous role with the International Consortium of Investigative Journalists, Carvajal was one of the data experts on the Panama Papers investigation.

Carvajal said COVID-19 case data provided by governments ranged from raw numbers and basic dashboards, through downloadable aggregated data (the most common form), to the best but rarest form: granular, case-by-case data. Within Latin America, he said, Mexico, Colombia, and Peru stood out as countries offering the richest case data.

“The best way to get rich visualization of knowledge from datasets is to mine granular data, with individual [anonymized] records for each patient [case],” said Carvajal.

He said it was important to use “ETL” programs (extract, transform, load) to automatically import that data into dashboards or visualizations because of its sheer volume. He uses an open source tool, Talend Open Studio, for data integration.
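For readers unfamiliar with the three ETL steps, here is a minimal sketch in plain Python. It is a stand-in for illustration only, not Talend Open Studio and not Carvajal's pipeline; the CSV columns, table name, and sample rows are hypothetical, and real case files will differ by country.

```python
import csv
import io
import sqlite3
from collections import Counter

# Hypothetical case-by-case export (one anonymized row per case).
RAW_CSV = """case_id,report_date,region
1,2020-06-01,North
2,2020-06-01,North
3,2020-06-01,South
4,2020-06-02,North
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: count cases per (report_date, region)."""
    counts = Counter((row["report_date"], row["region"].strip()) for row in rows)
    return [(date, region, n) for (date, region), n in sorted(counts.items())]


def load(records, conn):
    """Load: write the aggregated records into a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_cases "
        "(report_date TEXT, region TEXT, cases INTEGER)"
    )
    conn.executemany("INSERT INTO daily_cases VALUES (?, ?, ?)", records)
    conn.commit()


# An in-memory database stands in for whatever store feeds the dashboard.
conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

totals = conn.execute(
    "SELECT region, SUM(cases) FROM daily_cases GROUP BY region ORDER BY region"
).fetchall()
```

In practice the extract step would read a file or URL published by a health ministry, and the whole script would run on a schedule so the dashboard stays current.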

However, with COVID-19 data still unreliable in many countries, Segnini said mining excess mortality data remained a powerful technique for showing the pandemic’s broader impacts.

“Whether you have granular or aggregated data, we know that not all the cases are being counted — because many die at home, or they were not tested and the policy only [cites] positive-tested people, or because the reporting systems are inadequate or inaccurate,” she said. “Many are scared to go to hospitals, and could have died because they had complications. There is a methodology that allows you to calculate this excess mortality. You need to have data on all previous deaths during the same period of time in previous years. You can represent it by absolute numbers or as a percentage. The more previous years you have, the better the calculation.”
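The calculation Segnini describes can be sketched in a few lines. This version assumes the simplest possible baseline, the plain average of deaths in the same period of previous years; published estimates often adjust for trends and population, which this sketch does not. The function name and all figures are hypothetical.

```python
from statistics import mean


def excess_mortality(observed, baseline_years):
    """Return (expected, excess, percent) for one period.

    observed: deaths from all causes in the period being studied.
    baseline_years: deaths in the same period of each previous year.
    """
    if not baseline_years:
        raise ValueError("need at least one previous year of data")
    expected = mean(baseline_years)          # simple average as the baseline
    excess = observed - expected             # absolute number
    percent = 100 * excess / expected        # share above the baseline
    return expected, excess, percent


# Hypothetical deaths in the same month of four previous years.
baseline = [1000, 1040, 960, 1000]

expected, excess, percent = excess_mortality(1250, baseline)
```

With these made-up numbers, the baseline is 1,000 deaths, so 1,250 observed deaths means 250 excess deaths, or 25% above the expected level. Adding more previous years to `baseline` steadies the average, which is the point Segnini makes.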

She said the emergence of mobility data — where personal mobile phone signals can be anonymized and aggregated — represented a powerful new tool for describing rapid change.

Data tools recommended by Carvajal and Segnini. Image: Columbia University and CLIP

This article was originally published by the Global Investigative Journalism Network. For more information on their webinar series related to COVID-19, check out their website.

Rowan Philp is a reporter for GIJN. Rowan was formerly chief reporter for South Africa’s Sunday Times. As a foreign correspondent, he has reported on news, politics, corruption, and conflict from more than two dozen countries around the world.

Main image CC-licensed by Unsplash via Ryoji Iwata