How an Italian data journalism collaboration investigated dark money across Europe

by Andrea Nelson Mauro
Oct 30, 2018 in Data Journalism

Editor's Note: Confiscati Bene, released in mid-December in Europe, is a pioneering data journalism collaboration that digs into the EUR4 billion worth of goods confiscated from criminals by authorities in the EU. An international team of journalists and their allies sought to create a European database of seized assets and to answer troubling questions about the accountability of the process.

In this post, Andrea Nelson Mauro, founder of DataNinja.it, which led the project, tells GIJN about organizing the project in Italy, which involved a diverse group of journalists, activists and technologists. Mauro describes the investigation step by step (it has now been published in 19 Italian newspapers and across Europe) and explains the various tools used, including web scraping, content curation, data mining and coding.

On September 5, we had a Publication Day in Italy for our investigation into goods confiscated from the Mafia: one national newspaper (L’Espresso) and 18 websites belonging to the same publisher, Repubblica-L’Espresso (see the map below), put our series online, revealing how many buildings and companies have been seized, region by region, to whom they belonged and what the government is doing to give these assets back to Italian citizens. It was a big opportunity and an amazing experience for those of us working on the investigation. We began the project in July 2015.

Meanwhile, a very interesting blog post by Alberto Cairo, the Knight Chair in Visual Journalism at the University of Miami, appeared on the Nieman Lab website. Titled “Data Journalism Needs To Up Its Own Standards,” the post discussed over-promises by FiveThirtyEight and Vox.com, projects that should “treat their data with more scientific rigor,” according to Cairo. In these and the other examples he cited, you may find (IMHO) a lot of interesting suggestions, especially if you’re doing journalism with data, along with the kinds of issues and doubts we face every day in DataNinja’s pipeline. Until now, as I see it, data journalism has leaned too heavily on descriptive statistics, data visualizations, predictive analysis and special effects on the web (the “Wow! Effect,” as some friends say, or “map-itis,” for people who publish a map every minute without any news value).

So I’d like to share what we did for the project “Confiscati Bene” (literally, “Well Confiscated”), with the aim of starting a dialogue and getting feedback on what we did well and what we need to improve.

Step 1: From meeting the open data project “Confiscati Bene” to working inside it

Spaghetti Open Data (SOD) is a group of Italian citizens interested in the release of public data in open formats.

The open data world gave me a great opportunity to refactor my skills, and some years back I joined Italy's “Spaghetti Open Data” community. In March 2014, during a hackathon, we developed the first version of “Confiscati Bene,” an independent, citizen-powered project to open up data on goods seized from the Mafia. As a first step, all the data was scraped from the official website of the agency that maintains a database of confiscated goods. What a great opportunity! Not only for publishing the data, but for trying to improve the project with our journalistic and data skills.
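For readers who haven't scraped a website before, here is a minimal sketch of the idea in Python, using only the standard library's html.parser. It assumes the agency's listings are published as a plain HTML table; the sample markup, the column names and the scrape_table helper are hypothetical and do not reflect the real site's structure or the code the team actually used.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <th>/<td> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being built, or None outside <tr>
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep only text inside a cell; whitespace between tags is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def scrape_table(html):
    """Use the first row as the header and return the other rows as dicts."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical markup standing in for one page of the agency's listings.
SAMPLE = """
<table>
  <tr><th>id</th><th>region</th><th>type</th><th>status</th></tr>
  <tr><td>1</td><td>Sicily</td><td>building</td><td>confiscated</td></tr>
  <tr><td>2</td><td>Campania</td><td>company</td><td>seized</td></tr>
</table>
"""

records = scrape_table(SAMPLE)
print(records[0])  # {'id': '1', 'region': 'Sicily', 'type': 'building', 'status': 'confiscated'}
```

On a real site you would fetch each page first and loop over pagination, and the hard part is usually the messy values inside the cells rather than the parsing itself.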

We joined the team and helped build an online platform with a data catalog on Mafia assets that needed to be kept up to date. Working this way, we learned a lot about confiscated goods by reading Acts of Parliament and discovering various reports and documents, which team members shared on a project mailing list. How long would I have spent finding these resources on my own, instead of having a team that shared them quickly? How much could people help us (as journalists) do our jobs better, if only we gave them the opportunity? Doing it together, and not only with journalists, should work better.

Step 2: From starting the investigation to publishing in 19 newspapers and on the web

By the end of July, we had started our investigation and built a team of three journalists: Andrea Nelson Mauro (that’s me!), Alessio Cimarelli and Gianluca De Martino. We read something like 3,000 pages of documents and reports by various institutions and observatories to better understand the data (even though we are not experts in this area). By matching results and leads, we created a kind of “content curation” from the documents, extracting the journalistic issues that mattered most (to us).

For instance, we discovered that the Italian government (with the EU) provided EUR6 million to the public agency overseeing confiscated goods to build a big database to collect these data, but nothing was done: no one knows where the money went, and no one ever saw the project.

Regarding the skills and acti­vi­ties we developed:

  • Data (and story) mining: This was a big chapter of the investigation. We mined official documents, and also the web, looking for results and statistics that matched the data scraped from the public agency for confiscated goods. Sometimes it takes persistence to establish exactly which stage of the process an asset is at: has it been seized, confiscated, frozen by law, or ceded to an NGO?
  • Coding and geo issues: To show confiscated goods on a map, we needed to develop a visualization tool. Alessio Cimarelli built it using only open source tools (Leaflet, D3.js, OSM Nominatim and others). The data are shown for each Italian region as absolute values, not normalized by population or other dimensions, because we aimed to draft a kind of raw overview: to show where the Mafia spent its money, and the differences between big cities and small towns.
  • Content curation: We reasoned that every confiscation should also have been covered by the newspapers. Starting from this idea, we gathered stories from newspaper archives, grouped by region and by the most important bosses whose goods were confiscated. Working this way (and after matching results with quantitative data), we could draw an overview of which Mafia organization was involved (the Sicilian Mafia, the Camorra, the ’Ndrangheta), showing a kind of distribution by region.
  • The review process: Working in a team is very helpful for pointing out mistakes, but in my honest opinion, sharing article drafts with other members of the project was even more helpful.
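The first three points can be illustrated with a small Python sketch: normalizing free-text statuses into a fixed set of stages, then counting assets per region (or per organization) as absolute values, the way the map presents them. The Stage names, record fields and sample records are hypothetical toy data, and the substring matching is a deliberate simplification that real, messier status labels would quickly outgrow.

```python
from collections import Counter
from enum import Enum


class Stage(Enum):
    """Hypothetical stages an asset can be in, as listed in the text."""
    SEIZED = "seized"
    FROZEN = "frozen"
    CONFISCATED = "confiscated"
    CEDED = "ceded"  # e.g. handed over to an NGO


def parse_stage(raw):
    """Map a free-text status to a Stage by substring match, or return None."""
    text = raw.strip().lower()
    for stage in Stage:
        if stage.value in text:
            return stage
    return None


def count_by(records, key, stage=Stage.CONFISCATED):
    """Absolute number of assets at `stage`, grouped by the field `key`."""
    counts = Counter()
    for rec in records:
        if parse_stage(rec["status"]) is stage:
            counts[rec[key]] += 1
    return counts


def per_100k(counts, population):
    """Alternative view: the same counts normalized per 100,000 residents."""
    return {area: 100_000 * n / population[area] for area, n in counts.items()}


# Toy records for illustration only, not real data.
records = [
    {"region": "Sicily", "org": "Sicilian Mafia", "status": "Confiscated"},
    {"region": "Sicily", "org": "Sicilian Mafia", "status": "confiscated"},
    {"region": "Campania", "org": "Camorra", "status": "seized"},
    {"region": "Campania", "org": "Camorra", "status": "confiscated"},
    {"region": "Calabria", "org": "'Ndrangheta", "status": "ceded to an NGO"},
]

print(count_by(records, "region"))  # Counter({'Sicily': 2, 'Campania': 1})
print(count_by(records, "org"))     # Counter({'Sicilian Mafia': 2, 'Camorra': 1})
```

Dividing by population, as per_100k does, answers a different question (how dense confiscations are relative to residents); the raw counts from count_by are what suit an overview of where assets actually ended up.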

Step 3: Looking forward, we’re doing database journalism

After we published, we gave the data back to Confiscati Bene by uploading it to a data catalog built with DKAN (a Drupal-based platform similar to CKAN, powered for us by Twinbit). We’re part of this project’s team, so we’re interested in improving it, collecting other data and developing other chapters (for instance, across Europe). With the release of the project in 19 newspapers, we’ve now disseminated not only the news from the project but the data itself, and we're continuing to update the data. I don’t know where we will end up, but I know we’re moving forward and trying to improve this, so you may well hear more about Confiscati Bene.
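Keeping a re-scraped dataset current usually comes down to comparing snapshots. Below is a minimal Python sketch, assuming each record carries a stable unique identifier (the field name id is hypothetical), that lists added, removed and changed records and serializes a snapshot as CSV, a format commonly uploaded as a resource to catalogs such as DKAN. It illustrates the general technique, not the pipeline Confiscati Bene actually runs.

```python
import csv
import io


def diff_snapshots(old, new, key="id"):
    """Compare two scrapes; return sorted ids that were added, removed or changed."""
    old_by_id = {rec[key]: rec for rec in old}
    new_by_id = {rec[key]: rec for rec in new}
    added = sorted(new_by_id.keys() - old_by_id.keys())
    removed = sorted(old_by_id.keys() - new_by_id.keys())
    changed = sorted(
        k for k in old_by_id.keys() & new_by_id.keys() if old_by_id[k] != new_by_id[k]
    )
    return added, removed, changed


def to_csv(records, fields):
    """Serialize records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Two toy snapshots of the same hypothetical dataset.
previous = [{"id": "1", "status": "seized"}, {"id": "2", "status": "confiscated"}]
current = [{"id": "1", "status": "confiscated"}, {"id": "3", "status": "seized"}]

print(diff_snapshots(previous, current))  # (['3'], ['2'], ['1'])
print(to_csv(current, ["id", "status"]))
```

Reporting the changed ids separately is useful journalistically as well: an asset moving from seized to confiscated, for example, is itself a small story.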

This post was originally published at DataNinja.it. It was then cross-posted by the Global Investigative Journalism Network, and is republished on IJNet with permission. Andrea Nelson Mauro is a data journalist and founder of DataNinja.it and Datamediahub.it.

Image is a screenshot of Confiscati Bene's homepage.