Editor's Note: Confiscati Bene, released in mid-December in Europe, is a pioneering data journalism collaboration that digs into the EUR4 billion of goods in the EU confiscated from criminals by European authorities. An international team of journalists and their allies sought to create a European database of seized assets and answer troubling questions about the accountability of the process.
In this post, Andrea Nelson Mauro, founder of project leader DataNinja.it, tells GIJN about organizing the project in Italy, which involved a diverse group of journalists, activists and technologists. Mauro describes step-by-step the investigation — now published in 19 Italian newspapers and across Europe — and explains the various tools used, including web scraping, content curation, data mining and coding.
On September 5, we had a Publication Day in Italy for our investigation into goods confiscated from the Mafia: one national newspaper (L’Espresso) and 18 websites of the same publisher (Repubblica-L’Espresso), shown on the map below, published our series online, revealing how many buildings and companies have been seized, region by region, to whom they belonged and what the government is doing to give these assets back to Italian citizens. It was a big opportunity and an amazing experience for those of us working on the investigation. We began the project in July 2015.
Meanwhile, a very interesting blog post by Alberto Cairo, the Knight Chair in Visual Journalism at the University of Miami, appeared on the Nieman Lab website. Titled “Data Journalism Needs To Up Its Own Standards,” the post talked about over-promises from FiveThirtyEight and Vox.com, projects that should “treat their data with more scientific rigor,” according to Cairo. In these and the other examples he cited you may find, IMHO, a lot of interesting suggestions, especially if you’re doing journalism with data, along with the kind of issues and doubts we see every day in our work inside DataNinja’s pipeline. Until now data journalism, as I see it, has developed too much as descriptive statistics, data visualizations, predictive analysis and special effects on the web (the “Wow! Effect,” as some friends say, or “map-itis,” for people who publish a map every minute without any news value).
So, I’d like to share what we did for the project “Confiscati Bene” (literally, “Well Confiscated”) with the aim of starting a dialogue and getting feedback on what we did well and what we need to improve.
Step 1: From meeting the open data project “Confiscati Bene” to working inside it
Spaghetti Open Data (SOD) is a group of Italian citizens interested in the release of public data in an open format.
The open data world gave me a great opportunity to retool my skills, and some years back I joined Italy's “Spaghetti Open Data” community. In March 2014, during a hackathon, we developed the first version of “Confiscati Bene,” an independent project powered by citizens to open up data on goods seized from the Mafia. As a first step, all the data was scraped from the official website of the agency that maintains a database of confiscated goods. What a great opportunity! Not only for publishing the data, but for trying to improve the project with our journalistic and data skills.
We joined the team and helped build an online platform with a data catalog on Mafia assets that needed to be updated. Working this way we learned a lot about confiscated goods (by reading Acts of Parliament and discovering various reports and documents). Team members shared these documents on a project mailing list. How long would I have spent finding these resources on my own, instead of having a team that shared them quickly? How much could people help us (as journalists) do our jobs better, if only we gave them the opportunity? Doing it together, and not only with journalists, should work better.
Step 2: From starting the investigation to publishing in 19 newspapers and on the web
By the end of July, we had started our investigation and built a team of three journalists: Andrea Nelson Mauro (that’s me!), Alessio Cimarelli and Gianluca De Martino. We read something like 3,000 pages of documents and reports by various institutions and observatories to better understand the data (we are not experts in this area ourselves). By matching results and leads, we created a kind of “content curation” from the documents, extracting what were, for us, the most important journalistic issues.
For instance, we discovered that the Italian government (with the EU) provided EUR6 million to the public agency overseeing confiscated goods to build a big database to collect these data. But nothing was done: no one knows where the money went and no one has ever seen the project.
Regarding the skills and activities we developed:
- Data (and story) mining: This was a big chapter of the investigation. We mined official documents and the web for results and statistics that matched the data scraped from the public agency for confiscated goods. Sometimes you need to be determined to understand exactly which stage of the process an asset has reached. For example, has it been seized, confiscated, frozen by law or ceded to an NGO?
- Coding and geo issues: To show confiscated goods on a map, we needed to develop a visualization tool. Alessio Cimarelli created it using only open source tools (Leaflet, D3.js, OSM Nominatim and others). Data are shown for the Italian regions as absolute values, not normalized by population or other dimensions, because we aimed to draft a kind of raw overview: to show where the Mafia spent the money, and the differences between big cities and small towns.
- Content curation: We assumed every confiscation would also have been reported in the newspapers. Starting from this idea, we aggregated stories from newspaper archives by region and by the most important bosses from whom goods were confiscated. Working this way (and after matching results with quantitative data) we could draw an overview of which Mafia was involved (the Sicilian Mafia, the Camorra, the ’Ndrangheta), showing a kind of distribution by region.
- The review process: Working in a team is very helpful for pointing out mistakes, but even better, in my honest opinion, was sharing article drafts with other members of the project.
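To make the choice between absolute and normalized values concrete, here is a minimal sketch in Python. It is not the code behind the published map: the records, field names, status labels and population figures are all hypothetical placeholders, and Python itself is an assumption, since the post does not say which language the team used for its analysis. It counts assets per region and per status, then shows how a per-capita rate could be derived for comparison.

```python
from collections import Counter

# Hypothetical sample records; the real dataset's fields and values may differ.
ASSETS = [
    {"region": "Sicilia", "kind": "building", "status": "confiscated"},
    {"region": "Sicilia", "kind": "company", "status": "seized"},
    {"region": "Sicilia", "kind": "building", "status": "ceded"},
    {"region": "Lombardia", "kind": "building", "status": "confiscated"},
    {"region": "Lombardia", "kind": "company", "status": "seized"},
    {"region": "Calabria", "kind": "company", "status": "frozen"},
]

# Placeholder population figures for illustration only (not real statistics).
POPULATION = {"Sicilia": 5_000_000, "Lombardia": 10_000_000, "Calabria": 2_000_000}


def count_by_region(assets):
    """Absolute number of assets per region (the 'raw overview')."""
    return Counter(asset["region"] for asset in assets)


def count_by_status(assets):
    """Number of assets at each stage of the process."""
    return Counter(asset["status"] for asset in assets)


def per_100k(counts, population):
    """Assets per 100,000 inhabitants, as an alternative to absolute values."""
    return {region: n / population[region] * 100_000 for region, n in counts.items()}


if __name__ == "__main__":
    absolute = count_by_region(ASSETS)
    print("Absolute:", absolute.most_common())
    print("By status:", count_by_status(ASSETS).most_common())
    print("Per 100k:", per_100k(absolute, POPULATION))
```

With these made-up numbers, the ranking of regions changes once population is taken into account, which is why it matters to state which of the two views a map is showing.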
Step 3: Looking forward, we’re doing database journalism
After we published, we gave the data back to Confiscati Bene by uploading it to a data catalog built with DKAN (a Drupal-based platform similar to CKAN, provided for us by Twinbit). We’re part of the team on this project, so we’re interested in improving it, collecting other data and developing other chapters (for instance across Europe). With the release of the project in 19 newspapers, we’ve now successfully disseminated not only the news from the project but the data itself, and we're continuing to update the data. I don’t know where we will end up, but I know we’re moving forward and trying to improve this, so maybe you will hear more about Confiscati Bene.
This post was originally published at DataNinja.it. It was then cross-posted by the Global Investigative Journalism Network, and is republished on IJNet with permission. Andrea Nelson Mauro is a data journalist and founder of DataNinja.it and Datamediahub.it.
Image is a screenshot of Confiscati Bene's homepage.