ScraperWiki puts its tools in the hands of non-coding journalists

by Jessica Weiss
Oct 1, 2013 in Data Journalism

Journalists who have ignored repeated calls to learn how to code are in luck.

ScraperWiki, a platform for extracting and analyzing data from the web, has launched a new set of tools for journalists and is currently offering them free of charge. While maintaining and enhancing its original tools for coders, ScraperWiki’s new iteration is designed to be more powerful for the non-coding user as well.

Since winning the Knight News Challenge in 2011, UK-based ScraperWiki has been developing tools to help journalists analyze and use more data for stories. The team ran a series of data journalism camps and NewsHack Days to teach reporters how to scrape data for their stories.

The conclusion: There are some journalists who want to learn to code, but many others who don’t. Still, the non-coders need to be able to use data in their reporting.

In response to this feedback, ScraperWiki is offering new “point and click” tools that pull data from websites into a human-readable format without any programming. The tools enable users to scrape data from PDFs, CSVs, Excel files and HTML; to search Flickr for images containing geodata; or to gather all the audio tracks a Last.fm user has played. Journalists interested in searching Twitter users or hashtags can now scrape raw data about users’ accounts in a few steps.
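For readers curious about what “scraping” HTML involves behind the point-and-click interface, here is a minimal sketch in Python using only the standard library. It is not ScraperWiki's own code; the class name, function name and sample table are all hypothetical, and the figures in the sample are made up for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows
        self._row = None   # row currently being read, if any
        self._cell = None  # text fragments of the current cell, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_csv(html):
    """Return the table cells found in `html` as CSV text."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Made-up sample page, for illustration only.
SAMPLE_HTML = """
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>Leeds</td><td>100</td></tr>
</table>
"""

if __name__ == "__main__":
    print(html_table_to_csv(SAMPLE_HTML))
```

The sketch turns a table buried in a web page into rows that a spreadsheet can open, which is the kind of “portable” format the tools aim for. Real pages are messier, which is part of why a point-and-click tool is attractive to non-coders.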

To stay true to its roots, ScraperWiki has kept the “code in your browser” facility from its original version for people who want to code or learn to code.

Whereas in the previous version data was automatically made public, data is now kept private in the system by default, so journalists can “guard their scoop” and have control over their data until they’re ready to publish it.

“The idea is simply to try to make data much more portable,” ScraperWiki Marketing Manager Aine McGuire told IJNet. “This is a place where you can liberate data away from the web to make it much more accessible.”

After scraping the data and making it portable, users have several new options for working with it. They can view the data in a table format; create a graph or map from the dataset; query the dataset using SQL; download or share the data; or summarize it with a number of visualization tools.
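To give a sense of what querying a dataset with SQL looks like, the sketch below uses Python's built-in sqlite3 module as a stand-in. It does not show ScraperWiki's own interface; the table name, column names and rows are hypothetical and invented for the example.

```python
import sqlite3

# Made-up rows standing in for a scraped dataset (account name, follower count).
ROWS = [
    ("alice", 120),
    ("bob", 45),
    ("carol", 300),
]


def top_accounts(rows, minimum):
    """Return accounts with at least `minimum` followers, largest first."""
    conn = sqlite3.connect(":memory:")  # temporary in-memory database
    try:
        conn.execute(
            "CREATE TABLE followers (name TEXT, follower_count INTEGER)"
        )
        conn.executemany("INSERT INTO followers VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT name, follower_count FROM followers "
            "WHERE follower_count >= ? "
            "ORDER BY follower_count DESC",
            (minimum,),
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for name, count in top_accounts(ROWS, 100):
        print(name, count)
```

A single query like this filters and sorts the rows in one step, so a reporter can ask a question of the data without first exporting it to a spreadsheet.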

Coders are encouraged to collaborate to find new ways to use and visualize the data by coding their own tools using Git or SSH. ScraperWiki also has a Web API that lets users extract data for use on the web or in other applications.

GigaOm technology journalist Derrick Harris recently wrote about possibilities for visualizing Twitter followers using ScraperWiki’s new tool. After scraping and cleaning his data on ScraperWiki, Harris used Tableau Public, IBM Many Eyes and Google Fusion Tables to create visualizations.

Future plans include a “push to Tableau” tool inside ScraperWiki, to seamlessly make data usable in the popular visualization suite.

McGuire says the idea is to let journalists expand the boundaries for how to use and visualize data in their stories.

“When you decouple data from an app, we believe you might want to perform lots of different things against that data,” she says. “Now, you might be able to do things you didn’t think were possible before.”

In the short term, ScraperWiki is not asking people to confirm that they are affiliated with any official organization. Journalists who use the site are requested to cite ScraperWiki as their data collection source.

To upgrade: Journalists who already hold a ScraperWiki account can email hello@scraperwiki.com with the subject “journalist [your full name]” to ask for an upgrade. Those who are new to ScraperWiki should set up a free Community account, then email to request an upgrade.

For more, visit ScraperWiki.

Jessica Weiss, a former IJNet managing editor, is a Buenos Aires-based freelance journalist.

Image of toy scraper construction truck courtesy of Flickr user Leap Kye under a Creative Commons license.