Data Journalist Playbook – Part 4
Monday, April 29th, 2013
by contributor

The following is Part 4 of 4 (read parts 1, 2, 3) in our series dissecting the role of a Data Journalist. In this post Liv Buli (@lbuli) – Data Journalist at music industry analytics and insights provider Next Big Sound – offers some final tips and tricks. At the bottom we close with a full PDF download of the series (including bonus appendix) and a post-Playbook interview with Liv.

When you are presenting data in graphical form, there are several considerations to keep in mind. Understanding the numbers is of course imperative, but presenting them in a manner your audience will easily comprehend is the most pressing challenge. The advantage of working for Next Big Sound is that I very rarely have to think about how to present the data; it is done for me through a platform that is already carefully crafted and considered. But for the typical data journalist, knowing how to present statistical information in a way that intrigues and involves the reader can be a challenge. That means taking care not to obfuscate the data, and creating charts that are easily readable.

One of the greatest resources I found in learning how to deal with data was the Wall Street Journal Guide to Information Graphics, written by Dona M. Wong. This book gives you a comprehensive insight into the basics of presenting statistical information, and the dos and don’ts of charting data. Extremely useful, this guide is an overview of key rules that will help any data journalist understand how to present data in a manner that is not only valuable, but also completely accurate.

I don’t for a minute wish to imply that I now know everything about graphic design or charts, but I have learned from this book how to create a basic visual representation of data, and I believe that skill to be integral to explaining this phenomenon we call Big Data.

Learning to query with R
Another step in becoming integrated into this company was realizing I knew next to nothing about how to deal with information stored in a database. A database is a central collection of information, organized in tables, which, when relevant and used correctly, can be a source of answers to almost any question you might have.

Working with a team of engineers, data scientists, and designers who knew programming like the back of their hands, I quickly realized that the ability to extract information from whatever database I had at hand, whether that of Next Big Sound or any other, was integral to independently mastering this role.

In this vein I decided to delve into learning how to query. Now, the differences between programming languages can be somewhat complicated for those of us who aren’t engineers. There are several ways in which you can interact with a database, whether through languages such as Java, Python, R, or whatever else these guys (who are definitely smarter than me) come up with. But a basic requirement for a data journalist, I believe, is to be able to extract from a database the information you need for a specific story.

The initial step is to learn Structured Query Language (SQL), a standard language for pulling information from a database. For instance, if you would like to get a list of all artists whose name starts with the letter B, or who have between 5,000 and 50,000 fans on Facebook, from the Next Big Sound database, you would need to know the basic commands of SQL, such as SELECT, AND/OR, etc. There are several free online resources where you can learn the basics.
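
To make the example above concrete, here is a minimal sketch of such a query, run from Python through its built-in sqlite3 module. The table name `artists`, its columns, and the sample rows are all invented for illustration; they are not the actual Next Big Sound schema.

```python
import sqlite3

# Hypothetical in-memory database; the schema and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artists (name TEXT, facebook_fans INTEGER)")
conn.executemany(
    "INSERT INTO artists VALUES (?, ?)",
    [
        ("Beach Example", 12000),
        ("Blue Sample", 700),
        ("Another Act", 30000),
        ("Zed Placeholder", 90000),
    ],
)

# Artists whose name starts with B, or who have between 5,000 and 50,000 fans.
query = """
    SELECT name, facebook_fans
    FROM artists
    WHERE name LIKE 'B%'
       OR facebook_fans BETWEEN 5000 AND 50000
    ORDER BY name
"""
rows = conn.execute(query).fetchall()
for name, fans in rows:
    print(name, fans)
```

With this sample data, the query keeps the two artists whose names start with B plus the one other artist inside the fan range, and drops the artist with 90,000 fans. Swapping OR for AND would instead return only artists that meet both conditions.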

Taking this a step further, you might like to narrow down this query to a more specific question, such as how many of these artists gained a certain number of followers within a certain time period, and how rapidly that growth occurred. Our data scientists rely heavily on R, which is the language I am in the process of learning. Using this I can ask more complicated questions of the data, take a stab at various graph formats in order to see what might work best, and eliminate data that for various reasons might not be relevant. With the ability to query, the opportunities for what you can learn from a collection of numbers start to become not only unimaginable, but also overwhelming. Over the coming months, I will continue to study the various programming languages, in hopes of eventually mastering them.
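
The follower-growth question above could be sketched as follows. This example uses Python with the built-in sqlite3 module rather than R, and everything in it is an assumption made for illustration: the `follower_counts` table, its columns, the dates, and the threshold of 1,000 new followers.

```python
import sqlite3
from datetime import date

# Hypothetical table of follower counts per artist per day (made-up data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE follower_counts (artist TEXT, day TEXT, followers INTEGER)"
)
conn.executemany(
    "INSERT INTO follower_counts VALUES (?, ?, ?)",
    [
        ("Beach Example", "2013-04-01", 10000),
        ("Beach Example", "2013-04-11", 12000),
        ("Blue Sample", "2013-04-01", 600),
        ("Blue Sample", "2013-04-11", 700),
    ],
)

# Assumed time period and minimum gain; change these to ask a different question.
start, end, min_gain = "2013-04-01", "2013-04-11", 1000
days = (date.fromisoformat(end) - date.fromisoformat(start)).days

# Pair each artist's count on the start day with their count on the end day,
# then keep only artists who gained at least min_gain followers in between.
query = """
    SELECT s.artist, e.followers - s.followers AS gained
    FROM follower_counts AS s
    JOIN follower_counts AS e ON e.artist = s.artist
    WHERE s.day = ? AND e.day = ?
      AND e.followers - s.followers >= ?
"""
fast_growers = [
    (artist, gained, gained / days)  # last value: average new followers per day
    for artist, gained in conn.execute(query, (start, end, min_gain))
]
print(len(fast_growers), fast_growers)
```

The count of matching artists answers "how many", while the per-day average gives a rough measure of how rapidly the growth occurred. A real analysis would likely look at the shape of the growth across the whole period rather than just its two endpoints.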

As data journalists, we are opening a whole new world in terms of what we can do, what we can learn, and what we can explain to our audience. And as this world of data grows larger, faster and ever more unmanageable, it is our job to understand it, even though that means stepping outside our comfort zone of writing, recording and editing information. You may never have thought that as a journalist you would have to learn how to code, but that is now becoming a basic requirement of telling the narrative of Big Data.

Data Journalist Playbook from IA Ventures