Data visualisation at STC Summit 2013
This week I’m attending STC Summit 2013, the annual conference of the Society for Technical Communication. I’ll blog about the sessions I attend, and give you some links to other news I hear about too. You’ll find my posts under the tag stc13 on this blog.
The most important map in the world
Phylise showed us a map of the Soho area of London. For the work we do as technical communicators, this is the most important map in the world.
The map was made by John Snow, a physician who lived in England in the 1800s. During the great cholera outbreak of 1854, he plotted the homes of the people who were dying. From this map, he was able to trace the disease to its water source. This is the notion of relational data telling a story. As a result, the local authorities took the handle off the Broad Street pump, and the outbreak stopped.
Ten questions we need to ask
- What story are you trying to tell?
- What actions do you expect people to take? Are you creating visualisations for a decision maker, an explorer, or whatever, and what actions will they take based on your data?
- What route will you take? Explore the data further. There are places where you can get data and explore it. Until you see what is there, you don’t know what you’re looking for.
- What is the difference between qualitative and quantitative data? You need to know the difference. Quantitative is numbers and values. Qualitative is descriptive data. Some data is both.
- Where does the data come from and what are the sources? Cite your sources, and link to them if you want. Make sure the data is real, secure and correct. This may not be an easy task. Think about where else you can get it from.
- What tools are available? There are some really interesting and good tools available:
  - ManyEyes, from IBM. This is an online open tool, one of the best out there. Explore the data sets available here, as well as the visualisations. Be aware, if you put your data up here it will be visible to the world. Playing with this tool is a great way of learning. Read the section on data format and style.
  - Tableau Public is a free version of Tableau. A great free tool.
- What are the best practices? Best practices for working with data are different from those for working with visualisations. It’s not about being a mathematician or statistician, but it is about getting to know the field.
  - Read Stephen Few and Edward Tufte.
  - Avoid pie charts. (This has come up in all three of the data visualisation sessions I’ve attended at this conference.)
- What to do with the data?
  - As an exercise, try comparing and contrasting the nutrition facts from different restaurants. Phylise showed us three PDF files from various restaurants. How can you process a PDF file? You could try OCR. But the first step is to decide the story you want to tell, then extract the relevant data. Type it into Excel so you can work with it.
  - Phylise showed us an Excel file where a single column contained 1 to 3 values, and occasionally text. This is the kind of mess you often get. So, you’ll need to clean the data before you can use it. This often takes up a large part of the task.
  - Learn regular expressions and Excel’s find/replace syntax, including wildcards. See Regular-Expressions.info.
  - Code dichotomous variables to 1 and 0.
  - Pay attention to case. In qualitative data, a difference in case is a different variable.
  - Watch for invisible baddies! For example, spaces at the beginning and end of a column value. This expression is your friend:
  - Watch out for ampersands. Use the word “and” instead.
  - Remember that 1.0 is not the same as 1. Check your character types, and watch your floats versus integers. When rounding, make sure you have a standard policy.
  - Watch for outliers – data points that don’t fit in with the rest of the data. You can decide to take these out, if they’re not important to your data. But think about what it means. The outlier may be what tells your story.
  - Beware of the acceptable leading zero. For example, there are zip codes that begin with zero. Make sure the data type is text, if relevant.
  - If you’re using comma-separated values, use quotes. Otherwise you’ll run into trouble when the data itself contains commas.
  - Write macros to clean your data, if you get the same data regularly. This is where technical communicators have the skill sets required.
  - Figure out what the data is saying. For example, in qualitative data, you may need to know how many times certain words come up.
  - Decide on the best way of representing the data. For example, to create a word cloud, you may need a document containing all the words, repeated the relevant number of times. Phylise showed us a case where it took her more than a week to clean the data, then 30 seconds to generate the word cloud.
- What is the context? What is around the data, and what do you want to say in the specific environment?
- What comes next? What other stories can you tell, and what new data can come from the data you have? You’ll never know this until you know the context of the data you have.
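To make a few of the cleaning tips above concrete, here is a minimal sketch in Python rather than Excel macros. It is not from the session: the sample data, the column names and the function names (`clean_text`, `code_yes_no`, `clean_rows`, `to_csv`, `word_counts`) are all made up for illustration, and lower-casing the names is just one possible policy for dealing with case.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical raw export: stray spaces, mixed case, an ampersand,
# a comma inside a value, and a zip code with a leading zero.
RAW = '''name,zip,vegetarian
"  Fish & Chips, Large ",02134,Yes
burger ,90210,no
'''


def clean_text(value):
    """Replace ampersands with 'and', then collapse and trim whitespace."""
    value = re.sub(r"\s*&\s*", " and ", value)
    return re.sub(r"\s+", " ", value).strip()


def code_yes_no(value):
    """Code a dichotomous yes/no value as 1 or 0, ignoring case and spaces."""
    return {"yes": 1, "no": 0}[value.strip().lower()]


def clean_rows(raw):
    """Read CSV text and return cleaned rows; zip codes stay as text."""
    rows = []
    for row in csv.DictReader(io.StringIO(raw)):
        rows.append({
            "name": clean_text(row["name"]).lower(),  # one case throughout
            "zip": row["zip"].strip(),  # kept as a string, so the leading zero survives
            "vegetarian": code_yes_no(row["vegetarian"]),
        })
    return rows


def to_csv(rows):
    """Write rows out with every field quoted, so embedded commas are safe."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["name", "zip", "vegetarian"],
                            quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def word_counts(texts):
    """Count how often each word appears across a collection of text values."""
    return Counter(
        word for text in texts for word in re.findall(r"[a-z']+", text.lower())
    )


rows = clean_rows(RAW)
print(rows)
print(to_csv(rows))
print(word_counts(row["name"] for row in rows))
```

The word counts are the kind of input you would need for a word cloud. If the same messy export arrives regularly, a small script like this plays the same role as a cleaning macro.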
Some data to play with
Phylise showed us a list of data sources. They will be available on SlideShare. Here are a few I had time to note down:
There’s data behind every visualisation. Unless you get to know the data, you’re just creating another picture. You can become incredibly valuable if you take the time to learn what it means to work with the data.
ManyEyes is a wonderful way to start.