Homework 5: Critiquing Commercial InfoVis Systems



This assignment will familiarize you with two commercial systems built for analyzing multivariate data sets: Spotfire and Tableau. Tableau's data visualization software is provided through the Tableau for Teaching program; for license information, please see T-Square. For Spotfire, please download the free trial. You may use either the Windows-only desktop client or the Spotfire Cloud online system.

The goals of the assignment are for you to learn the capabilities these types of systems provide, learn the visualization methods they offer, and assess their utility for analyzing information repositories. You will work with some of the provided data sets. Think about the kinds of questions an analyst would ask about them. IMPORTANT: Apart from the required Food Nutrition data set (see Part 2), the choice of data set is up to you. (Feel free to work with additional data sets as well.)

The assignment has four parts:

1. Gain familiarity with the systems
Familiarize yourself with the visualization techniques and user interfaces of the different systems. Each one has a tutorial that you should try out with a sample data set. Work your way through each tutorial to learn the system's interface and capabilities.

2. Examine the sample data sets
Each tool includes a few sample data sets, but it is often best to learn with something new (or with data of your own). A collection of data sets has been uploaded to T-Square (folder named HW5_datasets in Resources). You must work with the Food Nutrition (food.xls) data as one data set, and choose one of the other data sets as your second. Briefly scan the text/.xls files and familiarize yourself with the variables. In particular, remember that the data is "raw", meaning you may have to do some "wrangling" and "cleaning" to make labels, headings, etc. usable and readable. Keep notes on what you do in these steps, because you should describe them when you turn in your assignment. For example, did you do it in the tool? Did you have to back out and do it in a text editor?

Generate and write down (you will need to turn them in) a few hypotheses to be considered, tasks to be performed, or questions to be asked about the data elements. Think about all the different kinds of analysis tasks that a person might want to perform with data sets such as these. Recall the task taxonomies we discussed in class, and use them to frame your selection of tasks, questions, and hypotheses.

For instance, someone working with a data set about breakfast cereals might have tasks like:
Identify the cereals with the most salt.
Do the different companies producing cereals have different styles of cereal that they favor?
Does high fat mean high calories?
What cereals would you recommend to someone on a diet who still wants some good taste?
Does the nutritional value of cereals vary a lot? If so, how?
etc.

Try not to make all of your questions about correlations; that is a common tendency.

3. Load the data sets into the systems and examine them
Load the Food Nutrition data set and the other data set you selected into each of the two visualization tools, then consider your hypotheses, tasks, and questions. Also use the systems to explore the data sets and see whether you can discover other interesting or unexpected findings. Put yourself in the shoes of a data analyst, and consider the questions such a person would confront. Note that some tools will handle some data sets better than others (depending on size, variable types, etc.). Do what you can to make the data work in the tools, but note what you struggle with so you can write it up in Part 4 below.

4. Write a report on your findings
Write up a summary of your exploration process, findings, and impressions of the systems. Include your hypotheses, tasks, and questions, and what you found. Furthermore, critique the different tools in a general sense. Include screenshots to help explain your analyses and critiques. What are the systems' strengths and weaknesses? How do their visualization capabilities differ? For what kinds of user tasks is each tool suited? Focus more on the visualization techniques than on particular user interface quirks, though you should feel free to comment on UI aspects when they are particularly good or bad. Additionally, for each tool, list one unexpected finding, insight, or discovery made while exploring one of the data sets with that tool. Explain how the system helped to facilitate the finding. Was it through interaction? Through the visualizations? By accident?

We recommend that you not walk through each question or task one by one for each of the two systems you used. (There simply won't be space to do so.) You might, however, want to include specific examples of how the systems did or did not assist work on specific tasks. Point out interesting, insightful observations; you don't need to tell us how a system works, because we already know that. Think of this as a report to your manager (a person knowledgeable in visualization) who wants to know what each system can provide, along with its pros and cons. Imagine that this manager has to purchase one of these tools for his or her analysts to use. Focus specifically on how each system's visualizations help or hinder analysis. How did the systems compare? A table of pros and cons is a good way to show this.

Your document is limited to a maximum of 8 pages, single-spaced, in a reasonable font size, including embedded screenshots. Please bring two hard copies to class on the day it is due.

Acknowledgments: Special thanks go out to Chris Ahlberg of Spotfire, Jock Mackinlay and Chris Stolte of Tableau, Michael Spenke and Christian Beilken of the Fraunhofer Institute for Applied Information Technology, and Steven Pesklo and Doug Molumby of Softlake Solutions for all their help in getting and working with the systems.