Welcome to the final homework assignment for CS1331. This homework pulls together the Java skills you have learned this semester and applies them to building a functioning product. It is significantly longer than past assignments, which is why it is worth 100 points rather than the usual 50 and carries greater weight in your final grade. For this reason, you have a full two weeks to finish it. We strongly advise starting as early as possible. It is very important that you read this entire project description before you implement anything.
We are confronted in our daily lives with ever-increasing amounts of data and information. As technologists, we should think about ways that we can help people to use and take advantage of that information. Since you are studying computing and object-oriented programming this term, you might think about the kinds of software applications that would help people gain insight from data.
In this assignment you will create a system that visualizes information in a way that helps people better browse, analyze, and understand data. The application will provide an interactive visual representation of a moderately large data set.
In order to visualize data, we must understand that data. In this assignment, you will be given an example data set that is likely familiar to everyone in class: breakfast cereals. Each individual cereal in the data set is considered a data case. Data cases can have any number of attributes, or variables. For cereals, example variables include the number of calories per serving, the amount of fat, the amount of protein, the company that makes the cereal, and so on. Notice that variables can be quantitative (numeric) or textual.
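When your program reads an arbitrary file, it cannot know in advance which variables are quantitative. One simple heuristic, shown here as a sketch rather than a required approach, treats a column as quantitative when every non-blank value parses as a number. The class name AttributeKind is hypothetical.

```java
import java.util.List;

// Hypothetical helper (name chosen for illustration): classifies one column
// of the data set as quantitative or textual.
final class AttributeKind {
    private AttributeKind() {}

    // Returns true if every non-blank value parses as a number and at least
    // one value is present; a single textual value makes the column textual.
    static boolean isQuantitative(List<String> columnValues) {
        boolean sawNumber = false;
        for (String raw : columnValues) {
            String v = raw.trim();
            if (v.isEmpty()) {
                continue; // treat blank cells as missing data, not as text
            }
            try {
                Double.parseDouble(v);
                sawNumber = true;
            } catch (NumberFormatException e) {
                return false;
            }
        }
        return sawNumber;
    }
}
```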
Below is a link to the sample data set. The data is a list of textual lines, with one cereal per line. On each line, variables are separated by commas. The first line lists what each variable is. You must read this data set into your application and store it in a data structure for subsequent use. You can assume that all the potential data sets used by your program will be in this comma-separated format.
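A minimal loading sketch follows. It assumes that fields never contain quoted commas (true of a simple comma-separated format like the one described, but not of every CSV file) and that every line has as many fields as the header. The DataSet class and its load method are hypothetical names, and a list of String[] rows is just one reasonable choice of data structure.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical container for a parsed data set: the header names plus one
// String[] per data case, in file order.
final class DataSet {
    final List<String> attributeNames;
    final List<String[]> rows;

    private DataSet(List<String> attributeNames, List<String[]> rows) {
        this.attributeNames = attributeNames;
        this.rows = rows;
    }

    // Reads comma-separated text whose first line names the variables.
    // Assumption: no field contains a quoted comma.
    static DataSet load(Reader source) throws IOException {
        try (BufferedReader in = new BufferedReader(source)) {
            String header = in.readLine();
            if (header == null) {
                throw new IOException("empty data file");
            }
            List<String> names = new ArrayList<>();
            for (String name : header.split(",", -1)) {
                names.add(name.trim());
            }
            List<String[]> rows = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
                if (fields.length != names.size()) {
                    throw new IOException("expected " + names.size()
                            + " fields but found " + fields.length + ": " + line);
                }
                for (int i = 0; i < fields.length; i++) {
                    fields[i] = fields[i].trim();
                }
                rows.add(fields);
            }
            return new DataSet(Collections.unmodifiableList(names), rows);
        }
    }
}
```

At the 50-point level you might call DataSet.load(new FileReader("cereals.csv")). Reading from a Reader rather than a file name also makes the loader easy to test with a StringReader.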
On the cereals data set, the Manufacturer attribute has values G-General Mills, K-Kelloggs, N-Nabisco, P-Post, Q-Quaker Oats, R-Ralston Purina, and A-American Home Foods. The Type attribute has values C-Cold, H-Hot.
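Because the file stores these attributes as single-letter codes, a small lookup table lets the program show readable names in axis labels or in the selected-item display. The class and method names below are hypothetical, the codes and names come from the list above, and Map.of assumes Java 9 or later.

```java
import java.util.Map;

// Hypothetical lookup tables (names chosen for illustration) for the
// single-letter codes used by the cereals data set.
final class CerealCodes {
    private CerealCodes() {}

    static final Map<String, String> MANUFACTURERS = Map.of(
            "A", "American Home Foods",
            "G", "General Mills",
            "K", "Kelloggs",
            "N", "Nabisco",
            "P", "Post",
            "Q", "Quaker Oats",
            "R", "Ralston Purina");

    static final Map<String, String> TYPES = Map.of("C", "Cold", "H", "Hot");

    // Returns the full manufacturer name, or the code unchanged if it is unknown.
    static String manufacturerName(String code) {
        return MANUFACTURERS.getOrDefault(code.trim(), code);
    }
}
```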
Another data set, one about cars, is included below.
In your assignment, you must at least work with the cereals data set. (You'll see below that the program has a variety of possible levels of implementation for varying levels of credit.) At the simplest level, just hard-wire your program to read and use this data set. For a more flexible capability, allow the end-user to read in an arbitrary data set.
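For the more flexible option, Swing's JFileChooser provides the standard open-file dialog. The sketch below is one way to use it; the DataFilePicker name is hypothetical, and restricting the dialog to .csv files is an assumption about how your data files are named.

```java
import java.awt.Component;
import java.io.File;
import javax.swing.JFileChooser;
import javax.swing.filechooser.FileNameExtensionFilter;

// Hypothetical helper (name chosen for illustration) for choosing a data file.
final class DataFilePicker {
    private DataFilePicker() {}

    // Shows directories plus files ending in .csv (matched case-insensitively).
    static final FileNameExtensionFilter CSV_FILTER =
            new FileNameExtensionFilter("Comma-separated data (*.csv)", "csv");

    // Opens a modal "Open" dialog centered on parent (which may be null) and
    // returns the chosen file, or null if the user cancels.
    // Call this on the Swing event dispatch thread.
    static File choose(Component parent) {
        JFileChooser chooser = new JFileChooser();
        chooser.setFileFilter(CSV_FILTER);
        int result = chooser.showOpenDialog(parent);
        return result == JFileChooser.APPROVE_OPTION ? chooser.getSelectedFile() : null;
    }
}
```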
Our next concern is the visual representation of the data set. In this assignment, you will generate a relatively straightforward view of the data: a scatter plot. In a scatter plot, one variable is drawn along the x-axis and a second variable is drawn along the y-axis. Each axis is scaled to cover the range of that variable's possible values. A data case (cereal) is plotted by drawing a symbol or mark at the point where the cereal's values on the two axes intersect. A couple of example scatter plot visualizations are shown below. These images are taken from the commercial system Spotfire, which provides a scatter-plot-style display.
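The scaling described above is a linear mapping from a variable's range to a range of pixels: a value v in [min, max] lands at pixelStart + (v - min) / (max - min) * (pixelEnd - pixelStart). The hypothetical LinearScale class below sketches this. Passing a start pixel larger than the end pixel reverses the axis direction, which is convenient because Swing's y coordinate grows downward.

```java
// Hypothetical class (name chosen for illustration): maps a data value
// linearly from [dataMin, dataMax] onto [pixelStart, pixelEnd].
final class LinearScale {
    private final double dataMin, dataMax, pixelStart, pixelEnd;

    LinearScale(double dataMin, double dataMax, double pixelStart, double pixelEnd) {
        this.dataMin = dataMin;
        this.dataMax = dataMax;
        this.pixelStart = pixelStart;
        this.pixelEnd = pixelEnd;
    }

    double toPixel(double value) {
        if (dataMax == dataMin) {
            return (pixelStart + pixelEnd) / 2; // all values identical: center them
        }
        double t = (value - dataMin) / (dataMax - dataMin); // 0 at min, 1 at max
        return pixelStart + t * (pixelEnd - pixelStart);
    }
}
```

For the y-axis of a plot area running from pixel row 20 (top) to 360 (bottom), new LinearScale(minY, maxY, 360, 20) places the smallest value at the bottom.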
Your assignment is to build an interactive GUI that reads in a data set such as the cereals and builds a scatter plot visualization of it. Once the program has read the data set, it should draw the corresponding icons for the data cases at the correct positions. Design a clean, flexible graphical interface that has reasonable UI controls and can be resized.
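A common way to make a Swing plot resizable is to do all position calculations inside paintComponent using the panel's current getWidth() and getHeight(), so the plot redraws correctly at any size. The ScatterPanel class below is a minimal, hypothetical sketch of that idea (no labels, tick marks, or selection), and its margin and mark sizes are arbitrary choices.

```java
import java.awt.Color;
import java.awt.Dimension;
import java.awt.Graphics;
import java.util.Arrays;
import javax.swing.JPanel;

// Hypothetical minimal scatter plot panel (name chosen for illustration).
// Positions are recomputed from the current size on every repaint.
class ScatterPanel extends JPanel {
    private static final int MARGIN = 30; // pixels reserved around the plot for axes
    private static final int MARK = 6;    // diameter of each data mark in pixels

    private final double[] xs; // x-axis variable, one value per data case
    private final double[] ys; // y-axis variable, same order as xs

    ScatterPanel(double[] xs, double[] ys) {
        this.xs = xs.clone();
        this.ys = ys.clone();
        setBackground(Color.WHITE);
        setPreferredSize(new Dimension(600, 400)); // initial size; layout may change it
    }

    @Override
    protected void paintComponent(Graphics g) {
        super.paintComponent(g); // fills the background
        int w = getWidth() - 2 * MARGIN;  // usable plot width
        int h = getHeight() - 2 * MARGIN; // usable plot height
        if (w <= 0 || h <= 0) {
            return; // too small to draw anything useful
        }
        double minX = Arrays.stream(xs).min().orElse(0);
        double maxX = Arrays.stream(xs).max().orElse(1);
        double minY = Arrays.stream(ys).min().orElse(0);
        double maxY = Arrays.stream(ys).max().orElse(1);

        g.setColor(Color.GRAY);
        g.drawLine(MARGIN, MARGIN + h, MARGIN + w, MARGIN + h); // x-axis along the bottom
        g.drawLine(MARGIN, MARGIN, MARGIN, MARGIN + h);         // y-axis along the left

        g.setColor(Color.BLUE);
        for (int i = 0; i < xs.length; i++) {
            int px = MARGIN + (int) Math.round(fraction(xs[i], minX, maxX) * w);
            // Screen y grows downward, so subtract to put larger values higher up.
            int py = MARGIN + h - (int) Math.round(fraction(ys[i], minY, maxY) * h);
            g.fillOval(px - MARK / 2, py - MARK / 2, MARK, MARK);
        }
    }

    // Position of v within [lo, hi] as a value from 0 to 1 (0.5 if the range is empty).
    private static double fraction(double v, double lo, double hi) {
        return hi == lo ? 0.5 : (v - lo) / (hi - lo);
    }
}
```

Placed in the center of a BorderLayout in a JFrame, the panel stretches with the window, and Swing calls paintComponent again after each resize.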
We will evaluate your program according to the level of implementation you choose to address. Remember that the assignment is worth a total of 100 points. Below we list the functionality needed in your application in order to gain increasing amounts of points. For each new level, we implicitly assume that all of the previous levels' functionality is included.
50 points - At this level your program should be able to read in the cereals data set (hard-wired to cereals.csv) and draw a scatter plot visualization for two particular variables. For instance, on the cereals data set, we suggest plotting calories by sodium. Note that your program should be able to read an arbitrary cereals.csv file, i.e., one with different numbers of cereals and different attribute values. The format of the file will stay consistent, however.
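To cope with files that contain different numbers of cereals, size your arrays from the data rather than hard-coding a count. The sketch below pulls a numeric column out of already-parsed rows by its header name. The class name is hypothetical, and the attribute names "calories" and "sodium" in the usage note are assumptions, so check them against the header of your cereals.csv.

```java
import java.util.List;

// Hypothetical helper (name chosen for illustration): extracts one numeric
// column from parsed rows, looked up by its header name.
final class Columns {
    private Columns() {}

    static double[] numericColumn(List<String> header, List<String[]> rows, String name) {
        int col = -1;
        for (int i = 0; i < header.size(); i++) {
            if (header.get(i).equalsIgnoreCase(name)) {
                col = i;
                break;
            }
        }
        if (col < 0) {
            throw new IllegalArgumentException("no attribute named " + name);
        }
        double[] values = new double[rows.size()]; // one slot per data case in the file
        for (int r = 0; r < rows.size(); r++) {
            values[r] = Double.parseDouble(rows.get(r)[col].trim());
        }
        return values;
    }
}
```

For example, numericColumn(header, rows, "calories") and numericColumn(header, rows, "sodium") would give the x and y values for a calories-by-sodium plot.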
75 points - The user should be able to select an arbitrary data file (with any file name) through a dialog box. Obviously, for data sets about kinds of data other than cereals, the attribute names and types will change. Simply pick two variables to show on start-up. Also, provide the ability for the user to click on an item in order to select it. Then indicate which item was selected by showing some data about it, such as its name (the more adventurous among you might want to draw an image of the cereal's box cover!). You can show the item's information in a region of your user interface outside the scatter plot drawing panel, in a pop-up window, or in whatever way you think works best for the user. Also provide a simple search: the user should be able to type in a data case's name, and you should highlight that item.
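Selecting an item by clicking usually comes down to hit-testing: find the mark closest to the mouse position and accept it only if it lies within a few pixels. The sketch below assumes you keep each mark's screen position from the most recent paint; HitTester and nearest are hypothetical names.

```java
// Hypothetical helper (name chosen for illustration) for click selection.
final class HitTester {
    private HitTester() {}

    // Returns the index of the mark nearest (clickX, clickY), or -1 if no
    // mark lies within maxDistance pixels. px[i], py[i] is mark i's screen
    // position. Ties go to the later index, i.e. the mark painted on top.
    static int nearest(double[] px, double[] py, double clickX, double clickY, double maxDistance) {
        int best = -1;
        double bestDistSq = maxDistance * maxDistance; // compare squared distances, no sqrt
        for (int i = 0; i < px.length; i++) {
            double dx = px[i] - clickX;
            double dy = py[i] - clickY;
            double distSq = dx * dx + dy * dy;
            if (distSq <= bestDistSq) {
                bestDistSq = distSq;
                best = i;
            }
        }
        return best;
    }
}
```

Inside a MouseAdapter's mouseClicked(MouseEvent e), you could call nearest(px, py, e.getX(), e.getY(), 5) and, if the result is not -1, show that data case's details and repaint.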
100 points - Allow the user to change the attributes/variables that are drawn on each axis. You will probably want to provide some kind of pop-up or pull-down menu that lists the different variables in the data set. The user simply selects one of these, and your user interface updates to show the data using the newly selected variables.
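A JComboBox listing the attribute names is one straightforward pull-down menu for this. The hypothetical helper below builds one and reports the newly selected column index through a callback.

```java
import java.util.function.IntConsumer;
import javax.swing.JComboBox;

// Hypothetical helper (name chosen for illustration) that builds an axis
// selector listing every attribute name in the data set.
final class AxisSelector {
    private AxisSelector() {}

    // Preselects the initial column, then calls onChange with the new column
    // index whenever the selection changes. The listener is added after the
    // initial selection, so creating the box does not trigger onChange.
    static JComboBox<String> create(String[] attributeNames, int initial, IntConsumer onChange) {
        JComboBox<String> box = new JComboBox<>(attributeNames);
        box.setSelectedIndex(initial);
        box.addActionListener(e -> onChange.accept(box.getSelectedIndex()));
        return box;
    }
}
```

You might create one selector for each axis; in each callback, store the new column index, recompute that axis's values, and call repaint() on the plot panel.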
Extra credit - This assignment has a multitude of extra credit possibilities, limited only by your imagination. Some possibilities include:
For the truly ambitious - Develop other visualizations or representations of the data. One that we might suggest is a kind of graphical spreadsheet. Instead of showing textual values in spreadsheet cells, draw small horizontal bars (like in a bar chart) where the length of each bar is scaled to the value in that cell. For data that is not quantitative (like text) or that is categorical, you might use small, differently colored marks drawn at different horizontal positions within the cell to indicate the attribute value. This visualization also lends itself to many user interface customizations. For instance, when you click on a particular variable (column header), all the data cases (rows) are re-sorted according to that variable's values. An example of what such a visualization might look like is shown below. The figure is taken from the commercial system Eureka from Inxight.
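Re-sorting rows when a column header is clicked can be handled with a Comparator over the rows. The sketch below assumes rows are stored as String[] and that you already know whether the column is numeric; ColumnSort is a hypothetical name.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical helper (name chosen for illustration) for the spreadsheet view.
final class ColumnSort {
    private ColumnSort() {}

    // Sorts rows in place by column col. Numeric columns compare as numbers
    // (so "90" comes before "110"); text columns compare alphabetically,
    // ignoring case.
    static void sortByColumn(List<String[]> rows, int col, boolean numeric, boolean ascending) {
        Comparator<String[]> cmp;
        if (numeric) {
            cmp = Comparator.comparingDouble(r -> Double.parseDouble(r[col].trim()));
        } else {
            cmp = Comparator.comparing(r -> r[col], String.CASE_INSENSITIVE_ORDER);
        }
        rows.sort(ascending ? cmp : cmp.reversed());
    }
}
```

List.sort is stable, so rows with equal values keep their previous relative order, which lets users sort by one column and then another.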
The assignment is full of design choices that you will need to make. For instance, consider textual attributes like the cereal manufacturer, such as Quaker Oats or Post. How do you map such values along an axis?
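One common answer is to treat each distinct textual value as a category and give it its own evenly spaced position along the axis. The sketch below assigns positions in alphabetical order; that ordering, and the name CategoryAxis, are just one possible choice.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper (name chosen for illustration) for placing a textual
// attribute on an axis.
final class CategoryAxis {
    private CategoryAxis() {}

    // Maps each distinct value to 0, 1, 2, ... in alphabetical order.
    static Map<String, Integer> positions(Collection<String> values) {
        Map<String, Integer> pos = new LinkedHashMap<>();
        for (String v : new TreeSet<>(values)) { // TreeSet sorts and drops duplicates
            pos.put(v, pos.size());
        }
        return pos;
    }
}
```

The resulting positions 0 to n - 1 can then be scaled like any numeric value, perhaps with half a slot of padding at each end so the first and last categories do not sit on the axis boundaries.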
What happens when many members of the data set have very similar values on the two attributes being shown? You get a tight cluster of marks drawn on top of one another. A well-known information visualization technique called jitter addresses this: you slightly perturb the positions of elements to "spread them out" a little and make them easier to see.
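A minimal jitter sketch: each screen position is shifted by a small uniform random offset. Using a fixed seed (an assumption, not a requirement) means the same data always get the same offsets, so marks do not jump around each time the panel repaints. Jitter is a name chosen for illustration.

```java
import java.util.Random;

// Hypothetical helper (name chosen for illustration) implementing jitter.
final class Jitter {
    private Jitter() {}

    // Returns a copy of the positions, each shifted by a uniform random amount
    // in [-maxOffset, maxOffset). The same seed always yields the same offsets.
    static double[] apply(double[] positions, double maxOffset, long seed) {
        Random rng = new Random(seed);
        double[] out = new double[positions.length];
        for (int i = 0; i < positions.length; i++) {
            // nextDouble() is in [0, 1), so (2 * it - 1) is in [-1, 1).
            out[i] = positions[i] + (rng.nextDouble() * 2 - 1) * maxOffset;
        }
        return out;
    }
}
```

Applying jitter to pixel positions (rather than to data values) keeps the spread a fixed number of pixels regardless of each variable's units.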
Consider carefully the data structures you use to implement your program. As the data sets get larger, an efficient data structure becomes increasingly important for keeping the GUI's interactive response time fast.
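As one example, searching by name (the simple search in the 75-point level) can use a hash map built once after loading instead of scanning every row on each query. The NameIndex class below is a hypothetical sketch; lower-casing the keys to make search case-insensitive is a design choice, not a requirement.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical index (name chosen for illustration) from a data case's name
// to its row number, built once after loading. Each search is then a single
// hash lookup instead of a scan over every row.
final class NameIndex {
    private final Map<String, Integer> rowByName = new HashMap<>();

    NameIndex(List<String[]> rows, int nameColumn) {
        for (int r = 0; r < rows.size(); r++) {
            // If two data cases share a name, the first one wins.
            rowByName.putIfAbsent(key(rows.get(r)[nameColumn]), r);
        }
    }

    // Returns the row number for the typed name, or -1 if there is no match.
    int find(String typedName) {
        return rowByName.getOrDefault(key(typedName), -1);
    }

    // Normalizes names so that search ignores case and surrounding spaces.
    private static String key(String name) {
        return name.trim().toLowerCase(Locale.ROOT);
    }
}
```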
For fun, see if you can find or even create other interesting data sets. For instance, a data set about cars with variables for cost, horsepower, miles per gallon, etc., would be very interesting. How about a data set of players in a sport such as baseball, of products like cameras, or of different stocks?
Turn in all the Java source files used in your system and any accompanying media files your application needs. For simplicity, please include the cereals data set in your submission, along with any other data sets you used and would like to showcase. Important: please also turn in a one-page README.txt file that describes your program's functionality and highlights any key features you want to tell us about.