Sponsor: Ling Liu (lingliu@cc.gatech.edu)
Area: Information and Knowledge Management
Problem
Data clustering is the unsupervised classification of patterns into
groups, such that patterns in the same group are more similar to each other
than to patterns in other groups. Cluster analysis is an important technique
for understanding large multi-dimensional datasets. Most clustering research to date has been
focused on developing automatic clustering algorithms and cluster validation
methods. The automatic algorithms are known to work well in dealing with
clusters of regular shapes, e.g. compact spherical shapes, but may incur higher
error rates when dealing with arbitrarily shaped clusters. Arbitrarily shaped
clusters also bring extra problems in cluster validation and cluster hierarchy
definition.
Interactive data visualization has proved effective for understanding complicated multi-dimensional datasets in many applications, and it applies well to cluster analysis. We have developed an interactive visual data-clustering tool called VISTA. VISTA maps multi-dimensional (>3D) data onto the 2D visual space via the VISTA visualization model, which also supports a user interface for conveniently manipulating the mapping. With the VISTA tool, it is possible to observe the distribution of irregularly shaped clusters.
The tool is available at:
http://disl.cc.gatech.edu/VISTA/
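To give a feel for what mapping multi-dimensional data onto a 2D visual space can look like, here is a minimal sketch. It assumes a star-coordinates-style projection in which each dimension has an axis angle (theta) and a weight (alpha), plus a global zoom factor. This is only an illustration under those assumptions and may differ from the actual VISTA visualization model; the function names `normalize_columns`, `default_thetas`, and `project` are hypothetical and not part of the tool.

```python
import math


def normalize_columns(rows):
    """Scale every dimension to [-1, 1] using its own min and max.

    Assumption: per-dimension max-min normalization; the tool may
    normalize differently.
    """
    columns = list(zip(*rows))
    scaled = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        if span == 0:
            # A constant dimension carries no information; map it to 0.
            scaled.append([0.0] * len(col))
        else:
            scaled.append([2.0 * (v - lo) / span - 1.0 for v in col])
    return [list(r) for r in zip(*scaled)]


def default_thetas(k):
    """Spread k axes evenly around the circle."""
    return [2.0 * math.pi * i / k for i in range(k)]


def project(rows, alphas, thetas, zoom=1.0):
    """Project k-dimensional rows to 2D points.

    Each normalized value is weighted by its alpha and laid along the
    axis at angle theta; the contributions are summed, averaged over
    the k dimensions, and scaled by the zoom factor.
    """
    k = len(alphas)
    points = []
    for row in normalize_columns(rows):
        x = sum(a * v * math.cos(t) for a, v, t in zip(alphas, row, thetas))
        y = sum(a * v * math.sin(t) for a, v, t in zip(alphas, row, thetas))
        points.append((zoom * x / k, zoom * y / k))
    return points


if __name__ == "__main__":
    data = [[5.1, 3.5, 1.4, 0.2], [6.2, 2.9, 4.3, 1.3], [7.7, 3.0, 6.1, 2.3]]
    alphas = [1.0, 0.5, 1.0, 0.5]  # hypothetical per-dimension weights
    for point in project(data, alphas, default_thetas(4), zoom=2.0):
        print(point)
```

In a sketch like this, changing one alpha value changes how strongly that dimension pulls points along its axis, which is the kind of adjustment an interactive interface would let you make while watching the clusters move.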
Here is what you need to do.
Deliverables
The dataset you used and a report including:
· a description of the dataset you used,
· the interpretation of clusters,
· any domain knowledge about the dataset,
· your findings and the screenshots,
· a discussion of the differences and consistencies between your findings and the documentation, if documentation about the cluster distribution exists,
· the parameter settings of the interesting visualizations (alpha values, theta values, and zooming factor). Note: alpha and theta values are estimated from the visualization.
Evaluation
Evaluation is based on the report submitted to the project sponsor by the due date.