Ling Liu (firstname.lastname@example.org)
Information and Knowledge Management
Data clustering is the unsupervised classification of patterns into groups, such that the patterns in the same group are more similar to one another than to the patterns in other groups. Cluster analysis is an important technique for understanding large multi-dimensional datasets. Most clustering research to date has focused on developing automatic clustering algorithms and cluster validation methods. Automatic algorithms are known to work well on clusters of regular shape, e.g. compact spherical clusters, but may incur higher error rates on arbitrarily shaped clusters. Arbitrarily shaped clusters also complicate cluster validation and the definition of cluster hierarchies.
Interactive data visualization has proved effective for understanding complicated multi-dimensional datasets in many applications, and it can be applied to cluster analysis as well. We have developed an interactive visual data-clustering tool called VISTA. VISTA maps multi-dimensional (>3D) data onto a 2D visual space via the VISTA visualization model, which also supports a user interface for adjusting the mapping conveniently. With the VISTA tool, it is possible to observe the distribution of irregularly shaped clusters.
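To make the idea of mapping multi-dimensional data onto 2D more concrete, here is a minimal sketch of a star-coordinates-style projection that uses per-dimension alpha weights, per-dimension theta axis angles, and a zooming factor, the three parameters the report asks you to record. The formula, the function names (scale_column, default_thetas, star_map) and the default angle layout are assumptions made for illustration only; they are not taken from the VISTA documentation, and the tool's actual model may differ.

```python
import math


def scale_column(values):
    """Linearly rescale one data column to the range [-1, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; place it at the center.
        return [0.0 for _ in values]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def default_thetas(k):
    """Spread k axes evenly around the circle (an assumed starting layout)."""
    return [2.0 * math.pi * i / k for i in range(k)]


def star_map(point, alphas, thetas, zoom=1.0):
    """Project one k-dimensional point (already scaled to [-1, 1]) to 2D.

    alphas: per-dimension weights, thetas: per-dimension axis angles in
    radians, zoom: overall scaling of the 2D image.
    """
    k = len(point)
    if not (len(alphas) == k and len(thetas) == k):
        raise ValueError("point, alphas and thetas must have the same length")
    x = (zoom / k) * sum(a * v * math.cos(t)
                         for a, v, t in zip(alphas, point, thetas))
    y = (zoom / k) * sum(a * v * math.sin(t)
                         for a, v, t in zip(alphas, point, thetas))
    return x, y


# Usage: project a 4-dimensional point with all weights set to 1.
thetas = default_thetas(4)
print(star_map([1.0, 0.0, -0.5, 0.25], [1.0] * 4, thetas, zoom=2.0))
```

In a sketch like this, changing one alpha value stretches or shrinks the contribution of a single dimension, which is one way overlapping clusters in the 2D view can sometimes be pulled apart.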
The tool is available at:
Here is what you need to do.
Turn in the dataset you used and a report that includes:
· a description of the dataset you used,
· the interpretation of clusters,
· any domain knowledge about the dataset,
· your findings and screenshots,
· a discussion of the differences and consistencies between your findings and the documentation, if there is documentation about the cluster distribution,
· the parameter settings of the interesting visualizations (alpha values, theta values, and zooming factor). Note: the alpha and theta values are estimated from the visualization.
Grading is based on the report turned in to the sponsor of the project by the due date.