In this project, you will apply several algorithms to two data sets. Please answer the questions in the order in which they appear. Do not skip ahead to later steps in order to answer earlier questions that ask you to predict outcomes based on your analysis of the data and your understanding of the algorithms.
Submit your report as a PDF titled <yourgtname>-project2.pdf, attached to your Project 2 submission on T-Square, by the end of the day on the due date. T-Square will be set to accept repeat submissions.
Copy and paste the text below into a text editor, or use this LaTeX template: yourgtname-project2.tex. Submit only a PDF. Sending me an MS Word document is equivalent to sending me nothing.
Run SVMs on the Iris data set from Project 1 using polynomial and RBF kernels.
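A minimal sketch of this step, assuming scikit-learn is available and that its bundled copy of Iris matches the Project 1 data. The kernel hyperparameters (degree 3, coef0=1.0, C=1.0, gamma="scale") are illustrative starting points, not tuned or required values.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Iris ships with scikit-learn; it should match the 150-instance data set used in Project 1.
X, y = load_iris(return_X_y=True)

# Kernel hyperparameters are illustrative starting points, not tuned values.
svms = {
    "poly": SVC(kernel="poly", degree=3, coef0=1.0, C=1.0),
    "rbf": SVC(kernel="rbf", gamma="scale", C=1.0),
}

scores = {}
for name, svm in svms.items():
    # Scaling lives inside the pipeline so each fold is standardized using its own training split.
    model = make_pipeline(StandardScaler(), svm)
    scores[name] = cross_val_score(model, X, y, cv=10).mean()
    print(f"{name} kernel: mean 10-fold accuracy = {scores[name]:.3f}")
```

Varying `degree`, `C`, and `gamma` in the dictionary is one way to compare how sensitive each kernel is to its settings.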
Run PCA to reduce the dimensionality of the churn calibration data and run SVMs on the reduced data as you did for the original data.
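One way to set this up, sketched under assumptions: scikit-learn is available, and the churn calibration data is replaced here by a synthetic stand-in from `make_classification` (its shape and class balance are made up). The component counts tried (2, 5, 10) are hypothetical; in practice they would come from the explained-variance curve.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# HYPOTHETICAL stand-in for the churn calibration data: replace X, y with the real
# attribute matrix and churn labels. The shape and class balance here are made up.
X, y = make_classification(n_samples=500, n_features=20, n_informative=6,
                           weights=[0.85, 0.15], random_state=0)

results = {}
for n_components in (2, 5, 10):  # illustrative choices, not recommended values
    # Scaling and PCA are fit inside the pipeline, so each fold's projection uses only its training data.
    model = make_pipeline(StandardScaler(), PCA(n_components=n_components),
                          SVC(kernel="rbf", gamma="scale"))
    results[n_components] = cross_val_score(model, X, y, cv=5).mean()
    print(f"PCA to {n_components} components: mean 5-fold accuracy = {results[n_components]:.3f}")
```

Because churn labels are usually imbalanced, accuracy alone can be misleading; `cross_val_score` also accepts a `scoring` argument such as `"roc_auc"`.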
Reduce the dimensionality of the churn calibration data using subset selection.
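Greedy forward selection is one form of subset selection; the sketch below uses scikit-learn's `SequentialFeatureSelector` (available since version 0.24) wrapped around an SVM. The synthetic data and the target of 4 attributes are assumptions for illustration only; backward elimination (`direction="backward"`) is the other common variant.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# HYPOTHETICAL stand-in for the churn calibration data (replace with the real X, y).
X, y = make_classification(n_samples=300, n_features=12, n_informative=4,
                           weights=[0.85, 0.15], random_state=0)

# Greedy forward selection: start with no attributes, repeatedly add the one that most
# improves 5-fold cross-validated accuracy of the wrapped SVM, and stop at 4 attributes.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="scale"))
selector = SequentialFeatureSelector(svm, n_features_to_select=4,
                                     direction="forward", cv=5)
selector.fit(X, y)

selected = selector.get_support(indices=True)  # column indices of the kept attributes
X_reduced = selector.transform(X)
print("selected attribute indices:", selected)
```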
Run AdaBoost with decision stumps on the Iris data.
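A short sketch, assuming scikit-learn: its `AdaBoostClassifier` uses `DecisionTreeClassifier(max_depth=1)`, a decision stump, as the default weak learner, so no estimator argument is passed. The choice of 50 rounds is illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# The default weak learner is DecisionTreeClassifier(max_depth=1), i.e. a decision stump.
ada = AdaBoostClassifier(n_estimators=50, random_state=0)
cv_scores = cross_val_score(ada, X, y, cv=10)
print(f"AdaBoost (up to 50 stumps): mean 10-fold accuracy = {cv_scores.mean():.3f}")

# Fit on all data to inspect the ensemble: each weak learner should be a single split.
ada.fit(X, y)
depths = {stump.get_depth() for stump in ada.estimators_}
print("weak-learner depths:", depths)
```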
Use PCA to reduce the dimensionality of the churn calibration data to visualize the data set in two or three dimensions.
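A hedged sketch of the projection step, assuming scikit-learn and NumPy, with a synthetic stand-in for the churn data. It only computes the coordinates and explained variance; plotting (for example with a scatter plot colored by churn label) is left to whatever tool you prefer.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# HYPOTHETICAL stand-in for the churn calibration data (replace with the real X, y).
X, y = make_classification(n_samples=500, n_features=20, n_informative=6,
                           weights=[0.85, 0.15], random_state=0)

# Standardize first so attributes measured on large scales do not dominate the components.
X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=3)
Z = pca.fit_transform(X_std)  # columns: coordinates on PC1, PC2, PC3

print("explained variance ratio:", np.round(pca.explained_variance_ratio_, 3))
print("cumulative:", np.round(np.cumsum(pca.explained_variance_ratio_), 3))
# Z[:, :2] (or all of Z) can be scatter-plotted, colored by y, to look for churn structure.
```

The cumulative ratio indicates how much of the total variance a 2-D or 3-D picture actually shows, which is worth reporting alongside the plot.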
Use one of the techniques discussed in class to choose k (the number of clusters) for k-means and EM on the churn calibration data, with the churn labels removed.
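Which techniques were covered in class is not stated here, so the sketch below shows two common ones as assumptions: the silhouette score for k-means (higher is better) and the Bayesian information criterion (BIC) for a Gaussian mixture fit by EM (lower is better). It uses scikit-learn and a synthetic, unlabeled stand-in with four planted clusters; the range k = 2..8 is arbitrary.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# HYPOTHETICAL stand-in for the unlabeled churn attributes; the churn column is not used.
X, _ = make_blobs(n_samples=500, n_features=6, centers=4, random_state=0)
X = StandardScaler().fit_transform(X)

silhouette, bic = {}, {}
for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    silhouette[k] = silhouette_score(X, km.labels_)  # higher is better
    gm = GaussianMixture(n_components=k, random_state=0).fit(X)
    bic[k] = gm.bic(X)  # lower is better

best_k_kmeans = max(silhouette, key=silhouette.get)
best_k_em = min(bic, key=bic.get)
print("k by silhouette (k-means):", best_k_kmeans)
print("k by BIC (EM):", best_k_em)
```

An elbow plot of `km.inertia_` against k is another frequently used heuristic and can be added to the same loop.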
Run k-means and EM on the data.
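A minimal sketch, assuming scikit-learn and a synthetic stand-in; `k = 4` is a placeholder for the value chosen in the previous step. EM is represented here by scikit-learn's `GaussianMixture`, which fits a Gaussian mixture with the EM algorithm and returns soft membership probabilities as well as hard labels.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# HYPOTHETICAL stand-in data and k; use the churn attributes and the k chosen in the previous step.
X, _ = make_blobs(n_samples=500, n_features=6, centers=4, random_state=0)
X = StandardScaler().fit_transform(X)
k = 4

# k-means: hard assignment of each instance to its nearest centroid.
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
km_labels = kmeans.labels_

# EM for a Gaussian mixture: soft membership probabilities, plus a hard label from the argmax.
em = GaussianMixture(n_components=k, covariance_type="full", random_state=0).fit(X)
em_probs = em.predict_proba(X)
em_labels = em.predict(X)

print("k-means cluster sizes:", np.bincount(km_labels, minlength=k))
print("EM cluster sizes:", np.bincount(em_labels, minlength=k))
```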
Use the clusters as labels for the churn data, run a decision tree algorithm, extract rules from the resulting tree, and give descriptive names to the labels/clusters. You may use either the original churn data or the dimensionality-reduced churn data.
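A sketch of this step under stated assumptions: scikit-learn is available, the attributes are a synthetic stand-in, and `attr_0` through `attr_3` are hypothetical placeholders for the real column names. `export_text` prints the fitted tree as nested if/else conditions, so each root-to-leaf path reads as one rule.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.tree import DecisionTreeClassifier, export_text

# HYPOTHETICAL stand-in for the churn attributes; the names below are placeholders
# for the real column names, which make the extracted rules readable.
X, _ = make_blobs(n_samples=500, n_features=4, centers=3, random_state=0)
feature_names = ["attr_0", "attr_1", "attr_2", "attr_3"]

# Cluster labels from k-means (EM labels can be substituted) become the tree's target.
cluster_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# A shallow tree keeps the rule set short enough to read and name.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, cluster_labels)
rules = export_text(tree, feature_names=feature_names)
print(rules)
print("training agreement with clusters:", tree.score(X, cluster_labels))
```

Reading the conditions that lead to each `class:` leaf (for instance, high values of one attribute and low values of another) is what suggests a descriptive name for that cluster.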
Now use the clusters as attributes for the churn data instances, add the churn labels, and run SVMs to predict churn as you did in the first part of this project.
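One possible reading of this step, sketched with scikit-learn: each instance's cluster id is one-hot encoded and used either on its own or appended to the original attributes, and an SVM then predicts the churn label. The synthetic data, `k = 4`, and the two feature sets compared are all assumptions for illustration. For brevity, scaling and clustering are fit on all instances; neither uses the churn labels.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC

# HYPOTHETICAL stand-in for the churn calibration attributes X and churn labels y.
X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           weights=[0.85, 0.15], random_state=0)
X = StandardScaler().fit_transform(X)

# Cluster without looking at y, then one-hot encode each instance's cluster id.
k = 4  # placeholder; use the k chosen earlier
cluster_ids = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
onehot = OneHotEncoder().fit_transform(cluster_ids.reshape(-1, 1)).toarray()

feature_sets = {
    "clusters only": onehot,
    "original + clusters": np.hstack([X, onehot]),
}
scores = {}
for name, features in feature_sets.items():
    svm = SVC(kernel="rbf", gamma="scale")
    scores[name] = cross_val_score(svm, features, y, cv=5).mean()
    print(f"{name}: mean 5-fold accuracy = {scores[name]:.3f}")
```

EM cluster memberships (hard labels or the `predict_proba` columns) can be swapped in for the k-means ids in the same way.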