In this project, you will apply several algorithms to two data sets. Answer each question in the order it appears; do not skip ahead to later steps to answer earlier questions that ask you to predict outcomes based on your analysis of the data and your understanding of the algorithms.

Submit your report, titled <yourgtname>-project2.pdf, as an attachment to a submission to Project 2 on T-Square by the end of the day on the due date. T-Square will be set to accept repeat submissions.

Data sets:

- The telecom churn data set on T-Square
- The Iris data set from Weka

Algorithms (Weka implementations):

- Clustering
- clusterers/SimpleKMeans
- clusterers/EM

- Decision Trees
- classifiers/trees/J48

- PCA
- attributeSelection/PrincipalComponents

- SVM
- classifiers/functions/SMO

- Boosting
- classifiers/meta/AdaBoostM1

Copy and paste the text below into a text editor, or use this LaTeX template: yourgtname-project2.tex. Submit only a PDF; sending me an MS Word document is equivalent to sending me nothing.

**Run SVMs on the Iris data set from Project 1 using polynomial and RBF
kernels.**

- Which kernel works better? Why?
- How did the SVMs compare to the classifiers from Project 1 in terms of training time and test performance?

**Run SVMs on the churn calibration data using polynomial and RBF
kernels.**

- Which kernel works better? Why?
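One way to carry out the kernel comparison is sketched below. This is an assumption on my part: it uses scikit-learn's `SVC` as a stand-in for Weka's classifiers/functions/SMO, with 10-fold cross-validation mirroring Weka's default evaluation.

```python
# Hedged sketch: scikit-learn's SVC stands in for Weka's SMO (an assumption;
# the assignment itself expects Weka).
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Compare a polynomial kernel against an RBF kernel with 10-fold CV.
for kernel in ("poly", "rbf"):
    clf = SVC(kernel=kernel, degree=3, gamma="scale")
    scores = cross_val_score(clf, X, y, cv=10)
    print(f"{kernel}: mean accuracy = {scores.mean():.3f}")
```

The same loop applies to the churn calibration data once it is loaded into a feature matrix and label vector.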

**Run PCA to reduce the dimensionality of the churn calibration data and
run SVMs on the reduced data as you did for the original data.**

- How many principal components did you pick? Why? How much of the variance in the churn data is described by the principal components you chose?
- How did the SVMs perform on the reduced data compared to the original data? Why?
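A minimal sketch of the PCA-then-SVM step, assuming scikit-learn in place of Weka's attributeSelection/PrincipalComponents and Iris as a stand-in for the churn calibration data:

```python
from sklearn.datasets import load_iris  # stand-in; the churn data is on T-Square
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Standardize first (PCA is scale-sensitive), then keep the smallest number
# of components that explains at least 95% of the variance.
pca = PCA(n_components=0.95)
Xr = pca.fit_transform(StandardScaler().fit_transform(X))
print("components kept:", pca.n_components_)
print("variance explained:", pca.explained_variance_ratio_.sum())

# Re-run the same SVM on the reduced data for a direct comparison.
scores = cross_val_score(SVC(kernel="rbf"), Xr, y, cv=10)
print("RBF SVM on reduced data:", scores.mean())
```

The 95% threshold is one common choice; the question asks you to justify your own.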

**Reduce the dimensionality of the churn calibration data using subset
selection.**

- Compare the process of using subset selection to PCA.
- How did the performance on the reduced data set using subset selection compare to the PCA reduced data?
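Subset selection keeps original attributes rather than projecting onto new axes. A minimal greedy forward-selection sketch (a wrapper method, comparable in spirit to Weka's attribute selection with a BestFirst search; Iris again stands in for the churn data):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
selected, remaining = [], list(range(X.shape[1]))

while remaining:
    # Try adding each remaining attribute; keep the one that helps most.
    best_f, best_s = None, -1.0
    for f in remaining:
        s = cross_val_score(SVC(kernel="rbf"), X[:, selected + [f]], y, cv=5).mean()
        if s > best_s:
            best_f, best_s = f, s
    base = (cross_val_score(SVC(kernel="rbf"), X[:, selected], y, cv=5).mean()
            if selected else 0.0)
    if best_s <= base:  # stop when no attribute improves CV accuracy
        break
    selected.append(best_f)
    remaining.remove(best_f)

print("selected attribute indices:", selected)
```

Because the selected columns are original attributes, the resulting model stays interpretable, which is one axis of the comparison with PCA.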

**Run AdaBoost with decision stumps on the Iris data.**

- How did the boosted decision stumps compare to the J48 classifier from Project 1?
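A hedged sketch of boosting decision stumps (depth-1 trees), using scikit-learn's `AdaBoostClassifier` as an assumed analogue of Weka's classifiers/meta/AdaBoostM1:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# A decision stump is a depth-1 tree; boost 50 of them.
boosted = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=50)
scores = cross_val_score(boosted, X, y, cv=10)
print("boosted stumps, mean accuracy:", scores.mean())
```

Comparing this number (and the fitting time) against your Project 1 J48 results answers the question.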

**Use PCA to reduce the dimensionality of the churn calibration data to
visualize the data set in two or three dimensions.**

- How much of the variance in the data is described by the first two or three principal components?
- What does the visualization tell you about the data?
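A sketch of the projection step, assuming scikit-learn and Iris as a stand-in; the printed ratio answers the variance question, and the 2-D coordinates are what you would scatter-plot:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

# Project onto the first two principal components.
pca = PCA(n_components=2)
X2 = pca.fit_transform(StandardScaler().fit_transform(X))
print("variance described by first 2 PCs:", pca.explained_variance_ratio_.sum())
# X2[:, 0] vs X2[:, 1] can then be scatter-plotted (e.g. with matplotlib)
# to inspect cluster structure.
```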

**Use one of the techniques discussed in class to choose a value of k
(the number of clusters) for k-means and EM on the churn calibration data
without the churn labels.**

- How did you choose k? Why?
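One such technique is the silhouette score; a sketch, assuming scikit-learn stands in for Weka and Iris (without its labels) for the churn calibration data:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score

X, _ = load_iris(return_X_y=True)  # labels deliberately ignored

# Score each candidate k by the mean silhouette of its clustering.
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
best_k = max(scores, key=scores.get)
print("silhouette per k:", scores, "-> chosen k =", best_k)
```

The elbow method on within-cluster sum of squares is an equally valid alternative; the question only asks that you justify whichever you use.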

**Run k-means and EM on the data.**

- Are the clusters good? Why?
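Running both clusterers side by side might look like this (scikit-learn's `KMeans` and `GaussianMixture` assumed as analogues of Weka's SimpleKMeans and EM, Iris as the stand-in data):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture

X, _ = load_iris(return_X_y=True)
km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
em_labels = GaussianMixture(n_components=3, random_state=0).fit_predict(X)
# "Good" can be argued via internal measures (silhouette, log-likelihood)
# or, after the fact, agreement with the held-out churn labels.
print("k-means cluster sizes:", sorted((km_labels == i).sum() for i in range(3)))
print("EM cluster sizes:", sorted((em_labels == i).sum() for i in range(3)))
```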

**Use the clusters as labels for the churn data, run a decision tree
algorithm, extract rules from the tree, and give descriptive names to
the labels/clusters. You may use the original churn data or the
dimensionality-reduced churn data.**

- Is there a reason to prefer dimensionality-reduced data over the original data?
- Which dimensionality reduction technique did you use? Why?
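The clusters-as-labels step can be sketched as follows (again assuming scikit-learn in place of Weka's J48 and Iris in place of the churn data); the printed rule conditions are what you would read off to name each cluster:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
X = data.data
cluster_ids = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Fit a small tree to the cluster ids and print its rules; the split
# conditions suggest descriptive names for each cluster.
tree = DecisionTreeClassifier(max_depth=3).fit(X, cluster_ids)
print(export_text(tree, feature_names=list(data.feature_names)))
```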

**Now use the clusters as attributes for the churn data instances, add
the churn labels, and run SVMs to predict churn as you did in the
first part of this project.**

- How did the performance compare to previous methods?
- What does this suggest about the clusters and the knowledge extracted from them?
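The clusters-as-attributes step might be sketched like this (same scikit-learn/Iris stand-in assumptions as above):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Append the cluster id as an extra attribute (one-hot encoding is an
# alternative), then predict the class labels with the same RBF SVM as
# in the first part. Note: clustering the full data before cross-validation
# leaks a little information; a stricter pipeline would cluster per fold.
cluster_ids = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
X_aug = np.hstack([X, cluster_ids.reshape(-1, 1)])
scores = cross_val_score(SVC(kernel="rbf"), X_aug, y, cv=10)
print("accuracy with cluster attribute:", scores.mean())
```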