CS 7641
Machine Learning
Assignment #3
Unsupervised Learning and Dimensionality Reduction

Numbers

Due: April 3, 2009 23:59:59 EST
Please submit via tsquare.

The assignment is worth 10% of your final grade.

Why?

So far this term we have explored supervised learning algorithms. Now it's time to explore unsupervised learning algorithms. This assignment asks you to use some of the clustering and dimensionality reduction algorithms we've looked at in class and to revisit earlier assignments. The goal is for you to think about how these algorithms resemble, differ from, and interact with your earlier work.

The same ground rules for programming languages apply as with assignments #1 and #2.

Read everything below carefully!

The Problems Given to You

You are to implement (or find the code for) six algorithms. The first two are clustering algorithms:

You can choose your own measures of distance/similarity. Naturally, you'll have to justify your choices, but you're practiced at that sort of thing by now.
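
As a minimal sketch of what a choice of distance measure might look like, the following Python functions compute three common options. The function names are illustrative only, and nothing here implies that these are the measures you should pick; which one is appropriate depends on your data and is part of what you need to justify.

```python
import math

def euclidean(a, b):
    """Straight-line distance; sensitive to the scale of each feature."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    """One minus cosine similarity; ignores vector length, compares direction.
    Assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
```

Note that the measures can disagree: two vectors pointing in the same direction but with different lengths have cosine distance zero yet a nonzero Euclidean distance.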

The last four algorithms are dimensionality reduction algorithms:

You are to run a number of experiments. Come up with at least two datasets. If you'd like (and it makes a lot of sense in this case) you can use the ones you used in previous assignments.

  1. Run the clustering algorithms on the datasets and describe what you see.
  2. Apply the dimensionality reduction algorithms to the two datasets and describe what you see.
  3. Reproduce your clustering experiments, but on the data after you've run dimensionality reduction on it.
  4. Apply the dimensionality reduction algorithms to one of your datasets from assignment #1 (if you've reused the datasets from assignment #1 to do experiments 1-3 above then you've already done this) and rerun your neural network learner on the newly projected data.
  5. Apply the clustering algorithms to the same dataset to which you just applied the dimensionality reduction algorithms (you've probably already done this), treating the clusters as if they were new features. In other words, treat the clustering algorithms as if they were dimensionality reduction algorithms. Again, rerun your neural network learner on the newly projected data.
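
One possible reading of "treating the clusters as if they were new features" in experiment 5 is sketched below in Python: each point is replaced by a one-hot vector marking the cluster it falls into. The helper names (`nearest_center`, `cluster_features`) and the toy `centers` and `points` are hypothetical, and the cluster centers are assumed to come from whichever clustering algorithm you ran. Other encodings, such as distances to every center or soft membership weights, are equally reasonable.

```python
import math

def nearest_center(point, centers):
    """Return the index of the center closest to point (Euclidean distance)."""
    dists = [math.dist(point, c) for c in centers]
    return dists.index(min(dists))

def cluster_features(points, centers):
    """Replace each point with a one-hot vector of its cluster assignment."""
    feats = []
    for p in points:
        row = [0.0] * len(centers)
        row[nearest_center(p, centers)] = 1.0
        feats.append(row)
    return feats

# Toy data: two assumed cluster centers and three points.
centers = [(0.0, 0.0), (10.0, 10.0)]
points = [(1.0, 0.5), (9.0, 11.0), (0.2, -0.3)]
features = cluster_features(points, centers)
```

The resulting `features` rows would then stand in for the original attributes as input to the neural network learner.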

What to Turn In

  1. A file named README.txt that contains instructions for running your code.
  2. Your code.
  3. A file named analysis.pdf that contains your writeup.
  4. Any supporting files you need (for example, your datasets).

The file analysis.pdf should contain:

It might be difficult to generate the same kinds of graphs for this assignment as you did before; however, you should come up with some way to describe the kinds of clusters you get. If you can do that visually, all the better. Note: the analysis writeup is limited to 10 pages.
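
When a plot is hard to produce, a small numeric summary can still describe the clusters. The Python sketch below is one assumed approach, not a required one: for each cluster it reports its size, its most common class label, and the fraction of members carrying that label. It assumes your dataset has class labels (as datasets reused from the supervised assignments would), and the function name `describe_clusters` is hypothetical.

```python
from collections import Counter, defaultdict

def describe_clusters(assignments, labels):
    """Summarize each cluster by size, majority label, and purity.

    assignments: cluster index for each point
    labels: known class label for each point (same order)
    """
    by_cluster = defaultdict(Counter)
    for cluster, label in zip(assignments, labels):
        by_cluster[cluster][label] += 1

    summary = {}
    for cluster, counts in sorted(by_cluster.items()):
        size = sum(counts.values())
        majority_label, majority_count = counts.most_common(1)[0]
        summary[cluster] = {
            "size": size,
            "majority": majority_label,
            "purity": majority_count / size,
        }
    return summary

# Toy example: five points, two clusters, two class labels.
summary = describe_clusters([0, 0, 0, 1, 1], ["a", "a", "b", "b", "b"])
```

A table built from such a summary shows at a glance whether the clusters line up with the labels you already know.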

Grading Criteria

As always, you are being graded on your analysis more than anything else.