CS 4600
Introduction to Intelligent Systems
Project #5
Supervised Learning

Numbers

Due: December 4, 2003 23:59:59 EST
Please email your code and explanations to Patrick Yaner with the subject line "cs4600 project #5" unless you're in the graduate class, in which case replace cs4600 with cs6600. Also, be certain to include your name in your message.

The assignment is worth 6% of your final grade. There is an opportunity for bonus points.

Why?

The purpose of this project is to explore supervised learning. As with project 3, it is important to realize that understanding an algorithm or technique requires understanding how it behaves under a variety of circumstances. As such, you will be asked to implement some simple learning algorithms, and to compare their performance.

The language? Any language you want, so long as it is LISP.

Read everything below carefully!

The Problems Given to You

You must implement three learning algorithms. They are:

Each is described in detail in your textbook.

In addition, you must design and implement a classification problem.

For the purposes of this assignment, a "classification problem" is just:

Note: to make things really simple, let us assume that every element of your n-dimensional vector can only take on one of two values: either 0 or 1 (in other words, they are boolean attributes).

You might think of taking the problem you came up with for part A of project 4 and using that, or perhaps some slightly more complicated variation on it. Alternatively, you could come up with a different one altogether.

If you choose to go the first route and notice that your old problem had variables that could take on more than two values, do not fret. It is simple to take any multi-valued attribute and turn it into a set of boolean-valued attributes. For example, the attribute Restaurant = {Thai, Burger, Italian, French} can be turned into four boolean attributes (that we might name Thai?, Burger?, Italian?, French?).
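As a minimal sketch of that conversion, the function below maps one value of a multi-valued attribute to a list of 0/1 flags, one per possible value. The names one-hot and *restaurant-types* are hypothetical and are not part of the required interface.

```lisp
;; Hypothetical helper, not part of the assignment's required interface.
(defun one-hot (value domain)
  "Return a list of 0/1 flags, one per element of DOMAIN, with a 1 where VALUE matches."
  (mapcar (lambda (candidate) (if (eql value candidate) 1 0))
          domain))

;; The four values of the Restaurant attribute from the example above.
(defparameter *restaurant-types* '(thai burger italian french))

(one-hot 'italian *restaurant-types*)  ; => (0 0 1 0)
```

The four resulting flags play the role of Thai?, Burger?, Italian?, and French?, and can be appended to the rest of an example's boolean attributes.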

We will provide you with:

What to Turn In

You must email the TA two attachments.

  1. The first is a LISP file named yourgtaccount.lisp containing functions named knn, dt, ds, and ab.

    If your LISP code contains any helper functions or variables, please suffix those functions and variables with -yourgtaccount.

    The signatures of your implementations are:

      (defun knn (examples classes k) ...)
      (defun dt (examples classes) ...)
      (defun ds (examples classes weights) ...)
      (defun ab (examples classes learning-function) ...)
    

    The weights argument is a list containing one weight value per example. Each implementation should return a function that, when funcall'd on an example, returns a classification label (t or nil). The learning-function parameter of ab should be a function with the same signature as ds.
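To illustrate the calling convention only, here is a sketch of a trivial baseline learner with the same signature as ds. It is not one of the required algorithms. The name majority-learner is hypothetical, and the sketch assumes that examples is a list of 0/1 lists, classes is a parallel list of t/nil labels, and weights is a parallel list of non-negative numbers.

```lisp
;; Hypothetical baseline learner; it is NOT one of the required algorithms.
;; Assumes EXAMPLES is a list of 0/1 lists, CLASSES a parallel list of t/nil
;; labels, and WEIGHTS a parallel list of non-negative numbers.
(defun majority-learner (examples classes weights)
  "Return a classifier that always predicts the label with the larger total weight."
  (declare (ignore examples))
  (let ((weight-t 0)
        (weight-nil 0))
    (loop for label in classes
          for w in weights
          do (if label
                 (incf weight-t w)
                 (incf weight-nil w)))
    (let ((prediction (> weight-t weight-nil)))
      ;; The returned closure is what the caller funcalls on a new example.
      (lambda (example)
        (declare (ignore example))
        prediction))))

;; Learn from three weighted examples, then classify an unseen one.
(let ((classifier (majority-learner '((0 1) (1 1) (1 0))
                                    '(t t nil)
                                    '(1/3 1/3 1/3))))
  (funcall classifier '(0 0)))  ; => T
```

Because the learner returns a closure, anything with this signature could in principle be passed as the learning-function argument of ab.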

  2. Second, you should include a text file named yourgtaccount.txt with your name and gt account at the top that contains an analysis of:

Bonus Points

For an extra 2% (that doesn't sound like much, but, hey, that's the equivalent of adding 33% to this grade), implement a version of dt, named dtw, so that:

  1. it does take weights into account.
  2. it implements pruning.

Run ab again, but this time using your weighted decision tree learner. Include the results of running dtw alone (you might get different answers running it alone than you would with dt because dtw implements pruning) and ab using dtw in your analysis.

Naturally, because it is a learning function to be used by ab, this is the signature for dtw:

   (defun dtw (examples classes weights) ...)

A side note worth reading

If you think about it, you'll realize that doing this bonus is very little additional work. In fact, you really only have to write one function for ds, dt and dtw. After all, each is just a special case of a more general function. We might call that function:

  (defun dt-with-weights-a-pruning-value-and-maximum-depth-limit (...) ...)

So ds, dt, and dtw are just functions that call this more general function... but with different values for the parameters it would need.

Of course, you are not required to do the bonus or, even if you do the bonus, to try to implement it this way, but I suspect it will make the problem easier.
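The wrapper structure described in this side note might look roughly like the sketch below. Everything here is an assumption made for illustration: the names all carry a -sketch suffix and are hypothetical, ds is taken to mean a decision stump (a tree limited to a single split), the pruning value 0.05 is arbitrary, and the body of the general learner is only a placeholder (a weighted majority vote) so that the wrappers can be loaded and called. A real version would grow and prune a tree there.

```lisp
;; A sketch only. The general learner's body is a placeholder (a weighted
;; majority vote); a real version would build a tree that honours MAX-DEPTH
;; and PRUNE-THRESHOLD. All names here are hypothetical.
(defun general-tree-learner-sketch (examples classes weights
                                    &key max-depth prune-threshold)
  "Placeholder for a learner parameterised by weights, depth limit, and pruning."
  (declare (ignore examples max-depth prune-threshold))
  (let ((vote (loop for label in classes
                    for w in weights
                    sum (if label w (- w)))))
    (lambda (example)
      (declare (ignore example))
      (> vote 0))))

(defun uniform-weights-sketch (examples)
  "Give every example the same weight, 1/n. Assumes EXAMPLES is non-empty."
  (let ((n (length examples)))
    (make-list n :initial-element (/ 1 n))))

;; Decision stump (assumed meaning of ds): weighted, one split, no pruning.
(defun ds-sketch (examples classes weights)
  (general-tree-learner-sketch examples classes weights :max-depth 1))

;; Plain decision tree: uniform weights, no depth limit, no pruning.
(defun dt-sketch (examples classes)
  (general-tree-learner-sketch examples classes
                               (uniform-weights-sketch examples)))

;; Bonus learner: weighted, no depth limit, pruning enabled.
;; The threshold 0.05 is an arbitrary illustrative value.
(defun dtw-sketch (examples classes weights)
  (general-tree-learner-sketch examples classes weights
                               :prune-threshold 0.05))
```

The only point of the sketch is that the three public functions differ in the arguments they pass, not in the code they run.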

Compiling code

As always, you should compile your functions. See Project #2 for information on how.