Data Mining Algorithms


Sponsor

Edward Omiecinski
edwardo@cc.gatech.edu
138 CoC

Area
Database Systems

Problem
Data mining is the process of discovering interesting knowledge from large amounts of data stored in databases. Typically, the knowledge is in the form of patterns or associations. In this project, you will get an introduction to one of the main data mining problems, called association rule mining, and two algorithms that can be used to solve this problem.
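
To make the problem concrete, here is a minimal Python sketch of the two measures usually used to judge an association rule X -> Y. The text above does not name them; the terms "support" and "confidence" and the function names, the `Basket` alias, and the toy baskets are all assumptions made for illustration.

```python
from typing import FrozenSet, List

Basket = FrozenSet[str]  # hypothetical representation: one basket = one set of item names


def support(itemset: Basket, baskets: List[Basket]) -> float:
    """Fraction of baskets that contain every item in `itemset`."""
    if not baskets:
        return 0.0
    return sum(1 for b in baskets if itemset <= b) / len(baskets)


def confidence(antecedent: Basket, consequent: Basket, baskets: List[Basket]) -> float:
    """Among baskets containing `antecedent`, the fraction that also contain `consequent`."""
    base = support(antecedent, baskets)
    if base == 0:
        return 0.0
    return support(antecedent | consequent, baskets) / base


# Toy data, invented for this example only.
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
```

With these four baskets, the itemset {bread, milk} appears in two of four baskets, and two of the three baskets containing bread also contain milk.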

Tasks:

  1. Read the notes on Association Rule Mining by Jeff Ullman (Stanford), assocrules.ps, which give a good conceptual description of the problem and the algorithms.
  2. Implement the standard levelwise Apriori algorithm and a sampling-based variation of it (i.e., Toivonen's algorithm), both of which are described in the notes.
  3. Write a simple data generation program to produce market-basket data. It should produce a set of M records (i.e., baskets), where each record contains a unique record identifier and a set of N items purchased. For each record, N should fall between specified minimum and maximum values.
  4. Devise and run a set of experiments with your two algorithms.
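
As a starting point for tasks 2 and 3, the following Python sketch shows one possible shape for the data generator and the levelwise Apriori pass. It is not the required design. The function names, the `num_items` and `seed` parameters, the uniform random choice of items, and the use of an absolute `min_count` threshold are all assumptions. Toivonen's sampling variation is not shown.

```python
import random
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def generate_baskets(m: int, n_min: int, n_max: int,
                     num_items: int = 50, seed: int = 0) -> List[Tuple[int, FrozenSet[int]]]:
    """Return m (record_id, items) pairs; each basket holds n_min..n_max distinct items.

    Items are drawn uniformly from 0..num_items-1 (an assumption; real
    market-basket data is usually skewed).
    """
    if not 1 <= n_min <= n_max <= num_items:
        raise ValueError("need 1 <= n_min <= n_max <= num_items")
    rng = random.Random(seed)
    catalog = range(num_items)
    return [(rid, frozenset(rng.sample(catalog, rng.randint(n_min, n_max))))
            for rid in range(m)]


def apriori(baskets: List[FrozenSet[int]], min_count: int) -> Dict[FrozenSet[int], int]:
    """Levelwise search: map each frequent itemset to the number of baskets containing it."""
    # Pass 1: count single items.
    counts: Dict[FrozenSet[int], int] = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)

    k = 2
    while current:
        # Candidate k-itemsets: unions of two frequent (k-1)-itemsets,
        # kept only if every (k-1)-subset is itself frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Pass k: one scan over the baskets to count the surviving candidates.
        counts = {c: 0 for c in candidates}
        for basket in baskets:
            for c in candidates:
                if c <= basket:
                    counts[c] += 1
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        k += 1
    return frequent
```

A typical call would be `apriori([items for _, items in generate_baskets(1000, 2, 8)], min_count=20)`. For the experiments in task 4, M, the min and max basket sizes, and the threshold are natural parameters to vary.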

Deliverables

Evaluation
Based on the report turned in to the sponsor of the project by the due date.