Data Mining in a Variety of Applications

We have been working on mining association rules in large databases for the last two to three years. These rules "discover" new associations from large sets of records such as transaction data from supermarkets (also known as "market basket data"). The rules are of the type "if a customer buys coke, he/she is more than likely to buy potato chips." A more general form of the rule says that "if a customer buys soft drinks, they are likely to buy snack foods," etc. We published this work at VLDB 1995, where we demonstrated efficient partitioning-based algorithms for mining association rules that outperformed previously published algorithms. Our work on negative rules will appear at the 1998 Data Engineering Conference. Negative rules state negative conclusions such as "if someone buys mineral water, they are likely not to buy donuts." There is hardly any previous work on discovering negative rules, largely because the space of possible negative rules is so vast.
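To make the rule types above concrete, the following minimal Python sketch computes the standard support and confidence measures for a positive rule, and a naive counterpart for a negative rule, on a toy set of baskets. The basket contents and the function names (support, confidence, negative_confidence) are illustrative assumptions; in particular, the negative measure shown here is only the simplest possible reading of "likely not to buy" and is not the interestingness measure used in the publications listed below.

```python
from typing import FrozenSet, Iterable, List

Basket = FrozenSet[str]

# Hypothetical toy market basket data, invented for illustration only.
BASKETS: List[Basket] = [
    frozenset({"coke", "chips"}),
    frozenset({"coke", "chips", "bread"}),
    frozenset({"coke", "milk"}),
    frozenset({"mineral water", "fruit"}),
    frozenset({"mineral water", "bread"}),
    frozenset({"donuts", "coke"}),
]


def support(baskets: List[Basket], items: Iterable[str]) -> float:
    """Fraction of baskets that contain every item in `items`."""
    itemset = frozenset(items)
    if not baskets:
        return 0.0
    return sum(1 for b in baskets if itemset <= b) / len(baskets)


def confidence(baskets: List[Basket], antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """Among baskets containing the antecedent, the fraction that also
    contain the consequent (rule: antecedent -> consequent)."""
    a = frozenset(antecedent)
    matching = [b for b in baskets if a <= b]
    if not matching:
        return 0.0
    c = frozenset(consequent)
    return sum(1 for b in matching if c <= b) / len(matching)


def negative_confidence(baskets: List[Basket], antecedent: Iterable[str],
                        consequent: Iterable[str]) -> float:
    """Among baskets containing the antecedent, the fraction that do NOT
    contain the full consequent (rule: antecedent -> not consequent)."""
    a = frozenset(antecedent)
    matching = [b for b in baskets if a <= b]
    if not matching:
        return 0.0
    c = frozenset(consequent)
    return sum(1 for b in matching if not c <= b) / len(matching)


coke_chips = confidence(BASKETS, {"coke"}, {"chips"})
water_no_donuts = negative_confidence(BASKETS, {"mineral water"}, {"donuts"})
```

On this toy data, half of the baskets containing coke also contain chips, and no basket containing mineral water contains donuts. A rule would normally be reported only if both its support and its confidence exceed user-chosen thresholds.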

We have been extending our work on association rules in several directions:

  1. Extending to different domains and data types. In this effort, we have started work in the medical domain. We have already shown that images of cardiac data (SPEC images) can be preprocessed and subjected to the discovery of association rules. The resulting rules reveal some interesting relationships among known conditions related to heart disease. We are also interested in processing data from simulation experiments and analyzing simulation results to discover association rules among events and the agents that lead to those events. Another area is mining of time series data and of image data. The image data work leads to interesting new research, which is described in (2) below.

  2. We are currently investigating techniques to discover knowledge in image databases. We rely on the output of an image understanding system in the form of feature vectors and, using data mining algorithms, find associations among the objects identified in each image. Once the associations are obtained, we can use them to determine which images are similar, or are related in terms of some given properties a user may be interested in. In our current work no assumptions are made about the image content, but in the future we expect to use domain knowledge to speed up the process.

FUTURE WORK:

We plan to enhance our implementation of the partitioning algorithms with incremental computation, so that new data can be incorporated as it arrives in the database. We are also interested in mining time-series data in financial applications. There is also the possibility of a parallel server implementation of the partitioning algorithms we have already developed.
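As a rough illustration of why partitioning lends itself to incremental and parallel processing, the Python sketch below mines frequent itemsets in two passes: each partition is mined independently for locally frequent itemsets, and the union of those local results is then counted once against the whole database. It relies on the fact that an itemset that is frequent in the whole database must be frequent in at least one partition. This is a simplified sketch under stated assumptions, not the published algorithm: the function names, the toy data, the brute-force enumeration of itemsets, and the max_size cap are all choices made here for brevity.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set

Itemset = FrozenSet[str]


def local_frequent_itemsets(partition: List[Itemset], min_support: float,
                            max_size: int = 2) -> Set[Itemset]:
    """Brute-force count of all itemsets up to `max_size` in one partition;
    keep those whose support within the partition is at least `min_support`."""
    counts: Counter = Counter()
    for basket in partition:
        items = sorted(basket)
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    n = len(partition)
    return {s for s, c in counts.items() if c / n >= min_support}


def partitioned_frequent_itemsets(baskets: Iterable[Iterable[str]],
                                  min_support: float, n_partitions: int,
                                  max_size: int = 2) -> Dict[Itemset, float]:
    """Two-pass mining: local mining per partition, then one global count."""
    data = [frozenset(b) for b in baskets]
    if not data:
        return {}
    chunk = -(-len(data) // n_partitions)  # ceiling division
    partitions = [data[i:i + chunk] for i in range(0, len(data), chunk)]

    # Pass 1: each partition could be mined independently (or in parallel).
    candidates: Set[Itemset] = set()
    for part in partitions:
        candidates |= local_frequent_itemsets(part, min_support, max_size)

    # Pass 2: count only the candidates against the whole database.
    total = len(data)
    result: Dict[Itemset, float] = {}
    for cand in candidates:
        count = sum(1 for b in data if cand <= b)
        if count / total >= min_support:
            result[cand] = count / total
    return result


# Hypothetical toy data, invented for illustration only.
TOY_BASKETS = [
    {"coke", "chips"}, {"coke", "chips", "bread"}, {"coke", "milk"},
    {"mineral water", "fruit"}, {"mineral water", "bread"}, {"donuts", "coke"},
]
frequent = partitioned_frequent_itemsets(TOY_BASKETS, min_support=0.3,
                                         n_partitions=2)
```

Because pass 1 touches each partition in isolation, a newly arrived batch of transactions could in principle be treated as one more partition, with only the global counts needing to be refreshed; how best to do this is the open question described above.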

PUBLICATIONS:

  1. A. Savasere, E. Omiecinski and S. Navathe. "An Efficient Algorithm for Mining Association Rules in Large Databases," In Proceedings of the Very Large Data Base Conference, September 1995.

  2. A. Savasere, E. Omiecinski and S. Navathe. "Mining for Strong Negative Associations in a Large Database of Customer Transactions," In Proceedings of the IEEE 14th Int. Conference on Data Engineering, Orlando, FL, February 1998 (forthcoming).


Interested researchers and industry collaborators should contact Profs. Navathe (sham@cc.gatech.edu) or Omiecinski (edwardo@cc.gatech.edu).