Professor Haesun Park - Georgia Tech

Some Text Data

These data sets relate to the results in the paper 'Text classification using support vector machines with dimension reduction' by H. Kim, P. Howland, and H. Park, Proceedings of the Text Mining Workshop of the 3rd SIAM International Conference on Data Mining, San Francisco, CA, May 2003.

MEDLINE dataset: 1250 x 22095 sparse matrices with 5 clusters, stored in a sparse representation of column index, row index, and value. The five clusters start at rows 1, 251, 501, 751, and 1001. Files: training data, test data
REUTERS dataset: stored in a sparse representation of column index, row index, and value. Files: training data, test data. The matrix size and the starting position of each cluster are summarized in: training data category, test data category
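
As a rough illustration of the storage format described above, the following Python sketch reads (column index, row index, value) triplets and derives a cluster number for each row from the cluster start rows. The one-triplet-per-line, whitespace-separated layout, the 1-based indexing, and the names parse_triplets and cluster_of are assumptions made for illustration; the actual files may be laid out differently.

```python
from bisect import bisect_right

def parse_triplets(lines):
    """Parse 'col row value' lines into a dict keyed by (row, col).

    Assumes one whitespace-separated triplet per line (hypothetical layout).
    """
    entries = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        col, row, value = int(parts[0]), int(parts[1]), float(parts[2])
        entries[(row, col)] = value
    return entries

def cluster_of(row, starts):
    """Return the 1-based cluster number of a 1-based row index."""
    return bisect_right(starts, row)

# Cluster start rows as listed for the MEDLINE dataset.
MEDLINE_STARTS = [1, 251, 501, 751, 1001]

# Made-up sample triplets, not taken from the real data files.
sample = ["3 1 0.5", "7 250 1.0", "2 251 2.0", "9 1250 0.25"]
matrix = parse_triplets(sample)
labels = {row: cluster_of(row, MEDLINE_STARTS) for (row, _col) in matrix}
```

With these start rows, rows 1 to 250 fall in cluster 1, rows 251 to 500 in cluster 2, and so on up to cluster 5.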

Missing Value estimation software in MATLAB

Main missing value code: imputes the missing values after determining k with the k-value estimator, mink = kestimate(set, minexp, fig).
k-value estimator (used by impute_llsq_l2.m): determines k, called as mink = kestimate(set, minexp, fig).
Row-average imputation (used by impute_llsq_l2.m and impute_llsq_l2_blind.m): determines the imputation on a row-averaged matrix.
Blind imputation main code: imputation without a k estimate (blind imputation).
Test data: gene data for testing the missing value estimation.
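
One way to picture the row-averaged matrix mentioned above is to fill each missing entry with the mean of the observed values in its row. The Python sketch below illustrates that idea only; it is not the MATLAB code listed here. The NaN encoding of missing values, the name row_average_impute, and the zero fallback for rows with no observed values are all assumptions.

```python
import math

def row_average_impute(matrix):
    """Return a copy of matrix with each NaN replaced by its row mean."""
    filled = []
    for row in matrix:
        known = [x for x in row if not math.isnan(x)]
        # Fall back to 0.0 when a row has no observed values (an assumption).
        mean = sum(known) / len(known) if known else 0.0
        filled.append([mean if math.isnan(x) else x for x in row])
    return filled

nan = float("nan")
# Made-up expression values: rows are genes, columns are experiments.
genes = [[1.0, nan, 3.0], [nan, 4.0, 4.0]]
completed = row_average_impute(genes)
```

A row-average fill of this kind is typically only a starting point; a neighbor-based method such as one that selects k similar rows would then refine the estimates.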

Office:	1306 Klaus
Phone:	404.385.2170
Fax:	404.385.7337


Haesun Park

Professor, School of Computational Science and Engineering
