%================================================== % % LLSimpute - Local Least Squares Imputation % % (Missing Value Estimation Package) % % Author: Hyunsoo Kim % Date: Fall/2003 - Spring/2004 % E-mail: hskim@cs.umn.edu % Personal homepage: % http://www.cs.umn.edu/~hskim % Reference: Missing value estimation for DNA microarray gene % expression data: Local Least Squares Imputation, H. Kim, % G. H. Golub, and H. Park, Bioinformatics, to appear, 2004. % This software may be free downloaded from site: % http://www.cs.umn.edu/~hskim/tools.html % License: % It is free for academic or nonprofit insistutions. % All right is reserved regarding commecial usage. % Please consult if you try to use this package for % commercial purpose. % Comments: % Please let me know if you have done any improvement. % % Sample Usage: % E=impute_rowavg(0,miss_matrix,10); % % Description: % % function E=impute_rowavg(set,miss_matrix,minexp) % % Input parameter: % set - the number of set, % If the miss_matrix came from miss0.mat, then assign set=0 % miss_matrix - a matrix that has missing values (1e99) % minexp - if the number of non-missing values in a gene (row) is less than % minexp, then report the gene (row). 
% Output parameter:
%   E - the estimated matrix
%
%====================================================

function [E] = impute_rowavg(set,miss_matrix,minexp)

% m (number of rows) and big (missing-value marker) must be set before the call
global m n big;

%-----------------------------------------
% get input matrix E that contains row average
%-----------------------------------------
fprintf('Generating row-averaged E...\n');

E=miss_matrix;
gene0=[];   % rows with no non-missing entries
gene1=[];   % rows with fewer than minexp non-missing entries
for i=1:m
    missidxj=find(miss_matrix(i,:) == big);
    if (~isempty(missidxj))
        nomissidxj=find(miss_matrix(i,:) < big);
        nexp=length(nomissidxj);   % named nexp so the built-in exp is not shadowed
        if(nexp==0)
            %fprintf('%d ', i);
            gene0=[gene0 i];
        else
            % row average
            avg=mean(miss_matrix(i,nomissidxj));
            E(i,missidxj)=avg;
            if(nexp < minexp)
                gene1=[gene1 i];
            end
        end
    end
end
fprintf('\n');

% save bad genes
fnout=sprintf('gene0_%d.csv', set);
csvwrite(fnout, gene0);
fprintf('the number of genes that have no non-missing entries: %d\n',length(gene0));

fnout=sprintf('gene1_%d.csv', set);
csvwrite(fnout, gene1);
fprintf('the number of genes that have fewer than %d non-missing entries: %d\n',minexp,length(gene1));

return;
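
%----------------------------------------------------
% Usage sketch (illustrative only; the toy matrix is hypothetical and
% not part of the original package). impute_rowavg reads the globals
% m and big, so a caller would need to set them first. The marker
% value 1e99 follows the header comment.
%
%   global m n big;
%   big = 1e99;                          % missing-value marker
%   miss_matrix = [1 2 big; big big big; 4 big 6];
%   [m, n] = size(miss_matrix);
%   E = impute_rowavg(0, miss_matrix, 2);
%
% Under these assumptions:
%   row 1: the missing entry becomes mean([1 2]) = 1.5
%   row 2: no observed entries, so the row is left unchanged and its
%          index is written to gene0_0.csv
%   row 3: the missing entry becomes mean([4 6]) = 5
%----------------------------------------------------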