function samples = ssvs(X,Y,m,cfg) %SSVS Stochastic search variable selection. % SAMPLES = SSVS(X,Y,M,CFG) generates M samples from the Gibbs sequence % described in [1]. X is an N-by-P matrix of observations and Y is an % N-by-1 column vector of the dependent variable, where N is the number % of observations and P is the number of predictors. SAMPLES is an % M-by-P matrix whose rows are the Gibbs samples. Each sample represents % a model, e.g. a row [1 0 0 1] would be the model in which features X1 % and X4 should be included and features X2 and X3 should be excluded. % (X1,X2,X3,X4 would be the columns of X). % % Currently, CFG is used to specify some settings on the prior on the % "betas". The authors suggest the four semi-automatic settings [1 5], % [1 10], [10 100], and [10 500]. % % CFG is meant for configuration information, i.e. settings for the % handful of parameters that SSVS requires. Currently, I use % indifference/ignorance settings for as many parameters as I can, e.g. I % set the R matrix (the prior information on the correlation between the % features) to the identity matrix. In general, using ignorance settings % still results in reasonable output from the sampler. If you want to % edit these paramter settings, either edit the code, or change the way % configuration information is supplied to this function. % % [1] George, Edward I.; McCulloch, Robert E. Variable Selection Via % Gibbs Sampling. Journal of American Statistical Association; September % 1993; 88, 423; pg 881. % % See example{1,2,3,4}.m for usage examples. % % author: Sooraj Bhat, 4/16/2007 % Requires the Statistics Toolbox % n - number of observations % p - number of covariates/predictors/features % m - number of samples to generate % X - the observations, nXp % Y - the dependent variable, nX1 % b - "beta"; regression coefficients, pX1 % s - "sigma^2"; variance parameter % g - "gamma"; participation parameters, pX1 % R - correlation matrix, pXp % t - significance parameter for the coefficients, pX1 % c - prior on participation for the coefficients, pX1 % samples - the samples, mXp n = size(X,1); p = size(X,2); XtX = X'*X; R = eye(p,p); R_ = inv(R); % least squares estimate is used to seed the sequence [b_LS, sigma_b_LS, s_LS] = lscov(X,Y); t = sqrt(sigma_b_LS)/cfg(1); c = zeros(p,1)+cfg(2); b = b_LS; s = s_LS; g = ones(p,1); samples = zeros(m,p); % sampling order is important! for j=1:m b = b_rnd(g,s,XtX,R_,b_LS,c,t); s = s_rnd(g,n,Y,X,b); for i=1:p g(i) = g_rnd(i,b,g,s,R,c,t); end samples(j,:) = g; end % I have annotated the following helper functions with their corresponding % Equation/Section number from the paper. They are explained there in % detail. % Equation 13 function z = b_rnd(g,s,XtX,R_,b_LS,c,t) D_ = diag( 1./D(g,c,t) ); temp = XtX/s; A = inv( temp + D_*R_*D_ ); z = mvnrnd( A*temp*b_LS , A )'; % Equation 14 function z = s_rnd(g,n,Y,X,b) u = 0.5*( n + nu(g) ); v = 0.5*( norm(Y-X*b)^2 + nu(g)*lambda(g) ); z = 1/gamrnd(u,1/v); % sample from inverse gamma distribution % Equations 15 & 16{a,b} function z = g_rnd(i,b,g,s,R,c,t) g0 = g; g0(i) = 0; g1 = g; g1(i) = 1; % Currently, the settings of the parameters make these last two terms in % the product unnecessary; uncomment them if you do something interesting p0 = b_prior(b,g0,R,c,t); % * s_prior(s,g0) * g_prior(g0); p1 = b_prior(b,g1,R,c,t); % * s_prior(s,g1) * g_prior(g1); z = rand < (p1/(p0+p1)); % Bernoulli sampling % Section 2.4 function z = nu(g) z = 0.0; % ignorance % Section 2.4 function z = lambda(g) z = 1.0; % doesn't matter if nu == 0 % Equation 5 function z = D(g,c,t) a = 1 + (c-1).*g; z = a.*t; % diagonal of the D matrix % Equation 4 function z = b_prior(b,g,R,c,t) p = size(g,1); d = D(g,c,t)'; temp = diag(d)*R*diag(d); z = mvnpdf(b,[],temp); % Section 2.1 function z = g_prior(g) p = size(g,1); z = 2^(-p); % indifference prior % Equation 6 function z = s_prior(s,g) u = 0.5*( nu(g) ); v = 0.5*( nu(g)*lambda(g) ); if u==0 z = 1.0; % simulate indifference prior else % pdf for inverse gamma distribution z = (v^u)*exp(-v/s)/((s^(u+1)) * gamma(u)); end