Software

The following collection of my MATLAB scripts is provided “as is” and should work with MATLAB releases 2011a onwards. If you find any bugs or issues with this code, please email me.

Hodges-Lehmann Estimator of the one sample location parameter

Implementation of the Hodges and Lehmann [1,2] estimator for the location parameter. The ZIP file includes both a (pure) MATLAB implementation and a C implementation that can be executed within MATLAB; to compile the C code, type “mex hhodglehm.cpp”. [code]
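For readers unfamiliar with the estimator, the following is a minimal Python sketch of the one-sample Hodges-Lehmann estimate, computed as the median of all pairwise (Walsh) averages. It is an illustration only and is not the code in the ZIP file; the function name hodges_lehmann is hypothetical, and the brute-force O(n^2) enumeration is chosen for clarity rather than speed.

```python
from statistics import median

def hodges_lehmann(x):
    """One-sample Hodges-Lehmann estimate: median of the Walsh averages
    (x[i] + x[j]) / 2 taken over all pairs with i <= j."""
    n = len(x)
    walsh = [(x[i] + x[j]) / 2.0 for i in range(n) for j in range(i, n)]
    return median(walsh)

if __name__ == "__main__":
    # A single large observation moves the estimate far less than it moves the mean
    print(hodges_lehmann([1.0, 2.0, 10.0]))
```

The quadratic memory use makes this sketch suitable only for small samples.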

References:

1. Estimates of Location Based on Rank Tests, J. L. Hodges and E. L. Lehmann, Annals of Mathematical Statistics, Vol. 34, No. 2, pp. 598-611, 1963.

2. Theory of Point Estimation, E. L. Lehmann and G. Casella, Springer, 2003.

Contingency tables

Implementation of the Casella and Moreno [1] Bayes test for independence in two-way contingency tables using an intrinsic prior. Tables of size a x b (a, b > 1) are supported, and some usage examples are included. [code]

References:

1. Assessing Robustness of Intrinsic Tests of Independence in Two-Way Contingency Tables, G. Casella and E. Moreno, Journal of the American Statistical Association, Vol. 104, No. 487, pp. 1261-1271, 2009.

Sparse covariance matrix estimation

Implementation of the Cai and Liu [1] estimator for sparse covariance matrices. Cross-validation is used to select the user-settable threshold parameter. The script currently uses the adaptive LASSO thresholding function, but it can be changed to use any other hard or soft thresholding rule. A simple simulation script is included to test the estimator. [code]
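As a rough illustration of entry-wise adaptive thresholding, here is a minimal Python sketch. It is not the MATLAB script: the function names are hypothetical, the tuning constant delta is fixed rather than chosen by cross-validation, eta = 4 is an assumed default for the adaptive LASSO rule, and leaving the diagonal unthresholded is a simplifying assumption.

```python
import math

def adaptive_lasso_threshold(z, lam, eta=4.0):
    """Adaptive LASSO rule: s(z) = z * max(1 - |lam / z|**eta, 0)."""
    if abs(z) <= lam:          # also covers z == 0 and avoids overflow in the power
        return 0.0
    return z * (1.0 - abs(lam / z) ** eta)

def adaptive_threshold_covariance(X, delta=2.0, eta=4.0):
    """X is a list of n observations, each a list of p values.
    Entry (i, j) of the sample covariance is thresholded at
    lam_ij = delta * sqrt(theta_ij * log(p) / n), where theta_ij estimates
    the variance of the product of the centred variables i and j."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Z = [[row[j] - means[j] for j in range(p)] for row in X]
    sigma = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i, p):
            prods = [z[i] * z[j] for z in Z]
            s_ij = sum(prods) / n
            theta_ij = sum((v - s_ij) ** 2 for v in prods) / n
            lam_ij = delta * math.sqrt(theta_ij * math.log(p) / n)
            if i != j:  # assumption: diagonal entries are kept as they are
                s_ij = adaptive_lasso_threshold(s_ij, lam_ij, eta)
            sigma[i][j] = sigma[j][i] = s_ij
    return sigma
```

Because each entry gets its own threshold, entries with noisier sample covariances are shrunk more aggressively than entries that are estimated precisely.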

References:

1. Adaptive Thresholding for Sparse Covariance Matrix Estimation, T. Cai and W. Liu, Journal of the American Statistical Association, Vol. 106, No. 494, pp. 672-684, 2011.

MML Single-Factor Analysis

This script implements the Minimum Message Length method for parameter estimation and model selection in single-factor analysis. The single factor analysis model is a special case of the multivariate Gaussian model where the correlation structure is modelled by a single common factor. [code]

References:

1. Single-Factor Analysis by Minimum Message Length Estimation, C. S. Wallace and P. R. Freeman, Journal of the Royal Statistical Society (Series B), Vol. 54, No. 1, pp. 195-209, 1992.

Linear Regression

Bayesian ridge regression, Bayesian LASSO, Bayesian group LASSO, Bayesian fused LASSO and Bayesian elastic net
Implementation of Bayesian ridge regression, Bayesian LASSO, Bayesian group LASSO, Bayesian fused LASSO and Bayesian elastic net based on the Gibbs sampler from [2]. Note that the paper contains at least three typographical errors in the Gibbs conditionals. This code uses the MATLAB Central “randraw()” function to sample from an inverse Gaussian distribution. The ZIP file includes an example script showing how to run the main function. [code]

The Bayesian LASSO
Please note, the previous MATLAB script also implements the Bayesian LASSO. This script is an implementation of the Bayesian LASSO for parameter estimation and subset selection in linear regression based on [1]. This code uses the MATLAB Central “randraw()” function to sample from an inverse Gaussian distribution. [code]
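Both scripts rely on inverse Gaussian draws, which the Gibbs sampler of [1] needs for the latent scale parameters. As an illustration of what such a draw involves, here is a minimal Python sketch of the transformation-with-rejection method of Michael, Schucany and Haas. It is not the randraw() code; the function name and argument names are hypothetical.

```python
import math
import random

def sample_inverse_gaussian(mu, lam, rng=random):
    """Draw one value from an inverse Gaussian distribution with
    mean mu > 0 and shape lam > 0."""
    nu = rng.gauss(0.0, 1.0)
    y = nu * nu
    # Smaller root of the quadratic relating a chi-square(1) variate to the draw
    x = (mu + (mu * mu * y) / (2.0 * lam)
         - (mu / (2.0 * lam)) * math.sqrt(4.0 * mu * lam * y + mu * mu * y * y))
    # Accept the smaller root with probability mu / (mu + x), else use the other root
    if rng.random() <= mu / (mu + x):
        return x
    return mu * mu / x

if __name__ == "__main__":
    rng = random.Random(1)
    draws = [sample_inverse_gaussian(1.0, 2.0, rng) for _ in range(10000)]
    print(sum(draws) / len(draws))  # should be close to the mean mu = 1.0
```

Passing a seeded random.Random instance as rng makes the draws reproducible.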

References:

1. The Bayesian LASSO, T. Park and G. Casella, Journal of the American Statistical Association, Vol. 103, No. 482, pp. 681-686, 2008.

2. Penalized Regression, Standard Errors, and Bayesian Lassos, M. Kyung, J. Gill, M. Ghosh and G. Casella, Bayesian Analysis, Vol. 5, No. 2, pp. 369-412, 2010.

The Non-negative Garrote

Leo Breiman’s non-negative garrote method [1] for parameter estimation and subset selection in linear regression. The garrote constraint is estimated using k-fold cross-validation or the little bootstrap. [code]
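To show how the method shrinks and selects, here is a minimal Python sketch for the special case of an orthonormal design, where the solution has a closed form. It assumes the penalised formulation: minimise sum_j b_j^2 (1 - c_j)^2 + 2 * lam * sum_j c_j subject to c_j >= 0, where b_j are the least-squares coefficients. The function name is hypothetical, and this is not the code in the ZIP file; general designs need a non-negativity-constrained least-squares solver, which is not shown.

```python
def garrote_orthonormal(beta_ols, lam):
    """Non-negative garrote for an orthonormal design (closed form).
    Returns the shrinkage factors c_j = max(0, 1 - lam / b_j**2)
    and the shrunken coefficients c_j * b_j."""
    factors = []
    for b in beta_ols:
        if b == 0.0:
            factors.append(0.0)  # a zero coefficient stays at zero
        else:
            factors.append(max(0.0, 1.0 - lam / (b * b)))
    coefs = [c * b for c, b in zip(factors, beta_ols)]
    return factors, coefs

if __name__ == "__main__":
    factors, coefs = garrote_orthonormal([2.0, 0.5, -1.0], lam=1.0)
    print(factors, coefs)
```

Large least-squares coefficients are barely shrunk while small ones are set exactly to zero, which is what makes the method a subset selector.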

References:

1. Better Subset Regression Using the Nonnegative Garrote, Leo Breiman, Technometrics, Vol. 37, No. 4, pp. 373-384, 1995.

Logistic Regression

LASSO
Compute the Least Absolute Shrinkage and Selection Operator (LASSO) coefficients in a logistic regression model. A C++ mex function is included and can be compiled to significantly speed up the numerical search. [code]

The lasso_kcv() function can be used to estimate the penalty parameter with k-fold cross-validation.
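As an illustration of what an L1-penalised logistic fit computes for a given penalty, here is a minimal Python sketch using proximal gradient descent (ISTA). This is a stand-in technique chosen for brevity and is not necessarily the numerical search used in the package. The function names, the fixed step size and the unpenalised intercept are all assumptions; the fixed step is only adequate for small, roughly standardised data.

```python
import math

def soft_threshold(z, t):
    """Shrink z towards zero by t, returning exactly zero when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def sigmoid(a):
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def lasso_logistic(X, y, lam, step=0.1, n_iter=2000):
    """Minimise mean negative log-likelihood + lam * sum_j |beta_j|.
    X is a list of rows, y a list of 0/1 labels; the intercept b0 is not penalised."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals: fitted probability minus observed label
        resid = [sigmoid(b0 + sum(b * x for b, x in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        g0 = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        b0 -= step * g0
        # Gradient step followed by the soft-thresholding proximal step
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return b0, beta
```

A cross-validation routine would call such a fit on each training fold over a grid of lam values and keep the value with the best held-out log-likelihood.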

Ridge Regression
The above LASSO code is easily modified to do other penalized logistic models. As an example, I’ve implemented the ridge regression method. This package also includes a function to estimate the ridge penalty parameter by cross validation. [code]

Bayesian Regularized Logistic Regression
This program implements the Gibbs sampling approach to Bayesian regularized logistic regression published in [4]. It is a conversion to MATLAB and C/C++ of the original R and C code, which was written by Robert B. Gramacy and kindly provided to this author. The ZIP file includes an example showing how to run the program. For more information about the sampling method, see [4]. [code]

References:

1. Regression Shrinkage and Selection via the Lasso, R. Tibshirani, Journal of the Royal Statistical Society (Series B), Vol. 58, No. 1, pp. 267-288, 1996.

2. Large-Scale Bayesian Logistic Regression for Text Categorization, A. Genkin, D. D. Lewis and D. Madigan, Technometrics, Vol. 49, No. 3, pp. 291-304, 2007.

3. Ridge regression, A. Hoerl and R. Kennard, Encyclopedia of Statistical Sciences, Vol. 8, pp. 129-136, 1988.

4. Simulation-based regularized logistic regression, R. B. Gramacy and N. G. Polson, arXiv:1005.3430v1, 2010.

Stochastic Complexity of a Multinomial Distribution

Compute the stochastic complexity of a multinomial distribution in linear time. [code]
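The linear-time computation rests on a recurrence for the normalising sum C(K, n) of the multinomial normalised maximum likelihood distribution: C(K+2, n) = C(K+1, n) + (n/K) C(K, n), started from C(1, n) = 1 and a direct O(n) sum for C(2, n). Here is a minimal Python sketch under those definitions. It is not the code in the ZIP file, the function names are hypothetical, and the result is reported in nats.

```python
import math

def multinomial_normalizer(K, n):
    """Normalising sum C(K, n) for a K-category multinomial and sample size n."""
    if n == 0 or K == 1:
        return 1.0
    # C(2, n) = sum_h binom(n, h) (h/n)^h ((n-h)/n)^(n-h), computed in log space
    c_curr = 0.0
    for h in range(n + 1):
        log_term = math.lgamma(n + 1) - math.lgamma(h + 1) - math.lgamma(n - h + 1)
        if h > 0:
            log_term += h * math.log(h / n)
        if h < n:
            log_term += (n - h) * math.log((n - h) / n)
        c_curr += math.exp(log_term)
    c_prev = 1.0  # C(1, n)
    # Recurrence: C(k+2, n) = C(k+1, n) + (n / k) * C(k, n)
    for k in range(1, K - 1):
        c_prev, c_curr = c_curr, c_curr + (n / k) * c_prev
    return c_curr

def stochastic_complexity(counts):
    """Negative maximised log-likelihood plus log C(K, n), in nats."""
    n, K = sum(counts), len(counts)
    neg_loglik = -sum(c * math.log(c / n) for c in counts if c > 0)
    return neg_loglik + math.log(multinomial_normalizer(K, n))

if __name__ == "__main__":
    print(stochastic_complexity([3, 5, 2]))
```

The initial sum costs O(n) and the recurrence O(K), so the total cost is O(n + K).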

References:

1. A linear-time algorithm for computing the multinomial stochastic complexity, Petri Kontkanen and Petri Myllymaki, Information Processing Letters, Vol. 103, No. 6, pp. 227-233, 2007.