#### Least Absolute Shrinkage and Selection Operator (LASSO)

Below is a list of some of the important LASSO theory and implementation papers that I have come across. Although the list is by no means complete, I have endeavoured to include the papers I consider most interesting and important. For each paper, I have provided URLs to facilitate easy download, though in some cases the download is not free unless you or your institution has access to the particular repository. I have also included a brief summary of each paper along with the reference. If you believe I have omitted a paper that should be on this list, please let me know.

#### LASSO Theory

The seminal paper that introduced the LASSO idea. Note that Tibshirani credits Breiman’s paper on the non-negative garrote as inspiration for the LASSO.

[1] R. Tibshirani

Regression Shrinkage and Selection via the Lasso

*Journal of the Royal Statistical Society (Series B)*, 1996, 58, 267-288
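In the special case of an orthonormal design, the LASSO estimate reduces to soft-thresholding of the least-squares coefficients, which is the easiest way to see how the penalty both shrinks coefficients and sets small ones exactly to zero. A minimal sketch (the function names are my own):

```python
import numpy as np

def soft_threshold(z, lam):
    """Soft-thresholding operator: the LASSO solution for a single
    coefficient under an orthonormal design, shrinkage parameter lam."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

# The LASSO shrinks each least-squares coefficient towards zero and
# sets coefficients smaller (in magnitude) than lam exactly to zero.
ols = np.array([3.0, -0.5, 1.2, 0.1])
print(soft_threshold(ols, 1.0))  # → [ 2.   0.   0.2  0. ]
```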

[2] K. Knight and W. Fu

Asymptotics for lasso-type estimators

*The Annals of Statistics*, 2000, 28, 1356-1378

[3] C. Leng, Y. Lin and G. Wahba

A Note On The Lasso And Related Procedures In Model Selection

*Statistica Sinica*, 2006, 16, 1273-1284

Zou et al. show that the number of non-zero coefficients in a LASSO-fitted linear model is an unbiased estimate of its degrees of freedom. This implies that model selection criteria, such as AIC and BIC, can be used to determine the optimal shrinkage parameter.

[4] H. Zou, T. Hastie and R. Tibshirani,

On the “degrees of freedom” of the lasso

*The Annals of Statistics*, 2007, 35, 2173-2192
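Taking the number of non-zero coefficients as the degrees of freedom, a BIC-style criterion for picking the shrinkage parameter might be sketched as follows (assuming Gaussian errors; all names are illustrative, not from the paper):

```python
import numpy as np

def lasso_bic(y, X, beta):
    """BIC for a LASSO fit, using the number of non-zero coefficients
    as the estimate of the degrees of freedom (Zou et al., 2007)."""
    n = len(y)
    rss = np.sum((y - X @ beta) ** 2)
    df = np.count_nonzero(beta)
    return n * np.log(rss / n) + np.log(n) * df

# Compare two candidate fits along the path: the sparser fit wins
# unless the denser one reduces the residual sum of squares enough.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
y = X @ np.array([2.0, 0.0, 0.0, 0.0]) + 0.1 * rng.standard_normal(50)
dense = np.array([2.0, 0.05, -0.03, 0.02])
sparse = np.array([2.0, 0.0, 0.0, 0.0])
print(lasso_bic(y, X, sparse) < lasso_bic(y, X, dense))
```

In practice one would evaluate this criterion at each point on the LASSO path and take the minimiser.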

Zhao and Yu examine the asymptotics of the LASSO procedure and define the condition (deemed “Irrepresentable Condition”) under which the LASSO can select the true model consistently. The Irrepresentable Condition depends only on the (unknown) variance-covariance matrix of the predictors. Finite sample performance of the LASSO is not discussed.

[5] P. Zhao and B. Yu

On Model Selection Consistency of Lasso

*Journal of Machine Learning Research*, 2006, 7, 2541-2563
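The strong Irrepresentable Condition can be checked numerically for a given active set by plugging in the sample covariance as a stand-in for the unknown population matrix. A sketch, with my own names throughout:

```python
import numpy as np

def irrepresentable(X, active, signs, eta=0.0):
    """Check Zhao & Yu's strong Irrepresentable Condition:
    |C21 @ inv(C11) @ sign(beta_active)| <= 1 - eta elementwise,
    where C is the (here: sample) covariance matrix of the predictors,
    partitioned into active (1) and inactive (2) blocks."""
    n, p = X.shape
    C = X.T @ X / n
    inactive = [j for j in range(p) if j not in active]
    C11 = C[np.ix_(active, active)]
    C21 = C[np.ix_(inactive, active)]
    lhs = np.abs(C21 @ np.linalg.solve(C11, signs))
    return bool(np.all(lhs <= 1.0 - eta))

# Nearly orthogonal predictors satisfy the condition comfortably.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
print(irrepresentable(X, active=[0, 1], signs=np.array([1.0, -1.0])))
```

A predictor that is highly correlated with the active set will violate the condition, which is exactly the situation in which the LASSO cannot select the true model consistently.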

[6] C.-H. Zhang and J. Huang

The sparsity and bias of the Lasso selection in high-dimensional linear regression

*The Annals of Statistics*, 2008, 36, 1567-1594

Hall et al. show how the m-out-of-n bootstrap can be used to estimate the LASSO shrinkage parameters such that the resulting model has the oracle property; here it is assumed that there is one shrinkage parameter for each coefficient. Unfortunately, the technique involves solving p constrained quadratic optimisation problems, where p is the number of regression coefficients, so it is impractical when p is moderate to large.

[7] P. Hall, E. R. Lee and B. U. Park

Bootstrap-based Penalty Choice For The Lasso, Achieving Oracle Performance

*Statistica Sinica*, 2009, 19, 449-471

Chatterjee and Lahiri examine the LASSO assuming that the number of regressors is fixed and the sample size goes to infinity. Strong convergence of the LASSO is shown provided the shrinkage parameter grows at a specified rate. Convergence rates of the LASSO and least squares estimates are also compared.

[8] A. Chatterjee and S. N. Lahiri

Strong consistency of Lasso estimators

*Sankhya, the Indian Journal of Statistics*, 2011, (accepted)

#### Solving the LASSO optimisation problem

[9] B. Efron, T. Hastie, I. Johnstone and R. Tibshirani

Least Angle Regression

*The Annals of Statistics*, 2004, 32, 407-451

The algorithm of Genkin et al. is based on cyclic coordinate descent with a Newton step at each move and solves the LASSO problem in logistic models. The same idea can be used to generate ridge regression solutions. My implementation of the LASSO (see the Software page) is based on this algorithm.

[10] A. Genkin, D. D. Lewis and D. Madigan,

Large-Scale Bayesian Logistic Regression for Text Categorization

*Technometrics*, 2007, 49, 291-304

[11] M. Y. Park and T. Hastie

L1-regularization path algorithm for generalized linear models

*Journal of the Royal Statistical Society (Series B)*, 2007, 69, 659-677

[12] K. Koh, S.-J. Kim and S. Boyd

An Interior-Point Method for Large-Scale l1-Regularized Logistic Regression

*Journal of Machine Learning Research*, 2007, 8, 1519-1555

[13] T. T. Wu and K. Lange

Coordinate descent algorithms for lasso penalized regression

*The Annals of Applied Statistics*, 2008, 2, 224-244

[14] J. Shi, W. Yin, S. Osher and P. Sajda

A Fast Hybrid Algorithm for Large-Scale l1-Regularized Logistic Regression

*Journal of Machine Learning Research*, 2010, 11, 713-741

The coordinate descent algorithm by Friedman et al. is very fast and generates entire paths for regularized regression with convex penalties. It can be used for finding ridge estimates as well as LASSO estimates in generalized linear models.

[15] J. Friedman, T. Hastie and R. Tibshirani

Regularized Paths for Generalized Linear Models via Coordinate Descent

*Journal of Statistical Software*, 2010, 33
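For the Gaussian case, each coordinate descent update is just a soft-threshold of a partial-residual correlation, which is why these path algorithms are so fast. A bare-bones sketch of one such solver (my own code, not the authors' glmnet implementation):

```python
import numpy as np

def cd_lasso(X, y, lam, n_cycles=200):
    """Cyclic coordinate descent for the linear LASSO
        (1/(2n)) ||y - X beta||^2 + lam ||beta||_1.
    Residuals are updated incrementally, as in Friedman et al."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y.copy()                        # current residual y - X @ beta
    for _ in range(n_cycles):
        for j in range(p):
            r += X[:, j] * beta[j]      # remove j's contribution
            z = X[:, j] @ r / n         # partial-residual correlation
            beta[j] = (np.sign(z) * max(abs(z) - lam, 0.0)
                       / (X[:, j] @ X[:, j] / n))
            r -= X[:, j] * beta[j]      # restore with updated beta_j
    return beta

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 6))
y = X @ np.array([3.0, 0.0, 0.0, -2.0, 0.0, 0.0]) + 0.1 * rng.standard_normal(100)
print(np.round(cd_lasso(X, y, lam=0.2), 2))
```

Note the shrinkage bias: the non-zero estimates come out roughly lam closer to zero than the least-squares values, while the irrelevant coefficients are exactly zero.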

#### Feature elimination

This is a really neat idea that allows fast removal of regressors before running the LASSO optimisation procedure. The procedure is called SAFE-LASSO and is applicable in regularized linear and logistic regression. Interestingly, the regressors that are removed by SAFE-LASSO are guaranteed to be absent in the LASSO solution!

[16] L. El Ghaoui, V. Viallon and T. Rabbani

Safe Feature Elimination in Sparse Supervised Learning

Technical Report No. UCB/EECS-2010-126, University of California, Berkeley, 2010
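For the Gaussian-loss LASSO, the basic SAFE test only needs the predictor-response correlations and a couple of norms, so screening is very cheap relative to the fit itself. A sketch under those assumptions (function and variable names are mine):

```python
import numpy as np

def safe_screen(X, y, lam):
    """Basic SAFE rule for the linear LASSO: feature j can be discarded if
        |x_j' y| < lam - ||x_j|| ||y|| (lam_max - lam) / lam_max,
    where lam_max = max_k |x_k' y| is the smallest lam giving an all-zero
    solution. Returns a boolean mask of the features that are kept."""
    corr = np.abs(X.T @ y)
    lam_max = corr.max()
    bound = (lam - np.linalg.norm(X, axis=0) * np.linalg.norm(y)
             * (lam_max - lam) / lam_max)
    return corr >= bound

# With heavy regularization, almost all of the noise features can be
# discarded before the optimiser ever sees them.
rng = np.random.default_rng(3)
X = rng.standard_normal((100, 2000))
y = X[:, 0] + 0.1 * rng.standard_normal(100)
lam_max = np.abs(X.T @ y).max()
keep = safe_screen(X, y, 0.9 * lam_max)
print(keep.sum(), "of", X.shape[1], "features survive screening")
```

Because the rule is safe, the LASSO solved on the kept features alone is identical to the full solution.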

This paper proposes a new feature elimination method (STRONG-LASSO) following on from the ideas in El Ghaoui et al. As STRONG-LASSO does not guarantee safe feature removal, it is recommended that the Karush-Kuhn-Tucker (KKT) conditions be checked after fitting, so that any wrongly discarded regressors can be restored. Extensions to general L1 regularisation problems and connections to the LASSO Irrepresentable Condition are also discussed.

[17] R. Tibshirani, J. Bien, J. Friedman, T. Hastie, N. Simon, J. Taylor and R. J. Tibshirani

Strong rules for discarding predictors in lasso-type problems

Unpublished, 2011
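The sequential strong rule screens at each new point on the path using the residual from the previous point; it is not safe, which is why the KKT check above matters. A sketch under the linear-LASSO setting (names are mine):

```python
import numpy as np

def strong_rule_keep(X, resid_prev, lam_prev, lam):
    """Sequential strong rule: feature j is tentatively discarded at lam if
        |x_j' r(lam_prev)| < 2 lam - lam_prev,
    where r(lam_prev) is the residual at the previous point on the path.
    Not safe: after fitting on the kept set, the KKT conditions of the
    discarded features must be checked for violations."""
    return np.abs(X.T @ resid_prev) >= 2.0 * lam - lam_prev

rng = np.random.default_rng(4)
X = rng.standard_normal((100, 500))
y = 2.0 * X[:, 0] + 0.1 * rng.standard_normal(100)
lam_max = np.abs(X.T @ y).max()   # at lam_max the solution is all-zero,
keep = strong_rule_keep(X, y, lam_max, 0.8 * lam_max)  # so resid = y
print(keep.sum(), "features pass the strong rule")
```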

#### Bayesian interpretation of the LASSO

The LASSO can also be interpreted from a Bayesian point of view as a particular form of prior density on the regression coefficients. Park and Casella were the first to show how the Gibbs sampler can be used to obtain samples from a posterior distribution with a LASSO-type prior. A nice feature of the Bayesian approach is that we automatically obtain an estimate of the shrinkage parameter. My MATLAB implementation of this sampler can be found in the Software section. An R package is also available here. Kyung et al. extend the Bayesian LASSO idea of Park and Casella to various other Bayesian regularisation methods.

[18] T. Park and G. Casella

The Bayesian Lasso

*Journal of the American Statistical Association*, 2008, 103, 681-686
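The sampler exploits the scale-mixture-of-normals representation of the Laplace prior, so all full conditionals are standard distributions. A minimal sketch in that spirit (my own code, not the authors'; for simplicity the shrinkage parameter lam is held fixed rather than estimated):

```python
import numpy as np

def bayesian_lasso_gibbs(X, y, lam=1.0, n_samples=1000, seed=0):
    """Minimal Gibbs sampler in the spirit of Park & Casella's Bayesian
    LASSO: Laplace prior on beta written as a normal scale mixture with
    exponential mixing, improper 1/sigma^2 prior on the error variance."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    sigma2, inv_tau2 = 1.0, np.ones(p)
    draws = np.empty((n_samples, p))
    for t in range(n_samples):
        # beta | rest ~ N(A^{-1} X'y, sigma2 A^{-1}), A = X'X + diag(1/tau2)
        A_inv = np.linalg.inv(XtX + np.diag(inv_tau2))
        beta = rng.multivariate_normal(A_inv @ Xty, sigma2 * A_inv)
        # sigma2 | rest ~ inverse-gamma
        resid = y - X @ beta
        shape = (n - 1 + p) / 2.0
        rate = (resid @ resid + beta @ (inv_tau2 * beta)) / 2.0
        sigma2 = 1.0 / rng.gamma(shape, 1.0 / rate)
        # 1/tau_j^2 | rest ~ inverse-Gaussian(sqrt(lam^2 sigma2 / beta_j^2), lam^2)
        inv_tau2 = rng.wald(np.sqrt(lam**2 * sigma2 / beta**2), lam**2)
        draws[t] = beta
    return draws

rng = np.random.default_rng(5)
X = rng.standard_normal((50, 3))
y = X @ np.array([2.0, 0.0, 0.0]) + 0.5 * rng.standard_normal(50)
draws = bayesian_lasso_gibbs(X, y, lam=1.0, n_samples=500)
print(np.round(draws[250:].mean(axis=0), 2))   # posterior means of beta
```

The posterior mean of beta then plays the role of the point estimate, with the remaining draws giving credible intervals essentially for free.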

Hans details an alternative sampler for the Bayesian LASSO that, unlike the Park and Casella sampler, does not require any latent variables. An R package implementing the method is available here.

[19] C. Hans

Bayesian lasso regression

*Biometrika*, 2009, 96, 835-845

[20] M. Kyung, J. Gill, M. Ghosh and G. Casella

Penalized Regression, Standard Errors, and Bayesian Lassos

*Bayesian Analysis*, 2010, 5, 369-412