Linear regression is perhaps the most commonly used statistical model in practice. The standard multivariate linear regression model for explaining a vector of observed data $y \in \mathbb{R}^n$ can be written as

$$ y = X_\gamma \beta_\gamma + \varepsilon $$

where:

- $\beta_\gamma$ is the vector of linear regression coefficients
- $\varepsilon_1, \dots, \varepsilon_n$ are i.i.d. variates distributed as per $N(0, \sigma^2)$
- $I_k$ denotes the $(k \times k)$ identity matrix
- $\gamma$ is an index vector determining which regressors comprise the design matrix $X_\gamma$

The design matrix comprising all regressors is denoted as

$$ X = (x_1, x_2, \dots, x_q) $$

where $x_j \in \mathbb{R}^n$ and $q$ is the maximum number of candidate regressors. The set $\gamma \subseteq \{1, \dots, q\}$ then indexes any possible design matrix $X_\gamma$ that can be derived from the full matrix $X$. The aim in linear regression is to estimate the unknown parameters $\beta_\gamma$ and $\sigma^2$, as well as to determine the optimal regressor subset $\gamma$.
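To make the notation concrete, here is a small NumPy sketch (Python rather than the article's MATLAB) that simulates data from this model for a hypothetical subset $\gamma$; all dimensions, coefficient values, and variable names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

n, q = 100, 5                       # samples, candidate regressors (illustrative)
X = rng.standard_normal((n, q))     # full design matrix X = (x_1, ..., x_q)

# Hypothetical subset gamma: keep regressors 0, 2 and 3
gamma = [0, 2, 3]
X_gamma = X[:, gamma]               # design matrix for the subset

beta_gamma = np.array([1.5, -2.0, 0.5])   # assumed true coefficients
sigma2 = 0.25                             # assumed noise variance

# y = X_gamma beta_gamma + eps, with eps_i i.i.d. N(0, sigma2)
y = X_gamma @ beta_gamma + rng.normal(0.0, np.sqrt(sigma2), size=n)
```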
A popular method for estimating the regression parameters is Fisher's maximum likelihood approach. The idea is to set the regression coefficients to the values that maximise the likelihood given the observed data. In the case of linear regression, the maximum likelihood estimates exist in closed form provided that (1) $n \geq k$, and (2) the regressors are not highly correlated. The maximum likelihood estimator can be written as

$$ \hat{\beta} = (X' X)^{-1} X' y $$
or in MATLAB,
```matlab
% Assuming targets y and regressor matrix X exist,
% estimate coefficients beta by maximum likelihood

% Option 1
beta = inv(X'*X)*X'*y;

% Option 2: much better; more numerically stable and faster
% than calculating the inverse explicitly
beta = X\y;
```
The maximum likelihood estimate of the regression parameters has some nice statistical properties. It is an unbiased estimator of the true regression coefficients, and it is strongly consistent provided that the smallest eigenvalue of $X'X$ tends to infinity as $n \to \infty$; equivalently, $(X'X)^{-1} \to 0$.
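Unbiasedness is easy to check by simulation. The NumPy sketch below (Python, not the article's MATLAB) averages the maximum likelihood estimate over many simulated noise realisations for a fixed design; all values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 50, 3
beta_true = np.array([2.0, -1.0, 0.5])   # assumed true coefficients
X = rng.standard_normal((n, k))          # fixed design matrix

# Average the ML estimate over many noise realisations;
# unbiasedness means this average should approach beta_true.
reps = 2000
estimates = np.empty((reps, k))
for r in range(reps):
    y = X @ beta_true + rng.standard_normal(n)
    estimates[r], *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.abs(estimates.mean(axis=0) - beta_true).max())  # close to 0
```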
I recommend the paper by Lai et al. for consistency proofs and further results (see References below). However, we cannot use maximum likelihood if:
- the regressors are highly correlated, or
- the number of regressors is greater than the number of samples.
Since maximum likelihood does not zero out regressors, we cannot use maximum likelihood alone to select the optimal regressor subset.
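Both failure modes come down to $X'X$ being singular, so the inverse in the closed-form estimator does not exist. A quick NumPy demonstration (Python rather than MATLAB; all matrices are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# Case 1: perfectly correlated regressors -> X'X is singular
x1 = rng.standard_normal(20)
X_corr = np.column_stack([x1, 2.0 * x1])         # second column is a multiple of the first
print(np.linalg.matrix_rank(X_corr.T @ X_corr))  # rank 1, not 2: the inverse does not exist

# Case 2: more regressors than samples (k > n) -> X'X is singular
X_wide = rng.standard_normal((5, 10))            # n = 5 samples, k = 10 regressors
print(np.linalg.matrix_rank(X_wide.T @ X_wide))  # rank at most n = 5 < k = 10
```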
Recently, there has been a large amount of interest in regularisation approaches to linear regression. The idea here is to again maximise the (log)likelihood subject to inequality constraints on the regression coefficients. The nature of the inequality constraints determines the properties of the resulting estimates and the type of regularisation. Some commonly used regularisation methods are discussed below:
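To illustrate the regularisation idea, here is a minimal NumPy sketch of ridge regression (an $\ell_2$ constraint on the coefficients), chosen as an example because it has a simple closed form; the penalty value and variable names are assumptions, and the sketch is in Python rather than the article's MATLAB:

```python
import numpy as np

rng = np.random.default_rng(3)

n, k = 20, 50                       # more regressors than samples: plain ML fails here
X = rng.standard_normal((n, k))
y = rng.standard_normal(n)

lam = 1.0                           # regularisation strength (hypothetical choice)

# Ridge: maximise the log-likelihood subject to a bound on ||beta||^2,
# which yields the closed form beta = (X'X + lam * I)^{-1} X'y.
# The penalty makes X'X + lam * I invertible even when k > n.
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)
print(beta_ridge.shape)             # a unique estimate exists despite k > n
```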
 T. L. Lai, Herbert Robbins and C. Z. Wei
Strong consistency of least squares estimates in multiple regression
Proceedings of the National Academy of Sciences of the United States of America, 1978, 75, 3034-3036