Archive for category Linear algebra

Linear algebra in C/C++ using Eigen

I’ve recently started using the free, open-source library Eigen for linear algebra routines in C/C++. The library is absolutely fantastic in terms of perfomance and functionality, however, as is sometimes the case with open-source software, the API documentation is somewhat lacking. If I get some time in the future I hope to contribute a few articles to the online documentation. Here is an example of what I mean.

Suppose you want to do a QR decomposition of a matrix X. There are three classes that can do this for you, one of which is ColPivHouseholderQR, and you may write something like

// perform a QR decomposition of the [n x p] X matrix; where n > p
Eigen::ColPivHouseholderQR< Eigen::MatrixXd > qr = X.colPivHouseholderQr();

Seems fairly straightforward, right? But, how do we get Q and R? If we look at the online documentation for ColPivHouseholderQR, we find a public method called matrixQR() which seems to return a reference to a matrix related to Q and R in some fashion. The documentation doesn’t state anything about how to get Q and R from the returned matrix. It turns out, that the matrix returned by matrixQR() contains both Q and R, with R occupying the ‘top’ block and Q the ‘bottom’ block. Thus, to get, say, R we need to use something like

// perform a QR decomposition of the [n x p] X matrix; where n > p
Eigen::ColPivHouseholderQR< Eigen::MatrixXd > qr = X.colPivHouseholderQr();  
Eigen::Matrix<double,p,p> R; 
R = qr.matrixQR().topRightCorner(p,p).triangularView<Eigen::Upper>();	// R matrix

Here, topRightCorner() extracts the matrix of interest and triangularView() ensures that R is stored as an upper triangular dense matrix.

It didn’t take too long to figure out how QR works in Eigen. After discovering the solution I found a post on the Eigen forum that describes a similar solution to the aforementioned. I urge anyone who uses Eigen to contribute code and documentation to the project, if possible. It really is a fantastic open-source library and equals and surpasses many other commercial linear algebra packages in both performance and quality.

No Comments

Fisher Information in Single Factor Analysis

The single factor analysis model for data \{ {\bf x}_1, {\bf x}_2, \ldots, {\bf x}_n \} is

 {\bf x}_i  = {\bf m} + v_i {\bf a} + \boldsymbol{\epsilon}_i \quad \; (i=1,2,\ldots,n),

where {\bf x}_i \in \mathbb{R}^k, \boldsymbol{\epsilon}_i is an i.i.d. zero-mean multivariate Gaussian random variable with a symmetric, positive definite variance-covariance matrix \boldsymbol{\Sigma} \in \mathbb{R}^{k \times k}; that is, \boldsymbol{\epsilon}_i \sim {\rm N}_k({\bf 0}, \boldsymbol{\Sigma}) for all i=1,2,\ldots,n. It is thus assumed that each data vector {\bf x}_i follows a k-variate Gaussian distribution with moments

	{\rm E} \{ {\bf x}_i \} = {\bf m} + v_i {\bf a},

	{\rm Var} \{ {\bf x} \} = \boldsymbol{\Sigma}.

The aim of single factor analysis is to estimate the unknown factor scores {\bf v} = (v_1, v_2, \ldots, v_n)^\prime \in \mathbb{R}^n, the mean {\bf m} = (m_1,m_2, \ldots, m_k)^\prime \in \mathbb{R}^k, the variance-covariance matrix \boldsymbol{\Sigma} and the factor loadings {\bf a} = (a_1, a_2, \ldots, a_k)^\prime \in \mathbb{R}^k.

I’ve written this note to show how one may derive the Fisher information for the single factor model. This turns out to be a fairly involved task requiring a lot of linear algebra and matrix differential calculus identities. I’ve uploaded a document describing all the necessary mathematical steps to the Publications section. Anyone interested in matrix differential calculus and linear algebra should find it useful.

No Comments

Matrix inversion lemma

The other day at work I came across an interesting problem while trying to optimise some MATLAB MCMC sampling code. The major bottleneck in the code was the inversion of a [p x p] matrix M, where p can be quite large (in the order of thousands). Now, I noticed that M can be written as

 {\bf M} = {\bf A} + {\bf X} {\bf G} {\bf X}^{T}

where A is a diagonal [p x p] matrix, X is [p x n] and G is a full rank diagonal [n x n] matrix. In my setting, p could be much larger than n and speed is important since this particular code is executed numerous times within a loop. I immediately thought about using the matrix inversion lemma (or the Sherman–Morrison–Woodbury formula) to speed up the inversion when p >> n. However, it turns out that in my case the matrix A is

 {\bf A} = {\rm diag}(0, a_1, \ldots, a_{p-1})

which is of rank (p – 1) and singular, so the matrix inversion lemma cannot be applied in a straightforward manner. After talking to a colleague about this issue, he suggested a nice trick to make A full rank by replacing the top-left zero element with a non-zero entry, and then changing X and G to correct for this modification. If we apply this trick, we can write M as

 {\bf M} = \underbrace{\left({\bf A} - {\bf e} {\bf e}^T\right)}_{{\bf A}_*} + \left({\bf e} \; \; {\bf X}\right) \left(  \begin{array}{cc}  1         & {\bf 0}^T \\  {\bf 0} & {\bf G}  \end{array}\right)  \left(  \begin{array}{cc}  {\bf e}^T\\  {\bf X}^T  \end{array} \right)

where e = (1, 0, …, 0) is a [p x 1] vector. Application of the matrix inversion lemma is now straightforward and reduces the computational cost of inverting M from O(p^3) to O(n^3). I did some rough timing of the new code and it is (unsurprisingly) significantly faster than the previous version when p / n gets large. I’ve updated my Bayesian LASSO code for logistic regression (see Publications) to include this neat trick.

No Comments