### Hyperspherical Coordinates

Suppose we have a vector ${\bf x}=(x_1,x_2,\ldots,x_n)^\prime \in \mathbb{R}^n$ residing in an $n$-dimensional Euclidean space. How do we express this vector in (hyper-)spherical coordinates in MATLAB? MATLAB implements a couple of functions, namely cart2pol() and cart2sph(), that handle the conversion in two and three dimensions, respectively. However, there is no built-in function for vectors of arbitrary dimension.

Fortunately, Wikipedia lists the formulas required to convert Euclidean coordinates to hyperspherical coordinates, and these are relatively straightforward to implement in MATLAB. Below is a MATLAB function that takes an $n$-dimensional Euclidean vector as input and returns the corresponding radial coordinate $r > 0$ as well as the $(n-1)$ angular coordinates $\phi_1, \phi_2, \ldots, \phi_{n-1}$; note that $\phi_{n-1} \in [0,2\pi]$, while all the other angles lie in $[0, \pi]$.
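For reference, the conversion formulas (as listed in the Wikipedia article on the $n$-sphere) are $r = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$, $\phi_i = {\rm arccot}\, \frac{x_i}{\sqrt{x_n^2 + x_{n-1}^2 + \cdots + x_{i+1}^2}}$ for $i = 1, \ldots, n-2$, and $\phi_{n-1} = 2\, {\rm arccot}\, \frac{\sqrt{x_{n-1}^2 + x_n^2} + x_{n-1}}{x_n}$, where ${\rm arccot}$ takes values in $[0,\pi]$.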

```matlab
%% The input vector x is an n-dim. column vector.
function [r,theta] = cart2nsph(x)

%% setup
n = length(x);
theta = zeros(n-1,1);
x2 = x.^2;
cumx = cumsum(x2(end:-1:2));

%% Compute the radial coordinate
r = sqrt(sum(x2));

%% Compute the angles
theta(1:end-1,1) = x(1:end-2,1) ./ sqrt( cumx(end:-1:2) );
theta(end,1) = (sqrt(sum(x2(end:-1:end-1,1))) + x(end-1,1)) ./ x(end,1);
theta = acot(theta);
theta(theta<0) = theta(theta<0)+pi; % Convert acot principal value to [0,pi]
theta(end,1) = 2*theta(end,1);      % Last angle is between [0,2pi]

%% done
return;
```

Line 11 computes the radial coordinate, which is simply the square root of the sum of squares of the original Euclidean coordinates. Lines 14-18 compute the angular coordinates. Some care is necessary to get all the angles into the correct range, since the principal value of acot() in MATLAB lies in $[-\pi/2,\pi/2]$ and not in the (usual) range of $[0,\pi]$. We correct for this in Line 17 by adding $\pi$ to all the negative angles.

The function to compute $n$-dimensional Euclidean coordinates from hyperspherical coordinates is somewhat simpler. One possible implementation is given below:

```matlab
%% The radial coordinate r must be positive.
%% The angles theta_1,...,theta_{n-2} must be in [0,pi], while
%% theta_{n-1} should be in [0,2pi].
function x = nsph2cart(r,theta)

c = [cos(theta); 1];
s = [1; cumprod(sin(theta))];

x = r*(c.*s);

%% done
return;
```

Except in some special cases, the transform is unique. Both functions were tested on MATLAB version 7.12.0 (R2011a) and do not include any error checking. I hope that some of you can find these functions useful in your work.
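For readers without MATLAB, here is a rough Python port of the two functions (a sketch of my own; it uses atan2 instead of acot, which yields the same angles in the correct ranges without the explicit correction step):

```python
import math

def cart2nsph(x):
    """Cartesian -> hyperspherical: returns (r, [phi_1, ..., phi_{n-1}])."""
    r = math.sqrt(sum(v * v for v in x))
    angles = []
    for i in range(len(x) - 2):
        # phi_i = arccot(x_i / ||(x_{i+1}, ..., x_n)||), which lies in [0, pi]
        tail = math.sqrt(sum(v * v for v in x[i + 1:]))
        angles.append(math.atan2(tail, x[i]))
    # the last angle lies in [0, 2*pi)
    last = math.atan2(x[-1], x[-2])
    angles.append(last if last >= 0 else last + 2 * math.pi)
    return r, angles

def nsph2cart(r, angles):
    """Hyperspherical -> Cartesian (inverse of cart2nsph)."""
    x, s = [], 1.0  # s accumulates the running product of sines
    for a in angles:
        x.append(r * s * math.cos(a))
        s *= math.sin(a)
    x.append(r * s)  # x_n = r * sin(phi_1) * ... * sin(phi_{n-1})
    return x
```

A round trip such as `nsph2cart(*cart2nsph([1.0, 2.0, 3.0, 4.0]))` should recover the original vector up to floating-point error.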

### Fisher information for irregular likelihood functions

I recently came across the following problem: how do you work out the Fisher information for a parameter $\theta \in \Omega$ when the likelihood function, say $p(\theta; x)$, is continuous but not everywhere differentiable with respect to the parameter? The standard formula for the Fisher information, ${\rm J}(\theta) = -\int \left(\frac{\partial^2 l}{\partial \theta^2}\right) p(\theta; x) \, dx$, where $l$ denotes the log-likelihood,

assumes regularity conditions which no longer hold, and hence is not applicable. After hunting around with Google for some time, I came across the following (freely downloadable) paper

H. E. Daniels
The Asymptotic Efficiency of a Maximum Likelihood Estimator
Fourth Berkeley Symp. on Math. Statist. and Prob., University of California Press, 1961, 1, 151-163

which, fortunately, had exactly what I needed. It turns out we can still compute the Fisher information, without the existence of second derivatives, using ${\rm J}(\theta) = \int \left(\frac{\partial l}{\partial \theta}\right)^2 p(\theta; x) \, dx$

provided a set of weaker conditions holds. To get my head around the issue, I decided to look at a simple problem: working out the Fisher information for the mean $\mu \in \Omega = \mathbb{R}$ of a Laplace distribution $p(x,\mu; s) = \frac{1}{2s} \exp(-| x - \mu| / s)$.

The log-likelihood is now given by $l(\mu; s) = -\log(2s) - \frac{|x-\mu|}{s} = \left\{\begin{array}{ll} -\log(2s) - \frac{x-\mu}{s} & x > \mu \\ -\log(2s) + \frac{x-\mu}{s} & x < \mu \end{array} \right.$

The first derivative with respect to $\mu$ is $\frac{\partial l}{\partial \mu} = \left\{\begin{array}{ll} \frac{1}{s} & x > \mu \\ -\frac{1}{s} & x < \mu \end{array} \right.$

Therefore, the Fisher information is ${\rm J}(\mu) = \int_{-\infty}^\mu \left(\frac{\partial l}{\partial \mu}\right)^2 p(x,\mu; s) \, dx + \int_{\mu}^\infty \left(\frac{\partial l}{\partial \mu}\right)^2 p(x,\mu; s) \, dx = \frac{1}{s^2}$

The Fisher information for the scale parameter can be obtained in a similar manner. The first derivative with respect to $s$ is $\frac{\partial l}{\partial s} = -\frac{1}{s} + \frac{1}{s^2} |x - \mu| = \frac{1}{s^2} (|x - \mu| - s )$

Since ${\rm E}\{|x-\mu|\} = s$ and ${\rm E}\{|x-\mu|^2\} = 2 s^2$,

the Fisher information for the scale parameter is ${\rm J}(s) = \frac{1}{s^4} {\rm E} \left\{(|x-\mu| - s)^2 \right\} = \frac{1}{s^2}$
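As a quick sanity check of the expectations used above, here is a small Monte Carlo sketch of my own (in Python rather than MATLAB), which draws Laplace variates as the scaled difference of two independent Exp(1) variates:

```python
import random

mu, s, n = 0.0, 2.0, 200_000
rng = random.Random(1)

# X ~ Laplace(mu, s), since the difference of two iid Exp(1) variates
# follows a standard Laplace distribution
xs = [mu + s * (rng.expovariate(1.0) - rng.expovariate(1.0)) for _ in range(n)]

m1 = sum(abs(x - mu) for x in xs) / n         # estimates E{|x - mu|}   = s
m2 = sum(abs(x - mu) ** 2 for x in xs) / n    # estimates E{|x - mu|^2} = 2*s^2
J_s = sum((abs(x - mu) - s) ** 2 for x in xs) / (n * s ** 4)  # estimates J(s) = 1/s^2
print(m1, m2, J_s)
```

With $s = 2$, the three estimates should come out near $2$, $8$ and $0.25$, respectively.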

The paper by Daniels details the exact conditions needed for the Fisher information to be derived in this way. Note that there is a mistake in one of the proofs; the correction is detailed in

J. A. Williamson
A Note on the Proof by H. E. Daniels of the Asymptotic Efficiency of a Maximum Likelihood Estimator
Biometrika, 1984, 71, 651-653

### The blog expands

I’ve started writing a new page for the blog that examines the theory behind various common statistical methods and models. The idea here is to include references and brief descriptions of some of the important and interesting papers on various topics. I also intend to include web links to software implementations whenever possible. The new material can be found by clicking on the “Statistics Theory and Applications” tab above. I am currently working on the linear regression section and the LASSO regularisation method for linear regression.

In other news, the grant writing season has again started. This year, we aim to write one early career researcher ARC grant, and at least one NHMRC grant. The ARC grant will be a two year proposal for work on modern logistic regression techniques. We are in the process of determining which NHMRC grant(s) will be written and which researchers will be involved. I can only hope that this grant season is as successful as last year's!

### More Grant News

The NHMRC funding outcomes have been released and we have been awarded another grant! This is fantastic news given the competitive nature of the funding process. The list of project grants that were funded for 2011 can be found here. We were awarded approximately \$400k over a period of two years for a research project on mammographic density. This is a very exciting area of research and we hope to contribute positively to breast cancer research and to making mammographic density more clinically useful.

Continuing on with more good news, the list of accepted papers for the 23rd Australasian Joint Conference on Artificial Intelligence (AJCAI 2010) was announced a couple of weeks ago. AJCAI is a premier conference on artificial intelligence and machine learning that is held annually in the Australasian region. We usually send at least one paper every year to AJCAI given how rare international machine learning conferences are in Australia. This year we ended up submitting two papers, both of which were accepted. Moreover, one of our submissions was also named best conference paper! Following the conference, I will be putting up both papers in the Publications section along with any relevant MATLAB and C/C++ code.

Lastly, I’d like to congratulate my colleague and good friend Daniel Schmidt for getting engaged this year to one lovely lady! I wish you both the best of luck!

### ARC Grant Time

The Australian Research Council (ARC) has announced the funding outcomes for the 2011 round of Discovery Projects. The success rate this year was 22.0%, compared to 22.7% in the last round. The success rate really is quite low considering the amount of time and effort required to fill out one of these applications. The great news is that our first ever ARC grant got funded! Although we didn't quite get the amount of money we requested, we still got more than the national average. Now we nervously await the NHMRC funding outcomes, which should be released in a week or so.

### Matrix inversion lemma

The other day at work I came across an interesting problem while trying to optimise some MATLAB MCMC sampling code. The major bottleneck in the code was the inversion of a [p x p] matrix M, where p can be quite large (on the order of thousands). I noticed that M can be written as ${\bf M} = {\bf A} + {\bf X} {\bf G} {\bf X}^{T}$

where A is a diagonal [p x p] matrix, X is [p x n] and G is a full rank diagonal [n x n] matrix. In my setting, p could be much larger than n and speed is important since this particular code is executed numerous times within a loop. I immediately thought about using the matrix inversion lemma (or the Sherman–Morrison–Woodbury formula) to speed up the inversion when p >> n. However, it turns out that in my case the matrix A is ${\bf A} = {\rm diag}(0, a_1, \ldots, a_{p-1})$

which is of rank (p - 1) and singular, so the matrix inversion lemma cannot be applied in a straightforward manner. After talking to a colleague about this issue, he suggested a nice trick: make A full rank by replacing the top-left zero element with a non-zero entry, and then change X and G to correct for this modification. Applying this trick, we can write M as ${\bf M} = \underbrace{\left({\bf A} - {\bf e} {\bf e}^T\right)}_{{\bf A}_*} + \left({\bf e} \; \; {\bf X}\right) \left( \begin{array}{cc} 1 & {\bf 0}^T \\ {\bf 0} & {\bf G} \end{array}\right) \left( \begin{array}{c} {\bf e}^T\\ {\bf X}^T \end{array} \right)$

where e = (1, 0, …, 0) is a [p x 1] vector. Application of the matrix inversion lemma is now straightforward and reduces the computational cost of inverting M from O(p^3) to O(n^3). I did some rough timing of the new code and it is (unsurprisingly) significantly faster than the previous version when p / n gets large. I’ve updated my Bayesian LASSO code for logistic regression (see Publications) to include this neat trick.
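To illustrate, here is a small numerical sketch of the trick (my own reimplementation in Python/NumPy, not the MATLAB production code; dimensions and entries are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 200, 5

# A is diagonal with a zero top-left entry, hence singular
a = np.concatenate(([0.0], rng.uniform(1.0, 2.0, p - 1)))
X = rng.standard_normal((p, n))
G = np.diag(rng.uniform(1.0, 2.0, n))
M = np.diag(a) + X @ G @ X.T

# Repair: A* = A - e e^T is diag(-1, a_1, ..., a_{p-1}) and invertible,
# and M = A* + U D U^T with U = [e X], D = diag(1, G)
A_star_inv = np.diag(1.0 / np.concatenate(([-1.0], a[1:])))
e = np.zeros((p, 1)); e[0, 0] = 1.0
U = np.hstack([e, X])
D = np.block([[np.eye(1), np.zeros((1, n))],
              [np.zeros((n, 1)), G]])

# Woodbury identity: only an (n+1) x (n+1) system needs to be inverted
inner = np.linalg.inv(D) + U.T @ A_star_inv @ U
M_inv = A_star_inv - A_star_inv @ U @ np.linalg.inv(inner) @ U.T @ A_star_inv

residual = np.max(np.abs(M_inv @ M - np.eye(p)))
print(residual)  # should be near machine precision
```

The explicit diagonal inverses are cheap (O(p)), so the overall cost is dominated by the small (n+1)-dimensional solve plus the matrix products.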

### Penalized Logistic Regression

Last week I gave a seminar at my work place on the advantages of using penalized logistic regression methods (such as the LASSO, elastic net, etc.) over the standard maximum likelihood approach. The target audience was genetic epidemiologists who have some practical knowledge of fitting logistic models, but may not be aware of the recent theoretical work in the area. The slides from the seminar are now available from the Publications page.

### Statistical Analysis Questions

The other day I came across a new Q&A site for statistical analysis called Statistical Analysis Questions. I am not sure how long the site has been running, but there are already about 200 questions, 750 answers and more than 600 users. I recommend checking it out. It seems quite useful for anyone involved in both applied and theoretical statistics work.

### Hypothesis testing with Paul the Octopus

For the past four weeks, I've been enjoying the FIFA World Cup 2010, the most watched television event in the world. This World Cup is being held in South Africa, making it the first time ever that an African nation has hosted the prestigious tournament. One of the surprise teams of the tournament has been Germany, beating both England and Argentina (4-1 and 4-0 respectively) before losing 0-1 to current European champions Spain in a tightly contested semi-final encounter.

Meanwhile in Oberhausen, Germany, a somewhat odd event took place before each of the Germany matches. Paul the Octopus, who resides at the local Sea Life Aquarium, was used as an oracle to predict the outcomes of all Germany world cup matches prior to the games taking place. For a description of exactly how Paul makes his predictions, see this Wikipedia article. Amazingly, Paul has successfully predicted all six of the German games so far and has recently tipped Germany, to the delight of many Germans, to beat Uruguay at the upcoming game for 3rd/4th place. This should hopefully put a stop to those anti-octopus songs and calls to eat Paul. As statisticians, let us ask the question “Is Paul really an animal oracle or just one extremely lucky octopus?”.

We can model the number of Paul's successful predictions at this world cup as a binomial distribution B(p, n=6); that is, we have six independent trials (matches) with p being the probability of success (Paul predicting correctly) at each trial. In order to test whether Paul is psychic, we shall construct a 95% confidence interval for the probability of success, p. The standard confidence interval, often called the Wald interval, is known to have poor coverage properties in this scenario and exhibits erratic behaviour, even when the sample size is large or p is near 0.5. Instead, we compute the modified Jeffreys 95% CI, recommended in [1], and find that $CI_{M-J} = [0.54, 1.0]$

This CI is quite wide, which is not unexpected given such a small sample size (n=6), and excludes the possibility that Paul is just plain old lucky (p=0.5)!

What can Minimum Message Length (MML) and Minimum Description Length (MDL) tell us about Paul's psychic powers? We shall use the Wallace-Freeman (MML) codelength formula [2,3] and the Normalized Maximum Likelihood (NML) distribution (MDL) [4] for this task. Let A denote the hypothesis that Paul is lucky, and B the alternative hypothesis that Paul is an animal oracle. We compute the codelength of data and hypothesis for both scenarios A and B, and use the difference in codelengths (i.e., codelength A - codelength B) as evidence in favour of the hypothesis with the smaller codelength. From standard information theory, the codelength for hypothesis A is 6 * log2(2) = 6 bits. The codelength for hypothesis B is 2.82 bits using the WF formula and 1.92 bits using the NML distribution. Thus, both MML and MDL prefer hypothesis B.
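For completeness, the "lucky octopus" hypothesis can also be checked with a plain binomial tail probability (a small Python sketch of my own, not part of the original analysis):

```python
import math

n, k = 6, 6
# Probability of at least k correct predictions out of n under p = 0.5
p_value = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
print(p_value)  # 1/64 = 0.015625

# Codelength of the data under hypothesis A (Paul guesses, p = 0.5)
code_A = -math.log2(0.5 ** 6)
print(code_A)  # 6.0 bits
```

The tail probability of six-for-six under pure guessing is well below the conventional 0.05 level, in agreement with the confidence interval above.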

So there you have it, Paul must be the real deal! 😉

References:
[1] Lawrence D. Brown, T. Tony Cai, and Anirban DasGupta. Interval Estimation for a Binomial Proportion, Statistical Science, Vol. 16, No. 2, pp. 101-133, 2001.
[2] C. S. Wallace and P. R. Freeman. Estimation and inference by compact coding, Journal of the Royal Statistical Society (Series B), Vol. 49, No. 3, pp. 230-265, 1987.
[3] C. S. Wallace. Statistical and Inductive Inference by Minimum Message Length, 1st ed., Springer, 2005.
[4] Jorma Rissanen. Information and Complexity in Statistical Modeling, 1st ed., Springer, 2009.

### Eurovision 2010 Forecast (Part 2)

The Eurovision 2010 competition finished last Saturday with Lena Meyer-Landrut from Germany taking the title for her song "Satellite". If you missed the show, you can see Lena performing the song at the official Eurovision YouTube channel here. Given the current state of the world economy, this is a pretty good outcome, as Germany is one of the few countries left in Europe with enough finances to host next year's show. So how did my team, StatoVIC, go in the Kaggle Eurovision 2010 competition? The results have been tabulated and released here. It looks like StatoVIC took seventh place out of 22 submissions with an absolute error of 2626 points calculated from the predicted ratings. This score is in the top quartile of the submissions and about 1000 points better than "Lucky Guess", the last-place submission (I assume this submission is just a random selection of ratings). Not a bad result for StatoVIC, really. Congratulations to Jure Zbontar for winning the competition with an impressive absolute error score about 400 rating points lower than our team's.

It’s time for StatoVIC to look at the HIV progression challenge and see if we can do better than seventh place!