I recently came across the following problem: how do you work out the Fisher information for a parameter when the likelihood function, say f(x; θ), is continuous but not everywhere differentiable with respect to the parameter? The standard formula for the Fisher information

I(θ) = −E[ ∂² log f(x; θ) / ∂θ² ]

assumes regularity conditions which no longer hold, and hence is not applicable. After hunting around with Google for some time, I came across the following (freely downloadable) paper
H. E. Daniels
The Asymptotic Efficiency of a Maximum Likelihood Estimator
Fourth Berkeley Symp. on Math. Statist. and Prob., University of California Press, 1961, 1, 151-163
which, fortunately, had exactly what I needed. It turns out that we can still compute the Fisher information, without the existence of second derivatives, using

I(θ) = E[ ( ∂ log f(x; θ) / ∂θ )² ]
provided a set of weaker conditions holds. To get my head around the issue, I decided to look at a simple problem of working out the Fisher information for the mean of a Laplace distribution with density

f(x; μ, b) = (1 / 2b) exp( −|x − μ| / b )

where μ is the mean and b > 0 is the scale parameter.
The log-likelihood of a sample x_1, …, x_n is now given by

l(μ, b) = −n log(2b) − (1/b) Σ_i |x_i − μ|

The first derivative with respect to μ is

∂l/∂μ = (1/b) Σ_i sign(x_i − μ)

and, since sign(x_i − μ)² = 1 with probability one, the Fisher information for the mean is

I(μ) = n / b²
The Fisher information for the scale parameter can be obtained in a similar manner. The first derivative with respect to b is

∂l/∂b = −n/b + (1/b²) Σ_i |x_i − μ|

and, noting that E|x − μ| = b for the Laplace distribution, the Fisher information for the scale parameter is

I(b) = n / b²
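As a sanity check, here is a small Monte Carlo sketch in Python (the sample size and parameter values are my own choices for illustration) that estimates the per-observation Fisher information E[score²] for both parameters:

```python
import math
import random

random.seed(1)
mu, b, N = 0.0, 2.0, 200_000  # illustrative values

# Draw from Laplace(mu, b) by inverse-CDF sampling
def rlaplace():
    u = random.random() - 0.5
    return mu - b * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

xs = [rlaplace() for _ in range(N)]

# First derivatives (scores) of the log-density for a single observation
score_mu = [math.copysign(1.0, x - mu) / b for x in xs]
score_b = [-1.0 / b + abs(x - mu) / b ** 2 for x in xs]

# Fisher information per observation as E[score^2]
I_mu = sum(s * s for s in score_mu) / N  # exactly 1/b^2, since sign^2 = 1
I_b = sum(s * s for s in score_b) / N    # Monte Carlo estimate of 1/b^2

print(I_mu, I_b, 1.0 / b ** 2)  # both estimates close to 0.25
```

Both estimates agree with the theoretical value 1/b², so the total information for a sample of size n is n/b² for each parameter.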
The paper by Daniels details the exact conditions needed for the Fisher information to be derived in this way. Note that there is a mistake in one of the proofs; the correction is detailed in
J. A. Williamson
A Note on the Proof by H. E. Daniels of the Asymptotic Efficiency of a Maximum Likelihood Estimator
Biometrika, 1984, 71, 651-653
Unfortunately, the Williamson paper requires a JSTOR subscription for download.
I’ve started writing a new page for the blog that examines the theory behind various common statistical methods and models. The idea here is to include references and brief descriptions of some of the important and interesting papers on various topics. I also intend to include web links to software implementations whenever possible. The new material can be found by clicking on the “Statistics Theory and Applications” tab above. I am currently working on the linear regression section and the LASSO regularisation method for linear regression.
In other news, the grant writing season has again started. This year, we aim to write one early career researcher ARC grant, and at least one NHMRC grant. The ARC grant will be a two year proposal for work on modern logistic regression techniques. We are in the process of determining which NHMRC grant(s) will be written and which researchers will be involved. I can only hope that this grant season is as successful as last year's!
The NHMRC funding outcomes have been released and we have been awarded another grant! This is fantastic news given the competitive nature of the funding process. The list of project grants that were funded for 2011 can be found here. We were awarded approximately $400k over a period of two years for a research project on mammographic density. This is a very exciting area of research and we hope to contribute positively to breast cancer research and to making mammographic density more clinically useful.
Continuing on with more good news, the
Lastly, I’d like to congratulate my colleague and good friend Daniel Schmidt for getting engaged this year to one lovely lady! I wish you both the best of luck!
The Australian Research Council (ARC) have announced the funding outcomes for the 2011 round of Discovery Projects. The success rate this year was 22.0% compared to 22.7% in the last round. The success rate really is quite low considering the amount of time and effort required to fill out one of these applications. The great news is that our first ever ARC grant got funded! Although we didn’t quite get the amount of money we requested, we still got more than the national average. Now we nervously await the NHMRC funding outcomes which should be released in a week or so.
The other day at work I came across an interesting problem while trying to optimise some MATLAB MCMC sampling code. The major bottleneck in the code was the inversion of a [p x p] matrix M, where p can be quite large (on the order of thousands). Now, I noticed that M can be written as

M = A + X G X'

where A is a diagonal [p x p] matrix, X is [p x n] and G is a full rank diagonal [n x n] matrix. In my setting, p could be much larger than n, and speed is important since this particular code is executed numerous times within a loop. I immediately thought about using the matrix inversion lemma (or the Sherman–Morrison–Woodbury formula) to speed up the inversion when p >> n. However, it turns out that in my case the matrix A is

A = diag(0, a_2, …, a_p)

which is of rank (p − 1) and singular, so the matrix inversion lemma cannot be applied in a straightforward manner. After talking to a colleague about this issue, he suggested a nice trick to make A full rank by replacing the top-left zero element with a non-zero entry, and then changing X and G to correct for this modification. If we apply this trick, we can write M as

M = (A + e e') + [X, e] blkdiag(G, −1) [X, e]'

where e = (1, 0, …, 0) is a [p x 1] vector; the −1 block in the augmented G exactly cancels the e e' term added to A. Application of the matrix inversion lemma is now straightforward and reduces the computational cost of inverting M from O(p^3) to O(n^3). I did some rough timing of the new code and it is (unsurprisingly) significantly faster than the previous version when p / n gets large. I’ve updated my Bayesian LASSO code for logistic regression (see Publications) to include this neat trick.
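Here is a NumPy sketch of the trick (the original code is MATLAB; the sizes and the entries of A, X and G below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 400, 8  # p >> n; sizes are made up for illustration

# A is diagonal with a zero top-left entry, hence rank (p - 1)
a = np.concatenate(([0.0], rng.uniform(1.0, 2.0, p - 1)))
X = rng.standard_normal((p, n))
G = np.diag(rng.uniform(1.0, 2.0, n))
M = np.diag(a) + X @ G @ X.T

# The trick: M = (A + e e') + [X, e] blkdiag(G, -1) [X, e]'
e = np.zeros((p, 1))
e[0, 0] = 1.0
d = a.copy()
d[0] = 1.0                                  # diagonal of A + e e' (now full rank)
U = np.hstack([X, e])                       # [p x (n+1)]
C = np.diag(np.append(np.diag(G), -1.0))    # blkdiag(G, -1)

# Woodbury: (D + U C U')^{-1} = D^{-1} - D^{-1} U (C^{-1} + U' D^{-1} U)^{-1} U' D^{-1}
Dinv_U = U / d[:, None]                     # D^{-1} U, with D diagonal, cost O(pn)
inner = np.linalg.inv(np.linalg.inv(C) + U.T @ Dinv_U)  # small (n+1) x (n+1) inverse
Minv = np.diag(1.0 / d) - Dinv_U @ inner @ Dinv_U.T

print(np.allclose(Minv @ M, np.eye(p)))
```

The only dense inverse taken is of an (n+1) x (n+1) matrix, which is where the O(n^3) cost comes from.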
Last week I gave a seminar at my work place on the advantages of using penalized logistic regression methods (such as the LASSO, elastic net, etc.) over the standard maximum likelihood approach. The target audience was genetic epidemiologists who have some practical knowledge of fitting logistic models, but may not be aware of the recent theoretical work in the area. The slides from the seminar are now available from the Publications page.
The other day I came across a new Q&A site for statistical analysis called Statistical Analysis Questions. I am not sure how long the site has been running, but there are already about 200 questions, 750 answers and more than 600 users. I recommend checking it out. It seems quite useful for anyone involved in both applied and theoretical statistics work.
For the past four weeks, I’ve been enjoying the FIFA World Cup 2010, the most watched television event in the world. This World Cup was held in South Africa, making it the first time an African nation has hosted the prestigious tournament. One of the surprise teams of the tournament has been Germany, beating both England and Argentina (4-1 and 4-0 respectively) before losing 0-1 to current European champions Spain in a tightly contested semi-final encounter.
Meanwhile in Oberhausen, Germany, a somewhat odd event took place before each of the Germany matches. Paul the Octopus, who resides at the local Sea Life Aquarium, was used as an oracle to predict the outcomes of all Germany world cup matches prior to the games taking place. For a description of exactly how Paul makes his predictions, see this Wikipedia article. Amazingly, Paul has successfully predicted all six of the German games so far and has recently tipped Germany, to the delight of many Germans, to beat Uruguay in the upcoming game for 3rd/4th place. This should hopefully put a stop to those anti-octopus songs and calls to eat Paul. As statisticians, let us ask the question “Is Paul really an animal oracle or just one extremely lucky octopus?”.
We can model the number of Paul’s successful predictions at this world cup as a binomial distribution B(n=6, p); that is, we have six independent trials (matches) with p being the probability of success (Paul predicting correctly) at each trial. In order to test whether Paul is psychic, we shall construct a 95% confidence interval for the probability of success, p. The standard confidence interval, often called the Wald interval, is known to have poor coverage properties in this scenario and exhibits erratic behaviour, even when the sample size is large or p is near 0.5. Instead, we compute the modified Jeffreys 95% CI recommended in [1].
This CI is quite wide, which is not unexpected given such a small sample size (n=6), and excludes the possibility that Paul is just plain old lucky (p=0.5)!
What can Minimum Message Length (MML) and Minimum Description Length (MDL) tell us about Paul’s psychic powers? We shall use the Wallace–Freeman (MML) codelength formula [2,3] and the Normalized Maximum Likelihood (NML) distribution (MDL) [4] for this task. Let A denote the hypothesis that Paul is lucky, and B the alternative hypothesis that Paul is an animal oracle. We compute the codelength of data and hypothesis for both scenarios A and B; the difference in codelengths (i.e., codelength A − codelength B, in bits) gives, as 2 raised to that difference, the odds in favour of the hypothesis with the smaller codelength. From standard information theory, the codelength for hypothesis A is 6 × log2(2) = 6 bits. The codelength for hypothesis B is 2.82 bits using the WF formula and 1.92 bits using the NML distribution. Thus, both MML and MDL prefer hypothesis B.
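The WF codelength depends on prior and Fisher-information details not reproduced here, but the NML figure is easy to check directly. A short Python sketch (my own, not from the original post):

```python
from math import comb, log2

n, x = 6, 6  # six matches, all predicted correctly

# Hypothesis A: Paul guesses each match, costing 1 bit per match
codelength_A = n * log2(2)  # 6 bits

# Hypothesis B: NML codelength = -log2 P(x | p_hat(x)) + log2(normaliser)
def max_lik(k, n):
    p = k / n  # maximum likelihood estimate; note 0**0 == 1 in Python
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

norm = sum(max_lik(k, n) for k in range(n + 1))
codelength_B = -log2(max_lik(x, n)) + log2(norm)

print(codelength_A, round(codelength_B, 2))  # 6.0 1.92
```

The difference of about 4.08 bits corresponds to odds of roughly 2^4.08 ≈ 17 to 1 in favour of hypothesis B.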
So there you have it, Paul must be the real deal! 😉
[1] Lawrence D. Brown, T. Tony Cai and Anirban DasGupta. Interval Estimation for a Binomial Proportion, Statistical Science, Vol. 16, No. 2, pp. 101-133, 2001.
[2] C. S. Wallace and P. R. Freeman. Estimation and inference by compact coding, Journal of the Royal Statistical Society (Series B), Vol. 49, No. 3, pp. 230-265, 1987.
[3] C. S. Wallace. Statistical and Inductive Inference by Minimum Message Length, 1st ed., Springer, 2005.
[4] Jorma Rissanen. Information and Complexity in Statistical Modeling, 1st ed., Springer, 2009.
The Eurovision 2010 competition finished last Saturday with Lena Meyer-Landrut from Germany taking the title for her song “Satellite”. If you missed the show, you can see Lena performing the song at the official Eurovision YouTube channel here. Given the current state of the world economy, this is a pretty good outcome as Germany is one of a few countries left in Europe with enough finances to host next year's show. So how did my team, StatoVIC, go in the Kaggle Eurovision 2010 competition? The results have been tabulated and released here. It looks like StatoVIC took seventh place out of 22 submissions with an absolute error of 2626 points calculated from the predicted ratings. This score is in the top quartile of the submissions and about 1000 points better than “Lucky Guess”, the last place submission (I assume this submission is just a random selection of ratings). Not a bad result for StatoVIC, really. Congratulations to Jure Zbontar for winning the competition with an impressive absolute error score about 400 rating points lower than our team's.
It’s time for StatoVIC to look at the HIV progression challenge and see if we can do better than seventh place!
Last week I submitted predictions for the Kaggle Eurovision 2010 competition under the team name StatoVIC. The first part of the competition requires selecting the 25 countries that will make the Eurovision 2010 final. Once the 25 finalists are chosen, you are asked to predict the voting behaviour of all the participating countries based on 10 years of data collected from previous Eurovision competitions. In this year's Eurovision, 20 countries are selected for the final based on the outcome of two semi-finals. In both semi-finals, there are 17 countries competing and the 10 countries with the most points go through to the final. The remaining five countries (Spain, Germany, United Kingdom, France and Norway) are guaranteed final competitors. With the second semi-final finishing last Thursday, it is time to see how the StatoVIC team has fared thus far.
In the first semi-final, I ended up predicting five (Bosnia, Russia, Greece, Serbia and Belgium) out of the ten finalists correctly. In the second semi-final, I fared somewhat better selecting eight of the ten countries that made the final. I missed out on picking Romania and Cyprus and instead chose Croatia and Finland. Given the relatively naive strategy that was used to select the finalists, these numbers are certainly not too bad.
Out of interest, I had a brief look at how you would fare if you were to randomly select the ten finalists in either of the two semi-finals. First, the “good” news: you are guaranteed to select at least three finalists correctly, since you pick 10 of the 17 countries and 10 of those 17 make the final. The odds of correctly guessing all ten finalists in a semi are unfortunately 1 in 19,448. The probability of correctly guessing exactly five and exactly eight finalists is about 0.27 and 0.05 respectively. The expected number of finalists guessed correctly with this strategy is between five and six (10 × 10/17 ≈ 5.9). In light of this, the performance of StatoVIC is about average in the first semi, and moderately better than average in the second.
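These figures follow from the hypergeometric distribution: pick 10 of the 17 semi-finalists at random, 10 of which are the true finalists. A quick Python check (the function name is my own):

```python
from math import comb

def p_exact(k, total=17, finalists=10, picks=10):
    """P(exactly k of the randomly picked countries are true finalists)."""
    return comb(finalists, k) * comb(total - finalists, picks - k) / comb(total, picks)

print(comb(17, 10))                                 # 19448 equally likely selections
print(p_exact(2))                                   # 0.0 -- fewer than three correct is impossible
print(round(p_exact(5), 2), round(p_exact(8), 2))   # 0.27 0.05
print(round(10 * 10 / 17, 2))                       # expected number correct: 5.88
```

The zero probability for k < 3 reflects the guarantee above: only 7 of the 17 countries are non-finalists, so at most 7 picks can be wrong.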
The Eurovision final is on this Saturday night, but is shown Sunday night on SBS if you are in Australia. It will be interesting to see how StatoVIC fares in predicting the voting behaviour. In the meantime, here are the predictions of the fine folks at Google: