Hypothesis testing with Paul the Octopus

For the past four weeks, I’ve been enjoying the 2010 FIFA World Cup, the most watched television event in the world. This World Cup is being held in South Africa, the first time an African nation has hosted the prestigious tournament. One of the surprise teams of the tournament has been Germany, which beat England 4-1 and Argentina 4-0 before losing 0-1 to current European champions Spain in a tightly contested semi-final.

Meanwhile, in Oberhausen, Germany, a somewhat odd ritual took place before each of Germany’s matches. Paul the Octopus, who lives at the local Sea Life Aquarium, was used as an oracle to predict the outcome of every Germany World Cup match before it was played. For a description of exactly how Paul makes his predictions, see the Wikipedia article on Paul the Octopus. Amazingly, Paul has correctly predicted all six of Germany’s games so far and has recently tipped Germany, to the delight of many Germans, to beat Uruguay in the upcoming third-place play-off. This should hopefully put a stop to those anti-octopus songs and calls to eat Paul. As statisticians, let us ask the question: “Is Paul really an animal oracle, or just one extremely lucky octopus?”

We can model the number of Paul’s correct predictions at this World Cup with a binomial distribution B(n = 6, p); that is, we have six independent trials (matches), with p the probability of success (Paul predicting correctly) on each trial. To test whether Paul is psychic, we shall construct a 95% confidence interval for p. The standard confidence interval, often called the Wald interval, is a poor choice here: its coverage is known to be erratic even when the sample size is large or p is near 0.5 [1], and with six successes out of six it collapses to the single point p = 1. Instead, we compute the modified Jeffreys 95% CI recommended in [1] and find that

CI_{M-J} = [0.54, 1.0]

This CI is quite wide, which is not unexpected given such a small sample size (n = 6), but it does not contain p = 0.5, so at the 5% level we can reject the hypothesis that Paul is just plain old lucky!
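To make the comparison between the two intervals concrete, here is a minimal Python sketch (standard library only) for x = 6 successes in n = 6 trials. It assumes, as the reported lower limit of 0.54 suggests, that the modified Jeffreys interval at x = n takes 1 as its upper limit and (α/2)^(1/n) as its lower limit (which coincides with the Clopper-Pearson lower limit in this case). The function names `wald_interval` and `modified_jeffreys_all_successes` are hypothetical, not taken from [1].

```python
import math


def wald_interval(x, n, z=1.96):
    """Standard Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n), clipped to [0, 1]."""
    p_hat = x / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


def modified_jeffreys_all_successes(n, alpha=0.05):
    """Modified Jeffreys interval for the special case x = n.

    Assumption: the lower limit is (alpha / 2) ** (1 / n) and the upper limit is 1.
    """
    return (alpha / 2) ** (1 / n), 1.0


n = x = 6  # six matches, six correct predictions
print("Wald:", wald_interval(x, n))
print("Modified Jeffreys:", modified_jeffreys_all_successes(n))
```

The Wald interval degenerates to [1, 1] because its estimated standard error, sqrt(p̂(1 - p̂)/n), is zero when every prediction is correct, whereas the assumed modified Jeffreys lower limit, 0.025^(1/6) ≈ 0.54, stays well below 1 and above 0.5.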

What can Minimum Message Length (MML) and Minimum Description Length (MDL) tell us about Paul’s psychic powers? We shall use the Wallace-Freeman (MML) codelength formula [2,3] and the Normalized Maximum Likelihood (NML) distribution (MDL) [4] for this task. Let A denote the hypothesis that Paul is lucky (p = 0.5), and B the alternative hypothesis that Paul is an animal oracle (p unknown). We compute the codelength of hypothesis and data under both A and B. The hypothesis with the shorter codelength is preferred, and the difference in codelengths (codelength A minus codelength B), measured in bits, can be read as the base-2 log-odds in favour of B rather than as a probability. From standard information theory, the codelength for hypothesis A is -log₂(0.5⁶) = 6 bits, since each of the six outcomes costs exactly one bit. The codelength for hypothesis B is 2.82 bits using the WF formula and 1.92 bits using the NML distribution. Thus, both MML and MDL prefer hypothesis B.
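The three codelengths can be reproduced with the short Python sketch below. It rests on assumptions the text does not state: the WF calculation assumes a uniform prior on p, uses the WF estimate (x + 1/2)/(n + 1) that minimises the message length under that prior, and uses the one-dimensional quantising constant κ₁ = 1/12; the NML calculation encodes the particular sequence of six outcomes, with the parametric complexity summed over all 2^n possible sequences. The function names are hypothetical. The final loop treats each codelength difference d as a base-2 log-odds and prints the corresponding odds 2^d : 1.

```python
import math

LN2 = math.log(2)


def codelength_lucky(n):
    """Hypothesis A: p = 0.5 is fixed, so each of the n outcomes costs exactly 1 bit."""
    return -n * math.log2(0.5)


def codelength_wf(x, n):
    """Wallace-Freeman message length in bits, assuming a uniform prior on p.

    I = -log h(p) - log f(x | p) + 0.5 * log F(p) + 0.5 * (1 + log kappa_1), in nats.
    """
    p = (x + 0.5) / (n + 1)  # WF estimate under the (assumed) uniform prior
    neg_log_lik = -(x * math.log(p) + (n - x) * math.log(1 - p))
    fisher = n / (p * (1 - p))  # Fisher information for n Bernoulli trials
    kappa1 = 1 / 12  # optimal quantising constant in one dimension
    # -log h(p) = 0 for the uniform prior, so no prior term appears here.
    nats = neg_log_lik + 0.5 * math.log(fisher) + 0.5 * (1 + math.log(kappa1))
    return nats / LN2


def codelength_nml(x, n):
    """NML codelength in bits: -log2 P(sequence | p_ml) + log2 C_n."""

    def max_lik(k):
        p = k / n  # maximum-likelihood estimate for k successes
        return p**k * (1 - p) ** (n - k)  # Python evaluates 0.0 ** 0 as 1.0

    # Parametric complexity: sum of maximised likelihoods over all sequences.
    complexity = sum(math.comb(n, k) * max_lik(k) for k in range(n + 1))
    return -math.log2(max_lik(x)) + math.log2(complexity)


x = n = 6
a = codelength_lucky(n)
for name, b in [("WF", codelength_wf(x, n)), ("NML", codelength_nml(x, n))]:
    d = a - b  # codelength A minus codelength B, in bits
    print(f"{name}: A = {a:.2f} bits, B = {b:.2f} bits, odds for B about {2 ** d:.1f}:1")
```

Under these assumptions the sketch gives 6 bits for A, about 2.82 bits for B with WF and about 1.92 bits for B with NML, matching the figures quoted above.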

So there you have it, Paul must be the real deal! 😉

[1] Lawrence D. Brown, T. Tony Cai, and Anirban DasGupta. Interval Estimation for a Binomial Proportion, Statistical Science, Vol. 16, No. 2, pp. 101-133, 2001.
[2] C. S. Wallace and P. R. Freeman. Estimation and inference by compact coding, Journal of the Royal Statistical Society (Series B) Vol. 49, No. 3, pp. 230-265, 1987.
[3] C. S. Wallace. Statistical and Inductive Inference by Minimum Message Length, 1st ed., Springer, 2005.
[4] Jorma Rissanen. Information and Complexity in Statistical Modeling, 1st ed., Springer, 2009.
