# Archive for category Survival Analysis

### Parameter Inference under Type I Censoring

Suppose we observe n data points generated from an exponential distribution with probability density function $p(t_i |\theta) = \theta^{-1} \exp(-t_i/\theta), \quad (i=1,\ldots,n)$

and would like to infer the mean $\theta$. Each datum represents the lifetime of an item, and the observed lifetimes are independent and identically distributed. We shall also assume that each datum is subject to Type I censoring with a common censoring threshold $C > 0$. That is, together with each survival time we record a censoring indicator, such that $t_i = \min(T_i, C), \quad \delta_i = I(T_i \leq C)$
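To make the censoring mechanism concrete, here is a small pure-Python sketch (the MATLAB experiment appears later; the function name `simulate_censored` and the use of the standard-library `random` module are my own choices) that draws exponential lifetimes and applies the threshold:

```python
import random

def simulate_censored(n, theta, C, seed=0):
    """Draw n exponential lifetimes with mean theta and apply Type I
    censoring at threshold C.  Returns (t, delta), where
    t[i] = min(T[i], C) and delta[i] = I(T[i] <= C)."""
    rng = random.Random(seed)
    # expovariate takes the rate, i.e. 1/mean
    T = [rng.expovariate(1.0 / theta) for _ in range(n)]
    t = [min(Ti, C) for Ti in T]
    delta = [1 if Ti <= C else 0 for Ti in T]
    return t, delta

t, delta = simulate_censored(10, theta=3.0, C=2.5)
```

Every censored observation (`delta[i] = 0`) is recorded as exactly `C`; only the uncensored ones carry their true lifetimes.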

In words, we observe the lifetime $T_i$ if and only if it is less than or equal to the threshold $C$; otherwise we only know that it exceeds $C$. After some basic manipulation, the log-likelihood function for this model is $l(\theta) = - r \log \theta - \left( \sum_{i=1}^n t_i \right) / \theta$

where $r= \sum_{i=1}^n \delta_i$

is the number of uncensored observations. We then maximise the log likelihood function and obtain the maximum likelihood (ML) estimator of the mean $\hat{\theta}_{\rm{ml}} = \left( \sum_{i=1}^n t_i \right) / r$
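In code the ML estimate is a one-liner: total observed time divided by the number of uncensored points. A pure-Python sketch (the function name `theta_ml` is mine), with a guard for the degenerate case $r = 0$, where every observation is censored and the estimator is undefined:

```python
def theta_ml(t, delta):
    """ML estimate of the exponential mean under Type I censoring:
    sum of (possibly censored) times over the number of uncensored points."""
    r = sum(delta)
    if r == 0:
        raise ValueError("all observations censored; ML estimate undefined")
    return sum(t) / r

# Example: two uncensored lifetimes plus one value censored at C = 2.5.
print(theta_ml([1.0, 2.0, 2.5], [1, 1, 0]))  # (1.0 + 2.0 + 2.5) / 2 = 2.75
```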

This is certainly a reasonable estimator; it is asymptotically unbiased and consistent. However, we wish to investigate its performance under finite (small) sample sizes.

As an alternative, I have derived the Wallace–Freeman estimator (WF) $\hat{\theta}_{\rm{wf87}} =\displaystyle{\mathop{\mbox{arg min}}_{\theta}} \left\{ (r-1) \log \theta + \frac{1}{\theta} \sum_{i=1}^n t_i + \frac{1}{2} \log \left( 1 - \exp(-C/\theta) \right) + \theta \right\}$

The WF estimator is Bayesian and requires a prior distribution for the mean. For this example, the mean is assumed to follow an exponential prior with mean one; its negative log-density contributes the $\theta$ term in the objective above. Unlike the maximum likelihood estimator, the WF estimator is not available in closed form and must be obtained by numerical optimisation. In this setting, the WF estimator coincides with the ML estimator in the large-sample limit, and hence shares its favourable properties of asymptotic unbiasedness and consistency. The question we wish to investigate is how the WF estimator compares to the ML estimator when the sample size is small.
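Because the WF objective has no closed-form minimiser, any one-dimensional optimiser will do. The pure-Python sketch below (function names are mine) uses golden-section search over $[0.01, 100]$ in place of a constrained solver such as MATLAB's `fmincon`, and assumes the objective is unimodal on that interval:

```python
import math

def wf_objective(theta, t, delta, C):
    """WF criterion to be minimised; the final + theta term comes from
    the exponential prior with mean one."""
    r = sum(delta)
    return ((r - 1) * math.log(theta) + sum(t) / theta
            + 0.5 * math.log(1.0 - math.exp(-C / theta)) + theta)

def theta_wf(t, delta, C, lo=0.01, hi=100.0, tol=1e-8):
    """Minimise the WF objective on [lo, hi] by golden-section search."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - g * (b - a)  # interior points, c < d
        d = a + g * (b - a)
        if wf_objective(c, t, delta, C) < wf_objective(d, t, delta, C):
            b = d
        else:
            a = c
    return 0.5 * (a + b)

print(theta_wf([1.0, 2.0, 2.5], [1, 1, 0], 2.5))
```

Golden-section search only needs function values, not gradients, which suits this objective; the bracket $[0.01, 100]$ mirrors the bounds used in the MATLAB script below.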

We compare the WF and ML estimators using the loss function $L(\theta, \hat{\theta}) = (\theta - \hat{\theta})^2 / \theta^2$

We note in passing that this is not the only possible loss function for this problem. It may in fact be preferable to use, say, the Kullback–Leibler divergence, which is arguably a natural divergence between two distributions and is invariant under one-to-one transformations of the parameters. Since the WF estimator is not available in closed form, we compare the estimators by numerical experiment. A simple MATLAB function to compare the two estimators is given below:

```matlab
function [loss_mml, loss_ml] = cen_exp(niter, n, theta, C)

%% Setup numerical optimisation options
options = optimset('TolFun',1e-6,'GradObj','off','Algorithm','interior-point');
options = optimset(options, 'Display','off');

%% Record losses for mml and ml
loss_mml = zeros(niter,1);
loss_ml  = zeros(niter,1);

%% Do niter iterations
for i = 1:niter
    %% generate data
    y = exprnd(theta,n,1);
    d = ones(n,1);
    t = y;
    t(y>=C) = C;
    d(y>=C) = 0;

    %% estimate parameters
    msglen = @(x) sum(d)*log(x) + sum(t)./x + 1/2*log(1-exp(-C./x)) - log(x) + x;
    mml_theta = fmincon(msglen, 1.0, [],[],[],[],0.01,100.0,[],options);
    ml_theta  = sum(t)/sum(d);

    %% compute losses
    loss_mml(i) = (theta-mml_theta)^2/theta^2;
    loss_ml(i)  = (theta-ml_theta)^2/theta^2;
end

%% done
return;
```

As an example, we run the function for 1000 iterations with the parameters $n=15$, $\theta = 3$, $C = 2.5$:

```matlab
>> [loss_mml, loss_ml] = cen_exp(1000, 15, 3.0, 2.5);
>> prctile(loss_mml,[5 25 50 75 95]), prctile(loss_ml,[5 25 50 75 95])

ans =

    0.0003    0.0085    0.0375    0.0978    0.2270

ans =

    0.0007    0.0129    0.0559    0.1501    0.9566
```

In this particular run, the WF estimator was slightly superior to the ML estimator across all reported percentiles of the loss. I briefly experimented with various values of the data-generating mean, the sample size and the censoring threshold. For sample sizes below about $n=30$, the WF estimator commonly outperforms the ML estimator. If the data-generating mean is unlikely a priori, the ML estimator is slightly superior to the WF estimator, the difference being about 5%. The ML estimator is particularly problematic when a large number of observations are censored, tending to grossly overestimate the true mean. As expected, the two estimators converge as the sample size grows. Overall, the WF estimator seems preferable to the ML estimator in small samples; in larger samples the two are more or less identical, but the ML estimator is much easier to compute.