Computing the Hodges–Lehmann estimator


The Hodges–Lehmann estimator (HL) of the location parameter for n samples of data is defined as the median of the averages of all n(n+1)/2 data pairs. This set of all pairwise averages is usually referred to as the Walsh averages. The Hodges-Lehmann estimator has some excellent efficiency and robustness properties if the observed data is generated by a symmetric distribution.

Below is a simple C function to compute the HL estimator for a given array of observations.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
// Comparison function needed for quicksort                                          
int compare (const void * a, const void * b)
{
    const double *da = (const double *) a;
    const double *db = (const double *) b;
 
    return (*da > *db) - (*da < *db);
}
 
 
// Function to compute the Hodges-Lehmann estimator
double hodgeslehmann(double *x, int n)
{
    // store and compute Walsh averages
    unsigned int nw=n*(n+1)/2;
    double *walsh_avgs=(double*) malloc(sizeof(double)*nw);
 
    unsigned int k=0;
    for(unsigned int i=0; i<n; i++) {
        for(unsigned int j=i; j<n; j++) {
            walsh_avgs[k]=(x[i]+x[j])/2.0;
            k++;
        }
    }
 
    // sort the vector
    qsort(walsh_avgs,nw,sizeof(double),compare);
 
    // median of walsh averages
    double m=walsh_avgs[nw/2];
    if(nw % 2 == 0) {
        m=((walsh_avgs[nw/2] + walsh_avgs[nw/2 - 1]) / 2.0);
    }
 
    // done
    free(walsh_avgs);
    return m;
}

I have also written a MATLAB driver function that allows the aforementioned C code to be executed within MATLAB, see the Software section to download this implementation. An efficient (pure) MATLAB implementation that avoids the use of nested for loops is a little bit tricky. Here is a simple (memmory intensive!) implementation that relies on the ndgrid(.) function. This code works reasonably well as long as the sample size is not too big.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
function m=hodglehm_slow(x)                                                         
 
%% Length of vector
x=x(:);         % make sure x is a column vector
n=length(x);
 
%% compute cartesian product
[a,b]=ndgrid(x,x);
 
%% Sum of all pairs
A=triu((a+b)./2);  
 
%% Extract upper diagonal
Ix=triu(true(n));
A=A(Ix);
 
%% Compute median of averages
m=median(A(:));
 
%% done
return;

We can get the standard error of the HL estimator in MATLAB by using the bootstrap:

1
2
3
4
5
6
% Generate 50 observations from a Student-t distribution with 2 dof                  
x=trnd(2,50,1);
 
% Compute standard errors for the mean and HL estimators
std(bootstrp(1000, @(a) mean(a), x))
std(bootstrp(1000, @(a) hodglehm_slow(a), x))
  1. No comments yet.
(will not be published)