Computing the Hodges–Lehmann estimator

The Hodges–Lehmann estimator (HL) of the location parameter for $n$ samples of data is defined as the median of the averages of all $n(n+1)/2$ data pairs. This set of all pairwise averages is usually referred to as the Walsh averages. The Hodges-Lehmann estimator has some excellent efficiency and robustness properties if the observed data is generated by a symmetric distribution.

Below is a simple C function to compute the HL estimator for a given array of observations.

```1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 // Comparison function needed for quicksort int compare (const void * a, const void * b) { const double *da = (const double *) a; const double *db = (const double *) b;   return (*da > *db) - (*da < *db); }     // Function to compute the Hodges-Lehmann estimator double hodgeslehmann(double *x, int n) { // store and compute Walsh averages unsigned int nw=n*(n+1)/2; double *walsh_avgs=(double*) malloc(sizeof(double)*nw);   unsigned int k=0; for(unsigned int i=0; i<n; i++) { for(unsigned int j=i; j<n; j++) { walsh_avgs[k]=(x[i]+x[j])/2.0; k++; } }   // sort the vector qsort(walsh_avgs,nw,sizeof(double),compare);   // median of walsh averages double m=walsh_avgs[nw/2]; if(nw % 2 == 0) { m=((walsh_avgs[nw/2] + walsh_avgs[nw/2 - 1]) / 2.0); }   // done free(walsh_avgs); return m; }```

I have also written a MATLAB driver function that allows the aforementioned C code to be executed within MATLAB, see the Software section to download this implementation. An efficient (pure) MATLAB implementation that avoids the use of nested for loops is a little bit tricky. Here is a simple (memmory intensive!) implementation that relies on the ndgrid(.) function. This code works reasonably well as long as the sample size is not too big.

```1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 function m=hodglehm_slow(x)   %% Length of vector x=x(:); % make sure x is a column vector n=length(x);   %% compute cartesian product [a,b]=ndgrid(x,x);   %% Sum of all pairs A=triu((a+b)./2);   %% Extract upper diagonal Ix=triu(true(n)); A=A(Ix);   %% Compute median of averages m=median(A(:));   %% done return;```

We can get the standard error of the HL estimator in MATLAB by using the bootstrap:

```1 2 3 4 5 6 % Generate 50 observations from a Student-t distribution with 2 dof x=trnd(2,50,1);   % Compute standard errors for the mean and HL estimators std(bootstrp(1000, @(a) mean(a), x)) std(bootstrp(1000, @(a) hodglehm_slow(a), x))```
1. No comments yet.