Archive for category Programming

MATLAB Bayesian regression software updated

We have updated bayesreg, a MATLAB toolbox that implements Bayesian linear and logistic regression with sparsity-inducing priors, to version 1.6. The package now handles logistic regression without the need for MEX files, but big speed-ups can be obtained by using compiled code, so this is recommended. To compile the C++ code, run compile.m from the bayesreg directory within MATLAB. Alternatively, for convenience, the pre-compiled MEX files (MATLAB R2017a) for Windows, Linux and Mac OSX can be downloaded from here. To use these, all you need to do is download them and unzip them into the “bayesreg” folder.

Version 1.6 includes the following changes:

  • Implemented block sampling of betas for data with large numbers of predictors (options ‘blocksample’ and ‘blocksize’)
  • Display the Widely Applicable Information Criterion (WAIC) instead of DIC in summary output
  • Implemented alternative priors for tau2
  • Added an example to demonstrate block sampling and alternative priors for tau2 (see examples\br_example13.m)

Version 1.5 includes the following changes:

  • Wrote a new parallelised C++ implementation of the sampling code for logistic regression
  • Included an efficient pure-MATLAB implementation of logistic regression sampling; this works even when MEX files are unavailable, though it is not as fast

Version 1.4 includes the following changes:

  • Added option ‘groups’ which allows grouping of variables into potentially overlapping groups
  • Groups work with HS, HS+ and lasso priors
  • Fixed a regression bug with g priors and logistic models
  • Updated examples

Version 1.3 includes the following changes:

  • Tidied up the summary display to switch between scientific and normal notation depending
    on the size of the numbers in the table
  • Fixed the summary banner to adjust to the size of the table
  • Added support for MATLAB tables
    • Code now accepts either X/y as matrix/vector, or a table and the string name of the variable to use as the target
  • Added support for categorical predictors
    • These are now either specified using ‘catvars’ option for matrix inputs, or automatically determined from the contents of the table
  • Removed the dependency of br_summary() on X and y
  • Consolidated all likelihood functions into a single br_regnlike() function that is vectorized using bsxfun
  • Added a br_predict function to produce predictions over data given design matrices
    • Checks input design matrix/table against structure of training data
    • Automatically handles categorical expansions as required
    • Produces credible intervals if requested
    • Produces prediction statistics if targets are supplied
  • Updated and improved the example scripts
  • Added initial internal code structures required for grouping variables and variable expansions
  • Fixed a bug in the computation of R2


Sampling from a k-nomial distribution with Boost

Boost is a free, open-source, peer-reviewed set of C++ libraries used in academic as well as commercial environments. I have recently used the Boost random number generation and special mathematical functions libraries for a project to detect gene-gene interactions in breast cancer. Unfortunately, the Boost random number generation library does not currently include the k-nomial distribution, which is something I needed for the project. Here is the simple implementation I ended up with, in the hope that it may be useful to other people. Note, my implementation is definitely not optimal and is inefficient for generating large samples.

#include <cassert>
#include <cstring>
 
#include <boost/random/mersenne_twister.hpp>
#include <boost/random/uniform_01.hpp>
 
boost::mt19937 rng;	// random number generator
boost::random::uniform_01<double> dist;
 
/*
 * Generate n samples from a k-nomial distribution.
 * Input:	
 *     tab - vector of length k to store the counts; 
 *         - tab[j] \in [0, n] for all j=0,...,k-1
 *         - sum_{j=0}^{k-1} tab[j] = n
 *     p   - the probability vector of length k 
 *         - does not need to sum to one; it is normalised internally
 *     k   - k-nomial (k>1)
 *     n   - sample size
 */
void sample_knomial(unsigned int *tab, double *p, unsigned int k, unsigned int n)
{
	assert(k>1);
 
	// set all k cells to zero
	memset(tab,0,sizeof(unsigned int)*k);
 
	// compute the normalising constant for the probabilities
	double norm_factor=0.0;
	for(unsigned int j=0; j<k; j++) {
		assert(p[j] > 0); 
		norm_factor += p[j];
	}
 
	// pre-compute cumulative sums [p(0),p(0)+p(1),...,p(0)+p(1)+...+p(k-1)]
	// (sorting p in descending order first would speed up the search below)
	double *cumsum = new double[k];
	cumsum[0]=p[0]/norm_factor;
	for(unsigned int j=1; j<k; j++)
	{
		cumsum[j] = cumsum[j-1] + p[j]/norm_factor;
	}
 
	// Generate n samples from a k-nomial with probabilities p_1,...,p_k
	unsigned int t = 0;
	double u = 0.0;
	for(unsigned int i=0; i<n; i++)
	{
		// Generate uniform u \in [0,1)
		u = dist(rng);
		// Determine in which of the k cells u falls; the t < k-1
		// bound guards against floating-point round-off in cumsum
		for(t=0; t<k-1 && u>cumsum[t]; t++);
		// Increment count for cell t
		tab[t]++;
	}
 
	// free memory
	delete[] cumsum;
}
