Archive for category Programming

MATLAB Bayesian regression software updated

We have updated bayesreg, a MATLAB toolbox that implements Bayesian linear and logistic regression with sparsity-inducing priors, to version 1.3. The new version can be downloaded here.

Version 1.3 includes the following changes:

  • Tidied up the summary display to switch between scientific and normal notation depending
    on the size of the numbers in the table
  • Fixed the summary banner to adjust to the size of the table
  • Added support for MATLAB tables
    • Code now accepts either X/y as a matrix/vector pair, or a table together with the string name of the variable to use as the target
  • Added support for categorical predictors
    • These are now specified either via the ‘catvars’ option for matrix inputs, or determined automatically from the contents of the table
  • Removed the dependency of br_summary() on X and y
  • Consolidated all likelihood functions into a single br_regnlike() function that is vectorized using bsxfun
  • Added a br_predict function to produce predictions over data given design matrices
    • Checks input design matrix/table against structure of training data
    • Automatically handles categorical expansions as required
    • Produces credible intervals if requested
    • Produces prediction statistics if targets are supplied
  • Updated and improved the example scripts
  • Added initial internal code structures required for grouping variables and variable expansions
  • Fixed a bug in the computation of R2


Sampling from a k-nomial distribution with Boost

Boost is a free, open-source and peer-reviewed set of C++ libraries used in academic as well as commercial environments. I have recently used the Boost random number generation and special mathematical functions libraries for a project to detect gene-gene interactions in breast cancer. Unfortunately, the Boost random number generation library does not currently include the k-nomial (multinomial) distribution, which is something I needed for the project. Here is the simple implementation I ended up with, in the hope that it may be useful to other people. Note that my implementation is definitely not optimal and is inefficient for generating large samples.

#include <cassert>
#include <cstring>
 
#include <boost/random/mersenne_twister.hpp>
#include <boost/random/uniform_01.hpp>
 
boost::mt19937 rng;	// Mersenne Twister random number generator
boost::random::uniform_01<double> dist;	// uniform distribution on [0,1)
 
/*
 * Generate n samples from a k-nomial (multinomial) distribution.
 * Input:
 *     tab - vector of length k to store the counts;
 *         - tab[j] \in [0, n] for all j=0,...,k-1
 *         - sum_{j=0}^{k-1} tab[j] = n
 *     p   - vector of length k of (unnormalised) probability weights
 *         - does not assume p[0]+p[1]+...+p[k-1]=1
 *     k   - number of categories (k>1)
 *     n   - sample size
 */
void sample_knomial(unsigned int *tab, double *p, unsigned int k, unsigned int n)
{
	assert(k>1);
 
	// set all k-cells to zero
	memset(tab,0,sizeof(unsigned int)*k);
 
	// normalise the probabilities
	double norm_factor=0.0;
	for(unsigned int j=0; j<k; j++) {
		assert(p[j] > 0); 
		norm_factor += p[j];
	}
 
	// pre-compute cumulative sums [p(0),p(0)+p(1),...,p(0)+p(1)+...+p(k-1)]
	// (could sort p prior to this)
	double *cumsum = new double[k];
	cumsum[0]=p[0]/norm_factor;
	for(unsigned int j=1; j<k; j++)
	{
		cumsum[j] = cumsum[j-1] + p[j]/norm_factor;
	}
 
	// Generate n samples from a k-nomial with probabilities p[0],...,p[k-1]
	unsigned int t = 0;
	double u = 0.0;
	for(unsigned int i=0; i<n; i++)
	{
		// Generate uniform u \in [0,1)
		u = dist(rng);
		// Determine in which of the k cells u falls; the bound t<k-1 guards
		// against floating-point round-off pushing u past the last cumulative sum
		for(t=0; t<k-1 && u>cumsum[t]; t++);
		// Increment count for cell t
		tab[t]++;
	}
 
	// free memory
	delete[] cumsum;
}
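
As a quick check, a minimal driver along the following lines can be appended to the same file; the three-category weight vector and sample size are arbitrary illustrative values.

#include <cstdio>
 
int main()
{
	// rng uses the default seed; call rng.seed(...) for a different stream
 
	// three categories with unnormalised weights 0.1 : 0.2 : 0.7
	// (illustrative values only)
	double p[3] = {0.1, 0.2, 0.7};
	unsigned int counts[3];
 
	// draw 10,000 samples and tabulate the category counts
	sample_knomial(counts, p, 3, 10000);
 
	// the counts should be roughly 1000, 2000 and 7000
	printf("%u %u %u\n", counts[0], counts[1], counts[2]);
	return 0;
}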
