## Sampling from a k-nomial distribution with Boost

Boost is free, open source and peer-reviewed set of C++ libraries that is used in academic as well as commercial environments. I have recently used the Boost random number generation and special mathematical functions library for a project to detect gene-gene interactions in breast cancer. Unfortunately, the Boost random number generation library dose not currently include the k-nomial distribution, which is something I needed for the project. Here is a simple implementation I ended up with in the hope that it may be useful to other people. Note, my implementation is definitelly not optimal and is inefficient for generating large samples.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 #include <cassert>   #include <boost/random/mersenne_twister.hpp> #include <boost/random/discrete_distribution.hpp>   boost::mt19937 rng; // random number generator boost::random::uniform_01<double> dist;   /* * Generate n samples from a k-nomial distribution. * Input: * tab - vector of length k to store the counts; * - tab[j] \in [0, n] for all j=1,...,k * - sum_{j=0}^k tab[j] = n * p - the probability vector of length k * - does not assume p(0)+p(1)+...+p(k)=1; * k - k-nomial (k>1) * n - sample size */ void sample_knomial(unsigned int *tab, double *p, unsigned int k, unsigned int n) { assert(k>1);   // set all k-cells to zero memset(tab,0,sizeof(unsigned int)*k);   // normalise the probabilities double norm_factor=0.0; for(unsigned int j=0; j<k; j++) { assert(p[j] > 0); norm_factor += p[j]; }   // pre-compute cumulative sums [p(0),p(0)+p(1),...,p(0)+p(1)+...+p(k-1)] // (could sort p prior to this) double *cumsum = new double[k]; cumsum=p/norm_factor; for(unsigned int j=1; j<k; j++) { cumsum[j] = cumsum[j-1] + p[j]/norm_factor; }   // Generate n samples from a k-nomial with probabilities p_1,...,p_k unsigned int t = 0; double u = 0.0; for(unsigned int i=0; i<n; i++) { // Generate uniform u \in [0,1) u = dist(rng); // Determine in which of the k cells u falls for(t=0; u>cumsum[t]; t++); // Increment count for cell j tab[t]++; }   // free memory delete[] cumsum; }