I have a 1D array A representing categorical data, where each entry is the number of elements in the corresponding category:
A = array([ 1, 8, 2, 5, 10, 32, 0, 0, 1, 0])
and I'm trying to write a sample(A, N) function that returns an array B of the same shape, holding the per-category counts of N elements drawn at random from the elements counted in A:
>>> sample(A, 20)
array([ 1, 3, 0, 1, 4, 11, 0, 0, 0, 0])
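To make the requirement concrete, here is a small sketch of the conditions I expect any result to satisfy. The helper name is_valid_sample is hypothetical (it is not a numpy function), and it assumes A and B are integer count arrays:

```python
import numpy as np

def is_valid_sample(A, B, N):
    """Check that B is a plausible draw of N items from the counts in A."""
    A = np.asarray(A)
    B = np.asarray(B)
    return bool(
        B.shape == A.shape        # one count per category, same layout as A
        and B.sum() == N          # exactly N items were drawn
        and np.all(B >= 0)        # counts cannot be negative
        and np.all(B <= A)        # no category gives more than it holds
    )

A = np.array([1, 8, 2, 5, 10, 32, 0, 0, 1, 0])
B = np.array([1, 3, 0, 1, 4, 11, 0, 0, 0, 0])
ok = is_valid_sample(A, B, 20)
```

The last condition, B <= A, is the one that a draw with replacement could violate.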
I wrote this:
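from numpy import zeros, random

def sample(A, N):
    # Float copy of the counts (astype already returns a new array).
    AA = A.astype(float)
    Z = zeros(A.shape)
    for _ in range(N):
        # Draw one item with probability proportional to the remaining counts.
        drawn = random.multinomial(1, AA / AA.sum())
        Z = Z + drawn
        # Remove the drawn item so it cannot be picked again.
        AA = AA - drawn
    return Z.astype(int)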
This is probably pretty naive. Is there a better or faster way to do this, maybe with a built-in numpy function?
Edit: my original wording was unclear. The sampling should be without replacement!
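For comparison, here is a minimal sketch of one vectorized alternative, not necessarily the fastest. It assumes A holds non-negative integers and that A.sum() is small enough to expand into one label per item. The function name sample_without_replacement and its rng parameter are hypothetical choices for this illustration:

```python
import numpy as np

def sample_without_replacement(A, N, rng=None):
    """Draw N items without replacement from the categories counted in A."""
    A = np.asarray(A)
    if N > A.sum():
        raise ValueError("cannot draw more items than A contains")
    rng = np.random.default_rng() if rng is None else rng
    # Expand the counts into one category label per item, e.g. [1, 2] -> [0, 1, 1].
    labels = np.repeat(np.arange(A.size), A)
    # Pick N distinct items, then count how many fell into each category.
    drawn = rng.choice(labels, size=N, replace=False)
    return np.bincount(drawn, minlength=A.size)

A = np.array([1, 8, 2, 5, 10, 32, 0, 0, 1, 0])
B = sample_without_replacement(A, 20, rng=np.random.default_rng(0))
```

The result is a draw from the multivariate hypergeometric distribution. In NumPy 1.18 and later, Generator.multivariate_hypergeometric(colors, nsample) should give the same kind of draw directly, without building the expanded label array.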