Easiest way to create a NumPy record array from a list of dictionaries?

Question


Let's say I have data like d = [dict(animal='cat', weight=5), dict(animal='dog', weight=20)] (basically JSON, where all entries have consistent data types).

In Pandas, you can build this table with df = pandas.DataFrame(d). Is there anything comparable for plain NumPy record arrays? np.rec.fromrecords(d) doesn't seem to give me what I want.


python numpy

Roger Jul 16 '14 at 23:48


3 answers

hpaulj · Answer 1 · 2014-07-17T04:14:16+0000

You can create an empty structured array of the required size and dtype, and then fill it from the list.

http://docs.scipy.org/doc/numpy/user/basics.rec.html

Structured arrays can be filled by field or row by row. ... If you fill it row by row, each row has to be a tuple (not a list or an array!):

    In [72]: dt = np.dtype([('weight', int), ('animal', 'S10')])
    In [73]: values = [tuple(each[name] for name in dt.names) for each in d]
    In [74]: values
    Out[74]: [(5, 'cat'), (20, 'dog')]

The fields in dt must occur in the same order as the items in each tuple of values. Building the tuples from dt.names guarantees this regardless of the dictionaries' key order.

    In [75]: a = np.zeros((2,), dtype=dt)
    In [76]: a[:] = [tuple(each[name] for name in dt.names) for each in d]
    In [77]: a
    Out[77]:
    array([(5, 'cat'), (20, 'dog')],
          dtype=[('weight', '<i4'), ('animal', 'S10')])

After a bit more testing, I found that I can create the array directly from values.

    In [83]: a = np.array(values, dtype=dt)
    In [84]: a
    Out[84]:
    array([(5, 'cat'), (20, 'dog')],
          dtype=[('weight', '<i4'), ('animal', 'S10')])
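As a self-contained sketch of both filling styles mentioned above (row by row and by field), something like the following should work on Python 3. The 'U10' unicode format and the explicit 'i8' are my assumptions here, swapped in for 'S10' and int, because under Python 3 an 'S' field stores bytes rather than str.

```python
import numpy as np

d = [dict(animal='cat', weight=5), dict(animal='dog', weight=20)]

# Assumed dtype: 64-bit int and a unicode string of up to 10 characters.
dt = np.dtype([('weight', 'i8'), ('animal', 'U10')])

# Row by row: one tuple per dict, built in the dtype's field order.
rows = [tuple(entry[name] for name in dt.names) for entry in d]
by_row = np.array(rows, dtype=dt)

# Field by field: allocate first, then assign one column at a time.
by_field = np.zeros(len(d), dtype=dt)
for name in dt.names:
    by_field[name] = [entry[name] for entry in d]
```

Both arrays end up with the same contents, and a column can be read back with by_row['weight'].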

dtype can be inferred from one (or more) dictionary entries:

    def gettype(v):
        # Map a Python value to a NumPy format string.
        if isinstance(v, int):
            return 'int'
        elif isinstance(v, float):
            return 'float'
        else:
            assert isinstance(v, str)
            # Pad the width so longer strings in other rows still fit.
            return '|S%s' % (len(v) + 10)

    d0 = d[0]
    names = list(d0.keys())
    formats = [gettype(v) for v in d0.values()]
    dt = np.dtype({'names': names, 'formats': formats})

producing:

 dtype=[('weight', '<i4'), ('animal', 'S13')]
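The inference idea can be wrapped into a small helper. This is only a sketch under my own assumptions: infer_dtype and dicts_to_recarray are hypothetical names, every dict is assumed to have the same keys, and string fields are sized to the longest value in the column rather than padded by 10.

```python
import numpy as np

def infer_dtype(records):
    """Build a structured dtype from a list of dicts with identical keys."""
    fields = []
    for name, sample in records[0].items():
        if isinstance(sample, bool):      # check bool first: bool is a subclass of int
            fmt = '?'
        elif isinstance(sample, int):
            fmt = 'i8'
        elif isinstance(sample, float):
            fmt = 'f8'
        elif isinstance(sample, str):
            # Size the field to the longest string found in this column.
            width = max(len(r[name]) for r in records)
            fmt = 'U%d' % max(width, 1)
        else:
            raise TypeError('unsupported type for %r: %r' % (name, type(sample)))
        fields.append((name, fmt))
    return np.dtype(fields)

def dicts_to_recarray(records):
    """Convert a list of dicts to a record array with attribute access."""
    dt = infer_dtype(records)
    rows = [tuple(r[name] for name in dt.names) for r in records]
    return np.array(rows, dtype=dt).view(np.recarray)

d = [dict(animal='cat', weight=5), dict(animal='dog', weight=20)]
rec = dicts_to_recarray(d)
```

With the view as np.recarray, fields are available both as rec['weight'] and as rec.weight.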

Zjs · Answer 2 · 2014-07-17T01:20:44+0000

Well, you could make your life simpler and just rely on Pandas, since plain NumPy arrays don't carry column headers.

Pandas

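    import pandas
    df = pandas.DataFrame(d)
    # as_matrix() was removed in pandas 1.0; to_numpy() is its replacement
    numpyMatrix = df.to_numpy()  # returns a NumPy ndarray (dtype=object for mixed columns)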
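If the goal is a record array rather than a plain matrix, the Pandas route can keep the column names by going through DataFrame.to_records. A minimal sketch, assuming pandas is installed and that dropping the index with index=False is what you want:

```python
import pandas as pd

d = [dict(animal='cat', weight=5), dict(animal='dog', weight=20)]

df = pd.DataFrame(d)

# to_records returns a NumPy record array; index=False leaves out the row index.
rec = df.to_records(index=False)

weights = rec['weight']   # columns are addressable by name
```

The string column typically comes back with an object dtype here, so this is convenient but not as compact as a fixed-width 'U' or 'S' field.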

Or you can skip Pandas and use NumPy with a list comprehension to reduce the dicts to their values and store them as a matrix.

Numpy

    import numpy
    # list(...) is needed on Python 3, where dict.values() returns a view
    numpyMatrix = numpy.matrix([list(each.values()) for each in d])

agconti · Answer 3 · 2014-07-17T01:08:33+0000

You can use np.asarray():

    In [1]: import numpy as np
    In [2]: d = np.asarray([dict(animal='cat', weight=5), dict(animal='dog', weight=20)])
    In [3]: d
    Out[3]: array([{'weight': 5, 'animal': 'cat'}, {'weight': 20, 'animal': 'dog'}], dtype=object)
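Note what this result is: a one-dimensional object array whose elements are the original dicts, not a record array with named fields. A short sketch of what that means in practice (the variable names are mine):

```python
import numpy as np

d = [dict(animal='cat', weight=5), dict(animal='dog', weight=20)]
obj = np.asarray(d)

# The array has no fields of its own; it only holds references to the dicts.
has_fields = obj.dtype.names is not None

# Pulling out a "column" therefore still needs a Python-level loop.
weights = np.array([entry['weight'] for entry in obj])
```

So obj['weight'] is not available the way it is on a structured array; each element has to be indexed as a dict.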
