List of dictionaries from a numpy array without a loop

Question

Is there a way to vectorize an operation that takes several numpy arrays and puts them into a list of dictionaries?

Here is a simplified example. A real scenario may include more arrays and more dictionary keys.

    import numpy as np

    x = np.arange(10)
    y = np.arange(10, 20)
    z = np.arange(100, 110)
    print [dict(x=x[ii], y=y[ii], z=z[ii]) for ii in xrange(10)]

I may have thousands or hundreds of thousands of iterations in the xrange call. All the manipulations that create x, y and z are vectorized (my real case is not as simple as the example above), so there is only one for loop left to get rid of, which I expect would give a tremendous speedup.

I tried using map with a function that creates the dict, and all kinds of other workarounds. The Python for loop seems to be the slow part (as usual). I am more or less stuck with dictionaries because of pre-existing API requirements. It would still be interesting to see solutions that avoid dicts and use record arrays or something similar, but in the end I don't think that will work with the existing API.

+6

performance python vectorization numpy

durden2.0 Nov 03 '16 at 9:43


3 answers

With your small example, I had trouble getting anything faster than a plain list comprehension that builds each dictionary:

    In [105]: timeit [{'x':i, 'y':j, 'z':k} for i,j,k in zip(x,y,z)]
    100000 loops, best of 3: 15.5 µs per loop

    In [106]: timeit [{'key':{'x':i, 'y':j, 'z':k}} for i,j,k in zip(x,y,z)]
    10000 loops, best of 3: 37.3 µs per loop

Alternatives that first join the arrays into one 2-D array and then split it row by row are slower.

    In [108]: timeit [{'x':x_, 'y':y_, 'z':z_} for x_, y_, z_ in np.column_stack((x,y,z))]
       ....
    10000 loops, best of 3: 58.2 µs per loop
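A variation that may be worth timing on your own data is to convert each array to a plain Python list before zipping. The sketch below is written for Python 3 (an assumption; the thread uses Python 2) and has not been benchmarked here. It only shows the shape of the idea: `tolist()` hands the comprehension native Python ints, so the loop never has to create NumPy scalar objects one by one.

```python
import numpy as np

x = np.arange(10)
y = np.arange(10, 20)
z = np.arange(100, 110)

# tolist() converts each array to a list of native Python ints in one call,
# so the comprehension below iterates over plain lists, not NumPy arrays.
records = [{'x': i, 'y': j, 'z': k}
           for i, j, k in zip(x.tolist(), y.tolist(), z.tolist())]

print(records[0])
```

A side effect is that the dictionary values are ordinary `int` objects, which some APIs (for example JSON serializers) handle more readily than NumPy scalars.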

=========================

A structured array is easiest to build with recfunctions:

    In [109]: from numpy.lib import recfunctions

    In [112]: M=recfunctions.merge_arrays((x,y,z))

    In [113]: M.dtype.names=['x','y','z']

    In [114]: M
    Out[114]:
    array([(0, 10, 100), (1, 11, 101), (2, 12, 102), (3, 13, 103),
           (4, 14, 104), (5, 15, 105), (6, 16, 106), (7, 17, 107),
           (8, 18, 108), (9, 19, 109)],
          dtype=[('x', '<i4'), ('y', '<i4'), ('z', '<i4')])

    In [115]: M['x']
    Out[115]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Building it takes much longer, but if you want to access all the x values at once, it is much better than fetching them from every dictionary.

    np.rec.fromarrays((x,y,z),names=['x','y','z'])

produces a recarray with the specified names. About the same speed.
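As a small illustration of what the recarray gives you, the following sketch (Python 3 assumed, using the same x, y and z as in the question) reads a whole field by attribute or by key, and a single row as a record:

```python
import numpy as np

x = np.arange(10)
y = np.arange(10, 20)
z = np.arange(100, 110)

rec = np.rec.fromarrays((x, y, z), names=['x', 'y', 'z'])

all_x = rec.x          # attribute access returns the whole 'x' field
all_y = rec['y']       # key access works as on any structured array
first_row = rec[0]     # a single record holding one value per field

print(all_x)
print(first_row)
```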

I could also create an empty array of the correct dtype and shape and copy the arrays into it. That is probably as fast as this merge, but more complicated to describe.

I would suggest optimizing the data structure for use and access, not for construction speed. Typically, you create it once and use it many times.

=============

    In [125]: dt=np.dtype([('x',x.dtype),('y',y.dtype),('z',z.dtype)])

    In [126]: xyz=np.zeros(x.shape,dtype=dt)

    In [127]: xyz['x']=x; xyz['y']=y; xyz['z']=z
    # or
    for n,d in zip(xyz.dtype.names, (x,y,z)):
        xyz[n] = d

    In [128]: xyz
    Out[128]:
    array([(0, 10, 100), (1, 11, 101), (2, 12, 102), (3, 13, 103),
           (4, 14, 104), (5, 15, 105), (6, 16, 106), (7, 17, 107),
           (8, 18, 108), (9, 19, 109)],
          dtype=[('x', '<i4'), ('y', '<i4'), ('z', '<i4')])
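If the existing API still needs dictionaries, one option is to keep the structured array for the vectorized work and convert only at the API boundary. This is a minimal sketch under that assumption, written for Python 3; the names `total` and `as_dicts` are illustrative and not from the thread.

```python
import numpy as np

x = np.arange(10)
y = np.arange(10, 20)
z = np.arange(100, 110)

# Build the structured array field by field, as in the session above.
dt = np.dtype([('x', x.dtype), ('y', y.dtype), ('z', z.dtype)])
xyz = np.zeros(x.shape, dtype=dt)
for name, arr in zip(dt.names, (x, y, z)):
    xyz[name] = arr

# Vectorized work operates on whole fields at once.
total = xyz['x'] + xyz['y'] + xyz['z']

# tolist() on a structured array yields one plain tuple per row,
# which zips directly with the field names.
as_dicts = [dict(zip(dt.names, row)) for row in xyz.tolist()]

print(as_dicts[0])
```

The conversion is still a Python-level loop, so it does not remove the cost the question asks about; it only postpones it until dictionaries are actually required.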

+3

hpaulj Nov 03 '16 at 17:32


Here's an approach using a combination of NumPy and Pandas -

    import pandas as pd

    # Stack into columns & create a pandas dataframe with appropriate col names
    a = np.column_stack((x.ravel(), y.ravel(), z.ravel()))
    df = pd.DataFrame(a, columns=['x', 'y', 'z'])

    # Convert to list of dicts (list() so the result is also a list on Python 3)
    out = list(df.T.to_dict().values())

Sample run -

    In [52]: x
    Out[52]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

    In [53]: y
    Out[53]: array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

    In [54]: z
    Out[54]: array([100, 101, 102, 103, 104, 105, 106, 107, 108, 109])

    In [55]: out
    Out[55]:
    [{'x': 0, 'y': 10, 'z': 100},
     {'x': 1, 'y': 11, 'z': 101},
     {'x': 2, 'y': 12, 'z': 102},
     {'x': 3, 'y': 13, 'z': 103},
     {'x': 4, 'y': 14, 'z': 104},
     {'x': 5, 'y': 15, 'z': 105},
     {'x': 6, 'y': 16, 'z': 106},
     {'x': 7, 'y': 17, 'z': 107},
     {'x': 8, 'y': 18, 'z': 108},
     {'x': 9, 'y': 19, 'z': 109}]
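pandas can also emit the list of dicts directly with the `records` orientation of `to_dict`, which avoids the transpose. The sketch below is a swapped-in variant rather than the code from this answer, and assumes Python 3 with pandas installed. Building the frame from a dict of arrays, instead of `column_stack`, lets each column keep its own dtype.

```python
import numpy as np
import pandas as pd

x = np.arange(10)
y = np.arange(10, 20)
z = np.arange(100, 110)

# One column per array; no intermediate 2-D array is created.
df = pd.DataFrame({'x': x, 'y': y, 'z': z})

# orient='records' returns one dict per row.
out = df.to_dict(orient='records')

print(out[0])
```

Whether this beats the plain list comprehension depends on the data size and the pandas version, so it is worth timing before adopting it.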

+1

Divakar Nov 03 '16 at 9:53


Kasramvd · Accepted Answer · 2016-11-03T09:56:40+0000

Here is one (Num)?Pythonic way:

    In [18]: names = np.array(['x', 'y', 'z'])

    In [38]: map(dict, np.dstack((np.repeat(names[None, :], 10, axis=0), np.column_stack((x, y, z)))))
    Out[38]:
    [{'x': '0', 'y': '10', 'z': '100'},
     {'x': '1', 'y': '11', 'z': '101'},
     {'x': '2', 'y': '12', 'z': '102'},
     {'x': '3', 'y': '13', 'z': '103'},
     {'x': '4', 'y': '14', 'z': '104'},
     {'x': '5', 'y': '15', 'z': '105'},
     {'x': '6', 'y': '16', 'z': '106'},
     {'x': '7', 'y': '17', 'z': '107'},
     {'x': '8', 'y': '18', 'z': '108'},
     {'x': '9', 'y': '19', 'z': '109'}]
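Note that the values in the output above are strings ('0', '10', '100'): stacking the numbers together with the string array of names forces a single common dtype. If the API expects numbers, a sketch like the following keeps the names outside the array so the values stay integers. It is written for Python 3 and is a variant of the approach, not the code from this answer.

```python
import numpy as np

x = np.arange(10)
y = np.arange(10, 20)
z = np.arange(100, 110)

names = ['x', 'y', 'z']          # plain Python list, never stacked with the data

# column_stack gives a (10, 3) integer array; tolist() turns it into
# nested lists of native Python ints, one inner list per row.
rows = np.column_stack((x, y, z)).tolist()
out = [dict(zip(names, row)) for row in rows]

print(out[0])
```

`column_stack` still produces one dtype for all columns, so arrays of mixed types (say, ints and floats) would be upcast to the wider type.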

Also note that if you do not need all the dictionaries at once, you can simply create a generator and access each item on request.

    (dict(x=x[ii], y=y[ii], z=z[ii]) for ii in xrange(10))
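The generator idea extends to any number of arrays and keys. The helper below, `iter_records`, is a hypothetical name introduced for illustration; the sketch assumes Python 3.7+ so that keyword arguments keep their order.

```python
from itertools import islice

import numpy as np


def iter_records(**arrays):
    """Yield one dict per index; keys are the keyword names (hypothetical helper)."""
    names = list(arrays)
    # The arrays are converted to lists once, when iteration starts;
    # only the dict creation is deferred until each item is requested.
    columns = [a.tolist() for a in arrays.values()]
    for values in zip(*columns):
        yield dict(zip(names, values))


x = np.arange(10)
y = np.arange(10, 20)
z = np.arange(100, 110)

gen = iter_records(x=x, y=y, z=z)
first = next(gen)                 # builds only the first dict
next_two = list(islice(gen, 2))   # builds the following two on demand

print(first)
```

This does not make the total work smaller if every dictionary is eventually consumed, but it avoids holding all of them in memory at once.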

If you want a nested dictionary, I suggest a list comprehension:

    # only the 'x' and 'y' names are used here, so names is sliced to match the two stacked columns
    In [88]: inner = np.dstack((np.repeat(names[None, :2], 10, axis=0), np.column_stack((x, y))))

    In [89]: [{'connection': d} for d in map(dict, inner)]
    Out[89]:
    [{'connection': {'x': '0', 'y': '10'}},
     {'connection': {'x': '1', 'y': '11'}},
     {'connection': {'x': '2', 'y': '12'}},
     {'connection': {'x': '3', 'y': '13'}},
     {'connection': {'x': '4', 'y': '14'}},
     {'connection': {'x': '5', 'y': '15'}},
     {'connection': {'x': '6', 'y': '16'}},
     {'connection': {'x': '7', 'y': '17'}},
     {'connection': {'x': '8', 'y': '18'}},
     {'connection': {'x': '9', 'y': '19'}}]
