Pandas: create a data frame from 2D numpy arrays while maintaining their sequential order

Let's say you have 3 numpy arrays: lat, lon, val:

import numpy as np

lat=np.array([[10, 20, 30],
              [20, 11, 33],
              [21, 20, 10]])

lon=np.array([[100, 102, 103],
              [105, 101, 102],
              [100, 102, 103]])

val=np.array([[17, 2, 11],
              [86, 84, 1],
              [9, 5, 10]])

And say that you want to create a pandas dataframe where df.columns = ['lat', 'lon', 'val'], but since each value in lat is associated with both a lon and a val quantity, you want them to appear in the same row.

In addition, you want the row order of each column to follow the positions in each array, so as to obtain the following dataframe:

      lat   lon   val
0     10    100    17
1     20    102    2
2     30    103    11
3     20    105    86
...   ...   ...    ...

So basically the first row in the dataframe stores the "first" quantities of each array, and so on. How to do this?

I could not find a pythonic way to do this, so any help would be greatly appreciated.


The simplest way is to flatten each array with ravel:

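import pandas as pd

# ravel() flattens each 2D array row by row, so position k in every
# flattened array refers to the same original cell
df = pd.DataFrame({'lat': lat.ravel(), 'lon': lon.ravel(), 'val': val.ravel()})
print(df)
   lat  lon  val
0   10  100   17
1   20  102    2
2   30  103   11
3   20  105   86
4   11  101   84
5   33  102    1
6   21  100    9
7   20  102    5
8   10  103   10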
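
Why this keeps the rows aligned: by default ravel flattens in row-major (C) order, so element [i, j] of every array lands at flat position i * ncols + j. A minimal sketch that checks this for the arrays from the question (the loop and the names nrows and ncols are only there for illustration):

```python
import numpy as np
import pandas as pd

lat = np.array([[10, 20, 30], [20, 11, 33], [21, 20, 10]])
lon = np.array([[100, 102, 103], [105, 101, 102], [100, 102, 103]])
val = np.array([[17, 2, 11], [86, 84, 1], [9, 5, 10]])

df = pd.DataFrame({'lat': lat.ravel(), 'lon': lon.ravel(), 'val': val.ravel()})

# Every cell [i, j] should end up in dataframe row i * ncols + j
nrows, ncols = lat.shape
for i in range(nrows):
    for j in range(ncols):
        row = df.iloc[i * ncols + j]
        assert (row['lat'], row['lon'], row['val']) == (lat[i, j], lon[i, j], val[i, j])
```

If the loop finishes without an AssertionError, each dataframe row holds the three values that shared a position in the original arrays.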

Another option is to stack the flattened arrays as columns and build the dataframe from the result:

# Create stacked array
In [100]: arr = np.column_stack((lat.ravel(),lon.ravel(),val.ravel()))

# Create dataframe from it and assign column names
In [101]: pd.DataFrame(arr,columns=('lat','lon','val'))
Out[101]: 
   lat  lon  val
0   10  100   17
1   20  102    2
2   30  103   11
3   20  105   86
4   11  101   84
5   33  102    1
6   21  100    9
7   20  102    5
8   10  103   10
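
One thing to watch with the stacked-array route: a NumPy array has a single dtype, so column_stack converts all three inputs to a common type. That makes no difference for the all-integer arrays above, but if val held floats, lat and lon would become floats too. A small sketch comparing the two approaches, using made-up float values for val that are not from the question:

```python
import numpy as np
import pandas as pd

lat = np.array([[10, 20], [30, 40]])
lon = np.array([[100, 102], [103, 105]])
val = np.array([[1.5, 2.5], [3.5, 4.5]])  # hypothetical float data

# Route 1: stack first, then build the dataframe (one shared dtype)
stacked = pd.DataFrame(
    np.column_stack((lat.ravel(), lon.ravel(), val.ravel())),
    columns=['lat', 'lon', 'val'],
)

# Route 2: one flattened array per column (each column keeps its own dtype)
from_dict = pd.DataFrame({'lat': lat.ravel(), 'lon': lon.ravel(), 'val': val.ravel()})

print(stacked.dtypes)    # all three columns are floating point
print(from_dict.dtypes)  # lat and lon stay integer, val is floating point
```

So if the column types matter, the dictionary form is probably the safer default.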

Runtime test:

In [103]: lat = np.random.rand(30,30)

In [104]: long = np.random.rand(30,30)

In [105]: val = np.random.rand(30,30)

In [106]: %timeit pd.DataFrame({'lat': lat.ravel(), 'long': long.ravel(), 'val': val.ravel()})
1000 loops, best of 3: 452 µs per loop

In [107]: arr = np.column_stack((lat.ravel(),long.ravel(),val.ravel()))

In [108]: %timeit np.column_stack((lat.ravel(),long.ravel(),val.ravel()))
100000 loops, best of 3: 12.4 µs per loop

In [109]: %timeit pd.DataFrame(arr,columns=('lat','long','val'))
1000 loops, best of 3: 217 µs per loop
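
%timeit only works inside IPython. Outside of it, a rough equivalent can be sketched with the standard timeit module. The results depend on the machine and library versions, so no numbers are shown here. The helper names via_dict and via_stack are made up for this sketch:

```python
import timeit

import numpy as np
import pandas as pd

lat = np.random.rand(30, 30)
lon = np.random.rand(30, 30)
val = np.random.rand(30, 30)

def via_dict():
    # One flattened array per column
    return pd.DataFrame({'lat': lat.ravel(), 'lon': lon.ravel(), 'val': val.ravel()})

def via_stack():
    # Stack into an (n, 3) array first, then wrap it in a dataframe
    arr = np.column_stack((lat.ravel(), lon.ravel(), val.ravel()))
    return pd.DataFrame(arr, columns=['lat', 'lon', 'val'])

n = 200  # number of calls per measurement
t_dict = timeit.timeit(via_dict, number=n)
t_stack = timeit.timeit(via_stack, number=n)
print(f"dict:  {t_dict / n * 1e6:.1f} µs per call")
print(f"stack: {t_stack / n * 1e6:.1f} µs per call")
```

Note that via_stack times both the stacking and the dataframe construction together, whereas the session above measures them separately.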

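If the arrays are already 1-D, there is no need to ravel first. You can just stack and go:

import numpy as np
import pandas as pd

lat, long, val = np.arange(5), np.arange(5), np.arange(5)
arr = np.stack((lat, long, val), axis=1)  # shape (5, 3): one column per array
cols = ['lat', 'long', 'val']
df = pd.DataFrame(arr, columns=cols)
print(df)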
   lat  long  val
0    0     0    0
1    1     1    1
2    2     2    2
3    3     3    3
4    4     4    4
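
Note that the example above uses 1-D arrays. For the 2-D arrays in the question, stacking alone produces a 3-D array, which pd.DataFrame does not accept, so some reshaping is still needed. One possible way, as a sketch: stack along a new last axis, then collapse the first two axes with reshape:

```python
import numpy as np
import pandas as pd

lat = np.array([[10, 20, 30], [20, 11, 33], [21, 20, 10]])
lon = np.array([[100, 102, 103], [105, 101, 102], [100, 102, 103]])
val = np.array([[17, 2, 11], [86, 84, 1], [9, 5, 10]])

# arr[i, j] is the triple (lat[i, j], lon[i, j], val[i, j]); shape (3, 3, 3)
arr = np.stack((lat, lon, val), axis=-1)

# Collapse the two grid axes into one, keeping row-major order
df = pd.DataFrame(arr.reshape(-1, 3), columns=['lat', 'lon', 'val'])
print(df)
```

This gives the same nine rows, in the same order, as the ravel-based answers.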

Source: https://habr.com/ru/post/1014457/

