How to combine data from many data frames into one data frame with an array as data values

I have many pandas data frames with the same index structure, and I want to create a data frame with that same index structure whose data values are np.arrays (actually I want np.matrix).

Combining the frames with simple arithmetic is easy: df1 + df2 adds them element-wise. But np.array((df1, df2)) does not give me anything like what I want.

Does pandas have a way to do this without rebuilding the whole object element by element?

For example, if I have

    df1 =
       col1  col2
    1     1     2
    2     3     4

    df2 =
       col1  col2
    1     5     6
    2     7     8

and want

    df2 =
        col1   col2
    1  [1,5]  [2,6]
    2  [3,7]  [4,8]
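For reference, here is a minimal sketch that reproduces the setup described above. The explicit index `[1, 2]` is read off the example tables, and the names `elementwise_sum` and `stacked` are only illustrative. It shows that `df1 + df2` keeps the labels, while `np.array((df1, df2))` returns a plain 3-D array without them.

```python
import numpy as np
import pandas as pd

# Example frames from the question; the index [1, 2] is read off the tables.
df1 = pd.DataFrame({"col1": [1, 3], "col2": [2, 4]}, index=[1, 2])
df2 = pd.DataFrame({"col1": [5, 7], "col2": [6, 8]}, index=[1, 2])

# Element-wise addition keeps the index and column labels.
elementwise_sum = df1 + df2

# np.array on a tuple of frames yields a 3-D ndarray with shape
# (n_frames, n_rows, n_cols); the pandas labels are dropped.
stacked = np.array((df1, df2))

print(elementwise_sum)
print(stacked.shape)
```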
1 answer

I would use the Panel structure for this:

    In [11]: p = pd.Panel({'df1': df1, 'df2': df2})

    In [12]: p['df1']
    Out[12]:
       col1  col2
    1     1     2
    2     3     4

And you can apply a function along the major axis:

    In [13]: p.apply(np.sum, axis='major')  # use linalg function here instead of np.sum
    Out[13]:
          df1  df2
    col1    4   12
    col2    6   14

Note: for each (df, col) pair, the function receives a numpy array:

    In [21]: def f(x):
                 print(repr(x))
                 return 1

    In [22]: p.apply(f, 'major')
    array([1, 3])
    array([2, 4])
    array([5, 7])
    array([6, 8])
    Out[22]:
          df1  df2
    col1    1    1
    col2    1    1

You can choose another numpy / linalg function (or create your own).
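`pd.Panel` was deprecated and then removed in pandas 0.25, so the calls above only run on older releases. As a rough sketch of the same per-(df, col) reduction on current pandas, one option is to concatenate the frames side by side with `pd.concat` and apply the function to each column. The helper name `reduce_per_column` is hypothetical, and this is a swapped-in technique, not the Panel API itself.

```python
import numpy as np
import pandas as pd

# Example frames from the question.
df1 = pd.DataFrame({"col1": [1, 3], "col2": [2, 4]}, index=[1, 2])
df2 = pd.DataFrame({"col1": [5, 7], "col2": [6, 8]}, index=[1, 2])


def reduce_per_column(frames, func):
    """Apply func to every (frame, column) vector (hypothetical helper).

    frames is a dict mapping a label to a DataFrame; all frames are
    assumed to share the same index and columns.
    """
    # Side-by-side concat gives columns a two-level index: (frame, column).
    combined = pd.concat(frames, axis=1)
    # func receives one column at a time and should return a scalar.
    reduced = combined.apply(func, axis=0)
    # Move the frame label to the columns: rows are column names.
    return reduced.unstack(level=0)


sums = reduce_per_column({"df1": df1, "df2": df2}, np.sum)
norms = reduce_per_column({"df1": df1, "df2": df2}, np.linalg.norm)
print(sums)
print(norms)
```

With `np.sum` the result has the same layout as `Out[13]` above: one row per column name and one column per frame.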

Update: this is actually not quite what you want; you need to apply along the items axis:

    In [31]: p.apply(f, 'items')
    array([1, 5])
    array([2, 6])
    array([3, 7])
    array([4, 8])
    Out[31]:
       col1  col2
    1     1     1
    2     1     1
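For the frame of arrays shown in the question, here is one possible sketch that does not rely on `pd.Panel` (which is not available in recent pandas releases). It stacks the frames with `np.stack` and packs each cell's vector into an object-dtype DataFrame. The helper name `cellwise_arrays` is hypothetical, and the code assumes all frames share the same index and columns.

```python
import numpy as np
import pandas as pd

# Example frames from the question.
df1 = pd.DataFrame({"col1": [1, 3], "col2": [2, 4]}, index=[1, 2])
df2 = pd.DataFrame({"col1": [5, 7], "col2": [6, 8]}, index=[1, 2])


def cellwise_arrays(frames):
    """Return a DataFrame whose cells hold one array per position.

    Hypothetical helper; frames is a list of DataFrames assumed to
    share the same index and columns.
    """
    first = frames[0]
    # Shape (n_rows, n_cols, n_frames): the last axis runs across frames.
    stacked = np.stack([f.to_numpy() for f in frames], axis=-1)
    # An object array lets each cell store a whole ndarray.
    cells = np.empty(first.shape, dtype=object)
    for i in range(first.shape[0]):
        for j in range(first.shape[1]):
            cells[i, j] = stacked[i, j]
    return pd.DataFrame(cells, index=first.index, columns=first.columns)


result = cellwise_arrays([df1, df2])
print(result)
```

The loop is only there to package the values into cells. For numeric work it is usually simpler to keep the 3-D `stacked` array and apply numpy or linalg functions along its last axis.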

Source: https://habr.com/ru/post/1502887/