Create a pandas data frame from a list of numpy arrays

Is there an easy way to do the obvious task of building a pandas DataFrame from a list of numpy arrays, where each array becomes a column? The default behavior treats each array as a row, and I don't understand why. Here is a quick example:

 import numpy as np
 import pandas as pd

 names = ['data1', 'data2', 'data3']
 data = [np.arange(10) for _ in names]
 df = pd.DataFrame(data=data, columns=names)

This gives an error indicating that pandas is expecting 10 columns.

If I instead do

 df = pd.DataFrame(data=data) 

I get a DataFrame with 10 columns and three rows.
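For anyone reproducing this, here is a minimal sketch of both cases. The exception type is an assumption on my part: it may differ between pandas versions, so the sketch catches both ValueError and AssertionError. The names `rows` and `failed` are only illustrative.

```python
import numpy as np
import pandas as pd

names = ['data1', 'data2', 'data3']
data = [np.arange(10) for _ in names]

# Each array in the list is treated as one row of the frame.
rows = pd.DataFrame(data=data)
print(rows.shape)  # (3, 10)

# Three labels cannot be matched to ten columns, so construction fails.
try:
    pd.DataFrame(data=data, columns=names)
    failed = False
except (ValueError, AssertionError) as exc:
    failed = True
    print(type(exc).__name__, exc)
```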

Adding rows to a DataFrame is generally much harder than adding columns, so this behavior puzzles me. For example, suppose I quickly want to add a fourth data array to the DataFrame. I want the data organized in columns so that I can simply write

 df['data4'] = new_array 

How can I quickly create the DataFrame I want?

2 answers

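I would use .from_items (note that DataFrame.from_items was deprecated in pandas 0.23 and removed in 1.0, so on current versions use the dict-based constructor shown in the edit below):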

 pd.DataFrame.from_items(zip(names, data)) 

which gives

    data1  data2  data3
 0      0      0      0
 1      1      1      1
 2      2      2      2
 3      3      3      3
 4      4      4      4
 5      5      5      5
 6      6      6      6
 7      7      7      7
 8      8      8      8
 9      9      9      9

In my timings it is also faster than building the frame row-wise and transposing it:

 %timeit pd.DataFrame.from_items(zip(names, data)) 

1000 loops, best of 3: 281 μs per loop

 %timeit pd.DataFrame(data, index=names).T 

1000 loops, best of 3: 730 μs per loop

Adding a fourth column is also quite simple:

 df['data4'] = range(1, 11) 

which gives

    data1  data2  data3  data4
 0      0      0      0      1
 1      1      1      1      2
 2      2      2      2      3
 3      3      3      3      4
 4      4      4      4      5
 5      5      5      5      6
 6      6      6      6      7
 7      7      7      7      8
 8      8      8      8      9
 9      9      9      9     10

EDIT:

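As @jezrael mentioned, a third option is to build the DataFrame from a dict. Be careful: dict key order is not guaranteed on Python versions before 3.7, so pass columns=names to fix the column order: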

 pd.DataFrame(dict(zip(names, data)), columns=names) 

Timings:

 %timeit pd.DataFrame(dict(zip(names, data))) 

1000 loops, best of 3: 281 μs per loop
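Putting the dict-based variant together with the column assignment from the question, a minimal self-contained sketch could look like this. The question does not say what new_array holds, so np.arange(1, 11) is used purely as a stand-in.

```python
import numpy as np
import pandas as pd

names = ['data1', 'data2', 'data3']
data = [np.arange(10) for _ in names]

# Map each name to its array; every dict value becomes one column.
# columns=names pins the column order explicitly.
df = pd.DataFrame(dict(zip(names, data)), columns=names)

# Stand-in for the question's new_array (an assumption, not from the source).
new_array = np.arange(1, 11)

# An array of matching length is attached as a new column.
df['data4'] = new_array

print(df.shape)
print(list(df.columns))
```

The new array has to have the same length as the existing index (10 here); otherwise the assignment raises an error.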


There are many ways to solve your problem, but the easiest is df.T (T is shorthand for pandas.DataFrame.transpose):

 >>> df = pd.DataFrame(data=data, index=names)
 >>> df
        0  1  2  3  4  5  6  7  8  9
 data1  0  1  2  3  4  5  6  7  8  9
 data2  0  1  2  3  4  5  6  7  8  9
 data3  0  1  2  3  4  5  6  7  8  9
 >>> df.T
    data1  data2  data3
 0      0      0      0
 1      1      1      1
 2      2      2      2
 3      3      3      3
 4      4      4      4
 5      5      5      5
 6      6      6      6
 7      7      7      7
 8      8      8      8
 9      9      9      9
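
As one of the "many ways", and not something proposed in the answers here, the arrays can also be stacked column-wise with np.column_stack before the DataFrame is built. This is a sketch that assumes all arrays have the same length; the names via_transpose and via_stack are only illustrative.

```python
import numpy as np
import pandas as pd

names = ['data1', 'data2', 'data3']
data = [np.arange(10) for _ in names]

# Row-wise construction followed by a transpose, as in the answer above.
via_transpose = pd.DataFrame(data, index=names).T

# Alternative: stack the 1-D arrays as columns of one 2-D array first.
via_stack = pd.DataFrame(np.column_stack(data), columns=names)

print(via_transpose.shape, via_stack.shape)
print(list(via_stack.columns))
```

Because np.column_stack produces a single 2-D array, arrays with different dtypes are upcast to a common dtype; the dict-based constructor keeps a separate dtype per column.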

Source: https://habr.com/ru/post/1265779/

