Creating a dataframe from a dictionary where records have different lengths

Say I have a dictionary with 10 key-value pairs. Each entry contains a numpy array. However, the length of the array is not the same for all of them.

How can I create a DataFrame where each column holds one of these records?

When I try:

pd.DataFrame(my_dict) 

I get:

 ValueError: arrays must all be the same length 

Is there any way to overcome this? I'd be happy for Pandas to fill the shorter columns with NaN.
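A minimal reproduction of the error, using a hypothetical two-key version of the dict described above:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 10-key dict: two arrays of unequal length
my_dict = {'A': np.array([1, 2]), 'B': np.array([1, 2, 3, 4])}

try:
    pd.DataFrame(my_dict)
except ValueError as e:
    # Message wording varies by pandas version,
    # e.g. "All arrays must be of the same length"
    print(e)
```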

+86
7 answers

In Python 3.x:

    In [6]: d = dict( A = np.array([1,2]), B = np.array([1,2,3,4]) )

    In [7]: pd.DataFrame(dict([ (k, pd.Series(v)) for k, v in d.items() ]))
    Out[7]:
         A  B
    0    1  1
    1    2  2
    2  NaN  3
    3  NaN  4

In Python 2.x:

replace d.items() with d.iteritems().

+104

Here is an easy way to do this:

    In [20]: my_dict = dict( A = np.array([1,2]), B = np.array([1,2,3,4]) )

    In [21]: df = pd.DataFrame.from_dict(my_dict, orient='index')

    In [22]: df
    Out[22]:
       0  1   2   3
    A  1  2 NaN NaN
    B  1  2   3   4

    In [23]: df.transpose()
    Out[23]:
         A  B
    0    1  1
    1    2  2
    2  NaN  3
    3  NaN  4
+70

The following is a way to tidy up your syntax, but it does essentially the same as the other answers:

    >>> mydict = {'one': [1,2,3], 2: [4,5,6,7], 3: 8}
    >>> dict_df = pd.DataFrame({ key: pd.Series(value) for key, value in mydict.items() })
    >>> dict_df
       one  2    3
    0  1.0  4  8.0
    1  2.0  5  NaN
    2  3.0  6  NaN
    3  NaN  7  NaN

A similar syntax exists for lists:

    >>> mylist = [ [1,2,3], [4,5], 6 ]
    >>> list_df = pd.DataFrame([ pd.Series(value) for value in mylist ])
    >>> list_df
         0    1    2
    0  1.0  2.0  3.0
    1  4.0  5.0  NaN
    2  6.0  NaN  NaN

Another syntax for lists:

    >>> mylist = [ [1,2,3], [4,5], 6 ]
    >>> list_df = pd.DataFrame({ i: pd.Series(value) for i, value in enumerate(mylist) })
    >>> list_df
       0    1    2
    0  1  4.0  6.0
    1  2  5.0  NaN
    2  3  NaN  NaN

In all of these cases, be careful to check which data type pandas infers for each column. Any column that contains NaN values will be upcast, for example to floating point.
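A minimal sketch of that coercion, using a hypothetical two-column frame: the column padded with NaN becomes float64, while the complete column keeps its integer dtype.

```python
import pandas as pd

# 'A' is shorter, so it gets NaN padding and is upcast to float64;
# 'B' is complete and stays int64
df = pd.DataFrame({'A': pd.Series([1, 2]), 'B': pd.Series([1, 2, 3, 4])})
print(df.dtypes)
```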

+10

Although this does not directly answer the OP's question, I found it a great solution for my case, where I had arrays of unequal length that shared an index, and I would like to share it:

from pandas documentation

    In [31]: d = {'one' : Series([1., 2., 3.], index=['a', 'b', 'c']),
       ....:      'two' : Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
       ....:

    In [32]: df = DataFrame(d)

    In [33]: df
    Out[33]:
       one  two
    a    1    1
    b    2    2
    c    3    3
    d  NaN    4
+3

You can also use pd.concat along axis=1 with a list of pd.Series objects:

    import numpy as np
    import pandas as pd

    d = {'A': np.array([1,2]), 'B': np.array([1,2,3,4])}
    res = pd.concat([pd.Series(v, name=k) for k, v in d.items()], axis=1)
    print(res)

         A  B
    0  1.0  1
    1  2.0  2
    2  NaN  3
    3  NaN  4
+3

Both of the following lines work fine:

    pd.DataFrame.from_dict(my_dict, orient='index').transpose()            # A
    pd.DataFrame(dict([ (k, pd.Series(v)) for k, v in my_dict.items() ]))  # B (better)

But with %timeit in Jupyter, I measured a 4x speed ratio in favor of B over A, which is quite impressive, especially when working with a huge dataset (mostly one with many columns/features).
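Outside of Jupyter, the same comparison can be sketched with the stdlib timeit module; the exact ratio will vary with the pandas version, the data shape, and the hardware, so treat the numbers as indicative only.

```python
import timeit

import numpy as np
import pandas as pd

my_dict = {'A': np.array([1, 2]), 'B': np.array([1, 2, 3, 4])}

def method_a():
    return pd.DataFrame.from_dict(my_dict, orient='index').transpose()

def method_b():
    return pd.DataFrame({k: pd.Series(v) for k, v in my_dict.items()})

# Both produce the same 4x2 NaN-padded frame; only the timings differ
t_a = timeit.timeit(method_a, number=1000)
t_b = timeit.timeit(method_b, number=1000)
print(f"A: {t_a:.3f}s  B: {t_b:.3f}s")
```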

+1

If you do not want NaN displayed and you have exactly two lengths, padding each remaining cell of the shorter list with a "space" will also work.

    import pandas as pd

    long = [6, 4, 7, 3]
    short = [5, 6]

    # Pad the shorter list with spaces so both columns have the same length
    for n in range(len(long) - len(short)):
        short.append(' ')

    df = pd.DataFrame({'A': long, 'B': short})

    # Write the result to an Excel file in the working directory
    datatoexcel = pd.ExcelWriter('example1.xlsx', engine='xlsxwriter')
    df.to_excel(datatoexcel, sheet_name='Sheet1')
    datatoexcel.save()

       A  B
    0  6  5
    1  4  6
    2  7
    3  3

If you have more than two record lengths, it is better to write a function that applies the same padding method to every column.
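Such a function might look like the sketch below; `pad_to_longest` is a hypothetical name, and it generalizes the loop above to any number of columns.

```python
import pandas as pd

def pad_to_longest(columns, fill=' '):
    """Pad every value list to the length of the longest one.

    Hypothetical helper generalizing the two-list padding above.
    """
    longest = max(len(v) for v in columns.values())
    return {k: list(v) + [fill] * (longest - len(v)) for k, v in columns.items()}

# Three columns with three different lengths
df = pd.DataFrame(pad_to_longest({'A': [6, 4, 7, 3], 'B': [5, 6], 'C': [1]}))
print(df)
```

Note that, as in the original answer, the padded columns end up with object dtype because they mix numbers and strings.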

+1

Source: https://habr.com/ru/post/1243885/

