I think you can create a new DataFrame from the dicts with a list comprehension and then assign one of its columns to `df` as a new column, for example:
    df1 = pd.DataFrame([x for x in df['dic']])
    print(df1)
      Feature1 Feature2 Feature3
    0      aa1      bb1      cc2
    1      aa2      bb2      NaN
    2      aa1      cc1      NaN

    df['Feature3'] = df1['Feature3']
    print(df)
                                                     dic   num Feature3
    0  {u'Feature2': u'bb1', u'Feature3': u'cc2', u'F...  num1      cc2
    1         {u'Feature2': u'bb2', u'Feature1': u'aa2'}  num2      NaN
    2         {u'Feature2': u'cc1', u'Feature1': u'aa1'}  num3      NaN
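Here is a self-contained version you can paste and run. The sample `df` is an assumption reconstructed from the printed output (the first dict is truncated there, so its `Feature1` value is taken from `df1`); your real data may have more rows or keys.

```python
import pandas as pd

# Sample data reconstructed from the printed output (assumption: the real
# DataFrame may have more rows or more keys per dict).
df = pd.DataFrame({
    'dic': [
        {'Feature1': 'aa1', 'Feature2': 'bb1', 'Feature3': 'cc2'},
        {'Feature1': 'aa2', 'Feature2': 'bb2'},
        {'Feature1': 'aa1', 'Feature2': 'cc1'},
    ],
    'num': ['num1', 'num2', 'num3'],
})

# Each dict becomes one row; keys become columns, missing keys become NaN.
df1 = pd.DataFrame([x for x in df['dic']])

# Copy one expanded column back; assignment aligns on the index,
# which is the default 0..n-1 in both frames here.
df['Feature3'] = df1['Feature3']

print(df1)
print(df)
```

`[x for x in df['dic']]` just turns the column into a plain list of dicts, so `df['dic'].tolist()` would work equally well.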
Or in one line:

    df['Feature3'] = pd.DataFrame([x for x in df['dic']])['Feature3']
    print(df)
                                                     dic   num Feature3
    0  {u'Feature2': u'bb1', u'Feature3': u'cc2', u'F...  num1      cc2
    1         {u'Feature2': u'bb2', u'Feature1': u'aa2'}  num2      NaN
    2         {u'Feature2': u'cc1', u'Feature1': u'aa1'}  num3      NaN
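One caveat worth checking: the new DataFrame always gets a default 0..n-1 index, and column assignment aligns on index labels. That works above because `df` also has a default index, but if `df` was filtered or re-indexed the values land in the wrong rows or become NaN. A minimal sketch, assuming a hypothetical `df` with index `[10, 20, 30]`; passing `index=df.index` to the constructor keeps the rows matched:

```python
import pandas as pd

# Hypothetical frame with a non-default index (e.g. after filtering).
df = pd.DataFrame(
    {'dic': [{'Feature3': 'cc2'}, {'Feature1': 'aa2'}, {'Feature3': 'cc9'}],
     'num': ['num1', 'num2', 'num3']},
    index=[10, 20, 30],
)

# The new frame gets index 0, 1, 2, which shares no labels with df,
# so this column ends up entirely NaN.
wrong = pd.DataFrame([x for x in df['dic']])['Feature3']
df['Feature3_misaligned'] = wrong

# Reusing df's index keeps each expanded row next to its source dict.
df['Feature3'] = pd.DataFrame([x for x in df['dic']], index=df.index)['Feature3']

print(df)
```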
Timings

len(df) = 3:

    In [24]: %timeit pd.DataFrame([x for x in df['dic']])
    The slowest run took 4.63 times longer than the fastest. This could mean that an intermediate result is being cached
    1000 loops, best of 3: 596 µs per loop

    In [25]: %timeit df.dic.apply(pd.Series)
    1000 loops, best of 3: 1.43 ms per loop
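len(df) = 3000:

    In [27]: %timeit pd.DataFrame([x for x in df['dic']])
    100 loops, best of 3: 3.16 ms per loop

    In [28]: %timeit df.dic.apply(pd.Series)
    1 loops, best of 3: 748 ms per loop

So the list comprehension is about 2.4 times faster with 3 rows and more than 200 times faster with 3000 rows.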
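If you want to reproduce the comparison outside IPython, here is a rough sketch with the standard `timeit` module. The helper `make_frame` is hypothetical (it just repeats the three sample dicts to reach `n` rows), and absolute numbers will depend on your pandas version and machine, so no results are shown here.

```python
import timeit

import pandas as pd


def make_frame(n):
    """Hypothetical helper: repeat the three sample dicts to get n rows."""
    base = [
        {'Feature1': 'aa1', 'Feature2': 'bb1', 'Feature3': 'cc2'},
        {'Feature1': 'aa2', 'Feature2': 'bb2'},
        {'Feature1': 'aa1', 'Feature2': 'cc1'},
    ]
    dicts = [dict(base[i % 3]) for i in range(n)]
    return pd.DataFrame({'dic': dicts,
                         'num': ['num%d' % (i + 1) for i in range(n)]})


def via_list(df):
    # A single DataFrame constructor call over the list of dicts.
    return pd.DataFrame([x for x in df['dic']])


def via_apply(df):
    # Builds one pd.Series per row, then pandas combines them.
    return df['dic'].apply(pd.Series)


results = {}
for n in (3, 3000):
    df = make_frame(n)
    # Best-of-3 average seconds per call, 3 calls per repeat.
    t_list = min(timeit.repeat(lambda: via_list(df), number=3, repeat=3)) / 3
    t_apply = min(timeit.repeat(lambda: via_apply(df), number=3, repeat=3)) / 3
    results[n] = {'list': t_list, 'apply': t_apply}
    print(n, results[n])
```

Timing each approach as a function keeps the setup (building the sample frame) out of the measured code.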