Extract dictionary value from column in data frame

Question

I am looking for a way to optimize my code.

I have input data in this form:

    import pandas as pn

    a = [{'Feature1': 'aa1', 'Feature2': 'bb1', 'Feature3': 'cc2'},
         {'Feature1': 'aa2', 'Feature2': 'bb2'},
         {'Feature1': 'aa1', 'Feature2': 'cc1'}]
    b = ['num1', 'num2', 'num3']
    df = pn.DataFrame({'num': b, 'dic': a})

I would like to extract the “Feature3” element (if present) from the dictionaries in the “dic” column of this data frame. I have a working solution, but I do not know whether it is the fastest way, and it seems more complicated than necessary.

    Feature3 = []
    for idx, row in df['dic'].items():
        l = row.keys()
        if 'Feature3' in l:
            Feature3.append(row['Feature3'])
        else:
            # No Feature3 in this dictionary
            Feature3.append(None)
    df['Feature3'] = Feature3
    print(df)

Is there a better, faster, or simpler way to extract Feature3 into its own column of the data frame?

Thanks in advance for your help.

+5

python pandas

michalk Feb 29 '16 at 22:32

5 answers

If you apply pn.Series to the column, you get a nicely structured DataFrame:

    >>> df.dic.apply(pn.Series)
      Feature1 Feature2 Feature3
    0      aa1      bb1      cc2
    1      aa2      bb2      NaN
    2      aa1      cc1      NaN

From there, you can simply use regular pandas operations.
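As a sketch of what such regular pandas operations might look like here, the expanded columns can be joined back onto the original frame and filtered. The names expanded, result and with_feature3 are introduced only for illustration, and pandas is imported as pd rather than pn.

```python
import pandas as pd

a = [
    {'Feature1': 'aa1', 'Feature2': 'bb1', 'Feature3': 'cc2'},
    {'Feature1': 'aa2', 'Feature2': 'bb2'},
    {'Feature1': 'aa1', 'Feature2': 'cc1'},
]
b = ['num1', 'num2', 'num3']
df = pd.DataFrame({'num': b, 'dic': a})

# Expand each dictionary into its own columns (missing keys become NaN).
expanded = df['dic'].apply(pd.Series)

# Attach the expanded columns to the original frame; rows line up by index.
result = df.join(expanded)

# Keep only the rows that actually have a Feature3 value.
with_feature3 = result[result['Feature3'].notna()]
print(with_feature3[['num', 'Feature3']])
```

Joining by index works here because apply preserves the index of the original column.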

+2

Ami tavory Feb 29 '16 at 10:42

 df['Feature3'] = df['dic'].apply(lambda x: x.get('Feature3'))

I agree with maxymoo: consider changing the layout of your data frame.

(Sidenote: pandas is usually imported as pd)
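A minimal sketch of the same idea, assuming a fallback value is wanted instead of None. dict.get accepts a default as its second argument; the 'missing' placeholder and the Feature3_filled column name are purely illustrative.

```python
import pandas as pd

a = [
    {'Feature1': 'aa1', 'Feature2': 'bb1', 'Feature3': 'cc2'},
    {'Feature1': 'aa2', 'Feature2': 'bb2'},
    {'Feature1': 'aa1', 'Feature2': 'cc1'},
]
df = pd.DataFrame({'num': ['num1', 'num2', 'num3'], 'dic': a})

# dict.get returns None when the key is absent ...
df['Feature3'] = df['dic'].apply(lambda d: d.get('Feature3'))

# ... or the supplied default, here the illustrative placeholder 'missing'.
df['Feature3_filled'] = df['dic'].apply(lambda d: d.get('Feature3', 'missing'))

print(df[['num', 'Feature3', 'Feature3_filled']])
```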

+2

as133 Mar 01 '16 at 1:34

I think you can create a new DataFrame from a list comprehension and then add the new column, for example:

    df1 = pd.DataFrame([x for x in df['dic']])
    print(df1)
      Feature1 Feature2 Feature3
    0      aa1      bb1      cc2
    1      aa2      bb2      NaN
    2      aa1      cc1      NaN

    df['Feature3'] = df1['Feature3']
    print(df)
                                                     dic   num Feature3
    0  {u'Feature2': u'bb1', u'Feature3': u'cc2', u'F...  num1      cc2
    1         {u'Feature2': u'bb2', u'Feature1': u'aa2'}  num2      NaN
    2         {u'Feature2': u'cc1', u'Feature1': u'aa1'}  num3      NaN

Or in one line:

    df['Feature3'] = pd.DataFrame([x for x in df['dic']])['Feature3']
    print(df)
                                                     dic   num Feature3
    0  {u'Feature2': u'bb1', u'Feature3': u'cc2', u'F...  num1      cc2
    1         {u'Feature2': u'bb2', u'Feature1': u'aa2'}  num2      NaN
    2         {u'Feature2': u'cc1', u'Feature1': u'aa1'}  num3      NaN

Timings

len(df) = 3:

    In [24]: %timeit pd.DataFrame([x for x in df['dic']])
    The slowest run took 4.63 times longer than the fastest. This could mean that an intermediate result is being cached
    1000 loops, best of 3: 596 µs per loop

    In [25]: %timeit df.dic.apply(pn.Series)
    1000 loops, best of 3: 1.43 ms per loop

len(df) = 3000:

    In [27]: %timeit pd.DataFrame([x for x in df['dic']])
    100 loops, best of 3: 3.16 ms per loop

    In [28]: %timeit df.dic.apply(pn.Series)
    1 loops, best of 3: 748 ms per loop
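To reproduce a comparison like this outside IPython, the standard timeit module can be used. This is only a sketch: the 3000-row frame is built by repeating the question's three dictionaries, which is an assumption about how the larger frame was made, and the helper names via_constructor and via_apply are hypothetical. Absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

a = [
    {'Feature1': 'aa1', 'Feature2': 'bb1', 'Feature3': 'cc2'},
    {'Feature1': 'aa2', 'Feature2': 'bb2'},
    {'Feature1': 'aa1', 'Feature2': 'cc1'},
]

# Assumed setup: repeat the three sample dictionaries to get 3000 rows.
big = pd.DataFrame({'dic': a * 1000})


def via_constructor():
    # Build the frame in one go from the list of dictionaries.
    return pd.DataFrame([x for x in big['dic']])


def via_apply():
    # Build one Series per row, then combine them.
    return big['dic'].apply(pd.Series)


# Sanity check: both approaches mark the same rows as missing Feature3.
same = via_constructor()['Feature3'].isna().equals(via_apply()['Feature3'].isna())
print('same missing pattern:', same)

# Total seconds for three calls of each approach.
print('constructor:', timeit.timeit(via_constructor, number=3))
print('apply:      ', timeit.timeit(via_apply, number=3))
```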

+1

jezrael Feb 29 '16 at 10:39

I think you have chosen slightly the wrong data structure. It is better to create the data frame with the features as columns from the very beginning; pandas is actually smart enough to do this by default:

    In [240]: pd.DataFrame(a)
    Out[240]:
      Feature1 Feature2 Feature3
    0      aa1      bb1      cc2
    1      aa2      bb2      NaN
    2      aa1      cc1      NaN

Then you would add your “num” column in a separate step, since that data is in a different orientation, using either

 df['num'] = b

or

 df = df.assign(num = b)

(I prefer the second option, since it has a more functional flavor.)
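Putting those steps together, a minimal sketch using the a and b lists from the question might look like this:

```python
import pandas as pd

a = [
    {'Feature1': 'aa1', 'Feature2': 'bb1', 'Feature3': 'cc2'},
    {'Feature1': 'aa2', 'Feature2': 'bb2'},
    {'Feature1': 'aa1', 'Feature2': 'cc1'},
]
b = ['num1', 'num2', 'num3']

# Build the frame directly from the list of dictionaries;
# each key becomes a column and missing keys become NaN.
df = pd.DataFrame(a)

# assign returns a new frame with the extra column rather than
# modifying the existing one in place.
df = df.assign(num=b)

print(df)
```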

+1

maxymoo Feb 29 '16 at 10:54

Alexander · Accepted Answer · 2016-02-29T22:57:01+0000

You can use a list comprehension to extract Feature3 from each row of your data frame, returning a list.

 feature3 = [d.get('Feature3') for d in df.dic]

If "Feature3" is not in a dictionary, get returns None by default.

You do not even need pandas, since the same comprehension works on the original list a:

 feature3 = [d.get('Feature3') for d in a]
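If the values should end up as a column, the list can be assigned straight back to the frame. A small sketch, reusing the sample data from the question with pandas imported as pd:

```python
import pandas as pd

a = [
    {'Feature1': 'aa1', 'Feature2': 'bb1', 'Feature3': 'cc2'},
    {'Feature1': 'aa2', 'Feature2': 'bb2'},
    {'Feature1': 'aa1', 'Feature2': 'cc1'},
]
b = ['num1', 'num2', 'num3']
df = pd.DataFrame({'num': b, 'dic': a})

# Plain Python: one value per dictionary, None where the key is absent.
feature3 = [d.get('Feature3') for d in df['dic']]

# The list has one entry per row, so it can be assigned as a new column.
df['Feature3'] = feature3

print(feature3)
print(df[['num', 'Feature3']])
```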
