Pandas column dictionaries list

I have a dataset as shown below:

name    status    number   message
matt    active    12345    [job:  , money: none, wife: none]
james   active    23456    [group: band, wife: yes, money: 10000]
adam    inactive  34567    [job: none, money: none, wife:  , kids: one, group: jail]

How can I extract pairs of key values ​​and turn them into an extensible data framework?

Expected Result:

name    status   number    job    money    wife    group   kids 
matt    active   12345     none   none     none    none    none
james   active   23456     none   10000    none    band    none
adam    inactive 34567     none   none     none    none    one

A message contains several different types of keys.

Any help would be greatly appreciated.

+4
source share
2 answers

It is not simple.

You need to convert the values ​​to listfrom dictusing replace( \s+- one or more spaces), and then use ast.

Then you can use the constructor DataFramewith concat, popremove the column from df:

import ast
df.message = df.message.replace([':\s+,','\[', '\]', ':\s+', ',\s+'], 
                                ['":"none","', '{"', '"}', '":"', '","'], regex=True)
df.message = df.message.apply(ast.literal_eval)

df1 = pd.DataFrame(df.pop('message').values.tolist(), index=df.index)
print (df1)
   kids  money group   job  money  wife
0   NaN   none   NaN  none    NaN  none
1   NaN    NaN  band   NaN  10000   yes
2   one    NaN  jail  none   none  none

df = pd.concat([df, df1], axis=1)
print (df)
    name    status  number  kids  money group   job  money  wife
0   matt    active   12345   NaN   none   NaN  none    NaN  none
1  james    active   23456   NaN    NaN  band   NaN  10000   yes
2   adam  inactive   34567   one    NaN  jail  none   none  none

EDIT:

Another solution with yaml:

import yaml

df.message = df.message.replace(['\[','\]'],['{','}'], regex=True).apply(yaml.load)

df1 = pd.DataFrame(df.pop('message').values.tolist(), index=df.index)
print (df1)
  group   job kids  money  wife
0   NaN  None  NaN   none  none
1  band   NaN  NaN  10000  True
2  jail  none  one   none  None

df = pd.concat([df, df1], axis=1)
print (df)
    name    status  number group   job kids  money  wife
0   matt    active   12345   NaN  None  NaN   none  none
1  james    active   23456  band   NaN  NaN  10000  True
2   adam  inactive   34567  jail  none  one   none  None
+4

, , :

pd.concat([data.drop(['message'], axis=1), data['message'].apply(pd.Series)], axis=1)
+1

Source: https://habr.com/ru/post/1673242/


All Articles