Pandas: how to get unique values for a column that contains a list of values?

Consider the following DataFrame:

import pandas as pd

df = pd.DataFrame({'name' : [['one two','three four'], ['one'],[], [],['one two'],['three']],
                   'col' : ['A','B','A','B','A','B']})
df.sort_values(by='col',inplace=True)

df
Out[62]: 
  col                   name
0   A  [one two, three four]
2   A                     []
4   A              [one two]
1   B                  [one]
3   B                     []
5   B                [three]

I would like to get a column that keeps track of all the unique strings contained in name for each value of col.

That is, the expected result is:

df
Out[62]: 
  col                   name    unique_list
0   A  [one two, three four]    [one two, three four]
2   A                     []    [one two, three four]
4   A              [one two]    [one two, three four]
1   B                  [one]    [one, three]
3   B                     []    [one, three]
5   B                [three]    [one, three]

Indeed, for group A, say, you can see that the unique set of strings contained in [one two, three four], [] and [one two] is [one two, three four].

I can get the corresponding number of unique values using the approach from "Pandas: how to get a unique number of values in cells when the cells contain lists?":

df['count_unique']=df.groupby('col')['name'].transform(lambda x: list(pd.Series(x.apply(pd.Series).stack().reset_index(drop=True, level=1).nunique())))


df
Out[65]: 
  col                   name count_unique
0   A  [one two, three four]            2
2   A                     []            2
4   A              [one two]            2
1   B                  [one]            2
3   B                     []            2
5   B                [three]            2

but replacing nunique with unique in the code above does not work.

Any ideas? Thanks!

+4

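import numpy as np

# Concatenate the lists per group, deduplicate them, then map the result back onto each row
df['unique_list'] = df.col.map(df.groupby('col')['name'].sum().apply(np.unique))
df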
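
For readers who prefer to see the steps spelled out, here is a minimal sketch of the same idea using Series.explode instead of summing lists. This is a swapped-in technique, not the code above; it assumes pandas 0.25 or later, and the intermediate names exploded and uniques are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': [['one two', 'three four'], ['one'], [], [], ['one two'], ['three']],
    'col': ['A', 'B', 'A', 'B', 'A', 'B'],
})

# One row per list element; empty lists turn into NaN and are dropped.
exploded = df.explode('name').dropna(subset=['name'])

# Unique strings per group, kept in order of first appearance.
uniques = exploded.groupby('col')['name'].unique().apply(list)

# Broadcast each group's list back onto the original rows.
df['unique_list'] = df['col'].map(uniques)
print(df)
```

Because the result is mapped back by the value of col, the original index and row order of df are left untouched.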

+2

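Try:

from functools import reduce  # reduce is no longer a builtin in Python 3

# Concatenate each group's lists, deduplicate with set, then merge the result back onto df
uniq_df = df.groupby('col')['name'].apply(lambda x: list(set(reduce(lambda y,z: y+z,x)))).reset_index()
uniq_df.columns = ['col','uniq_list']
pd.merge(df,uniq_df, on='col', how='left')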

Output:

  col                   name              uniq_list
0   A  [one two, three four]  [one two, three four]
1   A                     []  [one two, three four]
2   A              [one two]  [one two, three four]
3   B                  [one]           [three, one]
4   B                     []           [three, one]
5   B                [three]           [three, one]
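
Note that a set does not keep insertion order, which is why group B shows [three, one] rather than [one, three]. If the order matters, a possible variant is sketched below. It assumes Python 3.7+ (where dict preserves insertion order), and the helper name ordered_unique is hypothetical, not part of pandas.

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame({
    'name': [['one two', 'three four'], ['one'], [], [], ['one two'], ['three']],
    'col': ['A', 'B', 'A', 'B', 'A', 'B'],
})

def ordered_unique(lists):
    # Flatten the group's lists, then deduplicate while keeping first-seen order.
    return list(dict.fromkeys(chain.from_iterable(lists)))

# One deduplicated list per value of col.
uniq = df.groupby('col')['name'].apply(ordered_unique)

# Map the per-group list back onto every row; the original index is preserved.
df['uniq_list'] = df['col'].map(uniq)
print(df)
```

Using map instead of merge also avoids resetting the index of df.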
+2

Source: https://habr.com/ru/post/1654622/

