Counting combinations between two columns of a DataFrame

I would like to reshape a DataFrame so that it shows the counts of the combinations of two columns. Here's an example frame:

import pandas as pd

my_df = pd.DataFrame({'a': ['first', 'second', 'first', 'first', 'third', 'first'],
                      'b': ['foo', 'foo', 'bar', 'bar', 'baz', 'baz'],
                      'c': ['do', 're', 'mi', 'do', 're', 'mi'],
                      'e': ['this', 'this', 'that', 'this', 'those', 'this']})

which looks like this:

        a    b   c      e
0   first  foo  do   this
1  second  foo  re   this
2   first  bar  mi   that
3   first  bar  do   this
4   third  baz  re  those
5   first  baz  mi   this

I want to create a new DataFrame that counts the combinations between columns a and c, which would look like this:

c        do   mi   re
a                    
first   2.0  2.0  NaN
second  NaN  NaN  1.0
third   NaN  NaN  1.0

I can do this with pivot_table if I set the values argument to some other column:

my_pivot_count1 = my_df.pivot_table(values='b', index='a', columns='c', aggfunc='count')

The problem is that column b may contain NaN values, in which case that combination is not counted. For example, if my_df looks like this:

        a    b   c      e
0   first  foo  do   this
1  second  foo  re   this
2   first  bar  mi   that
3   first  bar  do   this
4   third  baz  re  those
5   first  NaN  mi   this

my call to my_df.pivot_table gives the following:

c        do   mi   re
a                    
first   2.0  1.0  NaN
second  NaN  NaN  1.0
third   NaN  NaN  1.0

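For now I work around the NaN values in b by passing a different column to the values argument: I add a column to my_df that is guaranteed to contain no NaN values, using either my_df['count'] = 1 or my_df.reset_index(). Is there a way to get what I want without adding a column, using only columns a and c?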
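To make the undercount concrete, here is a minimal sketch on the frame above (column e is left out for brevity, and the variable names are only illustrative). It assumes the cause is simply that count skips NaN while size counts rows:

```python
import numpy as np
import pandas as pd

# Same data as in the question, with a NaN in column b on the last row.
my_df = pd.DataFrame({
    'a': ['first', 'second', 'first', 'first', 'third', 'first'],
    'b': ['foo', 'foo', 'bar', 'bar', 'baz', np.nan],
    'c': ['do', 're', 'mi', 'do', 're', 'mi'],
})

grouped = my_df.groupby(['a', 'c'])
non_null_b = grouped['b'].count()  # number of non-NaN values of b per (a, c) pair
row_counts = grouped.size()        # number of rows per (a, c) pair, NaN or not

print(non_null_b.loc[('first', 'mi')])  # 1: the NaN row is skipped
print(row_counts.loc[('first', 'mi')])  # 2: both rows are counted
```

So any approach that counts rows rather than values of b should give the table I am after.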


pandas.crosstab has a dropna argument, which defaults to True, but in your case you can pass False:

pd.crosstab(my_df['a'], my_df['c'], dropna=False)
# c       do  mi  re
# a                 
# first    2   2   0
# second   0   0   1
# third    0   0   1
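As a quick check, here is a self-contained sketch that runs crosstab on the version of the frame with a NaN in b. The name counts is just for illustration. Since only columns a and c are passed in, the missing value in b should not affect the result:

```python
import numpy as np
import pandas as pd

# The question's frame with a NaN in column b (column e omitted for brevity).
my_df = pd.DataFrame({
    'a': ['first', 'second', 'first', 'first', 'third', 'first'],
    'b': ['foo', 'foo', 'bar', 'bar', 'baz', np.nan],
    'c': ['do', 're', 'mi', 'do', 're', 'mi'],
})

# Only a and c are used, so column b never enters the computation.
counts = pd.crosstab(my_df['a'], my_df['c'], dropna=False)
print(counts)
```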

You could use groupby/unstack:

my_df.groupby(by=['a', 'c']).size().unstack(level='c')

c        do   mi   re
a                    
first   2.0  2.0  NaN
second  NaN  NaN  1.0
third   NaN  NaN  1.0

To get integer counts with zeros instead of NaN, add fillna and astype:

N = (
    my_df.groupby(by=['a', 'c'])
         .size()
         .unstack(level='c')
         .fillna(0)
         .astype(int)
)

c       do  mi  re
a                 
first    2   2   0
second   0   0   1
third    0   0   1
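A possible shortcut, offered as a sketch rather than as part of the answer above: unstack accepts a fill_value argument, so the missing combinations can be filled with 0 during the reshape and the counts should stay integers without a separate fillna/astype step. The name counts is illustrative:

```python
import numpy as np
import pandas as pd

# The question's frame with a NaN in column b (column e omitted for brevity).
my_df = pd.DataFrame({
    'a': ['first', 'second', 'first', 'first', 'third', 'first'],
    'b': ['foo', 'foo', 'bar', 'bar', 'baz', np.nan],
    'c': ['do', 're', 'mi', 'do', 're', 'mi'],
})

# size() counts rows per (a, c) pair; fill_value=0 fills absent pairs while unstacking.
counts = my_df.groupby(['a', 'c']).size().unstack(level='c', fill_value=0)
print(counts)
```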

You can call .fillna('x') on my_df before pivoting, so that the NaN values in b are counted:

my_pivot_count1 = my_df.fillna('x').pivot_table(values='b', index='a', columns='c', aggfunc='count')

Source: https://habr.com/ru/post/1692776/

