The number of unique values for each column by group

Consider the following data file:

       A      B  E
 0   bar    one  1
 1   bar  three  1
 2  flux    six  1
 3  flux  three  2
 4   foo   five  2
 5   foo    one  1
 6   foo    two  1
 7   foo    two  2

For each value of A, I would like to find the number of unique values in the other columns.
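To make the snippets runnable, the sample table can be rebuilt as a DataFrame. This is a minimal sketch: the values are copied from the table, and the name df matches the question's code.

```python
import pandas as pd

# Rebuild the sample data: A is the grouping key, and B and E are the
# columns whose distinct values we want to count within each group.
df = pd.DataFrame({
    'A': ['bar', 'bar', 'flux', 'flux', 'foo', 'foo', 'foo', 'foo'],
    'B': ['one', 'three', 'six', 'three', 'five', 'one', 'two', 'two'],
    'E': [1, 1, 1, 2, 2, 1, 1, 2],
})
print(df)
```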

  • I tried the following:

     df.groupby('A').apply(lambda x: x.nunique()) 

    but I get an error:

     AttributeError: 'DataFrame' object has no attribute 'nunique' 
  • I also tried:

     df.groupby('A').nunique() 

    but I also got the error:

     AttributeError: 'DataFrameGroupBy' object has no attribute 'nunique' 
  • Finally, I tried:

     df.groupby('A').apply(lambda x: x.apply(lambda y: y.nunique())) 

    which returns:

           A  B  E
      A
      bar   1  2  1
      flux  1  2  2
      foo   1  3  2

    and it looks right. Strangely, however, column A also appears in the result. Why?

2 answers

In the pandas version used in the question, a DataFrame object has no nunique method, so you have to choose the column to which you want to apply nunique(). You can do this with simple attribute (dot) access:

 df.groupby('A').apply(lambda x: x.B.nunique())

will print:

 A
 bar     2
 flux    2
 foo     3

Similarly,

 df.groupby('A').apply(lambda x: x.E.nunique())

will print:

 A
 bar     1
 flux    2
 foo     2
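The same per-column counts can also be obtained without apply, by selecting the column on the GroupBy object and calling SeriesGroupBy.nunique(). A sketch on the sample data; the variable names b_counts and e_counts are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['bar', 'bar', 'flux', 'flux', 'foo', 'foo', 'foo', 'foo'],
    'B': ['one', 'three', 'six', 'three', 'five', 'one', 'two', 'two'],
    'E': [1, 1, 1, 2, 2, 1, 1, 2],
})

# Select one column on the GroupBy object, then count distinct values per group.
b_counts = df.groupby('A')['B'].nunique()
e_counts = df.groupby('A')['E'].nunique()

print(b_counts)
print(e_counts)
```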

Alternatively, you can do this with a single function call, using:

 df.groupby('A').aggregate({'B': lambda x: x.nunique(), 'E': lambda x: x.nunique()}) 

which will print:

       B  E
 A
 bar   2  1
 flux  2  2
 foo   3  2
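The lambdas in the dictionary can also be replaced with the string 'nunique', which pandas resolves to the GroupBy method of that name. A sketch on the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['bar', 'bar', 'flux', 'flux', 'foo', 'foo', 'foo', 'foo'],
    'B': ['one', 'three', 'six', 'three', 'five', 'one', 'two', 'two'],
    'E': [1, 1, 1, 2, 2, 1, 1, 2],
})

# Map each column to the name of the aggregation to run on it.
result = df.groupby('A').agg({'B': 'nunique', 'E': 'nunique'})
print(result)
```

Passing a method name instead of a lambda keeps the dictionary short and lets pandas call its own per-group implementation.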

As for why your nested lambda also returns column A: in a groupby/apply operation you iterate over three DataFrames, one per group. Each of them is a sub-DataFrame of the original and still contains the grouping column A. Calling x.apply(...) on such a sub-DataFrame applies the function to each of its Series, so nunique() is evaluated on all three Series: A, B and E.

The first Series evaluated in each group is A, and since you grouped on A, every sub-DataFrame has exactly one unique value in A. That is why the result contains a column A filled with 1s.
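One way to keep the question's nested-lambda approach without the redundant A column is to select the other columns before calling apply, so the sub-DataFrames passed to the lambda no longer contain A. A sketch on the sample data; as a side note, recent pandas versions (2.2 and later) warn when apply operates on the grouping columns, and selecting the columns explicitly avoids that.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['bar', 'bar', 'flux', 'flux', 'foo', 'foo', 'foo', 'foo'],
    'B': ['one', 'three', 'six', 'three', 'five', 'one', 'two', 'two'],
    'E': [1, 1, 1, 2, 2, 1, 1, 2],
})

# Select B and E first, so each group's sub-DataFrame excludes the key column A;
# the inner apply then runs Series.nunique() on B and E only.
result = df.groupby('A')[['B', 'E']].apply(lambda x: x.apply(lambda y: y.nunique()))
print(result)
```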


I ran into the same problem. Upgrading pandas to the latest version solved the problem for me.

 df.groupby('A').nunique() 

This code did not work for me in pandas 0.19.2. After upgrading to pandas 0.21.1, it worked.
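On a pandas version that provides DataFrameGroupBy.nunique(), the one-liner gives the counts per group directly. A sketch on the sample data, which reads only the B and E columns of the result:

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['bar', 'bar', 'flux', 'flux', 'foo', 'foo', 'foo', 'foo'],
    'B': ['one', 'three', 'six', 'three', 'five', 'one', 'two', 'two'],
    'E': [1, 1, 1, 2, 2, 1, 1, 2],
})

# Requires a pandas version whose DataFrameGroupBy has nunique().
counts = df.groupby('A').nunique()
print(counts[['B', 'E']])
```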

You can check the version using the following code:

 import pandas as pd
 print('Pandas version ' + pd.__version__)
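If the same code has to run on both older and newer installations, an alternative to comparing version strings is to try the method and fall back to the nested apply from the question when it is missing. A sketch; the helper count_unique_by_group is hypothetical, not part of pandas.

```python
import pandas as pd


def count_unique_by_group(frame, key):
    """Hypothetical helper: count distinct values per column within each group."""
    grouped = frame.groupby(key)
    try:
        # Newer pandas: DataFrameGroupBy provides nunique() directly.
        return grouped.nunique()
    except AttributeError:
        # Older pandas (such as the 0.19.2 mentioned above) raises AttributeError,
        # so apply Series.nunique() column by column within each group instead.
        return grouped.apply(lambda g: g.apply(lambda col: col.nunique()))


df = pd.DataFrame({
    'A': ['bar', 'bar', 'flux', 'flux', 'foo', 'foo', 'foo', 'foo'],
    'B': ['one', 'three', 'six', 'three', 'five', 'one', 'two', 'two'],
    'E': [1, 1, 1, 2, 2, 1, 1, 2],
})

counts = count_unique_by_group(df, 'A')
print(counts)
```

Catching AttributeError mirrors the error reported in the question and avoids hard-coding the release in which the method appeared; note that the fallback path still produces the column A of 1s explained in the first answer.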

Source: https://habr.com/ru/post/1207159/

