Calculate by grouping for each column once

Question


I have an example data frame as follows. I am trying to compute statistics for each column, grouped by the "Sample_ID" column. For example, I would calculate the mean and standard deviation of the first column within each "Sample_ID" group (1, 2 and 3). I can do this for one or even several columns, but my new data has 100 columns.

import pandas as pd

df = pd.DataFrame([[1, 1.0, 2.3, 0.2, 0.53], [2, 3.35, 2.0, 0.2, 0.65],
                   [2, 3.4, 2.0, 0.25, 0.55], [3, 3.4, 2.0, 0.25, 0.55],
                   [1, 3.4, 2.0, 0.25, 0.55], [3, 3.4, 2.0, 0.25, 0.55]],
                  columns=["Sample_ID", "NaX", "NaU", "OC", "EC"]).set_index('Sample_ID')

Is there a way to iterate over each column and save the results? Here is the calculation for one data column; I need to do the same calculation for 100 data columns.

Thanks for reading this!

import numpy as np

# Root-mean-square of the per-group coefficient of variation (std / mean), in percent
grouped = df.groupby('Sample_ID')['OC']
OC_UNC = 100 * np.sqrt(((grouped.std() / grouped.mean()) ** 2).sum()
                       / len(grouped.count()))
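One straightforward way to apply this same formula to every column is to wrap it in a function and loop over df.columns, saving each result under its column name in a Series. This is a sketch, not code from the question or the answers; the helper name group_unc is hypothetical.

```python
import numpy as np
import pandas as pd

# Example frame from the question (Sample_ID is the index).
df = pd.DataFrame(
    [[1, 1.0, 2.3, 0.2, 0.53], [2, 3.35, 2.0, 0.2, 0.65],
     [2, 3.4, 2.0, 0.25, 0.55], [3, 3.4, 2.0, 0.25, 0.55],
     [1, 3.4, 2.0, 0.25, 0.55], [3, 3.4, 2.0, 0.25, 0.55]],
    columns=["Sample_ID", "NaX", "NaU", "OC", "EC"],
).set_index("Sample_ID")


def group_unc(frame, column, key="Sample_ID"):
    """Hypothetical helper: the question's OC_UNC formula for any one column."""
    grouped = frame.groupby(key)[column]
    cv = grouped.std() / grouped.mean()  # per-group coefficient of variation
    # Sum of squared CVs over groups, divided by the number of groups.
    return 100 * np.sqrt((cv ** 2).sum() / len(grouped.count()))


# Iterate over every data column and save each result under its column name.
unc = pd.Series({col: group_unc(df, col) for col in df.columns})
print(unc)
```

The loop is easy to read, but it regroups the frame once per column; with 100 columns that is still cheap for data of this size.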


python numpy pandas

Suresh raja Aug 1 '17 at 20:13


2 answers


df.groupby('Sample_ID').describe()

            NaX                                                      NaU        ...       OC          EC                                                
          count   mean       std   min     25%    50%     75%  max count  mean  ...      75%   max count  mean       std   min    25%   50%    75%   max
Sample_ID                                                                       ...                                                                     
1           2.0  2.200  1.697056  1.00  1.6000  2.200  2.8000  3.4   2.0  2.15  ...   0.2375  0.25   2.0  0.54  0.014142  0.53  0.535  0.54  0.545  0.55
2           2.0  3.375  0.035355  3.35  3.3625  3.375  3.3875  3.4   2.0  2.00  ...   0.2375  0.25   2.0  0.60  0.070711  0.55  0.575  0.60  0.625  0.65
3           2.0  3.400  0.000000  3.40  3.4000  3.400  3.4000  3.4   2.0  2.00  ...   0.2500  0.25   2.0  0.55  0.000000  0.55  0.550  0.55  0.550  0.55
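The describe() output has two column levels (data column, statistic), so a single statistic can be pulled out for all data columns at once with DataFrame.xs. A minimal sketch, assuming a pandas version whose groupby describe() returns wide MultiIndex columns as shown above:

```python
import pandas as pd

# Example frame from the question (Sample_ID is the index).
df = pd.DataFrame(
    [[1, 1.0, 2.3, 0.2, 0.53], [2, 3.35, 2.0, 0.2, 0.65],
     [2, 3.4, 2.0, 0.25, 0.55], [3, 3.4, 2.0, 0.25, 0.55],
     [1, 3.4, 2.0, 0.25, 0.55], [3, 3.4, 2.0, 0.25, 0.55]],
    columns=["Sample_ID", "NaX", "NaU", "OC", "EC"],
).set_index("Sample_ID")

desc = df.groupby("Sample_ID").describe()

# Columns are (data column, statistic) pairs; selecting on level 1 keeps one
# statistic and returns a plain frame with one column per data column.
means = desc.xs("mean", axis=1, level=1)
stds = desc.xs("std", axis=1, level=1)
print(stds)
```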


piRSquared Aug 1 '17 at 20:24

MaxU · Accepted Answer · 2017-08-01T20:15:01+0000

If I understand correctly:

In [31]: df.groupby('Sample_ID').agg('std')
Out[31]:
                NaX       NaU        OC        EC
Sample_ID
1          1.697056  0.212132  0.035355  0.014142
2          0.035355  0.000000  0.035355  0.070711
3          0.000000  0.000000  0.000000  0.000000

Or, to get both the mean and the std:

In [32]: df.groupby('Sample_ID').agg(['mean','std'])
Out[32]:
             NaX             NaU               OC              EC
            mean       std  mean       std   mean       std  mean       std
Sample_ID
1          2.200  1.697056  2.15  0.212132  0.225  0.035355  0.54  0.014142
2          3.375  0.035355  2.00  0.000000  0.225  0.035355  0.60  0.070711
3          3.400  0.000000  2.00  0.000000  0.250  0.000000  0.55  0.000000
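Because grouped.std() and grouped.mean() return frames with the same groups and columns, the question's OC_UNC formula can be applied to all columns in one vectorized step, with no explicit loop. This sketch goes beyond the posted answers; it reuses the question's df and divides the sum over groups by the number of groups (GroupBy.ngroups), as the original formula does with len(...count()).

```python
import numpy as np
import pandas as pd

# Example frame from the question (Sample_ID is the index).
df = pd.DataFrame(
    [[1, 1.0, 2.3, 0.2, 0.53], [2, 3.35, 2.0, 0.2, 0.65],
     [2, 3.4, 2.0, 0.25, 0.55], [3, 3.4, 2.0, 0.25, 0.55],
     [1, 3.4, 2.0, 0.25, 0.55], [3, 3.4, 2.0, 0.25, 0.55]],
    columns=["Sample_ID", "NaX", "NaU", "OC", "EC"],
).set_index("Sample_ID")

grouped = df.groupby("Sample_ID")

# std() and mean() share index (groups) and columns, so this division is
# element-wise: one coefficient of variation per group and data column.
cv = grouped.std() / grouped.mean()

# Sum squared CVs over groups (rows), divide by the group count, take the root.
unc = 100 * np.sqrt((cv ** 2).sum() / grouped.ngroups)
print(unc)
```

Each entry of unc should equal what the single-column OC_UNC expression gives for that column, so with 100 data columns the result is a 100-element Series indexed by column name.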
