How to create new values ​​in pandas data column based on values ​​from another column

I have pandas dataframe values ​​that I am reading from a csv file. I have a column labeled "SleepQuality", and the values ​​are floating point from 0.0 to 100.0. I want to create a new column called "SleepQualityGroup", where the values ​​from the original btw column 0 - 49 have a value of 0 in the new column, 50 - 59 = 1, 60 - 69 = 2, 70 - 79 = 3, 80 - 89 = 4 and 90 - 100 = 5

What would be the best formula for this? I focused on the logic needed to identify all the values ​​in each range and assign a new value.

An example of what will look in the next column of "SleepQualityGroup", as shown below:

SleepQuality SleepQualityGroup 80.4 4 90.1 5 66.4 2 50.3 1 86.2 4 75.4 3 45.7 0 91.5 5 61.3 2 54 1 58.2 1 
+5
source share
2 answers

Use pd.cut ie

 df['new'] = pd.cut(df['SleepQuality'],bins=[0,50 , 60, 70 , 80 , 90,100], labels=[0,1,2,3,4,5]) 

Output:

  SleepQuality SleepQualityGroup new
 0 80.4 4 4
 1 90.1 5 5
 2 66.4 2 2
 3 50.3 1 1
 4 86.2 4 4
 5 75.4 3 3
 6 45.7 0 0
 7 91.5 5 5
 8 61.3 2 2
 9 54.0 1 1
 10 58.2 1 1
+10
source

This is basically a binning operation. How you can use two tools here.

Using np.searchsorted -

 bins = np.arange(50,100,10) df['SleepQualityGroup'] = bins.searchsorted(df.SleepQuality) 

Using np.digitize -

 df['SleepQualityGroup'] = np.digitize(df.SleepQuality, bins) 

Output Example -

 In [866]: df Out[866]: SleepQuality SleepQualityGroup 0 80.4 4 1 90.1 5 2 66.4 2 3 50.3 1 4 86.2 4 5 75.4 3 6 45.7 0 7 91.5 5 8 61.3 2 9 54.0 1 10 58.2 1 

Runtime Test -

 In [921]: df Out[921]: SleepQuality SleepQualityGroup 0 80.4 4 1 90.1 5 2 66.4 2 3 50.3 1 4 86.2 4 5 75.4 3 6 45.7 0 7 91.5 5 8 61.3 2 9 54.0 1 10 58.2 1 In [922]: df = pd.concat([df]*10000,axis=0) # @Dark soln using pd.cut In [923]: %timeit df['new'] = pd.cut(df['SleepQuality'],bins=[0,50 , 60, 70 , 80 , 90,100], labels=[0,1,2,3,4,5]) 1000 loops, best of 3: 1.04 ms per loop In [926]: %timeit df['SleepQualityGroup'] = bins.searchsorted(df.SleepQuality) 1000 loops, best of 3: 591 µs per loop In [927]: %timeit df['SleepQualityGroup'] = np.digitize(df.SleepQuality, bins) 1000 loops, best of 3: 538 µs per loop 
+6
source

Source: https://habr.com/ru/post/1272305/


All Articles