How to create new values in pandas data column based on values from another column

Question

I have a pandas DataFrame that I am reading from a CSV file. It has a column labeled "SleepQuality", whose values are floating point numbers from 0.0 to 100.0. I want to create a new column called "SleepQualityGroup", where values of 0 - 49 in the original column get a value of 0 in the new column, 50 - 59 = 1, 60 - 69 = 2, 70 - 79 = 3, 80 - 89 = 4 and 90 - 100 = 5.

What would be the best formula for this? I focused on the logic needed to identify all the values in each range and assign a new value.

An example of how the new "SleepQualityGroup" column should look is shown below:

 SleepQuality  SleepQualityGroup
 80.4          4
 90.1          5
 66.4          2
 50.3          1
 86.2          4
 75.4          3
 45.7          0
 91.5          5
 61.3          2
 54            1
 58.2          1

+5

python numpy pandas dataframe

Dom b Oct 3 '17 at 14:35


2 answers

This is basically a binning operation. You can use two tools here.

Using np.searchsorted -

 bins = np.arange(50,100,10)
 df['SleepQualityGroup'] = bins.searchsorted(df.SleepQuality)

Using np.digitize -

 df['SleepQualityGroup'] = np.digitize(df.SleepQuality, bins)
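The two calls agree on the sample data, but they can differ for values that sit exactly on a bin edge such as 50.0 or 90.0. The sketch below uses hypothetical edge-case values (not from the question) to show the difference: `searchsorted` defaults to `side='left'`, which keeps a value equal to an edge in the lower group, while `np.digitize` with its default `right=False` puts it in the higher group, which is what the question's "50 - 59 = 1" rule appears to want.

```python
import numpy as np
import pandas as pd

# Bin edges from the answer: 50, 60, 70, 80, 90
bins = np.arange(50, 100, 10)

# Hypothetical values chosen to sit exactly on or next to bin edges
df = pd.DataFrame({"SleepQuality": [0.0, 49.9, 50.0, 59.9, 60.0, 90.0, 100.0]})
values = df["SleepQuality"].to_numpy()

# Default side='left': a value equal to an edge stays in the lower group
left = bins.searchsorted(values)

# side='right': a value equal to an edge moves up to the higher group
right = bins.searchsorted(values, side="right")

# Default right=False: bins[i-1] <= x < bins[i]
digitized = np.digitize(values, bins)

print(left.tolist())       # [0, 0, 0, 1, 1, 4, 5]
print(right.tolist())      # [0, 0, 1, 1, 2, 5, 5]
print(digitized.tolist())  # [0, 0, 1, 1, 2, 5, 5]
```

If exact multiples of 10 can occur in the data, passing `side='right'` to `searchsorted` makes it match `np.digitize`.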

Output Example -

 In [866]: df
 Out[866]:
     SleepQuality  SleepQualityGroup
 0           80.4                  4
 1           90.1                  5
 2           66.4                  2
 3           50.3                  1
 4           86.2                  4
 5           75.4                  3
 6           45.7                  0
 7           91.5                  5
 8           61.3                  2
 9           54.0                  1
 10          58.2                  1

Runtime Test -

 In [921]: df
 Out[921]:
     SleepQuality  SleepQualityGroup
 0           80.4                  4
 1           90.1                  5
 2           66.4                  2
 3           50.3                  1
 4           86.2                  4
 5           75.4                  3
 6           45.7                  0
 7           91.5                  5
 8           61.3                  2
 9           54.0                  1
 10          58.2                  1

 In [922]: df = pd.concat([df]*10000,axis=0)

 # @Dark soln using pd.cut
 In [923]: %timeit df['new'] = pd.cut(df['SleepQuality'],bins=[0,50 , 60, 70 , 80 , 90,100], labels=[0,1,2,3,4,5])
 1000 loops, best of 3: 1.04 ms per loop

 In [926]: %timeit df['SleepQualityGroup'] = bins.searchsorted(df.SleepQuality)
 1000 loops, best of 3: 591 µs per loop

 In [927]: %timeit df['SleepQualityGroup'] = np.digitize(df.SleepQuality, bins)
 1000 loops, best of 3: 538 µs per loop
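For anyone who wants to repeat the comparison outside IPython, here is a rough sketch using the standard `timeit` module. The function names (`with_cut`, `with_searchsorted`, `with_digitize`) are made up for illustration, the data is the question's sample repeated 10000 times as in the session above, and the timings will depend on the machine and on the pandas/NumPy versions, so no expected numbers are shown.

```python
import timeit

import numpy as np
import pandas as pd

# The question's sample values, repeated 10000 times (110000 rows)
sample = [80.4, 90.1, 66.4, 50.3, 86.2, 75.4, 45.7, 91.5, 61.3, 54.0, 58.2]
df = pd.DataFrame({"SleepQuality": sample * 10000})
bins = np.arange(50, 100, 10)


def with_cut():
    return pd.cut(df["SleepQuality"], bins=[0, 50, 60, 70, 80, 90, 100],
                  labels=[0, 1, 2, 3, 4, 5])


def with_searchsorted():
    return bins.searchsorted(df["SleepQuality"].to_numpy())


def with_digitize():
    return np.digitize(df["SleepQuality"].to_numpy(), bins)


# Sanity check: on this sample all three approaches give the same groups
cut_result = with_cut().astype(int).to_numpy()
searchsorted_result = with_searchsorted()
digitize_result = with_digitize()
same = bool((cut_result == digitize_result).all()
            and (searchsorted_result == digitize_result).all())

# Average time per call over a small number of repetitions
for fn in (with_cut, with_searchsorted, with_digitize):
    seconds = timeit.timeit(fn, number=20) / 20
    print(f"{fn.__name__}: {seconds * 1e3:.3f} ms per call")
```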

+6

Divakar Oct 3 '17 at 14:38


Dark · Accepted Answer · 2017-10-03T14:41:03+0000

Use pd.cut, i.e.:

 df['new'] = pd.cut(df['SleepQuality'], bins=[0, 50, 60, 70, 80, 90, 100], labels=[0, 1, 2, 3, 4, 5])
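One thing to watch with this call: by default `pd.cut` builds right-closed intervals, `(0, 50]`, `(50, 60]` and so on, so a value of exactly 50.0 lands in group 0 and a value of exactly 0.0 falls outside every bin and becomes NaN. A possible variant, sketched below with hypothetical edge-case values, uses `right=False` for left-closed bins and replaces the top edge with `np.inf` so that 100.0 still falls in group 5. The open upper edge is an assumption made here, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical values on or near the bin edges
df = pd.DataFrame({"SleepQuality": [0.0, 45.7, 50.0, 54.0, 89.9, 90.0, 100.0]})
labels = [0, 1, 2, 3, 4, 5]

# The answer's call: right-closed bins (0, 50], (50, 60], ..., (90, 100]
default = pd.cut(df["SleepQuality"], bins=[0, 50, 60, 70, 80, 90, 100],
                 labels=labels)

# Variant: left-closed bins [0, 50), [50, 60), ..., [90, inf)
edges = [0, 50, 60, 70, 80, 90, np.inf]
df["SleepQualityGroup"] = pd.cut(
    df["SleepQuality"], bins=edges, right=False, labels=labels
).astype(int)  # safe here because no value falls outside the bins

print(pd.isna(default.iloc[0]))          # True: 0.0 is outside (0, 50]
print(default.iloc[2])                   # 0: 50.0 falls in (0, 50]
print(df["SleepQualityGroup"].tolist())  # [0, 0, 1, 1, 4, 5, 5]
```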

Output:

  SleepQuality SleepQualityGroup new
 0 80.4 4 4
 1 90.1 5 5
 2 66.4 2 2
 3 50.3 1 1
 4 86.2 4 4
 5 75.4 3 3
 6 45.7 0 0
 7 91.5 5 5
 8 61.3 2 2
 9 54.0 1 1
 10 58.2 1 1
