Add a new column and insert values according to a given interval in Python

How can I add a new column to a pandas DataFrame and insert 1 for all values <= W1, 2 for all values > W1 and <= W2, and 3 for all values > W2?

W1=3
W2=6

This is my example:

column1 number   
2       1
1       1
5       2
6       2
7       3
8       3
3       1
3 answers

You can nest two numpy.where calls:

W1=3
W2=6

df['d'] = np.where(df['column1'] <= W1, 1, 
          np.where(df['column1'] <= W2, 2, 3))
print (df)
   column1  number  d
0        2       1  1
1        1       1  1
2        5       2  2
3        6       2  2
4        7       3  3
5        8       3  3
6        3       1  1
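The answer assumes `df` already exists. A minimal, self-contained sketch that builds the question's data and applies the same rule is shown here; it also shows `np.select`, which is not part of the original answer but can be easier to read than nested `np.where` calls once more thresholds are added.

```python
import numpy as np
import pandas as pd

W1, W2 = 3, 6

# Example data from the question (the expected 'number' column is omitted).
df = pd.DataFrame({"column1": [2, 1, 5, 6, 7, 8, 3]})

# Nested np.where, as in the answer above.
nested = np.where(df["column1"] <= W1, 1,
                  np.where(df["column1"] <= W2, 2, 3))

# np.select checks the conditions in order and takes the first match;
# rows that match no condition get the default value.
conditions = [df["column1"] <= W1, df["column1"] <= W2]
df["d"] = np.select(conditions, [1, 2], default=3)

print(df["d"].tolist())  # [1, 1, 2, 2, 3, 3, 1]
```

Both forms rely on the conditions being tested from the smallest threshold upward, so a value <= W1 never reaches the second test.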

Another solution uses pandas.cut (see the docs):

bins = [-np.inf, W1, W2, np.inf]
labels=[1,2,3]
df['d1'] = pd.cut(df['column1'], bins=bins, labels=labels)
print (df)

   column1  number  d d1
0        2       1  1  1
1        1       1  1  1
2        5       2  2  2
3        6       2  2  2
4        7       3  3  3
5        8       3  3  3
6        3       1  1  1
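Two details of `pd.cut` are worth checking before relying on it: bins are right-inclusive by default, which is what makes `<= W1` and `<= W2` work, and the result has categorical dtype rather than integer. The sketch below rebuilds the question's data so it runs on its own; the cast to `int` is an optional extra, not part of the original answer.

```python
import numpy as np
import pandas as pd

W1, W2 = 3, 6
df = pd.DataFrame({"column1": [2, 1, 5, 6, 7, 8, 3]})

bins = [-np.inf, W1, W2, np.inf]
cut = pd.cut(df["column1"], bins=bins, labels=[1, 2, 3])

# Bins are right-inclusive by default: (-inf, 3], (3, 6], (6, inf),
# so 3 maps to label 1 and 6 maps to label 2.
print(cut.dtype)  # category

# Cast to plain integers if the column is used in arithmetic later.
df["d1"] = cut.astype(int)
print(df["d1"].tolist())  # [1, 1, 2, 2, 3, 3, 1]
```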
df['new'] = df.column1.gt(W1).add(1).add(df.column1.gt(W2))

df



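Explanation: where column1 is greater than W1 the comparison gives True, otherwise False. Adding 1 casts those booleans to 1 and 0, so the result is 2 for True and 1 for False (because of the added 1). That gives 1 for values less than or equal to W1 and 2 for values greater than W1. Finally, adding the boolean series of column1 greater than W2 adds 0 where the value is less than or equal to W2 and adds 1 to the 2s where column1 is greater than W2.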

Written more explicitly, so the steps are visible:

c = df.column1
(c > W1) + 1 + (c > W2)

0    1
1    1
2    2
3    2
4    3
5    3
6    1
Name: column1, dtype: int64

Here's one using np.searchsorted:

df['out'] = np.searchsorted([W1,W2],df.column1)+1
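The one-liner depends on the default `side='left'` of `np.searchsorted`, which keeps values equal to a threshold in the lower group. A small sketch using the question's values (the array name `values` is only for illustration) shows the difference between the two sides:

```python
import numpy as np

W1, W2 = 3, 6
values = np.array([2, 1, 5, 6, 7, 8, 3])

# With the default side='left', searchsorted returns the number of
# thresholds strictly below each value, so a value equal to a threshold
# stays in the lower group (<= W1 gives 1, <= W2 gives 2).
left = np.searchsorted([W1, W2], values) + 1
print(left.tolist())   # [1, 1, 2, 2, 3, 3, 1]

# side='right' counts thresholds less than or equal to each value, so a
# value equal to a threshold moves up a group (< W1 gives 1, < W2 gives 2).
right = np.searchsorted([W1, W2], values, side="right") + 1
print(right.tolist())  # [1, 1, 2, 3, 3, 3, 2]
```

The threshold list must be sorted in ascending order for the result to be meaningful.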

Runtime test:

In [230]: df = pd.DataFrame(np.random.randint(0,10,(10000)),columns=['column1'])

In [231]: W1,W2 = 3,6

In [232]: %timeit np.where(df['column1'] <= W1, 1,np.where(df['column1'] <= W2, 2, 3))
1000 loops, best of 3: 633 µs per loop # @jezrael soln

In [233]: %timeit df.column1.gt(W1).add(1).add(df.column1.gt(W2))
1000 loops, best of 3: 1.07 ms per loop # @piRSquared soln

In [234]: %timeit np.searchsorted([W1,W2],df.column1)+1
1000 loops, best of 3: 205 µs per loop # Using np.searchsorted

Passing the underlying array with df.column1.values, so that np.searchsorted works on a NumPy array directly, is a little faster still:

In [235]: %timeit np.searchsorted([W1,W2],df.column1.values)+1
1000 loops, best of 3: 184 µs per loop
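The timings above come from one machine and an older library version, so they are best treated as indicative. A rough sketch for re-running the comparison outside IPython with the standard `timeit` module follows; it also checks that all approaches from this thread agree. The seed, the dictionary name `candidates`, and the repeat count are arbitrary choices for illustration.

```python
import timeit

import numpy as np
import pandas as pd

W1, W2 = 3, 6
rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
df = pd.DataFrame({"column1": rng.integers(0, 10, 10_000)})
col = df["column1"]

# One zero-argument callable per approach, each returning a NumPy array.
candidates = {
    "nested np.where": lambda: np.where(col <= W1, 1, np.where(col <= W2, 2, 3)),
    "boolean sum": lambda: col.gt(W1).add(1).add(col.gt(W2)).to_numpy(),
    "pd.cut": lambda: pd.cut(col, [-np.inf, W1, W2, np.inf],
                             labels=[1, 2, 3]).astype(int).to_numpy(),
    "np.searchsorted": lambda: np.searchsorted([W1, W2], col.to_numpy()) + 1,
}

reference = candidates["nested np.where"]()
for name, fn in candidates.items():
    # Every approach should produce exactly the same labels.
    assert np.array_equal(reference, fn()), name
    seconds = timeit.timeit(fn, number=100) / 100
    print(f"{name:>16}: {seconds * 1e6:8.1f} µs per call")
```

Absolute numbers will differ between machines and library versions, so only the relative ordering on your own setup is informative.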

Source: https://habr.com/ru/post/1661738/

