Pandas: How to conditionally assign multiple columns?

Question

I want to replace negative values with NaN, but only in specific columns. The simplest way I have found:

 for col in ['a', 'b', 'c']:
     df.loc[df[col] < 0, col] = np.nan

df can have many columns, and I want to do this only for certain columns.

Is there a way to do this on a single line? It sounds like it should be easy, but I couldn't figure it out.

+5

python numpy pandas

ezbentley Oct 17 '16 at 15:40

6 answers

Use loc and where:

 cols = ['a', 'b', 'c']
 df.loc[:, cols] = df[cols].where(df[cols].ge(0), np.nan)

Demonstration

 df = pd.DataFrame(np.random.randn(10, 5), columns=list('abcde'))
 df

 cols = list('abc')
 df.loc[:, cols] = df[cols].where(df[cols].ge(0), np.nan)
 df

You can speed this up with NumPy:

 df[cols] = np.where(df[cols] < 0, np.nan, df[cols])

This does the same thing.
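As a quick check that the two forms agree, here is a minimal sketch that applies both to copies of the same frame and compares the results. The seed, the frame size, and the names `via_where` and `via_numpy` are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
df = pd.DataFrame(rng.standard_normal((10, 5)), columns=list("abcde"))
cols = ["a", "b", "c"]

# pandas version: keep values where the condition holds, NaN elsewhere
via_where = df.copy()
via_where[cols] = via_where[cols].where(via_where[cols].ge(0), np.nan)

# NumPy version: np.where returns a plain 2-D array that is assigned back
via_numpy = df.copy()
via_numpy[cols] = np.where(via_numpy[cols] < 0, np.nan, via_numpy[cols])

# DataFrame.equals treats NaNs in the same position as equal
print(via_where.equals(via_numpy))
```

Columns d and e are never touched by either version, so any negative values there survive.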

Timing

 def gen_df(n):
     return pd.DataFrame(np.random.randn(n, 5), columns=list('abcde'))

Since assignment is an important part of this, I create df from scratch on every loop. I also added the timing for creating df.

[Timing results for n = 10000 and n = 100000 are missing from this copy.]
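To reproduce the comparison, a small harness along these lines should work. It is only a sketch: the wrapper names `with_where`, `with_np_where`, and `with_loop` and the repetition count of 20 are assumptions, and the numbers it prints will depend on the machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

cols = ["a", "b", "c"]


def gen_df(n):
    return pd.DataFrame(np.random.randn(n, 5), columns=list("abcde"))


def with_where(n):
    df = gen_df(n)  # rebuilt on every call, so creation time is included
    df.loc[:, cols] = df[cols].where(df[cols].ge(0), np.nan)
    return df


def with_np_where(n):
    df = gen_df(n)
    df[cols] = np.where(df[cols] < 0, np.nan, df[cols])
    return df


def with_loop(n):
    df = gen_df(n)
    for col in cols:  # the column-by-column baseline from the question
        df.loc[df[col] < 0, col] = np.nan
    return df


repeats = 20  # arbitrary; raise it for more stable numbers
for n in (10_000, 100_000):
    for func in (with_where, with_np_where, with_loop):
        seconds = timeit.timeit(lambda: func(n), number=repeats)
        print(f"n={n:>7} {func.__name__:<14} {seconds / repeats * 1e3:.2f} ms per call")
```

Timing `gen_df` on its own with the same `timeit` call gives the baseline to subtract from each figure.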

+6

piRSquared Oct 17 '16 at 15:49

Here is the way:

 df[df.columns.isin(['a', 'b', 'c']) & (df < 0)] = np.nan
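One way to read this one-liner: `df.columns.isin(...)` gives one flag per column, `df < 0` gives one flag per cell, and combining them leaves True only for negative cells in the selected columns. The sketch below builds the same kind of mask by hand with NumPy broadcasting. The tiny frame and the variable names are illustrative assumptions, and it uses `DataFrame.mask` to return a new frame instead of assigning in place.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[-1.0, 2.0, -3.0, -4.0, 5.0],
     [1.0, -2.0, 3.0, 4.0, -5.0]],
    columns=list("abcde"),
)
cols = ["a", "b", "c"]

col_selected = df.columns.isin(cols)  # one flag per column, shape (5,)
negative = (df < 0).to_numpy()        # one flag per cell, shape (2, 5)
mask = negative & col_selected        # column flags are broadcast over every row

# DataFrame.mask replaces cells where the mask is True with NaN
result = df.mask(mask)
print(result)
```

The negative values in columns d and e stay as they are because their column flags are False.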

+5

ayhan Oct 17 '16 at 15:50

You can use np.where to achieve this:

 In [47]: df = pd.DataFrame(np.random.randn(5,5), columns=list('abcde'))
          df
 Out[47]:
           a         b         c         d         e
 0  0.616829 -0.933365 -0.735308  0.665297 -1.333547
 1  0.069158  2.266290 -0.068686 -0.787980 -0.082090
 2  1.203311  1.661110 -1.227530 -1.625526  0.045932
 3 -0.247134 -1.134400  0.355436  0.787232 -0.474243
 4  0.131774  0.349103 -0.632660 -1.549563  1.196455

 In [48]: df[['a','b','c']] = np.where(df[['a','b','c']] < 0, np.nan, df[['a','b','c']])
          df
 Out[48]:
           a         b         c         d         e
 0  0.616829       NaN       NaN  0.665297 -1.333547
 1  0.069158  2.266290       NaN -0.787980 -0.082090
 2  1.203311  1.661110       NaN -1.625526  0.045932
 3       NaN       NaN  0.355436  0.787232 -0.474243
 4  0.131774  0.349103       NaN -1.549563  1.196455

+4

Edchum Oct 17 '16 at 15:51

Sure, just select the columns you need from the mask:

 (df < 0)[['a', 'b', 'c']]

You can then use this mask as df[(df < 0)[['a', 'b', 'c']]] = np.nan.

+3

David z Oct 17 '16 at 15:55

If it has to be a single line:

 df[['a', 'b', 'c']] = df[['a', 'b', 'c']].apply(lambda c: [x if x >= 0 else np.nan for x in c])

+1

Tammo heeren Oct 17 '16 at 15:53

blacksite · Accepted Answer · 2016-10-17T15:55:19+0000

I don't think you'll get much simpler than this:

 >>> df = pd.DataFrame({'a': np.arange(-5, 2), 'b': np.arange(-5, 2), 'c': np.arange(-5, 2), 'd': np.arange(-5, 2), 'e': np.arange(-5, 2)})
 >>> df
    a  b  c  d  e
 0 -5 -5 -5 -5 -5
 1 -4 -4 -4 -4 -4
 2 -3 -3 -3 -3 -3
 3 -2 -2 -2 -2 -2
 4 -1 -1 -1 -1 -1
 5  0  0  0  0  0
 6  1  1  1  1  1
 >>> cols = ['a', 'b', 'c']
 >>> df[df[cols] < 0] = np.nan
 >>> df
      a    b    c  d  e
 0  NaN  NaN  NaN -5 -5
 1  NaN  NaN  NaN -4 -4
 2  NaN  NaN  NaN -3 -3
 3  NaN  NaN  NaN -2 -2
 4  NaN  NaN  NaN -1 -1
 5  0.0  0.0  0.0  0  0
 6  1.0  1.0  1.0  1  1
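The output above shows a side effect worth knowing: columns a, b, and c print as floats afterwards, while d and e stay integers. NaN is a floating-point value, so integer columns that receive it are upcast. The sketch below makes this visible by printing the dtypes. It uses `where` on the selected columns and assigns the result back, which is a substitute for the boolean-frame assignment in the answer, not the answer's exact code.

```python
import numpy as np
import pandas as pd

# Same integer data as in the answer: every column runs from -5 to 1
df = pd.DataFrame({name: np.arange(-5, 2) for name in "abcde"})
cols = ["a", "b", "c"]

# where() keeps values that satisfy the condition and fills NaN elsewhere
df[cols] = df[cols].where(df[cols] >= 0)

# a, b, c are now floating point; d and e keep their integer dtype
print(df.dtypes)
```

If integer columns need to stay integer, pandas' nullable `Int64` dtype may be worth a look.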
