Vectorized "and" columns for pandas

Question

With data like this

    import pandas as pd

    tcd = pd.DataFrame({
        'a': {'p_1': 1, 'p_2': 1, 'p_3': 0, 'p_4': 0},
        'b': {'p_1': 0, 'p_2': 1, 'p_3': 1, 'p_4': 1},
        'c': {'p_1': 0, 'p_2': 0, 'p_3': 1, 'p_4': 0}})
    tcd
    #      a  b  c
    # p_1  1  0  0
    # p_2  1  1  0
    # p_3  0  1  1
    # p_4  0  1  0

(but with 40e3 columns)

I am looking for a vectorized way to compute the boolean "and" between columns and reduce each result to a single value:

    a & b = ab -> 1 or True      a & c = ac -> 0 or False
    1  0  0                      1  0  0
    1  1  0                      1  0  0
    0  1  1                      0  1  0
    0  1  0                      0  0  0

At the moment I only have an ugly solution with a for loop:

    res = pd.Series(index=['a&a', 'a&b', 'a&c'], dtype=float)
    for i in range(3):
        # "and" the first column with column i, then reduce with any()
        res.iloc[i] = (tcd.iloc[:, 0] & tcd.iloc[:, i]).any()
    res
    # aa 1
    # ab 1
    # ac 0

With BM's answer I get this:

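    def get_shared_p(tcd, i):
        # .values[:, None] turns column i into a (rows, 1) array that broadcasts
        # against every column (tcd.iloc[:, i][:, None] on older pandas)
        res = (tcd.iloc[:, i].values[:, None] & tcd).any()
        res.index += '&_{}'.format(i)
        return res

    cols = tcd.shape[1]
    res = pd.DataFrame(columns=range(cols), index=range(cols))
    for col_i in range(cols):
        res.iloc[:, col_i] = list(get_shared_p(tcd, col_i))

    print(res)
    #        0      1      2
    # 0   True   True  False
    # 1   True   True   True
    # 2  False   True   True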

Perhaps this remaining loop can be avoided as well?

+5

python vectorization numpy pandas

user3313834 Jan 25 '16 at 20:13


3 answers

You can use np.logical_and together with NumPy broadcasting.

Say you define x as the entire matrix and y as the first column, reshaped to a column vector:

    import numpy as np

    x = tcd.values  # tcd.as_matrix() on old pandas versions
    y = tcd.a.values.reshape((len(tcd), 1))

Now, using broadcasting, compute the logical "and" of x and y, and store it in and_:

 and_ = np.logical_and(x, y)

Finally, check for each column whether any row is true:

    >>> np.sum(and_, axis=0) > 0
    array([ True,  True, False], dtype=bool)
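
The same broadcasting idea can be stretched to cover every pair of columns at once by adding a third axis. This is only a sketch and not part of the answer above: the names pairwise and shared are made up for illustration, and the intermediate array has rows × cols × cols elements, so it is unlikely to fit in memory for 40e3 columns without processing the columns in chunks.

```python
import numpy as np
import pandas as pd

tcd = pd.DataFrame({
    'a': {'p_1': 1, 'p_2': 1, 'p_3': 0, 'p_4': 0},
    'b': {'p_1': 0, 'p_2': 1, 'p_3': 1, 'p_4': 1},
    'c': {'p_1': 0, 'p_2': 0, 'p_3': 1, 'p_4': 0}})

x = tcd.values.astype(bool)  # shape (rows, cols)

# pairwise[r, i, j] is True when row r has both column i and column j set.
# (hypothetical name, shape (rows, cols, cols))
pairwise = x[:, :, None] & x[:, None, :]

# Reduce over the row axis: shared[i, j] is True if any row has both set.
shared = pairwise.any(axis=0)

res_broadcast = pd.DataFrame(shared, index=tcd.columns, columns=tcd.columns)
print(res_broadcast)
```

The result is symmetric, with the same True/False pattern as the loop-based table in the question.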

+4

Ami Tavory Jan 25 '16 at 20:28


I would solve this problem as follows:

    import pandas as pd
    import numpy as np
    from itertools import combinations

    tcd = pd.DataFrame({
        'a': {'p_1': 1, 'p_2': 1, 'p_3': 0, 'p_4': 0},
        'b': {'p_1': 0, 'p_2': 1, 'p_3': 1, 'p_4': 1},
        'c': {'p_1': 0, 'p_2': 0, 'p_3': 1, 'p_4': 0}})

    # add one new column per pair of original columns
    for c in combinations(tcd.columns, 2):
        tcd[c[0] + c[1]] = np.logical_and(tcd[c[0]], tcd[c[1]])

    print(tcd)

With the output:

         a  b  c     ab     ac     bc
    p_1  1  0  0  False  False  False
    p_2  1  1  0   True  False  False
    p_3  0  1  1  False  False   True
    p_4  0  1  0  False  False  False

+1

George Petrov Jan 25 '16 at 20:30


BM · Accepted Answer · 2016-01-25T21:52:09+0000

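Use [:, None] for data alignment and forced broadcasting (on recent pandas versions, index the underlying array with tcd.a.values[:, None], since multi-dimensional indexing on a Series is no longer supported):

    In [1]: res = (tcd.a.values[:, None] & tcd).any(); res.index += '&a'; res
    Out[1]:
    a&a     True
    b&a     True
    c&a    False
    dtype: bool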
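
For the follow-up in the question (dropping the remaining loop over columns), one possible route is a matrix product rather than broadcasting. This is a different technique from the one in this answer, shown as a sketch that assumes the frame holds only 0/1 values; counts and res_all are illustrative names. Entry (i, j) of x.T @ x counts the rows in which columns i and j are both 1, so "count > 0" plays the role of (col_i & col_j).any().

```python
import numpy as np
import pandas as pd

tcd = pd.DataFrame({
    'a': {'p_1': 1, 'p_2': 1, 'p_3': 0, 'p_4': 0},
    'b': {'p_1': 0, 'p_2': 1, 'p_3': 1, 'p_4': 1},
    'c': {'p_1': 0, 'p_2': 0, 'p_3': 1, 'p_4': 0}})

# Assumption: every entry is 0 or 1, so a product of two entries equals their "and".
x = tcd.values.astype(np.int64)

# counts[i, j] = number of rows where column i and column j are both 1.
counts = x.T @ x

# A pair of columns shares at least one row exactly when its count is positive.
res_all = pd.DataFrame(counts > 0, index=tcd.columns, columns=tcd.columns)
print(res_all)
```

With 40e3 columns the dense result alone has 1.6e9 entries, so in practice a sparse representation or computing it block by block may be needed.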
