The sum of each two columns in a Pandas dataframe

Question

I have a problem when working with Pandas. I start from this DataFrame:

import pandas as pd

df = pd.DataFrame([(1,2,3,4,5,6),(1,2,3,4,5,6),(1,2,3,4,5,6)], columns=['a','b','c','d','e','f'])
Out:
    a b c d e f
0   1 2 3 4 5 6
1   1 2 3 4 5 6
2   1 2 3 4 5 6

What I want is an output frame that looks like this:

Out:
    s1   s2   s3
0   3    7    11
1   3    7    11
2   3    7    11

In other words, I want to sum the column pairs (a, b), (c, d) and (e, f) separately and name the resulting columns (s1, s2, s3). How can I do this in Pandas? Thank you very much.

python pandas dataframe

spind Nov 17 '16 at 17:06

1 answer

Nickil Maveli · Answer 1 · 2016-11-17T17:15:38+0000

1) Perform a groupby with respect to the columns by passing axis=1. Following @Boud's comment, a small tweak to the grouping array gives exactly the output you want:

import numpy as np
df.groupby((np.arange(len(df.columns)) // 2) + 1, axis=1).sum().add_prefix('s')

The columns are grouped according to this key (before 1 is added to it):

np.arange(len(df.columns)) // 2
# array([0, 0, 1, 1, 2, 2], dtype=int32)
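
To make the role of that key concrete, here is a minimal sketch on the question's frame. It groups the transposed frame instead of passing axis=1; that swap is an assumption on my part for recent pandas releases, where axis=1 in groupby is deprecated, and it is not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([(1, 2, 3, 4, 5, 6)] * 3, columns=list("abcdef"))

# Integer-divide the column positions by 2 so adjacent columns share a label,
# then shift by 1 so the labels start at 1 instead of 0.
key = np.arange(len(df.columns)) // 2 + 1   # labels 1, 1, 2, 2, 3, 3

# Group the rows of the transposed frame (i.e. the original columns),
# sum each pair, then transpose back and prefix the labels with 's'.
result = df.T.groupby(key).sum().T.add_prefix("s")
print(result)
```

With an odd number of columns, the last column would end up in a group of its own.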

2) Use np.add.reduceat, which is a faster alternative:

df = pd.DataFrame(np.add.reduceat(df.values, np.arange(len(df.columns))[::2], axis=1))
df.columns = df.columns + 1
df.add_prefix('s')
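
For readers who have not used reduceat before: it sums the slice that starts at each given offset and ends just before the next offset (the last slice runs to the end of the array). A minimal sketch on a single row, with variable names chosen only for illustration:

```python
import numpy as np

row = np.array([1, 2, 3, 4, 5, 6])
starts = np.arange(len(row))[::2]        # start offsets 0, 2, 4

# Sums row[0:2], row[2:4] and row[4:], i.e. each adjacent pair.
pair_sums = np.add.reduceat(row, starts)
print(pair_sums)                         # [ 3  7 11]
```

Passing axis=1, as in the snippet above, applies the same pairing to every row of a 2-D array at once.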

Timings:

For a DF of 1 million rows and 20 columns:

from string import ascii_lowercase
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 10, (10**6,20)), columns=list(ascii_lowercase[:20]))
df.shape
(1000000, 20)

def with_groupby(df):
    return df.groupby((np.arange(len(df.columns)) // 2) + 1, axis=1).sum().add_prefix('s')

def with_reduceat(df):
    df = pd.DataFrame(np.add.reduceat(df.values, np.arange(len(df.columns))[::2], axis=1))
    df.columns = df.columns + 1
    return df.add_prefix('s')

# test whether both approaches give the same output
with_groupby(df).equals(with_reduceat(df))
True

%timeit with_groupby(df.copy())
1 loop, best of 3: 1.11 s per loop

%timeit with_reduceat(df.copy())   # <--- (>3X faster)
1 loop, best of 3: 345 ms per loop
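
The %timeit lines above are IPython magics. Outside IPython, a rough comparison can be sketched with the standard timeit module. The frame size, the helper names and the transposed groupby (used here instead of axis=1, which recent pandas releases deprecate) are assumptions for illustration; absolute numbers will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.RandomState(42)
small = pd.DataFrame(rng.randint(0, 10, (1000, 20)))

def pair_sums_reduceat(frame):
    # Sum adjacent column pairs on the underlying NumPy array.
    starts = np.arange(frame.shape[1])[::2]
    return np.add.reduceat(frame.values, starts, axis=1)

def pair_sums_groupby(frame):
    # Same result via groupby on the transposed frame.
    key = np.arange(frame.shape[1]) // 2
    return frame.T.groupby(key).sum().T.values

# Both helpers should agree before their speed is compared.
assert np.array_equal(pair_sums_reduceat(small), pair_sums_groupby(small))

t_reduceat = timeit.timeit(lambda: pair_sums_reduceat(small), number=20)
t_groupby = timeit.timeit(lambda: pair_sums_groupby(small), number=20)
print(f"reduceat: {t_reduceat:.4f} s, groupby: {t_groupby:.4f} s (20 runs each)")
```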
