Pandas: outer product of row and column sums

In Pandas, I am trying to program the chi-square test by hand. I compare row 0 with row 1 in the data frame below.

data
       2      3      5      10     30
0      3      0      6      5      0
1  33324  15833  58305  54402  38920

To do this, I need to calculate the expected count for each cell as: cell(i,j) = rowSum(i)*colSum(j) / sumAll. In R, I can do this simply by taking the outer() product:

Exp_counts <- outer(rowSums(data), colSums(data), "*")/sum(data)    # Expected cell counts

I used numpy's outer product function to reproduce the result of the R code above:

import numpy as np
import pandas as pd

expected = np.outer(data.sum(axis=1), data.sum(axis=0)) / data.sum().sum()
pd.DataFrame(expected, index=data.index, columns=data.columns)
       2      3      5      10     30
0      2      1      4      3      2
1  33324  15831  58306  54403  38917
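
The values shown above look truncated to integers, which is what integer division (for example `/` on integers in Python 2) would give. A minimal sketch, assuming Python 3 and the data frame from the question rebuilt by hand, that computes the fractional expected counts and checks that flooring them gives the table above:

```python
import numpy as np
import pandas as pd

# Data frame from the question, rebuilt by hand for this sketch.
data = pd.DataFrame(
    [[3, 0, 6, 5, 0],
     [33324, 15833, 58305, 54402, 38920]],
    columns=[2, 3, 5, 10, 30],
)

row_sums = data.sum(axis=1)   # one total per row
col_sums = data.sum(axis=0)   # one total per column
total = data.sum().sum()      # grand total

# True division keeps the fractional part of each expected count.
expected = pd.DataFrame(
    np.outer(row_sums, col_sums) / total,
    index=data.index,
    columns=data.columns,
)

# Flooring the fractional counts gives integer values.
floored = np.floor(expected).astype(int)
```

With true division the row and column totals of `expected` match those of `data`, which is a quick sanity check for the formula.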

Is it possible to achieve this using only Pandas functions?

1 answer

Complete solution using only Pandas built-in methods:

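def outer_product(col):
    # DataFrame.apply passes one column at a time, so col.sum() is that column's total
    numerator = data.sum(axis=1).mul(col.sum())
    denominator = data.sum().sum()
    # floor division reproduces the integer output shown in the question
    return numerator.floordiv(denominator)

data.apply(outer_product)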
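
A possible alternative that avoids `apply`, offered only as a sketch and not as the answer author's method: `DataFrame.dot` on a column of row sums and a row of column sums gives the same outer product. It uses true division instead of `floordiv`, so the expected counts stay fractional, and it ends with the standard Pearson statistic sum((observed - expected)^2 / expected). The variable names and the hand-built `data` are assumptions for illustration.

```python
import pandas as pd

# Data frame from the question, rebuilt by hand for this sketch.
data = pd.DataFrame(
    [[3, 0, 6, 5, 0],
     [33324, 15833, 58305, 54402, 38920]],
    columns=[2, 3, 5, 10, 30],
)

# Column vector of row sums: shape (n_rows, 1), single column labelled 0.
row_sums = data.sum(axis=1).to_frame()
# Row vector of column sums: shape (1, n_cols), single index label 0.
col_sums = data.sum(axis=0).to_frame().T
total = data.sum().sum()

# (n_rows, 1) dot (1, n_cols) is the outer product; labels come from data.
expected = row_sums.dot(col_sums) / total

# Pearson chi-square statistic computed from observed and expected counts.
chi2 = ((data - expected) ** 2 / expected).sum().sum()
```

The result keeps the original index and column labels, so it can be compared cell by cell with `data`.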


Timing: for a DataFrame with 1 million rows. [Timing image not available]


Source: https://habr.com/ru/post/1524040/

