In Pandas, I am trying to program the chi-square test manually. I want to compare row 0 with row 1 in the data frame below.
data
       2      3      5     10     30
0      3      0      6      5      0
1  33324  15833  58305  54402  38920
To do this, I need to calculate the expected count for each cell as: cell(i,j) = rowSum(i) * colSum(j) / sumAll. In R, I can do this simply by taking the outer() product of the row sums and column sums:
Exp_counts <- outer(rowSums(data), colSums(data), "*")/sum(data)
I used NumPy's outer product function, np.outer, to reproduce the result of the R code above:
import numpy as np
import pandas as pd
pd.DataFrame(np.outer(data.sum(axis=1), data.sum(axis=0)) / data.sum().sum(), index=data.index, columns=data.columns)
       2      3      5     10     30
0      2      1      4      3      2
1  33324  15831  58306  54403  38917
Is it possible to achieve this with a Pandas function alone, without going through NumPy?
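For reference, here is a minimal sketch of one way the same expected counts might be computed with Pandas objects only. It treats the outer product as a matrix product of a column vector and a row vector, using DataFrame.dot. The names row_totals, col_totals, grand_total and expected are illustrative and not part of the original code; the NumPy import is used only to cross-check against the np.outer version.

```python
import numpy as np
import pandas as pd

# Data from the question: two rows compared across five columns.
data = pd.DataFrame(
    [[3, 0, 6, 5, 0],
     [33324, 15833, 58305, 54402, 38920]],
    index=[0, 1],
    columns=[2, 3, 5, 10, 30],
)

row_totals = data.sum(axis=1)   # Series indexed by row label
col_totals = data.sum(axis=0)   # Series indexed by column label
grand_total = col_totals.sum()  # sum of all cells

# Outer product written as (n_rows x 1) dot (1 x n_cols).
# to_frame() names the single column 0, which lines up with the
# single index label 0 of the transposed frame, as dot() requires.
expected = row_totals.to_frame().dot(col_totals.to_frame().T) / grand_total

# Cross-check against the NumPy version from the question.
reference = np.outer(row_totals, col_totals) / grand_total
print(np.allclose(expected.to_numpy(), reference))
```

The result keeps the row and column labels of data, so no index or columns arguments are needed. Note that the values are floats; the table above appears to show them truncated to integers.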