Weighted correlation coefficient with pandas

Question

Is there a way to calculate a weighted correlation coefficient with pandas? I saw that R has such a method. In addition, I would like to get the p-value of the correlation, which I could not find in R. Wikipedia's explanation of the weighted correlation: https://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient#Weighted_correlation_coefficient

+4

python pandas correlation pearson-correlation

Yehuda karlinsky Jul 28 '16 at 16:14

1 answer

root · Answer 1 · 2016-07-28T22:11:43+0000

I don't know of any Python package that implements this, but it should be fairly simple to roll your own implementation. Using the naming conventions of the Wikipedia article:

import numpy as np

def m(x, w):
    """Weighted Mean"""
    return np.sum(x * w) / np.sum(w)

def cov(x, y, w):
    """Weighted Covariance"""
    return np.sum(w * (x - m(x, w)) * (y - m(y, w))) / np.sum(w)

def corr(x, y, w):
    """Weighted Correlation"""
    return cov(x, y, w) / np.sqrt(cov(x, x, w) * cov(y, y, w))

As @Alberto Garcia-Raboso noted, m(x, w) is equivalent to np.average(x, weights=w), so the helper function is not strictly needed.
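As a cross-check on the hand-written functions, the same quantity can probably be obtained from NumPy's own weighted covariance. The sketch below is not part of the original answer; the name weighted_corr_np is hypothetical, and it assumes the weights are observation weights passed through the aweights parameter of np.cov. The normalization np.cov applies differs from dividing by np.sum(w), but it cancels in the correlation ratio.

```python
import numpy as np

def weighted_corr_np(x, y, w):
    """Weighted Pearson correlation built on np.cov (hypothetical helper)."""
    # Convert to float arrays so that lists and pandas Series are both accepted.
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    # 2x2 weighted covariance matrix; its scale factor cancels in the ratio below.
    c = np.cov(x, y, aweights=w)
    return c[0, 1] / np.sqrt(c[0, 0] * c[1, 1])
```

With all weights equal, this should reduce to the ordinary Pearson coefficient returned by np.corrcoef.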

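These functions are minimal and only perform the calculation. You may want to convert the inputs to arrays first, e.g. x = np.asarray(x), since plain Python lists do not support the elementwise arithmetic used here. Further input checks could be added as well.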
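A minimal sketch of what such a hardened version might look like follows. The function name weighted_corr_checked and the particular checks (equal length, no NaN, non-negative weights) are assumptions chosen for illustration, not something the answer prescribes.

```python
import numpy as np

def weighted_corr_checked(x, y, w):
    """Weighted correlation with basic input validation (hypothetical helper)."""
    # Accept lists, tuples, and pandas Series by converting to float arrays.
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    if x.ndim != 1 or not (x.shape == y.shape == w.shape):
        raise ValueError("x, y and w must be 1-D and of equal length")
    if np.isnan(x).any() or np.isnan(y).any() or np.isnan(w).any():
        raise ValueError("inputs must not contain NaN")
    if (w < 0).any() or w.sum() == 0:
        raise ValueError("weights must be non-negative with a positive sum")
    # Deviations from the weighted means.
    dx = x - np.average(x, weights=w)
    dy = y - np.average(y, weights=w)
    # Weighted covariance and variances; the common 1/sum(w) factor is
    # already handled by np.average.
    cov_xy = np.average(dx * dy, weights=w)
    var_x = np.average(dx * dx, weights=w)
    var_y = np.average(dy * dy, weights=w)
    return cov_xy / np.sqrt(var_x * var_y)
```

Called as weighted_corr_checked([1, 2, 3], [2, 4, 6], [1, 1, 1]), it accepts plain lists, which the bare functions would reject.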

Example usage:

import numpy as np
import pandas as pd

# Initialize a DataFrame.
np.random.seed([3,1415])
n = 10**6
df = pd.DataFrame({
    'x': np.random.choice(3, size=n),
    'y': np.random.choice(4, size=n),
    'w': np.random.random(size=n)
    })

# Compute the correlation.
r = corr(df['x'], df['y'], df['w'])

p-. , , .
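One possible way to attach a p-value to a weighted correlation, offered here only as a sketch and not taken from the answer, is a permutation test: shuffle x against the fixed (y, w) pairs many times and count how often the shuffled correlation is at least as extreme as the observed one. The names weighted_corr and permutation_pvalue, the default of 999 permutations, and the two-sided comparison are all assumptions. The test addresses the null hypothesis that x is independent of (y, w), which may or may not match the hypothesis you care about.

```python
import numpy as np

def weighted_corr(x, y, w):
    """Weighted Pearson correlation of two 1-D arrays."""
    dx = x - np.average(x, weights=w)
    dy = y - np.average(y, weights=w)
    num = np.average(dx * dy, weights=w)
    den = np.sqrt(np.average(dx * dx, weights=w) * np.average(dy * dy, weights=w))
    return num / den

def permutation_pvalue(x, y, w, n_perm=999, seed=0):
    """Two-sided permutation p-value for the weighted correlation (sketch)."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    rng = np.random.default_rng(seed)
    r_obs = weighted_corr(x, y, w)
    hits = 0
    for _ in range(n_perm):
        # Shuffle x only, keeping each y paired with its weight.
        r_perm = weighted_corr(rng.permutation(x), y, w)
        if abs(r_perm) >= abs(r_obs):
            hits += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return r_obs, (hits + 1) / (n_perm + 1)
```

The loop recomputes the correlation n_perm times, so for a DataFrame with a million rows it would be sensible to start with a small n_perm or a subsample.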
