Using the pandas DataFrame below, built from a dict of dicts:
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

NaN = np.nan

dd = {'A': {'A': '1', 'B': '2', 'C': '3'},
      'B': {'A': '4', 'B': '5', 'C': '6'},
      'C': {'A': '7', 'B': '8', 'C': '9'}}

df_link_link = pd.DataFrame.from_dict(dd, orient='index')
I would like to create a new pandas DataFrame holding the Pearson correlation between each pair of rows, excluding the correlation of a row with itself (the correlation of A with itself should simply be NaN). Here is the desired result, written as a dict of dicts:
dict_congruent = {
    'A': {'A': NaN,
          'B': pearsonr([NaN,2,3],[4,5,6]),
          'C': pearsonr([NaN,2,3],[7,8,9])},
    'B': {'A': pearsonr([4,NaN,6],[1,2,3]),
          'B': NaN,
          'C': pearsonr([4,NaN,6],[7,8,9])},
    'C': {'A': pearsonr([7,8,NaN],[1,2,3]),
          'B': pearsonr([7,8,NaN],[4,5,6]),
          'C': NaN}}
where NaN is just numpy.nan. Is there a way to do this as a single operation in pandas, without iterating through the dict of dicts? I have about 76 million pairs, so a non-iterative approach would be great if one exists.
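For reference, here is a minimal sketch of the kind of vectorized operation I have in mind. It rests on some assumptions. The string values are cast to float first. `DataFrame.corr` is applied to the transpose, since it correlates columns rather than rows. The diagonal is then blanked with `np.fill_diagonal`. The name `df_congruent` is only illustrative. This sketch does not reproduce the NaN entries inside the input vectors shown in the desired dict; it only sets the self-correlations to NaN.

```python
import numpy as np
import pandas as pd

dd = {'A': {'A': '1', 'B': '2', 'C': '3'},
      'B': {'A': '4', 'B': '5', 'C': '6'},
      'C': {'A': '7', 'B': '8', 'C': '9'}}

# Assumption: the string values are meant to be numeric, so cast to float.
df_link_link = pd.DataFrame.from_dict(dd, orient='index').astype(float)

# DataFrame.corr correlates columns pairwise, so transpose to correlate rows.
corr = df_link_link.T.corr(method='pearson')

# Replace the self-correlations on the diagonal with NaN.
values = corr.to_numpy(copy=True)
np.fill_diagonal(values, np.nan)
df_congruent = pd.DataFrame(values, index=corr.index, columns=corr.columns)

print(df_congruent)
```

Note that this builds the full square correlation matrix in memory, which may or may not be practical for tens of millions of pairs.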