EDIT: the link should work now, sorry for the trouble
I have a text file that looks like this:
Name, Test 1, Test 2, Test 3, Test 4, Test 5
Bob, 86, 83, 86, 80, 23
Alice, 38, 90, 100, 53, 32
Jill, 49, 53, 63, 43, 23
I am writing a program that, given this text file, creates a table of Pearson correlation coefficients that looks like the one below, where the entry (x, y) is the correlation between person x and person y:
Name, Bob, Alice, Jill
Bob, 1, 0.567088412588577, 0.899798494392584
Alice, 0.567088412588577, 1, 0.812425393004088
Jill, 0.899798494392584, 0.812425393004088, 1
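To make clear what each entry means, here is a minimal sketch in plain Python that reproduces the small table above. It is not the linked code; the function name `pearson` and the `scores` dictionary are made up for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Deviations of each score from its own person's mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# The three example rows from the text file (header row dropped).
scores = {
    "Bob": [86, 83, 86, 80, 23],
    "Alice": [38, 90, 100, 53, 32],
    "Jill": [49, 53, 63, 43, 23],
}

names = list(scores)
# table[x][y] is the correlation between person x and person y.
table = {x: {y: pearson(scores[x], scores[y]) for y in names} for x in names}

print("Name, " + ", ".join(names))
for x in names:
    print(x + ", " + ", ".join(str(table[x][y]) for y in names))
```

The table is symmetric with ones on the diagonal, so only the entries above the diagonal actually need to be computed.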
My program works on this small example, but the dataset I actually load has 82 columns and, more importantly, 54,000 rows. When I run my program on it, it is incredibly slow and I get a memory error. Is there a way to, first of all, rule out the memory error and, if possible, make the program more efficient? The code is here: code.
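For scale: a full table for 54,000 rows has 54,000 × 54,000 ≈ 2.9 billion entries, which is roughly 23 GB at 8 bytes per value, so holding the whole result in memory is likely the problem regardless of how the correlations are computed. One possible way around it, sketched below under that assumption, is to normalize each row once and then write the output one row at a time. The names `normalize` and `write_correlations` are hypothetical, and the `io.StringIO` buffer only stands in for a real output file.

```python
import csv
import io
from math import sqrt

def normalize(row):
    """Center a row and scale it to unit length, so that the dot product
    of two normalized rows equals their Pearson correlation.
    Assumes the row is not constant (otherwise the norm is zero)."""
    mean = sum(row) / len(row)
    centered = [v - mean for v in row]
    norm = sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]

def write_correlations(names, rows, out):
    """Write the correlation table to `out` one output row at a time."""
    # The normalized data is the same size as the input, not the output.
    unit = [normalize(r) for r in rows]
    writer = csv.writer(out)
    writer.writerow(["Name"] + names)
    for name, u in zip(names, unit):
        # Only this single output row is held in memory before it is written.
        line = [sum(a * b for a, b in zip(u, v)) for v in unit]
        writer.writerow([name] + line)

names = ["Bob", "Alice", "Jill"]
rows = [
    [86, 83, 86, 80, 23],
    [38, 90, 100, 53, 32],
    [49, 53, 63, 43, 23],
]

buffer = io.StringIO()  # stand-in for open("correlations.csv", "w", newline="")
write_correlations(names, rows, buffer)
print(buffer.getvalue())
```

This only addresses memory. In pure Python the inner loops would still be slow at 54,000 rows; the same row-at-a-time (or block-at-a-time) idea would presumably be much faster with a vectorized dot product from an array library.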
Thanks for your help,
Jack.
Edit: In case someone else is trying to perform a calculation on this scale, convert your data to HDF5 format. Here is what I did to solve this problem.