My goal is to generate 7 numbers between a minimum and a maximum whose Pearson correlation coefficient with their index positions (0, 1, 2, ...) is greater than 0.95. I managed this for 3 numbers, which is not computationally demanding. For 4 numbers, however, the number of iterations needed already seems very large (on the order of 10k), and 7 numbers would be almost impossible with the current code.
Current code:
import math

import numpy as np


def pearson_def(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    assert len(x) == len(y)
    n = len(x)
    assert n > 0
    avg_x = np.mean(x)
    avg_y = np.mean(y)
    diffprod = 0
    xdiff2 = 0
    ydiff2 = 0
    for idx in range(n):
        xdiff = x[idx] - avg_x
        ydiff = y[idx] - avg_y
        diffprod += xdiff * ydiff
        xdiff2 += xdiff * xdiff
        ydiff2 += ydiff * ydiff
    return diffprod / math.sqrt(xdiff2 * ydiff2)


c1_high = 98
c1_low = 75


def corr_gen():
    x = 0  # number of rejected 3-value attempts
    while True:
        c1 = c1_low
        c2 = np.random.uniform(c1_low, c1_high)
        c3 = c1_high
        # Rebuild the list on every attempt so rejected values do not accumulate.
        container = [c1, c2, c3]
        y = np.arange(len(container))
        if pearson_def(container, y) > 0.95:
            # The first three values pass; try to extend with a fourth.
            c4 = np.random.uniform(c1_low, c1_high)
            container.append(c4)
            y = np.arange(len(container))
            if pearson_def(container, y) > 0.95:
                return container
            else:
                continue
        else:
            x += 1
            print(x)
            continue


corrcheck = corr_gen()
print(corrcheck)
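For comparison with the rejection loop above, here is a minimal sketch of one possible variation, not a definitive solution. It assumes that monotonically increasing values are acceptable: all `n` values are drawn at once, sorted, and tested against their index with `np.corrcoef`. The function name `correlated_sample` and the parameters `threshold`, `max_tries`, and `seed` are hypothetical names introduced only for this illustration.

```python
import numpy as np


def correlated_sample(low, high, n=7, threshold=0.95, max_tries=10_000, seed=None):
    """Draw n sorted uniform values in [low, high] whose Pearson r with 0..n-1 exceeds threshold.

    Assumption: sorting the draws is acceptable, i.e. the values may be increasing.
    """
    rng = np.random.default_rng(seed)
    index = np.arange(n)
    for _ in range(max_tries):
        # Sorted uniform draws tend to rise roughly linearly with their position.
        values = np.sort(rng.uniform(low, high, size=n))
        # np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r(values, index).
        r = np.corrcoef(values, index)[0, 1]
        if r > threshold:
            return values, r
    raise RuntimeError("no sample passed the threshold within max_tries")


values, r = correlated_sample(75, 98, seed=0)
print(values)
print(r)
```

Because every candidate is already ordered, far fewer draws should be rejected than when each new value is appended unsorted. Unlike the original code, this sketch does not pin the first value to `c1_low` or any value to `c1_high`.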
Final goal:
* Have 4 columns, each with a linear distribution (evenly spaced points).
* Each row corresponds to a group of elements (C1, C2, C3, C4), and their sum must be equal to 100.
     C1   C2   C3   C4   sum   range
1    70   10    5    1   100     ^
2    ..                          |
3    ..                          |
4    ..                          |
5    ..                          |
6    ..                          |
7    90   20   15    3           _
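The two constraints in the final goal (evenly spaced columns, rows summing to 100) can be sketched as follows, under stated assumptions. Three columns are built with `np.linspace` and the fourth is taken as the remainder, which is also evenly spaced because a sum of evenly spaced columns is evenly spaced. For the row sum to stay constant, the per-row steps of the four columns must add up to zero, so at least one column has to decrease. The endpoint values below are hypothetical and chosen only so that every entry stays positive.

```python
import numpy as np

n_rows = 7

# Hypothetical endpoints for three of the components (assumed for illustration).
c1 = np.linspace(70, 40, n_rows)  # decreasing, so the other columns can increase
c2 = np.linspace(10, 25, n_rows)
c3 = np.linspace(5, 15, n_rows)

# The fourth component is whatever remains so that each row sums to 100.
c4 = 100 - (c1 + c2 + c3)

table = np.column_stack([c1, c2, c3, c4])
print(table)
print(table.sum(axis=1))
```

Each column here is exactly linear in the row index, so its Pearson correlation with the index is +1 or -1 up to floating-point rounding. Random noise could be added afterwards if only a correlation above 0.95 is required, but the rows would then need to be rescaled to keep the sum at 100.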
Distribution example for two theoretical components:
