How to calculate a random series of values ​​(B) that have a given correlation with a given series (A)

For an educational website, my goal is to allow students to fool themselves with several sets of values ​​and their college. For example, students can enter two arrays for which correlation is calculated:

$array_x = array(5,3,6,7,4,2,9,5); $array_y = array(4,3,4,8,3,2,10,5); echo Correlation($array_x, $array_y); // 0.93439982209434 

The code for this works fine and can be found at the bottom of this post. However, I ran into a problem. I want the following:

  • student enters $ array_x (5,3,6,7,4,2,9,5)
  • student introduces correlation (0.9)
  • the student enters the boundaries of $ array_y (for example, between 1 and 10 or between 50 and 80).
  • script returns a random array (e.g.: 4,3,4,8,3,2,10,5) that has (about) a given correlation

So, in other words, the code should work as follows:

 $array_x = array(5,3,6,7,4,2,9,5); $boundaries = array(1, 10); $correlation = 0.9; echo ySeries($array_x, $boundaries, $correlation); // array(4,3,4,8,3,2,10,5) 

On the Stackexchange Math forum, @ilya replied (inserted as an image since formatting latex fomulas does not work on stackoverflow):

enter image description here

PS Code used to calculate correlation:

 function Correlation($arr1, $arr2) { $correlation = 0; $k = SumProductMeanDeviation($arr1, $arr2); $ssmd1 = SumSquareMeanDeviation($arr1); $ssmd2 = SumSquareMeanDeviation($arr2); $product = $ssmd1 * $ssmd2; $res = sqrt($product); $correlation = $k / $res; return $correlation; } function SumProductMeanDeviation($arr1, $arr2) { $sum = 0; $num = count($arr1); for($i=0; $i < $num; $i++) { $sum = $sum + ProductMeanDeviation($arr1, $arr2, $i); } return $sum; } function ProductMeanDeviation($arr1, $arr2, $item) { return (MeanDeviation($arr1, $item) * MeanDeviation($arr2, $item)); } function SumSquareMeanDeviation($arr) { $sum = 0; $num = count($arr); for($i = 0; $i < $num; $i++) { $sum = $sum + SquareMeanDeviation($arr, $i); } return $sum; } function SquareMeanDeviation($arr, $item) { return MeanDeviation($arr, $item) * MeanDeviation($arr, $item); } function SumMeanDeviation($arr) { $sum = 0; $num = count($arr); for($i = 0; $i < $num; $i++) { $sum = $sum + MeanDeviation($arr, $i); } return $sum; } function MeanDeviation($arr, $item) { $average = Average($arr); return $arr[$item] - $average; } function Average($arr) { $sum = Sum($arr); $num = count($arr); return $sum/$num; } function Sum($arr) { return array_sum($arr); } 
+6
source share
2 answers

So, here is a PHP implementation of your algorithm that uses Dawkins caress to gradually reduce the error to the desired threshold.

 <?php function sqrMeanDeviation($array, $avg) { $sqrMeanDeviation = 0; for($i=0; $i<count($array); $i++) { $dev = $array[$i] - $avg; $sqrMeanDeviation += $dev * $dev; } return $sqrMeanDeviation; } // z values are non-0 an can value between [-abs_z_bound, abs_z_bound] function random_z_element($abs_z_bound = 1) { $a = (mt_rand() % (2*$abs_z_bound) ) - ($abs_z_bound-1); if($a <= 0) $a--; return $a; } // change za little function copy_z_weasel($old_array_z, $error_probability = 20 /*error possible is 1 in error_probability*/, $abs_z_bound = 1) { $new_z = array(); for($i = 0; $i < count($old_array_z); $i++) if(mt_rand() % $error_probability == 0 ) $new_z[$i] = random_z_element($abs_z_bound); else $new_z[$i] = $old_array_z[$i]; return $new_z; } function correlation_error($array_y, $array_x, $avg_x, $sqrMeanDeviation_x, $correlation) { // checking correlation $avg_y = array_sum($array_y)/count($array_y); $sqrMeanDeviation_y = 0; $covariance_xy = 0; for($i=0; $i<count($array_x); $i++) { $dev_y = $array_y[$i] - $avg_y; $sqrMeanDeviation_y += $dev_y * $dev_y; $dev_x = $array_x[$i] - $avg_x; $covariance_xy += $dev_y * $dev_x; } $correlation_xy = $covariance_xy/sqrt($sqrMeanDeviation_x*$sqrMeanDeviation_y); return abs($correlation_xy - $correlation); } function ySeries($array_x, $low_bound, $high_bound, $correlation, $threshold) { $array_y = array(); $avg_x = array_sum($array_x)/count($array_x); $sqrMeanDeviation_x = sqrMeanDeviation($array_x, $avg_x); // pre-compute beta $beta_x_sQMz = $sqrMeanDeviation_x * sqrt( 1 / ($correlation*$correlation) - 1 ); $best_array_z = array(); $n = 0; $error = $threshold + 1; while($error > $threshold) { ++$n; // generate z $array_z = array(); if(count($best_array_z) == 0) for($i=0; $i<count($array_x); $i++) $array_z[$i] = random_z_element(); else $array_z = copy_z_weasel($best_array_z); $sqm_z = sqrMeanDeviation($array_z, array_sum($array_z)/count($array_z) ); // this being 0 implies that for every beta correlation(x,y) = 1 so just give it any random beta if($sqm_z) $beta = $beta_x_sQMz / $sqm_z; else $beta = 10; // and now we have y for($i=0; $i<count($array_x); $i++) $array_y[$i] = $array_x[$i] + ($array_z[$i] * $beta); // now, change bounds (we could do this afterwards but we want precision and y to be integers) // rounding $min_y = $array_y[0]; $max_y = $array_y[0]; for( $i=1; $i<count($array_x); $i++ ) { if($array_y[$i] < $min_y) $min_y = $array_y[$i]; if($array_y[$i] > $max_y) $max_y = $array_y[$i]; } $range = ($high_bound - $low_bound) / ($max_y - $min_y); $shift = $low_bound - $min_y; for( $i=0; $i<count($array_x); $i++ ) $array_y[$i] = round($array_y[$i] * $range + $shift); // get the error $new_error = correlation_error($array_y, $array_x, $avg_x, $sqrMeanDeviation_x, $correlation); if($new_error < $error) { $best_array_z = $array_z; $error = $new_error; } } echo "Correlation ", $correlation, " approched within " , $new_error, " in ", $n ," iterations.\n"; return $array_y; } ?> 
+4
source

a simple approach, albeit very inefficient, would be to start with a random number in a given interval and try to add more numbers if they do not violate the correlation too much:

 function ySeries(array_x, boundaries, correlation) { array_y = [random(boundaries)] while (len(array_y) < len(array_x)) { do { y = random(boundaries) } while (Correlation(array_x, array_y + [y]) > correlation + epsilon) array_y.push(y) } } 

may work well if numbers are small

+3
source

Source: https://habr.com/ru/post/915737/


All Articles