Compare two arrays of points

Question


I am trying to find a way to measure the similarity between two arrays of points. In the image, I circled the regions where the points follow similar patterns. I would like to compare the arrays automatically over intervals of, say, 100 points and get a similarity coefficient for each interval. As you can see, the circled regions do not line up exactly, so (I suppose) a point-to-point comparison would not be a good solution either. Patterns that are slightly offset from each other should still count as matching, but with a lower coefficient.

What the similarity coefficient should mean (1: perfect match; 0 or less: no match at all):

Points 640 to 660: very similar (coefficient ~0.8)
Points 670 to 690: fairly similar (coefficient ~0.5 to ~0.6)
Points 720 to 780: let's say very similar (coefficient ~0.5 to ~0.6)
Points 790 to 810: very similar (coefficient 1)

These coefficients are only my guesses at what a comparison function might return for this data.

I have read many posts on SO, but none of them seemed to solve my problem. I would be grateful for your help. Thank you.

PS: The ideal answer would provide pseudocode for a function that takes two arrays of data (the data intervals) as arguments and returns a similarity coefficient.

[Image: Points to compare]


Tags: math, algorithm, pseudocode

Vytautas butkus Aug 7 '13 at 9:22


4 answers

collapsar · Answer 1 · 2013-08-07T10:04:06+0000

I think High Performance Mark's suggestion (cross-correlation) is the standard way to do this.

A simpler alternative, from a computational point of view, is the dot product:

Split both arrays into intervals at the same predefined indices.
Treat the elements of each interval as the coordinates of a vector in multidimensional space.
Compute the dot product of the two vectors.

If all the values are non-negative, the dot product will not be negative either. If the two vectors are perpendicular in their vector space, the dot product is 0 (in fact, that is how “perpendicular” is usually defined in higher dimensions). Among vectors of a fixed length, the dot product reaches its maximum when the vectors are identical.
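A minimal Python sketch of these steps, assuming both arrays hold plain numbers and using the 100-point interval width from the question. The function names are hypothetical. Because the raw dot product grows with the size of the values, the sketch also divides by both vector lengths (cosine similarity) so that each interval gets a score between -1 and 1; that normalization is an addition, not part of the answer above.

```python
import math


def interval_dot(a, b, start, end):
    """Dot product of a[start:end] and b[start:end], treated as vectors."""
    return sum(x * y for x, y in zip(a[start:end], b[start:end]))


def interval_cosine(a, b, start, end):
    """Dot product divided by both vector lengths (cosine similarity), in [-1, 1]."""
    dot = interval_dot(a, b, start, end)
    norm_a = math.sqrt(interval_dot(a, a, start, end))
    norm_b = math.sqrt(interval_dot(b, b, start, end))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an all-zero interval is scored as "no similarity"
    return dot / (norm_a * norm_b)


def similarity_per_interval(a, b, width=100):
    """Split both arrays at the same indices (every `width` points) and score each interval."""
    n = min(len(a), len(b))
    return [interval_cosine(a, b, s, min(s + width, n)) for s in range(0, n, width)]
```

For example, `similarity_per_interval(a, b)` returns one score per 100-point interval; an interval whose two vectors are perpendicular scores 0.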

If you accept the geometric notion of perpendicularity as a measure of (dis)similarity, there you go.

Caveat: this is an ad hoc heuristic chosen for computational efficiency. I cannot tell you anything about the mathematical/statistical properties of the procedure or about its discriminating power. If you need a rigorous analysis, however, you will probably be better served by correlation theory, and you may want to take your question to math.stackexchange.com.

Aaron · Answer 2 · 2013-08-07T19:38:24+0000

I also believe that High Performance Mark basically gave you the answer (cross-correlation). In my opinion, most of the other answers give you only half of what you need (a dot product plus a comparison against some threshold). However, that approach will not recognize a signal as similar to a shifted version of itself. You will want to compute the dot product N + M - 1 times, where N and M are the sizes of the arrays. In each iteration, compute the dot product between array 1 and a shifted version of array 2, increasing the shift by one each time. You can think of array 2 as a window that you slide over array 1. Start the loop with the last element of array 2 overlapping only the first element of array 1.
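A minimal Python sketch of that loop, assuming plain numeric lists; `cross_correlation` is a hypothetical name, and no normalization is applied yet.

```python
def cross_correlation(a, b):
    """Dot product of `a` with `b` slid across it, one value per shift.

    Returns N + M - 1 values. The first value has the last element of `b`
    overlapping only the first element of `a`; each later value moves `b`
    one position further to the right.
    """
    n, m = len(a), len(b)
    result = []
    for shift in range(-(m - 1), n):
        # b[j] lines up with a[j + shift]; only overlapping pairs contribute
        total = 0.0
        for j in range(m):
            i = j + shift
            if 0 <= i < n:
                total += a[i] * b[j]
        result.append(total)
    return result
```

Index k of the result corresponds to shifting array 2 by k - (M - 1) positions relative to array 1, so the index of the largest value marks the best alignment.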

This loop produces one number per shift amount, and what you do with those numbers is up to you. For example, you might compare each one (or its absolute value) against a threshold you choose, above which you consider the two signals “similar”.

Finally, in many contexts a signal is considered similar to a scaled (in amplitude, not in time) version of itself, so there must be a normalization step before calculating the cross-correlation. This is usually done by scaling the elements of each array so that the dot product of the array with itself is 1. Just be careful that this makes sense numerically for your application; for example, integers don't scale very well to values between 0 and 1 :-)
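A sketch of the normalization step in Python, assuming numeric lists that are not all zero. Both arrays are scaled to unit length before sliding, so every value lies in [-1, 1] (by the Cauchy-Schwarz inequality) and a positively scaled copy of an equal-length array scores 1 at zero shift. The names `normalize` and `normalized_cross_correlation` are hypothetical.

```python
import math


def normalize(values):
    """Scale `values` so that its dot product with itself is 1 (unit length)."""
    length = math.sqrt(sum(v * v for v in values))
    if length == 0:
        raise ValueError("cannot normalize an all-zero array")
    return [v / length for v in values]  # true division yields floats, even for int input


def normalized_cross_correlation(a, b):
    """Sliding dot product of the unit-length versions of `a` and `b`."""
    a, b = normalize(a), normalize(b)
    n, m = len(a), len(b)
    out = []
    for shift in range(-(m - 1), n):
        # only the overlapping pairs contribute at each shift
        out.append(sum(a[j + shift] * b[j] for j in range(m) if 0 <= j + shift < n))
    return out
```

Since Python's `/` always produces floats, integer input is converted automatically, which avoids the integer-scaling pitfall mentioned above.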

Bhargav ponnapalli · Answer 3 · 2013-08-07T09:32:57+0000

My attempt:

  Total_sum = 0
  1. For each index i in the range [m, n)
  2.     k = Array1[i] * Array2[i]
  3.     t1 = magnitude(Array1[i]); t2 = magnitude(Array2[i])
  4.     k = k / (t1 * t2)    // undefined if either value has magnitude 0
  5.     Total_sum = Total_sum + k
  Coefficient = Total_sum / (n - m)

If the two arrays are equal, every term is 1, so Total_sum equals the number of indices in the range, and dividing by that number gives 1. If the graphs are exact opposites, we get -1; in all other cases the result lies between -1 and 1.
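A Python sketch of this approach, assuming each array element is a single number, so `magnitude()` becomes `abs()` and each term reduces to +1 or -1 depending on whether the two signs agree. The range is taken as half-open (m <= i < n), and, as an added assumption, indices where either value is 0 are skipped because the ratio is undefined there. `sign_agreement` is a hypothetical name.

```python
def sign_agreement(array1, array2, m, n):
    """Average of Array1[i]*Array2[i] / (|Array1[i]| * |Array2[i]|) for m <= i < n."""
    total_sum = 0.0
    count = 0
    for i in range(m, n):
        t1, t2 = abs(array1[i]), abs(array2[i])
        if t1 == 0 or t2 == 0:
            continue  # assumption: skip zeros, where the ratio is undefined
        total_sum += array1[i] * array2[i] / (t1 * t2)
        count += 1
    return total_sum / count if count else 0.0
```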

This is not very effective when the y range or x range is huge, but I just wanted to give you an idea.

Another option is to use a bitwise XNOR:

  sum = 0
  1. For each index i in the range [m, n)
  2.     k = Array1[i] xnor Array2[i]
  3.     k = k / (pow(2, number_of_bits) - 1)   // scales k down to a value between 0 and 1
  4.     sum = sum + k
  Coefficient = sum / (n - m)
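A Python sketch of the XNOR idea, assuming non-negative integers that fit in `number_of_bits` bits (8 here, a hypothetical default) and averaging the scaled values with a plain mean over m <= i < n. Note that reading the XNOR result as one integer weights agreement in the high-order bits more heavily than agreement in the low-order bits. `xnor_similarity` is a hypothetical name.

```python
def xnor_similarity(array1, array2, m, n, number_of_bits=8):
    """Mean of XNOR(Array1[i], Array2[i]) / (2**bits - 1) over m <= i < n."""
    mask = (1 << number_of_bits) - 1  # all `number_of_bits` low bits set
    total = 0.0
    for i in range(m, n):
        # XNOR = NOT XOR, restricted to the low bits: 1 wherever the bits agree
        xnor = ~(array1[i] ^ array2[i]) & mask
        total += xnor / mask  # scale to [0, 1]
    return total / (n - m)
```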

Is this useful?

Antti huima · Answer 4 · 2013-08-07T10:00:47+0000

For example, you can define a distance metric for two vectors A and B of length N containing numbers in the interval [-1, 1]:

  sum = 0
  for i in 0 to N - 1:
      d = (A[i] - B[i])^2    // this is in range 0 .. 4
      sum = sum + d
  sum = (sum / 4) / N        // now in range 0 .. 1

This gives a distance of 1 for vectors that are completely opposite (one is all 1s, the other all -1s) and 0 for identical vectors.

You can turn this into your coefficient with

  coeff = 1 - sum
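The metric and the conversion to a coefficient in Python, assuming two equal-length lists with values in [-1, 1]; `distance` and `coefficient` are hypothetical names.

```python
def distance(a, b):
    """Mean squared difference scaled to [0, 1] for values in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    total = sum((x - y) ** 2 for x, y in zip(a, b))  # each term is in 0..4
    return (total / 4) / len(a)


def coefficient(a, b):
    """1 for identical vectors, 0 for completely opposite ones."""
    return 1 - distance(a, b)
```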

However, this is a crude approach because it does not take into account the fact that there may be horizontal distortions or a shift between the signals you want to compare, so let's look at some approaches to dealing with this.

You can sort both arrays (for example, in ascending order) and then calculate the distance/coefficient. This reports more similarity than the original metric and is agnostic to permutations/shifts of the signal.

You can also compute the differences between consecutive elements and calculate the distance/coefficient on those, optionally sorting them as well. Using differences has the advantage of eliminating vertical shifts. Sorted differences eliminate horizontal shifts but still distinguish different shapes better than the sorted raw data points.

You can then, for example, average the coefficients. Here is more complete code. The procedure calc below computes the coefficient for arrays A and B of the given size, after first taking differences d times (recursively). If sorted is true, the final (differenced) arrays are sorted before the distance is computed.

  procedure calc(A, B, size, d, sorted):
      if (d > 0):
          A' = new array[size - 1]
          B' = new array[size - 1]
          for i in 0 to size - 2:
              A'[i] = (A[i + 1] - A[i]) / 2   // keep in range -1..1 by dividing by 2
              B'[i] = (B[i + 1] - B[i]) / 2
          return calc(A', B', size - 1, d - 1, sorted)
      else:
          if (sorted):
              A = sort(A)
              B = sort(B)
          sum = 0
          for i in 0 to size - 1:
              sum = sum + (A[i] - B[i]) * (A[i] - B[i])
          sum = (sum / 4) / size
          return 1 - sum   // return the coefficient

  procedure similarity(A, B, size):
      a = 0
      a = a + calc(A, B, size, 0, false)
      a = a + calc(A, B, size, 0, true)
      a = a + calc(A, B, size, 1, false)
      a = a + calc(A, B, size, 1, true)
      return a / 4   // take the average
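A runnable Python translation of the two procedures, assuming equal-length lists (at least two elements) with values in [-1, 1]. Python lists carry their own length, so the `size` argument is dropped, and the `sorted` flag is renamed `is_sorted` to avoid shadowing the built-in `sorted()`.

```python
def calc(a, b, d, is_sorted):
    """Coefficient for a and b after taking differences d times, optionally sorted."""
    if d > 0:
        # Halve each difference so values in [-1, 1] stay in [-1, 1]
        da = [(a[i + 1] - a[i]) / 2 for i in range(len(a) - 1)]
        db = [(b[i + 1] - b[i]) / 2 for i in range(len(b) - 1)]
        return calc(da, db, d - 1, is_sorted)
    if is_sorted:
        a, b = sorted(a), sorted(b)
    total = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1 - (total / 4) / len(a)  # the coefficient


def similarity(a, b):
    """Average of the raw, sorted, differenced, and sorted-differenced coefficients."""
    scores = [calc(a, b, 0, False), calc(a, b, 0, True),
              calc(a, b, 1, False), calc(a, b, 1, True)]
    return sum(scores) / len(scores)
```

With a vertical offset, for example `[0.0, 0.5, 0.0]` against `[0.25, 0.75, 0.25]`, the raw coefficient drops below 1 while the differenced one stays at 1, which is the effect described above.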

For something completely different, you can also compute the Fourier transform with an FFT and then apply the distance metric to the resulting spectra.
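A sketch of the spectral comparison in Python, assuming equal-length inputs. To stay dependency-free it computes the magnitude spectrum with a direct O(N^2) DFT, which gives the same values an FFT would, only more slowly; with NumPy you would call `numpy.fft.fft` instead. Both spectra are rescaled by their common maximum so they fall in [0, 1] before the distance metric from this answer is applied. Because the magnitude spectrum discards phase, a circularly shifted copy of a signal scores 1. The function names are hypothetical.

```python
import cmath


def magnitude_spectrum(values):
    """|DFT| of `values` via a direct O(N^2) DFT (an FFT gives the same numbers faster)."""
    n = len(values)
    return [abs(sum(values[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
            for k in range(n)]


def spectral_coefficient(a, b):
    """1 - (sum of squared differences / 4) / N, applied to the magnitude spectra."""
    sa, sb = magnitude_spectrum(a), magnitude_spectrum(b)
    peak = max(max(sa), max(sb)) or 1.0  # rescale both spectra into [0, 1]
    sa = [v / peak for v in sa]
    sb = [v / peak for v in sb]
    total = sum((x - y) ** 2 for x, y in zip(sa, sb))
    return 1 - (total / 4) / len(sa)
```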
