How to normalize a sequence of numbers?

I am working on a user behavior project. Based on user interaction, I have collected some data. It forms a nice sequence that gradually increases and decreases over time, but it contains small discrepancies, which are very bad. See the chart below:

Plotted sequence

You can also find the data here:

2.0789 2.09604 2.11472 2.13414 2.15609 2.17776 2.2021 2.22722 2.25019 2.27304 2.29724 2.31991 2.34285 2.36569 2.38682 2.40634 2.42068 2.43947 2.45099 2.46564 2.48385 2.49747 2.49031 2.51458 2.5149 2.52632 2.54689 2.56077 2.57821 2.57877 2.59104 2.57625 2.55987 2.5694 2.56244 2.56599 2.54696 2.52479 2.50345 2.48306 2.50934 2.4512 2.43586 2.40664 2.38721 2.3816 2.34415 2.33408 2.31225 2.28801 2.26583 2.24054 2.2135 2.19678 2.16366 2.13945 2.11102 2.08389 2.05533 2.02899 2.00373 1.9752 1.94862 1.91982 1.89125 1.86307 1.83539 1.80641 1.79946 1.75333 1.72765 1.70417 1.68106 1.65971 1.64032 1.62386 1.6034 1.5829 1.56022 1.54167 1.53141 1.52329 1.51128 1.52125 1.51127 1.50753 1.51494 1.51777 1.55563 1.56948 1.57866 1.6095 1.61939 1.64399 1.67643 1.70784 1.74259 1.7815 1.81939 1.85942 1.87731 1.89895 1.91676 1.92987

I would like to smooth out this sequence. The technique should be able to eliminate numbers with the characteristics of X and Y, that is, values that break an otherwise monotonically increasing or monotonically decreasing run.

If they are not eliminated, the technique should at least be able to shift them so that the series is not affected by the errors.

What I tried and failed:

  • I tried checking the difference between consecutive values. It works in some special cases, but for the sequence shown here the differences between the numbers are not distinctive enough for me to cut out the errors.

  • I tried applying a counter: a change is accepted only once some count X is reached; otherwise the previous point is repeated. Here I have great trouble choosing the value of X, because the data comes from user interaction, which I do not control. If the user interaction is such that its plot is a zigzag, I end up reporting that there is no user movement data in every case.
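To make the first bullet concrete, here is a minimal Python sketch of a difference check on a short excerpt of the data above. The language and the function names `first_differences` and `flag_by_magnitude` are assumptions for illustration only; the question does not show the code that was actually tried.

```python
def first_differences(values):
    """Return the consecutive differences values[i + 1] - values[i]."""
    return [b - a for a, b in zip(values, values[1:])]


def flag_by_magnitude(values, threshold):
    """Return the indices i where |values[i + 1] - values[i]| exceeds threshold.

    `threshold` is a hypothetical tuning parameter, not something given
    in the question.
    """
    diffs = first_differences(values)
    return [i for i, d in enumerate(diffs) if abs(d) > threshold]


# Excerpt from the rising part of the data; 2.49031 is one of the glitches.
excerpt = [2.46564, 2.48385, 2.49747, 2.49031, 2.51458, 2.5149, 2.52632]
diffs = first_differences(excerpt)

for value, d in zip(excerpt[1:], diffs):
    print(f"{value:.5f}  step {d:+.5f}")
```

On this excerpt the step into the glitch is negative but smaller in magnitude than the ordinary steps before it, so a threshold on the size of the difference alone cannot separate the error from normal movement, which matches the difficulty described in the first bullet.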

Please suggest any methods that you know of.

PS: The data in this example is a special case. There is no typical pattern in which the numbers occur, but we expect some range to be continuous in all examples. The solution I'm looking for is a general one.

2 answers

Since you cannot decide on a cutoff frequency, nor even on the filter you want to use, I would implement a few and let the user set the parameters.

The first thing I thought of is a running average, and even there you can see that there are many things to set, each giving a different output.
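As a rough illustration of the running-average idea, here is a minimal Python sketch. The function name `running_average` and the `radius` parameter are assumptions made for this example; the answer does not specify an implementation or a window size.

```python
def running_average(values, radius=2):
    """Centered running average over 2 * radius + 1 samples.

    `radius` is the user-settable parameter (an assumption of this sketch).
    Near the ends of the sequence the window is truncated to the samples
    that exist, so the output has the same length as the input.
    """
    if radius < 0:
        raise ValueError("radius must be non-negative")
    n = len(values)
    smoothed = []
    for i in range(n):
        lo = max(0, i - radius)
        hi = min(n, i + radius + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Excerpt from the question's data; 2.49031 is one of the small glitches.
excerpt = [2.46564, 2.48385, 2.49747, 2.49031, 2.51458, 2.5149, 2.52632]
print(running_average(excerpt, radius=1))
```

A larger `radius` suppresses the small glitches more strongly but also flattens the real peaks and valleys, which is why it makes sense to leave the parameter to the user.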


I don't know how much effort you want to put into this problem, but if you want theoretical guarantees, topological persistence seems well suited to your problem, IMHO. Basically, with this method you can filter local maxima/minima by fixing a scale, and there are theoretical results saying that if your data is close to the true function, you recover the correct number of maxima using persistence. You can look at these slides (mainly pages 7-9) to get an idea of the method.

Basically, if you view your points as a landscape and imagine a watershed procedure that starts at the maximum height and moves downward, you get peaks. Each peak has a time at which it is born, which is when it first appears, and a time at which it dies, which is when it merges with a higher peak. The persistence diagram then plots one point per peak, whose x/y coordinates are its birth/death times (under the convention that the first peak never dies and is not shown). If a peak is a global maximum, it lies farther from the diagonal in the persistence diagram than a peak that is only a local maximum. To remove local maxima, you remove the peaks close to the diagonal. In your example there are four local maxima, as you can see in the persistence diagram of your data (thanks for providing the data, by the way), and two global ones (the first peak does not appear in the persistence diagram):

Persistence diagram of your function
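The watershed description can be sketched in Python roughly as follows. This is a minimal illustration using a union-find structure, not the code that produced the diagrams; the names `peak_persistence` and `significant_peaks` and the `min_persistence` threshold are assumptions, as is the choice to report the highest peak with the global minimum as its death value.

```python
def peak_persistence(values):
    """Return (peak_index, birth, death) for every local maximum.

    A level sweeps from the highest value downward. A peak is born when
    the level reaches it, and dies when its component merges with a
    component whose peak is higher. The highest peak never dies; as a
    convention of this sketch its death is set to min(values).
    """
    n = len(values)
    if n == 0:
        return []
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    parent = [None] * n  # union-find parent; None means "not reached yet"
    peak = {}            # component root -> index of its highest sample
    pairs = []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def rank(k):
        # Higher value wins; ties go to the sample that was reached first.
        return (values[k], -k)

    for i in order:
        parent[i] = i
        peak[i] = i
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                a, b = find(i), find(j)
                if a == b:
                    continue
                if rank(peak[a]) < rank(peak[b]):
                    a, b = b, a
                # Component b has the lower peak and dies at this level.
                # If its "peak" is i itself, i was just joining a slope.
                if peak[b] != i:
                    pairs.append((peak[b], values[peak[b]], values[i]))
                parent[b] = a

    root = find(order[0])
    pairs.append((peak[root], values[peak[root]], min(values)))
    return pairs


def significant_peaks(values, min_persistence):
    """Indices of peaks whose persistence (birth - death) is large enough."""
    return sorted(
        idx
        for idx, birth, death in peak_persistence(values)
        if birth - death >= min_persistence
    )


toy = [0, 3, 2, 2.5, 1, 5, 0.5]
print(peak_persistence(toy))
print(significant_peaks(toy, min_persistence=1.0))
```

Peaks with a small `birth - death` are the ones close to the diagonal of the persistence diagram, so raising `min_persistence` corresponds to removing more of the small local maxima. The same procedure applied to the negated sequence handles local minima.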

If you perturb your data, you will still get a very decent persistence diagram that allows you to filter out the local maxima as you wish.

Please ask if you would like more information or references.


Source: https://habr.com/ru/post/1208795/

