How to determine when changes in time series data are no longer significant?

I have a set of news articles with statistics for each one, for example the number of Twitter posts mentioning an article on each day over a range of days. These values naturally behave like this: the number of new posts grows rapidly and then declines as the news gets older.

I would like to know how to calculate the number of days after which the changes in the statistics are no longer significant (for example, less than 0.1% of the total number of posts), across the entire data set and with a certain level of confidence.
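One way to make this criterion concrete is sketched below. It assumes (my interpretation, not stated in the question) that "change" means the number of new posts on a day, and that an article has settled from the first day after which every remaining daily count stays below 0.1% of that article's total. The helper name `cutoff_day` is hypothetical. For the whole data set it takes the 95th percentile of the per-article cut-offs, which reads as "95% of articles had settled by this day"; that is an empirical quantile, not a formal confidence interval.

```python
import numpy as np

def cutoff_day(daily_counts, rel_threshold=0.001):
    """Hypothetical helper: index of the first day from which every later
    daily count is below rel_threshold * (total posts for the article).
    Returns len(daily_counts) if the series never settles in the window."""
    counts = np.asarray(daily_counts, dtype=float)
    small = counts < rel_threshold * counts.sum()
    # Walk backwards over the trailing run of "insignificant" days.
    day = len(counts)
    while day > 0 and small[day - 1]:
        day -= 1
    return day

# Toy data: daily new-post counts for two articles (made up for illustration).
articles = [
    [100, 500, 300, 80, 20, 0, 0, 0],
    [10, 50, 30, 10, 0, 0],
]
cutoffs = [cutoff_day(a) for a in articles]
print(cutoffs)  # [5, 4]
# Day by which 95% of the articles in this data set had settled.
print(np.percentile(cutoffs, 95))
```

With only a handful of articles the percentile is noisy; it becomes more meaningful as the data set grows.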

Could you give me some pointers on where to look for information and methods? Sample code in Python would be much appreciated :)


This question is really about time series analysis. Since you are interested in determining a cut-off point, a good place to start would be to read about control charts. If you want to go deeper into the statistics beyond control charts, take a look at change point analysis and at structural changes in time series.
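As a rough illustration of the control-chart idea, here is a minimal sketch of a Shewhart individuals (X) chart applied to one article's daily counts. It assumes, purely for illustration, that the last `baseline_days` days represent the settled "in control" state; the limits are the baseline mean plus or minus 3 sigma, with sigma estimated from the average moving range divided by the standard constant d2 = 1.128. The function names are hypothetical, and `baseline_days` must be at least 2.

```python
import numpy as np

D2 = 1.128  # control-chart constant d2 for moving ranges of size 2

def individuals_limits(baseline):
    """Lower limit, center line and upper limit of an individuals chart."""
    x = np.asarray(baseline, dtype=float)
    center = x.mean()
    sigma = np.abs(np.diff(x)).mean() / D2  # average moving range / d2
    return center - 3 * sigma, center, center + 3 * sigma

def settled_from_day(daily_counts, baseline_days=7):
    """Hypothetical helper: first day after the last point that falls
    outside the limits estimated from the final baseline_days days."""
    counts = np.asarray(daily_counts, dtype=float)
    lcl, _, ucl = individuals_limits(counts[-baseline_days:])
    outside = np.flatnonzero((counts < lcl) | (counts > ucl))
    return int(outside[-1]) + 1 if outside.size else 0

counts = [100, 500, 300, 80, 20, 5, 6, 5, 4, 6, 5, 5]
print(settled_from_day(counts))  # → 5
```

Real post counts are skewed and count-valued, so an individuals chart is only a first approximation; the change point methods mentioned above are the more formal route.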

Python modules: NumPy and pandas are the key modules for this kind of analysis in Python. This statalgo post will point you in the right direction for the Python code. (If you use R for the analysis, consider the CRAN packages tseries and strucchange.)
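For the whole data set, pandas makes a per-article computation straightforward. The sketch below assumes a long-format table with hypothetical columns `article`, `day` and `posts` (new posts that day), and uses a slightly different reading of the 0.1% criterion: the first day by which 99.9% of an article's posts have appeared, so fewer than 0.1% are still to come. Articles with zero total posts produce NaN shares and are simply dropped from the result.

```python
import pandas as pd

def saturation_day(posts, share=0.999):
    """Hypothetical helper: per article, the first day on which the
    cumulative fraction of its total posts reaches `share`."""
    df = posts.sort_values(["article", "day"]).copy()
    grouped = df.groupby("article")["posts"]
    # Cumulative fraction of each article's total seen so far.
    df["cum_share"] = grouped.cumsum() / grouped.transform("sum")
    return df.loc[df["cum_share"] >= share].groupby("article")["day"].min()

# Made-up example data in long format.
posts = pd.DataFrame({
    "article": ["A"] * 6 + ["B"] * 4,
    "day": list(range(6)) + list(range(4)),
    "posts": [100, 500, 300, 98, 2, 0, 10, 80, 10, 0],
})
days = saturation_day(posts)
print(days)
# Day by which 95% of articles reached the 99.9% mark.
print(days.quantile(0.95))
```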

Relevant question on Stats SE: How to detect a change in time series data?

Real-life example: after the death of Osama bin Laden, a lot of analysis was done on how the news spread on Twitter. The article even has a section that deals specifically with your question about when news stops spreading.

Finally, you could also ask this question on the Stats SE site.

Hope this helps.


Source: https://habr.com/ru/post/1393477/

