How to use k-means for time series data that have NaNs?

Question

I have several time series that partially overlap and do not necessarily share the same start and end dates. Each row represents a different time series. I padded them all to the same length so that every value stays aligned with its actual collection time.

For example, at t (1,2,3,4,5,6):

Station 1: nan, nan, 2, 4, 5, 10
Station 2: nan, 1, 4, nan, 10, 8
Station 3: 1, 9, 4, 7, nan, nan
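As a sketch, the example above can be held in NumPy as a 2-D float array with `np.nan` marking missing readings (the names `X` and `observed` are illustrative, not from the original post):

```python
import numpy as np

# The three stations from the example: one row per station,
# one column per time step t = 1..6; np.nan marks "no observation".
X = np.array([
    [np.nan, np.nan, 2, 4, 5, 10],   # Station 1
    [np.nan, 1, 4, np.nan, 10, 8],   # Station 2
    [1, 9, 4, 7, np.nan, np.nan],    # Station 3
])

# Boolean mask of the time steps each station actually observed.
observed = ~np.isnan(X)
print(observed.sum(axis=1))  # observations per station: [4 4 4]
```

Keeping the padded, time-aligned layout preserves the information about *when* each value was collected, which is exactly what simply dropping the NaNs would lose.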

I am trying to run a cluster analysis in Python to group stations with similar behavior. The timing of the events matters, so I can't just drop the NaNs (as far as I know).

Any ideas?


python numpy time-series cluster-analysis

user2748977 Sep 05 '13 at 2:17


1 answer

Anony-mousse · Answer 1 · 2013-09-05T08:28:01+0000

K-means is not the best algorithm for this kind of data.

K-means is designed to minimize the within-cluster variance (= sum of squares, WCSS).

But how do you compute a variance with NaNs? And how meaningful is variance here anyway?
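To make the NaN problem concrete, here is a minimal NumPy sketch of the WCSS objective that k-means minimizes (the helper `wcss` and the cluster assignment are hypothetical, chosen only for illustration). A single missing value turns the centroid, and with it the whole objective, into NaN:

```python
import numpy as np

def wcss(X, labels):
    """Within-cluster sum of squares for a given hard cluster assignment."""
    total = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        centroid = members.mean(axis=0)  # NaN in every column where any member is NaN
        total += ((members - centroid) ** 2).sum()
    return total

X = np.array([
    [np.nan, np.nan, 2, 4, 5, 10],   # Station 1
    [np.nan, 1, 4, np.nan, 10, 8],   # Station 2
    [1, 9, 4, 7, np.nan, np.nan],    # Station 3
])
labels = np.array([0, 0, 1])  # arbitrary assignment, for illustration only

print(wcss(X, labels))  # nan: the objective is undefined, so k-means has nothing to minimize
```

Swapping in `np.nanmean`/`np.nansum` would silently compare clusters over different subsets of time steps, which is why the variance is of questionable meaning here even once it can be computed.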

Instead, you may want to use:

a similarity measure designed for time series, such as DTW, threshold distances, etc.;
a distance-based clustering algorithm. If you only have a few series, hierarchical clustering should work well.
