SWAB segmentation algorithm for time series data

I’m trying to figure out how to do segmentation on time series data (daily stock prices, temperatures, etc.) and came across a book explaining how to build a SWAB (sliding window and bottom-up) segmentation algorithm, but I don’t fully understand it. In the book, this segmentation is one step of a sound-processing pipeline. The following text is from the section in “Multimedia Intelligent Data and Analytics: Disruptive Innovation”:

Segmentation. The SWAB algorithm receives four parameters: the input file (time series data), the output file (segmented data), the maximum error, and an indication of nominal attributes. After running a series of experiments on time series of different sizes with different values for the number of segments, we selected the default number of segments as follows: 25-50% of the number of observations for time series with fewer than 100 observations, 20-35% for time series with 100-200 observations, and 15-25% for time series with more than 200 observations. If the user does not want to use the default value for any reason, he can pass his own number of segments as a parameter to the algorithm. Starting from the default values for the minimum and maximum error, we run the segmentation algorithm for the first time and obtain the minimum number of segments for the given time series (the higher the maximum error, the fewer segments are found). We then reduce the maximum error (and therefore increase the number of segments found), trying to narrow the upper and lower bounds on the error by dividing the range by powers of 2 (as in binary search). Each time the segmentation algorithm is run with the current maximum error, we check whether that value gives the best approximation so far to the optimal number of segments, as well as the best upper or lower bound for the optimal maximum error; if so, we keep this value. Initially only the upper bound is updated. However, as soon as we discover a lower bound that produces more segments than the optimum, we continue searching for the optimal number of segments in smaller steps: the next maximum error is the average of the current upper and lower bounds. In our experience with many time series databases, the optimal maximum error is usually found within 3-4 iterations. The convergence rate depends on the time series database itself. If the algorithm does not converge within 20 iterations, we stop the search and perform the subsequent sound-processing steps using the segments found at the 20th iteration.
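If I understand the search procedure correctly, it is something like the sketch below: run the segmentation, compare the number of segments found with the target, and bisect the maximum error. run_segmentation, the initial bounds and max_iterations here are my own placeholders, not values from the book.

# Rough sketch of the max-error search as I read it. run_segmentation(series, max_error)
# is assumed to return a list of segments; all names and bounds are illustrative.

def find_max_error(series, target_segments, run_segmentation,
                   initial_max_error=1.0, max_iterations=20):
    lower, upper = 0.0, initial_max_error

    # a larger max_error always yields fewer segments, so grow the upper bound
    # until it produces no more segments than the target
    while len(run_segmentation(series, upper)) > target_segments:
        upper *= 2.0
    best = upper

    # then bisect: each new guess is the midpoint of the current bounds
    for _ in range(max_iterations):
        mid = (lower + upper) / 2.0
        n_segments = len(run_segmentation(series, mid))
        if n_segments > target_segments:
            lower = mid      # too many segments: max_error is too small
        else:
            upper = mid      # few enough segments: try a smaller error
            best = mid
        if n_segments == target_segments:
            break
    return best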

What I don’t understand is this: if, say, my time series has 150 observations (so the 20-35% default applies), how do I actually pick the number of segments and the maximum error from that?



Answer:

The sliding window part works by anchoring the left point of a potential segment at the first data point of the time series and then approximating the data to the right with increasingly longer segments. At some point i the error of the potential segment becomes greater than the user-specified maximum error, so the subsequence from the anchor to i-1 is turned into a segment. The anchor is then moved to position i, and the process repeats until the whole time series has been segmented.

Below is the pseudocode for this, with comments added to explain what each step does. How the error is calculated is discussed further down.

//function takes a set of points T and a max error
function Sliding_Window(T, max_error)
  anchor = 1;
  while (not finished segmenting time series) {
    i=2;

    //keep making subsets of progressively larger size
    //until the error of the subset is greater than the max error
    //t[anchor: anchor + i] represents the elements of the set
    //from index (anchor) to index (anchor + i)
    //this could be implemented as an array slice
    while (calculate_error(T[anchor: anchor+i]) < max_error) { 
      i=i+1;
    }

    //add the newly found segment to the set of all segments
    Seg_TS = concat(Seg_TS, create_segment(T[anchor: anchor + (i-1)]));

    //and increment the anchor in preparation for creating a new segment
    anchor = anchor + i;
  }
}
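If you want something you can actually run, here is a small Python version of the same idea. It assumes the "error" of a candidate segment is the sum of squared distances to the straight line through the segment's first and last points (more on the error below); the function and variable names are mine, not from the paper.

# Runnable Python sketch of the sliding-window step. The error of a candidate
# segment is taken as the residual sum of squares around the line joining its
# first and last points; this is one possible choice, not the only one.

def segment_error(points):
    # points: list of (x, y) pairs
    (x0, y0), (x1, y1) = points[0], points[-1]
    if x1 == x0:
        return 0.0
    slope = (y1 - y0) / (x1 - x0)
    return sum((y - (y0 + slope * (x - x0))) ** 2 for x, y in points)

def sliding_window(points, max_error):
    segments = []
    anchor = 0
    n = len(points)
    while anchor < n - 1:
        i = 2
        # grow the candidate segment until its error exceeds max_error
        # or we run out of data
        while anchor + i <= n and segment_error(points[anchor:anchor + i]) < max_error:
            i += 1
        # keep the last candidate that was still within the error bound
        segments.append(points[anchor:anchor + i - 1])
        anchor = anchor + i - 1
    return segments

# Example:
# series = [(x, float(y)) for x, y in enumerate([1, 2, 3, 10, 11, 12, 5, 4, 3])]
# for seg in sliding_window(series, max_error=0.5):
#     print(seg[0], "->", seg[-1])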

""

As for the "error": the pseudocode above needs some way to measure how badly a candidate segment fits its points, and the book does not spell out how to compute it. The usual approach is this:

A segment is an approximation of the points it covers, typically a straight line, for example the line connecting the first and last points of the subsequence, or a line fitted by linear regression. The error measures how far the actual points are from that approximating line: the closer the points lie to the line, the better the segment represents them.

The most common measure is the residual sum of squares: for every point covered by the segment, take the difference between the point's actual value and the value of the approximating line at the same x position, square it, and add up these squared differences over all points. The larger the sum, the worse the segment fits the data.

See: https://en.wikipedia.org/wiki/Residual_sum_of_squares

In pseudocode, the error calculation looks like this:

//computes the sum-of-squares error of one segment
//set_approximated_by_segment: the points covered by the segment
//approximation_segment: the line (or other function) approximating them
function calculateSegmentErrorUsingSumOfSquares(set_approximated_by_segment, approximation_segment) {
  sum = 0;
  for each (point in set_approximated_by_segment) {
    //difference between the actual value and the approximated value at the same x
    difference = point.y_coordinate - approximation_segment.y_at_x(point.x_coordinate);
    sum = sum + (difference * difference);
  }
  return sum;
}
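The same calculation in runnable Python, this time fitting the approximating line by ordinary least squares instead of drawing it through the endpoints. Both choices are common, and the survey linked below discusses them; the names here are again mine.

# Residual-sum-of-squares error of one segment, with the approximating line
# fitted by ordinary least squares.

def sum_of_squares_error(points):
    # points: list of (x, y) pairs covered by one segment
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    if var_x == 0:
        return 0.0
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / var_x
    intercept = mean_y - slope * mean_x
    # residual sum of squares between the actual points and the fitted line
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points)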

Finally, note that SWAB itself is more than the sliding window shown above: it keeps a small buffer of data, runs the bottom-up algorithm on that buffer, outputs the leftmost segment, and then reads more data into the buffer in sliding-window fashion. The survey below describes all of these pieces in detail: the sliding window, top-down and bottom-up algorithms, and SWAB itself.

https://www.cs.rutgers.edu/~pazzani/Publications/survey.pdf
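To show roughly how the pieces fit together, here is a simplified sketch of the SWAB outer loop: a bottom-up pass over a small buffer, emitting the leftmost segment each round and then topping the buffer up with new data. The buffer size and the simple refill policy are my placeholders (the survey reads new data in with a sliding-window-style step instead), and sum_of_squares_error is the function from the previous sketch.

# Simplified SWAB outer loop: not the exact procedure from the survey,
# just an illustration of how bottom-up and the buffer interact.

def bottom_up(points, max_error):
    # start with the finest segmentation: pairs of adjacent points
    segments = [points[i:i + 2] for i in range(0, len(points) - 1, 2)] or [points]
    while len(segments) > 1:
        # find the pair of adjacent segments whose merge is cheapest
        costs = [sum_of_squares_error(segments[i] + segments[i + 1])
                 for i in range(len(segments) - 1)]
        best = min(range(len(costs)), key=costs.__getitem__)
        if costs[best] >= max_error:
            break
        segments[best:best + 2] = [segments[best] + segments[best + 1]]
    return segments

def swab(points, max_error, buffer_size=30):
    result = []
    buffer = list(points[:buffer_size])
    rest = list(points[buffer_size:])
    while buffer:
        segments = bottom_up(buffer, max_error)
        result.append(segments[0])            # emit the leftmost segment
        buffer = buffer[len(segments[0]):]    # drop its points from the buffer
        take = buffer_size - len(buffer)      # top the buffer up with new data
        buffer.extend(rest[:take])
        rest = rest[take:]
    return result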


Source: https://habr.com/ru/post/1656631/

