I'm new to Scala and Spark, so apologies if this is a basic question. I'm stuck on a problem that I've simplified as shown below:
There is a DataFrame with three columns: "machineID" is the identifier of the machine, "startTime" is the timestamp when a task starts, and "endTime" is the timestamp when it ends.
My goal is to count the number of downtime intervals for each machine.
For example, in the table below, the 1st and 2nd rows show that machine 1 runs from time 0 to time 3 and starts again at time 4, so the interval [3, 4] is idle. In the 3rd and 4th rows, machine 1 runs from time 10 to time 20 and starts again immediately at time 20, so there is no downtime.
machineID, startTime, endTime
1, 0, 3
1, 4, 8
1, 10, 20
1, 20, 31
...
1, 412, 578
...
2, 231, 311
2, 781, 790
...
The DataFrame has already had groupBy("machineID") applied.
I am using Spark 2.0.1 and Scala 2.11.8.
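In case it helps to pin down what I'm after: in Spark the usual approach would be a window function — partition by machineID, order by startTime, and compare each row's startTime with the previous row's endTime via lag. Below is a minimal sketch of just the gap-counting logic in plain Scala collections (the Task case class and countIdleIntervals are names I made up for illustration), assuming tasks within a machine do not overlap:

```scala
// Sketch of the per-machine gap count in plain Scala. In Spark 2.0.1 the
// same idea would use org.apache.spark.sql.expressions.Window, roughly:
//   val w = Window.partitionBy("machineID").orderBy("startTime")
//   df.withColumn("prevEnd", lag("endTime", 1).over(w))
//     .withColumn("idle", when(col("startTime") > col("prevEnd"), 1).otherwise(0))
//     .groupBy("machineID").agg(sum("idle"))

case class Task(machineID: Int, startTime: Long, endTime: Long)

def countIdleIntervals(tasks: Seq[Task]): Map[Int, Int] =
  tasks
    .groupBy(_.machineID)
    .map { case (id, ts) =>
      val sorted = ts.sortBy(_.startTime)
      // Slide a window of two consecutive tasks; an idle interval exists
      // when the next task starts strictly after the current one ends.
      val gaps = sorted.sliding(2).count {
        case Seq(a, b) => b.startTime > a.endTime
        case _         => false // a machine with a single task has no gap
      }
      id -> gaps
    }

val sample = Seq(
  Task(1, 0, 3), Task(1, 4, 8), Task(1, 10, 20), Task(1, 20, 31),
  Task(2, 231, 311), Task(2, 781, 790)
)
// For this sample, machine 1 has 2 idle intervals ([3, 4] and [8, 10])
// and machine 2 has 1 ([311, 781]).
println(countIdleIntervals(sample))
```

This mirrors what lag over a machineID window does row by row, so if this is the right reading of the problem, the Spark version should reduce to the commented window expression above.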