I use Spark Structured Streaming to read data from Kinesis; my connection is set up as follows:
val kinesis = spark
  .readStream
  .format("kinesis")
  .option("streams", streamName)
  .option("endpointUrl", endpointUrl)
  .option("initialPositionInStream", "earliest")
  .option("format", "json")
  .schema(<my-schema>)
  .load
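For reference, <my-schema> is a StructType along these lines (a simplified sketch: uid and timestamp are the fields used in the aggregation below, while value just stands in for the actual device payload):

import org.apache.spark.sql.types._

// Simplified sketch of the schema; only uid and timestamp matter for the
// aggregation below, value is a placeholder for the real measurements.
val mySchema = StructType(Seq(
  StructField("uid", StringType, nullable = false),
  StructField("timestamp", TimestampType, nullable = false),
  StructField("value", DoubleType, nullable = true)
))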
The data comes from several IoT devices, each with a unique identifier. I need to aggregate the data by this identifier and by a tumbling window over the timestamp field, as shown below:
val aggregateData = kinesis
  .groupBy($"uid", window($"timestamp", "15 minute", "15 minute"))
  .agg(...)
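As an aside, since the window duration and the slide duration are equal, I understand this to be a tumbling window, equivalent to the two-argument form. I also noticed that window has a four-argument variant with a startTime offset, though I am not sure whether it is relevant to my alignment problem; a sketch of both, assuming the implicits from the session above:

import org.apache.spark.sql.functions.window
import spark.implicits._ // the SparkSession from the first snippet

// Equivalent tumbling window: duration == slide, so slide can be omitted.
val tumbling = window($"timestamp", "15 minutes")

// Four-argument variant: the last argument is a startTime offset.
// "0 seconds" keeps the default alignment relative to 1970-01-01 00:00:00 UTC.
val offsetWindow = window($"timestamp", "15 minutes", "15 minutes", "0 seconds")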
The problem I am facing is that I have to ensure that every window starts at a round quarter-hour (e.g. 00:00:00, 00:15:00, 00:30:00, and so on), and I also need a guarantee that only rows belonging to complete 15-minute windows will be output to my sink. What I am doing now is:
val query = aggregateData
  .writeStream
  .foreach(postgreSQLWriter)
  .outputMode("update")
  .start()

query.awaitTermination()
with postgreSQLWriter being a StreamWriter I created for inserting each row into the PostgreSQL DBMS. How can I make sure that my windows are exactly 15 minutes long, and that the start time of each window is a round 15-minute timestamp?
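For reference, the writer follows the ForeachWriter contract that .foreach expects; here is a minimal sketch of what it looks like (the JDBC URL, table, and column names are illustrative placeholders, not my real values):

import java.sql.{Connection, DriverManager, PreparedStatement, Timestamp}
import org.apache.spark.sql.{ForeachWriter, Row}

// Minimal sketch of the writer; connection details and the target
// table/columns are illustrative placeholders.
class PostgreSQLWriter(url: String, user: String, password: String)
    extends ForeachWriter[Row] {

  private var conn: Connection = _
  private var stmt: PreparedStatement = _

  override def open(partitionId: Long, epochId: Long): Boolean = {
    conn = DriverManager.getConnection(url, user, password)
    stmt = conn.prepareStatement(
      "INSERT INTO device_aggregates (uid, window_start, window_end) VALUES (?, ?, ?)")
    true
  }

  override def process(row: Row): Unit = {
    // window is a struct<start: timestamp, end: timestamp> column
    val w = row.getAs[Row]("window")
    stmt.setString(1, row.getAs[String]("uid"))
    stmt.setTimestamp(2, w.getAs[Timestamp]("start"))
    stmt.setTimestamp(3, w.getAs[Timestamp]("end"))
    stmt.executeUpdate()
  }

  override def close(errorOrNull: Throwable): Unit = {
    if (stmt != null) stmt.close()
    if (conn != null) conn.close()
  }
}

val postgreSQLWriter = new PostgreSQLWriter(
  "jdbc:postgresql://localhost:5432/mydb", "user", "password")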