I have a problem with Spark Streaming: injecting DStream output into a persistent SQL table. I would like to insert every DStream result (coming from each batch that Spark processes) into a unique table. I am using Python with Spark 1.6.2.
In this part of my code I have a DStream, made of one or more RDDs, that I would like to continuously insert/store into a persistent SQL table without losing any result from the processed batches:
# Join the two DStreams by key and keep only the values of interest
# from each joined pair
rr = feature_and_label.join(result_zipped) \
                      .map(lambda x: (x[1][0][0], x[1][1]))
Each element of the DStream is represented here, for example, by a tuple like (4.0, 0). I can't use SparkSQL for this because of the way Spark treats the "table": as a temporary table, so the result is lost at every batch.
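To make that concrete, the temporary-table approach I am referring to looks roughly like this (a minimal sketch; the table name "results" and the label/prediction column names are placeholders I made up):

from pyspark.sql import SQLContext, Row

def register_batch(time, rdd):
    # Convert the batch's (label, prediction) tuples to a DataFrame
    # and register it as a table. The table is only temporary: it is
    # replaced at every batch, so previous results are lost.
    if not rdd.isEmpty():
        sql_context = SQLContext(rdd.context)
        df = sql_context.createDataFrame(
            rdd.map(lambda t: Row(label=t[0], prediction=t[1])))
        df.registerTempTable("results")  # placeholder table name

rr.foreachRDD(register_batch)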
This is an example of the output:

Time: 2016-09-23 00:57:00
(0.0, 2)

Time: 2016-09-23 00:57:01
(4.0, 0)

Time: 2016-09-23 00:57:02
(4.0, 0)
...
As shown above, each batch produces one such DStream result. As I said before, I would like to permanently store these results in a table saved somewhere, so that I can query it at a later time. So my question is:

is there a way to do this?
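What I imagine is something along these lines, appending every batch to external storage, although I do not know whether this is correct or even possible (a rough sketch; the JDBC URL, table name, and credentials are placeholders I made up):

from pyspark.sql import SQLContext, Row

def append_batch(time, rdd):
    # Append the batch's (label, prediction) tuples to an external SQL
    # table over JDBC, so the results of every batch are kept.
    if not rdd.isEmpty():
        sql_context = SQLContext(rdd.context)
        df = sql_context.createDataFrame(
            rdd.map(lambda t: Row(label=t[0], prediction=t[1])))
        df.write.jdbc("jdbc:postgresql://host:5432/mydb", "results",
                      mode="append",
                      properties={"user": "...", "password": "..."})

rr.foreachRDD(append_batch)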
I would appreciate it if somebody could help me with this, and especially tell me whether it is possible or not.
Thank you.