How to save / insert each DStream into a persistent table

I am having a problem with Spark Streaming: saving the output stream of a DStream into a persistent SQL table. I would like to insert every DStream result (coming from a single batch that Spark processes) into a unique table. I am using Python with Spark 1.6.2.

At this point in my code, I have a DStream, made of one or more RDDs, that I would like to continuously insert/store into the SQL table without losing the result of any processed batch.

# join the two DStreams by key and keep one pair per record, e.g. (4.0, 0)
rr = feature_and_label.join(result_zipped)\
                      .map(lambda x: (x[1][0][0], x[1][1]))

Each DStream result is represented here as a tuple, for example (4.0, 0). I can't use SparkSQL because of the way Spark treats the table, that is, as a temporary table, so the result is lost at every batch.
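For illustration, the temporary-table behaviour I mean looks roughly like this (a sketch, not my actual code; sqlContext, the Row field names and the "results" table name are only illustrative):

from pyspark.sql import Row, SQLContext

sqlContext = SQLContext(sc)  # sc is the running SparkContext

def register_batch(time, rdd):
    if rdd.isEmpty():
        return
    df = sqlContext.createDataFrame(
        rdd.map(lambda p: Row(label=float(p[0]), prediction=int(p[1]))))
    # re-registering the same temp table on every batch replaces its
    # contents, so the results of earlier batches are lost
    df.registerTempTable("results")

rr.foreachRDD(register_batch)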

This is an example output:


Time: 2016-09-23 00:57:00
(0.0, 2)

Time: 2016-09-23 00:57:01
(4.0, 0)

Time: 2016-09-23 00:57:02
(4.0, 0)

...

As shown above, each batch contains exactly one DStream result. As I said, I would like to permanently store these results in a table saved somewhere, so that I can query them at a later time. So my question is: is there a way to do this?
I'd appreciate it if somebody could help me out with it, and especially tell me whether or not it is possible. Thank you.


Vanilla Spark does not provide a way to persist data unless you've downloaded the version packaged with HDFS (although they appear to be playing with the idea since Spark 2.0). One way to store the results in a permanent table and query them later is to use one of the various databases in the Spark Database Ecosystem. There are pros and cons to each, and your use case matters. The options can be segmented by type of data management, form of data, and connection to Spark; for example:

  • Database, SQL, Integrated
  • Database, SQL, Connector
  • Database, NoSQL, Connector
  • Filesystem, Files, Integrated (e.g. HDFS)
  • Datawarehouse, SQL, Connector
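Whichever store you pick, the usual pattern is to go through foreachRDD and append every micro-batch to the external table. Below is a minimal sketch for Spark 1.6 in Python, assuming a JDBC-accessible SQL database; the URL, table name and credentials are placeholders, and rr is the DStream of pairs from the question:

from pyspark.sql import Row, SQLContext

sqlContext = SQLContext(sc)  # sc is the running SparkContext

def save_batch(time, rdd):
    # skip empty batches: createDataFrame cannot infer a schema from them
    if rdd.isEmpty():
        return
    rows = rdd.map(lambda p: Row(label=float(p[0]), prediction=int(p[1])))
    df = sqlContext.createDataFrame(rows)
    # mode="append" adds this batch to the table instead of replacing it;
    # the JDBC driver jar must be on the Spark classpath
    df.write.jdbc(url="jdbc:postgresql://localhost/mydb",  # placeholder
                  table="stream_results",                  # placeholder
                  mode="append",
                  properties={"user": "user", "password": "password"})

rr.foreachRDD(save_batch)

If plain files on HDFS are enough, rr.saveAsTextFiles("hdfs:///path/results") gives the same per-batch persistence without an external database.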


Source: https://habr.com/ru/post/1655550/

