Apache Spark Structured Streaming: how to write a streaming Dataset to Hive?

Using Apache Spark 2.2 Structured Streaming, I am building a program that reads data from Kafka and writes it to Hive. I want to write high-volume data arriving on the Kafka topic at around 100 records/sec.

Hive table created:

CREATE TABLE demo_user (timeaa BIGINT, numberbb INT, decimalcc DOUBLE, stringdd STRING, booleanee BOOLEAN) STORED AS ORC;

Inserting manually with a query works:

INSERT INTO TABLE demo_user VALUES (1514133139123, 14, 26.4, 'pravin', true);

Inserting via Spark Structured Streaming code:

import org.apache.spark.SparkConf;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkConf conf = new SparkConf();
conf.setAppName("testing");
conf.setMaster("local[2]");
conf.set("hive.metastore.uris", "thrift://localhost:9083");
SparkSession session =
    SparkSession.builder().config(conf).enableHiveSupport().getOrCreate();

// workaround START: insert static data into Hive to verify connectivity
String insertQuery = "INSERT INTO TABLE demo_user VALUES (1514133139123, 14, 26.4, 'pravin', true)";
session.sql(insertQuery);
// workaround END

// Solution START
Dataset<Row> dataset = readFromKafka(session); // private method that reads from the Kafka topic 'xyz'

// My question here:
// what code writes this dataset into the Hive table demo_user?
// Solution END
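
For reference, a minimal sketch of one approach that works on Spark 2.2: write the stream as ORC files directly into the table's warehouse directory, so Hive sees the new files. This is a sketch rather than a confirmed solution; both paths below are illustrative assumptions, and the dataset is assumed to already be projected onto the table's columns:

import org.apache.spark.sql.streaming.StreamingQuery;

StreamingQuery query = dataset
        .writeStream()
        .format("orc")
        .outputMode("append")
        .option("path", "/apps/hive/warehouse/demo_user")           // assumed table location
        .option("checkpointLocation", "/tmp/checkpoints/demo_user") // assumed checkpoint dir
        .start();
query.awaitTermination(); // throws StreamingQueryException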
Answer:

You do not need to create the Hive table beforehand when using the following; it is created automatically:

dataset.write().jdbc(String url, String table, java.util.Properties connectionProperties)

or use

dataset.write().saveAsTable(String tableName)
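
Note that write() is only available on batch Datasets; a streaming Dataset obtained via readStream must be written with writeStream() (calling write() on it raises an AnalysisException). For a batch Dataset, the saveAsTable call would look like the sketch below; the table name comes from the question, and SaveMode.Append is assumed so existing rows are kept:

import org.apache.spark.sql.SaveMode;

// Append the dataset into the Hive table, creating it if it does not exist.
dataset.write().mode(SaveMode.Append).saveAsTable("demo_user");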
