Using Apache Spark 2.2 Structured Streaming, I am writing a program that reads data from Kafka and writes it to Hive. I want to write the high-volume data arriving on the Kafka topic, roughly 100 records per second, into Hive.
Hive table created:
CREATE TABLE demo_user (
  timeaa BIGINT,
  numberbb INT,
  decimalcc DOUBLE,
  stringdd STRING,
  booleanee BOOLEAN
) STORED AS ORC;
Insert via a manual Hive query:
INSERT INTO TABLE demo_user VALUES (1514133139123, 14, 26.4, 'pravin', true);
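For reference, the same table and test row can also be created through Spark SQL once Hive support is enabled; a minimal sketch, reusing only the metastore URI from the streaming code below:

import org.apache.spark.sql.SparkSession;

public class HiveSetupSketch {
    public static void main(String[] args) {
        // Connect to the same Hive metastore used by the streaming job.
        SparkSession session = SparkSession.builder()
                .appName("hive-setup")
                .master("local[2]")
                .config("hive.metastore.uris", "thrift://localhost:9083")
                .enableHiveSupport()
                .getOrCreate();

        // Create the demo_user table if it does not exist yet.
        session.sql("CREATE TABLE IF NOT EXISTS demo_user ("
                + "timeaa BIGINT, numberbb INT, decimalcc DOUBLE, "
                + "stringdd STRING, booleanee BOOLEAN) STORED AS ORC");

        // Insert the test row and read it back to confirm the metastore connection.
        session.sql("INSERT INTO TABLE demo_user VALUES (1514133139123, 14, 26.4, 'pravin', true)");
        session.sql("SELECT * FROM demo_user").show();

        session.stop();
    }
}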
Insert via Spark Structured Streaming code:
import org.apache.spark.SparkConf;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkConf conf = new SparkConf();
conf.setAppName("testing");
conf.setMaster("local[2]");
conf.set("hive.metastore.uris", "thrift://localhost:9083");

SparkSession session =
        SparkSession.builder().config(conf).enableHiveSupport().getOrCreate();

// Static insert through Spark SQL into the Hive table.
String insertQuery = "INSERT INTO TABLE demo_user VALUES (1514133139123, 14, 26.4, 'pravin', true)";
session.sql(insertQuery);
// readFromKafka (sketched below) returns the Kafka topic as a streaming Dataset.
Dataset<Row> dataset = readFromKafka(session);
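readFromKafka is not shown above. Below is a minimal sketch of what it, together with a streaming write into the table, could look like on Spark 2.2. It assumes the spark-sql-kafka-0-10 dependency, a broker at localhost:9092, a topic named demo_topic, a JSON payload matching the table columns, and made-up checkpoint/warehouse paths. Spark 2.2 has no built-in Hive streaming sink, so the sketch streams ORC files into the table's warehouse directory as a workaround.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.from_json;

public class KafkaToHiveSketch {

    // Reads the Kafka topic as a streaming Dataset and parses the JSON payload
    // into columns matching the demo_user table.
    static Dataset<Row> readFromKafka(SparkSession session) {
        StructType schema = new StructType()
                .add("timeaa", DataTypes.LongType)
                .add("numberbb", DataTypes.IntegerType)
                .add("decimalcc", DataTypes.DoubleType)
                .add("stringdd", DataTypes.StringType)
                .add("booleanee", DataTypes.BooleanType);

        return session.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker
                .option("subscribe", "demo_topic")                   // assumed topic
                .load()
                .select(from_json(col("value").cast("string"), schema).as("data"))
                .select("data.*");
    }

    public static void main(String[] args) throws Exception {
        SparkSession session = SparkSession.builder()
                .appName("testing")
                .master("local[2]")
                .config("hive.metastore.uris", "thrift://localhost:9083")
                .enableHiveSupport()
                .getOrCreate();

        Dataset<Row> dataset = readFromKafka(session);

        // No direct Hive streaming sink in Spark 2.2: stream ORC files into the
        // table's warehouse directory instead (paths are assumptions).
        StreamingQuery query = dataset.writeStream()
                .format("orc")
                .option("path", "/apps/hive/warehouse/demo_user")
                .option("checkpointLocation", "/tmp/demo_user_checkpoint")
                .start();

        query.awaitTermination();
    }
}

Because demo_user is non-partitioned, Hive should pick up the new files under its directory on the next query; the _spark_metadata directory written by the file sink starts with an underscore and is ignored by Hive.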