Parse a JSON column of a Dataset<Row> into a Dataset<Row>
I have a Dataset<Row> with a single JSON string column:
+--------------------+
|               value|
+--------------------+
|{"Context":"00AA0...|
+--------------------+

JSON example:
{"Context":"00AA00AA","MessageType":"1010","Module":"1200"} How can I most effectively get a Dataset<Row> that looks like this:
+--------+-----------+------+
| Context|MessageType|Module|
+--------+-----------+------+
|00AA00AA|       1010|  1200|
+--------+-----------+------+

I process this data in a stream, and I know that Spark can do this itself when reading from a file:
spark
  .readStream()
  .schema(MyPojo.getSchema())
  .json("src/myinput")

but now I am reading the data from Kafka, which delivers it in a different form. I know I could use a parser such as Gson, but I would like Spark to do this for me.
1 Answer
Try this sample.
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkJSONValueDataset {
    public static void main(String[] args) {
        SparkSession spark = SparkSession
            .builder()
            .appName("SparkJSONValueDataset")
            .config("spark.sql.warehouse.dir", "file:///C:/temp")
            .master("local")
            .getOrCreate();

        // Prepare a Dataset<Row> with a single JSON string column named "value"
        List<String> data = Arrays.asList(
            "{\"Context\":\"00AA00AA\",\"MessageType\":\"1010\",\"Module\":\"1200\"}");
        Dataset<Row> df = spark.createDataset(data, Encoders.STRING())
            .toDF().withColumnRenamed("_1", "value");
        df.show();

        // Convert to Dataset<String> and let Spark infer the JSON schema
        Dataset<String> df1 = df.as(Encoders.STRING());
        Dataset<Row> df2 = spark.read().json(df1.javaRDD());
        df2.show();

        spark.stop();
    }
}
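If you are on Spark 2.2 or later, you can also pass the Dataset<String> directly instead of going through a JavaRDD (the RDD overload has since been deprecated):

Dataset<Row> df2 = spark.read().json(df1);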
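Since the question reads from Kafka as a stream, where spark.read() is not available, parsing the value column with from_json works on a streaming Dataset as well. Below is a minimal sketch, assuming Spark 2.1+ with the spark-sql-kafka-0-10 connector on the classpath; the broker address and topic name are placeholders:

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.from_json;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class KafkaJsonStream {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
            .appName("KafkaJsonStream")
            .master("local")
            .getOrCreate();

        // Schema matching the JSON payload from the question
        StructType schema = new StructType()
            .add("Context", DataTypes.StringType)
            .add("MessageType", DataTypes.StringType)
            .add("Module", DataTypes.StringType);

        // Kafka source; broker and topic are placeholder values
        Dataset<Row> kafka = spark
            .readStream()
            .format("kafka")
            .option("kafka.bootstrap.servers", "localhost:9092")
            .option("subscribe", "mytopic")
            .load();

        // Kafka's "value" column is binary: cast it to a string,
        // parse it with from_json, then flatten the struct into columns
        Dataset<Row> parsed = kafka
            .selectExpr("CAST(value AS STRING) AS value")
            .select(from_json(col("value"), schema).as("json"))
            .select("json.*");

        parsed.writeStream()
            .format("console")
            .start()
            .awaitTermination();
    }
}

Unlike spark.read().json(...), from_json requires an explicit schema, because a streaming query cannot scan the data up front to infer one.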