Parse a JSON column of a Dataset<Row> into a Dataset<Row>

I have a Dataset<Row> with a single JSON string column:

    +--------------------+
    |               value|
    +--------------------+
    |{"Context":"00AA0...|
    +--------------------+

JSON example:

 {"Context":"00AA00AA","MessageType":"1010","Module":"1200"} 

How can I most effectively get a Dataset<Row> that looks like this:

    +--------+-----------+------+
    | Context|MessageType|Module|
    +--------+-----------+------+
    |00AA00AA|       1010|  1200|
    +--------+-----------+------+

I process this data as a stream. I know that Spark can do the parsing by itself when I read from a file:

    spark
        .readStream()
        .schema(MyPojo.getSchema())
        .json("src/myinput")
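For reference, MyPojo.getSchema() returns a StructType describing the JSON above. A minimal sketch of such a schema (the exact field types are an assumption; all three fields are read as plain strings here):

    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructType;

    public class MyPojo {
        // Schema matching the JSON example above; all fields as strings.
        public static StructType getSchema() {
            return new StructType()
                    .add("Context", DataTypes.StringType)
                    .add("MessageType", DataTypes.StringType)
                    .add("Module", DataTypes.StringType);
        }
    }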

but now I am reading the data from Kafka, and it arrives in a different form. I know I could use a parser like Gson, but I would like Spark to do this for me.
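For reference, the Kafka source is created roughly like this (broker address and topic name are placeholders), and its value column is binary rather than parsed JSON:

    Dataset<Row> kafkaDf = spark
        .readStream()
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092") // placeholder broker
        .option("subscribe", "mytopic")                      // placeholder topic
        .load();
    // Columns: key, value (both binary), topic, partition, offset,
    // timestamp, timestampType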

1 answer

Try this sample.

    import java.util.Arrays;
    import java.util.List;

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Encoders;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class SparkJSONValueDataset {
        public static void main(String[] args) {
            SparkSession spark = SparkSession
                    .builder()
                    .appName("SparkJSONValueDataset")
                    .config("spark.sql.warehouse.dir", "/file:C:/temp")
                    .master("local")
                    .getOrCreate();

            // Prepare a Dataset<Row> with one JSON string column named "value"
            // (toDF("value") names the column directly; the default name of a
            // Dataset<String> is already "value", so no rename is needed)
            List<String> data = Arrays.asList(
                    "{\"Context\":\"00AA00AA\",\"MessageType\":\"1010\",\"Module\":\"1200\"}");
            Dataset<Row> df = spark.createDataset(data, Encoders.STRING()).toDF("value");
            df.show();

            // Convert to Dataset<String> and let spark.read().json infer the schema
            Dataset<String> df1 = df.as(Encoders.STRING());
            Dataset<Row> df2 = spark.read().json(df1.javaRDD());
            df2.show();

            spark.stop();
        }
    }
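A note on the streaming case from the question: spark.read().json(...) is a batch read, so it cannot be applied directly inside a stream. Since Spark 2.1, the usual approach is the from_json function with an explicit schema. A sketch, reusing the kafkaDf stream and the MyPojo schema shown above:

    import static org.apache.spark.sql.functions.col;
    import static org.apache.spark.sql.functions.from_json;

    // Kafka delivers the payload as a binary "value" column: cast it to a
    // string, parse it with the known schema, then flatten the struct.
    Dataset<Row> parsed = kafkaDf
            .selectExpr("CAST(value AS STRING) AS value")
            .select(from_json(col("value"), MyPojo.getSchema()).as("json"))
            .select("json.*");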

Source: https://habr.com/ru/post/1260127/

