How to convert Spark Row datasets to string?

I wrote code to access a Hive table using SparkSQL. Here is the code:

SparkSession spark = SparkSession
        .builder()
        .appName("Java Spark Hive Example")
        .master("local[*]")
        .config("hive.metastore.uris", "thrift://localhost:9083")
        .enableHiveSupport()
        .getOrCreate();
Dataset<Row> df =  spark.sql("select survey_response_value from health").toDF();
df.show();

I would like to know how I can convert the full output to a String or an array of Strings, since I need to pass the result to another module that only accepts String or String[] values.
I tried other approaches such as .toString() and casting to String, but they didn't work for me.
Please let me know how I can convert the Dataset values to Strings.

+4
2 answers

Here is sample code in Java.

import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkSample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession
            .builder()
            .appName("SparkSample")
            .master("local[*]")
            .getOrCreate();
        // create a single-column DataFrame from a list of strings
        List<String> myList = Arrays.asList("one", "two", "three", "four", "five");
        Dataset<Row> df = spark.createDataset(myList, Encoders.STRING()).toDF();
        df.show();
        // using df.as: view the single string column as a Dataset<String>, then collect
        List<String> listOne = df.as(Encoders.STRING()).collectAsList();
        System.out.println(listOne);
        // using df.map: the MapFunction cast resolves the Java/Scala overload ambiguity
        List<String> listTwo = df.map((MapFunction<Row, String>) row -> row.mkString(), Encoders.STRING()).collectAsList();
        System.out.println(listTwo);
    }
}

"row" - java 8 lambda. , developer.com/java/start-using-java-lambda-expressions.html

+7

You can use the map function to convert every row into a string, e.g.:

df.map(row => row.mkString())

Instead of just mkString you can of course do more sophisticated work inside the map.

You can then use collect to pull the whole result back to the driver as an array:

val strings = df.map(row => row.mkString()).collect

(This is the Scala syntax; the Java version should be quite similar.)
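Since the answer notes the Java version is similar, a rough Java sketch of the same two steps could look like this. The RowsToStrings class, the comma separator, and the named-column variant are my own illustrations of the "more sophisticated work" idea, not part of the original answer:

import java.util.List;

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;

public class RowsToStrings {
    // Concatenate all columns of each row into one string, then collect.
    // This mirrors the Scala df.map(row => row.mkString()).collect above.
    public static List<String> allColumns(Dataset<Row> df) {
        return df.map((MapFunction<Row, String>) row -> row.mkString(","), Encoders.STRING())
                 .collectAsList();
    }

    // Pull out a single named column instead of concatenating the whole row.
    public static List<String> singleColumn(Dataset<Row> df, String column) {
        return df.map((MapFunction<Row, String>) row -> row.<String>getAs(column), Encoders.STRING())
                 .collectAsList();
    }
}

For example, singleColumn(df, "survey_response_value") would give the list of survey responses from the question's query.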

+4

Source: https://habr.com/ru/post/1670522/

