Iterate over org.apache.spark.sql.Row

I use the Spark shell (1.3.1), which is the Scala shell. A simplified situation requiring iteration over a Row looks something like this:

import org.apache.commons.lang.StringEscapeUtils

var result = sqlContext.sql("....")
var rows = result.collect() // Array[org.apache.spark.sql.Row]
var row = rows(0) // org.apache.spark.sql.Row
var line = row.map(cell => StringEscapeUtils.escapeCsv(cell)).mkString(",")
// error: value map is not a member of org.apache.spark.sql.Row
println(line)

My problem is that Row does not have a map method and, as far as I know, it cannot be converted to an Array or List, so I cannot escape every cell in this style. I could write a loop using an index variable, but that would be inconvenient. I would like to iterate over the cells in a situation like this:

result.collect().map(row => row.map(cell => StringEscapeUtils.escapeCsv(cell)).mkString(",")).mkString("\n")

(As a rule, these are not very large results; they fit into client memory many times over.)

Is there a way to iterate over the cells of a Row? Is there any syntax to put an index-based loop in place of row.map(...) in the last snippet?


Use toSeq(): it converts the Row into a Seq of its cell values, which you can then map over.
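
A minimal sketch of the whole pipeline using toSeq, assuming every cell should simply be rendered with toString (via String.valueOf, which also handles nulls) before CSV escaping:

import org.apache.commons.lang.StringEscapeUtils

// Row.toSeq returns a Seq[Any] of the cell values, which supports map.
val csv = result.collect()
  .map(row => row.toSeq.map(cell => StringEscapeUtils.escapeCsv(String.valueOf(cell))).mkString(","))
  .mkString("\n")
println(csv)

// An index-based loop is also possible, via Row.length and Row.get(i):
// val line = (0 until row.length)
//   .map(i => StringEscapeUtils.escapeCsv(String.valueOf(row.get(i))))
//   .mkString(",")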

