I have the following data structure:
id: int
records: Seq[String]
other: boolean
The data comes from a JSON file; to facilitate testing, I create it inline:
var data = sc.makeRDD(Seq[String](
"{\"id\":1, \"records\": [\"one\", \"two\", \"three\"], \"other\": true}",
"{\"id\": 2, \"records\": [\"two\"], \"other\": true}",
"{\"id\": 3, \"records\": [\"one\"], \"other\": false }"))
sqlContext.jsonRDD(data).registerTempTable("temp")
I would like to keep only the rows that have "one" in the records field and other equal to true, using only SQL.
I can do this using filter (see below), but is it possible to do it using only SQL?
sqlContext
.sql("select id, records from temp where other = true")
.rdd.filter(t => t.getAs[Seq[String]]("records").contains("one"))
.collect()
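To make the expected result concrete, here is the same condition sketched with plain Scala collections, without Spark. The case class `Entry` and the object `ExpectedResult` are hypothetical names used only for illustration; they are not part of my actual code. For the sample data above, only the row with id 1 should come back.

```scala
// Hypothetical case class mirroring the schema described above (illustration only).
case class Entry(id: Int, records: Seq[String], other: Boolean)

object ExpectedResult {
  // Same three rows as the JSON test data.
  val sample: Seq[Entry] = Seq(
    Entry(1, Seq("one", "two", "three"), other = true),
    Entry(2, Seq("two"), other = true),
    Entry(3, Seq("one"), other = false)
  )

  // The condition I want to express in SQL:
  // `records` contains "one" AND `other` is true.
  val wanted: Seq[Entry] =
    sample.filter(e => e.other && e.records.contains("one"))

  def main(args: Array[String]): Unit =
    println(wanted.map(_.id)) // prints List(1)
}
```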