If you are working with Spark 1.6, use this code to convert an RDD to a DataFrame:
from pyspark.sql import SQLContext, Row
sqlContext = SQLContext(sc)
df = sqlContext.createDataFrame(rdd)
If you want to give the columns names, use this:
df = sqlContext.createDataFrame(rdd.map(lambda p: Row(ip=p[0], time=p[1], zone=p[2])))
ip, time, and zone are the column names in this example.
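As a rough sketch of another way to get named columns, you can pass a list of column names as the schema argument of createDataFrame instead of building Row objects. The helper names parse_line and rdd_to_named_df are made up for illustration, and the whitespace-separated record format is an assumption about the input, not something stated above.

```python
# Hypothetical column names, matching the example above.
COLUMNS = ["ip", "time", "zone"]


def parse_line(line):
    """Split one whitespace-separated record into an (ip, time, zone) tuple.

    Assumption: each record has at least three whitespace-separated fields.
    """
    ip, time, zone = line.split()[:3]
    return (ip, time, zone)


def rdd_to_named_df(sqlContext, lines_rdd):
    """Turn an RDD of text lines into a DataFrame with named columns.

    createDataFrame accepts a list of column names as its schema;
    the column types are then inferred from the data.
    """
    return sqlContext.createDataFrame(lines_rdd.map(parse_line), COLUMNS)


# Usage, assuming an existing SparkContext `sc`, the `sqlContext` from above,
# and a hypothetical input file:
#   df = rdd_to_named_df(sqlContext, sc.textFile("access.log"))
#   df.printSchema()
```

Keeping the parsing in a plain function means you can check it on a single string before running it on the cluster.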