I created an RDD from a CSV file whose first line is the header. Now I want to create a DataFrame from this RDD and take the column names from the first RDD element.
The problem is that I can create a DataFrame with column names from rdd.first(), but the resulting DataFrame then also contains the header line as its first data row. How can I remove it?
lines = sc.textFile('/path/data.csv')
rdd = lines.map(lambda x: x.split('
df = rdd.toDF(rdd.first())
df.show()
mailid age address
satya 23 Mumbai
abc 27 Goa
I want to prevent the first element from ending up in the DataFrame data. Is there an option I can pass to rdd.toDF(rdd.first()) to do this?
Note: I cannot collect the RDD into a list, remove the first element from that list, parallelize the list back into an RDD, and then call toDF()...
Any suggestions would be appreciated. Thanks!
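One possible approach, sketched here under assumptions rather than as a definitive answer: as far as I know, toDF() has no option to skip a row, but the header can be filtered out of the RDD itself before calling toDF(), which avoids collecting the data to the driver. The helper names split_header and demo are hypothetical, and the ',' delimiter is an assumption, since the delimiter is not visible in the snippet above.

```python
def split_header(rdd):
    """Return (header, data_rdd) without collecting the RDD to the driver."""
    # first() fetches only the first element, not the whole dataset.
    header = rdd.first()
    # filter() is lazy and runs on the executors; it drops every row equal to the header.
    data = rdd.filter(lambda row: row != header)
    return header, data


def demo(path='/path/data.csv', sep=','):
    """Hypothetical end-to-end usage; sep=',' is an assumed delimiter."""
    # Imported here so split_header stays usable without a Spark installation.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    lines = spark.sparkContext.textFile(path)
    rdd = lines.map(lambda x: x.split(sep))

    header, data = split_header(rdd)
    df = data.toDF(header)  # column names come from the header row
    df.show()
    return df
```

Calling demo() should show a DataFrame whose columns are named after the header and whose rows start with the first real record. One caveat of filtering by equality: any later row that is identical to the header is dropped as well, which is usually acceptable for CSV data but worth keeping in mind.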