I am using PySpark 2.0 to create a DataFrame by reading a CSV file:
data = spark.read.csv('data.csv', header=True)
I check the type of the resulting object with:
type(data)
Result:
pyspark.sql.dataframe.DataFrame
I am trying to convert some of the DataFrame columns to LabeledPoint objects so that I can apply classification:
from pyspark.sql.types import *
from pyspark.sql.functions import col
from pyspark.mllib.regression import LabeledPoint
data.select(['label', 'features']).map(lambda row: LabeledPoint(row.label, row.features))
I ran into this problem:
AttributeError: 'DataFrame' object has no attribute 'map'
Any idea what causes this error? Is there a way to generate LabeledPoint objects from a DataFrame in order to do the classification?