I have a DataFrame that is the result of a SQL query:
df1 = sqlContext.sql("select * from table_test")
I need to convert this DataFrame to libsvm format so that it can be used as input for
pyspark.ml.classification.LogisticRegression
I tried the following, but it failed with the error shown below it, since I am using Spark 1.5.2:
df1.write.format("libsvm").save("data/foo")
Failed to load class for data source: libsvm
Instead, I wanted to use MLUtils.loadLibSVMFile. I am behind a firewall and cannot install it directly, so I downloaded the file, copied it over with scp, and installed it manually. Everything seemed to work, but I still get the following error:
import org.apache.spark.mllib.util.MLUtils
No module named org.apache.spark.mllib.util.MLUtils
Question 1: Is my approach above, converting the data to libsvm format, the right direction?
Question 2: If the answer to question 1 is yes, how do I get MLUtils to work? If not, what is the best way to convert the data to libsvm format?
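
To make the target format concrete: a libsvm file holds one example per line, written as the label followed by sparse `index:value` pairs with 1-based indices. Below is a minimal pure-Python sketch of that formatting. It assumes the first column of each row is the label and the remaining columns are numeric features, which is only a guess because the schema of table_test is not shown. `to_libsvm_line` and the sample rows are hypothetical and are not part of Spark.

```python
def to_libsvm_line(label, features):
    """Format one example as a libsvm line: '<label> <index>:<value> ...'.

    Indices are 1-based, and zero-valued features are omitted (sparse format).
    """
    pairs = [
        "{}:{}".format(i + 1, value)
        for i, value in enumerate(features)
        if value != 0
    ]
    return " ".join([str(label)] + pairs)


# Hypothetical rows: the first field is the label, the rest are features.
# (Assumed layout; the real columns of table_test are not shown.)
rows = [(1, 0.5, 0.0, 2.0), (0, 0.0, 3.0, 0.0)]
lines = [to_libsvm_line(row[0], row[1:]) for row in rows]
print("\n".join(lines))
```

In PySpark, a function like this could probably be mapped over `df1.rdd` and the result written with `saveAsTextFile`. Separately, `org.apache.spark.mllib.util.MLUtils` is the Scala/Java package name; from Python the same utilities are imported as `from pyspark.mllib.util import MLUtils`.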