Convert data to libsvm format

I have a DataFrame that is the result of an SQL query:

df1 = sqlContext.sql("select * from table_test")

I need to convert this DataFrame to libsvm format so that it can be passed as input to

pyspark.ml.classification.LogisticRegression

I tried the following, but because I am using Spark 1.5.2, it fails with this error:

df1.write.format("libsvm").save("data/foo")
Failed to load class for data source: libsvm

Instead, I wanted to use MLUtils.loadLibSVMFile. I am behind a firewall and cannot install it directly, so I downloaded the file, copied it over with scp, and installed it manually. Everything seemed to work, but I still get the following error:

import org.apache.spark.mllib.util.MLUtils
No module named org.apache.spark.mllib.util.MLUtils

Question 1: Is my approach above, converting the data to libsvm format, the right direction?

Question 2: If so, how do I get MLUtils to work? If not, what is the best way to convert the data to libsvm format?


This example code may help (given a DataFrame such as your df1):

Convert the DataFrame to libsvm format as follows:

# ... your previous imports

from pyspark.mllib.util import MLUtils
from pyspark.mllib.regression import LabeledPoint

# A DATAFRAME
>>> df.show()
+---+---+---+
| _1| _2| _3|
+---+---+---+
|  1|  3|  6|  
|  4|  5| 20|
|  7|  8|  8|
+---+---+---+

# FROM DATAFRAME TO RDD
>>> c = df.rdd  # converts the DataFrame into an RDD of Row objects
>>> print (c.take(3))
[Row(_1=1, _2=3, _3=6), Row(_1=4, _2=5, _3=20), Row(_1=7, _2=8, _3=8)]

# FROM RDD OF TUPLE TO A RDD OF LABELEDPOINT
>>> d = c.map(lambda line: LabeledPoint(line[0], list(line[1:])))  # arbitrary mapping, just an example: first column is the label, the rest are features
>>> print (d.take(3))
[LabeledPoint(1.0, [3.0,6.0]), LabeledPoint(4.0, [5.0,20.0]), LabeledPoint(7.0, [8.0,8.0])]

# SAVE AS LIBSVM
>>> MLUtils.saveAsLibSVMFile(d, "/your/Path/nameFolder/")
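MLUtils.saveAsLibSVMFile writes each LabeledPoint as one text line: the label, followed by space-separated index:value pairs with 1-based indices. The stdlib-only sketch below mimics that line format without Spark, which can help when checking the output locally. format_libsvm_line is a hypothetical helper, not a Spark API, and it skips zero-valued features, which the sparse libsvm format allows.

```python
def format_libsvm_line(label, features):
    """Format one (label, features) pair as a libsvm text line.

    Hypothetical helper, not part of Spark. Indices are written 1-based,
    and zero-valued features are omitted (libsvm readers treat missing
    indices as zero).
    """
    parts = [str(float(label))]
    for i, value in enumerate(features):
        if value != 0:
            parts.append("{}:{}".format(i + 1, float(value)))
    return " ".join(parts)


# Same rows as the example DataFrame: first column is the label.
rows = [(1, 3, 6), (4, 5, 20), (7, 8, 8)]
for row in rows:
    print(format_libsvm_line(row[0], row[1:]))
```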

In the "/your/Path/nameFolder/part-0000*" files you will find:

1.0 1:3.0 2:6.0
4.0 1:5.0 2:20.0
7.0 1:8.0 2:8.0
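Going the other way, MLUtils.loadLibSVMFile reads lines like these and, according to its documentation, converts the 1-based indices to zero-based ones. A minimal stdlib-only sketch of parsing a single line, handy for spot-checking the part files (parse_libsvm_line is hypothetical, not Spark's implementation):

```python
def parse_libsvm_line(line):
    """Parse one libsvm text line into (label, {index: value}).

    Hypothetical helper, not part of Spark. Indices in the file are
    1-based; like MLUtils.loadLibSVMFile, this converts them to zero-based.
    """
    tokens = line.split()
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":")
        features[int(index) - 1] = float(value)
    return label, features


for text in ["1.0 1:3.0 2:6.0", "4.0 1:5.0 2:20.0", "7.0 1:8.0 2:8.0"]:
    print(parse_libsvm_line(text))
```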

For details, see the documentation for LabeledPoint.

A comment on this answer suggests listing the feature columns explicitly in the mapping:

d = c.map(lambda line: LabeledPoint(line[0], [line[1], line[2]]))

Source: https://habr.com/ru/post/1676939/

