How to make a prediction using the Sklearn model inside Spark?

Question

I trained a model in Python using sklearn. How can I load the same model into Spark and use it to generate predictions on a Spark RDD?

python scikit-learn apache-spark pyspark apache-spark-mllib

Tanveer Mar 19 '17 at 14:15

1 answer

Thiago Baldim · Accepted Answer · 2017-03-19T14:30:09+0000

Well, I will show an example of linear regression in sklearn and how to use it to predict the elements of a Spark RDD.

First, train the model, following the sklearn linear regression example:

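from sklearn import datasets, linear_model
# Load the diabetes data; keep one feature (column 2) as in the sklearn example
diabetes_X, diabetes_y = datasets.load_diabetes(return_X_y=True)
diabetes_X_train, diabetes_y_train = diabetes_X[:-20, 2:3], diabetes_y[:-20]

# Create linear regression object
regr = linear_model.LinearRegression()
# Train the model using the training sets
regr.fit(diabetes_X_train, diabetes_y_train)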
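Since the question is about a model trained outside Spark, one option (an assumption, not part of the original answer) is to persist the fitted estimator with joblib, which sklearn's documentation suggests for model persistence, and load it in the PySpark driver before broadcasting. A minimal sketch, where the toy model and the file name `regr.joblib` are hypothetical:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy model standing in for `regr` (fitted on y = 2x)
regr = LinearRegression().fit([[0.0], [1.0], [2.0]], [0.0, 2.0, 4.0])

# In the training script: save the fitted estimator to disk
path = os.path.join(tempfile.mkdtemp(), "regr.joblib")  # hypothetical file name
joblib.dump(regr, path)

# In the PySpark driver: load it back before broadcasting it
loaded = joblib.load(path)

# Sanity check: the reloaded model gives the same predictions as the original
X_new = np.array([[1.0], [5.0]])
same = np.allclose(regr.predict(X_new), loaded.predict(X_new))
print(same)
```

Loading requires a compatible scikit-learn version on the driver, and, because the broadcast copy is unpickled on the workers, on the executors as well.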

At this point the model is fitted, and we want to predict every element of the RDD.

In this case the RDD should contain the X values, for example:

# sc is the SparkContext (e.g. spark.sparkContext in a PySpark session)
rdd = sc.parallelize([1, 2, 3, 4])

So, first you need to broadcast your sklearn model so that every worker gets a copy:

regr_bc = sc.broadcast(regr)

Then use the broadcast model inside a map over the RDD:

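# predict() expects a 2D array, so wrap each value as [[x]] and take the single result
rdd.map(lambda x: (x, float(regr_bc.value.predict([[x]])[0]))).collect()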

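In each resulting tuple, the first element is the X value from the RDD and the second is the predicted Y. The output looks like this (illustrative values, corresponding to a model that learned y = 2x):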

[(1, 2), (2, 4), (3, 6), ...]
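Calling predict() once per element works but pays sklearn's per-call overhead for every row. A common alternative (not part of the original answer) is to predict a whole partition at once with RDD.mapPartitions. A minimal sketch of the partition function, where `predict_partition` and the toy model are hypothetical and the function is checked locally without Spark:

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def predict_partition(rows, model):
    """Predict a whole partition with one predict() call.

    `rows` is an iterator of scalar X values (one feature each), as
    mapPartitions passes it; `model` is any fitted sklearn regressor.
    """
    values = list(rows)
    if not values:
        return  # empty partition: the generator yields nothing
    # Build one 2D array of shape (n_rows, 1) and predict it in a single call
    X = np.asarray(values, dtype=float).reshape(-1, 1)
    for x, y in zip(values, model.predict(X)):
        yield (x, float(y))


# In Spark this would be wired up roughly as:
#   rdd.mapPartitions(lambda rows: predict_partition(rows, regr_bc.value)).collect()
# Local check without Spark, using a hypothetical toy model fitted on y = 2x:
toy = LinearRegression().fit([[0.0], [1.0], [2.0], [3.0]], [0.0, 2.0, 4.0, 6.0])
result = list(predict_partition(iter([1, 2, 3, 4]), toy))
print(result)
```

The trade-off is memory: each partition is materialized as a list and a NumPy array on the executor, so very large partitions may need to be processed in smaller chunks.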
