How to make predictions with a sklearn model inside Spark?

I trained a model in Python using sklearn. How can I load the same model into Spark and generate predictions on a Spark RDD?

1 answer

I will take a linear regression in sklearn as an example and show how to use it to predict the elements of a Spark RDD.

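First, train the model, following the sklearn diabetes example:

from sklearn import datasets, linear_model

# Load the diabetes data and keep a single feature, so each X is one number
diabetes_X, diabetes_y = datasets.load_diabetes(return_X_y=True)
diabetes_X_train, diabetes_y_train = diabetes_X[:-20, 2:3], diabetes_y[:-20]

# Create linear regression object
regr = linear_model.LinearRegression()

# Train the model using the training sets
regr.fit(diabetes_X_train, diabetes_y_train)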
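To see what shape of input the fitted model expects, here is a minimal sketch on made-up data. The arrays X_train and y_train are hypothetical and follow y = 2x, so the prediction is easy to check by hand. It is only an illustration, not part of the original example.

```python
import numpy as np
from sklearn import linear_model

# Hypothetical toy data following y = 2x (one feature per sample).
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])  # shape (n_samples, 1)
y_train = np.array([2.0, 4.0, 6.0, 8.0])

regr = linear_model.LinearRegression()
regr.fit(X_train, y_train)

# predict() expects a 2D array, even for a single sample.
pred = regr.predict(np.array([[5.0]]))
print(pred)
```

The point to keep in mind for the Spark step is that a single scalar has to be wrapped into a 2D structure before it is passed to predict().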

The model is now fitted, and the goal is to predict every element of the RDD.

In this case the RDD should hold the X values, for example (sc is an existing SparkContext):

rdd = sc.parallelize([1, 2, 3, 4])

So, you first need to broadcast your sklearn model to the executors:

regr_bc = sc.broadcast(regr)
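Broadcasting ships a read-only copy of the object to the worker nodes. As far as I know, PySpark serializes broadcast values with pickle, so the model has to survive a pickle round trip. The sketch below checks that without Spark; the toy training data (y = 2x + 1) is hypothetical.

```python
import pickle

import numpy as np
from sklearn import linear_model

# Hypothetical toy model fitted on y = 2x + 1.
regr = linear_model.LinearRegression()
regr.fit(np.array([[0.0], [1.0], [2.0]]), np.array([1.0, 3.0, 5.0]))

# Serialize and restore the model, roughly what happens when it is shipped.
payload = pickle.dumps(regr)
restored = pickle.loads(payload)

# The restored copy should predict the same values as the original.
sample = np.array([[10.0]])
original_pred = regr.predict(sample)
restored_pred = restored.predict(sample)
same = np.allclose(original_pred, restored_pred)
print(same)
```

One caveat: the workers need a compatible sklearn version installed, since unpickling imports the sklearn classes on the worker side.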

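Then use the broadcast model to predict each element:

# predict() expects a 2D array, so wrap each scalar x as [[x]]
rdd.map(lambda x: (x, float(regr_bc.value.predict([[x]])[0]))).collect()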

In each resulting pair, the first element is X and the second is the predicted Y. The result could look like this:

[(1, 2), (2, 4), (3, 6), ...]
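Calling predict() once per element can be slow on large RDDs. A possible variation is to predict a whole partition at once with mapPartitions. The sketch below is an assumption on my part, not part of the original answer: predict_partition is a hypothetical helper, and the toy model is fitted on made-up data following y = 2x so that the output matches the pairs above. It can be tried without Spark by passing an ordinary iterator.

```python
import numpy as np
from sklearn import linear_model

# Hypothetical toy model fitted on y = 2x, so predictions are easy to check.
regr = linear_model.LinearRegression()
regr.fit(np.array([[1.0], [2.0], [3.0], [4.0]]), np.array([2.0, 4.0, 6.0, 8.0]))


def predict_partition(rows, model):
    """Predict all values of one partition with a single predict() call.

    `rows` is an iterator of scalar X values, as mapPartitions would supply.
    """
    xs = list(rows)
    if not xs:
        return iter([])
    # Reshape the scalars into a column of shape (n_samples, 1).
    features = np.asarray(xs, dtype=float).reshape(-1, 1)
    ys = model.predict(features)
    return iter(zip(xs, ys.tolist()))


# Plain-Python check, no Spark needed.
result = list(predict_partition(iter([1, 2, 3, 4]), regr))
print(result)

# With Spark, assuming `rdd` and `regr_bc` as defined in the answer:
# rdd.mapPartitions(lambda rows: predict_partition(rows, regr_bc.value)).collect()
```

The per-element map is simpler and fine for small data; batching per partition mainly saves the overhead of many tiny predict() calls.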

Source: https://habr.com/ru/post/1672672/

