Here is an example using linear regression in sklearn, showing how to use the fitted model to predict values for the elements of a Spark RDD.
First, fit the model as in the sklearn example:
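from sklearn import linear_model
# diabetes_X_train and diabetes_y_train come from the sklearn diabetes example
regr = linear_model.LinearRegression()
regr.fit(diabetes_X_train, diabetes_y_train)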
Now the model is fitted, and you need to predict for all the data in the RDD.
In this case your RDD should contain the X values, for example:
rdd = sc.parallelize([1, 2, 3, 4])
So you first need to broadcast your sklearn model to the workers:
regr_bc = sc.broadcast(regr)
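Then predict for each element with map:
# predict expects a 2D array, so the scalar x is wrapped as [[x]]
rdd.map(lambda x: (x, regr_bc.value.predict([[x]])[0])).collect()
The result is a list of pairs, where the first element is X from the RDD and the second is the predicted Y: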
[(1, 2), (2, 4), (3, 6), ...]
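
Calling predict once per element can be slow on a large RDD. Below is a minimal sketch of a batched variant that predicts a whole partition at once, assuming a single-feature model. The name `predict_partition` is a hypothetical helper (not part of sklearn or Spark), and the toy training data (y = 2x) is made up for illustration so the snippet can be checked locally without a cluster:

```python
import numpy as np
from sklearn import linear_model


def predict_partition(model, rows):
    """Predict for a whole partition at once (hypothetical helper).

    `rows` is an iterator of scalar X values, which is what Spark
    passes to the function given to mapPartitions.
    Returns a list of (x, y) pairs.
    """
    xs = list(rows)
    if not xs:
        return []  # Spark partitions can be empty
    # sklearn expects a 2D array of shape (n_samples, n_features)
    X = np.asarray(xs, dtype=float).reshape(-1, 1)
    ys = model.predict(X)
    return [(x, float(y)) for x, y in zip(xs, ys)]


# Toy model fitted on y = 2x (illustrative data, not the diabetes set)
toy_regr = linear_model.LinearRegression()
toy_regr.fit(np.array([[0.0], [1.0], [2.0]]), np.array([0.0, 2.0, 4.0]))

# Local check without Spark: a plain iterator stands in for one partition
pairs = predict_partition(toy_regr, iter([1, 2, 3, 4]))

# With Spark (assumes `rdd` and the broadcast `regr_bc` from the answer):
# rdd.mapPartitions(lambda rows: predict_partition(regr_bc.value, rows)).collect()
```

The broadcast step stays the same; only `map` is swapped for `mapPartitions`, so the model makes one vectorized `predict` call per partition instead of one per element.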