Can I extract significance values for logistic regression coefficients in PySpark?

Question


Is there a way to get the significance level of each coefficient after fitting a logistic regression model to the training data?

I tried to find a way but could not figure it out myself.

I think I can get the significance level of each feature if I run a chi-squared test, but first, I'm not sure whether I can run the test on all features together, and second, my data is numerical, so it is an open question whether the test would give the right result.

Right now I do the modeling part with statsmodels and scikit-learn, but of course I would like to know how to get these results from PySpark ML or MLlib.

If someone can shed some light on this, it would be helpful.

+5

machine-learning apache-spark pyspark logistic-regression

Cartman Dec 05 '16 at 18:13


1 answer

Rachid Ait Abdesselam · Accepted Answer · 2016-12-23T09:24:11+0000

I have only used MLlib. I think that once you have trained the model, you can use the toPMML method to export it in PMML format (an XML file); you can then parse the XML file to get the feature weights. Here is an example:

https://spark.apache.org/docs/2.0.2/mllib-pmml-model-export.html
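For the parsing step, here is a minimal sketch using only Python's standard library. The PMML string below is a hand-written, hypothetical fragment (the field names `field_0` and `field_1` and all the numbers are made up for illustration), and `extract_coefficients` and `local_name` are helper names I introduced, not part of Spark. A real export may be laid out differently, so treat this as a starting point. Note that it recovers the weights and intercepts stored in the file, nothing more.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written PMML fragment for illustration only.
# A real exported file will contain more elements and its own field names.
SAMPLE_PMML = """
<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <RegressionModel functionName="classification" normalizationMethod="logit">
    <RegressionTable intercept="-1.25" targetCategory="1">
      <NumericPredictor name="field_0" coefficient="0.8"/>
      <NumericPredictor name="field_1" coefficient="-0.3"/>
    </RegressionTable>
    <RegressionTable intercept="0.0" targetCategory="0"/>
  </RegressionModel>
</PMML>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_coefficients(pmml_text):
    """Return one dict per RegressionTable: target, intercept, and weights."""
    root = ET.fromstring(pmml_text)
    tables = []
    for elem in root.iter():
        if local_name(elem.tag) != "RegressionTable":
            continue
        # Each NumericPredictor child carries one feature name and its weight.
        weights = {
            child.get("name"): float(child.get("coefficient"))
            for child in elem
            if local_name(child.tag) == "NumericPredictor"
        }
        tables.append(
            {
                "target": elem.get("targetCategory"),
                "intercept": float(elem.get("intercept")),
                "weights": weights,
            }
        )
    return tables


if __name__ == "__main__":
    for table in extract_coefficients(SAMPLE_PMML):
        print(table["target"], table["intercept"], table["weights"])
```

To use it on a real export, read the file contents into a string and pass that to `extract_coefficients` instead of `SAMPLE_PMML`. Matching on the local tag name keeps the sketch independent of the exact PMML namespace version in the file.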

Hope that helps
