How to replace DataField values with exact column names in a Spark-MLlib PMML file?

I am using Spark 2.1.0.

I am trying to export a Spark MLlib linear regression model as a PMML file. The export itself succeeds, but the resulting file does not contain any of my field names. All I see is this:

[screenshot of the exported PMML file]

Can someone tell me what the reason is? Also, please let me know how to get the actual column names instead.

1 answer

There are two approaches to exporting Apache Spark models to the PMML format. First, when working at the Spark ML abstraction level, you can use the JPMML-SparkML library. Second, when working at the Spark MLlib abstraction level, which seems to be the case here, you can use the built-in PMMLExportable trait.

JPMML-SparkML obtains column names from the schema of the Spark ML dataset via DataFrame#schema(). Unfortunately, Spark MLlib has no such facility, so the feature names "field_{n}" and the label name "target" are just hard-coded dummy names.

It is fairly easy to rename fields in a PMML document using the JPMML-Model library:

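    // Spark side: export the MLlib model; the file still uses the dummy field names
    pmmlExportable.toPMML("/tmp/raw-pmml-file");

    // JPMML-Model side: load the document, rename "target" to "y", and write it back.
    // unmarshal/marshal are shorthand here; the exact JAXBUtil method names and
    // argument types depend on the JPMML-Model version in use.
    org.dmg.pmml.PMML pmml = org.jpmml.model.JAXBUtil.unmarshal("/tmp/raw-pmml-file");
    org.jpmml.model.visitors.FieldRenamer targetRenamer = new FieldRenamer(FieldName.create("target"), FieldName.create("y"));
    targetRenamer.applyTo(pmml);
    org.jpmml.model.JAXBUtil.marshal(pmml, "/tmp/final-pmml-file");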

If you marshal this PMML object into a PMML file, you will see that the "target" field (and every reference to it) has been renamed to "y". Repeat the procedure for the features ("field_0", "field_1", and so on).
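If pulling in JPMML-Model is not an option, the same renaming can be roughly approximated with the JDK's own DOM API. The sketch below is not the FieldRenamer approach described above but a dependency-free substitute. The class PmmlFieldNames and its two methods are hypothetical names introduced for illustration. It also assumes that every field reference in the document sits in a "name" or "field" attribute, which holds for simple regression models but is a simplification of the PMML schema. The point it illustrates is the mapping itself: "field_{n}" goes to the n-th feature column and "target" goes to the label column.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Spark or JPMML.
class PmmlFieldNames {

    // Maps the exporter's dummy names to real column names:
    // field_0 -> features.get(0), field_1 -> features.get(1), ..., target -> label.
    static Map<String, String> buildMapping(List<String> features, String label) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (int i = 0; i < features.size(); i++) {
            mapping.put("field_" + i, features.get(i));
        }
        mapping.put("target", label);
        return mapping;
    }

    // Rewrites every "name" or "field" attribute whose value is a key of the mapping.
    // Assumption: field references only occur in these two attributes.
    static String rename(String pmmlXml, Map<String, String> mapping) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(pmmlXml)));

        NodeList elements = doc.getElementsByTagName("*");
        for (int i = 0; i < elements.getLength(); i++) {
            Element element = (Element) elements.item(i);
            for (String attribute : new String[] {"name", "field"}) {
                // getAttribute returns "" when the attribute is absent, which is never a key
                String replacement = mapping.get(element.getAttribute(attribute));
                if (replacement != null) {
                    element.setAttribute(attribute, replacement);
                }
            }
        }

        // Serialize the modified DOM tree back to a string
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
            .transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

A call such as PmmlFieldNames.rename(xml, PmmlFieldNames.buildMapping(columns, "y")) applies all renames in one pass, so a new column name that happens to look like another dummy name is not renamed a second time. For anything beyond simple models, the schema-aware FieldRenamer visitor is the safer choice.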


Source: https://habr.com/ru/post/1268303/

