There are two approaches to exporting Apache Spark models to the PMML data format. First, when working at the Spark ML abstraction level, you can use the JPMML-SparkML library. Second, when working at the Spark MLlib abstraction level, which seems to be the case here, you can use the built-in PMMLExportable trait.
JPMML-SparkML extracts column names from the Spark ML data schema through DataFrame#schema(). Unfortunately, there is no such option for Spark MLlib, so the feature names "field_{n}" and the label name "target" are just hard-coded dummy names.
It is fairly easy to rename fields in a PMML document using the JPMML-Model library:
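// Export the model with the dummy field names.
pmmlExportable.toPMML("/tmp/raw-pmml-file");
// The path arguments to unmarshal/marshal are shorthand; check the JAXBUtil signatures of your JPMML-Model version.
org.dmg.pmml.PMML pmml = org.jpmml.model.JAXBUtil.unmarshal("/tmp/raw-pmml-file");
// Rename the "target" field, and every reference to it, to "y".
org.jpmml.model.visitors.FieldRenamer targetRenamer = new FieldRenamer(FieldName.create("target"), FieldName.create("y"));
targetRenamer.applyTo(pmml);
org.jpmml.model.JAXBUtil.marshal(pmml, "/tmp/final-pmml-file");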
If you marshal this PMML object instance into a PMML file, you will see that the "target" field (and all references to it) has been renamed to "y". Repeat the procedure for the features.
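To make concrete what such a rename amounts to, here is a rough, dependency-free sketch that uses only the JDK DOM classes instead of JPMML-Model. The class name, the method name, and the "field_0" to "age" mapping are hypothetical. It assumes that field names appear only in the "name" attribute of DataField, MiningField, NumericPredictor and CategoricalPredictor elements, and in "field" attributes elsewhere. That should cover the regression and clustering documents that Spark MLlib exports, but not the whole PMML schema, so for real documents FieldRenamer remains the safer choice.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of JPMML-Model or Spark.
class PmmlFieldRenameSketch {

    // Assumption: these elements carry a field name in their "name" attribute.
    private static final List<String> NAME_CARRIERS = Arrays.asList(
            "DataField", "MiningField", "NumericPredictor", "CategoricalPredictor");

    // Returns the PMML text with every old field name (map key) replaced by its new name (map value).
    static String renameFields(String pmmlXml, Map<String, String> renames) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(pmmlXml)));

            NodeList elements = doc.getElementsByTagName("*");
            for (int i = 0; i < elements.getLength(); i++) {
                Element element = (Element) elements.item(i);
                if (NAME_CARRIERS.contains(element.getTagName())) {
                    replaceAttribute(element, "name", renames);
                }
                // Assumption: a "field" attribute always holds a field reference.
                replaceAttribute(element, "field", renames);
            }

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not rename PMML fields", e);
        }
    }

    // Replaces the attribute value only when it exactly matches an old field name.
    private static void replaceAttribute(Element element, String attribute, Map<String, String> renames) {
        if (element.hasAttribute(attribute)) {
            String replacement = renames.get(element.getAttribute(attribute));
            if (replacement != null) {
                element.setAttribute(attribute, replacement);
            }
        }
    }

    public static void main(String[] args) {
        // A tiny hand-written fragment in the shape of an exported regression model.
        String raw = "<PMML><DataDictionary>"
                + "<DataField name=\"field_0\"/><DataField name=\"target\"/>"
                + "</DataDictionary><RegressionModel><MiningSchema>"
                + "<MiningField name=\"field_0\"/>"
                + "<MiningField name=\"target\" usageType=\"target\"/>"
                + "</MiningSchema><RegressionTable intercept=\"0.0\">"
                + "<NumericPredictor name=\"field_0\" coefficient=\"1.5\"/>"
                + "</RegressionTable></RegressionModel></PMML>";

        Map<String, String> renames = new LinkedHashMap<>();
        renames.put("field_0", "age"); // hypothetical feature name
        renames.put("target", "y");

        System.out.println(renameFields(raw, renames));
    }
}
```

Passing all renames in one map handles the label and every feature in a single pass, which is the same effect as applying one FieldRenamer per field. Attribute values that merely happen to equal an old name, such as usageType="target", are left alone because only the "name" and "field" attributes are inspected.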