How do I create a custom writable transformer?

I am writing a custom spark.ml transformer by extending Transformer.

Everything works, but I cannot save an instance of this transformer, because it does not extend the DefaultParamsWritable trait the way the built-in transformers do, and I cannot mix DefaultParamsWritable in myself either, since it is restricted to the org.apache.spark.ml package.

The workaround is to put your class under org.apache.spark.ml. Is that the only way to achieve this? Are there better solutions?

1 answer

Finally found a way to do it!

So, the trick has two steps.

If your transformer has parameters that need to be persisted when it is saved, put them in a trait that extends org.apache.spark.ml.param.Params.

Shared parameter traits such as HasInputCol are private to the spark.ml package, so you need to re-define them in a public package of your choice as well. (There is a bug filed on their JIRA board to make them public, but it has no fix date yet.)
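As a sketch of that first step, here is what re-defining the shared column parameters in your own package might look like. The trait name and parameter docs are my own assumptions; only the extended `org.apache.spark.ml.param.Params` class comes from the answer.

```scala
package com.example.ml // any public package of your own

import org.apache.spark.ml.param.{Param, Params}

// Hypothetical re-implementation of the package-private
// HasInputCol / HasOutputCol shared traits.
trait MyTransformerParams extends Params {
  final val inputCol: Param[String] =
    new Param[String](this, "inputCol", "input column name")
  final def getInputCol: String = $(inputCol)

  final val outputCol: Param[String] =
    new Param[String](this, "outputCol", "output column name")
  final def getOutputCol: String = $(outputCol)
}
```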

Once you have done that, your transformer can simply mix in both your Params trait and DefaultParamsWritable, and it will be saveable.
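Putting the two steps together, a minimal transformer could look like the following. The class name, the upper-casing logic, and the simplified `transformSchema` are illustrative assumptions; it assumes a `MyTransformerParams` trait like the one described above, defined in your own package.

```scala
package com.example.ml

import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.{DefaultParamsReadable, DefaultParamsWritable, Identifiable}
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.functions.upper
import org.apache.spark.sql.types.StructType

class UpperCaseTransformer(override val uid: String)
    extends Transformer with MyTransformerParams with DefaultParamsWritable {

  def this() = this(Identifiable.randomUID("upperCase"))

  def setInputCol(value: String): this.type = set(inputCol, value)
  def setOutputCol(value: String): this.type = set(outputCol, value)

  override def transform(dataset: Dataset[_]): DataFrame =
    dataset.withColumn($(outputCol), upper(dataset($(inputCol))))

  // Simplified: a real implementation would append the output column.
  override def transformSchema(schema: StructType): StructType = schema

  override def copy(extra: ParamMap): UpperCaseTransformer = defaultCopy(extra)
}

// The companion object mixes in DefaultParamsReadable so the saved
// transformer can be loaded back with UpperCaseTransformer.load(path).
object UpperCaseTransformer extends DefaultParamsReadable[UpperCaseTransformer]
```

With this in place, `new UpperCaseTransformer().setInputCol("text").setOutputCol("upper").write.overwrite().save("/tmp/upper")` should succeed, and `UpperCaseTransformer.load("/tmp/upper")` should restore it with its parameter values.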

I wish this were documented somewhere.

Source: https://habr.com/ru/post/1246174/

