we are trying to create an extension for Apache Flink that uses custom partitioning. For some operators we want to check / extract the used delimiter. Unfortunately, I could not find any way to do this on this DataSet. Am I missing something or is there another workaround for this?
I would start with something like this:
class MyPartitioner[..](..) extends Partitioner[..] {..}
[..]
val myP = new MyPartitioner(...)
val ds = in.partitionCustom(myP, 0)
Now from another class, I would like to access the delimiter (if one is defined). In Spark, I would do it like this:
val myP = ds.partitioner.get.asInstanceOf[MyPartitioner]
However, for Flink, I could not find an opportunity for this.
Edit1:
This seems to be possible with Fabian's suggestion . However, there are two limitations:
(1) When using Scala, you must first load the Java core dataset to pass it to PartitionOperator
(2) . , . . :
val in: DataSet[(String, Int)] = ???
val myP = new MyPartitioner()
val ds = in.partitionCustom(myP, 0)
val ds2 = ds.map(x => x)
val myP2 = ds2.asInstanceOf[PartitionOperator].getCustomPartitioner
,
Philipp