How to get Partitioner in Apache Flink?

we are trying to create an extension for Apache Flink that uses custom partitioning. For some operators we want to check / extract the used delimiter. Unfortunately, I could not find any way to do this on this DataSet. Am I missing something or is there another workaround for this?

I would start with something like this:

class MyPartitioner[..](..) extends Partitioner[..] {..}
[..]
val myP = new MyPartitioner(...)
val ds = in.partitionCustom(myP, 0)

Now from another class, I would like to access the delimiter (if one is defined). In Spark, I would do it like this:

val myP = ds.partitioner.get.asInstanceOf[MyPartitioner]

However, for Flink, I could not find an opportunity for this.


Edit1:

This seems to be possible with Fabian's suggestion . However, there are two limitations:

(1) When using Scala, you must first load the Java core dataset to pass it to PartitionOperator

(2) . , . . :

val in: DataSet[(String, Int)] = ???

val myP = new MyPartitioner()
val ds = in.partitionCustom(myP, 0)
val ds2 = ds.map(x => x)

val myP2 = ds2.asInstanceOf[PartitionOperator].getCustomPartitioner

, Philipp

+4
1

DataSet PartitionOperator PartitionOperator.getCustomPartitioner():

val in: DataSet[(String, Int)] = ???

val myP = new MyPartitioner()
val ds = in.partitionCustom(myP, 0)

val myP2 = ds.asInstanceOf[PartitionOperator].getCustomPartitioner

,

  • getCustomPartitioner() - (.. API) Flink.
  • PartitionOperator , DataSet.partitionByHash(). getCustomPartitioner() null.
0

Source: https://habr.com/ru/post/1658671/


All Articles