You can use mapPartitions or foreachPartition. Here is a snippet taken from Learning Spark:
Using partition-based operations, we can share a connection pool to this database to avoid setting up many connections, and reuse our JSON parser. As Examples 6-10 through 6-12 show, we use mapPartitions(), which gives us an iterator of the elements in each partition of the input RDD and expects us to return an iterator of our results.
This allows us to initialize one connection per partition instead of one per element, and then iterate over the elements in the partition however we like. This is very useful for saving data to an external database or for creating an expensive reusable object.
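As a rough illustration of the idea above, here is a minimal sketch of mapPartitions creating one expensive object per partition and reusing it for every element. ExpensiveParser, MapPartitionsSketch and parsePartition are hypothetical names made up for this example (they are not from Learning Spark), and the local[2] master is only an assumption so the sketch can run on a single machine.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical stand-in for something costly to build,
// such as a database connection or a JSON parser.
class ExpensiveParser {
  def parse(line: String): Int = line.trim.toInt
}

object MapPartitionsSketch {
  // Runs once per partition: build the parser a single time,
  // then lazily map every element of the partition through it.
  def parsePartition(iter: Iterator[String]): Iterator[Int] = {
    val parser = new ExpensiveParser
    iter.map(line => parser.parse(line))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode with two threads, for demonstration only.
    val conf = new SparkConf().setAppName("mapPartitions-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val lines = sc.parallelize(Seq("1", "2", "3", "4"), numSlices = 2)
      // Two partitions, so ExpensiveParser is constructed twice, not four times.
      val parsed = lines.mapPartitions(parsePartition)
      println(parsed.collect().sum) // prints 10
    } finally {
      sc.stop()
    }
  }
}
```

With a plain map, the parser would have to be created inside the function applied to each element (or be serializable and shipped with the closure); mapPartitions avoids both.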
Here is a simple Scala example taken from a related book. It can be translated into Java if necessary. It is only here to show a simple use of foreachPartition; mapPartitions follows the same pattern.
ipAddressRequestCount.foreachRDD { rdd =>
  rdd.foreachPartition { partition =>
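The snippet above is cut off, so what follows is not the book's code. It is only a sketch of how such a block is commonly completed: open a connection once per partition, push each element through it, then close it. SinkConnection, formatRecord and saveCounts are hypothetical names, and the element type (String, Long) for ipAddressRequestCount is an assumption.

```scala
import org.apache.spark.streaming.dstream.DStream

// Hypothetical sink; a real one would wrap a database or network client.
class SinkConnection {
  def send(record: String): Unit = println(record)
  def close(): Unit = ()
}

object ForeachPartitionSketch {
  // Hypothetical helper that turns one (ip, count) pair into a line of text.
  def formatRecord(ip: String, count: Long): String = s"$ip,$count"

  // Assumption: the stream carries (ipAddress, requestCount) pairs.
  def saveCounts(ipAddressRequestCount: DStream[(String, Long)]): Unit = {
    ipAddressRequestCount.foreachRDD { rdd =>
      rdd.foreachPartition { partition =>
        // Opened on the worker, once per partition, so it is never serialized.
        val connection = new SinkConnection
        try {
          partition.foreach { case (ip, count) =>
            connection.send(formatRecord(ip, count))
          }
        } finally {
          // Close even if sending one of the records fails.
          connection.close()
        }
      }
    }
  }
}
```

Creating the connection inside foreachPartition rather than outside foreachRDD matters: an object built on the driver would have to be serialized and shipped to the workers, which connections generally do not support.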
Here is a link to a Java example.