Both foreach on an RDD and foreachRDD on a DStream are executed sequentially because they are output operations, which force the materialization of the execution graph. This is not the case for a generic lazy transformation in Spark, which can run in parallel when the execution graph diverges into several separate stages.
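To see the difference, here is a minimal sketch (the setup and names are illustrative, not from the original question) contrasting a lazy transformation with an output action on a plain RDD:

import org.apache.spark.{SparkConf, SparkContext}

// Illustrative setup; any existing SparkContext would do.
val sc = new SparkContext(new SparkConf().setAppName("LazyVsAction").setMaster("local[*]"))

val numbers = sc.parallelize(1 to 10)
val doubled = numbers.map(_ * 2) // lazy transformation: only extends the graph, nothing runs yet

doubled.foreach(n => println(n)) // output action: materializes the graph and runs a job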
For instance, with a DStream:
import org.apache.spark.streaming.dstream.DStream

val dStream: DStream[String] = ???
val first = dStream.filter(x => x.contains("h"))
val second = dStream.filter(x => !x.contains("h"))
first.print()
second.print()
The filtering part does not have to run sequentially if you have enough cluster resources to execute the underlying stages in parallel. Calling print, which is again an output operation, then causes the two print statements to execute one after the other.
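If you want the output operations themselves to run concurrently, one option (a sketch, not part of the code above) is the undocumented spark.streaming.concurrentJobs setting, which lets the streaming job scheduler run more than one output job at a time:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Sketch only: spark.streaming.concurrentJobs is an undocumented,
// experimental setting, so use it with care.
val conf = new SparkConf()
  .setAppName("ConcurrentOutputOps")
  .set("spark.streaming.concurrentJobs", "2") // allow 2 output jobs to run at once

val ssc = new StreamingContext(conf, Seconds(1))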