The cost you are seeing here is not associated with the "closure" at all, but with the cost of Stream initialization.
Let's take three code samples:
    for (int i = 0; i < 10_000_000; i++) {
        Set<String> set = Collections.emptySet();
        set.stream().forEach(s -> System.out.println(s));
    }
This one creates a new Stream instance on every iteration, at least for the first 10k iterations (see below). After those 10k iterations, the JIT is probably smart enough to see that it is a no-op anyway.
    for (int i = 0; i < 10_000_000; i++) {
        Set<String> set = Collections.emptySet();
        for (String s : set) {
            System.out.println(s);
        }
    }
Here the JIT kicks in again: an empty set? Well, that is a no-op, end of story.
set.forEach(System.out::println);
An Iterator over a set that is always empty? Same story: the JIT kicks in.
The problem with your code, for starters, is that you do not account for the JIT. For realistic measurements, run at least 10k loops before measuring, since 10k executions is what the JIT requires before it kicks in (at least, HotSpot behaves this way).
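As a rough illustration of that warm-up step, here is a minimal sketch. The class name `WarmupSketch`, the helper `timeLoop`, and the warm-up count of 20,000 are assumptions made for this example, not part of the code in the question, and a hand-rolled timing loop like this is still only a crude approximation of a real benchmark.

```java
import java.util.Collections;
import java.util.Set;

public class WarmupSketch {

    // The stream-over-empty-set call from the first sample.
    static void workload() {
        Set<String> set = Collections.emptySet();
        set.stream().forEach(s -> System.out.println(s));
    }

    // Runs the workload `iterations` times and returns the elapsed nanoseconds.
    static long timeLoop(int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            workload();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Warm-up: more than 10k executions, result deliberately discarded.
        timeLoop(20_000);

        // Only this run is reported.
        long nanos = timeLoop(10_000_000);
        System.out.println("after warm-up: " + nanos / 1_000_000 + " ms");
    }
}
```

The warm-up run executes exactly the same code path as the measured run, so whatever compilation HotSpot decides to do has a chance to happen before the clock starts.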
Now, lambdas: they are call sites, and they are linked only once; but the cost of the initial linkage is still there, of course, and in your loops you include this cost. Try running the loop once before taking measurements so that this cost is out of the way.
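One way to observe the one-time linkage cost is to time the same lambda expression twice. This is only a sketch: `LinkageSketch` and `timeOnce` are hypothetical names, and the actual numbers depend heavily on the JVM and machine, so no particular output is claimed.

```java
public class LinkageSketch {

    // Evaluates one lambda expression, calls it once,
    // and returns the elapsed nanoseconds.
    static long timeOnce() {
        long start = System.nanoTime();
        // The call site behind this lambda is linked the first time
        // this line executes; later executions reuse the linked site.
        Runnable r = () -> { };
        r.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long first = timeOnce();   // includes the one-time linkage
        long second = timeOnce();  // call site is already linked
        System.out.println("first call:  " + first + " ns");
        System.out.println("second call: " + second + " ns");
    }
}
```

On a typical run the first figure should be noticeably larger than the second, which is the cost that ends up inside the measured loop if nothing is executed beforehand.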
All in all, this is not a valid microbenchmark. Use Caliper or JMH to really measure performance.
There is a great video on how lambdas work here. It is a bit dated now, and the JVM handles lambdas much better than it did at the time.
If you want to know more, read up on invokedynamic.