Java 8 Stream: difference between limit() and skip()

Speaking of Streams, when I execute this piece of code

    import java.util.stream.Stream;

    public class Main {
        public static void main(String[] args) {
            Stream.of(1,2,3,4,5,6,7,8,9)
                .peek(x->System.out.print("\nA"+x))
                .limit(3)
                .peek(x->System.out.print("B"+x))
                .forEach(x->System.out.print("C"+x));
        }
    }

I get this output

    A1B1C1
    A2B2C2
    A3B3C3

because limiting my stream to the first three elements causes actions A, B and C to be executed only three times.

Trying to perform an analogous computation on the last three elements by using the skip() method shows a different behavior: this

    import java.util.stream.Stream;

    public class Main {
        public static void main(String[] args) {
            Stream.of(1,2,3,4,5,6,7,8,9)
                .peek(x->System.out.print("\nA"+x))
                .skip(6)
                .peek(x->System.out.print("B"+x))
                .forEach(x->System.out.print("C"+x));
        }
    }

outputs this

    A1
    A2
    A3
    A4
    A5
    A6
    A7B7C7
    A8B8C8
    A9B9C9

Why, in this case, are actions A1 to A6 executed? It must have something to do with the fact that limit is a short-circuiting stateful intermediate operation, while skip is not, but I don't understand the practical implications of this property. Is it just that "every action before skip is executed, while not every action before limit is"?

+43
java java-8 java-stream limit skip
Sep 05 '15 at 14:17
4 answers

What you have here are two stream pipelines.

These stream pipelines each consist of a source, several intermediate operations, and a terminal operation.

But the intermediate operations are lazy. This means that nothing happens unless a downstream operation requires an item. When it does, the intermediate operation does whatever is needed to produce the required item, and then waits again until another item is requested, and so on.

Terminal operations are usually "eager". That is, they request all the items in the stream that they need in order to complete.

So you should really think of the pipeline as forEach asking the stream behind it for the next item, that stream asking the stream behind it, and so on, all the way back to the source.
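
To make the laziness concrete, here is a minimal sketch (the class name LazyPeekDemo and the helper run() are made up for illustration). It builds a pipeline with a peek, checks how many elements peek has seen before any terminal operation is attached, and checks again after forEach has run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical demo class, not part of the original question.
class LazyPeekDemo {
    // Returns {elements seen before the terminal operation, elements seen after it}.
    static int[] run() {
        List<Integer> seen = new ArrayList<>();
        // Only an intermediate operation so far: nothing is executed yet.
        Stream<Integer> pipeline = Stream.of(1, 2, 3).peek(seen::add);
        int before = seen.size();
        // The terminal operation is what actually drives the pipeline.
        pipeline.forEach(x -> { });
        int after = seen.size();
        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] counts = run();
        System.out.println("before=" + counts[0] + ", after=" + counts[1]); // before=0, after=3
    }
}
```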

With this in mind, let's see what we have with your first pipeline:

    Stream.of(1,2,3,4,5,6,7,8,9)
        .peek(x->System.out.print("\nA"+x))
        .limit(3)
        .peek(x->System.out.print("B"+x))
        .forEach(x->System.out.print("C"+x));

So forEach asks for the first item. That means the "B" peek needs an item and asks the limit output stream for it, which means limit has to ask the "A" peek, which goes to the source. An item is produced and travels all the way down to forEach, and you get your first line:

 A1B1C1 

forEach asks for another item, then another. Each time, the request is propagated up the stream and fulfilled. But when forEach asks for the fourth item and the request reaches limit, limit knows that it has already handed out all the items it is allowed to give.

Thus, it does not ask the "A" peek for another item. It immediately signals that its items are exhausted, so no more actions are performed and forEach finishes.

What happens in the second pipeline?

    Stream.of(1,2,3,4,5,6,7,8,9)
        .peek(x->System.out.print("\nA"+x))
        .skip(6)
        .peek(x->System.out.print("B"+x))
        .forEach(x->System.out.print("C"+x));

Again, forEach asks for the first item. The request is propagated back. But when it reaches skip, skip knows that it has to ask its upstream for 6 items before it can pass one downstream. So it makes a request upstream to the "A" peek, consumes the item without passing it downstream, makes another request, and so on. The "A" peek therefore receives 6 requests for an item and produces 6 prints, but those items are not passed on.

    A1
    A2
    A3
    A4
    A5
    A6

On the 7th request made by skip, the item is passed down to the "B" peek and from it to forEach, so the full line is printed:

 A7B7C7 

After that, it works just like before. Whenever skip receives a request, it now asks upstream for an item and passes it downstream, since it "knows" that it has already done its skipping job. So the remaining prints go through the entire pipeline until the source is exhausted.
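
The request-driven picture above can be imitated with plain Iterators. This is only a simplified mental model, not how java.util.stream is implemented internally; the class PullModel and its helpers peek, limit, skip and run are hypothetical names invented for this sketch. Each wrapper asks its upstream for an element only when its own downstream asks for one.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Consumer;

// Hypothetical pull-based model of the two pipelines (not the JDK's implementation).
class PullModel {
    // Runs the action on every element that is pulled through this stage.
    static <T> Iterator<T> peek(Iterator<T> upstream, Consumer<T> action) {
        return new Iterator<T>() {
            @Override public boolean hasNext() { return upstream.hasNext(); }
            @Override public T next() {
                T value = upstream.next();
                action.accept(value);
                return value;
            }
        };
    }

    // Stops asking upstream once n elements have been handed out.
    static <T> Iterator<T> limit(Iterator<T> upstream, int n) {
        return new Iterator<T>() {
            private int remaining = n;
            @Override public boolean hasNext() { return remaining > 0 && upstream.hasNext(); }
            @Override public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                remaining--;
                return upstream.next();
            }
        };
    }

    // Pulls and discards the first n elements before handing anything out.
    static <T> Iterator<T> skip(Iterator<T> upstream, int n) {
        return new Iterator<T>() {
            private int toSkip = n;
            private void discard() {
                while (toSkip > 0 && upstream.hasNext()) {
                    upstream.next(); // consumed here, never passed downstream
                    toSkip--;
                }
            }
            @Override public boolean hasNext() { discard(); return upstream.hasNext(); }
            @Override public T next() { discard(); return upstream.next(); }
        };
    }

    // Builds A -> (skip(6) or limit(3)) -> B -> C and returns the trace on one line.
    static String run(boolean useSkip) {
        StringBuilder out = new StringBuilder();
        Iterator<Integer> source = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9).iterator();
        Iterator<Integer> a = peek(source, x -> out.append(" A").append(x));
        Iterator<Integer> sliced = useSkip ? skip(a, 6) : limit(a, 3);
        Iterator<Integer> b = peek(sliced, x -> out.append("B").append(x));
        while (b.hasNext()) {            // plays the role of forEach
            out.append("C").append(b.next());
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // A1B1C1 A2B2C2 A3B3C3
        System.out.println(run(true));  // A1 A2 A3 A4 A5 A6 A7B7C7 A8B8C8 A9B9C9
    }
}
```

In this model limit simply refuses to pull once its budget is used up, while skip has no choice but to pull the elements it wants to throw away, which is why the "A" action fires for them.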

+65
Sep 05 '15 at 14:59

The fluent notation of the stream pipeline is what causes this confusion. Think of it this way:

limit(3)

All pipelined operations are evaluated lazily, except forEach(), which is a terminal operation that triggers "execution of the pipeline".

When the pipeline is executed, the intermediate stream definitions make no assumptions about what happens "before" or "after". All they do is take an input stream and transform it into an output stream:

    Stream<Integer> s1 = Stream.of(1,2,3,4,5,6,7,8,9);
    Stream<Integer> s2 = s1.peek(x->System.out.print("\nA"+x));
    Stream<Integer> s3 = s2.limit(3);
    Stream<Integer> s4 = s3.peek(x->System.out.print("B"+x));
    s4.forEach(x->System.out.print("C"+x));

  • s1 contains 9 different Integer values.
  • s2 peeks at all the values that pass through it and prints them.
  • s3 passes the first 3 values to s4 and aborts the pipeline after the third value. No further values are produced by s3 . This does not mean that there are no more values in the pipeline. s2 could still produce (and print) more values, but nobody requests them, and therefore execution stops.
  • s4 again peeks at all the values that pass through it and prints them.
  • forEach consumes and prints whatever s4 passes to it.

Think of it this way. The whole stream is completely lazy. Only the terminal operation actively pulls new values from the pipeline. After it has pulled 3 values from s4 <- s3 <- s2 <- s1 , s3 will no longer produce new values and will no longer pull values from s2 <- s1 . While s1 -> s2 would still be able to produce 4-9 , these values are simply never pulled from the pipeline, and therefore never printed by s2 .
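
One way to convince yourself that limit really stops pulling is to put it behind an infinite source. The following is a small sketch (the class name InfiniteLimitDemo is made up for this example): if limit(3) kept requesting values, the pipeline would never finish, yet it completes and the peek in front of it has seen exactly three values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical demo class: limit() on an infinite sequential stream.
class InfiniteLimitDemo {
    // Returns the values observed by the peek placed before limit(3).
    static List<Integer> seenByPeek() {
        List<Integer> seen = new ArrayList<>();
        Stream.iterate(1, x -> x + 1)   // infinite source: 1, 2, 3, ...
              .peek(seen::add)          // records every value that is actually pulled
              .limit(3)                 // short-circuits after three values
              .forEach(x -> { });
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(seenByPeek()); // [1, 2, 3]
    }
}
```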

skip(6)

The same thing happens with skip() :

    Stream<Integer> s1 = Stream.of(1,2,3,4,5,6,7,8,9);
    Stream<Integer> s2 = s1.peek(x->System.out.print("\nA"+x));
    Stream<Integer> s3 = s2.skip(6);
    Stream<Integer> s4 = s3.peek(x->System.out.print("B"+x));
    s4.forEach(x->System.out.print("C"+x));

  • s1 contains 9 different Integer values.
  • s2 peeks at all the values that pass through it and prints them.
  • s3 consumes the first 6 values, "skipping them", which means that the first 6 values are not passed on to s4 ; only the subsequent values are.
  • s4 again peeks at all the values that pass through it and prints them.
  • forEach consumes and prints whatever s4 passes to it.

The important thing is that s2 does not know that the rest of the pipeline skips any values. s2 peeks at all values regardless of what happens afterwards.
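
As a quick check of that last point, this sketch counts how many values reach the peek before skip(6) and how many reach the peek after it (the class name SkipCountDemo and its counters are invented for illustration).

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

// Hypothetical demo class: counts elements on both sides of skip(6).
class SkipCountDemo {
    // Returns {count seen before skip, count seen after skip}.
    static int[] run() {
        AtomicInteger beforeSkip = new AtomicInteger();
        AtomicInteger afterSkip = new AtomicInteger();
        Stream.of(1, 2, 3, 4, 5, 6, 7, 8, 9)
              .peek(x -> beforeSkip.incrementAndGet()) // plays the role of s2
              .skip(6)                                 // s3 discards the first six
              .peek(x -> afterSkip.incrementAndGet())  // plays the role of s4
              .forEach(x -> { });
        return new int[] { beforeSkip.get(), afterSkip.get() };
    }

    public static void main(String[] args) {
        int[] counts = run();
        System.out.println(counts[0] + " before skip, " + counts[1] + " after skip"); // 9 before skip, 3 after skip
    }
}
```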

Another example:

Consider this pipeline, which is listed in this blog post.

    IntStream.iterate(0, i -> ( i + 1 ) % 2)
        .distinct()
        .limit(10)
        .forEach(System.out::println);

When you execute the above, the program will never halt. Why? Because:

    IntStream i1 = IntStream.iterate(0, i -> ( i + 1 ) % 2);
    IntStream i2 = i1.distinct();
    IntStream i3 = i2.limit(10);
    i3.forEach(System.out::println);

Which means:

  • i1 generates an infinite number of alternating values: 0 , 1 , 0 , 1 , 0 , 1 , ...
  • i2 consumes all values that have already been encountered, passing on only "new" values, i.e. a total of only 2 values ever come out of i2 .
  • i3 passes on 10 values, then stops.

This algorithm will never stop, because i3 waits for i2 to produce 8 more values after 0 and 1 , but these values never appear, while i1 never stops feeding values to i2 .

It does not matter that at some point in the pipeline more than 10 values have been produced. All that matters is that i3 has never seen those 10 values.
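
For contrast, here is a sketch of a variant that does terminate (the class name LimitBeforeDistinct is made up for this example). Note that it is not equivalent to the pipeline above: it means "the distinct values among the first 10 source values", not "the first 10 distinct values". Putting limit directly behind the infinite source bounds the input, so distinct only ever has 10 values to look at.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical demo class: same source, but limit() is applied before distinct().
class LimitBeforeDistinct {
    static int[] run() {
        return IntStream.iterate(0, i -> (i + 1) % 2) // 0, 1, 0, 1, ...
                        .limit(10)                    // bounds the infinite source first
                        .distinct()                   // then de-duplicates those 10 values
                        .toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run())); // [0, 1]
    }
}
```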

To answer your question:

Is it just that "every action before skip is executed, while not every action before limit is"?

Nope. All operations before either skip() or limit() are executed. In both of your executions you get A1 - A3 . But limit() may short-circuit the pipeline, aborting value consumption once the event of interest has occurred (the limit is reached).

+10
Sep 05 '15 at

It is complete blasphemy to look at stream operations individually, because that is not how a stream is evaluated.

Speaking of limit(3) : it is a short-circuiting operation, which makes sense because, whatever operations come before and after the limit, having a limit in a stream stops the iteration once n elements have reached the limit operation. But this does not mean that only n stream elements are processed. Take this different stream pipeline as an example.

    import java.util.stream.Stream;

    public class App {
        public static void main(String[] args) {
            Stream.of(1,2,3,4,5,6,7,8,9)
                .peek(x->System.out.print("\nA"+x))
                .filter(x -> x%2==0)
                .limit(3)
                .peek(x->System.out.print("B"+x))
                .forEach(x->System.out.print("C"+x));
        }
    }

outputs

    A1
    A2B2C2
    A3
    A4B4C4
    A5
    A6B6C6

which seems correct, because limit waits for 3 stream elements to pass through the operation chain, even though 6 elements of the stream are processed.

+8
Sep 05 '15 at 15:54

All streams are based on spliterators, which have basically two operations: advance (move forward by one element, similar to an iterator) and split (divide itself at an arbitrary position, which is suitable for parallel processing). You can stop taking input elements at any moment you like (this is what limit does), but you cannot just jump to an arbitrary position (there is no such operation in the Spliterator interface). Thus the skip operation really has to read the first elements from the source just to ignore them. Note that in some cases you can perform an actual jump:

    List<Integer> list = Arrays.asList(1,2,3,4,5,6,7,8,9);
    list.stream().skip(3)... // will read 1,2,3, but ignore them
    list.subList(3, list.size()).stream()... // will actually jump over the first three elements
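
The elided pipelines above can be filled in many ways; the following sketch is one assumed completion (the class name SkipVsSubList and the recording peek are additions for illustration, not part of the snippet above). It records which elements are actually read from the source in each variant.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class comparing skip() with subList() on the same list.
class SkipVsSubList {
    private static final List<Integer> LIST = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9);

    // Elements read from the source when the first three are dropped with skip(3).
    static List<Integer> seenWithSkip() {
        List<Integer> seen = new ArrayList<>();
        LIST.stream()
            .peek(seen::add)   // sits before skip, so it sees 1, 2, 3 as well
            .skip(3)
            .forEach(x -> { });
        return seen;
    }

    // Elements read from the source when the stream starts on a subList view.
    static List<Integer> seenWithSubList() {
        List<Integer> seen = new ArrayList<>();
        LIST.subList(3, LIST.size())
            .stream()
            .peek(seen::add)   // the first three elements are never visited
            .forEach(x -> { });
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(seenWithSkip());    // [1, 2, 3, 4, 5, 6, 7, 8, 9]
        System.out.println(seenWithSubList()); // [4, 5, 6, 7, 8, 9]
    }
}
```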
+4
Sep 05 '15 at 15:07


