Are non-parallel streams designed for bulk operations on large amounts of data?

A few weeks ago, I was looking for a way to extract a specific value from a file and came across this question, which introduced me to the Stream object.

My first instinct was to find out whether this object would help with other file operations, for example replacing several placeholders with their corresponding values, for which I had been using BufferedReader and FileWriter. I barely managed to produce any working code, but since then I have been reading articles on the topic so that I could understand the intended use of Stream.

Along the way, I came across Optional and understood it well; I can now identify cases where it is convenient to use Optional, keeping my code clean and clear. However, I can't say the same for Stream, not to mention that it may not have delivered the performance gain I thought it would, and that I still need a finally clause in cases where IO is involved.

Here is the main problem I have been trying to wrap my head around, keeping in mind that I have mostly done single-threaded programming so far: when is it preferable to use Stream, apart from parallel processing?

Is it meant for bulk operations on a specific subset of a large data set, whereas a Collection would be used when you need to access specific objects in it? Although that seems to be the intended use, I'm still not sure whether the example I linked at the beginning of my question is a typical use case.

Or is it just a construct for making code shorter with lambda expressions, at the expense of readability? (Nothing against lambdas when they are used correctly, but most of the Stream usage examples I have seen are completely unreadable, which did not help my general understanding.)

1 answer

I have always referred to the description on the Java 8 Streams API page to help me decide between Collection and Stream:

However, the [Streams API] has many benefits. First, the Streams API makes use of several techniques such as laziness and short-circuiting to optimize your data processing queries.
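To make laziness and short-circuiting concrete, here is a minimal sketch. The class name `LazyStreamSketch` and the method `firstEvenSquare` are made up for illustration and do not come from the quoted page. It starts from an infinite source, yet only the elements needed to answer the query are ever pulled through the pipeline:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;

public class LazyStreamSketch {

    // Hypothetical helper: returns the first even square and records
    // in 'visited' which source elements were actually inspected.
    static Optional<Integer> firstEvenSquare(List<Integer> visited) {
        return Stream.iterate(1, n -> n + 1)   // infinite source: 1, 2, 3, ...
                .peek(visited::add)            // record each element pulled through the pipeline
                .map(n -> n * n)               // intermediate operation, evaluated lazily
                .filter(sq -> sq % 2 == 0)     // intermediate operation, evaluated lazily
                .findFirst();                  // terminal operation that stops at the first match
    }

    public static void main(String[] args) {
        List<Integer> visited = new ArrayList<>();
        Optional<Integer> result = firstEvenSquare(visited);
        // prints "4 after visiting [1, 2]"
        System.out.println(result.get() + " after visiting " + visited);
    }
}
```

Nothing runs until `findFirst()` is called, and the sequential pipeline stops as soon as one element passes the filter, which is why an infinite source is safe here.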

Both a Stream and a Collection can be used to apply a computation to each element of a data set before storing the result. However, I have found Streams useful when my pipeline includes several separate filter / sort / map operations on each data item, as the Stream API can optimize these computations behind the scenes and has built-in parallelization support.
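As a rough illustration of that kind of pipeline, the sketch below runs the same filter / sort / map steps once as a stream and once with plain loops over a collection. The class `PipelineSketch`, its method names, and the sample words are all invented for this example:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

public class PipelineSketch {

    // Stream version: filter, sort, and map expressed as one pipeline.
    static List<String> withStream(List<String> words) {
        return words.stream()
                .filter(w -> w.length() > 3)   // keep words longer than three characters
                .sorted()                      // natural (alphabetical) order
                .map(String::toUpperCase)      // transform each remaining word
                .collect(Collectors.toList());
    }

    // Loop version: the same steps written out with intermediate lists.
    static List<String> withLoop(List<String> words) {
        List<String> kept = new ArrayList<>();
        for (String w : words) {
            if (w.length() > 3) {
                kept.add(w);
            }
        }
        Collections.sort(kept);
        List<String> result = new ArrayList<>();
        for (String w : kept) {
            result.add(w.toUpperCase());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("stream", "map", "filter", "sort", "lazy");
        System.out.println(withStream(words)); // prints [FILTER, LAZY, SORT, STREAM]
        System.out.println(withLoop(words));   // prints [FILTER, LAZY, SORT, STREAM]
    }
}
```

Both versions give the same result. Replacing `words.stream()` with `words.parallelStream()` is the only change needed to ask for parallel execution of this pipeline, although whether that pays off depends on the data size and the work done per element.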

I agree that readability can be affected both positively and negatively by Stream. You are correct that some Stream examples are completely unreadable, and I do not think readability should be the key deciding factor for choosing Stream over something else.

If you really need to optimize performance on a large data set, consider using a toolkit designed specifically for massive data sets.


Source: https://habr.com/ru/post/1272909/

