What is the difference between the two?
Think of it as two nested loops.
- In the first case, there is no parallelism.
- In the second case, the outer loop / collection is parallel.
- In the third case, the inner loop / collection is parallel.
- In the fourth case, you have a mixture of parallelism, which is likely to be more confusing than useful.
The fourth case is not clear-cut: in reality there is only one thread pool, and if the pool is busy the current thread can be used, i.e. it may not be parallel^2 at all.
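
As a rough sketch of the four cases, assuming the nested collection is called `set` and holds lists of strings (the class name, method names and the `action` parameter are made up for illustration, not taken from the question):

```java
import java.util.Collection;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

public class NestedStreamCases {

    // Case 1: outer and inner streams are both sequential.
    static void case1(Collection<List<String>> set, Consumer<String> action) {
        set.stream().forEach(s -> s.stream().forEach(action));
    }

    // Case 2: only the outer stream is parallel.
    static void case2(Collection<List<String>> set, Consumer<String> action) {
        set.parallelStream().forEach(s -> s.stream().forEach(action));
    }

    // Case 3: only the inner stream is parallel.
    static void case3(Collection<List<String>> set, Consumer<String> action) {
        set.stream().forEach(s -> s.parallelStream().forEach(action));
    }

    // Case 4: both streams are parallel.
    static void case4(Collection<List<String>> set, Consumer<String> action) {
        set.parallelStream().forEach(s -> s.parallelStream().forEach(action));
    }

    public static void main(String[] args) {
        List<List<String>> set = List.of(List.of("a", "b"), List.of("c", "d"));
        // The action must be thread-safe for cases 2 to 4.
        Queue<String> seen = new ConcurrentLinkedQueue<>();
        case4(set, seen::add);
        System.out.println(seen.size()); // 4, but the element order is not guaranteed
    }
}
```

All four visit every element exactly once; they differ only in which threads do the visiting and in what order.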
Which one is better, faster, and safer?
The first one. However, using flatMap would be simpler still.
 set.stream().flatMap(s -> s.stream()).forEach(System.out::println); 
The other versions are more complicated, and since the console, which is the bottleneck, is a shared resource, the multi-threaded versions are likely to be slower.
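
For a self-contained version of the flatMap approach, something like the following should work, assuming `set` is a list of lists of strings (the class and method names are only illustrative):

```java
import java.util.List;
import java.util.stream.Collectors;

public class FlatMapExample {

    // Flattens the nested collection into one sequential stream,
    // preserving the encounter order of the source lists.
    static List<String> flatten(List<List<String>> set) {
        return set.stream()
                  .flatMap(s -> s.stream())
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<String>> set = List.of(List.of("a", "b"), List.of("c"));
        // Prints a, b, c on separate lines, in that order.
        flatten(set).forEach(System.out::println);
    }
}
```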
Which one is good for huge collections?
Assuming your goal is to do something other than printing, you want enough tasks to keep all your CPUs busy, but not so many that they create overhead. The second option might be worth considering.
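
Here is a minimal sketch of the second option applied to non-printing work. The `work` method is a hypothetical stand-in for whatever CPU-bound processing you actually do, and `set` is again assumed to be a list of lists of strings:

```java
import java.util.List;

public class ParallelOuterExample {

    // Hypothetical placeholder for real per-element work.
    static long work(String item) {
        long h = 0;
        for (int i = 0; i < 1_000; i++) {
            h = 31 * h + item.hashCode() + i;
        }
        return h;
    }

    // Second option: parallel outer stream, sequential inner stream.
    // Each inner collection becomes one unit of work for the pool.
    static long processParallelOuter(List<List<String>> set) {
        return set.parallelStream()
                  .mapToLong(s -> s.stream().mapToLong(ParallelOuterExample::work).sum())
                  .sum();
    }

    // First option, for comparison: fully sequential.
    static long processSequential(List<List<String>> set) {
        return set.stream()
                  .mapToLong(s -> s.stream().mapToLong(ParallelOuterExample::work).sum())
                  .sum();
    }

    public static void main(String[] args) {
        List<List<String>> set = List.of(List.of("a", "b"), List.of("c", "d"), List.of("e"));
        // Both calls compute the same total; only the threading differs.
        System.out.println(processParallelOuter(set) == processSequential(set));
    }
}
```

Returning a value with `sum()` instead of mutating shared state in `forEach` avoids needing a thread-safe collector. Whether the parallel version is actually faster depends on how many outer collections there are and how expensive the work is, so it is worth measuring.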
Which one is good when we want to apply heavy processes to each element?
Again, the second example may be the best, or possibly the third if you have a small number of outer collections.