Like others, I suspect this is due to the GC. The example uses a huge amount of memory: by the end of the two loops, the StringBuilder objects will be requesting gigabyte-sized arrays to store their data.
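To make the memory pressure concrete, here is a minimal sketch (not the code from the question) that counts how often a StringBuilder has to switch to a larger backing array while it is being filled. The class name `StringBuilderGrowth` and the method `countReallocations` are made up for illustration. The default initial capacity of 16 is documented; the exact growth factor is an implementation detail of the JDK in use, so the count it prints may differ between versions.

```java
public class StringBuilderGrowth {

    /**
     * Appends totalChars characters one at a time and counts how many times
     * the reported capacity changes, i.e. how many times a new, larger
     * backing array had to be allocated and the old contents copied over.
     */
    public static int countReallocations(int totalChars) {
        StringBuilder sb = new StringBuilder(); // documented default capacity: 16
        int reallocations = 0;
        int lastCapacity = sb.capacity();
        for (int i = 0; i < totalChars; i++) {
            sb.append('x');
            if (sb.capacity() != lastCapacity) {
                reallocations++;
                lastCapacity = sb.capacity();
            }
        }
        return reallocations;
    }

    public static void main(String[] args) {
        // Each reallocation briefly needs the old and the new array at the
        // same time, and the old one then becomes garbage for the GC.
        System.out.println("reallocations: " + countReallocations(1_000_000));
    }
}
```

If the final size is known in advance, passing it to `new StringBuilder(int capacity)` avoids the intermediate arrays altogether.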
There are several reasons why a GC thread can slow down processing.
One is that as soon as the VM runs out of memory, most threads will be suspended while they wait for the GC to free memory before they can continue (this is because all the threads request more memory at roughly the same point in their execution).
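One way to check this suspicion is to ask the JVM how much time it has spent collecting. The sketch below uses the standard `java.lang.management` API; the class name `GcTimeProbe` and the `churn` workload are placeholders for illustration, not the code from the question, and the numbers depend entirely on the collector and heap settings.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcTimeProbe {

    /** Sums the approximate accumulated collection time (ms) over all collectors. */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Placeholder workload: allocates many short-lived arrays. */
    public static long churn(int iterations) {
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] garbage = new byte[64 * 1024];
            garbage[i % garbage.length] = 1;
            checksum += garbage[i % garbage.length];
        }
        return checksum;
    }

    public static void main(String[] args) {
        long before = totalGcMillis();
        long checksum = churn(200_000);
        long after = totalGcMillis();
        System.out.println("checksum: " + checksum);
        System.out.println("GC time during workload (ms): " + (after - before));
    }
}
```

Running the real program with `-verbose:gc` gives a similar picture without changing any code, since it logs each collection as it happens.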
Second, and this is probably the biggest reason, there is thread context switching. If thread A is running on core X and runs out of memory, the GC will either be scheduled onto core X, or it will have to load all of thread A's memory from core X's cache into the cache of the core it is running on. Either way, the CPU has to wait until its cache has been filled from RAM. RAM is fast compared to a hard drive, but compared to the processor it is painfully slow. While the processor is waiting for RAM to respond, it cannot do any processing, hence the reduced CPU load.
When you have multiple VMs, each VM can run on its own core and does not care what the other VMs are doing. And when the GC is invoked, there is no need for a context switch, since the GC can simply run on the same core as the other two threads in that VM.
Dunes