Why in this example do threads increase time (decrease performance)?

This code:

    object obj = new object { };
    Stopwatch watch = new Stopwatch();
    watch.Start();
    for (int i = 0; i < 90000; i++)
    {
        new Thread(() =>
        {
            lock (obj)
            {
                string file = new JavaScriptSerializer().Serialize(saeed);
                File.AppendAllText(string.Format(@"c:\Temp\{0}.txt", i), file);
            }
        }).Start();
    }
    watch.Stop();

runs for about 15 minutes, while this code:

    Stopwatch watch = new Stopwatch();
    watch.Start();
    for (int i = 0; i < 90000; i++)
    {
        string file = new JavaScriptSerializer().Serialize(saeed);
        File.AppendAllText(string.Format(@"c:\Temp\{0}.txt", i), file);
    }
    watch.Stop();

runs in about 45 seconds. Why is the first version so slow when it uses locks? Isn't threading supposed to improve application performance?

Update: Even when I capture the loop variable in an intermediate local inside the closure and drop the lock entirely, so the threads really do run concurrently, it still takes more than 5 minutes to create these files:

    Stopwatch watch = new Stopwatch();
    watch.Start();
    for (int i = 0; i < 90000; i++)
    {
        var x = i;  // per-iteration copy, so each thread sees its own value
        new Thread(() =>
        {
            string file = new JavaScriptSerializer().Serialize(saeed);
            File.AppendAllText(string.Format(@"c:\Temp\{0}.txt", x), file);
        }).Start();
    }
    watch.Stop();
5 answers

1) You are currently creating 90,000 threads, which is extremely inefficient. Do not create a new thread each time; use the thread pool instead, so you reuse threads that have already been created. Remember that creating a thread costs both time and memory.
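The question's code is C#, but the principle is language-independent. Here is a minimal illustrative sketch in Python of the same contrast: spawning one fresh OS thread per tiny job pays the thread start-up cost every time, while a small reusable pool pays it only once. The counts and timings are arbitrary for illustration, not a benchmark of .NET.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def tiny_job(n):
    return n * n

# One fresh OS thread per job: start-up cost dominates the tiny work.
start = time.perf_counter()
threads = [threading.Thread(target=tiny_job, args=(i,)) for i in range(2000)]
for t in threads:
    t.start()
for t in threads:
    t.join()
per_thread = time.perf_counter() - start

# A small reusable pool: threads are created once and recycled.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(tiny_job, range(2000)))
pooled = time.perf_counter() - start

# For tiny jobs the pooled version is typically much faster.
print(f"per-thread: {per_thread:.3f}s, pooled: {pooled:.3f}s")
```

The .NET equivalent of the pooled approach is `ThreadPool.QueueUserWorkItem` or, better, the Task Parallel Library shown in a later answer.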

2) You wrap the entire block of code in lock , which means each thread has to wait until the previous one finishes its work. You are basically defeating the whole purpose of multithreading here.
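To make point 2 concrete, here is a small illustrative Python sketch (not the poster's C#): when every worker holds one shared lock for its entire body, total wall time degenerates to the sum of the individual durations, exactly as if the code were single-threaded. The sleep stands in for the serialize-and-write work.

```python
import threading
import time

lock = threading.Lock()
WORKERS = 5
WORK_SECONDS = 0.1

def worker():
    with lock:                    # whole body runs under one shared lock
        time.sleep(WORK_SECONDS)  # stands in for serialize + file write

start = time.perf_counter()
threads = [threading.Thread(target=worker) for _ in range(WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# With the lock, elapsed is roughly WORKERS * WORK_SECONDS (fully
# serialized), not roughly WORK_SECONDS as genuine parallelism would give.
print(f"{elapsed:.2f}s for {WORKERS} workers")
```

So the first snippet in the question pays the full cost of creating 90,000 threads and still executes the work one item at a time.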

3) Disk I/O does not parallelize well, for complicated hardware-related reasons (buffers, seek times, etc.). It is generally a bad idea to multithread the disk-bound part of your code.


About the comments regarding disk I/O and multithreading: it is pretty complicated.

On a magnetic disk, the drive arm has to move to position the head over the right sector/cylinder/track before it can read or write. If you write two different files at the same time (two threads, each writing its own file), then depending on the physical locations of the files on the disk, you may force the arm to jump back and forth between two physical places very rapidly, which kills performance. Writing several sectors of the first file in one location, then moving the arm once and writing several sectors of the second file, is much more efficient. You can see this effect by comparing the time to copy two files simultaneously against copying one file and then the other.

So for this very simple example, the performance gain or loss depends on:

  • hardware: an SSD has no drive arm, so file access is much faster
  • the physical location of the files on the disk
  • file fragmentation
  • buffering: the disk buffer helps with reading sequential blocks, which is of little use if the arm keeps moving to another place

My humble advice: if performance is your main goal, avoid reading/writing multiple files from multiple threads.


Threading can speed up your code by giving it more execution units to run on. But in the first snippet you are running into two very different resource limits.

The first is your machine's ability to commit roughly 90 gigabytes of virtual memory: the space required for 90,000 thread stacks at the default 1 MB each. Committing it takes time; your hard drive is probably working furiously to provide backing store for that much memory. .NET is a little unusual in that it commits the stack space for a thread up front, which gives an execution guarantee. You can turn that off, by the way: the <disableCommitThreadStack> element in the app.exe.config file should have a very noticeable effect.

The second resource you are stressing is the file system's ability to modify that many files at once. It is heavily hampered by the first limit, since you are stealing most of the RAM the file system cache would otherwise use. Once free memory runs out, you see the effect of all those threads fighting over the disk's write head, forcing it to seek back and forth between file clusters. A disk seek is by far the slowest disk operation: it is mechanical, the drive head has to physically move, and that takes many milliseconds. The hard page faults your code is likely to generate degrade performance further.

The lock in your code reduces this, but does not eliminate it. With such a high memory demand, your program will generate a lot of page faults, in the worst case on every thread context switch: the thread blocks while the disk seeks and reads to satisfy the page fault.

Credit to Windows for letting you do this at all without falling over. But clearly it was a bad idea. Use no more than a few threads, or just one if the writes would saturate the file system cache anyway, to avoid the seek penalty.


I would note that most of the answers did not read the sample code closely. This is not just about spawning a bunch of threads that write to disk: each thread first does some work, namely new JavaScriptSerializer().Serialize(saeed), and only then writes to disk!

This matters, because the longer that per-item work takes, the more you stand to gain from threading: it lets you make sure the disk is not sitting idle while the computation runs.
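This overlap is easy to demonstrate with a hedged Python sketch (illustrative only; a sleep stands in for the blocking serialize-then-write latency of each item): running the items concurrently lets their waits overlap, so total time approaches one item's latency instead of the sum of all of them.

```python
import time
from concurrent.futures import ThreadPoolExecutor

IO_SECONDS = 0.05
ITEMS = 8

def process(i):
    time.sleep(IO_SECONDS)  # stands in for serialize + blocking file write
    return i

# Sequential: total time is roughly ITEMS * IO_SECONDS.
start = time.perf_counter()
for i in range(ITEMS):
    process(i)
sequential = time.perf_counter() - start

# Concurrent: the waits overlap, so total time approaches IO_SECONDS.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=ITEMS) as pool:
    list(pool.map(process, range(ITEMS)))
concurrent_time = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, concurrent: {concurrent_time:.2f}s")
```

Of course, as other answers note, a real disk is not a sleep: once the writes saturate the drive, adding more threads stops helping and starts hurting.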


The long and short of it, as others have explained, is that in your code:

  • You create 90,000 threads, which is unnecessarily expensive!
  • You lock around all of the work, making it effectively single-threaded!
    • Yes, without the lock you get an exception (likely because the lambda captures the shared loop variable i, so two threads can hit the same file), but that does not magically make locking a good idea for performance. It just means the original code has a bug.
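The closure pitfall behind that exception, every thread capturing the same loop variable i, has a direct analogue in Python's late-binding closures, sketched below; the fix in both languages is to bind a per-iteration copy.

```python
# Buggy: every lambda closes over the same variable i,
# so by the time they run they all see i's final value.
buggy = [lambda: i for i in range(3)]
print([f() for f in buggy])    # [2, 2, 2]

# Fixed: bind a per-iteration copy at definition time. (The C#
# equivalent is `var x = i;` inside the loop, then using x in
# the thread body.)
fixed = [lambda x=i: x for i in range(3)]
print([f() for f in fixed])    # [0, 1, 2]
```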

A quick and easy way to get into threading that is a little less dangerous (although you can still get it wrong) is to use the Task Parallel Library. For instance:

    using System;
    using System.Diagnostics;
    using System.IO;
    using System.Threading.Tasks;

    namespace ConsoleApplication15
    {
        class Program
        {
            const int FILE_COUNT = 9000;
            const int DATA_LENGTH = 100;

            static void Main(string[] args)
            {
                if (Directory.Exists(@"c:\Temp\"))
                    Directory.Delete(@"c:\Temp\", true);
                Directory.CreateDirectory(@"c:\Temp\");

                var watch = Stopwatch.StartNew();
                for (int i = 0; i < FILE_COUNT; i++)
                {
                    string data = new string(i.ToString()[0], DATA_LENGTH);
                    File.AppendAllText(string.Format(@"c:\Temp\{0}.txt", i), data);
                }
                watch.Stop();
                Console.WriteLine("Wrote {0} files single-threaded in {1}ms",
                    FILE_COUNT, watch.ElapsedMilliseconds);

                Directory.Delete(@"c:\Temp\", true);
                Directory.CreateDirectory(@"c:\Temp\");

                watch = Stopwatch.StartNew();
                Parallel.For(0, FILE_COUNT, i =>
                {
                    string data = new string(i.ToString()[0], DATA_LENGTH);
                    File.AppendAllText(string.Format(@"c:\Temp\{0}.txt", i), data);
                });
                watch.Stop();
                Console.WriteLine("Wrote {0} files multi-threaded in {1}ms",
                    FILE_COUNT, watch.ElapsedMilliseconds);
            }
        }
    }

On my machine the single-threaded version finishes in about 8.1 seconds and the multi-threaded version in about 3.8 seconds. Note that my test parameters differ from yours.

While the default TPL settings are not always optimal for your particular scenario, they are a much better starting point than spawning 90,000 threads! Notice also that in this version I need no locks and no closure workaround, because the API handles that for me.


The reason is twofold:

  • Creating a thread is expensive: it takes a non-trivial amount of time.
  • You lock on obj , which guarantees that only one thread can do its work at a time, so you are not actually getting any multithreading in this example.

Because the thread body inside the for loop is wrapped in a lock, the threads execute one after another rather than concurrently, unlike in the second example.


Source: https://habr.com/ru/post/949013/

