How to choose the right number of threads for a multithreaded C++ application?

I am a C++ backend developer working on the server side of a real-time game. The application architecture looks like this:

1) I have a Client class that processes requests from a game client. Examples of requests: log in, buy something in the in-game store, or perform some action. The Client also processes user input events from the game client (these events are often sent many times in a row while the player is actively playing).

2) I have a thread pool. When a game client connects to the server, I create a Client instance and bind it to one of the threads from the pool. So we have a one-to-many relationship: one thread - many clients. Round-robin is used to select the thread to bind to.

3) I use libev to manage all events inside the server. This means that when a Client instance receives data from the game client over the network, processes a request, or sends data back over the network, it occupies its thread. While it does this work, the other Clients bound to the same thread are blocked.
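To make the setup concrete, here is a minimal sketch of this architecture (all names such as WorkerPool, Client, and on_readable are illustrative, not the actual code): each worker thread runs its own libev loop, and new clients are bound to workers round-robin.

```cpp
#include <ev.h>
#include <atomic>
#include <thread>
#include <vector>

// Illustrative sketch: one libev loop per worker thread, clients assigned round-robin.
struct Client {
    ev_io watcher;   // fires when the client's socket is readable (first member, so the cast below is valid)
    int   fd;
};

static void on_readable(struct ev_loop* loop, ev_io* w, int /*revents*/) {
    Client* c = reinterpret_cast<Client*>(w);
    // read from c->fd and handle the request here;
    // while this runs, every other Client bound to this loop waits
    (void)loop; (void)c;
}

struct WorkerPool {
    std::vector<struct ev_loop*> loops;
    std::vector<std::thread>     threads;
    std::atomic<size_t>          next{0};

    explicit WorkerPool(size_t n) {
        for (size_t i = 0; i < n; ++i) {
            loops.push_back(ev_loop_new(EVFLAG_AUTO));
            // note: a loop with no active watchers returns immediately; a real server
            // keeps it alive (e.g. an ev_async used to hand new clients to the loop)
            threads.emplace_back([l = loops.back()] { ev_run(l, 0); });
        }
    }

    // Round-robin: bind a new client to the next worker's loop.
    // Real code must make this thread-safe (start the watcher from inside the loop's own thread).
    void add_client(Client* c) {
        struct ev_loop* loop = loops[next++ % loops.size()];
        ev_io_init(&c->watcher, on_readable, c->fd, EV_READ);
        ev_io_start(loop, &c->watcher);
    }
    // thread joining / loop destruction omitted for brevity
};
```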

So the thread pool is the bottleneck of the application. To increase the number of simultaneous players who can play without lag, I need to increase the number of threads in the thread pool.

The application currently runs on a server with 24 logical processors (according to cat /proc/cpuinfo), and the thread pool size is set to 24 (1 thread per processor). With the current 2000 players online, each thread serves about 84 Client instances. top shows CPU usage below 10 percent.

Now the question. If I increase the number of threads in the thread pool, will server performance increase or decrease (context-switching overhead vs. clients blocked behind a busy thread)?

UPD: 1) The server uses asynchronous I/O (libev + epoll), so when I say a client is blocked while sending or receiving data, I mean accessing the buffers. 2) The server also has background threads for slow tasks: database operations, heavy computations, and so on.
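To make point 2 of the update concrete, here is a minimal sketch of such a background worker (the class name BackgroundQueue and its interface are my own illustration, not the actual server code): slow tasks are queued and executed on a dedicated thread so the event-loop threads never wait on them.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// Illustrative background worker for slow tasks (database calls, heavy computation).
class BackgroundQueue {
public:
    BackgroundQueue() : worker_([this] { run(); }) {}
    ~BackgroundQueue() {
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_one();
        worker_.join();
    }
    void post(std::function<void()> task) {
        { std::lock_guard<std::mutex> lk(m_); tasks_.push(std::move(task)); }
        cv_.notify_one();
    }
private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return done_ || !tasks_.empty(); });
                if (done_ && tasks_.empty()) return;
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // the slow work happens here, off the event-loop threads
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool done_ = false;
    std::thread worker_;   // declared last so the queue members are ready before it starts
};
```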

+5
5 answers

You have a few problems here.

2) I have a thread pool. When the game client connects to the server, I create a Client instance and bind it to one of the threads from the pool. So we have a one-to-many relationship: one thread - many clients. Round-robin is used to select the thread to bind to.

You did not mention asynchronous I/O in any of your points, so I believe your true bottleneck here is not the number of threads but the fact that a thread blocks on I/O. Using asynchronous I/O (which is not the same as doing the I/O on another thread) would increase your server's throughput by orders of magnitude.

3) I use libev to manage all events inside the server. This means that when a Client instance receives data from the game client over the network, processes a request, or sends data back over the network, it occupies its thread. While it does this work, the other Clients bound to the same thread are blocked.

Again, without asynchronous I/O this is the classic slow server-side architecture (a-la Apache style). For maximum performance your threads should perform only CPU-bound work and should never wait on any I/O.
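One concrete piece of that: sockets should be non-blocking, so a read or send returns immediately instead of stalling the event-loop thread. A minimal sketch (error handling reduced to the essentials):

```cpp
#include <fcntl.h>
#include <sys/socket.h>
#include <cerrno>

// Put a socket into non-blocking mode so read()/send() return immediately
// instead of stalling the event-loop thread.
bool make_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    return flags != -1 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// Attempt a send; if the kernel buffer is full the call returns at once
// and the caller should wait for an EV_WRITE event instead of blocking.
ssize_t try_send(int fd, const char* data, size_t len) {
    ssize_t n = ::send(fd, data, len, 0);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;  // nothing sent yet; retry when the socket becomes writable
    return n;      // bytes sent, or -1 on a real error
}
```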

So the thread pool is the bottleneck of the application. To increase the number of simultaneous players who can play without lag, I need to increase the number of threads in the thread pool.

Wrong. Read about the C10K problem (10,000 concurrent connections).

Now the question. If I increase the number of threads in the thread pool, will server performance increase or decrease (context-switching overhead vs. clients blocked behind a busy thread)?

So, the rule of thumb that the number of threads should equal the number of cores is valid only when your threads perform purely CPU-bound work, are never blocked, and are 100% saturated with CPU tasks. If your threads can also be blocked by locks or I/O, that rule breaks down.
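A commonly cited sizing rule of thumb (popularized by Goetz's Java Concurrency in Practice) makes the same point numerically: the more time a task spends waiting, the more threads you need per core. A sketch of the arithmetic, assuming you have measured the wait/compute ratio of your tasks:

```cpp
#include <cstdio>
#include <thread>

// Rule-of-thumb pool size: threads = cores * target_utilization * (1 + wait_time / compute_time).
// With no waiting this collapses to "one thread per core".
unsigned suggested_pool_size(double target_utilization, double wait_time, double compute_time) {
    unsigned cores = std::thread::hardware_concurrency();
    if (cores == 0) cores = 1;  // hardware_concurrency() may be unable to detect the core count
    return static_cast<unsigned>(cores * target_utilization * (1.0 + wait_time / compute_time));
}

int main() {
    // Example: 24 cores, aim for full utilization, tasks wait 9x longer than they compute
    // (roughly matching threads that are busy only ~10% of the time) -> about 240 threads.
    std::printf("%u\n", suggested_pool_size(1.0, 9.0, 1.0));
}
```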

If we look at common server-side architectures, we can work out which design fits your case best.

Apache-style architecture:
a fixed-size thread pool, dispatching one thread to each connection from the connection queue. No asynchronous I/O.
Pros: none.
Cons: extremely poor throughput.

Nginx / Node.js architecture:
a single-threaded, multi-process application using asynchronous I/O.
Pros: a simple architecture that sidesteps multithreading issues. Great for servers serving static data.
Cons: if the processes have to share data, a huge amount of CPU time is burned on serializing, transferring, and deserializing it between processes. Also, a well-written multithreaded application could outperform it.

Modern .NET architecture:
a multithreaded application using asynchronous I/O.
Pros: if done right, performance can take off!
Cons: it is somewhat tricky to get a multithreaded application right and to use it without corrupting shared data.

So, to summarize, I think that in your particular case you should definitely use asynchronous I/O only, plus a thread pool with the number of threads equal to the number of cores.

If you are on Linux, Facebook's Proxygen can handle everything we have talked about (multithreaded code with asynchronous I/O) perfectly. Hey, Facebook uses it!

+2

Many factors can affect overall performance, including how much work each thread does per client, how much cross-thread communication is required, whether there is resource contention between threads, and so on. The best thing to do is:

  • Define the performance parameters you want to measure and make sure you have the tools to measure them - you mentioned latency, so you need a mechanism for measuring the worst-case lag and/or the distribution of lag across all clients on the server side (a measurement sketch follows this list).
  • Create a stress scenario. It can be as simple as a tool that replays the behavior of a real client or generates random behavior, but the closer it is to the real load, the better.
  • Run the server under that load, vary the number of threads (or change the design even more radically), and see which design or configuration gives the lowest latency.
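For the first point, here is a simplistic sketch of recording per-request latency on the server and reporting the worst case and 99th percentile (the LatencyRecorder class is purely illustrative; a real server would aggregate per thread):

```cpp
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <vector>

// Collect per-request handling times and report the worst case and the 99th percentile.
class LatencyRecorder {
public:
    void record(std::chrono::microseconds latency) { samples_.push_back(latency.count()); }

    void report() {
        if (samples_.empty()) return;
        std::sort(samples_.begin(), samples_.end());
        long long worst = samples_.back();
        long long p99   = samples_[samples_.size() * 99 / 100];
        std::printf("requests=%zu  p99=%lldus  worst=%lldus\n", samples_.size(), p99, worst);
    }
private:
    std::vector<long long> samples_;
};

// Usage: wrap each request handler.
//   auto t0 = std::chrono::steady_clock::now();
//   handle_request(...);
//   recorder.record(std::chrono::duration_cast<std::chrono::microseconds>(
//       std::chrono::steady_clock::now() - t0));
```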

This has the added benefit that you can use the same stress test together with a profiler to see whether you can squeeze more performance out of your implementation.

+1

The optimal number of threads is most often equal to either the number of cores in your machine or twice that. To achieve maximum throughput there should be as few points of contention between threads as possible; how much contention there is determines where the optimum falls between one and two threads per core.

I would recommend benchmarking to find out where you get optimal performance.

+1

Starting with one thread per core is a sensible idea.

In addition, in some cases measuring the WCET (worst-case execution time) is a way to determine which configuration is faster (cores do not always run at the same frequency). You can measure it easily with timers: take a timestamp at the start and the end of the function and subtract the two values to get the result in milliseconds.
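A minimal sketch of that timer-based measurement with std::chrono (the function being measured is a placeholder; this only gives the worst case observed during the run, not a true WCET bound):

```cpp
#include <chrono>
#include <cstdio>

// Measure how long a call takes: timestamp at the start and the end, subtract.
template <typename Fn>
long long time_call_ms(Fn&& fn) {
    auto start = std::chrono::steady_clock::now();
    fn();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
}

int main() {
    long long worst = 0;
    for (int i = 0; i < 100; ++i) {
        long long ms = time_call_ms([] { /* the code path being measured */ });
        if (ms > worst) worst = ms;   // keep the worst case observed so far
    }
    std::printf("observed worst case: %lld ms\n", worst);
}
```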

In my case I also had to watch power consumption, since it was an embedded system. Some tools let you measure CPU consumption and thus decide which configuration is the most attractive in that particular case.

+1

The optimal number of threads depends on how your clients use the CPU.

If the CPU is the only bottleneck and every core running a thread is always fully loaded, then setting the number of threads to the number of cores is a good idea.

If your clients do I/O (network, file, even page swapping) or any other operation that blocks the thread, then you need more threads, because some of them will be blocked even while a CPU is available.

In your scenario I would guess it is the second case. Your threads are being blocked: although 24 client events can be handled at once, they use less than 10% of the CPU (so the events processed by a thread waste about 90% of its CPU resource). If so, it would be reasonable to raise the number of threads to about 240 (number of cores * 100 / average load) so that another thread can run on an otherwise idle processor.

But be careful: if clients are tied to a single thread (thread A processes clients 1, 2, 3 and thread B processes clients 4, 5, 6), increasing the thread pool helps, but there can still be sporadic delays when two client events have to be processed by the same thread.

+1

Source: https://habr.com/ru/post/1247495/

