What the hell, I'll go ahead and write the answer anyway.
I will ignore the "asynchronous" and "non-blocking" terminology, because I don't think it is relevant to your question.
You are worried about performance when handling thousands of network clients, and you are right to worry: you have rediscovered the C10K problem. When the Web was young, people saw the need for a small number of fast servers to handle a large number of (relatively) slow clients. The existing select/poll interfaces require linear scans, both in the kernel and in user space, across all sockets to determine which ones are ready. If many sockets are often idle, your server can spend more time figuring out what to do than doing the actual work.
Fast forward to today, where we have basically two approaches to solve this problem:
1) Use one thread per socket and simply block on reads and writes. This is usually the simplest code, and in my opinion modern operating systems can keep idle threads parked without significant overhead. In my experience this approach works very well for hundreds of clients; I can't say how it scales to thousands.
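A minimal sketch of this first approach, in Python for brevity: one daemon thread per connection, each blocking on `recv`/`send` while the OS parks it. The names `handle_client` and `serve` are mine, not from any library.

```python
import socket
import threading

def handle_client(conn):
    """Serve one client: block on recv/send until it disconnects."""
    with conn:
        while True:
            data = conn.recv(4096)   # thread sleeps here while the socket is idle
            if not data:
                break
            conn.sendall(data)       # echo the data back

def serve(host="127.0.0.1", port=0):
    """Start an echo server; return the port the OS assigned."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, port))
    srv.listen()

    def accept_loop():
        # One blocking thread per accepted connection.
        while True:
            conn, _ = srv.accept()
            threading.Thread(target=handle_client, args=(conn,),
                             daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return srv.getsockname()[1]
```

The appeal is that each handler reads like straight-line sequential code; the cost is one thread (stack, scheduler entry) per client, which is what starts to hurt at very large client counts.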
2) Use one of the platform interfaces that were introduced to solve the C10K problem: epoll (Linux), kqueue (BSD/Mac), or I/O completion ports (Windows). (If you think epoll is the same as poll, look again.) All of them notify your application only about sockets that are actually ready, avoiding a wasteful linear scan over idle connections. Several libraries make these platform interfaces easier to use, including libevent, libev, and Boost.Asio. You will find that they all ultimately call epoll on Linux, kqueue on BSD, and so on, when those interfaces are available.
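To sketch the second approach without dragging in libevent or Boost.Asio, here is a tiny single-threaded echo server using Python's `selectors` module, which itself dispatches to epoll on Linux, kqueue on BSD/macOS, and so on. The names `run_once`, `accept`, and `echo` are mine; the `sendall` on a ready socket is a simplification that ignores send-buffer backpressure.

```python
import selectors
import socket

# DefaultSelector picks the best platform interface:
# EpollSelector on Linux, KqueueSelector on BSD/macOS, etc.
sel = selectors.DefaultSelector()

def accept(srv):
    conn, _ = srv.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, echo)

def echo(conn):
    data = conn.recv(4096)
    if data:
        conn.sendall(data)      # sketch: assumes the send buffer has room
    else:
        sel.unregister(conn)
        conn.close()

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen()
srv.setblocking(False)
sel.register(srv, selectors.EVENT_READ, accept)

def run_once(timeout=None):
    # The kernel hands back only the sockets that are ready;
    # no linear scan over idle connections.
    for key, _ in sel.select(timeout):
        key.data(key.fileobj)   # dispatch to accept() or echo()
```

One thread now serves every connection; the trade-off is that each handler must be written as a small non-blocking step driven by the event loop, rather than as sequential blocking code.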