Yes, your reasoning is mostly correct. You would create one thread per core, an io_service instance for each thread, and call io_service.run() on each thread.
However, the question is whether you really want to. These are the problems I see:
Depending on how the work is balanced across your connections, you may end up with some cores very busy and others idling. Micro-optimizing for cache locality per core may mean that you lose the ability to run work on an idle core when the "optimal" core isn't ready.
At socket speeds (i.e. slow), how much do you actually gain from CPU cache hits? If a single connection requires enough CPU to keep a core busy, and you have only as many connections as cores, then fine. Otherwise, the inability to move work around to cope with variations in the workload can negate any win obtained from cache hits. And if each thread does many different jobs, the cache won't stay hot anyway.
If you are just doing I/O, the cache win may not be that big regardless. It depends on your actual workload.
My recommendation would be to have one io_service instance and call io_service.run() on one thread per core. If you get inadequate performance, or you have connection classes with high per-connection CPU usage where a cache win is plausible, move those to dedicated io_service instances.
This is the case where you need to profile to find out how much cache misses are actually costing you, and where.