You can watch this BoostCon talk by the guys from Yandex: Boost.Asio network server optimization
My gut feeling is that they (the guys from Yandex) over-engineered it (by quite a bit ...). I would say the primary solution is to use pre-allocated, fixed-size buffers (possibly one set per thread) and to use Asio's MutableBufferSequence concept to glue them together.
This approach is known as scatter-gather and is briefly described in the Asio docs. An example can be found here: http://www.boost.org/doc/libs/1_56_0/doc/html/boost_asio/examples/cpp11_examples.html#boost_asio.examples.cpp11_examples.buffers
As @Nim already noted, Asio works in "zero-copy" mode by default (it never owns a buffer and does not allocate on behalf of the caller), so this should be quite simple to get working. Of course, whether the underlying kernel / libc functions are themselves zero-copy depends entirely on the OS / platform.