.NET sockets vs. high-performance C++ sockets

My question is meant to settle an argument with my colleagues over C++ vs. C#.

We have implemented a server that receives a large number of UDP streams. This server was developed in C++ using asynchronous sockets and overlapped I/O with completion ports. We use 5 completion ports with 5 threads. This server can easily handle 500 Mbps of incoming traffic on a gigabit network without packet loss or errors (we did not push our tests beyond 500 Mbps).

We tried to re-implement the same kind of server in C#, and we were not able to achieve the same incoming bandwidth. We receive asynchronously using the ReceiveAsync method and a pool of SocketAsyncEventArgs objects to avoid the overhead of creating a new object for each receive. Each SAEA has its own buffer assigned to it, so we do not need to allocate memory for each receive. The pool is very, very large, so we can queue more than 100 receive requests. This server cannot handle more than 240 Mbps of incoming bandwidth; beyond that limit, we lose packets in our UDP streams.
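The receive path described above can be sketched as follows. This is a minimal illustration, not our production code: the class and method names (`PooledUdpReceiver`, `Post`), the port, and the pool/datagram sizes are all invented for the example.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading;

class PooledUdpReceiver
{
    readonly Socket _socket;
    readonly CountdownEvent _received;

    public PooledUdpReceiver(int port, int poolSize, int datagramSize, CountdownEvent received)
    {
        _received = received;
        _socket = new Socket(AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp);
        _socket.Bind(new IPEndPoint(IPAddress.Loopback, port));

        for (int i = 0; i < poolSize; i++)
        {
            var args = new SocketAsyncEventArgs();
            args.SetBuffer(new byte[datagramSize], 0, datagramSize); // buffer assigned once, reused forever
            args.RemoteEndPoint = new IPEndPoint(IPAddress.Any, 0);
            args.Completed += OnCompleted;
            Post(args);
        }
    }

    void Post(SocketAsyncEventArgs args)
    {
        // ReceiveFromAsync returns false when the operation completed
        // synchronously; the Completed event will not fire in that case.
        if (!_socket.ReceiveFromAsync(args))
            OnCompleted(_socket, args);
    }

    void OnCompleted(object sender, SocketAsyncEventArgs args)
    {
        if (args.SocketError == SocketError.Success && args.BytesTransferred > 0)
            _received.Signal();       // a real server would process args.Buffer here
        Post(args);                   // re-post the same pooled SAEA: no per-receive allocation
    }
}

class Program
{
    public static void Main()
    {
        const int port = 45000, packets = 5;
        var received = new CountdownEvent(packets);
        var receiver = new PooledUdpReceiver(port, poolSize: 8, datagramSize: 1500, received: received);

        using (var sender = new UdpClient())
        {
            var payload = new byte[100];
            for (int i = 0; i < packets; i++)
                sender.Send(payload, payload.Length, new IPEndPoint(IPAddress.Loopback, port));
        }

        bool ok = received.Wait(TimeSpan.FromSeconds(5));
        Console.WriteLine(ok ? "received all " + packets + " datagrams" : "timed out");
    }
}
```

The key point is that the same SocketAsyncEventArgs objects and buffers cycle endlessly through Post/Completed, so the steady state allocates nothing per datagram.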

My question is this: should I expect the same performance from C++ sockets and C# sockets? My view is that performance should be the same if memory is managed properly in .NET.

Side question: does anyone know a good article/link explaining how .NET sockets use I/O completion ports under the hood?

+42
c# sockets io-completion-ports
Dec 11 '11 at 16:38
3 answers

Does anyone know a good article/link explaining how .NET sockets use I/O completion ports under the hood?

I suspect that the only reference will be the implementation (i.e. Reflector or another decompiler). With that, you will find that all asynchronous I/O goes through an I/O completion port, with callbacks processed on the I/O thread pool (which is separate from the regular thread pool).
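You can observe this separate I/O pool from managed code. A quick diagnostic snippet (illustrative only):

```csharp
using System;
using System.Threading;

class Program
{
    public static void Main()
    {
        // The CLR keeps two pools: worker threads for queued work items, and a
        // separate set of I/O completion threads that service the completion
        // port used by asynchronous sockets, files, and so on.
        ThreadPool.GetMaxThreads(out int workerThreads, out int completionPortThreads);
        Console.WriteLine($"worker threads: {workerThreads}, I/O completion threads: {completionPortThreads}");
    }
}
```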

use 5 completion ports

I would expect to use one completion port processing all I/O on a single thread pool, with one thread servicing the completion port (assuming you perform all other I/O, including disk I/O, asynchronously as well).

Multiple completion ports would make sense if you have some form of prioritization.

My question is this: should I expect the same performance using C++ sockets and C# sockets?

Yes and no, depending on how narrowly you define the "using ... sockets" part. For the span from the start of the asynchronous operation until the completion is posted to the completion port, I would not expect a significant difference (all that processing happens in the Win32 API or the Windows kernel).

However, the safety that the .NET runtime provides adds some overhead. For example, buffer lengths will be checked, delegates invoked, and so on. If the application is CPU-bound, this is likely to make a difference, and many small differences can easily add up.
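For instance, every array access in C# carries a bounds check that a raw C++ pointer read would not. A trivial demonstration:

```csharp
using System;

class Program
{
    public static void Main()
    {
        var buffer = new byte[1500];
        try
        {
            byte b = buffer[1500];            // one past the end
            Console.WriteLine(b);             // never reached
        }
        catch (IndexOutOfRangeException)
        {
            // The runtime inserted a bounds check; unchecked native code
            // would have silently read past the array instead.
            Console.WriteLine("bounds check fired");
        }
    }
}
```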

Also, the .NET version is occasionally paused for GC (.NET 4.5 performs collections in the background, so this will improve in the future). There are techniques to minimize garbage (for example, reusing objects rather than creating them, and using structs while avoiding boxing).
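One such technique is a simple object pool. A sketch (single-threaded for brevity; a real server would need a thread-safe variant, e.g. built on ConcurrentStack — the `Pool` and `Packet` names here are invented for illustration):

```csharp
using System;
using System.Collections.Generic;

// Reuse objects rather than allocate: rented objects come back from the pool
// instead of the GC heap whenever one is available.
class Pool<T> where T : class, new()
{
    readonly Stack<T> _items = new Stack<T>();

    public T Rent() => _items.Count > 0 ? _items.Pop() : new T();

    public void Return(T item) => _items.Push(item);
}

class Packet
{
    public byte[] Data = new byte[1500];
}

class Program
{
    public static void Main()
    {
        var pool = new Pool<Packet>();
        var first = pool.Rent();   // pool is empty, so this allocates
        pool.Return(first);
        var second = pool.Rent();  // reused: no new allocation, no GC pressure
        Console.WriteLine(ReferenceEquals(first, second)); // True
    }
}
```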

After all, if the C++ version works and meets your performance needs, why port it?

+7
Dec 11 '11 at 17:04

You cannot do a direct port of the code from C++ to C# and expect the same performance. .NET does much more than C++ when it comes to memory management (GC) and making sure your code is safe (bounds checking, etc.).

I would allocate one large buffer for all I/O operations (e.g. 65535 × 500 = 32,767,500 bytes) and then assign a chunk of it to each SocketAsyncEventArgs (and likewise for send operations). Memory is cheaper than CPU. Use a buffer manager/factory to provide chunks for all connections and I/O operations (the Flyweight pattern). Microsoft does this in its Async example.
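A sketch of such a buffer manager, along the lines of the BufferManager in Microsoft's sample (the class name `SliceBufferManager` is invented here; the sizes match the 65535 × 500 figure above):

```csharp
using System;
using System.Collections.Generic;
using System.Net.Sockets;

class SliceBufferManager
{
    readonly byte[] _buffer;
    readonly int _chunkSize;
    readonly Stack<int> _freeOffsets = new Stack<int>();

    public SliceBufferManager(int chunkSize, int chunkCount)
    {
        _chunkSize = chunkSize;
        _buffer = new byte[checked(chunkSize * chunkCount)]; // one allocation for all I/O
        for (int i = chunkCount - 1; i >= 0; i--)
            _freeOffsets.Push(i * chunkSize);
    }

    public bool AssignBuffer(SocketAsyncEventArgs args)
    {
        if (_freeOffsets.Count == 0)
            return false;                                   // pool exhausted
        args.SetBuffer(_buffer, _freeOffsets.Pop(), _chunkSize);
        return true;
    }

    public void ReleaseBuffer(SocketAsyncEventArgs args)
    {
        _freeOffsets.Push(args.Offset);                     // hand the chunk back
        args.SetBuffer(null, 0, 0);
    }
}

class Program
{
    public static void Main()
    {
        var manager = new SliceBufferManager(chunkSize: 65535, chunkCount: 500);
        var args = new SocketAsyncEventArgs();
        manager.AssignBuffer(args);
        Console.WriteLine($"offset={args.Offset} count={args.Count}");
    }
}
```

Because every SAEA points into the same pinned-once array, the GC never has to pin hundreds of small buffers during overlapped I/O, and the heap does not fragment.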

Both the Begin/End and the Async methods use I/O completion ports in the background. The latter do not need to allocate an object for each operation, which improves performance.

+5
Dec 11 '11 at 18:25

I assume you are not seeing the same performance because .NET and C++ are actually doing different things. Your C++ code may not be as safe, and may not check bounds. Also, are you measuring only the ability to receive packets, without any processing? Or does your throughput include packet processing time? If so, the code you wrote to process the packets may not be as efficient.

I would suggest using a profiler to check where the time is spent and trying to optimize that. The actual socket code should be quite efficient.

+1
Dec 11 '11 at 17:29
