How to quickly read data passing through a 10GbE network adapter?

I have two Debian boxes connected back-to-back with a CX4 cable between two 10 GbE cards. One of them will generate data very quickly (between 4 Gbit/s and 16 Gbit/s), and the other should be able to capture all of it and store it in RAM for later parsing. I am new to this kind of low-level coding and would be happy to hear any ideas about which broad approach to use (do I need DMA? RDMA?), or any tips and tricks that might apply. Thanks!

+4
7 answers

If you want to process 1 GB of traffic per second continuously, you need a very wide bus and very fast processing; my experience comes from NIDS (network intrusion detection). You need specialized hardware just to keep up with NIDS processing of 100 MB/s (1 Gigabit Ethernet) of data, and 10 Gb is another universe entirely. RAM will not save you either, because you can fill a gigabyte in 5-10 seconds, and 1 GB holds a great many requests.

If you are trying to do some form of business or web processing over 10 gigabit, you will probably have to put a load balancer that can handle 10 Gb of traffic in front of it.

P.S. I should clarify that NIDS is 1:1 - all traffic is processed on the machine that sees it, i.e. in the worst case you process every byte on that one machine; whereas business/web processing is 1:many - many machines, so the bytes to be processed are spread across them.

-- edit --

Now that you mention that there are gaps between data deliveries (no standard 10 Gb NIC can sustain a full 10 Gb anyway), we need to know what the processing involves before we can make a suggestion.

-- edit 2 --

Berkeley DB (a database with a simple data model) performs like an enterprise database (in terms of transaction rate) when multiple threads are used. If you want to write to disk at high speed, you should probably look into it. You will probably also need to set up RAID to increase throughput; RAID 0+1 gives the best combination of throughput and I/O protection.
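
For illustration only (not part of the original answer), a minimal sketch of writing fixed-size records through the Berkeley DB C API might look like the following; the database file name "capture.db" and the record layout are invented for the example:

```c
/* Minimal Berkeley DB write sketch (illustrative only).
 * Build with: gcc bdb_write.c -ldb
 */
#include <db.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    DB *dbp;
    DBT key, data;
    uint64_t seq = 0;
    char payload[4096] = {0};          /* pretend this came off the wire */

    if (db_create(&dbp, NULL, 0) != 0) {
        fprintf(stderr, "db_create failed\n");
        return 1;
    }
    /* "capture.db" is a hypothetical file name for this example. */
    if (dbp->open(dbp, NULL, "capture.db", NULL, DB_BTREE, DB_CREATE, 0664) != 0) {
        fprintf(stderr, "open failed\n");
        return 1;
    }

    memset(&key, 0, sizeof(key));
    memset(&data, 0, sizeof(data));
    key.data = &seq;                   /* monotonically increasing key */
    key.size = sizeof(seq);
    data.data = payload;
    data.size = sizeof(payload);

    if (dbp->put(dbp, NULL, &key, &data, 0) != 0)
        fprintf(stderr, "put failed\n");

    dbp->close(dbp, 0);
    return 0;
}
```

In a real capture pipeline each writer thread would run a loop of such puts; the claim above is that the aggregate transaction rate scales with the number of threads.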

+2

The only NICs I have heard of for ordinary PCs that can move a saturated 10GbE stream into user space for further processing are the ones made by Napatech - you have to use their proprietary API.

And you would do well to put such a card in a fairly beefy server with a bus that can support that speed (I would probably stay away from any nVidia chipsets for such a box).

+4

Before planning any special programming, you should do some testing to find out how much you can handle with a vanilla system. Set up a data file and a sending process on the producer machine and a simple receiver/parser on the consumer machine, then do a bunch of profiling: where do you run into problems? Can you throw better hardware at it, or can you tune your processing to go faster?
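
As a hedged sketch of such a sending process (not from the answer; the destination address, port, and file name are placeholders), a trivial TCP sender that streams a test file as fast as the stack allows could look like this:

```c
/* Minimal TCP sender sketch for throughput testing (illustrative only).
 * Streams a file to the consumer machine as fast as the stack allows.
 */
#include <arpa/inet.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    /* Placeholder address/port/file - adjust for your setup. */
    const char *dest_ip   = "192.168.1.2";
    const int   dest_port = 9000;
    const char *path      = "testdata.bin";

    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    fstat(fd, &st);

    int sock = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(dest_port);
    inet_pton(AF_INET, dest_ip, &addr.sin_addr);

    if (connect(sock, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }

    /* sendfile() avoids an extra copy through user space. */
    off_t off = 0;
    while (off < st.st_size) {
        ssize_t n = sendfile(sock, fd, &off, st.st_size - off);
        if (n <= 0) { perror("sendfile"); break; }
    }

    close(sock);
    close(fd);
    return 0;
}
```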

Also make sure you start with a hardware platform that can support the data rate you expect. For example, if you are working with something like the Intel 82598EB NIC, make sure you plug it into a PCIe 2.0 slot, preferably an x16 slot, to get full bandwidth from the network board to the chipset.

There are ways to tune the NIC driver settings to suit your data stream and get the most out of the hardware. For example, make sure you use jumbo frames on the link to minimize TCP overhead. You can also play with the driver's interrupt throttle rate to speed up low-level processing.
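
For illustration, and assuming a Linux box: the jumbo-frame MTU can be set with `ip link set eth0 mtu 9000`, or programmatically as sketched below; the interface name "eth0" is a placeholder:

```c
/* Enable jumbo frames by raising the interface MTU (illustrative only).
 * Equivalent to `ip link set eth0 mtu 9000`; needs root privileges.
 */
#include <net/if.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0) { perror("socket"); return 1; }

    struct ifreq ifr;
    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);  /* placeholder interface */
    ifr.ifr_mtu = 9000;                           /* typical jumbo-frame MTU */

    if (ioctl(sock, SIOCSIFMTU, &ifr) < 0)
        perror("SIOCSIFMTU");

    close(sock);
    return 0;
}
```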

Is the processing of your data set parallelizable? If you have one task flushing data into memory, can you set up a few more tasks to process chunks of it at the same time? That would let you take advantage of multi-core processors.
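
A rough, hedged sketch of that structure (nothing here is from the original answer; the NIC read and the parsing are stubbed out): one reader thread fills chunks and a small pool of worker threads parses them in parallel:

```c
/* Sketch: one reader thread pulls data off the wire and hands fixed-size
 * chunks to worker threads for parallel parsing (illustrative only).
 */
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE  (1 << 20)      /* 1 MiB per chunk */
#define QUEUE_MAX   64             /* bound on chunks in flight */
#define NUM_WORKERS 4

struct chunk {
    struct chunk *next;
    size_t len;
    uint8_t buf[CHUNK_SIZE];
};

static struct chunk *q_head, *q_tail;      /* simple FIFO of ready chunks */
static size_t q_len;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;

/* Stub: in the real program this would be recv() on the 10GbE socket. */
static size_t read_from_nic(uint8_t *buf, size_t max) { memset(buf, 0, max); return max; }

/* Stub: whatever per-chunk parsing the application needs. */
static void parse_chunk(const uint8_t *buf, size_t len) { (void)buf; (void)len; }

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (q_head == NULL)
            pthread_cond_wait(&not_empty, &lock);
        struct chunk *c = q_head;          /* pop one ready chunk */
        q_head = c->next;
        if (q_head == NULL) q_tail = NULL;
        q_len--;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&lock);

        parse_chunk(c->buf, c->len);       /* heavy work runs outside the lock */
        free(c);
    }
    return NULL;
}

int main(void)
{
    pthread_t tids[NUM_WORKERS];
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_create(&tids[i], NULL, worker, NULL);

    for (;;) {                             /* reader loop */
        struct chunk *c = malloc(sizeof(*c));
        if (!c) break;
        c->next = NULL;
        c->len = read_from_nic(c->buf, CHUNK_SIZE);

        pthread_mutex_lock(&lock);
        while (q_len >= QUEUE_MAX)         /* back-pressure on the reader */
            pthread_cond_wait(&not_full, &lock);
        if (q_tail) q_tail->next = c; else q_head = c;
        q_tail = c;
        q_len++;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&lock);
    }
    return 0;
}
```

The bounded queue keeps memory use predictable; if the workers cannot keep up, the reader blocks rather than exhausting RAM.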

Finally, if that is not enough, use the profiling/timing data you collected to find the parts of the system you can tune for better performance. Don't just assume you know where to tweak: back it up with real data - you might be surprised.

+2

Well, you will need money. One approach could be to buy a load balancer to split the incoming data across two machines and then consolidate the results in one database.

+1

Since you have some aspects that simplify the problem (a steady point-to-point link between two machines, no intermediate processing), I would start with the trivial, obvious approach: a single TCP stream between the systems, writing the data out with write(). Then measure the performance and profile it to determine where the bottlenecks are.
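
As a hedged sketch of what that trivial receiver could look like (the port and the 4 GiB capture size are placeholders; here the data is kept in RAM, as the question asked, rather than written to disk):

```c
/* Trivial single-stream TCP receiver sketch (illustrative only).
 * Accepts one connection and appends everything it receives into RAM.
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define LISTEN_PORT 9000                       /* placeholder port */
#define CAPTURE_MAX ((size_t)4 << 30)          /* 4 GiB capture buffer */

int main(void)
{
    int srv = socket(AF_INET, SOCK_STREAM, 0);
    int one = 1;
    setsockopt(srv, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(LISTEN_PORT);
    if (bind(srv, (struct sockaddr *)&addr, sizeof(addr)) < 0) { perror("bind"); return 1; }
    listen(srv, 1);

    int conn = accept(srv, NULL, NULL);
    if (conn < 0) { perror("accept"); return 1; }

    /* Pre-allocate the capture buffer so no allocation happens on the hot path. */
    uint8_t *capture = malloc(CAPTURE_MAX);
    if (!capture) { perror("malloc"); return 1; }

    size_t used = 0;
    while (used < CAPTURE_MAX) {
        ssize_t n = recv(conn, capture + used, CAPTURE_MAX - used, 0);
        if (n <= 0) break;                     /* EOF or error */
        used += (size_t)n;
    }
    printf("captured %zu bytes\n", used);

    /* ... hand `capture`/`used` off to the parser here ... */
    free(capture);
    close(conn);
    close(srv);
    return 0;
}
```

Profile this baseline first; only reach for DMA/RDMA or a vendor API if the measurements show the plain socket path cannot keep up.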

As a starting point, read about the C10K problem (10,000 concurrent connections), which is what most high-performance servers are designed around. It should give you a good grounding in high-performance server issues. Of course, you don't have to worry about select/poll/epoll for handling new connections, which is a major simplification.

+1

I think the latest Linux kernels can get packets from the NIC into the kernel at 10 Gb, but I doubt there is an efficient way to copy that data into user space, even on an i7/Xeon 5500 platform.

0

What seems to be forgotten: if the network cards are 10 Gb and you are worried about the receiver, you can relax (relatively speaking): even if the source is able to generate data at that rate, it will have just as much trouble getting the data onto the 10 Gb line as the receiver has getting it off the line and into RAM.

And if the network cards are rated at 10 Gb, that only means the bits are clocked at that speed; it says nothing about the time between individual packets, and we have not even started talking about protocols yet.

I suspect this question is no longer relevant for the OP, but if you face such a task, start with a straightforwardly programmed solution so you can judge how much of a speed-up your special case actually requires (your case is always special ;-)

0

Source: https://habr.com/ru/post/1301146/
