Performance tuning for Netty 4.1 on a Linux machine

I am building a messaging application and am using Netty 4.1 Beta3 to develop my server; the server speaks the MQTT protocol.

This is my MqttServer.java class, which sets up a Netty server and binds it to a specific port.

    EventLoopGroup bossPool = new NioEventLoopGroup();
    EventLoopGroup workerPool = new NioEventLoopGroup();
    try {
        ServerBootstrap boot = new ServerBootstrap();
        boot.group(bossPool, workerPool);
        boot.channel(NioServerSocketChannel.class);
        boot.childHandler(new MqttProxyChannel());
        boot.bind(port).sync().channel().closeFuture().sync();
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        workerPool.shutdownGracefully();
        bossPool.shutdownGracefully();
    }
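
MqttProxyChannel itself is not shown above. The following is only a minimal sketch of what such a child handler could look like, assuming the MQTT codec shipped with Netty 4.1 (io.netty.handler.codec.mqtt) is available and using a hypothetical MqttServerHandler for the application logic:

    import io.netty.channel.ChannelInitializer;
    import io.netty.channel.socket.SocketChannel;
    import io.netty.handler.codec.mqtt.MqttDecoder;
    import io.netty.handler.codec.mqtt.MqttEncoder;

    public class MqttProxyChannel extends ChannelInitializer<SocketChannel> {
        @Override
        protected void initChannel(SocketChannel ch) {
            // Decode inbound bytes into MqttMessage objects and encode outbound replies.
            ch.pipeline().addLast("decoder", new MqttDecoder());
            ch.pipeline().addLast("encoder", MqttEncoder.INSTANCE);
            // Placeholder for the application's own MQTT handling (CONNECT, PUBLISH, ...).
            ch.pipeline().addLast("handler", new MqttServerHandler());
        }
    }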

Now I have performed stress testing of my application on my Mac with the following configuration: [screenshot of the Mac's hardware configuration]

Netty's performance was exceptional. I looked at jstack while my code was running and found that Netty NIO had spawned about 19 threads, and none of them seemed to be stuck waiting on channels or anything else.

Then I ran my code on a Linux machine:

[screenshot of the Linux machine's configuration]

This is a dual-core machine with 15 GB of memory. The problem is that packets sent by my MQTT client seem to take a long time to get through the Netty pipeline, and when I took a jstack dump, I found that there were only about 5 Netty threads and they were all stuck like this:

  ."nioEventLoopGroup-3-4" #112 prio=10 os_prio=0 tid=0x00007fb774008800 nid=0x2a0e runnable [0x00007fb768fec000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) - locked <0x00000006d0fdc898> (a io.netty.channel.nio.SelectedSelectionKeySet) - locked <0x00000006d100ae90> (a java.util.Collections$UnmodifiableSet) - locked <0x00000006d0fdc7f0> (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:621) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:309) at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:834) at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) at java.lang.Thread.run(Thread.java:745) 

Is this some kind of performance issue related to epoll on the Linux machine? If so, what changes should be made to the Netty configuration to eliminate it or to improve performance?
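
For reference, the only Netty-level configuration switch I am aware of in this area is the native epoll transport. The snippet below is just a sketch of that substitution, assuming the netty-transport-native-epoll dependency (with the linux-x86_64 classifier) is on the classpath; I have not confirmed that it addresses this:

    // Requires netty-transport-native-epoll; classes live in io.netty.channel.epoll.
    EventLoopGroup bossPool = new EpollEventLoopGroup();
    EventLoopGroup workerPool = new EpollEventLoopGroup();

    ServerBootstrap boot = new ServerBootstrap();
    boot.group(bossPool, workerPool);
    boot.channel(EpollServerSocketChannel.class); // instead of NioServerSocketChannel
    boot.childHandler(new MqttProxyChannel());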

Edit

Java version on the local system:

java version "1.8.0_40" Java (TM) SE Runtime Environment (build 1.8.0_40-b27) Java HotSpot (TM) 64-bit server VM (build 25.40-b25, mixed mode)

Java version on the AWS machine:

    openjdk version "1.8.0_40-internal"
    OpenJDK Runtime Environment (build 1.8.0_40-internal-b09)
    OpenJDK 64-Bit Server VM (build 25.40-b13, mixed mode)

1 answer

Play around with the worker thread count to see if it improves performance. The default NioEventLoopGroup() constructor creates the default number of event loop threads:

    DEFAULT_EVENT_LOOP_THREADS = Math.max(1, SystemPropertyUtil.getInt(
            "io.netty.eventLoopThreads", Runtime.getRuntime().availableProcessors() * 2));

As you can see, you can set io.netty.eventLoopThreads as a JVM start argument (for example -Dio.netty.eventLoopThreads=8), but I usually don't.

You can also pass the number of threads to the NioEventLoopGroup(int) constructor.

In our environment, we have network servers that receive messages from hundreds of clients. Usually a single boss thread is enough to handle the connections. We did, however, need to limit the number of worker threads. We use this:

    private final static int BOSS_THREADS = 1;
    private final static int MAX_WORKER_THREADS = 12;

    EventLoopGroup bossGroup = new NioEventLoopGroup(BOSS_THREADS);
    EventLoopGroup workerGroup = new NioEventLoopGroup(calculateThreadCount());

    private int calculateThreadCount() {
        int threadCount;
        if ((threadCount = SystemPropertyUtil.getInt("io.netty.eventLoopThreads", 0)) > 0) {
            return threadCount;
        } else {
            threadCount = Runtime.getRuntime().availableProcessors() * 2;
            return threadCount > MAX_WORKER_THREADS ? MAX_WORKER_THREADS : threadCount;
        }
    }

So in our case we use only one boss thread. The worker thread count comes from the start argument if one is given; otherwise it is cores * 2, but capped at 12.

You will have to test for yourself, though, which numbers work best in your environment.
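
Applied to the bootstrap from the question, only the construction of the two groups changes; BOSS_THREADS and calculateThreadCount() are the definitions shown above:

    // Bounded thread counts instead of the no-argument constructors.
    EventLoopGroup bossPool = new NioEventLoopGroup(BOSS_THREADS);
    EventLoopGroup workerPool = new NioEventLoopGroup(calculateThreadCount());
    ServerBootstrap boot = new ServerBootstrap();
    boot.group(bossPool, workerPool);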


Source: https://habr.com/ru/post/987636/

