Java Sockets over RDMA (JSOR) vs. jVerbs performance on InfiniBand

I have a basic understanding of both JSOR and jVerbs.

Both work around JNI overhead and use fast paths to reduce latency. Both rely on the user-space RDMA Verbs interface to avoid context switches and provide fast-path access to the hardware. Both also offer zero-copy transfer options.

The difference is that JSOR keeps the standard Java socket interface, while jVerbs provides a new one. jVerbs also has something called stateful verb calls (SVCs) to avoid re-serializing RDMA requests on every call, which the authors say reduces latency. jVerbs exposes a more native interface that applications use directly. I read the jVerbs SoCC 2013 paper, where they build jverbsRPC on top of jVerbs and show that it significantly reduces the latency of ZooKeeper and memcached operations.
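
For a feel of what the SVC pattern looks like in code, here is a minimal sketch. I use DiSNI-style names (DiSNI is the open-sourced successor of jVerbs, mentioned in an answer below), since jVerbs itself ships only with IBM's JDK; the package and method names follow recent DiSNI releases, and the jVerbs equivalents may differ slightly.

import java.util.LinkedList;
import com.ibm.disni.RdmaEndpoint;
import com.ibm.disni.verbs.IbvSendWR;
import com.ibm.disni.verbs.SVCPostSend;

// Sketch of the stateful-verb-call idea: the work-request list is
// marshalled into a native call object once, and re-executing that
// object skips the per-call serialization.
public final class SvcSketch {
    static void sendLoop(RdmaEndpoint endpoint,
                         LinkedList<IbvSendWR> wrList,
                         int iterations) throws Exception {
        SVCPostSend postSend = endpoint.postSend(wrList); // serialize once
        for (int i = 0; i < iterations; i++) {
            postSend.execute();   // fast path: no re-marshalling
            // ...poll the completion queue here before posting again...
        }
        postSend.free();          // release the native resources
    }
}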

The documentation for both shows that they perform better than regular Java sockets over TCP/IP, SDP, and IPoIB.

I have not found any performance comparison between JSOR and jVerbs. I suspect jVerbs performs better, but with JSOR I don't need to change my existing code, since it keeps the standard Java socket interface. My question is: what performance gain can I expect from jVerbs relative to JSOR? Does anyone know, or have experience working with both? Comparison data would be great; I could not find any.

2 answers

Here are a few numbers using DiSNI, the recently open-sourced successor to IBM's jVerbs, and DaRPC, a low-latency RPC library built on DiSNI.

  • DiSNI RDMA read latencies for 64 bytes are below 2 microseconds
  • DaRPC RDMA send/recv round trips for 64 bytes (request and response) take about 5 microseconds
  • The differences between basic RDMA in Java/DiSNI and in C are negligible for one-sided operations

These tests were run on two hosts connected with Mellanox ConnectX-3 network interfaces.

Here are the commands to run the tests:

(1) Read benchmark

Server:

java -cp disni-1.0-jar-with-dependencies.jar:disni-1.0-tests.jar com.ibm.disni.examples.benchmarks.AppLauncher -t java-rdma-server -a <address> -o read -s 64 -k 100000 -p 

Client:

 java -cp disni-1.0-jar-with-dependencies.jar:disni-1.0-tests.jar com.ibm.disni.examples.benchmarks.AppLauncher -t java-rdma-client -a <address> -o read -s 64 -k 100000 -p 

(2) Send/recv benchmark

Server:

 java -cp darpc-1.0-jar-with-dependencies.jar:darpc-1.0-tests.jar com.ibm.darpc.examples.server.DaRPCServer -a <address> -d -l 64 -r 64 

Client:

 java -cp darpc-1.0-jar-with-dependencies.jar:darpc-1.0-tests.jar com.ibm.darpc.examples.client.DaRPCClient -a <address> -k 1000000 -l 64 -r 64 -b 1 
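
For context, the client side of a DaRPC call looks roughly like the sketch below. This is adapted from the DaRPC examples from memory, so treat the class names (DaRPCClientEndpoint, DaRPCStream, DaRPCFuture, RdmaRpcRequest/Response) and signatures as approximate rather than exact.

// Issuing a single RPC over an already-connected DaRPC endpoint;
// the request/response types are the user-defined message types from
// the DaRPC examples. Busy-polling on the future keeps latency low.
static void oneCall(DaRPCClientEndpoint<RdmaRpcRequest, RdmaRpcResponse> endpoint)
        throws Exception {
    DaRPCStream<RdmaRpcRequest, RdmaRpcResponse> stream = endpoint.createStream();
    RdmaRpcRequest request = new RdmaRpcRequest();
    RdmaRpcResponse response = new RdmaRpcResponse();
    DaRPCFuture<RdmaRpcRequest, RdmaRpcResponse> future =
            stream.request(request, response, false);   // post the request
    while (!future.isDone()) {
        // spin until the response has arrived
    }
}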



Comparing jVerbs performance with JSOR is a bit tricky. The former is a message-oriented API, while the latter hides RDMA behind the standard Java socket streams.
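
To make that concrete: a JSOR benchmark is ordinary java.net socket code like the client below; whether the connection actually uses RDMA is decided by JVM configuration (on IBM's JDK, as far as I recall, a -Dcom.ibm.net.rdma.conf property pointing at a file that maps host/port pairs to RDMA), not by the code. The port number and iteration count here are arbitrary, and an echo server is assumed on the other side.

import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.net.Socket;

// Plain-socket ping-pong client: with JSOR this code runs unmodified;
// the RDMA transport (if any) is selected by JVM configuration.
public class PingPongClient {
    public static void main(String[] args) throws Exception {
        byte[] msg = new byte[256];                 // 256-byte messages, as in the test
        try (Socket socket = new Socket(args[0], 1919)) {
            socket.setTcpNoDelay(true);
            DataOutputStream out = new DataOutputStream(socket.getOutputStream());
            DataInputStream in = new DataInputStream(socket.getInputStream());
            for (int i = 0; i < 1_000_000; i++) {
                out.write(msg);
                out.flush();
                in.readFully(msg);                  // wait for the echoed reply
            }
        }
    }
}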

Here are some statistics. My test setup is a pair of old ConnectX-2 cards and Dell PowerEdge 2970 servers, running CentOS 7.1 and Mellanox OFED version 3.1.

I was only interested in latency tests.

jVerbs

The test is a variation of the RPing sample (I can put it on GitHub if anyone is interested). It measured the latency of 5,000,000 iterations of the following call sequence over a reliable connection. The message size was 256 bytes.

 PostSendMethod.execute()
 PollCQMethod.execute()
 CompletionChannel.ackCQEvents()
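
Percentiles like the ones below can be gathered with a harness of roughly this shape; this is just a sketch of the measurement pattern, with the lambda standing in for the three-call sequence above.

import java.util.Arrays;

// Time every iteration individually, then read percentiles off the
// sorted samples (reported in microseconds).
public final class LatencyHarness {
    static long[] measure(Runnable oneIteration, int iterations) {
        long[] samples = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            oneIteration.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples;
    }

    static double percentile(long[] sorted, double p) {
        int idx = (int) Math.round(p * (sorted.length - 1));
        return sorted[idx] / 1000.0;   // nanoseconds -> microseconds
    }

    public static void main(String[] args) {
        long[] s = measure(() -> { /* PostSend/PollCQ/ackCQEvents here */ }, 5_000_000);
        System.out.printf("median %.3f, 99%% %.3f, 99.9%% %.3f, 99.99%% %.3f%n",
                percentile(s, 0.50), percentile(s, 0.99),
                percentile(s, 0.999), percentile(s, 0.9999));
    }
}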

Results (microseconds):

  • Median: 10.885
  • 99.0th percentile: 11.663
  • 99.9th percentile: 17.471
  • 99.99th percentile: 27.791

JSOR

A similar test over JSOR sockets. The test was a textbook client/server socket sample. The message size was also 256 bytes.

Results (microseconds):

  • Median: 43
  • 99.0th percentile: 55
  • 99.9th percentile: 61
  • 99.99th percentile: 217

Both results are still quite far from the native OFED latency tests. On the same hardware/OS, the standard ib_send_lat benchmark produced a median of 2.77 microseconds and a maximum latency of 23.25 microseconds.
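
For anyone who wants to reproduce the native baseline, the perftest invocation is along these lines (-s sets the message size, -n the iteration count; I am assuming the same 256-byte size here, which may not match the exact run quoted above).

Server:

 ib_send_lat -s 256 -n 1000000

Client:

 ib_send_lat -s 256 -n 1000000 <server-address>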


Source: https://habr.com/ru/post/899601/

