Discrepancy between Cassandra trace and client-side latency

We are on Cassandra 2.0.15 and see huge read latencies (> 60 s) that appear at regular intervals (approximately every 3 minutes) on all application hosts. We measure this latency around calls to session.execute(stmt). At the same time, Cassandra tracing reports durations < 1 s. We also ran the query through cqlsh from the same hosts during these peak-latency periods, and cqlsh always returned within 1 second. What could explain this discrepancy at the Java driver level?
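One way to frame the discrepancy is to subtract the trace duration from the client-side elapsed time for the same query; what remains was spent outside the traced request, for example in the driver, on the network, or in the client JVM. This is a minimal sketch that assumes you already have both numbers per query; `Sample`, `LatencyGap` and the threshold are hypothetical names, and the sample values are made up for illustration.

```scala
// Hypothetical record of one query: client-side elapsed time and the
// duration reported by the Cassandra trace, both in milliseconds.
final case class Sample(clientMs: Long, traceMs: Long) {
  // Time the trace does not account for (driver, network, client JVM, ...).
  def unaccountedMs: Long = clientMs - traceMs
}

object LatencyGap {
  // Keep only samples where the unaccounted time exceeds the threshold.
  def suspicious(samples: Seq[Sample], gapThresholdMs: Long): Seq[Sample] =
    samples.filter(_.unaccountedMs > gapThresholdMs)

  def main(args: Array[String]): Unit = {
    // Made-up values for illustration only.
    val samples = Seq(Sample(40, 12), Sample(61000, 800), Sample(900, 850))
    for (s <- suspicious(samples, gapThresholdMs = 5000))
      println(s"client=${s.clientMs}ms trace=${s.traceMs}ms gap=${s.unaccountedMs}ms")
    // prints: client=61000ms trace=800ms gap=60200ms
  }
}
```

A large gap points away from Cassandra's own request processing and toward whatever sits between `session.execute` and the coordinator.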

- edit: in response to comments -

Cassandra server JVM settings: -XX:+CMSClassUnloadingEnabled -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=1000003 -Xms32G -Xmx32G -XX:+UseG1GC -Djava.net.preferIPv4Stack=true -Dcassandra.jmx.local.port=7199 -XX:+DisableExplicitGC.

Client-side GC is negligible (see below). Client JVM settings: -Xss256k -Xms4G -Xmx4G. Cassandra driver version: 2.1.7.1.

[Image: client-side GC activity]
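To check that client-side GC is negligible around a slow call specifically, a sketch like the following snapshots the standard `GarbageCollectorMXBean` counters from `java.lang.management` before and after the call and prints the difference. `GcSnapshot` is a hypothetical name, and placing the snapshots around the canary query is an assumption; the `Thread.sleep` only stands in for that call.

```scala
import java.lang.management.ManagementFactory

object GcSnapshot {
  // (collection count, accumulated collection time in ms) per collector name,
  // as reported by the JVM since startup. A collector may report -1 when a
  // value is undefined.
  type Snapshot = Map[String, (Long, Long)]

  def snapshot(): Snapshot = {
    val beans = ManagementFactory.getGarbageCollectorMXBeans // java.util.List
    (0 until beans.size).map { i =>
      val gc = beans.get(i)
      gc.getName -> ((gc.getCollectionCount, gc.getCollectionTime))
    }.toMap
  }

  // GC activity between two snapshots, per collector.
  def delta(before: Snapshot, after: Snapshot): Snapshot =
    after.map { case (name, (count, timeMs)) =>
      val (count0, time0) = before.getOrElse(name, (0L, 0L))
      name -> ((count - count0, timeMs - time0))
    }

  def main(args: Array[String]): Unit = {
    val before = snapshot()
    Thread.sleep(50) // placeholder for the slow session.execute(...) call
    val after = snapshot()
    for ((name, (count, ms)) <- delta(before, after))
      println(s"$name: $count collections, $ms ms in GC")
  }
}
```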

Client Side Measurement Code:

import com.datastax.driver.core.querybuilder.QueryBuilder

// Prepared once, then bound on every run.
val selectServiceNames = session.prepare(QueryBuilder.select("service_name").from("service_names"))

override def run(): Unit = {
  val start = System.currentTimeMillis()
  try {
    val resultSet = session.execute(selectServiceNames.bind())
    // all() fetches every remaining page, so the timing covers the full read.
    val serviceNames = resultSet.all()
    val elapsed = System.currentTimeMillis() - start
    latency.add(elapsed) // emits metric to statsd
    if (elapsed > 10000) {
      log.info("Canary2 sensed high Cassandra latency: " + elapsed + "ms")
    }
  } catch {
    case e: Throwable =>
      log.error(e, "Canary2 select failed")
  } finally {
    // Pause 100 ms, then re-schedule this task, whether or not the query succeeded.
    Thread.sleep(100)
    schedule()
  }
}
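A variant of the canary above, sketched under a few assumptions: it times with the monotonic `System.nanoTime` (which, unlike `System.currentTimeMillis`, is not affected by wall-clock adjustments), and it replaces the manual `Thread.sleep(100)` plus `schedule()` with `ScheduledExecutorService.scheduleWithFixedDelay`. The `query` parameter is a hypothetical stand-in for `session.execute(selectServiceNames.bind()).all()`, so the sketch runs without a cluster.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger
import scala.util.control.NonFatal

object Canary {
  // Evaluates `body` and returns its result with the elapsed time in ms,
  // measured with the monotonic System.nanoTime clock.
  def timedMs[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start))
  }

  // Runs `query` repeatedly, waiting `delayMs` between the end of one run and
  // the start of the next (like Thread.sleep(100) + schedule() above).
  // `query` is a hypothetical stand-in for the real driver call.
  def start(executor: ScheduledExecutorService, delayMs: Long, thresholdMs: Long)
           (query: () => Unit)(record: Long => Unit): Unit = {
    val task = new Runnable {
      def run(): Unit =
        try {
          val (_, elapsed) = timedMs(query())
          record(elapsed)
          if (elapsed > thresholdMs) println(s"high latency: ${elapsed}ms")
        } catch {
          // Without this catch, one failure would cancel all later runs.
          case NonFatal(e) => println(s"select failed: $e")
        }
    }
    executor.scheduleWithFixedDelay(task, 0L, delayMs, TimeUnit.MILLISECONDS)
    ()
  }

  def main(args: Array[String]): Unit = {
    val executor = Executors.newSingleThreadScheduledExecutor()
    val runs = new AtomicInteger(0)
    start(executor, delayMs = 100, thresholdMs = 10000)(() => Thread.sleep(5)) { _ =>
      runs.incrementAndGet()
      ()
    }
    Thread.sleep(1000)
    executor.shutdown()
    println(s"completed ${runs.get} runs")
  }
}
```

The `NonFatal` catch matters with `scheduleWithFixedDelay`: if one execution throws, the executor suppresses all subsequent executions of that task.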

Cluster Building Code:

import com.datastax.driver.core.Cluster
import com.datastax.driver.core.policies.{LatencyAwarePolicy, RoundRobinPolicy, TokenAwarePolicy}

def createClusterBuilder(): Cluster.Builder = {
  val builder = Cluster.builder()
  val contactPoints = parseContactPoints()
  val defaultPort = findConnectPort(contactPoints)
  builder.addContactPointsWithPorts(contactPoints)
  builder.withPort(defaultPort) // This ends up in config.protocolOptions.port
  if (cassandraUsername.isDefined && cassandraPassword.isDefined)
    builder.withCredentials(cassandraUsername(), cassandraPassword())
  builder.withRetryPolicy(ZipkinRetryPolicy.INSTANCE)
  // Token-aware routing on top of a latency-aware round robin (RoundRobinPolicy is not DC-aware).
  builder.withLoadBalancingPolicy(new TokenAwarePolicy(new LatencyAwarePolicy.Builder(new RoundRobinPolicy()).build()))
}
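The load-balancing chain above ends in `RoundRobinPolicy`, which rotates over all known hosts without regard to data center. Because the canary query has no `WHERE` clause, it has no routing key, so `TokenAwarePolicy` should fall back to its child's ordering. If the cluster spans more than one data center (the question does not say), this pure-Scala model shows what that rotation means for coordinator choice. It is not the driver's implementation: `Host` and `PlanSketch` are hypothetical, the addresses are made up, and it ignores `LatencyAwarePolicy`.

```scala
// Hypothetical host model; addresses and DC names are made up.
final case class Host(address: String, dc: String)

object PlanSketch {
  // Rotate over *all* hosts, ignoring data centers: the i-th query starts at
  // host i mod n (a simplified model of a round-robin query plan).
  def roundRobinPlan(hosts: IndexedSeq[Host], queryIndex: Int): Seq[Host] =
    if (hosts.isEmpty) Seq.empty
    else {
      val start = queryIndex % hosts.size
      hosts.drop(start) ++ hosts.take(start)
    }

  // Same rotation, but restricted to the local DC; this sketch leaves
  // remote hosts out of the plan entirely.
  def localOnlyPlan(hosts: IndexedSeq[Host], localDc: String, queryIndex: Int): Seq[Host] =
    roundRobinPlan(hosts.filter(_.dc == localDc), queryIndex)

  def main(args: Array[String]): Unit = {
    val hosts = IndexedSeq(
      Host("10.0.0.1", "dc1"), Host("10.0.0.2", "dc1"),
      Host("10.1.0.1", "dc2"), Host("10.1.0.2", "dc2"))
    // Which DC coordinates each of 100 consecutive queries (first host in plan)?
    val rr = (0 until 100).count(i => roundRobinPlan(hosts, i).head.dc == "dc2")
    val local = (0 until 100).count(i => localOnlyPlan(hosts, "dc1", i).head.dc == "dc2")
    println(s"remote coordinator: round robin $rr/100, local-only $local/100")
    // prints: remote coordinator: round robin 50/100, local-only 0/100
  }
}
```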

Another observation I cannot explain: I ran two threads that execute the same query in the same way (as shown above) in a loop. The only difference is that the yellow thread sleeps 100 ms between queries, while the green thread sleeps 60 seconds between queries. The green thread gets low latency (under 1 s) much more often than the yellow one.
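The two-thread experiment can be reproduced with a small harness along these lines: each poller runs the same query in a loop with its own sleep, records per-query latency, and the share of samples under 1 s is computed per thread. `TwoPollers` and `fakeQuery` are hypothetical; the demo shortens the 60 s sleep so it finishes quickly, and the real `session.execute(...)` call would replace `fakeQuery`.

```scala
import java.util.concurrent.TimeUnit
import scala.collection.mutable.ArrayBuffer

object TwoPollers {
  // Runs `query` `iterations` times, sleeping `sleepMs` after each run, and
  // returns every run's latency in ms (System.nanoTime based).
  def poll(iterations: Int, sleepMs: Long)(query: () => Unit): Vector[Long] = {
    val latencies = ArrayBuffer.empty[Long]
    for (_ <- 1 to iterations) {
      val start = System.nanoTime()
      query()
      latencies += TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start)
      Thread.sleep(sleepMs)
    }
    latencies.toVector
  }

  // Share of samples strictly below the threshold, e.g. 1000 ms for "low latency".
  def fractionBelow(samples: Seq[Long], thresholdMs: Long): Double =
    if (samples.isEmpty) 0.0
    else samples.count(_ < thresholdMs).toDouble / samples.size

  def main(args: Array[String]): Unit = {
    val fakeQuery: () => Unit = () => Thread.sleep(2) // stand-in for the real select
    var yellow = Vector.empty[Long]
    var green = Vector.empty[Long]
    val yellowThread = new Thread(new Runnable {
      def run(): Unit = yellow = poll(20, sleepMs = 100)(fakeQuery) // 100 ms, as in the question
    })
    val greenThread = new Thread(new Runnable {
      def run(): Unit = green = poll(2, sleepMs = 1000)(fakeQuery) // shortened from 60 s
    })
    yellowThread.start(); greenThread.start()
    yellowThread.join(); greenThread.join() // join makes the results visible here
    println(f"yellow < 1 s: ${fractionBelow(yellow, 1000)}%.2f, green < 1 s: ${fractionBelow(green, 1000)}%.2f")
  }
}
```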

[Image: query latency over time for the yellow (100 ms sleep) and green (60 s sleep) threads]

2 answers

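[First answer: most of the text was lost in extraction. Surviving fragments mention the consistency levels ONE and LOCAL_ONE, a DC-aware option, and, in a closing paragraph, the Java driver and a DC.]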


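[Second answer: most of the text was lost in extraction. Surviving fragments include the word "component", a three-item list whose last item mentions the JVM, and a numeric argument using the figures 100, 1, 0 and 99.]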


Source: https://habr.com/ru/post/1608057/

