We are on Cassandra 2.0.15 and see huge read latencies (> 60 s) that appear at regular intervals (approximately every 3 minutes) from all application hosts. We measure this latency around calls to session.execute(stmt). At the same time, Cassandra traces report durations < 1 s. We also ran a query through cqlsh from the same hosts during these peak-latency periods, and cqlsh always returned within 1 second. What can explain this discrepancy at the Java driver level?
- edit: in response to comments -
Cassandra server JVM settings: -XX:+CMSClassUnloadingEnabled -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=1000003 -Xms32G -Xmx32G -XX:+UseG1GC -Djava.net.preferIPv4Stack=true -Dcassandra.jmx.local.port=7199 -XX:+DisableExplicitGC.
Client-side GC is negligible (below). Client settings: -Xss256k -Xms4G -Xmx4G. Cassandra driver version: 2.1.7.1

Client-side measurement code:

    val selectServiceNames = session.prepare(QueryBuilder.select("service_name").from("service_names"))

    override def run(): Unit = {
      val start = System.currentTimeMillis()
      try {
        val resultSet = session.execute(selectServiceNames.bind())
        val serviceNames = resultSet.all()
        val elapsed = System.currentTimeMillis() - start
        latency.add(elapsed)
        if (elapsed > 10000) {
          log.info("Canary2 sensed high Cassandra latency: " + elapsed + "ms")
        }
      } catch {
        case e: Throwable =>
          log.error(e, "Canary2 select failed")
      } finally {
        Thread.sleep(100)
        schedule()
      }
    }
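
The measurement above covers both session.execute(...) and resultSet.all() in a single number. A minimal sketch of how the two steps could be timed separately, to see which one carries the delay. The `timed` helper and the `PhaseTiming` object are hypothetical names, and the bodies passed to `timed` are stand-ins for the real driver calls, not the actual application code:

```scala
object PhaseTiming {
  // Hypothetical helper: runs `body` once and returns its result together
  // with the elapsed time in milliseconds. System.nanoTime is monotonic,
  // so the measurement is not affected by wall-clock adjustments.
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    (result, elapsedMs)
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for session.execute(selectServiceNames.bind())
    val (rows, executeMs) = timed { Thread.sleep(20); Seq("svc-a", "svc-b") }
    // Stand-in for resultSet.all()
    val (names, fetchMs) = timed { rows.toList }
    println(s"execute=${executeMs}ms fetchAll=${fetchMs}ms rows=${names.size}")
  }
}
```

In the real canary, the first block would wrap session.execute(...) and the second would wrap resultSet.all(), with both values logged whenever the total exceeds the threshold.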

Cluster building code:

    def createClusterBuilder(): Cluster.Builder = {
      val builder = Cluster.builder()
      val contactPoints = parseContactPoints()
      val defaultPort = findConnectPort(contactPoints)
      builder.addContactPointsWithPorts(contactPoints)
      builder.withPort(defaultPort)
      if (cassandraUsername.isDefined && cassandraPassword.isDefined)
        builder.withCredentials(cassandraUsername(), cassandraPassword())
      builder.withRetryPolicy(ZipkinRetryPolicy.INSTANCE)
      builder.withLoadBalancingPolicy(new TokenAwarePolicy(new LatencyAwarePolicy.Builder(new RoundRobinPolicy()).build()))
    }

One more observation I cannot explain. I ran two threads that execute the same query in the same way (as shown above) in a loop; the only difference is that the yellow thread sleeps 100 milliseconds between requests and the green thread sleeps 60 seconds between requests. The green thread sees low latency (under 1 s) much more often than the yellow one.
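
For reference, the two-thread experiment has roughly the following shape. This is a simplified sketch, not the exact code that was run: `TwoIntervalProbe`, `probe`, and `fakeQuery` are hypothetical names, the query is a stand-in for session.execute(...).all(), and the pauses and iteration counts are shortened so the example finishes quickly.

```scala
import java.util.concurrent.ConcurrentLinkedQueue

object TwoIntervalProbe {
  // Starts a thread that calls `query` `iterations` times, sleeping
  // `pauseMs` after each call, and records (thread name, latency in ms).
  def probe(name: String, pauseMs: Long, iterations: Int,
            query: () => Unit,
            sink: ConcurrentLinkedQueue[(String, Long)]): Thread = {
    val t = new Thread(new Runnable {
      override def run(): Unit = {
        var i = 0
        while (i < iterations) {
          val start = System.nanoTime()
          query()
          sink.add((name, (System.nanoTime() - start) / 1000000L))
          Thread.sleep(pauseMs)
          i += 1
        }
      }
    }, name)
    t.start()
    t
  }

  def main(args: Array[String]): Unit = {
    val sink = new ConcurrentLinkedQueue[(String, Long)]()
    // Stand-in for the real query; assumption for illustration only.
    val fakeQuery = () => Thread.sleep(5)
    val yellow = probe("yellow", 100L, 5, fakeQuery, sink) // 100 ms pause, as in the post
    val green = probe("green", 300L, 2, fakeQuery, sink)   // 60 s in the real test, shortened here
    yellow.join()
    green.join()
    val it = sink.iterator()
    while (it.hasNext) println(it.next())
  }
}
```

Both threads share the same query function and the same sink, so the only variable between them is the pause length.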
