Spark Cassandra: join a table with a query condition based on an attribute of the primary RDD ("where tableA.myValue > tableB.myOtherValue")

Question


Is there a way to join two tables with a condition that compares a column of one table against a column of the other?

Example:

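import com.datastax.spark.connector._

case class TableA(pkA: Int, valueA: Int)
case class TableB(pkB: Int, valueB: Int)

val rddA = sc.cassandraTable[TableA]("ks", "tableA")
// Desired: filter the tableB rows against the value of the joined tableA row
rddA.joinWithCassandraTable[TableB]("ks", "tableB").where("tableB.valueB > tableA.valueA")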

Is there a way to pass a predicate such as where("tableB.valueB > tableA.valueA")? (tableB.valueB is a clustering column.)


cassandra apache-spark spark-cassandra-connector

Gridou Nov 21 '16 at 10:37


1 answer

Artem aliev · Answer 1 · 2016-12-06T08:10:44+0000

RDD.where() just passes the predicate to Cassandra as CQL. CQL is limited to fast, simple OLTP queries, so it cannot express a condition that compares columns from two tables. More complex queries can only be done with Spark SQL. For your case, it could be something like this:

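// Expose both Cassandra tables to Spark SQL as temporary tables.
// (On Spark 2.x, createOrReplaceTempView replaces the deprecated registerTempTable.)
sqlContext.read.format("org.apache.spark.sql.cassandra")
    .options(Map("table" -> "tableA", "keyspace" -> "ks"))
    .load().registerTempTable("tableA")
sqlContext.read.format("org.apache.spark.sql.cassandra")
    .options(Map("table" -> "tableB", "keyspace" -> "ks"))
    .load().registerTempTable("tableB")
// The comparison across both tables is evaluated by Spark, not by Cassandra.
sqlContext.sql("select * from tableA join tableB on tableB.valueB > tableA.valueA").show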
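
The same kind of join can probably also be written with the DataFrame API instead of a SQL string. The following is only a minimal sketch under some assumptions: it uses Spark 2.x's SparkSession, runs in local mode, and replaces the two Cassandra tables with small in-memory DataFrames so it can run without a cluster. The names NonEquiJoinSketch and joinGreater are made up for illustration and are not part of Spark or the connector.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object NonEquiJoinSketch {
  // Hypothetical helper: keeps every (A, B) pair where B.valueB > A.valueA.
  def joinGreater(a: DataFrame, b: DataFrame): DataFrame =
    a.join(b, b("valueB") > a("valueA"))

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, only for trying the join out.
    val spark = SparkSession.builder()
      .appName("non-equi-join-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // In-memory stand-ins for ks.tableA and ks.tableB.
    val tableA = Seq((1, 10), (2, 20)).toDF("pkA", "valueA")
    val tableB = Seq((1, 15), (2, 25)).toDF("pkB", "valueB")

    // Three pairs satisfy valueB > valueA: (10, 15), (10, 25), (20, 25).
    joinGreater(tableA, tableB).show()

    spark.stop()
  }
}
```

With real data, the two DataFrames would come from the read.format("org.apache.spark.sql.cassandra") calls shown in the answer. Keep in mind that a join without an equality condition cannot use a hash or sort-merge join, so Spark typically falls back to a nested-loop or cartesian-style plan, which can get expensive on large tables.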

