Spark cassandra: combine a table with a query condition based on an attribute from the primary RDD ("where tableA.myValue> tableB.myOtherValue")
Is there a way to join two tables adding a condition for columns between two tables?
Example:
case class TableA(pkA: Int, valueA: Int)
case class TableB(pkB: Int, valueB: Int)
val rddA = sc.cassandraTable[TableA]("ks", "tableA")
rddA.joinWithCassandraTable[TableB]("ks", "tableB").where("tableB.valueB > tableA.valueA")
Is there a way to send a command where("tableB.valueB > tableA.valueA")? ("tableB.value" is a clustering column)
RDD.where () just calls the CQL predicate. CQL is limited for quick and easy OLTP queries. More complex queries can only be performed using SparkSQL. For your case, it might be something like this:
sqlContext.read.format("org.apache.spark.sql.cassandra")
.options(Map( "table" -> "tableA", "keyspace"->"ks"))
.load().registerTempTable("tableA")
sqlContext.read.format("org.apache.spark.sql.cassandra")
.options(Map( "table" -> "tableB", "keyspace"->"ks"))
.load().registerTempTable("tableB")
sqlContext.sql("select * from tableA join tableB on tableB.valueB > tableA.valueA").show