Suppose that
rdd1 = ( (a, 1), (a, 2), (b, 1) ), rdd2 = ( (a, ?), (a, *), (c, .) ).
I want to create
( (a, (1, ?)), (a, (1, *)), (a, (2, ?)), (a, (2, *)) ).
Is there a simple way to do this? I think it is different from a cross (cartesian) product, but I cannot find a good solution. My current solution:
(rdd1.cartesian(rdd2)
     .filter(lambda kv: kv[0][0] == kv[1][0])  # keep only pairs whose keys match
     .map(lambda kv: (kv[0][0], (kv[0][1], kv[1][1]))))  # (key, (value1, value2))
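To make the desired pairing concrete, here is a minimal plain-Python sketch that does not use Spark. The lists stand in for the RDDs, the symbols `?`, `*` and `.` are written as strings, and the helper names `pair_by_key_cartesian` and `pair_by_key_hashed` are hypothetical, introduced only for illustration. The first helper mirrors the cartesian/filter/map approach above; the second indexes one side by key first, so it never builds the full cross product.

```python
from collections import defaultdict

# Sample data from the question; plain lists stand in for the RDDs (assumption).
rdd1 = [("a", 1), ("a", 2), ("b", 1)]
rdd2 = [("a", "?"), ("a", "*"), ("c", ".")]

def pair_by_key_cartesian(left, right):
    """Build every (left, right) combination, keeping only those with equal keys."""
    return [(lk, (lv, rv)) for lk, lv in left for rk, rv in right if lk == rk]

def pair_by_key_hashed(left, right):
    """Same result, but group the right side by key first to skip non-matching pairs."""
    index = defaultdict(list)
    for rk, rv in right:
        index[rk].append(rv)
    # index.get avoids creating empty entries for keys missing on the right side
    return [(lk, (lv, rv)) for lk, lv in left for rv in index.get(lk, [])]

result = pair_by_key_hashed(rdd1, rdd2)
print(result)
```

Both helpers drop keys that appear on only one side (`b` and `c` here), which is the behavior of an inner join on keys. In PySpark that operation is exposed as `rdd1.join(rdd2)`, which may be worth trying instead of `cartesian` followed by `filter`.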