I have two RDDs:
rdd1 = sc.parallelize([("www.page1.html", "word1"), ("www.page2.html", "word1"),
                       ("www.page1.html", "word3")])
rdd2 = sc.parallelize([("www.page1.html", 7.3), ("www.page2.html", 1.25),
                       ("www.page3.html", 5.41)])
intersection_rdd = rdd1.keys().intersection(rdd2.keys())
When I do this, I only get the intersection of the keys, i.e. (www.page1.html, www.page2.html).
But I need each key together with its values from both RDDs. The result should look like this:
[www.page1.html, (word1, word3, 7.3)]
[www.page2.html, (word1, 1.25)]
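One possible way to get that shape, as a sketch rather than a definitive answer: group the words in rdd1 by key, inner-join with rdd2 so only keys present in both survive, then flatten each value into one tuple. The helper names `merge_rdds` and `merge_pairs` below are made up for illustration. `merge_pairs` is a plain-Python version of the same logic that runs without Spark, and it assumes each key appears at most once in the second dataset.

```python
from collections import defaultdict


def merge_rdds(rdd1, rdd2):
    """PySpark sketch: rdd1 holds (key, word) pairs, rdd2 holds (key, score) pairs.

    groupByKey collects all words per key, join keeps only keys found in
    both RDDs, and mapValues flattens (words, score) into a single tuple.
    Spark does not guarantee the order of the words within a group.
    """
    return (rdd1.groupByKey()
                .join(rdd2)
                .mapValues(lambda v: tuple(v[0]) + (v[1],)))


def merge_pairs(pairs1, pairs2):
    """Plain-Python equivalent of merge_rdds, operating on lists of pairs."""
    grouped = defaultdict(list)
    for key, word in pairs1:
        grouped[key].append(word)
    scores = dict(pairs2)  # assumes one score per key
    # Keep only keys that occur in both inputs (the key intersection).
    return {key: tuple(words) + (scores[key],)
            for key, words in grouped.items() if key in scores}
```

With the RDDs from the question, `merge_rdds(rdd1, rdd2).collect()` should return one (key, tuple) pair each for www.page1.html and www.page2.html, while www.page3.html is dropped because it has no entry in rdd1.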