I am new to Spark and I am trying to do a distinct().count() based on some fields of a CSV file.
CSV structure (no header row):
id,country,type
01,AU,s1
02,AU,s2
03,GR,s2
03,GR,s2
To load the .csv I typed:
lines = sc.textFile("test.txt")
Then a distinct count on lines returned 3, as expected:
lines.distinct().count()
But I have no idea how to do a distinct count based on, let's say, id and country.
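One possible direction, as a minimal sketch rather than a confirmed solution: reduce each line to just the fields of interest before deduplicating. The helper name id_country_key is hypothetical, and the sketch assumes a plain comma split is enough (no quoted fields or embedded commas). It runs on a plain Python list holding the sample rows so it can be tried without a Spark context.

```python
# Sample rows copied from the question (the file has no header row).
rows = ["01,AU,s1", "02,AU,s2", "03,GR,s2", "03,GR,s2"]

def id_country_key(line):
    """Hypothetical helper: return (id, country) from an 'id,country,type' line.

    Assumes fields are separated by plain commas with no quoting.
    """
    fields = line.split(",")
    return (fields[0], fields[1])

# Plain-Python stand-in for map + distinct + count.
distinct_keys = set(id_country_key(r) for r in rows)
print(len(distinct_keys))

# With the RDD from the question, the same idea would presumably be:
# lines.map(id_country_key).distinct().count()
```

For this sample the result equals the whole-line distinct count, because the only duplicate row is identical in every field; the two counts would differ once rows share id and country but not type.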