How to reformat Spark Python output

(u'142578', (u'The-North-side-9890', (u'   12457896', 45.0)))
(u'124578', (u'The-West-side-9091', (u'   14578217', 0.0)))

I got this output from joining two RDDs on an identifier with Spark's join(); each record looks like (key, (value_left, value_right)).
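For context, here is a minimal sketch of how a nested shape like this arises from join() (the RDD names and sample data are illustrative, not from the original job):

from pyspark import SparkContext

sc = SparkContext("local", "join-shape-demo")

# left: (id, name); right: (id, (account, amount)) -- illustrative data only
left = sc.parallelize([(u'142578', u'The-North-side-9890')])
right = sc.parallelize([(u'142578', (u'   12457896', 45.0))])

# join() yields (key, (value_left, value_right)), producing the nested tuple above
joined = left.join(right)
print(joined.collect())
# [('142578', ('The-North-side-9890', ('   12457896', 45.0)))]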

I want the output to look like this:

The-North-side-9890,12457896,45.0
The-West-side-9091,14578217,0.0

To achieve this, I tried the following code:

from pyspark import SparkContext
sc = SparkContext("local", "info")

# read the joined output back from text and split each line on commas
file1 = sc.textFile('/home/hduser/join/part-00000').map(lambda line: line.split(','))
# key on the name/account fragments; x[3][:-3] drops the trailing ')))' before float()
result = file1.map(lambda x: (x[1] + ', ' + x[2], float(x[3][:-3]))).reduceByKey(lambda a, b: a + b)
result = result.map(lambda x: x[0] + ',' + str(x[1]))
# strip leading '[(' and trailing ')]' and write a single output file
result = result.map(lambda x: x.lstrip('[(').rstrip(')]')).coalesce(1).saveAsTextFile("hdfs://localhost:9000/finalop")

but it gives me the following output:

(u'The-North-side-9896',  (u'   12457896',0.0
(u'The-East-side-9876',  (u'  47125479',0.0

I want to clean this up. How can I achieve it?

2 answers

To get from this:

(u'142578', (u'The-North-side-9890', (u' 12457896', 45.0)))

to this:

The-North-side-9890,12457896,45.0

you need to use:

result = result.map(lambda (k, (s, (n1, n2))): ','.join([s, str(int(n1)), str(float(n2))]))
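Note that tuple unpacking in lambda arguments is Python 2 only; it was removed in Python 3 (PEP 3113). A sketch of an equivalent for Python 3, using a named function instead:

def to_csv(record):
    # record has the shape (key, (name, (account_str, amount)))
    k, (s, (n1, n2)) = record
    # int() tolerates the leading spaces in the account string
    return ','.join([s, str(int(n1)), str(float(n2))])

result = result.map(to_csv)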

Try this:

import collections

def rdd2string(t):
    def rdd2StringHelper(x):
        s = ''
        # Recurse into nested tuples/lists and flatten them into one
        # comma-separated string; anything non-iterable is a leaf value.
        # Strings are iterable too (in Python 3), so exclude them or they
        # would be split character by character.
        if isinstance(x, collections.Iterable) and not isinstance(x, str):
            for elem in x:
                s = s + rdd2StringHelper(elem)
            return s
        else:
            return str(x) + ','

    return rdd2StringHelper(t)[:-1]

yourRDD.map(lambda x: rdd2string(x)).saveAsTextFile(...)

This works for tuples of any nesting depth (tuple2, tuple3, tuple21, ...) and any value types (int, float, str, ...), and outputs a CSV string.
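As a quick local sanity check (plain Python, no Spark needed), applied to one of the records from the question:

record = (u'142578', (u'The-North-side-9890', (u'   12457896', 45.0)))
print(rdd2string(record))
# 142578,The-North-side-9890,   12457896,45.0

Note that this flattens the key into the output as well, and the leading spaces in the account number are kept; strip them beforehand if you do not want them.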

This also covers how to remove the parentheses () and square brackets [] from the output file in PySpark [duplicate].

Do not forget the import collections at the top; on Python 3.10+ use collections.abc.Iterable instead of collections.Iterable.


Source: https://habr.com/ru/post/1620099/

