Retrieving Values from Joined RDDs

I have two input files with lines like these:

Hourly_Sports,DEF (show,channel)
Hourly_Sports,21  (show,views)

I split each line into a (key, value) pair and joined the two RDDs with this code:

def split_show_views(line):
    # "show,views" -> (show, views)
    show, views = line.split(',')
    return (show, views)

show_views = show_views_file.map(split_show_views)

def split_show_channel(line):
    # "show,channel" -> (show, channel)
    show, channel = line.split(',')
    return (show, channel)

show_channel = show_channel_file.map(split_show_channel)
joined_dataset = show_views.join(show_channel)

Now when I call collect, each element of the result looks like this:

(u'Baked_Talking', (u'MAN', u'138'))

Now I only want the channel and views parts. The instructions give this template:

def extract_channel_views(show_views_channel): 
    <INSERT_CODE_HERE>
    return (channel, views)

It seems that the joined result is no longer made up of delimited lines, so I cannot use the split function again. I looked through the Python built-in functions but did not find one for extracting values. It seems to me that channel and views were already defined in the previous steps, so perhaps I don't need to add anything? If that is not the case, how can I define channel and views? I tried something like show,channel,views=split('',('','')). I don't think it's right, but I really don't know how to do it.


If you only want the values, use the values method:

joined_dataset.values()

In general, the elements of an RDD are plain Python tuples, so you can use ordinary indexing (getitem):

def extract_channel_views(show_views_channel):
    return show_views_channel[1]

Or use tuple unpacking:

def extract_channel_views(show_views_channel):
    _, (channel, views) = show_views_channel
    return channel, views
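
To see that indexing and unpacking give the same result, here is a minimal sketch that runs without Spark. It assumes each record has the shape shown in the question, (show, (channel, views)); the list joined_records is a hypothetical stand-in for the output of joined_dataset.collect(), built from the sample values in the question.

```python
# Hypothetical stand-in for joined_dataset.collect(): plain tuples shaped
# like the record in the question, (show, (channel, views)).
joined_records = [
    (u'Baked_Talking', (u'MAN', u'138')),
    (u'Hourly_Sports', (u'DEF', u'21')),
]

def extract_channel_views(show_views_channel):
    # Unpack the outer pair, ignore the show key, and unpack the inner pair.
    _, (channel, views) = show_views_channel
    return channel, views

# Tuple unpacking, applied record by record.
channel_views = [extract_channel_views(record) for record in joined_records]

# Plain indexing: element 1 of each record is the (channel, views) pair.
values_only = [record[1] for record in joined_records]

print(channel_views)
print(values_only)
```

On a real RDD the same function would presumably be applied with joined_dataset.map(extract_channel_views), which keeps the work distributed instead of collecting first.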

Source: https://habr.com/ru/post/1621749/

