I have a dataset whose lines are strings in the following tab-separated format:
Title<\t>Text
Now for every word in Text, I want to create a pair (Word,Title). For instance:
ABC Hello World
gives me
(Hello, ABC)
(World, ABC)
Using Scala, I wrote the following:
val file = sc.textFile("s3n://file.txt")
val title = file.map(line => line.split("\t")(0))
val wordtitle = file.map(line => (line.split("\t")(1).split(" ").map(word => (word, line.split("\t")(0)))))
But this gives me the following result:
[Lscala.Tuple2;@2204b589
[Lscala.Tuple2;@632a46d1
[Lscala.Tuple2;@6c8f7633
[Lscala.Tuple2;@3e9945f3
[Lscala.Tuple2;@40bf74a0
[Lscala.Tuple2;@5981d595
[Lscala.Tuple2;@5aed571b
[Lscala.Tuple2;@13f1dc40
[Lscala.Tuple2;@6bb2f7fa
[Lscala.Tuple2;@32b67553
[Lscala.Tuple2;@68d0b627
[Lscala.Tuple2;@8493285
How can I solve this?
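For reference, each `[Lscala.Tuple2;@...` line is the default string form of a JVM array of tuples: `map` yields one array of pairs per input line instead of the pairs themselves. Below is a minimal sketch of the difference between `map` and `flatMap`. It uses a plain Scala `Seq` in place of the RDD so that it runs without Spark. The object name `PairsSketch`, the helper `pairs`, and the sample line are made up for illustration.

```scala
object PairsSketch {
  // Split one "Title<TAB>Text" line into (word, title) pairs.
  def pairs(line: String): Seq[(String, String)] = {
    val fields = line.split("\t")
    val title = fields(0)
    fields(1).split(" ").toSeq.map(word => (word, title))
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample input, matching the example in the question.
    val lines = Seq("ABC\tHello World")

    // map keeps one collection of pairs per line (nested result).
    val nested: Seq[Seq[(String, String)]] = lines.map(pairs)

    // flatMap flattens those per-line collections into a single collection of pairs.
    val flat: Seq[(String, String)] = lines.flatMap(pairs)

    println(nested.length) // number of lines
    flat.foreach(println)  // one (word, title) pair per row
  }
}
```

On the RDD from the question, the equivalent change would be to replace the outer `map` in `wordtitle` with `flatMap`, since `RDD.flatMap` flattens the per-line arrays in the same way.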
Additional details
What I want to achieve is to count the number of Words found in Text for a specific Title.
I wrote the following code:
val wordcountperfile = file.map(line => (line.split("\t")(1).split(" ").flatMap(word => word), line.split("\t")(0))).map(word => (word, 1)).reduceByKey(_ + _)
But that does not work. Any pointers on this subject would be appreciated. Thanks!
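One reading of the goal above is to count how often each word occurs under each title. Under that assumption, a minimal sketch with plain Scala collections (no Spark required) keys every occurrence on the (word, title) pair and sums the ones per key. Here `groupBy` followed by a sum stands in for `reduceByKey(_ + _)`. The names `CountSketch` and `counts` and the sample lines are hypothetical.

```scala
object CountSketch {
  // Count occurrences of each (word, title) pair across "Title<TAB>Text" lines.
  def counts(lines: Seq[String]): Map[(String, String), Int] =
    lines
      .flatMap { line =>
        val fields = line.split("\t")
        val title = fields(0)
        // Emit ((word, title), 1) for every word of the text.
        fields(1).split(" ").toSeq.map(word => ((word, title), 1))
      }
      .groupBy(_._1) // group the ones by their (word, title) key
      .map { case (key, ones) => (key, ones.map(_._2).sum) }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample input.
    val lines = Seq("ABC\tHello World Hello", "XYZ\tHello")
    counts(lines).foreach(println)
  }
}
```

On the RDD, the corresponding chain would be a `flatMap` that emits `((word, title), 1)` for each word, followed by `reduceByKey(_ + _)`.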