val data =
"""
user date item1 item2
1 2015-12-01 14 5.6
1 2015-12-01 10 0.6
1 2015-12-02 8 9.4
1 2015-12-02 90 1.3
2 2015-12-01 30 0.3
2 2015-12-01 89 1.2
2 2015-12-30 70 1.9
2 2015-12-31 20 2.5
3 2015-12-01 19 9.3
3 2015-12-01 40 2.3
3 2015-12-02 13 1.4
3 2015-12-02 50 1.0
3 2015-12-02 19 7.8
"""
Given data like the above, how can I get every record from each user's last day? I tried to use groupByKey but don't know how to continue.
val user = data.map{
case(user,date,item1,item2)=>((user,date),Array(item1,item2))
}.groupByKey()
After that, I don't know how to proceed. Can someone give me some advice? Many thanks :)
update:
I changed my data: a user can now have several records on the last day, and I want to get all of them. Thanks :)
second update:
I want to get the result:
user1 (2015-12-02,Array(8,9.4),Array(90,1.3))
user2 (2015-12-31,Array(20,2.5))
user3 (2015-12-02,Array(13,1.4),Array(50,1.0),Array(19,7.8))
This is the code I have now:
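import scala.collection.mutable.ArrayBuffer

// Parse each line into (user, ArrayBuffer(date, item1, item2)); drop(1) skips the header row.
val data2 = data.trim.split("\\n").drop(1).map(_.split("\\s+")).map { f =>
  (f(0), ArrayBuffer[Any](f(1), f(2).toInt, f(3).toDouble))
}
val data3 = sc.parallelize(data2)
// Compare the leading dates as strings; append y when x's date is not earlier than y's.
data3.reduceByKey((x, y) =>
  if (x(0).toString.compareTo(y(0).toString) >= 0) x ++= y
  else y).foreach(println)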
result:
(2,ArrayBuffer(2015-12-31, 20, 2.5))
(1,ArrayBuffer(2015-12-02, 8, 9.4, 2015-12-02, 90, 1.3))
(3,ArrayBuffer(2015-12-02, 13, 1.4, 2015-12-02, 50, 1.0, 2015-12-02, 19, 7.8))
Is there any way to improve this? :)
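One thing I noticed while testing: the `>=` branch in my reduce function appends `y` even when `x` has the strictly later date, so it seems to give the right answer only because my rows happen to be sorted by date. Below is a minimal sketch of a merge function that keeps only the later date and concatenates records only when the dates are equal. It uses plain Scala collections so it runs without Spark; `LatestDay`, `mergeLatest` and `latestPerUser` are names I made up for illustration, and I am assuming the dates are always in yyyy-MM-dd form so that they compare correctly as strings.

```scala
object LatestDay {
  // A user's records for one day: (date, all (item1, item2) pairs seen on that date).
  type DayRecords = (String, Vector[(Int, Double)])

  // Same rows as in the question, with the header line left out.
  val rows: Seq[String] = Seq(
    "1 2015-12-01 14 5.6",
    "1 2015-12-01 10 0.6",
    "1 2015-12-02 8 9.4",
    "1 2015-12-02 90 1.3",
    "2 2015-12-01 30 0.3",
    "2 2015-12-01 89 1.2",
    "2 2015-12-30 70 1.9",
    "2 2015-12-31 20 2.5",
    "3 2015-12-01 19 9.3",
    "3 2015-12-01 40 2.3",
    "3 2015-12-02 13 1.4",
    "3 2015-12-02 50 1.0",
    "3 2015-12-02 19 7.8"
  )

  // Hypothetical helper: keep whichever side has the later date;
  // concatenate the records only when both sides share the same date.
  def mergeLatest(a: DayRecords, b: DayRecords): DayRecords = {
    val cmp = a._1.compareTo(b._1)
    if (cmp > 0) a
    else if (cmp < 0) b
    else (a._1, a._2 ++ b._2)
  }

  // Parse each line, group by user, and reduce each group with mergeLatest.
  def latestPerUser(lines: Seq[String]): Map[String, DayRecords] =
    lines
      .map(_.trim.split("\\s+"))
      .map(f => (f(0), (f(1), Vector((f(2).toInt, f(3).toDouble)))))
      .groupBy(_._1)
      .map { case (user, recs) => user -> recs.map(_._2).reduce(mergeLatest) }

  def main(args: Array[String]): Unit =
    latestPerUser(rows).toSeq.sortBy(_._1).foreach(println)
}
```

If this is right, the same `mergeLatest` could presumably be passed to `reduceByKey` on an RDD of `(user, (date, Vector(...)))` pairs, since it does not depend on the order in which pairs are combined (apart from the order of records within the last day). I have not checked that on a real cluster.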