Sort Iterable Values in Spark

Let's say I have this input:

["example.com", Date(2000, 1, 1)]: 100,
["example.com", Date(2000, 2, 1)]: 30,
["example.com", Date(2000, 3, 1)]: 5,
["xyz.com", Date(2000, 1, 1)]: 20,
["xyz.com", Date(2000, 2, 1)]: 10,
["xyz.com", Date(2000, 3, 1)]: 60

I want to group by date, and then sort each group by count (descending), giving me an ordered list of domains for each date.

I want to end up with:

Date(2000, 1, 1), [["example.com", 100], ["xyz.com", 20]]
Date(2000, 2, 1), [["example.com", 30], ["xyz.com", 10]]
Date(2000, 3, 1), [["xyz.com", 60], ["example.com", 5]]

This seems to be a common use case, but I see no way to do this from the programming guide.

I can map [[domain, date], count] -> [date, [domain, count]]

which would give me pairs (K, V)

Date(2000, 1, 1), ["example.com", 100],
Date(2000, 2, 1), ["example.com", 30],
Date(2000, 3, 1), ["example.com", 5], 
Date(2000, 1, 1), ["xyz.com", 20],
Date(2000, 2, 1), ["xyz.com", 10],
Date(2000, 3, 1), ["xyz.com", 60]

then groupByKey, giving me pairs (K, Iterable<V>)

Date(2000, 1, 1), [["example.com", 100], ["xyz.com", 20]]
Date(2000, 2, 1), [["example.com", 30], ["xyz.com", 10]]
Date(2000, 3, 1), [["example.com", 5], ["xyz.com", 60]]

How can I sort the values inside each Iterable?

Sorry for the pseudocode, I'm using the Flambo Clojure wrapper and I don't want to rewrite it in Java to ask this question!

EDIT: Each Iterable (i.e. a list of domains) is likely to be too large to fit in memory.

EDIT2: Fixed the pseudocode.


I'll answer in Scala. Assume the input is an RDD[((String,String),Int)], that is, ((domain, month), count) pairs with the month held as a name.

First, groupBy month:

.groupBy { case ((_, month), _) => month }

Then strip the month out of the grouped values, keeping only (domain, count):

.mapValues(_.map { case ((domain, _), count) => (domain, count) })

To order the groups by month, define a function that maps a month name to its number:

def monthOfYear(month: String): Int = 
  month match {
     case "January" => 1
     case "February" => 2
     ...
  }
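
The match above is elided in the original, so here is an alternative sketch rather than a completion of it. It assumes the month strings are full English names such as "January" and delegates the lookup to the JDK's java.time.Month enum. The names MonthLookup and monthOfYearViaEnum are hypothetical and not part of the original answer.

```scala
import java.time.Month
import java.util.Locale

object MonthLookup {
  // Hypothetical helper: maps a full English month name to 1..12.
  // Month.valueOf expects the enum constant name (e.g. "JANUARY") and throws
  // IllegalArgumentException for anything else, so the input is normalised first.
  def monthOfYearViaEnum(month: String): Int =
    Month.valueOf(month.trim.toUpperCase(Locale.ROOT)).getValue
}
```

If the data uses abbreviations or another language, this lookup would fail and an explicit match like the one above would be the safer choice.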

Sort the RDD by month:

.sortBy { case (month, _) => monthOfYear(month) }

Then sort the values within each group by count, descending:

.mapValues(_.toSeq.sortBy{ case (domain, count) => count }(Ordering[Int].reverse))
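
To see the steps fit together without a cluster, here is a minimal sketch on plain Scala collections instead of an RDD. It is an illustration under assumptions, not the answer's Spark code: the object name GroupSortSketch and the sample data (the question's counts, with the dates rewritten as month names) are made up for the example, and monthOfYear covers only the three months in that sample. RDD methods of the same names behave analogously but run distributed.

```scala
object GroupSortSketch {
  // Sample data shaped like the question's input: ((domain, month), count).
  val input: Seq[((String, String), Int)] = Seq(
    (("example.com", "January"), 100),
    (("example.com", "February"), 30),
    (("example.com", "March"), 5),
    (("xyz.com", "January"), 20),
    (("xyz.com", "February"), 10),
    (("xyz.com", "March"), 60)
  )

  // Covers only the months present in the sample data above.
  def monthOfYear(month: String): Int = month match {
    case "January"  => 1
    case "February" => 2
    case "March"    => 3
  }

  val result: Seq[(String, Seq[(String, Int)])] =
    input
      // Group rows by month.
      .groupBy { case ((_, month), _) => month }
      .toSeq
      // Drop the month from each row, then sort each group by count, descending.
      .map { case (month, rows) =>
        val pairs = rows.map { case ((domain, _), count) => (domain, count) }
        (month, pairs.sortBy { case (_, count) => count }(Ordering[Int].reverse))
      }
      // Order the groups by calendar month.
      .sortBy { case (month, _) => monthOfYear(month) }
}
```

GroupSortSketch.result holds January, February and March in that order, with xyz.com ahead of example.com only in March. Like the toSeq call in the Spark version, this materialises each group in memory, which is the limitation the question's EDIT raises.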

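RDD.sortBy also takes an ascending flag, so pairs can be sorted by their second element in descending order with: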

.sortBy(p => p._2, false)



Source: https://habr.com/ru/post/1570333/

