To do this, use transform, which maps an RDD to an RDD. Inside it, call take, then turn the result back into an RDD:
sc: SparkContext = ...
author_counts_sorted_dstream.transform(
lambda rdd: sc.parallelize(rdd.take(5))
)
Sorting can be done with RDD.sortBy (it returns an RDD), so it also fits inside transform.
For example:
lambda foo: foo \
.sortBy(lambda x:x[0].lower()) \
.sortBy(lambda x:x[1], ascending=False)
This works, but each sort by in Spark triggers a shuffle. Where possible, it is cheaper to sort once with a composite key:
lambda x: (-x[1], x[0].lower())  # count descending first, then name
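The equivalence between two chained sorts and one composite-key sort can be checked without a cluster. The sketch below is plain Python, not Spark: a list stands in for the RDD, and the (author, count) pairs are made-up sample data. Note that the count comes first in the composite key, because the last sort in a chain is the dominant one.

```python
# Hypothetical sample data: (author, count) pairs standing in for RDD elements.
author_counts = [("bob", 3), ("Alice", 5), ("carol", 3), ("Dave", 1), ("alice2", 5)]

# Two passes: by name first, then by count descending.
# Python's sort is stable, so ties on count keep the name order from the first pass.
two_pass = sorted(
    sorted(author_counts, key=lambda x: x[0].lower()),
    key=lambda x: x[1],
    reverse=True,
)

# One pass with a composite key: count descending, then case-insensitive name.
one_pass = sorted(author_counts, key=lambda x: (-x[1], x[0].lower()))

print(one_pass == two_pass)
print(one_pass[:5])
```

This only relies on the stability of Python's built-in sort. It does not show whether chained RDD.sortBy calls keep the earlier order for ties, which is one more reason to prefer the single composite key, where the tie-breaking is explicit.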