TL;DR It is not possible to partition a GraphFrame efficiently.
GraphFrames algorithms can be divided into two categories:
Methods that delegate processing to their GraphX counterparts. GraphX supports a number of partitioning methods, but they are not exposed through the GraphFrames API. If you use one of these methods, it is probably better to use GraphX directly.
Unfortunately, development of GraphX has almost completely stopped, with only a handful of minor fixes over the past two years, and its overall performance is very disappointing compared with both in-core and out-of-core graph processing libraries.
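On the GraphX side, partitioning is applied after conversion, for example `g.toGraphX.partitionBy(PartitionStrategy.EdgePartition2D)` for a GraphFrame `g`. To illustrate what such an edge partitioning strategy does, here is a simplified, stand-alone sketch of 2D edge partitioning in plain Scala. The object and method names are hypothetical, and unlike GraphX's own strategy it does not scramble vertex ids with a mixing prime, so treat it as an illustration of the idea rather than GraphX's implementation.

```scala
// Hypothetical, simplified 2D edge partitioner; not GraphX's actual source
// (GraphX also scrambles ids with a large mixing prime, omitted here).
object EdgePartition2DSketch {
  // Lay the numParts partitions out on a side x side grid, side = ceil(sqrt(numParts)).
  // The source id selects the column and the destination id selects the row,
  // so the edges touching any one vertex land in at most 2 * side partitions.
  def partitionFor(src: Long, dst: Long, numParts: Int): Int = {
    require(numParts > 0, "numParts must be positive")
    val side = math.ceil(math.sqrt(numParts.toDouble)).toInt
    val col = java.lang.Math.floorMod(src, side.toLong).toInt
    val row = java.lang.Math.floorMod(dst, side.toLong).toInt
    // Wrap around when numParts is not a perfect square.
    (col * side + row) % numParts
  }

  def main(args: Array[String]): Unit = {
    val edges = Seq((1L, 2L), (1L, 3L), (4L, 2L), (5L, 6L))
    for ((s, d) <- edges)
      println(s"edge $s -> $d goes to partition ${partitionFor(s, d, 9)}")
  }
}
```

The point of the 2D layout is to bound vertex replication: a vertex's edges are spread over one column and one row of the grid instead of over all partitions.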
Methods implemented natively on Spark Datasets. Given the limited programming model and the single partitioning mode, these are deeply unsuitable for complex graph processing.
While relational columnar storage can be used for efficient graph processing, the naive iterative join approach used by GraphFrames just doesn't scale (though it is fine for shallow traversals with one or two hops).
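To see why the iterative join approach degrades with depth, the following plain-Scala sketch (no Spark; all names are hypothetical) expresses a k-hop traversal the relational way: each hop is one more join of the current path table with the edge table on "last vertex = src". In a Dataset-based engine each such join is typically a distributed shuffle, and the number of intermediate rows can grow geometrically with the number of hops.

```scala
// Hypothetical sketch: k-hop traversal as repeated joins against an edge table.
object IterativeJoinSketch {
  type Edge = (Long, Long) // (src, dst)

  // All walks with exactly `hops` edges, each as the list of visited vertex ids.
  def paths(edges: Seq[Edge], hops: Int): Seq[List[Long]] = {
    require(hops >= 1, "hops must be at least 1")
    // Index edges by src: this plays the role of the join key.
    val bySrc: Map[Long, Seq[Long]] =
      edges.groupBy(_._1).map { case (s, es) => s -> es.map(_._2) }
    // Hop 1: every edge is a walk of length 1.
    var frontier: Seq[List[Long]] = edges.map { case (s, d) => List(s, d) }
    // Each further hop joins the last vertex of every walk with the edges again.
    for (_ <- 2 to hops) {
      frontier = for {
        path <- frontier
        next <- bySrc.getOrElse(path.last, Seq.empty)
      } yield path :+ next
    }
    frontier
  }

  def main(args: Array[String]): Unit = {
    // Complete directed graph on 4 vertices without self-loops:
    // every extra hop multiplies the number of intermediate rows by 3.
    val vs = 1L to 4L
    val edges = for (a <- vs; b <- vs if a != b) yield (a, b)
    (1 to 4).foreach(k => println(s"$k hops: ${paths(edges, k).size} rows"))
  }
}
```

On this small graph the row counts go 12, 36, 108, 324, which is why one or two hops are manageable while deeper traversals quickly are not.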
You can try to repartition the vertices and edges DataFrames by id and src respectively:
val nPart: Int = ???  // number of partitions; choose a value suited to your data
GraphFrame(v.repartition(nPart, v("id")), e.repartition(nPart, e("src")))
which should help in some cases.
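The reason this can help is co-partitioning: if vertices are hashed on `id` and edges on `src` into the same number of partitions, every edge lands in the partition with the same index as its source vertex. The plain-Scala sketch below (hypothetical names, no Spark) checks that property; `hashCode` modulo stands in for Spark's Murmur3-based hash partitioning, so actual partition indices will differ. Whether Spark really avoids a shuffle for a given GraphFrames query still depends on its physical plan.

```scala
// Hypothetical sketch of hash co-partitioning; not Spark's partitioner.
object CoPartitionSketch {
  final case class Vertex(id: Long, name: String)
  final case class Edge(src: Long, dst: Long)

  // Simplified stand-in for a hash partitioner (Spark uses Murmur3, not
  // hashCode); floorMod keeps the index in [0, n) for negative hashes too.
  def partitionOf(key: Long, n: Int): Int =
    java.lang.Math.floorMod(java.lang.Long.hashCode(key), n)

  // Hash vertices on id and edges on src, using the same number of partitions.
  def coPartition(vs: Seq[Vertex], es: Seq[Edge], n: Int)
      : (Map[Int, Seq[Vertex]], Map[Int, Seq[Edge]]) =
    (vs.groupBy(v => partitionOf(v.id, n)), es.groupBy(e => partitionOf(e.src, n)))

  // True if every edge finds its source vertex inside its own partition,
  // i.e. a join on src = id could be evaluated partition by partition.
  def joinIsLocal(vs: Seq[Vertex], es: Seq[Edge], n: Int): Boolean = {
    val (vParts, eParts) = coPartition(vs, es, n)
    eParts.forall { case (p, part) =>
      val localIds = vParts.getOrElse(p, Seq.empty).map(_.id).toSet
      part.forall(e => localIds.contains(e.src))
    }
  }

  def main(args: Array[String]): Unit = {
    val vertices = Seq(Vertex(1L, "a"), Vertex(2L, "b"), Vertex(7L, "c"))
    val edges = Seq(Edge(1L, 2L), Edge(7L, 1L), Edge(2L, 7L))
    println(s"join is partition-local: ${joinIsLocal(vertices, edges, 4)}")
  }
}
```

Note that this only co-locates the `src` side; a traversal that also joins on `dst` still needs data movement, which is one reason the repartitioning helps only in some cases.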
Overall, in its current state (December 2016), Spark is not a good choice for intensive graph analytics.