I use a lot of select statements on Spark DataFrames, and I wonder what effect they have on subsequent transformations once execution is triggered.
Take a DataFrame df with 10 columns, a to j.
What is the effect of using as to rename every column?
df.select(df("a").as("1"), ..., df("j").as("10"))
What if I select only a subset (e.g. 5 columns)?
val df2 = df.select(df("a"), ..., df("e"))
b. How is this projection handled? Is df preserved (since df2 is a projection of it), so that df serves as a kind of reference? Or is df2 created fresh and df discarded? (Ignoring any persistence here.)
How do general Column expressions used in select behave in this respect?
Are performance tests available for the above cases? Are performance measurements generally available? If not, what is the best way to measure performance?
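To make the cases concrete, here is a minimal sketch of how they could be compared by looking at the query plans and timing an action. It is not a proper benchmark. The sample data, the local session, and the object name SelectPlanSketch are assumptions made up for illustration, and it assumes Spark 2.1 or later (for spark.time) is on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object SelectPlanSketch {
  def main(args: Array[String]): Unit = {
    // Local session purely for experimentation (assumption: Spark 2.1+).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("select-plan-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data with the ten columns a..j from the question.
    val df = Seq(
      (1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
      (11, 12, 13, 14, 15, 16, 17, 18, 19, 20)
    ).toDF("a", "b", "c", "d", "e", "f", "g", "h", "i", "j")

    // Case 1: rename via as (only two columns shown to keep it short).
    val renamed = df.select(df("a").as("1"), df("j").as("10"))

    // Case 2: project a subset of five columns.
    val df2 = df.select(df("a"), df("b"), df("c"), df("d"), df("e"))

    // Print the parsed, analyzed, optimized and physical plans for each case.
    renamed.explain(true)
    df2.explain(true)

    // Crude wall-clock timing of an action; spark.time prints the elapsed time.
    spark.time(df2.count())

    spark.stop()
  }
}
```

Comparing the explain output of the two cases would at least show whether the selects end up as separate steps in the physical plan, though tiny local data like this says little about real workloads.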