You will have to re-cache the data every time you manipulate or modify the data frame. However, the entire data frame does not need to be recomputed.
from pyspark.sql.functions import lit
df = df.withColumn('c1', lit(0))
In the above statement, a new data frame is created and reassigned to the df variable. The new data frame is not itself cached, but this time only the new column is computed, and the rest is retrieved from the cache of the original data frame.