From Spark 1.6 onwards, according to the official docs, we can no longer load a specific Hive partition into a DataFrame.
Up to Spark 1.5, the following used to work, and the resulting DataFrame contained both the partition column and the data:
DataFrame df = hiveContext.read().format("orc").load("path/to/table/entity=xyz");
However, this will not work in Spark 1.6.
If I give the base path instead, as follows, the resulting DataFrame does not contain the partition column I want:
DataFrame df = hiveContext.read().format("orc").load("path/to/table/");
How can I load a specific Hive partition into a DataFrame? What was the driver behind removing this feature?
I believe it was efficient. Is there an alternative way to achieve this in Spark 1.6?
As per my understanding, Spark 1.6 loads all the partitions, and if I then filter for a specific partition it is not efficient: it hits memory and throws GC (Garbage Collection) errors, because thousands of partitions get loaded into memory instead of only the specific partition.
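For context, my understanding from the Spark 1.6 SQL guide is that partition discovery gained a `basePath` data source option: if you tell the reader where the table root is, it can still infer the partition column even when you point `load()` at a single partition directory. Below is an untested sketch of what I would expect to work; the paths are the placeholder paths from above, and `entity` is the partition column in question.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class LoadSinglePartition {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("load-single-partition")
                .setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        HiveContext hiveContext = new HiveContext(sc.sc());

        // basePath tells partition discovery where the table root is,
        // so the partition column `entity` is kept in the schema even
        // though only the entity=xyz directory is actually read.
        DataFrame df = hiveContext.read()
                .format("orc")
                .option("basePath", "path/to/table/")
                .load("path/to/table/entity=xyz");

        // Schema should now include `entity` alongside the data columns.
        df.printSchema();
        sc.stop();
    }
}
```

This avoids loading the whole table and filtering afterwards, since only the one partition directory is scanned.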
Please advise. Thanks in advance.