The way I understand it, in local or yarn-client modes ...
- Launcher checks if it needs Kerberos tokens for HDFS, YARN, Hive, HBase
> hive-site.xml is looked up on the CLASSPATH by the Hive / Hadoop client libraries (including anything in driver.extraClassPath , since the driver runs inside the launcher and the merged CLASSPATH has already been built at this point)
- The driver checks what kind of metastore it should use for its own internal purposes: a stand-alone metastore backed by an embedded Derby instance, or a regular Hive metastore
> that choice comes from $SPARK_CONF_DIR/hive-site.xml
- When using HiveContext, the metastore connection is used to read / write Hive metadata in the driver
> hive-site.xml is looked up on the CLASSPATH by the Hive / Hadoop client libraries (and the Kerberos token is used, if present)
So you can have one hive-site.xml telling Spark to use an in-memory embedded Derby instance as a sandbox ("in-memory" meaning: stop leaving all these temporary files behind), while another hive-site.xml gives the actual Hive Metastore URI. And everything is fine.
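For illustration, a minimal hive-site.xml of each flavor might look like the sketches below. The property names (javax.jdo.option.ConnectionURL, hive.metastore.uris) are standard Hive configuration keys; the hostname and database name are placeholders, not values from this post.

```xml
<!-- hive-site.xml on the CLASSPATH: embedded, in-memory Derby sandbox -->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <!-- "memory:" keeps Derby from writing metastore_db files to disk -->
    <value>jdbc:derby:memory:metastore_db;create=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.EmbeddedDriver</value>
  </property>
</configuration>
```

```xml
<!-- $SPARK_CONF_DIR/hive-site.xml: points at the real, remote metastore -->
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <!-- placeholder host/port for your Thrift metastore service -->
    <value>thrift://metastore.example.com:9083</value>
  </property>
</configuration>
```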
Now, in yarn-cluster mode, this whole mechanism pretty much explodes into an unpleasant, undocumented mess.
The launcher needs its own CLASSPATH settings to create the Kerberos tokens, otherwise it fails. Better go read the source code to find out which undocumented env variable to use.
It may also need some properties overridden, because the hard-coded defaults are suddenly (and silently) no longer the defaults.
The driver cannot use the original $SPARK_CONF_DIR ; it has to rely on whatever the launcher shipped to the cluster. Does that shipped conf include a copy of $SPARK_CONF_DIR/hive-site.xml ? Apparently not. So you are probably stuck with Derby as a stub.
And the driver has to live with whatever YARN dumped on the container CLASSPATH, in whatever order.
On top of that, entries added via driver.extraClassPath do NOT take precedence by default; you have to force spark.yarn.user.classpath.first=true (which translates into a standard Hadoop property whose exact name I cannot remember right now, especially since there are several props with similar names that may be deprecated and/or not working in Hadoop 2.x)
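As a sketch, forcing that precedence on submit would look like this. The two --conf flags are the point; the paths, class name, and jar are placeholders I made up, not anything from this setup.

```shell
# Placeholder paths and class name; only the two --conf flags matter here.
spark-submit \
  --master yarn-cluster \
  --conf spark.yarn.user.classpath.first=true \
  --conf spark.driver.extraClassPath=/opt/myapp/conf:/opt/myapp/lib/overrides.jar \
  --class com.example.MyJob \
  /opt/myapp/lib/myjob.jar
```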
Think that's bad? Try connecting to a Kerberized HBase in yarn-cluster mode. The connection is made in the Executors, which is a whole other layer of nastiness. But I digress.
Bottom line: back to the diagnostics.
A. Are you really sure that the cryptic "metastore connection errors" are caused by missing properties, and specifically by a missing Metastore URI?
B. By the way, do your users explicitly use HiveContext ???
C. What is the CLASSPATH that YARN presents to the driver JVM, and what is the CLASSPATH that the driver presents to the Hadoop libraries when opening the metastore connection?
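One way to answer the first half of C empirically is to dump the classpath the driver JVM actually received, as early as possible in your main(). This is a generic JVM sketch, not a Spark API; note that it shows only java.class.path, not jars added later through custom classloaders.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

public class ClasspathDump {
    // Returns the classpath entries in the order the JVM resolves them.
    // This reflects what YARN (or anything else) put on the command line,
    // but not classes loaded via context/child classloaders afterwards.
    static List<String> classpathEntries() {
        String cp = System.getProperty("java.class.path");
        return Arrays.asList(cp.split(File.pathSeparator));
    }

    public static void main(String[] args) {
        // Print one entry per line so the precedence order is obvious.
        for (String entry : classpathEntries()) {
            System.out.println(entry);
        }
    }
}
```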
D. If the CLASSPATH created by YARN is messed up for some reason, what would be the minimal fix: changing the precedence rules, prepending your own entries, something else?