TL;DR
This is not natively supported yet, but it will be in a future version of Cloud Dataproc. In the meantime, there is a manual workaround.
Workaround
First, make sure you send the Python logs to the correct log4j logger from the Spark context. To do this, declare the logger as follows:
import pyspark
sc = pyspark.SparkContext()
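# Get a handle on the JVM-side log4j Logger through the py4j gateway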
logger = sc._jvm.org.apache.log4j.Logger.getLogger(__name__)
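Anything logged through this handle flows through the JVM's log4j configuration, the same as Spark's own logs. Continuing from the snippet above, a minimal usage sketch (the message text is illustrative):

logger.info("pyspark job started")
logger.warn("this message is logged at WARN level")

Note that this only covers driver-side code: sc._jvm is available only on the driver, so code running inside executors would need its own logging arrangement.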
The second part of the workaround involves settings that are not natively supported yet. If you look at the log4j properties file at
/etc/spark/conf/log4j.properties
on the master node of your cluster, you can see how log4j is configured for Spark. Currently, it looks like this:
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n
...
Note that this means log4j logs are sent only to the console. The Dataproc agent will pick that output up and return it as the job driver output. However, for fluentd to pick the output up and ship it to Google Cloud Logging, log4j needs to write to a local file as well. Therefore, you will need to modify the log4j properties as follows:
log4j.rootCategory=INFO, console, file
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/var/log/spark/spark-log4j.log
log4j.appender.file.MaxFileSize=512KB
log4j.appender.file.MaxBackupIndex=3
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n
...
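These edits have to be made on the cluster itself. One way to script the change is a small Python sketch like the following; it assumes root access on the master node and the default file layout shown above, and it should run once before the next Spark job starts (each Spark application reads the log4j config at JVM startup):

# Sketch: add a file appender to Spark's log4j config so fluentd can pick it up.
# Assumes the default Dataproc paths shown above and root access on the master.
CONF = "/etc/spark/conf/log4j.properties"

APPENDER = """\
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/var/log/spark/spark-log4j.log
log4j.appender.file.MaxFileSize=512KB
log4j.appender.file.MaxBackupIndex=3
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n
"""

with open(CONF) as f:
    conf = f.read()

# Only apply the edit once
if "log4j.appender.file" not in conf:
    # Route the root logger to the file appender in addition to the console
    conf = conf.replace("log4j.rootCategory=INFO, console",
                        "log4j.rootCategory=INFO, console, file")
    with open(CONF, "w") as f:
        f.write(conf + "\n" + APPENDER)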
If you set the file to /var/log/spark/spark-log4j.log as shown above, the default fluentd configuration on your Dataproc cluster should pick it up. If you want to write to a different file, you will need to configure fluentd to pick that file up instead.