TL;DR
This is not natively supported yet, but it will be in a future version of Cloud Dataproc. In the meantime, there is a manual workaround.
Workaround
First, make sure you send the Python logs to the correct log4j logger from the Spark context. To do this, declare the logger as follows:
import pyspark
sc = pyspark.SparkContext()
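# Get a handle on the JVM-side log4j Logger through the py4j gateway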
logger = sc._jvm.org.apache.log4j.Logger.getLogger(__name__)
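Anything logged through this handle flows through the JVM's log4j configuration, the same as Spark's own logs. Continuing from the snippet above, a minimal usage sketch (the message text is illustrative):

logger.info("pyspark job started")
logger.warn("this message is logged at WARN level")

Note that this only covers driver-side code: sc._jvm is available only on the driver, so code running inside executors would need its own logging arrangement.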
The second part of the workaround involves settings that are not natively supported yet. If you look at the log4j properties file at
/etc/spark/conf/log4j.properties
on the master node of your cluster, you can see how log4j is configured for Spark. Currently, it looks like this:
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n
...
Note that this means log4j logs are sent only to the console. The Dataproc agent will pick that output up and return it as the job driver output. However, for fluentd to pick the output up and ship it to Google Cloud Logging, log4j needs to write to a local file as well. Therefore, you will need to modify the log4j properties as follows:
log4j.rootCategory=INFO, console, file
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/var/log/spark/spark-log4j.log
log4j.appender.file.MaxFileSize=512KB
log4j.appender.file.MaxBackupIndex=3
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n
...
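These edits have to be made on the cluster itself. One way to script the change is a small Python sketch like the following; it assumes root access on the master node and the default file layout shown above, and it should run once before the next Spark job starts (each Spark application reads the log4j config at JVM startup):

# Sketch: add a file appender to Spark's log4j config so fluentd can pick it up.
# Assumes the default Dataproc paths shown above and root access on the master.
CONF = "/etc/spark/conf/log4j.properties"

APPENDER = """\
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/var/log/spark/spark-log4j.log
log4j.appender.file.MaxFileSize=512KB
log4j.appender.file.MaxBackupIndex=3
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n
"""

with open(CONF) as f:
    conf = f.read()

# Only apply the edit once
if "log4j.appender.file" not in conf:
    # Route the root logger to the file appender in addition to the console
    conf = conf.replace("log4j.rootCategory=INFO, console",
                        "log4j.rootCategory=INFO, console, file")
    with open(CONF, "w") as f:
        f.write(conf + "\n" + APPENDER)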
If you set the file to /var/log/spark/spark-log4j.log as shown above, the default fluentd configuration on your Dataproc cluster should pick it up. If you want to write to a different file, you will need to configure fluentd to pick that file up instead.