How to clear the Spark history event log without stopping the Spark streaming application

We have a Spark streaming application that runs as a long-lived job. The event log is written to the HDFS location hdfs://spark-history, where an application_XXX.inprogress file is created when we start the streaming application; this file grows to 70 GB. To delete the log file we currently stop the streaming application and clean it up by hand. Is there a way to automate this without shutting down or restarting the application? We set spark.history.fs.cleaner.enabled=true with a cleaner interval of 1 day and a max age of 2 days; however, it does not clear the .inprogress file. We are using Spark 1.6.2, launched on YARN in cluster deploy mode.
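For reference, the cleaner setup described above corresponds to these entries in spark-defaults.conf (a sketch; the property names are the standard Spark history-server settings, the values are the ones quoted in the question):

 spark.history.fs.cleaner.enabled true
 spark.history.fs.cleaner.interval 1d
 spark.history.fs.cleaner.maxAge 2d

Note that the cleaner only prunes completed application logs in the history directory; a streaming job's .inprogress file is never eligible, which is exactly the problem described.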

1 answer

To solve this you need to change a couple of configurations. First, add or change the following line in your YARN configuration (yarn-default.xml only ships the defaults; site-specific overrides normally go in yarn-site.xml):

 yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds=3600 
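In yarn-site.xml this is expressed as a standard Hadoop property element (a minimal sketch):

 <property>
   <name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
   <value>3600</value>
 </property>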

This setting makes the NodeManager aggregate and roll a long-running application's logs periodically (here every 3600 seconds) instead of only when the application finishes, so you can view the data of a still-running job with yarn logs -applicationId YOUR_APP_ID.
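For example (the application id shown is hypothetical; use the one YARN assigned to your job):

 yarn logs -applicationId application_1497375224506_0001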

This is the first step; the YARN configuration documentation describes this property in more detail.

The second step is to create two Log4j configuration files, log4j-driver.properties and log4j-executor.properties.

In these files you can use this example:

 log4j.rootLogger=INFO, rolling
 log4j.appender.rolling=org.apache.log4j.RollingFileAppender
 log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
 log4j.appender.rolling.layout.conversionPattern=[%d] %p %m (%c)%n
 log4j.appender.rolling.maxFileSize=50MB
 log4j.appender.rolling.maxBackupIndex=5
 log4j.appender.rolling.file=/var/log/spark/${dm.logging.name}.log
 log4j.appender.rolling.encoding=UTF-8
 log4j.logger.org.apache.spark=WARN
 log4j.logger.org.eclipse.jetty=WARN
 log4j.logger.com.anjuke.dm=${dm.logging.level}

What are these lines?

This one, log4j.appender.rolling.maxFileSize=50MB, caps each log file at 50 MB: when the current file reaches 50 MB it is closed and a new one is started.

Another important line is log4j.appender.rolling.maxBackupIndex=5. It means Log4j keeps a backup history of up to 5 files of 50 MB each; as new files are rolled in, the oldest backup is deleted.
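Together these two settings put a hard bound on disk usage per JVM: one active file plus five backups is at most 6 × 50 MB = 300 MB. Assuming -Ddm.logging.name=myapp (as in the submit command below), a log directory would contain:

 /var/log/spark/myapp.log      # active file
 /var/log/spark/myapp.log.1    # newest backup
 ...
 /var/log/spark/myapp.log.5    # oldest backup, removed at the next rollover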

After creating these configuration files, you need to pass them to Spark via the spark-submit command:

 spark-submit --master spark://127.0.0.1:7077 \
   --driver-java-options "-Dlog4j.configuration=file:/path/to/log4j-driver.properties -Ddm.logging.level=DEBUG" \
   --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/path/to/log4j-executor.properties -Ddm.logging.name=myapp -Ddm.logging.level=DEBUG" \
   ...
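The -D flags are what tie the pieces together: Log4j resolves ${dm.logging.name} from the Java system property set by -Ddm.logging.name=myapp, so with this command the executors' rolling appender writes to /var/log/spark/myapp.log (myapp is just an example name; if the driver uses the same properties file, add -Ddm.logging.name to --driver-java-options as well). Note also that the executor log path is local to each worker node, so /var/log/spark must exist and be writable on every node in the cluster.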

You can create one log file for your driver and another for your executors; I use two different files, one for each role, but you could use the same file for both.
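For instance, the executor properties file could be identical to the driver one except for the target path (a hypothetical variation):

 log4j.appender.rolling.file=/var/log/spark/${dm.logging.name}-executor.log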


