How to parse log entries that can span multiple lines using Spark

I am developing a Spark / Scala application that can read and parse its own log file. I am having trouble parsing multi-line log entries. Here is a snippet of my code:

case class MLog(dateTime: String, classification: String, serverType: String, identification: String, operation: String)
// .r compiles the string into a scala.util.matching.Regex; (?s) lets "." match line breaks
val PATTERN = """(?s)(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3})\s+(\w+)\s+\[(.*)\]\s+\[(.*)\]\s+(.*)""".r


def parseLogLine(log: String): MLog = {
     val res = PATTERN.findFirstMatchIn(log)
     if (res.isEmpty) {
       throw new RuntimeException("Cannot parse log line: " + log)
     }
     val m = res.get
     MLog(m.group(1), m.group(2), m.group(3), m.group(4), m.group(5))
}

sc.textFile("/mydirectory/logfile").map(parseLogLine).foreach(println)

Some entries in the log file span multiple lines. The regex works fine for single-line records, but when a multi-line record like the one below is read,

2015-08-31 00:10:17,682 WARN  [ScheduledTask-10] [name=custname;mid=9999;ds=anyvalue;] datasource - Scheduled DataSource import failed.                 
com.xxx.common.service.ServiceException: system failure: Unable to connect to ANY server: LdapDataSource{id=xxx, type=xxx, enabled=true, name=xxx, host=xxx port=999, connectionType=ssl, username=xxx, folderId=99999}

I get this error:

Cannot parse log line: com.xxx.common.service.ServiceException: system failure: Unable to connect to ANY server: LdapDataSource{id=xxx, type=xxx, enabled=true, name=xxx, host=xxx port=999, connectionType=ssl, username=xxx, folderId=99999}

How can I parse such multi-line log entries in Spark?


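If the input consists of files of a reasonable size, you can use SparkContext.wholeTextFiles. It reads each file as a single (path, content) record instead of splitting it into lines, so one regex can match entries that span line breaks.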

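import org.apache.spark.rdd.RDD

// Parse a single file and return all extracted entries
def parseLogFile(log: String): Iterator[MLog] = {
    // Placeholder: supply a regex that matches one complete entry,
    // including any continuation lines
    val p: scala.util.matching.Regex = ???
    p.findAllMatchIn(log).map(
        m => MLog(m.group(1), m.group(2), m.group(3), m.group(4), m.group(5))
    )
}

// wholeTextFiles yields (path, content) pairs; the path is ignored here
val rdd: RDD[MLog] = sc
   .wholeTextFiles("/path/to/input/dir")
   .flatMap{case (_, txt) => parseLogFile(txt)}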
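
The regex p is left unspecified above. One possible way to fill it in, as a sketch rather than a definitive pattern, is shown below. It assumes every entry starts with a timestamp at the beginning of a line and that the two bracketed fields never contain a closing bracket. The names MultiLineLogParser, Timestamp, EntryPattern and sample are made up for this illustration, and the second sample entry is invented test data. The sketch uses plain Scala so it can be tried without a cluster.

```scala
import scala.util.matching.Regex

case class MLog(dateTime: String, classification: String, serverType: String,
                identification: String, operation: String)

object MultiLineLogParser {
  // Assumption: each new entry begins with this timestamp at the start of a line.
  private val Timestamp = """\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3}"""

  // (?m) makes ^ match at every line start, (?s) lets "." cross line breaks.
  // The lazy final group stops right before the next line that starts with
  // a timestamp, or at the end of the input (\z).
  val EntryPattern: Regex = (
    """(?ms)^(""" + Timestamp + """)\s+(\w+)\s+\[([^\]]*)\]\s+\[([^\]]*)\]\s+(.*?)""" +
    """(?=^""" + Timestamp + """\s|\z)"""
  ).r

  // Parse the full text of one file and return every entry found in it.
  def parseLogFile(log: String): Iterator[MLog] =
    EntryPattern.findAllMatchIn(log).map { m =>
      // trim drops the line break that separates this entry from the next one
      MLog(m.group(1), m.group(2), m.group(3), m.group(4), m.group(5).trim)
    }

  // Abridged entry from the question plus one invented single-line entry.
  val sample: String =
    """2015-08-31 00:10:17,682 WARN  [ScheduledTask-10] [name=custname;mid=9999;ds=anyvalue;] datasource - Scheduled DataSource import failed.
      |com.xxx.common.service.ServiceException: system failure: Unable to connect to ANY server
      |2015-08-31 00:10:18,001 INFO  [main] [name=other;] startup - ready
      |""".stripMargin
}
```

With Spark, the same function would go into the flatMap, for example .flatMap { case (_, txt) => MultiLineLogParser.parseLogFile(txt) }. Because the lookahead is tested at every character of the message, this approach is probably best suited to files that fit comfortably in executor memory, which wholeTextFiles requires anyway.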

Source: https://habr.com/ru/post/1606016/

