I'm struggling to find a way to parse multiline logs using Spark Streaming. I wrote a parser that takes an array of strings as an input parameter. When a multi-line stack trace is found, it keeps consuming line after line until it reaches the next "normal" line, and only then processes the entry.
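To make that concrete, here is roughly what the parser does (simplified; the method name and the continuation heuristic, e.g. lines starting with whitespace or "Caused by:", are just placeholders for illustration):

    import scala.collection.mutable.ArrayBuffer

    // Group continuation lines (stack-trace frames) onto the preceding
    // "normal" log line, producing one string per logical log entry.
    def groupMultiline(lines: Array[String]): Array[String] = {
      val entries = ArrayBuffer.empty[StringBuilder]
      for (line <- lines) {
        val isContinuation =
          line.startsWith("\t") || line.startsWith(" ") || line.startsWith("Caused by:")
        if (isContinuation && entries.nonEmpty)
          entries.last.append("\n").append(line)   // attach to previous entry
        else
          entries += new StringBuilder(line)       // start a new entry
      }
      entries.map(_.toString).toArray
    }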
The logs are shipped through Flume into Kafka and are consumed via KafkaUtils.createDirectStream.
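The setup looks roughly like this (broker address, topic name and batch interval are placeholders, not my real configuration):

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.kafka.KafkaUtils
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("multiline-log-parser")
    val ssc  = new StreamingContext(conf, Seconds(10))
    val kafkaParams = Map("metadata.broker.list" -> "kafka-broker:9092")

    // Direct stream of (key, message) pairs from the "logs" topic
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("logs"))

    // I only care about the message text
    val lines = stream.map(_._2)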
The problem with Spark Streaming is that a stack trace can end up split across 2 (or more) distributed RDDs. I would have to be very lucky for that not to happen...
My question is: what can I do to reassemble the stack traces that get split, before processing them?
Should I pre-process the RDDs and create new ones that contain exactly the entries I expect (a rough sketch of what I mean is below)? Do I have to stitch the stack traces back together through some global buffer? Should I somehow play with the offsets? How exactly?
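For the first option, this is roughly what I have in mind, building on the sketches above (groupMultiline is the helper from the first snippet); it groups continuation lines within each partition, but it obviously still cannot join a stack trace whose halves landed in different partitions or batches:

    // Pre-process each partition so continuation lines are merged locally
    val grouped = lines.mapPartitions { iter =>
      groupMultiline(iter.toArray).iterator
    }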
Any ideas are welcome.
Thanx,
- Mike