How to (efficiently) follow / tail a file using Haskell, including detecting file rotation? (tail -F)

In essence, I want to know how to approach implementing the functionality of the Linux tail -F command in Haskell. My goal is to follow a log file, such as a web server log, and compute various statistics in real time by analyzing the input as it arrives. Ideally, this should continue without interruption when the log file is rotated by logrotate or a similar service.

I am a little at a loss as to how to even approach the problem, and what I have to consider in terms of performance in the presence of lazy I/O. Would any of the streaming libraries be relevant here?

1 answer

This is a partial answer because it does not handle file truncation by logrotate. It avoids lazy I/O and uses bytestring, streaming, streaming-bytestring and hinotify.

Some preliminary imports:

    {-# language OverloadedStrings #-}
    module Main where

    import qualified Data.ByteString
    import Data.ByteString.Lazy.Internal (defaultChunkSize)
    import qualified Data.ByteString.Streaming as B
    import Streaming
    import qualified Streaming.Prelude as S
    import Control.Concurrent.QSem
    import System.INotify
    import System.IO (withFile,IOMode(ReadMode))
    import System.Environment (getArgs)

Here is the tailing function:

    tailing :: FilePath -> (B.ByteString IO () -> IO r) -> IO r
    tailing filepath continuation = withINotify $ \i -> do
        sem <- newQSem 1
        addWatch i [Modify] filepath (\_ -> signalQSem sem)
        withFile filepath ReadMode (\h -> continuation (handleToStream sem h))
        where
        handleToStream sem h = B.concat . Streaming.repeats $ do
            lift (waitQSem sem)
            readWithoutClosing h
        -- Can't use B.fromHandle here because annoyingly it closes handle on EOF
        -- instead of just returning, and this causes problems on new appends.
        readWithoutClosing h = do
            c <- lift (Data.ByteString.hGetSome h defaultChunkSize)
            if Data.ByteString.null c
               then return ()
               else do B.chunk c
                       readWithoutClosing h

It takes a file path and a callback that consumes the streamed bytes.

The idea is that each time, before reading from the handle until EOF, we decrement a semaphore, which is only incremented by the callback that is invoked when the file is modified.
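The semaphore gating can be tried in isolation, without inotify or a real file. The following is only a minimal sketch using `base`: the forked thread is a stand-in for the inotify callback, and `gatedPasses` is a hypothetical name introduced for this illustration, not part of the answer's code.

```haskell
module Main where

import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.QSem (newQSem, signalQSem, waitQSem)
import Control.Monad (forM, replicateM_)

-- | Hypothetical helper: runs one immediate pass plus one pass per signal,
-- and returns the pass numbers in the order they ran.
gatedPasses :: Int -> IO [Int]
gatedPasses signals = do
  -- Start at 1 so the first pass runs immediately, as in 'tailing'.
  sem <- newQSem 1
  -- Stand-in for the inotify callback: signal once every 0.2 seconds.
  _ <- forkIO $ replicateM_ signals $ do
         threadDelay 200000
         signalQSem sem
  -- Stand-in for the reader: each pass takes one unit before "reading".
  forM [1 .. signals + 1] $ \n -> do
    waitQSem sem
    putStrLn ("pass " ++ show n ++ ": would read until EOF here")
    return n

main :: IO ()
main = do
  passes <- gatedPasses 2
  putStrLn ("completed passes: " ++ show passes)
```

Because a `QSem` counts, several modification events that arrive during one pass simply cause extra passes afterwards; in `tailing` those extra passes hit EOF straight away and yield nothing.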

We can test this function as follows:

    main :: IO ()
    main = do
        filepath : _ <- getArgs
        tailing filepath B.stdout
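The code above leaves rotation and truncation unhandled. One possible way to detect them, sketched here as an assumption rather than as part of the answer, is to compare the file's inode and size between checks using the `unix` package (`System.Posix.Files`). The names `Snapshot`, `Change`, `snapshot` and `classify` are hypothetical, and `getFileStatus` will throw if the path does not exist at the moment of the check, which can happen briefly during a rotation.

```haskell
module Main where

import System.Environment (getArgs)
import System.Posix.Files (fileID, fileSize, getFileStatus)
import System.Posix.Types (FileID, FileOffset)

-- | What we remember about the file between two checks (hypothetical type).
data Snapshot = Snapshot
  { snapInode :: FileID
  , snapSize  :: FileOffset
  } deriving (Eq, Show)

-- | How the file behind the path changed between two snapshots.
data Change = Unchanged | Appended | Truncated | Replaced
  deriving (Eq, Show)

-- | Record the inode and size currently behind the path.
snapshot :: FilePath -> IO Snapshot
snapshot path = do
  st <- getFileStatus path
  return (Snapshot (fileID st) (fileSize st))

-- | A different inode means the path now names a new file (rename-style
-- rotation); a smaller size means it was truncated in place.
classify :: Snapshot -> Snapshot -> Change
classify old new
  | snapInode new /= snapInode old = Replaced
  | snapSize new < snapSize old    = Truncated
  | snapSize new > snapSize old    = Appended
  | otherwise                      = Unchanged

main :: IO ()
main = do
  path : _ <- getArgs
  before <- snapshot path
  putStrLn "Snapshot taken; modify or rotate the file, then press Enter."
  _ <- getLine
  after <- snapshot path
  print (classify before after)
```

A follower built on this would presumably reopen the file on `Replaced` and seek back to the start on `Truncated`. Comparing against the current read offset instead of the previous size would be more precise, since a file can be truncated and grow past its old size between two checks.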

Source: https://habr.com/ru/post/1013308/
