How can I use bash (grep / sed / etc) to grab a section of a log file between two timestamps?

I have a set of mail logs: mail.log mail.log.0 mail.log.1.gz mail.log.2.gz

Each of these files contains chronologically sorted lines, each starting with a timestamp, for example:

May 3, 13:21:12 ...

How can I easily capture every log entry after a specific date/time and before another date/time using bash (and its associated command-line tools) without comparing every single line? Keep in mind that my before and after dates may not match any entries in the log files.

It seems to me that I need to determine the offset of the first line whose timestamp is greater than the starting timestamp, and the offset of the last line whose timestamp is less than the ending timestamp, and somehow cut out that section.

+3
5 answers

Here is one basic idea on how to do this:

  • Examine the datestamp on each file to see whether it is irrelevant.
  • If it could be relevant, unzip it if necessary and examine the first and last lines of the file to see whether it contains the start or finish time.
  • If it does, use a recursive function to determine whether the start time lies in the first or second half of the file. With such a binary search you could find any date in a million-line logfile with around 20 comparisons.
  • echo the logfile(s) in order, from the offset of the first entry to the offset of the last entry (no more comparisons).

What I am not sure about is: how do you efficiently read the n-th line of a file? (How expensive is tail -n +n | head -1?)

Any ideas?
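For the record, that tail/head trick does work; a small sketch (my addition, assuming GNU coreutils and a hypothetical line number N):

# print line N of mail.log; tail -n +N starts output at line N,
# so head -1 leaves exactly line N
N=1000
tail -n "+$N" mail.log | head -1

# total line count, useful as the upper bound for a binary search
wc -l < mail.log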

+1

/ " ",

MIN=`date --date="$1" +%s`
MAX=`date --date="$2" +%s`

Convert the first n words of each log line to the same:

L_DATE=`echo $LINE | awk '{print $1 $2 ... $n}'`
L_DATE=`date --date="$L_DATE" +%s`

Compare and throw away lines until you reach MIN:

if (( $MIN > $L_DATE )) ; then continue ; fi

Compare and print lines until you reach MAX:

if (( $L_DATE <= $MAX )) ; then echo "$LINE" ; fi

Exit once you exceed MAX:

if (( $L_DATE > $MAX )) ; then exit 0 ; fi

The whole script, minmaxlog.sh, looks like this:

#!/usr/bin/env bash

# Convert the requested bounds to seconds since the epoch (GNU date).
MIN=$(date --date="$1" +%s)
MAX=$(date --date="$2" +%s)

while IFS= read -r LINE ; do
    if [ -z "$LINE" ] ; then break ; fi

    # The timestamp is the first four fields of each line.
    L_DATE=$(echo "$LINE" | awk '{print $1 " " $2 " " $3 " " $4}')
    L_DATE=$(date --date="$L_DATE" +%s)

    if (( MIN > L_DATE  )) ; then continue ; fi      # before the range: skip
    if (( L_DATE <= MAX )) ; then echo "$LINE" ; fi  # inside the range: print
    if (( L_DATE >  MAX )) ; then break ; fi         # past the range: stop
done

I ran it on this input file, minmaxlog.input:

May 5 12:23:45 2009 first line
May 6 12:23:45 2009 second line
May 7 12:23:45 2009 third line
May 9 12:23:45 2009 fourth line
June 1 12:23:45 2009 fifth line
June 3 12:23:45 2009 sixth line

like this:

./minmaxlog.sh "May 6" "May 8" < minmaxlog.input
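
The older logs in the question are gzipped; one way to feed them through the same script (my addition, not part of the original answer) is to decompress on the fly, oldest file first so the stream stays chronological:

# rotation order: .2.gz is oldest, plain mail.log is newest
zcat mail.log.2.gz mail.log.1.gz | cat - mail.log.0 mail.log \
    | ./minmaxlog.sh "May 6" "May 8"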
+5

You will have to look at every single line in the range you want (to tell whether it is in the range you want), so I am guessing you meant every line NOT in the range. At a bare minimum, you will have to look at every line in the file up to and including the first one outside your range (I am assuming the lines are in date/time order).

This is a fairly simple pattern:

state = preprint
for every line in file:
    if line.date >= startdate:
        state = print
    if line.date > enddate:
        exit for loop
    if state == print:
        print line

You can write this in awk, Perl, Python, even COBOL if you must, but the logic is always the same.
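
For instance, a minimal awk sketch of that state machine (my illustration, not the original answer's; it assumes GNU date on the PATH and that the first four fields of each line form a parseable timestamp, as in the sample input above):

awk -v start="$(date -d 'May 6' +%s)" -v end="$(date -d 'May 8' +%s)" '
{
    # shell out to date(1) to convert the line timestamp to epoch seconds
    cmd = "date -d \"" $1 " " $2 " " $3 " " $4 "\" +%s"
    cmd | getline ts
    close(cmd)

    if (ts >= start) printing = 1   # reached the start of the range
    if (ts > end)    exit           # past the end: stop reading
    if (printing)    print          # inside the range
}' mail.log

Spawning date once per line is slow; for big logs you would parse the date inside awk instead (e.g. with gawk's mktime), but the structure is the point here.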

Locating the line numbers first (with, say, grep) and then blindly printing out that line range will not help, since grep also has to look at all the lines (all of them, not just up to the first one outside the range, and most likely twice: once for the first line and once for the last).

, , " , " " , ". , /.

That takes a while to set up, but your queries will become a lot faster. I am not necessarily advocating a database; you could probably achieve the same effect by splitting the log files into hourly logs, like this:

2009/
  01/
    01/
      0000.log
      0100.log
      : :
      2300.log
    02/
    : :

Then, for a given time range, you know exactly where to start and stop looking. The range 2009/01/01-15:22 through 2009/01/05-09:07 would result in:

  • some (the last part) of the file 2009/01/01/1500.log.
  • all of the files 2009/01/01/1[6-9]*.log.
  • all of the files 2009/01/01/2*.log.
  • all of the files 2009/01/0[2-4]/*.log.
  • all of the files 2009/01/05/0[0-8]*.log.
  • some (the first part) of the file 2009/01/05/0900.log.

Of course, I would write a script to return those file names, given the start and end times.
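
A rough sketch of the splitting step itself (my assumption of how it might look, using GNU date and the four-field timestamp format from the sample input):

# append each line of mail.log to YYYY/MM/DD/HH00.log based on its timestamp
while IFS= read -r line ; do
    [ -n "$line" ] || continue
    stamp=$(echo "$line" | awk '{print $1, $2, $3, $4}')
    path=$(date -d "$stamp" +%Y/%m/%d/%H00)
    mkdir -p "${path%/*}"
    printf '%s\n' "$line" >> "$path.log"
done < mail.log

Spawning date once per line makes this slow too, but unlike the search it only has to run once per log rotation.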

+1

You could try something like this:

sed -n "/BEGIN_DATE/,/END_DATE/p" logfile
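
Note that this only helps if both patterns literally match lines in the file, which the question says may not be the case; if the end pattern never matches, sed prints everything from the start match to the end of the file. With the sample input above it would look like (my example, not the original answer's):

sed -n '/^May 6/,/^May 9/p' minmaxlog.input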
+1

This may be possible in a pure Bash environment, but you should take advantage of tools that have more built-in support for working with strings and dates. For instance, Ruby has a built-in ability to parse your date format and can convert it to an easily comparable Unix timestamp (a positive integer representing the seconds since the epoch):

irb> require 'time'
# => true

irb> Time.parse("May 3 13:21:12").to_i
# => 1241371272  

Then you can easily write a Ruby script to:

  • Accept a start and an end date, and convert them to Unix timestamp numbers.
  • Scan the log files line by line, converting each date to its Unix timestamp and checking whether it falls between the start and end timestamps.

Note: converting to a Unix timestamp integer first is nice because comparing integers is very easy and efficient.

" ". "", , . , , , , , ( , ), , . , .


I just noticed your edit. Here is what I would add:

If you are really worried about finding the start and end records efficiently, you could do a binary search for each. Or, if that seems like overkill or too difficult with Bash tools, you could use a heuristic of reading only 5% of the lines (1 in every 20) to quickly arrive at a close-to-exact answer, and then refine it if desired. These are just some suggestions for performance improvements.
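
A hedged sketch of that binary search in Bash (my illustration; it assumes GNU date, the four-field timestamp format from the sample input, that the boundary falls somewhere inside the file, and it reuses the tail/head trick from the first answer to fetch a single line by number):

# find the number of the first line in mail.log whose timestamp is >= "$1"
TARGET=$(date -d "$1" +%s)
lo=1
hi=$(wc -l < mail.log)

while (( lo < hi )) ; do
    mid=$(( (lo + hi) / 2 ))
    line=$(tail -n "+$mid" mail.log | head -1)
    ts=$(date -d "$(echo "$line" | awk '{print $1, $2, $3, $4}')" +%s)
    if (( ts < TARGET )) ; then
        lo=$(( mid + 1 ))   # boundary must be after this line
    else
        hi=$mid             # this line might itself be the boundary
    fi
done
echo "range starts at line $lo"

Each tail probe still scans the file from the start, so this is not truly logarithmic in I/O; seeking by byte offset instead (e.g. with tail -c or dd) would fix that, at the cost of resynchronizing to the next newline.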

0

Source: https://habr.com/ru/post/1707734/

