Timestamp arithmetic in bash

Say I have two log files (input.log and output.log) in the following format:

 2012-01-16T12:00:00 12345678 

The first field is the processing timestamp, and the second is a unique identifier. I am trying to find:

  • Entries from input.log that do not have a corresponding entry for this identifier in output.log
  • Entries from input.log that have an entry for this identifier, but the difference in timestamps exceeds 5 seconds.

I have a workaround with MySQL, but I would ideally like to remove the database component and handle this in a shell script.

I have the following, which returns the input.log rows with an added column when output.log contains the identifier:

 join -a1 -j2 -o 0 1.1 2.1 <(sort -k2,2 input.log) <(sort -k2,2 output.log) 

Output Example:

 10111 2012-01-16T10:00:00 2012-01-16T10:00:04
 11562 2012-01-16T11:00:00 2012-01-16T11:00:10
 97554 2012-01-16T09:00:00

Main question:

Now that I have this information, how can I calculate the difference between the two timestamps and discard the rows where they are within 5 seconds of each other? I ran into some problems handling the ISO 8601 timestamp with date (specifically the T) and assumed there must be a better way.

Secondary question:

Is there a way to redo the whole approach, for example as a single awk script? My limited knowledge of processing several files and setting up the correct inequalities for the output conditions was the limiting factor here, hence the approach above.

+4
3 answers

If you have GNU awk, you can try something like this:

 gawk '
 NR==FNR { a[$2]=$1; next }
 !($2 in a) { print $2, $1; next }
 ($2 in a) {
     "date +%s -d " $1 | getline var1
     "date +%s -d " a[$2] | getline var2
     var3 = var2 - var1
     if (var3 > 4) print $2, $1, a[$2]
 }' output.log input.log

Test:

 [jaypal:~/Temp] cat input.log
 2012-01-16T09:00:00 9
 2012-01-16T10:00:00 10
 2012-01-16T11:00:00 11

 [jaypal:~/Temp] cat output.log
 2012-01-16T10:00:04 10
 2012-01-16T11:00:10 11
 2012-01-16T12:00:00 12

 [jaypal:~/Temp] gawk '
 NR==FNR{a[$2]=$1;next}
 !($2 in a) {print $2,$1; next}
 ($2 in a) {
     "date +%s -d " $1 | getline var1;
     "date +%s -d " a[$2] | getline var2;
     var3=var2-var1;
     if (var3>4) print $2,$1,a[$2]
 }' output.log input.log
 9 2012-01-16T09:00:00
 11 2012-01-16T11:00:00 2012-01-16T11:00:10

Explanation:

  • NR==FNR{a[$2]=$1;next}

We start by storing the first field of each output.log line in an array indexed by the second field. The next statement keeps the remaining pattern{action} blocks from running on these lines. The condition NR==FNR is true only while the first file is being read, so this block reads output.log in full before input.log is touched.

  • !($2 in a) {print $2,$1; next}

Once output.log has been read, we move on to input.log. We check whether the second field of the current input.log line is missing from our array (that is, from output.log). If it is missing, we print the line and skip to the next one. This repeats for every such line.

  • ($2 in a) {"date +%s -d " $1 | getline var1; "date +%s -d " a[$2] | getline var2; var3=var2-var1; if (var3 > 4) print $2,$1,a[$2] }

This block handles identifiers that are present in both files, where we need to calculate the time difference. We run the external date command to convert each timestamp to seconds since the epoch. date writes its result to STDOUT, so we capture it with awk's command | getline construct and store it in a variable (var1 and var2). Once both values are stored, we subtract them and keep the result in var3; if var3 is greater than 4, we print the line in the desired format.
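Running date twice per matched line gets slow on large logs, and the command pipes above are never closed. As a rough sketch of an alternative (not the method used in this answer), the timestamps can be converted to seconds with plain arithmetic inside awk, so no external command is needed. The names late_or_missing and epoch are made up for this example, and it assumes every timestamp has the form YYYY-MM-DDTHH:MM:SS, in a single time zone, with no DST jumps.

```shell
# Hypothetical helper: print input.log entries that have no match in
# output.log, or whose output timestamp is more than 4 seconds later.
# Usage: late_or_missing input.log output.log
late_or_missing() {
    awk '
    # Convert YYYY-MM-DDTHH:MM:SS to seconds since 1970-01-01,
    # using a days-from-civil-date calculation (no time zone or DST handling).
    function epoch(ts,    p, y, m, d, era, yoe, doy, doe) {
        split(ts, p, /[-T:]/)
        y = p[1] + 0; m = p[2] + 0; d = p[3] + 0
        if (m <= 2) y--                    # count Jan/Feb as part of the previous year
        era = int(y / 400)
        yoe = y - era * 400
        doy = int((153 * (m > 2 ? m - 3 : m + 9) + 2) / 5) + d - 1
        doe = yoe * 365 + int(yoe / 4) - int(yoe / 100) + doy
        return (era * 146097 + doe - 719468) * 86400 + p[4] * 3600 + p[5] * 60 + p[6]
    }
    NR == FNR    { out[$2] = $1; next }    # first file: remember output timestamps
    !($2 in out) { print $2, $1; next }    # identifier never reached output.log
    epoch(out[$2]) - epoch($1) > 4 { print $2, $1, out[$2] }
    ' "$2" "$1"
}
```

Called as late_or_missing input.log output.log on the sample files from the test above, this should print the same two lines. With GNU awk, the built-in mktime() could likely replace the hand-written epoch() function once the -, T and : separators are turned into spaces.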

+4

Here is the solution I went with:

 cat input.log
 2012-01-16T09:00:00 9
 2012-01-16T10:00:00 10
 2012-01-16T11:00:00 11

 cat output.log
 2012-01-16T10:00:04 10
 2012-01-16T11:00:10 11
 2012-01-16T12:00:00 12

 sort -k2,2 input.log > input.sort
 sort -k2,2 output.log > output.sort

 join -a1 -j2 -o 0 1.1 2.1 input.sort output.sort | while read id i o; do
     if [ -n "$o" ]; then
         ot=$(date +%s -d "${o/T/ }")
         it=$(date +%s -d "${i/T/ }")
         [[ $it+5 -lt $ot ]] && echo $id $i $o
     else
         echo $id $i
     fi
 done
 11 2012-01-16T11:00:00 2012-01-16T11:00:10
 9 2012-01-16T09:00:00
+2
 t1=2012-01-16T10:00:00
 t2=2012-01-16T10:00:04
 echo $(($(date -d $t1 +%s)-$(date -d $t2 +%s)))
 -4
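The same subtraction can be wrapped in a small helper. This is only a sketch: seconds_between is a hypothetical name, it assumes a date that supports -d (such as GNU date), and it swaps the T for a space in the same spirit as the ${o/T/ } substitution in the previous answer. Note the argument order: the snippet above subtracts the later timestamp from the earlier one, which is why it prints -4.

```shell
# Hypothetical helper: seconds elapsed from timestamp $1 to timestamp $2.
# Assumes timestamps of the form YYYY-MM-DDTHH:MM:SS and a date with -d support.
seconds_between() {
    # Replace the "T" separator with a space before handing the value to date;
    # -u reads both values as UTC so a DST change cannot skew the difference.
    start=$(date -u -d "$(printf '%s' "$1" | tr 'T' ' ')" +%s) || return 1
    end=$(date -u -d "$(printf '%s' "$2" | tr 'T' ' ')" +%s) || return 1
    echo $(( end - start ))
}

seconds_between 2012-01-16T10:00:00 2012-01-16T10:00:04   # prints 4
```

A result greater than 5 would then mark an entry as late, for example with [ "$(seconds_between "$i" "$o")" -gt 5 ].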
-1

Source: https://habr.com/ru/post/1391353/

