UNIX AWK script - memory exhausted

I have an input CSV file that looks like this:

123456,ABC,A,,,
123457,DEF,A,H,,
1234568,GHI,,H,,
111111,AAA,A,,,
12345699,XYZ,A,H,,

I also have an AWK script containing the lines below, with several IF conditions:

BEGIN { FS=","}
{ 
variable=$1.","$2;
if(variable ~ /^123456.+,ABC/) print "P," $0; else
if(variable ~ /^123457.+,DEF/) print "P," $0; else
if(variable ~ /^123458.+,GHI/) print "R," $0; else
if(variable ~ /^1234599.+,XYZ/) print "P," $0; else print "U" ","  $0;} 
END { }

After running this AWK script on my input file, I get the following output:

P,123456,ABC,A,,,
P,123457,DEF,A,H,,
R,1234568,GHI,,H,,
U,111111,AAA,A,,,
P,12345699,XYZ,A,H,,

Everything worked so far, but when I had to add additional IF conditions to this AWK script (about 3500), it throws an "exhausted memory" error:

awk: script.awk:1259: if(variable ~ /^123311.+,AB23/) print "P," $0; else
awk: script.awk:1259:                                              ^ memory exhausted

Now the interesting part: first, the error is always reported on line 1259, and second, when I delete all IF conditions from line 1259 onward (inclusive), the script runs smoothly again. Is there a limit on the number of IF conditions inside an AWK / GAWK script?

The AWK version I'm using:

GNU Awk 4.1.3, API: 1.1 (GNU MPFR 3.1.3, GNU MP 6.1.0)

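Instead of hard-coding thousands of if conditions, if you are using GNU awk you could keep the rules in a file and hash them (exact string match on the first two fields, no regex):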

$ cat rules   # put your logic here
P,123456,ABC
P,123457,DEF
R,1234568,GHI

Then:

$ awk '
BEGIN { FS=OFS="," }                       
NR==FNR {                                  # read in the rules file
    a[$2","$3]=$1                          # and hash it
    next
}
{                                          # read the input file
    print ($1","$2 in a?a[$1","$2]:"U"),$0 # look up the code in the hash and print it, or U if not found
}' rules input                             # mind the order
P,123456,ABC,A,,,
P,123457,DEF,A,H,,
R,1234568,GHI,,H,,
U,111111,AAA,A,,,
U,12345699,XYZ,A,H,,

Edit

With GNU awk you can use the first six characters of $1 together with $2 as keys of a 2D array, something like this:

$ cat rules   # put your logic here, notice 1st and 3rd
P,123456,ABC
P,123457,DEF
R,123456,GHI

Then:

$ awk '
BEGIN { FS=OFS="," }
NR==FNR {
    a[$2][$3]=$1
    next
}
{
    p=substr($1,1,6)
    print (p in a && $2 in a[p] ? a[p][$2] : "U"),$0
}' rules input
P,123456,ABC,A,,,    # matches 1st record in rules file
P,123457,DEF,A,H,,   # 2nd
R,1234568,GHI,,H,,   # 3rd
U,111111,AAA,A,,,    # no match
U,12345699,XYZ,A,H,, # 123456 would match but XYZ won't
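
If the regex matching from the question has to stay, the rules-file idea can probably be combined with dynamic regexes. The following is only a sketch under assumptions: the file names rules.txt and input.csv, the function name tag_records, and the four sample regexes are made up for illustration (they are not the asker's real rules), and each rule line is assumed to be a code, a comma, then a regex tested against "$1,$2". The first matching rule wins, as in the if/else chain.

```shell
# Hypothetical rules file: CODE,REGEX (the regex may itself contain commas).
cat > rules.txt <<'EOF'
P,^123456,ABC$
P,^123457,DEF$
R,^123456.+,GHI$
P,^123456.+,XYZ$
EOF

# Sample input, copied from the question.
cat > input.csv <<'EOF'
123456,ABC,A,,,
123457,DEF,A,H,,
1234568,GHI,,H,,
111111,AAA,A,,,
12345699,XYZ,A,H,,
EOF

tag_records() {   # usage: tag_records RULES_FILE INPUT_FILE
    awk '
    BEGIN { FS = OFS = "," }
    NR == FNR {                          # first file: load the rules
        i = index($0, ",")               # split at the first comma only
        code[++n] = substr($0, 1, i - 1)
        re[n]     = substr($0, i + 1)    # rest of the line is the regex
        next
    }
    {                                    # second file: tag each record
        key = $1 "," $2
        out = "U"                        # default when no rule matches
        for (j = 1; j <= n; j++)
            if (key ~ re[j]) { out = code[j]; break }
        print out, $0
    }' "$1" "$2"
}

tag_records rules.txt input.csv
```

With these sample rules the call prints the same five lines as the expected output in the question (P, P, R, U, P). Testing every record against up to n regexes is slower than a hash lookup, so this is a trade-off rather than a drop-in improvement.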

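Rather than one long chain of if, if-else, and so on, write each condition as its own pattern-action rule.

Something like this, for example: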

BEGIN { FS=OFS=","}
{ variable = $1 "." FS $2 }
variable ~ /^123456.+,ABC/  { print "P", $0; next }
variable ~ /^123457.+,DEF/  { print "P", $0; next }
variable ~ /^123458.+,GHI/  { print "R", $0; next }
variable ~ /^1234599.+,XYZ/ { print "P", $0; next }
{ print "U",  $0 } 

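If next is not an option, for example because later parts of the script still have to run for every record, use a flag instead: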

BEGIN { FS=OFS=","}
{ variable = $1 "." FS $2; f=0 }
!f && variable ~ /^123456.+,ABC/  { print "P", $0; f=1 }
!f && variable ~ /^123457.+,DEF/  { print "P", $0; f=1 }
!f && variable ~ /^123458.+,GHI/  { print "R", $0; f=1 }
!f && variable ~ /^1234599.+,XYZ/ { print "P", $0; f=1 }
!f { print "U",  $0 } 

Either way, there are no elses.
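
With thousands of conditions already written in the if/else form, the rewrite could be done mechanically. Below is a rough sketch, assuming every condition sits on its own line exactly in the shape shown in the question; script.awk here is a shortened stand-in for the real script, and convert_rules and rules.awk are hypothetical names. The key is built as $1 FS $2 (the two fields joined by a comma); adjust that line if your script builds it differently.

```shell
# Shortened stand-in for the original script, one condition per line.
cat > script.awk <<'EOF'
BEGIN { FS=","}
{
variable=$1.","$2;
if(variable ~ /^123456.+,ABC/) print "P," $0; else
if(variable ~ /^123458.+,GHI/) print "R," $0; else
if(variable ~ /^1234599.+,XYZ/) print "P," $0; else print "U" ","  $0;}
END { }
EOF

convert_rules() {   # usage: convert_rules OLD_SCRIPT
    # Print only the "if(...) print" lines, each rewritten as
    #   condition { print "CODE", $0; next }
    sed -n 's|^if(\(variable ~ /.*/\)) print "\([A-Z]*\)," \$0;.*|\1 { print "\2", $0; next }|p' "$1"
}

{
    echo 'BEGIN { FS = OFS = "," }'
    echo '{ variable = $1 FS $2 }'
    convert_rules script.awk
    echo '{ print "U", $0 }'        # fallback when no rule matched
} > rules.awk

cat rules.awk
```

The generated rules.awk is a flat list of rules with no nested else branches; run it with awk -f rules.awk input and compare the result against the old script on a small sample before trusting it.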



Try the following:

awk -F',' '{if($1$2 ~ /^123456+ABC/ || $1$2 ~ /^123457+DEF/ || $1$2 ~ /^12345699+XYZ/ || $1$2 ~ /^123311+AB23/){print "P," $0;} else if($1$2 ~ /^1234568+GHI/){print "R," $0;} else{ print "U" ","  $0}}' file

Source: https://habr.com/ru/post/1686341/

