Using Awk to process a file where each record has different fixed-width fields

I have some data files from a legacy system that I would like to process using Awk. Each file consists of a list of records. There are several different record types, and each type has its own set of fixed-width fields (there is no field separator character). The first two characters of a record indicate its type, which tells you which fields follow. A file might look something like this:

AAField1Field2LongerField3
BBField4Field5Field6VeryVeryLongField7Field8
CCField99

Using Gawk, I can set FIELDWIDTHS, but that applies to the whole file (unless I am missing some way of setting it record by record), or I can set FS to "" and process the file one character at a time, but that is a bit cumbersome.

Is there any way to extract fields from such a file using awk?

Edit: Yes, I could use Perl (or something else). I would still like to know whether there is a reasonable way to do this with Awk.

+3
6 answers

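Assuming each multi-line group of records is terminated by a "CC" record, you can process the file with simple if-then logic. I have assumed you want fields 1, 5 and 7 on one line; a sample awk script would be: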

BEGIN {
    field1=""
    field5=""
    field7=""
}
{
    record_type = substr($0,1,2)
    if (record_type == "AA")
    {
        field1=substr($0,3,6)
    }
    else if (record_type == "BB")
    {
        field5=substr($0,9,6)
        field7=substr($0,21,18)
    }
    else if (record_type == "CC")
    {
        print field1"|"field5"|"field7
    }
}

Save the awk script in a file, program.awk, say. The script is then run as follows:

awk -f program.awk < my_multi_line_file.txt 
+8

You could do it in two passes, like this:

1step.awk

/^AA/{printf "2 6 6 12"    }
/^BB/{printf "2 6 6 6 18 6"}
/^CC/{printf "2 8"         }
{printf "\n%s\n", $0}

2step.awk

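# Odd lines carry the widths; FIELDWIDTHS (gawk only) takes effect from the next record.
NR%2 == 1 {FIELDWIDTHS=$0}
# Even lines are the data records, already split using those widths.
NR%2 == 0 {print $2}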

awk -f 1step.awk sample  | awk -f 2step.awk
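
For comparison, a single-pass variant may also work in gawk: set FIELDWIDTHS inside the rule for each record type, then assign $0 to itself so that the current record is split again with the new widths. This is only a sketch, assuming GNU awk is installed as gawk and that the widths match the sample data in the question; the sample lines are fed in with printf just to keep it self-contained.

```shell
# Sketch: per-record FIELDWIDTHS in one pass (gawk only; FIELDWIDTHS is a GNU extension).
result=$(printf '%s\n' \
    'AAField1Field2LongerField3' \
    'BBField4Field5Field6VeryVeryLongField7Field8' \
    'CCField99' |
gawk '
# Pick the layout from the two-character record type.
/^AA/ { FIELDWIDTHS = "2 6 6 12" }
/^BB/ { FIELDWIDTHS = "2 6 6 6 18 6" }
/^CC/ { FIELDWIDTHS = "2 7" }
{
    $0 = $0        # force the current record to be re-split with the new widths
    print $1, $2   # record type and first data field
}')
printf '%s\n' "$result"
```

A record type that matches none of the rules would keep the widths of the previous record, so a real script would probably want a default case.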
+5

As far as I know, you have to do this by hand in awk, along these lines:

awk '/^AA/ { manually process record AA out of $0 }
     /^BB/ { manually process record BB out of $0 }
     /^CC/ { manually process record CC out of $0 }' file ...

That is, in each action you use substr to extract the fields you need from $0.

Perl has unpack for this kind of job; as far as I know, standard awk has nothing equivalent, so substr is the tool.
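
A template like this, filled in for the sample data from the question, might look as follows. The offsets are worked out from the sample lines (for example, Field5 starts at column 9 of a BB record); which fields get printed and the "|" output separator are arbitrary choices for illustration.

```shell
# Sketch: one rule per record type, fields cut out of $0 with substr (any POSIX awk).
result=$(printf '%s\n' \
    'AAField1Field2LongerField3' \
    'BBField4Field5Field6VeryVeryLongField7Field8' \
    'CCField99' |
awk -v OFS='|' '
# AA: type(2) Field1(6) Field2(6) LongerField3(12)
/^AA/ { print "AA", substr($0, 3, 6), substr($0, 9, 6), substr($0, 15, 12) }
# BB: only Field5 (columns 9-14) and the long field (columns 21-38) are printed
/^BB/ { print "BB", substr($0, 9, 6), substr($0, 21, 18) }
# CC: everything after the type is a single field
/^CC/ { print "CC", substr($0, 3) }
')
printf '%s\n' "$result"
```

For the three sample lines this prints:

AA|Field1|Field2|LongerField3
BB|Field5|VeryVeryLongField7
CC|Field99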

+4

Could you use Perl instead?

+3

Otherwise, use perl or ruby.

0

What about two scripts? For example, the first script inserts field separators based on the first two characters of each line, and the second then processes the result.

Alternatively, define a function in the awk script that splits each line into variables based on its record type. I would go this way, since the function can be reused.
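
The function idea could be sketched roughly like this. The names split_fixed and layout are made up for the example, and the width lists are taken from the sample data in the question, so treat it as a starting point rather than a finished solution.

```shell
# Sketch: a reusable fixed-width splitter driven by a per-type layout table (any POSIX awk).
result=$(printf '%s\n' \
    'AAField1Field2LongerField3' \
    'BBField4Field5Field6VeryVeryLongField7Field8' \
    'CCField99' |
awk '
# split_fixed (hypothetical helper): cut line into out[1..n] using a
# space-separated list of widths; returns the number of fields.
function split_fixed(line, widths, out,    n, w, i, pos) {
    split("", out)                 # clear leftovers from the previous record
    n = split(widths, w, " ")
    pos = 1
    for (i = 1; i <= n; i++) {
        out[i] = substr(line, pos, w[i])
        pos += w[i]
    }
    return n
}
BEGIN {
    # Layouts keyed by record type; widths taken from the sample data.
    layout["AA"] = "2 6 6 12"
    layout["BB"] = "2 6 6 6 18 6"
    layout["CC"] = "2 7"
}
{
    type = substr($0, 1, 2)
    if (!(type in layout)) next    # skip unknown record types
    n = split_fixed($0, layout[type], f)
    print type, n, f[n]            # type, field count, last field
}')
printf '%s\n' "$result"
```

For the sample lines this prints "AA 4 LongerField3", "BB 6 Field8" and "CC 2 Field99". Keeping the widths in a table means a new record type only needs one more layout entry.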

0

Source: https://habr.com/ru/post/1717134/

