Multiple multi-line regular expressions in Bash

Question

I am trying to do some fairly simple string parsing in a bash script. I have a file that consists of several multi-line fields, each surrounded by a well-known header and footer.

I want to extract each field separately into an array or similar, like this:

FILE=$(cat file)
REGEX="@#@#@#[\s\S]+?@#@#@"

if [[ $FILE =~ $REGEX ]]; then
    echo "$BASH_REMATCH"
fi

FILE:

@#@#@#################################
this is field one
@#@#@#
@#@#@#################################
this is field two
they can be any number of lines
@#@#@#

Now I'm sure the problem is that bash does not match newlines with ".".

I can match this with "pcregrep -M", but of course the whole file will match. Can I get one match at a time from pcregrep?

I do not mind using embedded Perl or similar.
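For reference, bash's =~ uses POSIX extended regular expressions. Inside a bracket expression the backslash is not special, so [\s\S] matches only the characters \, s and S, and POSIX ERE has no lazy +? quantifier. Below is a minimal sketch of pulling out one field at a time with plain [[ =~ ]] and BASH_REMATCH, under the assumption (not stated in the question) that field bodies never contain "@", so that [^@]* can stand in for the lazy match. The variable names and sample data are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: one field at a time with bash's own [[ =~ ]] and BASH_REMATCH.
# Assumption: field bodies never contain "@", so [^@]* can replace the
# lazy [\s\S]+? that POSIX EREs do not support.

# Hypothetical sample input with the same layout as the file above.
data='@#@#@#################################
this is field one
@#@#@#
@#@#@#################################
this is field two
they can be any number of lines
@#@#@#'

# Header ("@#@#@#" plus any extra "#"), newline, body (group 1), newline, footer.
re=$'@#@#@##*\n([^@]*)\n@#@#@#'

fields=()
rest=$data
while [[ $rest =~ $re ]]; do
    fields+=("${BASH_REMATCH[1]}")
    # Drop everything up to the end of this match so the next test finds the next field.
    rest=${rest#*"${BASH_REMATCH[0]}"}
done

for i in "${!fields[@]}"; do
    printf 'field[%d]=<%s>\n' "$i" "${fields[i]}"
done
```

Each pass strips the text up to the end of the current match, so the next [[ =~ ]] only sees what is left; the newlines inside a field are kept.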

bash regex

prestomation Jan 22 '10 at 15:06

4 answers

One solution uses TXR, whose -B option dumps the extracted variables as Bash assignments, ready to be eval-ed.

Note that @ is special in TXR, so a literal @ in the data is written as @@.

$ cat fields.txr
@(collect)
@@#@@#@@#################################
@  (collect)
@field
@  (until)
@@#@@#@@#
@  (end)
@  (cat field)@# <- catenate the fields together with a space separator by default
@(end)

$ txr -B fields.txr data
field[0]="this is field one"
field[1]="this is field two they can be any number of lines"

$ eval $(txr -B fields.txr data)
$ echo ${field[0]}
this is field one
$ echo ${field[1]}
this is field two they can be any number of lines

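Inside the inner @(collect), @field matches one whole line and the matches are gathered into a list of lines; @(cat field) then joins that list into a single string, with a space separator by default, and the outer @(collect) gathers one such string per field.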
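If TXR is not available, roughly the same result (one array element per field, its lines joined by a space) can be had with awk plus bash's mapfile, which also avoids running eval on generated text. This is a minimal sketch, assuming bash 4 or later (for mapfile) and the marker layout from the question; the function name extract_fields and the sample data are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: load space-joined fields into a bash array without TXR or eval.
# Assumptions: bash >= 4 (mapfile); a header is "@#@#@#" followed by more "#",
# and a footer is exactly "@#@#@#".

# Hypothetical helper: print each field on a single line, body lines joined by one space.
extract_fields() {
    awk '
        /^@#@#@##+$/ { infield = 1; n = 0; body = ""; next }         # header: start a field
        /^@#@#@#$/   { if (infield) print body; infield = 0; next }  # footer: emit the field
        infield      { body = (n++ ? body " " $0 : $0) }             # body line: append
    '
}

# Hypothetical sample input with the same layout as the file in the question.
data='@#@#@#################################
this is field one
@#@#@#
@#@#@#################################
this is field two
they can be any number of lines
@#@#@#'

# One output line per field, so mapfile -t yields one array element per field.
mapfile -t field < <(extract_fields <<< "$data")

for i in "${!field[@]}"; do
    printf 'field[%d]=%s\n' "$i" "${field[i]}"
done
```

Because the joined fields contain no newlines, "one line per field" is a safe framing for mapfile; if the original newlines must be preserved, a different delimiter would be needed.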

Kaz 07 . '14 1:01

I would build something around awk. Here is the first proof of concept:

awk '
    BEGIN{ f=0; fi="" }
    /^@#@#@#################################$/{ f=1 }
    /^@#@#@#$/{ f=0; print"Field:"fi; fi="" }
    { if(f==2)fi=fi"-"$0; if(f==1)f++ }
' file

mouviciel Jan 22 '10 at 16:06

begin="@#@#@#################################"
end="@#@#@#"
i=0
flag=0

while IFS= read -r line
do
    case $line in
        $begin)
            flag=1;;
        $end)
            ((i++))
            flag=0;;
        *)
            if [[ $flag == 1 ]]
            then
                array[i]+="$line"$'\n'    # retain the newline
            fi;;
    esac
done < datafile

If you want to keep the marker lines in the array elements, move the assignment statement (with its flag test) to the beginning of the while loop, before the case.
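Tracing that change: with the flag-tested assignment ahead of the case, the closing marker is kept but the opening marker is still skipped, because flag is 0 when the opening marker line is read. A sketch of a variant that keeps both markers by also appending in the two marker branches follows; the input is generated with printf purely for illustration.

```shell
#!/usr/bin/env bash
# Sketch: the marker-based read loop, keeping BOTH marker lines in each element.
# The sample input is generated from the marker strings (hypothetical data).

begin="@#@#@#################################"
end="@#@#@#"
i=0
flag=0
array=()

while IFS= read -r line      # IFS= keeps leading/trailing blanks, -r keeps backslashes
do
    case $line in
        "$begin")
            flag=1
            array[i]+="$line"$'\n';;        # keep the opening marker
        "$end")
            if [[ $flag == 1 ]]
            then
                array[i]+="$line"$'\n'      # keep the closing marker
                i=$((i + 1))                # unlike ((i++)), never returns failure when i is 0
            fi
            flag=0;;
        *)
            if [[ $flag == 1 ]]
            then
                array[i]+="$line"$'\n'      # body line, newline retained
            fi;;
    esac
done < <(printf '%s\n' "$begin" "this is field one" "$end" \
                       "$begin" "this is field two" "they can be any number of lines" "$end")

for idx in "${!array[@]}"; do
    printf 'element %d:\n%s' "$idx" "${array[idx]}"
done
```

Reading from a process substitution rather than a pipe keeps the while loop in the current shell, so array is still set after the loop.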

Dennis Williamson Jan 22 '10 at 16:06

ghostdog74 · Accepted Answer · 2010-01-22T16:09:27+0000

If you have gawk:

awk 'BEGIN{ RS="@#*#" }
NF{
    gsub("\n"," ") # remove this if you want to retain newlines
    print "-->"$0 
    # put to array
    arr[++d]=$0
} ' file

Output

$ ./shell.sh
--> this is field one
--> this is field two they can be any number of lines
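Here RS = "@#*#" is a regular expression matching "@" followed by one or more "#", so each marker line is split into several empty records, which the NF test skips; it also means a field that itself contains "@#" would be cut in two. POSIX leaves a multi-character RS unspecified, and gawk treats it as a regular expression, so this relies on such an awk. A sketch of the same approach that trims the spaces left where the outer newlines were and actually uses arr in an END block (sample data is hypothetical):

```shell
#!/usr/bin/env bash
# Sketch: the RS-as-regex trick, with arr[] actually used in an END block.
# Assumes an awk that treats a multi-character RS as a regular expression
# (gawk does; POSIX leaves a multi-character RS unspecified).

data='@#@#@#################################
this is field one
@#@#@#
@#@#@#################################
this is field two
they can be any number of lines
@#@#@#'

out=$(awk '
    BEGIN { RS = "@#*#" }      # every "@" followed by one or more "#" ends a record
    NF {                       # skip the empty records between marker pieces
        gsub("\n", " ")        # join the lines of the field
        sub(/^ +/, "")         # trim the space left by the leading newline
        sub(/ +$/, "")         # trim the space left by the trailing newline
        arr[++d] = $0          # collect the field
    }
    END {
        for (i = 1; i <= d; i++)
            printf "field %d: %s\n", i, arr[i]
    }
' <<< "$data")

printf '%s\n' "$out"
```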
