Gawk regex for sequence selection

Sorry for the simple regexp question, but I can't get what I need without what seems to me like a complicated solution. I am parsing a file containing sequences of the three letters A, E and D, such as

AADDEEDDA

EEEEEEEE

AEEEDEEA

AEEEDDAAA

and I would like to select only those that start with E and end with D, with exactly one change in the sequence (from E to D), for example:

EDDDDDDDD

EEEDDDDDD

EEEEEEEED

I'm struggling to find the right regex for this. Here is my latest attempt:

echo "1,AAEDDEED,1\n2,EEEEDDDD,2\n3,EDEDEDED" | gawk -F, '{if($2 ~ /^E[(ED){1,1}]*D$/ && $2 !~ /^E[(ED){2,}]*D$/) print $0}'

which does not work. Any help?

Thanks in advance.
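A quick way to see why the attempt fails: inside square brackets, parentheses, braces, digits and commas lose their special meaning, so [(ED){1,1}] is just a set of single characters, and the * after it repeats any of them without counting E-to-D changes. This sketch uses plain awk instead of gawk (the regexes are POSIX EREs, so any awk should behave the same) and feeds the sequences with printf; the string E(1,}D is a made-up test value.

```shell
# [(ED){1,1}] is ONE bracket expression: the set ( E D ) { } 1 and ,
# The * repeats any member of that set, so E/D alternations pass too.
printf '%s\n' EDEDEDED EEEEDDDD | awk '/^E[(ED){1,1}]*D$/'
# prints both lines

# The "operator" characters are accepted as literal set members as well.
echo 'E(1,}D' | awk '/^E[(ED){1,1}]*D$/'
# prints E(1,}D

# The second set ( E D ) { } 2 and , also contains E and D, so every E/D
# string accepted by the first test is matched by the second one too and
# then rejected by !~ : the combined condition prints nothing here.
printf '%s\n' EEEEDDDD EDEDEDED |
  awk '$0 ~ /^E[(ED){1,1}]*D$/ && $0 !~ /^E[(ED){2,}]*D$/'
```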

3 answers

If I understand your request correctly, just

awk '/^E+D+$/' file.input

will do the trick.
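A quick check of this one-liner against the sample sequences from the question, fed with printf here instead of reading file.input:

```shell
# ^E+D+$ : one or more E's, then one or more D's, and nothing else,
# which is exactly "starts with E, ends with D, one change".
printf '%s\n' AADDEEDDA EEEEEEEE AEEEDEEA AEEEDDAAA \
              EDDDDDDDD EEEDDDDDD EEEEEEEED |
  awk '/^E+D+$/'
# prints EDDDDDDDD, EEEDDDDDD and EEEEEEEED
```

EEEEEEEE is rejected because D+ requires at least one D.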

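UPDATE: for the input format from the question (with the numbers), you can match the whole line without splitting it into fields (no -F, needed):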

awk '/^[0-9]+,E+D+(,[0-9]+)?$/' input.test
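Applied to the records from the question, this whole-line version keeps only the second one. The extra record 4,EEEDD is a made-up test value showing that the trailing ",number" group is optional:

```shell
# number , E...D [ , number ]  -- the final group may be absent.
printf '%s\n' 1,AAEDDEED,1 2,EEEEDDDD,2 3,EDEDEDED 4,EEEDD |
  awk '/^[0-9]+,E+D+(,[0-9]+)?$/'
# prints 2,EEEEDDDD,2 and 4,EEEDD
```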

Try this regex:

^E+[^ED]*D+$

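It matches one or more E's at the start, then zero or more characters that are neither E nor D, then one or more D's at the end.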
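Since the question's input only contains A, E and D, the [^ED]* part can only ever match A's. The comparison below (EAAD and EDED are made-up test strings) shows that this regex is slightly more permissive than ^E+D+$, which matters if an A between the E's and the D's should disqualify a line:

```shell
# [^ED]* lets a run of A's sit between the E's and the D's.
printf '%s\n' EEEDDD EAAD EDED | awk '/^E+[^ED]*D+$/'
# prints EEEDDD and EAAD

# Without the middle part, only pure E...ED...D strings pass.
printf '%s\n' EEEDDD EAAD EDED | awk '/^E+D+$/'
# prints EEEDDD
```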

In AWK, the test is:

$2 ~ /^E+[^ED]*D+$/

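$2 refers to the second field, ~ is the match operator, and the regex goes between the /s. Also, an AWK program is made of "patterns" and "actions", so you don't need the "if" (or the {}s). When a pattern has no action, as here, AWK uses the default action { print $0 }, which is exactly what you want.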
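To illustrate the pattern/action point, the two commands below should print the same thing: the first relies on the default action, the second spells out the if and the print. Plain awk is used here; gawk behaves the same for this:

```shell
# Pattern only: the default action { print $0 } runs on matching lines.
printf '%s\n' 1,AAEDDEED,1 2,EEEEDDDD,2 3,EDEDEDED |
  awk -F, '$2 ~ /^E+[^ED]*D+$/'

# The same filter written with an explicit if and action.
printf '%s\n' 1,AAEDDEED,1 2,EEEEDDDD,2 3,EDEDEDED |
  awk -F, '{ if ($2 ~ /^E+[^ED]*D+$/) print $0 }'
# both print 2,EEEEDDDD,2
```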


If I understand correctly, you need one or more E's followed by one or more D's:

echo "1,AAEDDEED,1\n2,EEEEDDDD,2\n3,EDEDEDED" | gawk -F, '{if($2 ~ /^E+D+$/) print $0}'
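One caveat for commands built on echo like this: whether echo turns \n into newlines depends on the shell (dash does; bash's builtin echo, by default, does not without -e), so under bash the three records can reach awk as a single line and nothing matches. printf expands \n the same way in every POSIX shell. A portable sketch of the same filter, using plain awk in place of gawk:

```shell
# printf always interprets \n in its format string, so awk sees 3 records.
printf '1,AAEDDEED,1\n2,EEEEDDDD,2\n3,EDEDEDED\n' |
  awk -F, '$2 ~ /^E+D+$/'
# prints 2,EEEEDDDD,2
```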

Source: https://habr.com/ru/post/1615473/

