Grep all characters including newline

Question

I am parsing an XML file that contains text like:

"lalala it a Sunday {{ Some words here, maybe a new line }} oh boy"

How can I use grep to get everything between "{{" and "}}", given that grep's . character does not match newlines?

I currently have

 grep '{{.*}}'

but it only works with things on the same line.

+6

Rio Feb 20 '11 at 17:28

4 answers

You can use alternation between mutually exclusive character sets to match any character. For example, this command:

 grep -E "\{\{([[:digit:]]|[^[:digit:]])+\}\}"

... will match everything (greedily) between the first {{ and the last }}.

But as @JesseCohen says, you really, really, really should parse XML with an XML parser, not regexps.
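
To see the greedy behavior concretely, here is a small sketch. Since grep tests its input one line at a time, it assumes the lines are first joined with tr (as in the accepted answer). The sample text is made up for illustration.

```shell
# Made-up sample with two {{ }} blocks; the first spans a line break.
sample='x {{ a
b }} y {{ c }} z'

# Join the lines with spaces, then apply the alternation pattern.
# -o prints only the matched part, -E enables extended regexps.
printf '%s\n' "$sample" | tr '\n' ' ' | grep -oE '\{\{([[:digit:]]|[^[:digit:]])+\}\}'
# prints: {{ a b }} y {{ c }}
```

Because the match is greedy, both blocks come back as a single match running from the first {{ to the last }}.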

+1

Phrogz Feb 20 '11 at 18:03

This is how I solved the problem:

  grep '{{[\s\S]*}}'

0

Yuri Barbashov Jun 11 '11 at 2:02

This worked for me:

 grep -zo '[[:cntrl:][:print:]]'

0

Peter K Dec 7 '18 at 16:06

Jesse Cohen · Accepted Answer · 2011-02-20T17:33:08+0000

One option is to remove the newlines and then grep, as in:

  cat myfile | tr -d '\n' | grep '{{.*}}'

But if this is an XML file, why not use an XML parser that takes advantage of the file's structure, rather than just a regular expression?

EDIT

grep regexps are greedy; for non-greedy matching you can use Perl regexps:

 cat myfile | tr -d '\n' | perl -pe 's/.*?(\{\{.*?\}\})/$1\n/g' | grep '{{'

This should output one match per line. If you have nested {{ }} pairs, it will be even more difficult.