Grep all characters including newline

I am parsing an XML file using grep. It contains text like:

"lalala it a Sunday {{ Some words here, maybe a new line }} oh boy" 

How can I use grep to get everything between "{{" and "}}", given that grep's "." character does not match newlines?

I currently have

 grep '{{.*}}' 

but it only works with things on the same line.

+6
4 answers

One option is to strip the newlines first and then grep, as in:

  tr -d '\n' < myfile | grep '{{.*}}' 

But if this is an XML file, as you say, why not use an XML parser that takes advantage of the document's structure, rather than just a regular expression?
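
As a rough illustration of the flatten-then-grep idea, here is a minimal sketch. The temporary sample file and the function name `flatten_and_extract` are made up for the example. It assumes a grep that supports `-o` (GNU and BSD grep do, but it is not in POSIX), and it swaps `.*` for `[^}]*`, which stops at the first `}` and therefore only works when the text between the delimiters contains no `}`. It also replaces each newline with a space instead of deleting it, so that words on adjacent lines do not fuse.

```shell
# Hypothetical sample input; the {{ ... }} block spans two lines.
sample=$(mktemp)
printf 'lalala it a Sunday {{ Some words here,\nmaybe a new line }} oh boy\n' > "$sample"

# Turn each newline into a space so the input becomes a single line,
# then print only the matched part (-o). [^}]* stops at the first }.
flatten_and_extract() {
  tr '\n' ' ' < "$1" | grep -o '{{[^}]*}}'
}

flatten_and_extract "$sample"
```

For this input the command prints the block on one line, with the original line break turned into a space.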

EDIT

grep regular expressions are greedy. For non-greedy matching you can use a Perl regular expression:

 tr -d '\n' < myfile | perl -pe 's/.*?(\{\{.*?\}\})/$1\n/g' | grep '{{' 

This should output one match per line. If you have nested {{ ... }} pairs, it gets even more difficult.

+8

You can use alternation between mutually exclusive character sets to match any character. For example, this command:

 grep -E "\{\{([[:digit:]]|[^[:digit:]])+\}\}" 

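... will match everything (greedily) between the first {{ and the last }}. Keep in mind that grep still processes its input one line at a time, so for the match to span lines the newlines have to be dealt with first.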

But as @JesseCohen states, you really, really, really should parse XML with an XML parser, not regexps.
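
To make the greedy behaviour concrete, here is a small sketch on a single line of made-up input. The variable `line` and the function name `greedy_match` are hypothetical, and `-o` is assumed to be available (GNU and BSD grep support it).

```shell
# Hypothetical input: two {{ ... }} blocks on one line.
line='a {{ x1 }} b {{ y2 }} c'

# ([[:digit:]]|[^[:digit:]]) matches any single character; + is greedy,
# so the match runs from the first {{ to the last }} on the line.
greedy_match() {
  printf '%s\n' "$1" | grep -oE '\{\{([[:digit:]]|[^[:digit:]])+\}\}'
}

greedy_match "$line"
```

This prints one match that covers both blocks and the text between them, which is why a non-greedy or negated-class pattern is usually needed when a line holds more than one block.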

+1

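I solved this problem with the following. Note that [\s\S] is Perl syntax, so it needs -P, and -z is needed so that the input is not split into lines (both are GNU grep options, and -P requires a grep built with PCRE support):

  grep -Pzo '\{\{[\s\S]*\}\}' 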
0

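This worked for me (GNU grep; -z reads the whole file as one record, and the bracket expression matches any control or printable character, newline included):

 grep -zo '{{[[:cntrl:][:print:]]*}}' 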
0

Source: https://habr.com/ru/post/1340536/