Regular expression must match anything inside p tags

I need a regular expression that matches the contents of <p> tags. For example, given the text:

 <p>Hello world</p> 

The regex should match the Hello world part.

+4
5 answers

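In JavaScript:

 var str = "<p>Hello world</p>";
 // match() exposes the capture group; search() only returns the match index
 var text = str.match(/<\s*p[^>]*>([^<]*)<\s*\/\s*p\s*>/)[1]; // "Hello world"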

In PHP:

 $str = "<p>Hello world</p>";
 // pass $matches to receive the results; $matches[1] holds the captured text
 preg_match_all('/<\s*p[^>]*>([^<]*)<\s*\/\s*p\s*>/', $str, $matches);

These will also match something more complicated, such as:

 < p style= "font-weight: bold;" >Hello world < / p > 
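
The same pattern can be tried in Python to see what it captures. This is only a minimal sketch: P_TAG and p_contents are names made up for illustration, and the third call shows that the pattern finds nothing once the paragraph contains another tag, because [^<]* stops at the first <.

```python
import re

# Same pattern as in the JavaScript and PHP snippets of this answer.
# P_TAG and p_contents are illustrative names, not part of any library.
P_TAG = re.compile(r"<\s*p[^>]*>([^<]*)<\s*/\s*p\s*>")

def p_contents(html):
    """Return the captured text of every match of P_TAG in html."""
    return P_TAG.findall(html)

simple = p_contents("<p>Hello world</p>")
spaced = p_contents('< p style= "font-weight: bold;" >Hello world < / p >')
nested = p_contents("<p>a <b>b</b></p>")  # no match: inner tags are not allowed
```

Note that the captured text keeps any whitespace before the closing tag, so it may need to be stripped.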
+8

EDIT: Do not do that. Just don't do it.

See this question

If you insist, use <p>(.+?)</p>; the result will be in the first group. This is not ideal, but no regular expression will ever fully solve the HTML parsing problem.

For example (in Python):

 >>> import re
 >>> r = re.compile('<p>(.+?)</p>')
 >>> r.findall("<p>fo o</p><p>ba adr</p>")
 ['fo o', 'ba adr']
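
If a real parser is an option, the standard library's html.parser can do the same job without a regex. The following is a rough sketch under that assumption, not a definitive implementation: ParagraphText and paragraph_texts are hypothetical names, and nested markup such as <em> is simply flattened to its text.

```python
from html.parser import HTMLParser

class ParagraphText(HTMLParser):
    """Hypothetical helper: collects the text inside each <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []   # text of each finished paragraph
        self._chunks = None    # pieces of the open paragraph, or None

    def _flush(self):
        if self._chunks is not None:
            self.paragraphs.append("".join(self._chunks))
            self._chunks = None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()      # a new <p> implicitly closes an open one
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._chunks is not None:
            self._chunks.append(data)

def paragraph_texts(html):
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    parser._flush()            # keep a trailing paragraph with no </p>
    return parser.paragraphs
```

Calling paragraph_texts('<p class="x">fo <em>o</em></p><p>ba adr</p>') gives the same two strings as the regex example, while also coping with attributes, inner tags, and a </p> hidden inside a comment.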
+5

Regex:

 <([a-z][a-z0-9]*)\b[^>]*>(.*?)</\1> 

This will work for any pair of tags.

e.g. <p class="foo">hello<br/></p>

\1 ensures that the opening tag matches the closing tag.

The content between the tags is captured in group 2 (\2).
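
A quick way to check the backreference behaviour is to run the pattern in Python. This sketch assumes the first character class is meant to be [a-z]; the IGNORECASE and DOTALL flags and the names TAG_PAIR and tag_pairs are additions for illustration only.

```python
import re

# Generic open/close pattern; \1 must repeat the tag name captured in group 1.
# The flags are an assumption of this sketch, not part of the original regex.
TAG_PAIR = re.compile(
    r"<([a-z][a-z0-9]*)\b[^>]*>(.*?)</\1>",
    re.IGNORECASE | re.DOTALL,
)

def tag_pairs(html):
    """Return (tag name, inner content) for each matched pair of tags."""
    return [(m.group(1), m.group(2)) for m in TAG_PAIR.finditer(html)]

single = tag_pairs('<p class="foo">hello<br/></p>')
several = tag_pairs("<h1>Title</h1><p>Body</p>")
nested = tag_pairs("<div>a<div>b</div>c</div>")
```

The nested case shows the limit of this approach: the lazy (.*?) stops at the first </div>, so the outer element is cut short when the same tag is nested inside itself.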

+1

It seems that each of the solutions proposed above will fail to do at least one of the following:

  • return the text in <p>...</p> tags if it contains other tags, such as <a>, <em>, etc., or
  • distinguish between <p> and <path>, or
  • include tags with attributes, such as <p class="content">

Consider using this regex:

<p(|\s+[^>]*)>(.*?)<\/p\s*>

The matched text will be captured in group 2.

Obviously, this solution will not work properly when the closing tag </p> is for some reason wrapped in a comment, e.g. <p>... <!--... </p>... -->
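
The three cases from the list, plus the comment problem, can be checked with a short Python sketch. P_ELEMENT and p_inner are illustrative names, and the DOTALL flag is an assumption added so that paragraphs spanning several lines also match.

```python
import re

# The regex from this answer; group 1 holds the attributes, group 2 the content.
# DOTALL is an addition of this sketch so that "." also matches newlines.
P_ELEMENT = re.compile(r"<p(|\s+[^>]*)>(.*?)<\/p\s*>", re.DOTALL)

def p_inner(html):
    """Return group 2 (the inner content) of every <p> element found."""
    return [m.group(2) for m in P_ELEMENT.finditer(html)]

with_attrs = p_inner('<p class="content">See <a href="#">this</a></p>')
not_path = p_inner('<path d="M0 0"/><p>text</p>')
in_comment = p_inner("<p>a <!-- </p> --> b</p>")  # cut short at the commented </p>
```

The last call returns only the text up to the commented-out </p>, which is the failure described above.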

0

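You can use this in Python (find_all is a BeautifulSoup method rather than a str or re method, so this assumes the bs4 package is installed):

 from bs4 import BeautifulSoup
 your_variable = 'A html text that has <p> tags'
 # parse the text first; plain strings have no find_all method
 soup = BeautifulSoup(your_variable, 'html.parser')
 result = soup.find_all('p')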
0

Source: https://habr.com/ru/post/1338221/

