Regular expression for hierarchical data

Is it possible / practical to build one regular expression that matches hierarchical data?

For instance:

<h1>Action</h1>
  <h2>Title1</h2><div>data1</div>
  <h2>Title2</h2><div>data2</div>
<h1>Adventure</h1>
  <h2>Title3</h2><div>data3</div>

I would like to get these matches:

"Action", "Title1", "data1"
"Action", "Title2", "data2"
"Adventure", "Title3", "data3"

As I see it, this requires knowing that the structure is hierarchical. If I write a pattern that captures the H1, it will match only the first record under that heading; if I leave the H1 out of the pattern, I can't capture it at all. Are there any special tricks I can use to solve this problem?

This is a .NET project.

+3
3 answers

The solution is not to use regular expressions. They are not powerful enough for this kind of thing.
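As an illustration of the parser-based route, here is a minimal sketch. It is written in Python rather than .NET, purely to show the idea; the same structure should carry over to any event-style HTML parser. It uses the standard library's html.parser.HTMLParser, and the names HeadingCollector and extract_rows are made up for this example. It assumes the flat h1/h2/div layout from the question, with no nested markup inside those tags.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collects (h1, h2, div) text triples in document order (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        # Most recently seen heading text at each level.
        self.current = {"h1": None, "h2": None}
        # Name of the tag we are currently inside, if it is one we care about.
        self.open_tag = None

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "div"):
            self.open_tag = tag

    def handle_endtag(self, tag):
        if tag == self.open_tag:
            self.open_tag = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self.open_tag is None:
            return  # whitespace between tags, or text outside h1/h2/div
        if self.open_tag == "div":
            # A div closes one record: pair it with the headings seen so far.
            self.rows.append((self.current["h1"], self.current["h2"], text))
        else:
            self.current[self.open_tag] = text


def extract_rows(html):
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    return collector.rows


SAMPLE = """<h1>Action</h1>
  <h2>Title1</h2><div>data1</div>
  <h2>Title2</h2><div>data2</div>
<h1>Adventure</h1>
  <h2>Title3</h2><div>data3</div>"""

for row in extract_rows(SAMPLE):
    print(row)
```

The point of the sketch is that the "current H1" is ordinary program state carried from one tag to the next, which is exactly the part a single regex match cannot carry across records.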

, , - - , HTML, .

+5

HTML/XML RegEx, , . , XML. , , , .

EDIT: Regex , HTML; , XML/DOM, , , , , .

The general approach:

- In a recursive function, seek out a "<" character.
- Now find a ">" character.
- Preserve everything you find until the next "<" character.
- Find a ">" character.
- Pass whatever you found between those tags into the recursive function.
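The steps above can be sketched roughly as follows. This is a Python illustration under strong assumptions, not a general HTML parser: tags are well-formed and carry no attributes, there are no comments or self-closing tags, and "<" never appears inside text. The names parse_elements and to_rows are hypothetical.

```python
def parse_elements(text, pos=0):
    """Parse sibling elements starting at pos.

    Returns (nodes, stop), where stop is the index of the closing tag
    that ended this run of siblings, or len(text) if the input ran out.
    """
    nodes = []
    while True:
        lt = text.find("<", pos)           # seek out a "<" character
        if lt == -1:
            return nodes, len(text)
        gt = text.find(">", lt)            # now find the matching ">"
        if gt == -1:
            raise ValueError("unterminated tag")
        name = text[lt + 1:gt].strip()
        if name.startswith("/"):
            return nodes, lt               # closing tag: let the caller consume it
        # Everything after the opening tag goes into the recursive call.
        children, close_lt = parse_elements(text, gt + 1)
        close_gt = text.find(">", close_lt)
        if close_gt == -1 or text[close_lt + 1:close_gt].strip() != "/" + name:
            raise ValueError("missing closing tag for <%s>" % name)
        inner = text[gt + 1:close_lt]
        nodes.append({
            "tag": name,
            # Keep the text only for leaf elements.
            "text": None if children else inner.strip(),
            "children": children,
        })
        pos = close_gt + 1


def to_rows(nodes):
    """Flatten top-level h1/h2/div siblings into (h1, h2, div) triples."""
    rows, h1, h2 = [], None, None
    for node in nodes:
        if node["tag"] == "h1":
            h1 = node["text"]
        elif node["tag"] == "h2":
            h2 = node["text"]
        elif node["tag"] == "div":
            rows.append((h1, h2, node["text"]))
    return rows


sample = (
    "<h1>Action</h1>"
    "<h2>Title1</h2><div>data1</div>"
    "<h2>Title2</h2><div>data2</div>"
    "<h1>Adventure</h1>"
    "<h2>Title3</h2><div>data3</div>"
)
nodes, _ = parse_elements(sample)
print(to_rows(nodes))
```

In the question's sample the headings and divs are siblings rather than nested, so the recursion only goes one level deep; it matters once tags actually contain other tags.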

+2

Regex . , .

XML-.

0

Source: https://habr.com/ru/post/1732275/

