Regex, PHP, and the evil nested (?R)

UPDATE

So, I'm still wrestling with this, and I have got to the point of finding all the tag instances, although I would prefer to find just the most deeply nested instance, as that would make life easier. Anyway, this is what I have so far:

/(({{)(?:(?=([^\/][^ ]*?))\3|(\/[\w])))([a-zA-Z0-9\$\'\"\s\#\%\^\&\!\.\_\+\=\-\\\*\(\)\ ]+?}})/ 

Is there a regex guru who can give me some pointers, or a regular expression that does what I need? It should match only the most deeply nested {{tag}} that is closed by a matching {{/tag}}.

ORIGINAL

So here is my problem. I have seen others run into it, but with a different twist. Or so I thought. I am curious whether anyone can help me take it further.

I have a database full of templates that I need to work with in PHP. These templates are created and used by another system, so they cannot be changed. The templates contain hierarchy-style tags. What I need to do is fetch the templates from the database and then programmatically find these tags, their function name (or tag name), their inner contents, and everything that follows the function (tag) name inside the braces. An example of one of these tags: {{FunctionName some (otherStuff) !Here}} Some content is inside and ends with {{/FunctionName}}

Here is where it gets more fun: the templates also contain another kind of tag, which I assume is the “variable” style, since it always has the same syntax: ${RandomTag}. There are also function-style tags that have no end tag, for example {{RandomLoner}}.

Sample template ...

 {{FunctionTag (Condition?)}} <div>This is an {{CheckOfSomeSort someTimesThese !orThese}} example of some {{Random}} data {{/CheckOfSomeSort}} that will be ${worked} on</div> {{/FunctionTag}} 

Ok, so this is not a real template, but it follows all the rules that I have seen so far.

I have tried different things with regex and preg_match_all to pull out the matches and get each of them into a nice array. So far I have got this (run against a sample template to make sure it works):

 Array
 (
     [0] => Array
         (
             [0] => {{CheckOfSomeSort someTimesThese !orThese}}example of some datas{{/CheckOfSomeSort}}
             [1] => {{CheckOfSomeSort someTimesThese !orThese}}
             [2] => CheckOfSomeSort
             [3] => example of some data
             [4] => {{/CheckOfSomeSort}}
         )
 )

I tried a couple of approaches (it took me almost 8 hours)

 /({{([^\/].[^ ]*)(?:.[^ ][^{{]+)}})(?:(?=([^{{]+))\3|{{(?!\2[^}}]*}}))*?({{\/\2}})/

and, more recently:

 /({{([^\/].[^ ]*)(?:.[^ ][^{{]+)}})((?:(?!\{\{|\}\}).)++|(?R)*)({{\/\2}})/

By no means am I a regex guru; I only really got to know regular expressions in the last day or so while trying to get this to work. I have read that regex isn't meant for nested structures, but (?R) seems to do the trick in the simple bracket examples I've seen on the internet. Those examples, though, only ever deal with content between { and }, ( and ), or < and >. After reading almost the entire regular-expressions.info website and experimenting, I came up with these two versions.

So what I need to do (I think) is first match the DEEPEST tag in the hierarchy with the regex and then work outward from there (if I have to do part of this in PHP, that is fine with me). My idea was to find the deepest layer, get its data, and work backwards until all the contents are in one big array. I assumed this is what (?R) would do for me, but it does not.

So any help with what I am missing would be wonderful. Also keep in mind that I seem to have problems with {{}} tags that DON'T have a closing version. A tag like my {{Random}} example breaks the array parsing for me. I feel that these tags, along with the ${} tags, can be left alone (if I knew how to do that with a regular expression) and just stay in the text where they are. I am mostly interested in the function tags and in getting their data into a multidimensional array so that I can work with it further.

Sorry for the long post, I have been banging my head against this all night. I started out assuming it would be a little easier, until I realized the tags are nested :/

Any help is appreciated! Thanks!

+4
3 answers

After some time working on this, I eventually learned more about regular expressions and now understand this one to a T. The great thing is that PHP has (?R), and now I understand why it looks the way it does. lol

In the end, the regex I ended up with came from the PHP manual page that explains recursive (?R) patterns. I then adapted it to match these tags instead of the brackets used in that example.

I know I said I needed the innermost tag, but the same result can of course be reached by starting from the outer tag, and that is what this regular expression does. It finds and captures the outermost {{tag (thatMightHaveDataHere)}}, whose inner content may contain more {{TAGS}}, up to the matching {{/tag}}.

Here it is:

 /{{([\w]+) ?([^}]*?)(?:}}((?:[^{]*?|(?R)|{{[\w]*?}}|\${.*?})*){{\/\1}})/ 

 0 = the whole match, i.e. the outer tag
 1 = the tag name, i.e. Tag in {{Tag}}, reused as {{/\1}}
 2 = any data after the first space inside the tag, i.e. {{Tag ThisDataIs StoredAs2}}
 3 = the INNER content (which may be a recursive match of this regex, a tag without an end tag such as {{noEndTag}}, or a tag that starts with a dollar sign, such as ${likeThis})

Run this regular expression again in a loop on $match[3], and you can work your way down through the nested tags. I don't know where you would use it outside of what I needed, but I'm sure someone can adapt it if they need it for a different nested structure.
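To make the loop over $match[3] concrete, here is a minimal sketch. It uses the pattern from this answer unchanged. The collectTags() helper and the array keys name, args, content and children are hypothetical names chosen for illustration, not part of the original answer.

```php
<?php
// Recursive pattern from the answer above, unchanged.
$pattern = '/{{([\w]+) ?([^}]*?)(?:}}((?:[^{]*?|(?R)|{{[\w]*?}}|\${.*?})*){{\/\1}})/';

/**
 * Hypothetical helper: collect every paired tag, outermost first,
 * by re-running the pattern on the inner content (group 3) of each match.
 */
function collectTags(string $text, string $pattern): array
{
    $found = [];
    if (preg_match_all($pattern, $text, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $found[] = [
                'name'     => $m[1],  // tag name
                'args'     => $m[2],  // everything after the first space
                'content'  => $m[3],  // raw inner content
                'children' => collectTags($m[3], $pattern),
            ];
        }
    }
    return $found;
}

// Sample template from the question.
$template = '{{FunctionTag (Condition?)}} <div>This is an '
    . '{{CheckOfSomeSort someTimesThese !orThese}} example of some {{Random}} data '
    . '{{/CheckOfSomeSort}} that will be ${worked} on</div> {{/FunctionTag}}';

$tree = collectTags($template, $pattern);
print_r($tree);
```

Tags without an end tag and ${...} variables are not reported as nodes here; they simply stay inside the content string of the enclosing tag.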

-1

Wow, what a weird template syntax.

The method that I would probably use to solve this problem would be something like this:

  • Use a simple regular expression to change all {{tags}} to <tags>
  • Use another simple regular expression to convert the space-separated arguments/conditions inside the tags into XML-style attribute syntax (for example, {{foo bar !baz}} becomes <foo arg1="bar" arg2="!baz"> or similar)
  • Process it as a DOMDocument.
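The three steps above could be sketched roughly as follows. This is only an illustration under several assumptions: tag names are valid XML element names, the surrounding markup is itself well-formed XML, a tag counts as paired only if a closing tag with the same name exists somewhere in the template, and the argN attribute names and the <template> wrapper element are made up for the example.

```php
<?php
// Sample template from the question.
$template = '{{FunctionTag (Condition?)}} <div>This is an '
    . '{{CheckOfSomeSort someTimesThese !orThese}} example of some {{Random}} data '
    . '{{/CheckOfSomeSort}} that will be ${worked} on</div> {{/FunctionTag}}';

// Assumption: a tag is "paired" if a closing tag with its name exists anywhere.
preg_match_all('/\{\{\/\s*(\w+)\s*\}\}/', $template, $m);
$paired = array_flip($m[1]);

// Step 1a: closing tags, {{/Name}} -> </Name>
$xml = preg_replace('/\{\{\/\s*(\w+)\s*\}\}/', '</$1>', $template);

// Steps 1b and 2: opening tags, {{Name a b}} -> <Name arg1="a" arg2="b">
$xml = preg_replace_callback(
    '/\{\{(\w+)\s*([^}]*)\}\}/',
    function (array $t) use ($paired): string {
        $attrs = '';
        $args  = preg_split('/\s+/', $t[2], -1, PREG_SPLIT_NO_EMPTY);
        foreach ($args as $i => $arg) {
            $value  = htmlspecialchars($arg, ENT_QUOTES | ENT_XML1);
            $attrs .= sprintf(' arg%d="%s"', $i + 1, $value);
        }
        // Tags without an end tag become self-closing elements.
        $end = isset($paired[$t[1]]) ? '>' : '/>';
        return "<{$t[1]}{$attrs}{$end}";
    },
    $xml
);

// Step 3: wrap in a single root element and load it as XML.
$doc = new DOMDocument();
$doc->loadXML("<template>$xml</template>");

$check = $doc->getElementsByTagName('CheckOfSomeSort')->item(0);
echo $check->getAttribute('arg2'), "\n";
```

Real templates containing HTML that is not well-formed XML would make loadXML() fail, so this approach only fits if the stored markup is clean.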

Enjoy. :-)

+1

A warning! You are trying to write a parser with regular expressions only. This does not work very well. Why not? Because you also need to keep track of state!

So what then? Well, you write a parser, of course :D

If you need any tips on how to get started, I can help, but I would advise you to try it yourself. How does the parser work? :)


Tokenize the input, and convert it to a nested tree like this:

 array(
   array("code", "FunctionTag (Condition?)", array(
     "<div>This is an ",
     array("code", "CheckOfSomeSort someTimesThese !orThese", array(
       "example of some ",
       array("code", array("Random"), array()),
       " data"
     )),
     " that will be ${worked} on</div>"
   ))
 )

Now you just need to interpret the parts of the code and get the expected result. You can also add things like line numbers and character positions, which is very useful for debugging.
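As a starting point, a tokenizer plus a stack is enough to build such a tree. The sketch below is one possible shape, not the answerer's own code. The function name parseTemplate() and the node keys name, args and children are hypothetical, and it assumes that a tag opens a block only if a closing tag with the same name appears somewhere in the template.

```php
<?php
/**
 * Hypothetical parser sketch: turns a template into a nested tree.
 * Assumption: a tag opens a block only if a closing tag with the same
 * name appears somewhere in the template; otherwise it stands alone.
 */
function parseTemplate(string $template): array
{
    // Names that have a closing tag, e.g. {{/FunctionTag}}.
    preg_match_all('/\{\{\/\s*(\w+)\s*\}\}/', $template, $m);
    $paired = array_flip($m[1]);

    // Split into text chunks and {{...}} tags, keeping the tags as tokens.
    $tokens = preg_split('/(\{\{[^}]*\}\})/', $template, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    // The stack holds the nodes that are currently open; index 0 is the root.
    $stack = [['name' => null, 'args' => '', 'children' => []]];

    foreach ($tokens as $token) {
        $top = count($stack) - 1;
        if (preg_match('/^\{\{\/\s*(\w+)\s*\}\}$/', $token, $close)) {
            // Closing tag: it must match the innermost open node.
            if ($top === 0 || $stack[$top]['name'] !== $close[1]) {
                throw new RuntimeException("Unexpected closing tag: $token");
            }
            $node = array_pop($stack);
            $stack[$top - 1]['children'][] = $node;
        } elseif (preg_match('/^\{\{(\w+)\s*(.*)\}\}$/s', $token, $open)) {
            $node = ['name' => $open[1], 'args' => $open[2], 'children' => []];
            if (isset($paired[$open[1]])) {
                $stack[] = $node;                    // block tag: descend
            } else {
                $stack[$top]['children'][] = $node;  // standalone tag
            }
        } else {
            $stack[$top]['children'][] = $token;     // plain text, incl. ${...}
        }
    }

    if (count($stack) !== 1) {
        throw new RuntimeException('Unclosed tag: ' . $stack[count($stack) - 1]['name']);
    }
    return $stack[0]['children'];
}

// Sample template from the question.
$template = '{{FunctionTag (Condition?)}} <div>This is an '
    . '{{CheckOfSomeSort someTimesThese !orThese}} example of some {{Random}} data '
    . '{{/CheckOfSomeSort}} that will be ${worked} on</div> {{/FunctionTag}}';

$tree = parseTemplate($template);
print_r($tree);
```

Because the state lives in the stack rather than in the regex, mismatched or unclosed tags can be reported with a clear error, and token positions could be tracked in the same loop if line numbers are wanted for debugging.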

0

Source: https://habr.com/ru/post/1390900/

