Conditional delete with variable string regex

I have searched for a few Q & As and cannot find a solution that is adequate enough to help.

I have a large XML file and you need to do a conditional β€œdelete” in one field depending on the value in another field.

For instance:

<vehicle>...<manufacturer>JCB</manufacturer>....<item_category>JCB Tractors</item_category>...</vehicle><vehicle>...<manufacturer>Caterpillar</manufacturer>....<item_category>Digger</item_category>...</vehicle><vehicle>...<manufacturer>Caterpillar</manufacturer>....<item_category>Caterpillar Digger</item_category>...</vehicle>

should become

<vehicle>...<manufacturer>JCB</manufacturer>...<item_category>Tractors</item_category>...</vehicle><vehicle>...<manufacturer>Caterpillar</manufacturer>...<item_category>Digger</item_category>...</vehicle><vehicle>...<manufacturer>Caterpillar</manufacturer>....<item_category>Digger</item_category>...</vehicle>

Ideally, the solution would be what I could apply using the search and replace function in the text panel set to the POSIX extended regular expression.

Actually I appreciate the help in this, as I hit my head about it several times!

If I use a parser, I can highlight the string of the variable that I want to delete using

(?<=<manufacturer>)(.*?)(?=<\/manufacturer>)

Is it possible to use this pattern to highlight the line that I really want to delete.

eg.

(?<=<item_category>)(?<=<manufacturer>)(.*?)(?=<\/manufacturer>)(\s)
+4
1

, , .

. . , .

. , item_category.

DEMO: https://regex101.com/r/rO7pM0/1

(\<manufacturer>([^<]*)<\/manufacturer>)(\s*)(\<item_category>)(?:\2\s*)?([^<]*)(<\/item_category>)

:

 (                            # Opens CG1
     \<manufacturer>          # Literal 
     (                        # Opens CG2
         [^<]*                # Negated Character class (excludes the characters within)
                                # None of: <
                                # * repeats zero or more times
     )                        # Closes CG2
     <                        # Literal <
     \/                       # Literal /
     manufacturer             # Literal manufacturer
     >                        # Literal >
 )                            # Closes CG1
 (                            # Opens CG3
     \s*                      # Token: \s (white space)
                                # * repeats zero or more times
 )                            # Closes CG3
 (                            # Opens CG4
     \<item_category>         # Literal 
 )                            # Closes CG4
 (?:                          # Opens NCG
     \2                       # A backreference to CG2
     \s*                      # Token: \s (white space)
                                # * repeats zero or more times
 )?                           # Closes NCG
                                # ? repeats zero or one times
 (                            # Opens CG5
     [^<]*                    # Negated Character class (excludes the characters within)
                                # None of: <
                                # * repeats zero or more times
 )                            # Closes CG5
 (                            # Opens CG6
     <                        # Literal <
     \/                       # Literal /
     item_category            # Literal item_category
     >                        # Literal >
 )                            # Closes CG6

(\s*), , ([\s\S]*?) , , , item_category. , , , .

+2

Source: https://habr.com/ru/post/1584273/


All Articles