Regex replace single quote with single quote twice if it is inside <xsl: or <XSL:

Question

I need a regular expression that replaces ' with '' when it is inside an <xsl: (or <XSL:) tag; anywhere else, ' should remain as it is.
Code snippet:

 public static void main(String[] args) {
     String replaceSingleQuoteInsideXsltCondition = "(<\\s*?xsl\\s*?:.*?=.*?)(')(.*?)(')(.*?>)";
     String dummyXSLT = "<p>Thank you for sending us <xsl:for-each select=\"catalog/cd[artist='Bob Dylan']\"> "
             + "paper to prove your <span class=\"highlight\"><xsl:if test=\"D01 ='Y'\">Income</xsl:if></span> <span class=\"highlight\"><xsl:if test=\"D02 ='Y'\">&#160;and&#160;"
             + "</xsl:if></span><span class=\"highlight\"><xsl:if test=\"D03 ='Y'\">Citizenship and/or Identity</xsl:if></span>. "
             + "We need a little more information to finish your application. Addition of few words like 7 o'clock, employees' or employ and child and 'xyz and 'hello'</p>"
             + "contact number for inquiry = '478965152' and email id = ' pqr@xyz '"
             + "<xsl:template match=\"num[ . = 3 or . = 5]\"/></xsl:stylesheet><xsl:if test=\"contains($search, 'Web Developer') and (contains($expSearch, 'Computer') or contains($expSearch, 'Information') or contains($expSearch, 'Web' ))\">"
             + "<xsl:if test=\"((node/ABC!='') and (normalize-space(node/DEF)='') and (normalize-space(node/GHI)=''))\"> just a dummy sample.</xsl:if>";
     System.out.println(dummyXSLT.replaceAll(replaceSingleQuoteInsideXsltCondition, "$1''$3''$5"));
 }

Actual code result:

 <p>Thank you for sending us <xsl:for-each select="catalog/cd[artist=''Bob Dylan'']"> paper to prove your <span class="highlight"><xsl:if test="D01 =''Y''">Income</xsl:if></span> <span class="highlight"><xsl:if test="D02 =''Y''">&#160;and&#160;</xsl:if></span><span class="highlight"><xsl:if test="D03 =''Y''">Citizenship and/or Identity</xsl:if></span>. We need a little more information to finish your application. Addition of few words like 7 o'clock, employees' or employ and child and 'xyz and 'hello'</p>contact number for inquiry = '478965152' and email id = ' pqr@xyz '<xsl:template match="num[ . = 3 or . = 5]"/></xsl:stylesheet><xsl:if test="contains($search, ''Web Developer'') and (contains($expSearch, 'Computer') or contains($expSearch, 'Information') or contains($expSearch, 'Web' ))"><xsl:if test="((node/ABC!='''') and (normalize-space(node/DEF)='') and (normalize-space(node/GHI)=''))"> just a dummy sample.</xsl:if>

Expected Result:

 <p>Thank you for sending us <xsl:for-each select="catalog/cd[artist=''Bob Dylan'']"> paper to prove your <span class="highlight"><xsl:if test="D01 =''Y''">Income</xsl:if></span> <span class="highlight"><xsl:if test="D02 =''Y''">&#160;and&#160;</xsl:if></span><span class="highlight"><xsl:if test="D03 =''Y''">Citizenship and/or Identity</xsl:if></span>. We need a little more information to finish your application. Addition of few words like 7 o'clock, employees' or employ and child and 'xyz and 'hello'</p>contact number for inquiry = '478965152' and email id = ' pqr@xyz '<xsl:template match="num[ . = 3 or . = 5]"/></xsl:stylesheet><xsl:if test="contains($search, ''Web Developer'') and (contains($expSearch, ''Computer'') or contains($expSearch, ''Information'') or contains($expSearch, ''Web'' ))"><xsl:if test="((node/ABC!='''') and (normalize-space(node/DEF)='''') and (normalize-space(node/GHI)=''''))"> just a dummy sample.</xsl:if>

java regex

Sanjay madnani Mar 21 '17 at 18:44

3 answers

This is not possible if you allow arbitrary nesting of elements inside the <xsl: ...> tags. See "RegEx match open tags except XHTML self-contained tags".

You can create a regular expression for this particular case, but not for all possible cases.

whaleberg Mar 27 '17 at 18:41

If you are just parsing TAGS, this works.
If you are trying to interpret HTML closure, that cannot be done with a Java
regular expression.

The basic idea is that you cannot parse only the xsl tags. All tags have to be parsed
in order to advance the match position and get past tags that could hide html.

So every tag is matched.
In the regular expression below, capture group 2 contains the xsl tags you want to find.

You can ignore the other tags and act only when
capture group 2 has a length. That is the one you want to manipulate.

What we do is a replace-all with a callback.

Inside the callback:

If capture group 2 did not match (i.e. it has no length),
just return the contents of capture group 0 (the whole match).
That replaces the match with itself. These are the other tags.
If capture group 2 matched, copy group 2 to a string
and run another regex replace on that string (this is the tag content).
It is a global find of (?<!')'(?!') replaced with ''
Return that string as the replacement from the callback.

That's all.
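
The callback idea above could look roughly like this in Java. This is only a sketch under assumptions: the class and method names are made up for illustration, and XSL_TAG is a much simpler stand-in for the full tag regex in this answer (it matches only opening xsl:/XSL: tags and knows nothing about script blocks, comments or CDATA). The inner step also doubles every quote with String.replace instead of using the (?<!')'(?!') lookarounds, because the expected output in the question turns '' into '''' as well. The appendReplacement / appendTail loop plays the role of the callback.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class XslQuoteCallback {
    // Hypothetical, simplified tag pattern (not the full regex from the answer):
    // an opening xsl:/XSL: tag, skipping quoted attribute values so that a '>'
    // inside quotes does not end the tag early.
    private static final Pattern XSL_TAG = Pattern.compile(
            "<\\s*(?i:xsl)\\s*:(?:\"[^\"]*\"|'[^']*'|[^>\"'])*>");

    static String doubleQuotesInXslTags(String input) {
        Matcher m = XSL_TAG.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // "Callback" body: rewrite only the matched tag text.
            String fixed = m.group().replace("'", "''");
            // quoteReplacement keeps things like $search from being read
            // as group references in the replacement string.
            m.appendReplacement(out, Matcher.quoteReplacement(fixed));
        }
        m.appendTail(out); // copy everything after the last tag unchanged
        return out.toString();
    }

    public static void main(String[] args) {
        String sample = "<p>7 o'clock</p><xsl:if test=\"a='' and b='Y'\">x</xsl:if>";
        System.out.println(doubleQuotesInXslTags(sample));
    }
}
```

Text outside the matched tags, such as the apostrophe in 7 o'clock, is never passed to the inner replace, so it stays as it is.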

Now brace yourself.
Here is the regular expression.

(Feel free to make it case-insensitive if you want.)

Expanded

  <
  (?:
       (?:
            (?:
                 # Invisible content; end tag req'd
                 (                             # (1 start)
                      script
                   |  style
                   #|  head
                   |  object
                   |  embed
                   |  applet
                   |  noframes
                   |  noscript
                   |  noembed
                 )                             # (1 end)
                 (?:
                      \s+
                      (?>
                           " [\S\s]*? "
                        |  ' [\S\s]*? '
                        |  (?:
                                (?! /> )
                                [^>]
                           )?
                      )+
                 )?
                 \s* >
            )
            [\S\s]*? </ \1 \s*
            (?= > )
       )
    |  (?: /? [\w:]+ \s* /? )
    |  (                             # (2 start), The xsl: we want to find
            xsl: [\w:-]*
            \s+
            (?:
                 " [\S\s]*? "
              |  ' [\S\s]*? '
              |  [^>]?
            )+
            \s* /?
       )                             # (2 end)
    |  (?:
            [\w:]+
            \s+
            (?:
                 " [\S\s]*? "
              |  ' [\S\s]*? '
              |  [^>]?
            )+
            \s* /?
       )
    |  \? [\S\s]*? \?
    |  (?:
            !
            (?:
                 (?: DOCTYPE [\S\s]*? )
              |  (?: \[CDATA\[ [\S\s]*? \]\] )
              |  (?: -- [\S\s]*? -- )
              |  (?: ATTLIST [\S\s]*? )
              |  (?: ENTITY [\S\s]*? )
              |  (?: ELEMENT [\S\s]*? )
            )
       )
  )
  >

Final note. To see how effective and fast this regular expression is,
grab a large chunk of HTML, run a global find and replace with "".
You will then see all the content, completely stripped of html.

sln Mar 27 '17 at 22:50

Yunnosch · Accepted Answer · 2017-03-27T21:41:28+0000

I suppose it's okay to use two different regular expressions, one of them in a loop.
(The "g" modifier does not help here.)

Here is the concept, for you to implement in Java:

first replace every '' with '''' ,
once, but globally
then replace (<xsl([^>']|'')+)'(([^>']|[^>']+'')+)'(([^'>])+) with \1''\3''\5 , not globally, but in a loop until it no longer replaces anything
if this works, the next step is to make it accept XSL as well as xsl , and also allow the optional spaces
(<\\s*(xsl|XSL)([^>']|'')+)'(([^>']|[^>']+'')+)'(([^'>])+)
(note that the added (xsl|XSL) group is capturing, so the backreferences shift by one to \1''\4''\6 unless that group is made non-capturing)

I am not a Java man (pun intended, respectfully), so I cannot offer a demonstrator in Java.
Here is a demo in sed (you don't need it, it is just to show what I tested).
It implements the concept above and produces the desired output for the given sample input.

 bash-3.1$ sed -En "1{s/''/''''/g;:a;s/(<xsl([^>']|'')+)'(([^>']|[^>']+'')+)'(([^'>])+)/\1''\3''\5/;ta;p};" input.txt > output.txt

The main trick is to search for something that does NOT occur in the parts already replaced, so that each pass finds the next pair still to be replaced.
The secondary trick is to first replace everything that needs replacing but already looks replaced ( '' → '''' ).

Note:
While Java and sed have potentially different regex flavors, I don't see anything that clearly conflicts when comparing your regex to mine. Mine does not even contain any \s, \d, \w or the like.
You may need to use $1''$3''$5 instead of my \1''\3''\5 .
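
The two steps above might be sketched in Java as follows. This is an untested-in-production illustration, not the answerer's own code: the class and method names are hypothetical, and the pattern is the second regex from this answer with (xsl|XSL) changed to a non-capturing (?:xsl|XSL) so that the group numbers still line up with $1''$3''$5. Be aware that step 1 runs over the whole input, so a '' outside any xsl tag would be doubled too.

```java
import java.util.regex.Pattern;

class XslQuoteLoop {
    // Assumption: the answer's regex with a non-capturing (?:xsl|XSL),
    // so groups 1, 3 and 5 keep the meaning they have in the sed version.
    private static final Pattern ONE_PAIR = Pattern.compile(
            "(<\\s*(?:xsl|XSL)([^>']|'')+)'(([^>']|[^>']+'')+)'(([^'>])+)");

    static String doubleQuotes(String input) {
        // Step 1: once, but globally. '' becomes '''' so that empty strings
        // are not mistaken for pairs that were already replaced.
        String current = input.replace("''", "''''");
        // Step 2: replace one quoted pair per pass until nothing changes.
        // Each pass removes one pair of single quotes, so the loop terminates.
        while (true) {
            String next = ONE_PAIR.matcher(current).replaceFirst("$1''$3''$5");
            if (next.equals(current)) {
                return current;
            }
            current = next;
        }
    }

    public static void main(String[] args) {
        String sample = "<p>7 o'clock</p><xsl:if test=\"a='' and (b='Y' or c='Z')\">x</xsl:if>";
        System.out.println(doubleQuotes(sample));
    }
}
```

The loop re-scans the string from the start on every pass, which is fine for short templates; for large inputs a single-pass approach would likely be cheaper.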
