Here is a complete XSLT 1.0 solution:
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:ext="http://exslt.org/common"
 xmlns:w="w"
 exclude-result-prefixes="ext w">
 <xsl:output omit-xml-declaration="yes"/>
 <xsl:strip-space elements="*"/>

 <!-- Pass 1: build the intermediate tree, then hand it to pass 2 -->
 <xsl:template match="w:p">
  <xsl:variable name="vrtfPass1">
   <p>
    <xsl:apply-templates/>
   </p>
  </xsl:variable>

  <xsl:apply-templates mode="pass2"
       select="ext:node-set($vrtfPass1)/*"/>
 </xsl:template>

 <!-- Each run: sort its properties by name so the nesting order is always the same -->
 <xsl:template match="w:r">
  <xsl:variable name="vrtfProps">
   <xsl:for-each select="w:rPr/*">
    <xsl:sort select="local-name()"/>
    <xsl:copy-of select="."/>
   </xsl:for-each>
  </xsl:variable>

  <xsl:call-template name="toHtml">
   <xsl:with-param name="pProps" select="ext:node-set($vrtfProps)/*"/>
   <xsl:with-param name="pText" select="w:t/text()"/>
  </xsl:call-template>
 </xsl:template>

 <!-- Wrap the text in one element per property, outermost first -->
 <xsl:template name="toHtml">
  <xsl:param name="pProps"/>
  <xsl:param name="pText"/>

  <xsl:choose>
   <xsl:when test="not($pProps)">
    <xsl:copy-of select="$pText"/>
   </xsl:when>
   <xsl:otherwise>
    <xsl:element name="{local-name($pProps[1])}">
     <xsl:call-template name="toHtml">
      <xsl:with-param name="pProps" select="$pProps[position()>1]"/>
      <xsl:with-param name="pText" select="$pText"/>
     </xsl:call-template>
    </xsl:element>
   </xsl:otherwise>
  </xsl:choose>
 </xsl:template>

 <!-- Pass 2: merge adjacent, identically named elements -->
 <xsl:template match="/*" mode="pass2">
  <xsl:copy>
   <xsl:copy-of select="@*"/>
   <xsl:call-template name="processInner">
    <xsl:with-param name="pNodes" select="node()"/>
   </xsl:call-template>
  </xsl:copy>
 </xsl:template>

 <xsl:template name="processInner">
  <xsl:param name="pNodes"/>

  <xsl:variable name="pNode1" select="$pNodes[1]"/>

  <xsl:if test="$pNode1">
   <xsl:choose>
    <!-- Not an element (e.g. a text node): copy it and continue with the rest -->
    <xsl:when test="not($pNode1/self::*)">
     <xsl:copy-of select="$pNode1"/>
     <xsl:call-template name="processInner">
      <xsl:with-param name="pNodes" select="$pNodes[position()>1]"/>
     </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
     <xsl:variable name="vbatchLength">
      <xsl:call-template name="getBatchLength">
       <xsl:with-param name="pNodes" select="$pNodes[position()>1]"/>
       <xsl:with-param name="pName" select="name($pNode1)"/>
       <xsl:with-param name="pCount" select="1"/>
      </xsl:call-template>
     </xsl:variable>

     <!-- One element for the whole batch; recurse into the children of all its members -->
     <xsl:element name="{name($pNode1)}">
      <!-- $pNode1/@* rather than @*: the context node here is still the top element -->
      <xsl:copy-of select="$pNode1/@*"/>
      <xsl:call-template name="processInner">
       <xsl:with-param name="pNodes"
            select="$pNodes[not(position()>$vbatchLength)]/node()"/>
      </xsl:call-template>
     </xsl:element>

     <xsl:call-template name="processInner">
      <xsl:with-param name="pNodes" select="$pNodes[position()>$vbatchLength]"/>
     </xsl:call-template>
    </xsl:otherwise>
   </xsl:choose>
  </xsl:if>
 </xsl:template>

 <!-- Count how many consecutive elements named $pName start the batch -->
 <xsl:template name="getBatchLength">
  <xsl:param name="pNodes"/>
  <xsl:param name="pName"/>
  <xsl:param name="pCount"/>

  <xsl:choose>
   <xsl:when test=
     "not($pNodes)
     or not($pNodes[1]/self::*)
     or not(name($pNodes[1])=$pName)">
    <xsl:value-of select="$pCount"/>
   </xsl:when>
   <xsl:otherwise>
    <xsl:call-template name="getBatchLength">
     <xsl:with-param name="pNodes" select="$pNodes[position()>1]"/>
     <xsl:with-param name="pName" select="$pName"/>
     <xsl:with-param name="pCount" select="$pCount+1"/>
    </xsl:call-template>
   </xsl:otherwise>
  </xsl:choose>
 </xsl:template>
</xsl:stylesheet>
When this transformation is applied to the following XML document (based on the provided one, but made more complex so that it covers more edge cases):
<w:p xmlns:w="w">
    <w:r>
        <w:rPr>
            <w:b/>
        </w:rPr>
        <w:t xml:space="preserve">This is a </w:t>
    </w:r>
    <w:r>
        <w:rPr>
            <w:b/>
        </w:rPr>
        <w:t xml:space="preserve">bold </w:t>
    </w:r>
    <w:r>
        <w:rPr>
            <w:b/>
            <w:i/>
        </w:rPr>
        <w:t>with a bit of italic</w:t>
    </w:r>
    <w:r>
        <w:rPr>
            <w:b/>
            <w:i/>
        </w:rPr>
        <w:t> and some more italic</w:t>
    </w:r>
    <w:r>
        <w:rPr>
            <w:i/>
        </w:rPr>
        <w:t> and just italic, no-bold</w:t>
    </w:r>
    <w:r>
        <w:rPr>
            <w:b/>
        </w:rPr>
        <w:t xml:space="preserve"></w:t>
    </w:r>
    <w:r>
        <w:rPr>
            <w:b/>
        </w:rPr>
        <w:t>paragr</w:t>
    </w:r>
    <w:r>
        <w:rPr>
            <w:b/>
        </w:rPr>
        <w:t>a</w:t>
    </w:r>
    <w:r>
        <w:rPr>
            <w:b/>
        </w:rPr>
        <w:t>ph</w:t>
    </w:r>
    <w:r>
        <w:t xml:space="preserve"> with some non-bold in it too.</w:t>
    </w:r>
</w:p>
the wanted, correct result is produced:
<p><b>This is a bold <i>with a bit of italic and some more italic</i></b><i> and just italic, no-bold</i><b>paragraph</b> with some non-bold in it too.</p>
Explanation
1. This is a two-pass transformation. The first pass is relatively simple and converts the source XML document (in our specific case) into the following:
Pass 1 result (indented for readability):
<p>
   <b>This is a </b>
   <b>bold </b>
   <b>
      <i>with a bit of italic</i>
   </b>
   <b>
      <i> and some more italic</i>
   </b>
   <i> and just italic, no-bold</i>
   <b/>
   <b>paragr</b>
   <b>a</b>
   <b>ph</b> with some non-bold in it too.</p>
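To look at this intermediate tree directly, one option is to temporarily swap the w:p template for a variant that copies the pass 1 result to the output instead of feeding it to pass 2. This is only a debugging sketch, not part of the solution; it reuses the variable name vrtfPass1 from the stylesheet above.

```xml
<!-- Debugging sketch: temporary replacement for the w:p template -->
<xsl:template match="w:p">
  <xsl:variable name="vrtfPass1">
    <p>
      <xsl:apply-templates/>
    </p>
  </xsl:variable>
  <!-- Emit the intermediate tree as it is; pass 2 is skipped -->
  <xsl:copy-of select="$vrtfPass1"/>
</xsl:template>
```

Because the stylesheet does not request indentation, the output is the same tree as shown above, only without the line breaks and indentation added here for readability.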
2. The second pass (performed in "pass2" mode) merges any batch of consecutive, identically named elements into a single element with that name. It then calls itself recursively on the children of the merged elements, so batches are merged at any depth.
3. Do note: we do not (and cannot) use the following-sibling:: or preceding-sibling:: axes, because only the nodes to be merged at the top level are really siblings. Deeper down, the nodes to be merged come from different parents. For this reason we process all nodes simply as a node-set.
4. This solution is completely generic: it merges any sequence of adjacent, identically named elements at any depth, and no specific names are hardcoded.
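As a small illustration of this generality, here is a hypothetical input that uses two other property names, w:strike and w:u, picked only for the example. The expected result in the closing comment was traced by hand through both passes, not taken from the original answer.

```xml
<!-- Hypothetical input: same dummy "w" namespace as above, different property names -->
<w:p xmlns:w="w">
  <w:r>
    <w:rPr><w:strike/></w:rPr>
    <w:t xml:space="preserve">one </w:t>
  </w:r>
  <w:r>
    <w:rPr><w:strike/><w:u/></w:rPr>
    <w:t xml:space="preserve">two </w:t>
  </w:r>
  <w:r>
    <!-- Properties listed in the opposite order; pass 1 sorts them by name -->
    <w:rPr><w:u/><w:strike/></w:rPr>
    <w:t>three</w:t>
  </w:r>
</w:p>
<!-- Pass 1 (traced by hand):
       <p><strike>one </strike><strike><u>two </u></strike><strike><u>three</u></strike></p>
     Final result (traced by hand):
       <p><strike>one <u>two three</u></strike></p> -->
```

In the pass 1 tree, the two u elements have different strike parents, so they are not siblings. They are only merged because the second pass recurses into the combined children of the whole strike batch.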