Need XSLT transform to remove duplicate elements - sort by attribute

I have an ugly piece of XML that I need to process with BizTalk, and I have managed to normalize it into the example below. I'm not an XSLT ninja, but between the web and the VS2010 debugger, I can find my way around XSL.

Now I need a clever bit of XSLT to "weed out" duplicate elements and keep only the latest one of each, as determined by the date in the ValidFromDate attribute.

The ValidFromDate attribute is of type xs:date (XSD date).

    <SomeData>
      <A ValidFromDate="2011-12-01">A_1</A>
      <A ValidFromDate="2012-01-19">A_2</A>
      <B CalidFromDate="2011-12-03">B_1</B>
      <B ValidFromDate="2012-01-17">B_2</B>
      <B ValidFromDate="2012-01-19">B_3</B>
      <C ValidFromDate="2012-01-20">C_1</C>
      <C ValidFromDate="2011-01-20">C_2</C>
    </SomeData>

After the transformation, I would like to keep only these elements:

    <SomeData>
      <A ValidFromDate="2012-01-19">A_2</A>
      <B ValidFromDate="2012-01-19">B_3</B>
      <C ValidFromDate="2012-01-20">C_1</C>
    </SomeData>

Any tips on how to put this XSL together? I have scoured the Internet for a solution and tried many clever XSL sorting scripts, but none of them pointed me in the right direction.

6 answers

Here is an XSLT 1.0 solution that is slightly simpler and shorter than @lwburk's:

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output omit-xml-declaration="yes" indent="yes"/>
      <xsl:key name="kName" match="*/*" use="name()"/>
      <xsl:template match="/">
        <xsl:apply-templates select="*/*[generate-id() = generate-id(key('kName', name())[1])]"/>
      </xsl:template>
      <xsl:template match="*/*">
        <xsl:for-each select="key('kName', name())">
          <xsl:sort select="@ValidFromDate" order="descending"/>
          <xsl:if test="position() = 1">
            <xsl:copy-of select="."/>
          </xsl:if>
        </xsl:for-each>
      </xsl:template>
    </xsl:stylesheet>

When this transformation is applied to the provided XML document:

    <SomeData>
      <A ValidFromDate="2011-12-01">A_1</A>
      <A ValidFromDate="2012-01-19">A_2</A>
      <B CalidFromDate="2011-12-03">B_1</B>
      <B ValidFromDate="2012-01-17">B_2</B>
      <B ValidFromDate="2012-01-19">B_3</B>
      <C ValidFromDate="2012-01-20">C_1</C>
      <C ValidFromDate="2011-01-20">C_2</C>
    </SomeData>

the wanted, correct result is produced:

    <A ValidFromDate="2012-01-19">A_2</A>
    <B ValidFromDate="2012-01-19">B_3</B>
    <C ValidFromDate="2012-01-20">C_1</C>

The best solution to this problem with XSLT 1.0 is to use Muenchian grouping. Given that the elements are already sorted by the ValidFromDate attribute, the following stylesheet should do the trick:

    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="xml" indent="yes"/>
      <xsl:key name="element-key" match="/SomeData/*" use="name()"/>
      <xsl:template match="/SomeData">
        <xsl:copy>
          <xsl:for-each select="*[generate-id() = generate-id(key('element-key', name()))]">
            <xsl:copy-of select="(. | following-sibling::*[name() = name(current())])[last()]"/>
          </xsl:for-each>
        </xsl:copy>
      </xsl:template>
    </xsl:stylesheet>

Here is the result I got when running it against your XML sample:

    <?xml version="1.0" encoding="utf-8"?>
    <SomeData>
      <A ValidFromDate="2012-01-19">A_2</A>
      <B ValidFromDate="2012-01-19">B_3</B>
      <C ValidFromDate="2011-01-20">C_2</C>
    </SomeData>

The following stylesheet gives the correct result without any dependence on the input order:

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <xsl:output omit-xml-declaration="yes" indent="yes"/>
      <xsl:key name="byName" match="/SomeData/*" use="name()"/>
      <xsl:template match="@*|node()">
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:template>
      <xsl:template match="SomeData">
        <xsl:copy>
          <xsl:apply-templates select="@*"/>
          <xsl:for-each select="*[generate-id() = generate-id(key('byName', name())[1])]">
            <xsl:apply-templates select="key('byName', name())" mode="out">
              <xsl:sort select="translate(@ValidFromDate, '-', '')" data-type="number" order="descending"/>
            </xsl:apply-templates>
          </xsl:for-each>
        </xsl:copy>
      </xsl:template>
      <xsl:template match="SomeData/*" mode="out">
        <xsl:if test="position()=1">
          <xsl:apply-templates select="."/>
        </xsl:if>
      </xsl:template>
    </xsl:stylesheet>

Output:

    <SomeData>
      <A ValidFromDate="2012-01-19">A_2</A>
      <B ValidFromDate="2012-01-19">B_3</B>
      <C ValidFromDate="2012-01-20">C_1</C>
    </SomeData>

Note that C_1, not C_2, is kept for C: C_1 carries the latest date even though it is not the last C element in the document (i.e. the input is not sorted by date). Answers that rely on the input document order therefore return C_2, which is incorrect.

Explanation:

  • An xsl:key groups all /SomeData/* elements by name()
  • The outer for-each selects the first item in each group
  • Templates are then applied to all members of that group, sorted by @ValidFromDate in descending order
  • One additional template (mode "out") keeps only the first element of each sorted group
  • The identity transform template takes care of the rest
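
The first two steps can be tried in isolation. The following is only a sketch: it reuses the byName key from the stylesheet above, but the text output and the per-group count are illustrative additions, not part of the answer. It lists each distinct element name once, together with the size of its group:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>
  <!-- Same key as above: index the children of SomeData by element name -->
  <xsl:key name="byName" match="/SomeData/*" use="name()"/>
  <xsl:template match="/SomeData">
    <!-- Muenchian step: keep only the first element of each name group -->
    <xsl:for-each select="*[generate-id() = generate-id(key('byName', name())[1])]">
      <xsl:value-of select="name()"/>
      <xsl:text>: </xsl:text>
      <!-- Illustrative only: number of elements sharing this name -->
      <xsl:value-of select="count(key('byName', name()))"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the sample input this should print one line per group (A: 2, B: 3, C: 2), which shows that the outer for-each runs once per element name regardless of how the dates are ordered.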

Based on the @ValidFromDate order:

XSLT:

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="xml" indent="yes"/>
      <xsl:key name="k" match="*" use="name()"/>
      <xsl:template match="SomeData">
        <xsl:copy>
          <xsl:apply-templates select="*[generate-id() = generate-id(key('k', name()))]"/>
        </xsl:copy>
      </xsl:template>
      <xsl:template match="*">
        <xsl:apply-templates select="key('k', name())" mode="a">
          <xsl:sort select="@ValidFromDate" order="descending"/>
        </xsl:apply-templates>
      </xsl:template>
      <xsl:template match="*" mode="a">
        <xsl:if test="position() = 1">
          <xsl:copy-of select="."/>
        </xsl:if>
      </xsl:template>
    </xsl:stylesheet>

applied to:

    <SomeData>
      <A ValidFromDate="2011-12-01">A_1</A>
      <A ValidFromDate="2012-01-19">A_2</A>
      <B CalidFromDate="2011-12-03">B_1</B>
      <B ValidFromDate="2012-01-17">B_2</B>
      <B ValidFromDate="2012-01-19">B_3</B>
      <C ValidFromDate="2012-01-20">C_1</C>
      <C ValidFromDate="2011-01-20">C_2</C>
    </SomeData>

gives:

    <SomeData>
      <A ValidFromDate="2012-01-19">A_2</A>
      <B ValidFromDate="2012-01-19">B_3</B>
      <C ValidFromDate="2012-01-20">C_1</C>
    </SomeData>

Based on Pawel's answer, I made the following modification, which gives the same result:

    <xsl:template match="/SomeData">
      <xsl:copy>
        <xsl:copy-of select="*[generate-id() = generate-id(key('element-key', name())[last()])]"/>
      </xsl:copy>
    </xsl:template>

If the two always produce the same result, I prefer this version because it is a little cleaner.
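
For reference, here is that template dropped into a complete stylesheet, as a sketch. The element-key declaration and the output settings are taken from Pawel's answer; nothing else is added. Like that answer, it assumes the elements of each group already appear in ascending ValidFromDate order, so on the sample input it keeps C_2 (the last C in document order) rather than C_1:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <!-- Key from Pawel's answer: children of SomeData indexed by element name -->
  <xsl:key name="element-key" match="/SomeData/*" use="name()"/>
  <xsl:template match="/SomeData">
    <xsl:copy>
      <!-- Copy each element that is the LAST member of its name group
           in document order; no sorting by date takes place -->
      <xsl:copy-of select="*[generate-id() = generate-id(key('element-key', name())[last()])]"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Because key() returns nodes in document order, [last()] picks the same node as the following-sibling expression in the original, which is why the two versions should agree.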


An XSLT 2.0 solution that does not rely on the input order:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:fn="http://www.w3.org/2005/xpath-functions">
      <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
      <xsl:template match="/">
        <SomeData>
          <xsl:for-each-group select="/SomeData/*" group-by="name()">
            <xsl:for-each select="current-group()">
              <xsl:sort select="number(substring(attribute(),1,4))" order="descending" data-type="number"/> <!-- year -->
              <xsl:sort select="number(substring(attribute(),6,2))" order="descending" data-type="number"/> <!-- month -->
              <xsl:sort select="number(substring(attribute(),9,2))" order="descending" data-type="number"/> <!-- day -->
              <xsl:if test="position()=1">
                <xsl:sequence select="."/>
              </xsl:if>
            </xsl:for-each>
          </xsl:for-each-group>
        </SomeData>
      </xsl:template>
    </xsl:stylesheet>
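
A shorter variant of the same idea, offered as a sketch rather than a drop-in replacement: since the attribute is an xs:date, the three substring sort keys can be replaced by a single key built with the standard xs:date constructor. This assumes an XSLT 2.0 processor and that every ValidFromDate value present is a valid xs:date; an element without that attribute (such as B_1 in the sample) gets an empty sort key and should sort last in descending order.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/SomeData">
    <xsl:copy>
      <!-- One group per element name -->
      <xsl:for-each-group select="*" group-by="name()">
        <xsl:for-each select="current-group()">
          <!-- Compare as real dates instead of year/month/day substrings -->
          <xsl:sort select="xs:date(@ValidFromDate)" order="descending"/>
          <!-- Keep only the element with the latest date in the group -->
          <xsl:if test="position() = 1">
            <xsl:copy-of select="."/>
          </xsl:if>
        </xsl:for-each>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Naming @ValidFromDate explicitly also avoids depending on each element having exactly one attribute, which the attribute() step in the stylesheet above relies on.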

Source: https://habr.com/ru/post/906963/

