XML shredding through XSLT in Java

I need to convert large XML files that have a nested (hierarchical) structure of the form

    <Root>
      Flat XML
      Hierarchical XML (multiple blocks, some repetitive)
      Flat XML
    </Root>

into a flatter ("shredded") form, with one block for each repeated nested block.

The data has many different tags and hierarchy variations (especially in the number of flat (shredded) XML tags before and after the hierarchical XML), so ideally no assumptions should be made about tag names, attribute names, or the hierarchical level.

A top-level view of the hierarchy, for just 4 levels, would look something like this:

    <Level 1>
      ...
      <Level 2>
        ...
        <Level 3>
          ...
          <Level 4>A</Level 4>
          <Level 4>B</Level 4>
          ...
        </Level 3>
        ...
      </Level 2>
      ...
    </Level 1>

and the desired result would then be

    <Level 1>
      ...
      <Level 2>
        ...
        <Level 3>
          ...
          <Level 4>A</Level 4>
          ...
        </Level 3>
        ...
      </Level 2>
      ...
    </Level 1>
    <Level 1>
      ...
      <Level 2>
        ...
        <Level 3>
          ...
          <Level 4>B</Level 4>
          ...
        </Level 3>
        ...
      </Level 2>
      ...
    </Level 1>

That is, if at each level i there are Li distinct components, a total of Product(Li) distinct blocks will be produced (only 2 above, since the only level with more than one component is level 4, so L1*L2*L3*L4 = 1*1*1*2 = 2).

From what I have seen, XSLT might be the way to go, but any other solution (such as StAX or even JDOM) would do.

A more detailed example using fictitious information would be

 <Employee name="A Name"> <Address>123 A Street</Address> <Age>28</Age> <EmploymentHistory> <Employment country="US"> <Comment>List of previous jobs in the US</Comment> <Jobs>3</Jobs> <JobDetails> <Job title = "Senior Developer"> <StartDate>01/10/2001</StartDate> <Months>38</Months> </Job> <Job title = "Senior Developer"> <StartDate>01/12/2004</StartDate> <Months>6</Months> </Job> <Job title = "Senior Developer"> <StartDate>01/06/2005</StartDate> <Months>10</Months> </Job> </JobDetails> </Employment> </EmploymentHistory> <EmploymentHistory> <Employment country="UK"> <Comment>List of previous jobs in the UK</Comment> <Jobs>2</Jobs> <JobDetails> <Job title = "Junior Developer"> <StartDate>01/05/1999</StartDate> <Months>25</Months> </Job> <Job title = "Junior Developer"> <StartDate>01/07/2001</StartDate> <Months>3</Months> </Job> </JobDetails> </Employment> </EmploymentHistory> <Available>true</Available> <Experience unit="years">6</Experience> </Employee> 

The above data should be shredded into 5 blocks (i.e. one for each <Job> block), each of which leaves all other tags unchanged and contains only one <Job> element. So, given the 5 different <Job> blocks in the above example, the converted ("shredded") XML will be

 <Employee name="A Name"> <Address>123 A Street</Address> <Age>28</Age> <EmploymentHistory> <Employment country="US"> <Comment>List of previous jobs in the US</Comment> <Jobs>3</Jobs> <JobDetails> <Job title = "Senior Developer"> <StartDate>01/10/2001</StartDate> <Months>38</Months> </Job> </JobDetails> <Available>true</Available> <Experience unit="years">6</Experience> </Employment> </EmploymentHistory> </Employee> <Employee name="A Name"> <Address>123 A Street</Address> <Age>28</Age> <EmploymentHistory> <Employment country="US"> <Comment>List of previous jobs in the US</Comment> <Jobs>3</Jobs> <JobDetails> <Job title = "Senior Developer"> <StartDate>01/12/2004</StartDate> <Months>6</Months> </Job> </JobDetails> <Available>true</Available> <Experience unit="years">6</Experience> </Employment> </EmploymentHistory> </Employee> <Employee name="A Name"> <Address>123 A Street</Address> <Age>28</Age> <EmploymentHistory> <Employment country="US"> <Comment>List of previous jobs in the US</Comment> <Jobs>3</Jobs> <JobDetails> <Job title = "Senior Developer"> <StartDate>01/06/2005</StartDate> <Months>10</Months> </Job> </JobDetails> <Available>true</Available> <Experience unit="years">6</Experience> </Employment> </EmploymentHistory> </Employee> <Employee name="A Name"> <Address>123 A Street</Address> <Age>28</Age> <EmploymentHistory> <Employment country="UK"> <Comment>List of previous jobs in the UK</Comment> <Jobs>3</Jobs> <JobDetails> <Job title = "Junior Developer"> <StartDate>01/05/1999</StartDate> <Months>25</Months> </Job> </JobDetails> <Available>true</Available> <Experience unit="years">6</Experience> </Employment> </EmploymentHistory> </Employee> <Employee name="A Name"> <Address>123 A Street</Address> <Age>28</Age> <EmploymentHistory> <Employment country="UK"> <Comment>List of previous jobs in the UK</Comment> <Jobs>3</Jobs> <JobDetails> <Job title = "Junior Developer"> <StartDate>01/07/2001</StartDate> <Months>3</Months> </Job> </JobDetails> 
<Available>true</Available> <Experience unit="years">6</Experience> </Employment> </EmploymentHistory> </Employee> 
+4
2 answers

Given the following XML:

 <?xml version="1.0" encoding="utf-8" ?> <Employee name="A Name"> <Address>123 A Street</Address> <Age>28</Age> <EmploymentHistory> <Employment country="US"> <Comment>List of previous jobs in the US</Comment> <Jobs>3</Jobs> <JobDetails> <Job title = "Developer"> <StartDate>01/10/2001</StartDate> <Months>38</Months> </Job> <Job title = "Developer"> <StartDate>01/12/2004</StartDate> <Months>6</Months> </Job> <Job title = "Developer"> <StartDate>01/06/2005</StartDate> <Months>10</Months> </Job> </JobDetails> </Employment> <Employment country="UK"> <Comment>List of previous jobs in the UK</Comment> <Jobs>2</Jobs> <JobDetails> <Job title = "Developer"> <StartDate>01/05/1999</StartDate> <Months>25</Months> </Job> <Job title = "Developer"> <StartDate>01/07/2001</StartDate> <Months>3</Months> </Job> </JobDetails> </Employment> </EmploymentHistory> <Available>true</Available> <Experience unit="years">6</Experience> </Employee> 

the following XSLT:

    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:msxsl="urn:schemas-microsoft-com:xslt"
        exclude-result-prefixes="msxsl">
      <xsl:output method="xml" indent="yes"/>

      <xsl:template match="/">
        <Output>
          <xsl:apply-templates select="//Employee/EmploymentHistory/Employment/JobDetails/Job" />
        </Output>
      </xsl:template>

      <xsl:template match="//Employee/EmploymentHistory/Employment/JobDetails/Job">
        <Employee>
          <xsl:attribute name="name">
            <xsl:value-of select="ancestor::Employee/@name"/>
          </xsl:attribute>
          <Address>
            <xsl:value-of select="ancestor::Employee/Address"/>
          </Address>
          <Age>
            <xsl:value-of select="ancestor::Employee/Age"/>
          </Age>
          <EmploymentHistory>
            <Employment>
              <xsl:attribute name="country">
                <xsl:value-of select="ancestor::Employment/@country"/>
              </xsl:attribute>
              <Comment>
                <xsl:value-of select="ancestor::Employment/Comment"/>
              </Comment>
              <Jobs>
                <xsl:value-of select="ancestor::Employment/Jobs"/>
              </Jobs>
              <JobDetails>
                <xsl:copy-of select="."/>
              </JobDetails>
              <Available>
                <xsl:value-of select="ancestor::Employee/Available"/>
              </Available>
              <Experience>
                <xsl:attribute name="unit">
                  <xsl:value-of select="ancestor::Employee/Experience/@unit"/>
                </xsl:attribute>
                <xsl:value-of select="ancestor::Employee/Experience"/>
              </Experience>
            </Employment>
          </EmploymentHistory>
        </Employee>
      </xsl:template>
    </xsl:stylesheet>

gives the following output:

 <?xml version="1.0" encoding="utf-8"?> <Output> <Employee name="A Name"> <Address>123 A Street</Address> <Age>28</Age> <EmploymentHistory> <Employment country="US"> <Comment>List of previous jobs in the US</Comment> <Jobs>3</Jobs> <JobDetails> <Job title="Developer"> <StartDate>01/10/2001</StartDate> <Months>38</Months> </Job> </JobDetails> <Available>true</Available> <Experience unit="years">6</Experience> </Employment> </EmploymentHistory> </Employee> <Employee name="A Name"> <Address>123 A Street</Address> <Age>28</Age> <EmploymentHistory> <Employment country="US"> <Comment>List of previous jobs in the US</Comment> <Jobs>3</Jobs> <JobDetails> <Job title="Developer"> <StartDate>01/12/2004</StartDate> <Months>6</Months> </Job> </JobDetails> <Available>true</Available> <Experience unit="years">6</Experience> </Employment> </EmploymentHistory> </Employee> <Employee name="A Name"> <Address>123 A Street</Address> <Age>28</Age> <EmploymentHistory> <Employment country="US"> <Comment>List of previous jobs in the US</Comment> <Jobs>3</Jobs> <JobDetails> <Job title="Developer"> <StartDate>01/06/2005</StartDate> <Months>10</Months> </Job> </JobDetails> <Available>true</Available> <Experience unit="years">6</Experience> </Employment> </EmploymentHistory> </Employee> <Employee name="A Name"> <Address>123 A Street</Address> <Age>28</Age> <EmploymentHistory> <Employment country="UK"> <Comment>List of previous jobs in the UK</Comment> <Jobs>2</Jobs> <JobDetails> <Job title="Developer"> <StartDate>01/05/1999</StartDate> <Months>25</Months> </Job> </JobDetails> <Available>true</Available> <Experience unit="years">6</Experience> </Employment> </EmploymentHistory> </Employee> <Employee name="A Name"> <Address>123 A Street</Address> <Age>28</Age> <EmploymentHistory> <Employment country="UK"> <Comment>List of previous jobs in the UK</Comment> <Jobs>2</Jobs> <JobDetails> <Job title="Developer"> <StartDate>01/07/2001</StartDate> <Months>3</Months> </Job> </JobDetails> 
<Available>true</Available> <Experience unit="years">6</Experience> </Employment> </EmploymentHistory> </Employee> </Output> 

Note that I have added an Output root element to make sure the result is a well-formed document.

Is that what you wanted?

You could also use xsl:copy to copy the higher-level elements, but I need to think about that a bit more. With the XSLT above you have more control, but you also have to redefine each of your elements by hand ...
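Since the question asks for Java, here is a minimal sketch of how a stylesheet like the one above could be applied using the JDK's built-in JAXP API (javax.xml.transform). The class name XsltRunner, the method name transform, and the cut-down stylesheet and document in main are made up for illustration. For real files you would most likely pass file-based StreamSource and StreamResult objects instead of strings.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of the original answer.
class XsltRunner {

    /** Applies an XSLT stylesheet to an XML document and returns the result as text. */
    static String transform(String xml, String xslt) {
        try {
            // Compile the stylesheet with whatever XSLT processor JAXP finds
            // (the JDK bundles an XSLT 1.0 processor).
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Cut-down stand-ins for the stylesheet and the Employee document shown above.
        String xslt =
              "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
            + "<xsl:template match='/'>"
            + "<Output><xsl:apply-templates select='//Job'/></Output>"
            + "</xsl:template>"
            + "<xsl:template match='Job'>"
            + "<Employee name='{ancestor::Employee/@name}'><xsl:copy-of select='.'/></Employee>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
        String xml = "<Employee name='A Name'><Job>1</Job><Job>2</Job></Employee>";
        System.out.println(transform(xml, xslt));
    }
}
```

The stylesheet is compiled once per call here. If many files are processed with the same stylesheet, compiling it once into a javax.xml.transform.Templates object and reusing that is probably the better choice.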

+3

Here is a generic solution, as requested:

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output omit-xml-declaration="yes" indent="yes"/>
      <xsl:strip-space elements="*"/>

      <xsl:param name="pLeafNodes" select="//Level-4"/>

      <xsl:template match="/">
        <t>
          <xsl:call-template name="StructRepro"/>
        </t>
      </xsl:template>

      <xsl:template name="StructRepro">
        <xsl:param name="pLeaves" select="$pLeafNodes"/>

        <xsl:for-each select="$pLeaves">
          <xsl:apply-templates mode="build" select="/*">
            <xsl:with-param name="pChild" select="."/>
            <xsl:with-param name="pLeaves" select="$pLeaves"/>
          </xsl:apply-templates>
        </xsl:for-each>
      </xsl:template>

      <xsl:template mode="build" match="node()|@*">
        <xsl:param name="pChild"/>
        <xsl:param name="pLeaves"/>

        <xsl:copy>
          <xsl:apply-templates mode="build" select="@*"/>

          <xsl:variable name="vLeafChild" select=
            "*[count(.|$pChild) = count($pChild)]"/>

          <xsl:choose>
            <xsl:when test="$vLeafChild">
              <xsl:apply-templates mode="build"
                select="$vLeafChild
                       |
                        node()[not(count(.|$pLeaves) = count($pLeaves))]">
                <xsl:with-param name="pChild" select="$pChild"/>
                <xsl:with-param name="pLeaves" select="$pLeaves"/>
              </xsl:apply-templates>
            </xsl:when>
            <xsl:otherwise>
              <xsl:apply-templates mode="build" select=
                "node()[not(.//*[count(.|$pLeaves) = count($pLeaves)])
                       or
                        .//*[count(.|$pChild) = count($pChild)]
                       ]
                ">
                <xsl:with-param name="pChild" select="$pChild"/>
                <xsl:with-param name="pLeaves" select="$pLeaves"/>
              </xsl:apply-templates>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:copy>
      </xsl:template>

      <xsl:template match="text()"/>
    </xsl:stylesheet>

When applied to the provided simplified (and generic) XML document:

    <Level-1>
      ...
      <Level-2>
        ...
        <Level-3>
          ...
          <Level-4>A</Level-4>
          <Level-4>B</Level-4>
          ...
        </Level-3>
        ...
      </Level-2>
      ...
    </Level-1>

the wanted, correct result is produced:

    <Level-1>
      ...
      <Level-2>
        ...
        <Level-3>
          <Level-4>A</Level-4>
        </Level-3>
        ...
      </Level-2>
      ...
    </Level-1>
    <Level-1>
      ...
      <Level-2>
        ...
        <Level-3>
          <Level-4>B</Level-4>
        </Level-3>
        ...
      </Level-2>
      ...
    </Level-1>

Now, if we change the line:

    <xsl:param name="pLeafNodes" select="//Level-4"/>

to

    <xsl:param name="pLeafNodes" select="//Job"/>

and apply the transformation to the Employee XML document:

 <Employee name="A Name"> <Address>123 A Street</Address> <Age>28</Age> <EmploymentHistory> <Employment country="US"> <Comment>List of previous jobs in the US</Comment> <Jobs>3</Jobs> <JobDetails> <Job title = "Senior Developer"> <StartDate>01/10/2001</StartDate> <Months>38</Months> </Job> <Job title = "Senior Developer"> <StartDate>01/12/2004</StartDate> <Months>6</Months> </Job> <Job title = "Senior Developer"> <StartDate>01/06/2005</StartDate> <Months>10</Months> </Job> </JobDetails> </Employment> </EmploymentHistory> <EmploymentHistory> <Employment country="UK"> <Comment>List of previous jobs in the UK</Comment> <Jobs>2</Jobs> <JobDetails> <Job title = "Junior Developer"> <StartDate>01/05/1999</StartDate> <Months>25</Months> </Job> <Job title = "Junior Developer"> <StartDate>01/07/2001</StartDate> <Months>3</Months> </Job> </JobDetails> </Employment> </EmploymentHistory> <Available>true</Available> <Experience unit="years">6</Experience> </Employee> 

we again get the wanted, correct result:

 <t> <Employee name="A Name"> <Address>123 A Street</Address> <Age>28</Age> <EmploymentHistory> <Employment country="US"> <Comment>List of previous jobs in the US</Comment> <Jobs>3</Jobs> <JobDetails> <Job title="Senior Developer"> <StartDate>01/10/2001</StartDate> <Months>38</Months> </Job> </JobDetails> </Employment> </EmploymentHistory> <Available>true</Available> <Experience unit="years">6</Experience> </Employee> <Employee name="A Name"> <Address>123 A Street</Address> <Age>28</Age> <EmploymentHistory> <Employment country="US"> <Comment>List of previous jobs in the US</Comment> <Jobs>3</Jobs> <JobDetails> <Job title="Senior Developer"> <StartDate>01/12/2004</StartDate> <Months>6</Months> </Job> </JobDetails> </Employment> </EmploymentHistory> <Available>true</Available> <Experience unit="years">6</Experience> </Employee> <Employee name="A Name"> <Address>123 A Street</Address> <Age>28</Age> <EmploymentHistory> <Employment country="US"> <Comment>List of previous jobs in the US</Comment> <Jobs>3</Jobs> <JobDetails> <Job title="Senior Developer"> <StartDate>01/06/2005</StartDate> <Months>10</Months> </Job> </JobDetails> </Employment> </EmploymentHistory> <Available>true</Available> <Experience unit="years">6</Experience> </Employee> <Employee name="A Name"> <Address>123 A Street</Address> <Age>28</Age> <EmploymentHistory> <Employment country="UK"> <Comment>List of previous jobs in the UK</Comment> <Jobs>2</Jobs> <JobDetails> <Job title="Junior Developer"> <StartDate>01/05/1999</StartDate> <Months>25</Months> </Job> </JobDetails> </Employment> </EmploymentHistory> <Available>true</Available> <Experience unit="years">6</Experience> </Employee> <Employee name="A Name"> <Address>123 A Street</Address> <Age>28</Age> <EmploymentHistory> <Employment country="UK"> <Comment>List of previous jobs in the UK</Comment> <Jobs>2</Jobs> <JobDetails> <Job title="Junior Developer"> <StartDate>01/07/2001</StartDate> <Months>3</Months> </Job> </JobDetails> </Employment> 
</EmploymentHistory> <Available>true</Available> <Experience unit="years">6</Experience> </Employee> </t> 

Explanation: The processing is done in a named template (StructRepro) and is controlled by a single external parameter named pLeafNodes, which must contain a node-set of all the nodes whose "upward structure" is to be reproduced in the result.
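For readers who would rather stay in plain Java, the same idea can be sketched with the JDK's DOM API. This is an illustrative sketch, not a translation of the stylesheet. For every chosen leaf, the tree is copied from the root down, keeping every branch that contains no leaves at all plus the single branch that leads to that leaf. The class DomShredder and its methods are hypothetical names. Leaves are selected by tag name only, which is a simplifying assumption compared with the pLeafNodes node-set. The whole document is loaded into memory, so for very large files a streaming approach may be needed instead.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical DOM-based counterpart of the StructRepro idea; all names are illustrative.
class DomShredder {

    /** Returns one serialized copy of the document per element named leafName. */
    static List<String> shred(String xml, String leafName) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList leaves = doc.getElementsByTagName(leafName);
            // An identity transformer is used only to serialize each result tree.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            List<String> blocks = new ArrayList<>();
            for (int i = 0; i < leaves.getLength(); i++) {
                Document out = builder.newDocument();
                out.appendChild(copyFor(doc.getDocumentElement(), leaves.item(i), leafName, out));
                StringWriter text = new StringWriter();
                serializer.transform(new DOMSource(out), new StreamResult(text));
                blocks.add(text.toString());
            }
            return blocks;
        } catch (Exception e) {
            throw new IllegalStateException("Shredding failed", e);
        }
    }

    /** Copies node into out, dropping every leaf-bearing branch that does not lead to target. */
    private static Node copyFor(Node node, Node target, String leafName, Document out) {
        if (node.isSameNode(target)) {
            return out.importNode(node, true); // the chosen leaf is copied whole
        }
        Node copy = out.importNode(node, false); // shallow copy; element attributes are kept
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            // Keep a child if it carries no leaves at all, or if it leads to the chosen leaf.
            if (!hasLeaf(child, leafName) || leadsTo(child, target)) {
                copy.appendChild(copyFor(child, target, leafName, out));
            }
        }
        return copy;
    }

    /** True if node is a leaf element or has a leaf element among its descendants. */
    private static boolean hasLeaf(Node node, String leafName) {
        if (!(node instanceof Element)) {
            return false;
        }
        Element element = (Element) node;
        return element.getTagName().equals(leafName)
                || element.getElementsByTagName(leafName).getLength() > 0;
    }

    /** True if node is the target itself or one of its ancestors. */
    private static boolean leadsTo(Node node, Node target) {
        for (Node n = target; n != null; n = n.getParentNode()) {
            if (n.isSameNode(node)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String xml = "<Employee name='A Name'><Age>28</Age>"
                + "<JobDetails><Job>1</Job><Job>2</Job></JobDetails>"
                + "<Available>true</Available></Employee>";
        for (String block : shred(xml, "Job")) {
            System.out.println(block);
        }
    }
}
```

Calling shred with "Job" on the Employee document plays the same role as setting pLeafNodes to //Job in the stylesheet. The difference is that this sketch returns each block as a separate string rather than wrapping them all in a single root element.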

+4

Source: https://habr.com/ru/post/1386865/

