I need to convert large XML files that have a nested (hierarchical) structure of the form

    <Root>
      Flat XML
      Hierarchical XML (multiple blocks, some of them repeated)
      Flat XML
    </Root>

into a flatter ("shredded") form, with one output block for each repeated nested block.
The data has many different tags and hierarchy variations (especially in the number of flat XML tags before and after the hierarchical XML), so ideally the solution should not make any assumptions about tag names, attributes or the depth of the hierarchy.
With just 4 hierarchy levels, the input would look something like this:

    <Level 1>
      ...
      <Level 2>
        ...
        <Level 3>
          ...
          <Level 4>A</Level 4>
          <Level 4>B</Level 4>
          ...
        </Level 3>
        ...
      </Level 2>
      ...
    </Level 1>

and the desired result would then be:

    <Level 1>
      ...
      <Level 2>
        ...
        <Level 3>
          ...
          <Level 4>A</Level 4>
          ...
        </Level 3>
        ...
      </Level 2>
      ...
    </Level 1>
    <Level 1>
      ...
      <Level 2>
        ...
        <Level 3>
          ...
          <Level 4>B</Level 4>
          ...
        </Level 3>
        ...
      </Level 2>
      ...
    </Level 1>

That is, if at each level i there are Li different components, a total of L1 × L2 × ... × Ln different blocks will be created (only 2 above, since level 4 is the only level with more than one component: L1 × L2 × L3 × L4 = 1 × 1 × 1 × 2 = 2).
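Under one reading of this rule (my assumption: children that share a tag name are alternatives, everything else is kept), the number of blocks produced by an element is the product, over its distinct child tag names, of the summed block counts of the children with that name. With a single repeated level this is exactly L1 × L2 × L3 × L4; for the Employee example below it gives 3 + 2 = 5. The following minimal Java/DOM sketch only counts the blocks without building them; `VariantCount`, `count` and `parse` are hypothetical names:

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class VariantCount {
    // Hypothetical rule: children sharing a tag name are alternatives (their counts add up);
    // children with different tag names vary independently (their counts multiply).
    static long count(Element e) {
        Map<String, Long> sumPerName = new LinkedHashMap<>();
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node n = kids.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                Element c = (Element) n;
                sumPerName.merge(c.getTagName(), count(c), Long::sum);
            }
        }
        long product = 1; // an element without child elements yields exactly one block
        for (long sum : sumPerName.values()) product *= sum;
        return product;
    }

    // Parses a string into a DOM tree and returns its root element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception ex) {
            throw new IllegalArgumentException("not well-formed XML", ex);
        }
    }

    public static void main(String[] args) {
        // "Level 1".."Level 4" renamed: XML element names cannot contain spaces.
        String levels = "<Level1><Level2><Level3>"
                + "<Level4>A</Level4><Level4>B</Level4>"
                + "</Level3></Level2></Level1>";
        System.out.println(count(parse(levels))); // 2
    }
}
```

Counting first is cheap and shows how large the shredded output will be before any copies are made.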
From what I have seen, XSLT might be one way to do this, but any other solution (such as StAX or even JDOM) would also be welcome.
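Since the files are large, one option (assuming, which is not stated above, that the root element merely wraps many independent records such as `<Employee>`) is to stream the input with StAX and build a small DOM tree for one record at a time, then shred only that record. A minimal sketch using the JDK's `javax.xml.stream` and `org.w3c.dom` APIs; `RecordStreamer` and `forEachRecord` are hypothetical names, and namespace prefixes, comments and processing instructions inside a record are deliberately dropped:

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class RecordStreamer {
    /** Streams `in` with StAX and passes each child element of the root to `sink`
     *  as the document element of its own small DOM Document. */
    static void forEachRecord(Reader in, Consumer<Element> sink) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
        Document doc = null;
        Element current = null; // innermost open element of the record being built
        int depth = 0;          // 1 = root element, 2 = a record, > 2 = inside a record
        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
                if (depth == 2) doc = builder.newDocument(); // fresh owner document per record
                if (depth >= 2) {
                    Element e = doc.createElement(r.getLocalName()); // prefix dropped
                    for (int i = 0; i < r.getAttributeCount(); i++) {
                        e.setAttribute(r.getAttributeLocalName(i), r.getAttributeValue(i));
                    }
                    if (current != null) current.appendChild(e);
                    else doc.appendChild(e); // the record is the document element
                    current = e;
                }
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                if (depth == 2) {
                    sink.accept(current); // record complete: hand it over
                    current = null;
                } else if (depth > 2) {
                    current = (Element) current.getParentNode();
                }
                depth--;
            } else if (current != null && (event == XMLStreamConstants.CHARACTERS
                    || event == XMLStreamConstants.CDATA)) {
                current.appendChild(doc.createTextNode(r.getText()));
            }
        }
        r.close();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Root><Employee name=\"A\"><Age>28</Age></Employee>"
                + "<Employee name=\"B\"/></Root>";
        forEachRecord(new StringReader(xml),
                e -> System.out.println(e.getTagName() + " " + e.getAttribute("name")));
    }
}
```

Memory use is then bounded by the largest single record rather than by the whole file. If instead the whole file is one record, as the `<Root>` outline above may suggest, this splitting does not apply and the tree has to be shredded as a whole.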
A more detailed example, using fictitious data, would be:

    <Employee name="A Name">
      <Address>123 A Street</Address>
      <Age>28</Age>
      <EmploymentHistory>
        <Employment country="US">
          <Comment>List of previous jobs in the US</Comment>
          <Jobs>3</Jobs>
          <JobDetails>
            <Job title="Senior Developer">
              <StartDate>01/10/2001</StartDate>
              <Months>38</Months>
            </Job>
            <Job title="Senior Developer">
              <StartDate>01/12/2004</StartDate>
              <Months>6</Months>
            </Job>
            <Job title="Senior Developer">
              <StartDate>01/06/2005</StartDate>
              <Months>10</Months>
            </Job>
          </JobDetails>
        </Employment>
      </EmploymentHistory>
      <EmploymentHistory>
        <Employment country="UK">
          <Comment>List of previous jobs in the UK</Comment>
          <Jobs>2</Jobs>
          <JobDetails>
            <Job title="Junior Developer">
              <StartDate>01/05/1999</StartDate>
              <Months>25</Months>
            </Job>
            <Job title="Junior Developer">
              <StartDate>01/07/2001</StartDate>
              <Months>3</Months>
            </Job>
          </JobDetails>
        </Employment>
      </EmploymentHistory>
      <Available>true</Available>
      <Experience unit="years">6</Experience>
    </Employee>

The data above should be shredded into 5 blocks (one for each distinct <Job> block). Each block keeps all other tags unchanged but contains only one <Job> element, and therefore only the <EmploymentHistory> that encloses that job. So, given the 5 <Job> blocks in the example above, the converted ("shredded") XML would be:

    <Employee name="A Name">
      <Address>123 A Street</Address>
      <Age>28</Age>
      <EmploymentHistory>
        <Employment country="US">
          <Comment>List of previous jobs in the US</Comment>
          <Jobs>3</Jobs>
          <JobDetails>
            <Job title="Senior Developer">
              <StartDate>01/10/2001</StartDate>
              <Months>38</Months>
            </Job>
          </JobDetails>
        </Employment>
      </EmploymentHistory>
      <Available>true</Available>
      <Experience unit="years">6</Experience>
    </Employee>
    <Employee name="A Name">
      <Address>123 A Street</Address>
      <Age>28</Age>
      <EmploymentHistory>
        <Employment country="US">
          <Comment>List of previous jobs in the US</Comment>
          <Jobs>3</Jobs>
          <JobDetails>
            <Job title="Senior Developer">
              <StartDate>01/12/2004</StartDate>
              <Months>6</Months>
            </Job>
          </JobDetails>
        </Employment>
      </EmploymentHistory>
      <Available>true</Available>
      <Experience unit="years">6</Experience>
    </Employee>
    <Employee name="A Name">
      <Address>123 A Street</Address>
      <Age>28</Age>
      <EmploymentHistory>
        <Employment country="US">
          <Comment>List of previous jobs in the US</Comment>
          <Jobs>3</Jobs>
          <JobDetails>
            <Job title="Senior Developer">
              <StartDate>01/06/2005</StartDate>
              <Months>10</Months>
            </Job>
          </JobDetails>
        </Employment>
      </EmploymentHistory>
      <Available>true</Available>
      <Experience unit="years">6</Experience>
    </Employee>
    <Employee name="A Name">
      <Address>123 A Street</Address>
      <Age>28</Age>
      <EmploymentHistory>
        <Employment country="UK">
          <Comment>List of previous jobs in the UK</Comment>
          <Jobs>2</Jobs>
          <JobDetails>
            <Job title="Junior Developer">
              <StartDate>01/05/1999</StartDate>
              <Months>25</Months>
            </Job>
          </JobDetails>
        </Employment>
      </EmploymentHistory>
      <Available>true</Available>
      <Experience unit="years">6</Experience>
    </Employee>
    <Employee name="A Name">
      <Address>123 A Street</Address>
      <Age>28</Age>
      <EmploymentHistory>
        <Employment country="UK">
          <Comment>List of previous jobs in the UK</Comment>
          <Jobs>2</Jobs>
          <JobDetails>
            <Job title="Junior Developer">
              <StartDate>01/07/2001</StartDate>
              <Months>3</Months>
            </Job>
          </JobDetails>
        </Employment>
      </EmploymentHistory>
      <Available>true</Available>
      <Experience unit="years">6</Experience>
    </Employee>

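The rule described above (keep every non-repeated element, and let each group of same-named sibling elements contribute exactly one member per output block, recursively) could be sketched with the JDK's DOM API without assuming any tag names, attributes or depth. This is a minimal sketch under my own assumptions, not a tested solution: `Shredder`, `shred`, `parse` and `toXml` are hypothetical names; "repeated" is taken to mean "same tag name under the same parent"; and the chosen member of a group is placed where the group's first member was, which only matters if same-named siblings are not adjacent.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.*;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class Shredder {
    /** Returns one copy of e per combination of choices among same-named sibling elements. */
    static List<Element> shred(Element e) {
        NodeList kids = e.getChildNodes();
        Map<String, List<Element>> groups = new HashMap<>(); // child elements by tag name
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i).getNodeType() == Node.ELEMENT_NODE) {
                Element c = (Element) kids.item(i);
                groups.computeIfAbsent(c.getTagName(), k -> new ArrayList<>()).add(c);
            }
        }
        // slots.get(s) lists the alternative nodes that may fill position s of a copy.
        List<List<Node>> slots = new ArrayList<>();
        Set<String> placed = new HashSet<>();
        for (int i = 0; i < kids.getLength(); i++) {
            Node n = kids.item(i);
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                slots.add(List.of(n)); // text and comments are kept unchanged
            } else if (placed.add(((Element) n).getTagName())) {
                // First member of a same-name group: the variants of all members compete here.
                List<Node> alternatives = new ArrayList<>();
                for (Element member : groups.get(((Element) n).getTagName())) {
                    alternatives.addAll(shred(member));
                }
                slots.add(alternatives);
            } // later members of a group were already covered by its first member
        }
        // Enumerate the Cartesian product of the slots like an odometer.
        List<Element> out = new ArrayList<>();
        int[] pick = new int[slots.size()];
        while (true) {
            Element copy = (Element) e.cloneNode(false); // shallow clone keeps the attributes
            for (int s = 0; s < slots.size(); s++) {
                copy.appendChild(slots.get(s).get(pick[s]).cloneNode(true));
            }
            out.add(copy);
            int s = slots.size() - 1;
            while (s >= 0 && ++pick[s] == slots.get(s).size()) {
                pick[s--] = 0;
            }
            if (s < 0) return out;
        }
    }

    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception ex) {
            throw new IllegalArgumentException("not well-formed XML", ex);
        }
    }

    static String toXml(Node n) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter w = new StringWriter();
            t.transform(new DOMSource(n), new StreamResult(w));
            return w.toString();
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<Employee name=\"A Name\"><Age>28</Age>"
                + "<EmploymentHistory><Employment country=\"US\"><JobDetails>"
                + "<Job><Months>38</Months></Job><Job><Months>6</Months></Job>"
                + "</JobDetails></Employment></EmploymentHistory>"
                + "<EmploymentHistory><Employment country=\"UK\"><JobDetails>"
                + "<Job><Months>25</Months></Job>"
                + "</JobDetails></Employment></EmploymentHistory>"
                + "<Available>true</Available></Employee>";
        for (Element block : shred(parse(xml))) System.out.println(toXml(block));
    }
}
```

Running `main` should print three `<Employee>` blocks (two for the US jobs, one for the UK job), each with a single `<Job>` and with `<Age>` and `<Available>` unchanged. Because a record and all of its copies are held in memory, and the number of copies multiplies across independent groups, this is best applied one record at a time on large inputs.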