Splitting a huge XML file (> 10 GB) into small pieces with a StAX parser

We have a scenario in which we need to split a large XML file (larger than 10 GB) into small pieces. Each piece should contain 100 or 200 elements. Example XML:

    <Employees>
        <Employee id="1">
            <age>29</age>
            <name>Pankaj</name>
            <gender>Male</gender>
            <role>Java Developer</role>
        </Employee>
        <Employee id="3">
            <age>35</age>
            <name>Lisa</name>
            <gender>Female</gender>
            <role>CEO</role>
        </Employee>
        <Employee id="3">
            <age>40</age>
            <name>Tom</name>
            <gender>Male</gender>
            <role>Manager</role>
        </Employee>
        <Employee id="3">
            <age>25</age>
            <name>Meghna</name>
            <gender>Female</gender>
            <role>Manager</role>
        </Employee>
        <Employee id="3">
            <age>29</age>
            <name>Pankaj</name>
            <gender>Male</gender>
            <role>Java Developer</role>
        </Employee>
        <Employee id="3">
            <age>35</age>
            <name>Lisa</name>
            <gender>Female</gender>
            <role>CEO</role>
        </Employee>
        <Employee id="3">
            <age>40</age>
            <name>Tom</name>
            <gender>Male</gender>
            <role>Manager</role>
        </Employee>
    </Employees>

I have StAX parser code that breaks the file into small pieces, but each output file contains only one complete Employee element, whereas I need 100, 200, or more <Employee> elements per file. Here is my Java code:

    public static void main(String[] s) throws Exception {
        String prefix = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" + "\n";
        String suffix = "\n</Employees>\n";
        int count = 0;
        try {
            int i = 0;
            XMLInputFactory xif = XMLInputFactory.newInstance();
            XMLStreamReader xsr = xif.createXMLStreamReader(
                    new FileReader("D:\\Desktop\\Test\\latestxml\\test.xml"));
            xsr.nextTag(); // Advance to the root element
            TransformerFactory tf = TransformerFactory.newInstance();
            Transformer t = tf.newTransformer();
            // Every child element of the root ends up in its own file
            while (xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
                File file = new File("C:\\Users\\test\\Desktop\\xml\\" + "out" + i + ".xml");
                FileOutputStream fos = new FileOutputStream(file, true);
                t.transform(new StAXSource(xsr), new StreamResult(fos));
                i++;
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
+5
2 answers

I hope I understood this correctly: you only need to increment a counter every time you write one employee, and switch to a new file once the counter reaches the limit.

    File file = new File("out" + i + ".xml");
    FileOutputStream fos = new FileOutputStream(file, true);
    appendStuff("<Employees>", file);
    while (xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
        count++;
        t.transform(new StAXSource(xsr), new StreamResult(fos));
        if (count == 100) {
            // Batch is full: finish this file and start the next one
            count = 0;
            i++;
            appendStuff("</Employees>", file);
            fos.close();
            file = new File("out" + i + ".xml");
            fos = new FileOutputStream(file, true);
            appendStuff("<Employees>", file);
        }
    }
    // Finish the last, possibly partial, file
    appendStuff("</Employees>", file);
    fos.close();

It's not very nice, but you get the idea.

    private static void appendStuff(String content, File file) throws IOException {
        FileWriter fw = new FileWriter(file.getAbsoluteFile(), true);
        BufferedWriter bw = new BufferedWriter(fw);
        bw.write(content);
        bw.close();
    }
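For reference, here is a self-contained sketch of the same counting idea. The class name `XmlSplitter`, the method `split`, and its parameters are hypothetical and not from the question. It makes two assumptions beyond the snippets above: the per-element XML declaration is switched off with `OutputKeys.OMIT_XML_DECLARATION`, so that each output file stays well-formed, and the reader's current event is checked after every transform instead of calling `nextTag()` unconditionally, because where the transformer leaves the reader appears to depend on the implementation.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

// Hypothetical helper class, for illustration only.
final class XmlSplitter {

    /**
     * Copies the children of the root element into files out0.xml, out1.xml, ...
     * with at most batchSize children each. Returns the number of files written.
     */
    static int split(Reader input, Path outDir, int batchSize)
            throws XMLStreamException, TransformerException, IOException {
        XMLStreamReader xsr = XMLInputFactory.newInstance().createXMLStreamReader(input);
        xsr.nextTag(); // move to the root element, e.g. <Employees>
        String root = xsr.getLocalName();

        Transformer t = TransformerFactory.newInstance().newTransformer();
        // Otherwise every copied element is preceded by its own XML declaration.
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        int fileIndex = 0;
        int count = 0;
        Writer out = null;
        int event = xsr.next();
        while (event != XMLStreamConstants.END_DOCUMENT) {
            if (event == XMLStreamConstants.START_ELEMENT) {
                if (out == null) { // open a new file lazily, only when there is something to write
                    out = Files.newBufferedWriter(
                            outDir.resolve("out" + fileIndex + ".xml"), StandardCharsets.UTF_8);
                    out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<" + root + ">\n");
                }
                t.transform(new StAXSource(xsr), new StreamResult(out));
                out.write("\n");
                if (++count == batchSize) { // batch is full: close the file
                    out.write("</" + root + ">\n");
                    out.close();
                    out = null;
                    count = 0;
                    fileIndex++;
                }
                // Re-read the position instead of assuming where the transformer stopped.
                event = xsr.getEventType();
            } else {
                event = xsr.next();
            }
        }
        if (out != null) { // close the last, partially filled file
            out.write("</" + root + ">\n");
            out.close();
            fileIndex++;
        }
        xsr.close();
        return fileIndex;
    }
}
```

A call such as `XmlSplitter.split(new FileReader("test.xml"), Paths.get("xml"), 100)` would then write `out0.xml`, `out1.xml`, and so on into an existing `xml` directory. Only one element is held in memory at a time, so the size of the input file should not matter.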
+2

Do not increment i on every iteration; it should only be updated when your count reaches 100 or 200.

Like this:

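    String outputPath = "/test/path/foo.txt";
    while (xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
        // Append mode: every element of the current batch goes to the same file
        FileOutputStream file = new FileOutputStream(outputPath, true);
        ...
        ...
        file.close(); // release the handle before the next iteration
        count++;
        if (count == 100) {
            // Switch to the next file only when the batch is full
            i++;
            outputPath = "/test/path/foo" + i + ".txt";
            count = 0;
        }
    }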
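The same rule can be written without a separate `count` variable by deriving the file index from a running element index. This is only a sketch: `BatchNaming` and `outputPathFor` are hypothetical names, and unlike the snippet above the first file gets the index 0 instead of no index at all.

```java
// Hypothetical helper, for illustration only.
final class BatchNaming {

    /**
     * Returns the output path for the element with the given zero-based index.
     * Integer division keeps the file index constant for batchSize consecutive elements.
     */
    static String outputPathFor(String prefix, String extension, long elementIndex, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long fileIndex = elementIndex / batchSize;
        return prefix + fileIndex + extension;
    }
}
```

With a batch size of 100, elements 0 to 99 map to `/test/path/foo0.txt` and element 100 is the first to map to `/test/path/foo1.txt`.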
+2

Source: https://habr.com/ru/post/1237630/

