The fastest option is a StAX parser, especially since you only need a specific subset of the XML file. With StAX you can easily skip everything you don't need, whereas a SAX parser would deliver every event to you anyway.
But it is also a bit more complicated than using SAX or DOM. The other day, I had to write a StAX parser for the following XML:
    <?xml version="1.0"?>
    <table>
      <row>
        <column>1</column>
        <column>Nome</column>
        <column>Sobrenome</column>
        <column> email@gmail.com </column>
        <column></column>
        <column>2011-06-22 03:02:14.915</column>
        <column>2011-06-22 03:02:25.953</column>
        <column></column>
        <column></column>
      </row>
    </table>
Here's what the final parser code looks like:
    import java.io.ByteArrayInputStream;
    import java.io.File;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamReader;

    // Apache Commons IO and Commons Lang; the exact package names depend on
    // the library versions in use (Commons Lang 2.x is assumed here).
    import org.apache.commons.io.FileUtils;
    import org.apache.commons.lang.StringEscapeUtils;
    import org.apache.commons.lang.WordUtils;

    public class Parser {

        private String[] files;

        public Parser(String... files) {
            this.files = files;
        }

        // Reads every file and turns each <row> into one Inscrito.
        private List<Inscrito> process() {
            List<Inscrito> inscritos = new ArrayList<Inscrito>();
            for (String file : files) {
                XMLInputFactory factory = XMLInputFactory.newFactory();
                try {
                    String content = StringEscapeUtils.unescapeXml(
                            FileUtils.readFileToString(new File(file)));
                    XMLStreamReader parser = factory.createXMLStreamReader(
                            new ByteArrayInputStream(content.getBytes()));
                    String currentTag = null;
                    int columnCount = 0;
                    Inscrito inscrito = null;
                    while (parser.hasNext()) {
                        int currentEvent = parser.next();
                        switch (currentEvent) {
                            case XMLStreamReader.START_ELEMENT:
                                currentTag = parser.getLocalName();
                                if ("row".equals(currentTag)) {
                                    // New row: reset the column index
                                    columnCount = 0;
                                    inscrito = new Inscrito();
                                }
                                break;
                            case XMLStreamReader.END_ELEMENT:
                                currentTag = parser.getLocalName();
                                if ("row".equals(currentTag)) {
                                    inscritos.add(inscrito);
                                }
                                if ("column".equals(currentTag)) {
                                    columnCount++;
                                }
                                break;
                            case XMLStreamReader.CHARACTERS:
                                if ("column".equals(currentTag)) {
                                    String text = parser.getText().trim().replaceAll("\n", " ");
                                    // Only the first four columns are mapped; the rest are ignored
                                    switch (columnCount) {
                                        case 0:
                                            inscrito.setId(Integer.valueOf(text));
                                            break;
                                        case 1:
                                            inscrito.setFirstName(WordUtils.capitalizeFully(text));
                                            break;
                                        case 2:
                                            inscrito.setLastName(WordUtils.capitalizeFully(text));
                                            break;
                                        case 3:
                                            inscrito.setEmail(text);
                                            break;
                                    }
                                }
                                break;
                        }
                    }
                    parser.close();
                } catch (Exception e) {
                    throw new IllegalStateException(e);
                }
            }
            Collections.sort(inscritos);
            return inscritos;
        }

        // Groups the sorted entries by initial, keeping insertion order.
        public Map<String, List<Inscrito>> parse() {
            List<Inscrito> inscritos = this.process();
            Map<String, List<Inscrito>> resultado = new LinkedHashMap<String, List<Inscrito>>();
            for (Inscrito i : inscritos) {
                List<Inscrito> lista = resultado.get(i.getInicial());
                if (lista == null) {
                    lista = new ArrayList<Inscrito>();
                    resultado.put(i.getInicial(), lista);
                }
                lista.add(i);
            }
            return resultado;
        }
    }
The identifiers in the code are in Portuguese, but it should still be easy to follow. Here is the GitHub repo.
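If you want to see the "skip what you don't need" idea in isolation, here is a minimal sketch that uses only the JDK. The class name `StaxColumnsSketch` and the method `readColumns` are made up for this illustration and are not part of the parser above or the repo. It assumes the same `table`/`row`/`column` layout and that every `column` contains text only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, for illustration only.
class StaxColumnsSketch {

    // Returns the trimmed text of the first `wanted` <column> elements of each <row>.
    static List<List<String>> readColumns(String xml, int wanted) {
        List<List<String>> rows = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            List<String> current = null;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if ("row".equals(name)) {
                        current = new ArrayList<>();
                    } else if ("column".equals(name)
                            && current != null && current.size() < wanted) {
                        // getElementText() reads up to the matching end tag
                        current.add(reader.getElementText().trim());
                    }
                    // Columns past `wanted` and any other elements are never inspected.
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "row".equals(reader.getLocalName())) {
                    rows.add(current);
                    current = null;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        String xml = "<table><row><column>1</column><column>Nome</column>"
                + "<column>Sobrenome</column><column> email@gmail.com </column>"
                + "<column></column></row></table>";
        System.out.println(readColumns(xml, 4));
    }
}
```

Because the loop pulls events itself, ignoring a column is just a matter of not asking for its text. With SAX the handler callbacks would be invoked for those columns regardless.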