XSLT for structure conversion + Ruby for value conversion?

We have quite large (~200 MB) XML files from different sources that we want to convert to a common format.

For structural transformations (element names, nesting, etc.) we decided to use XSLT 1.0. Since it has to be fast (we receive a lot of these files), we chose Apache Xalan as the engine. The structural transformations can be quite complex (not just <tag a> -> <tag b>) and differ for XML files from different sources.

However, we also need to convert the values of the elements. These conversions can be quite complex (for example, some require access to the Google Maps API, others require access to our database, etc.), so we decided to use a simple Ruby-based DSL, which is a map of XPath selectors => transformer objects, e.g.:

{"rss/channel/item" => {:class => 'ItemMutators', :method => :guess_location}}
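A minimal sketch of how such a selector-to-transformer map might be applied, assuming REXML as the XML library and an in-memory document. `ItemMutators` and `guess_location` are the names from the rule above; the method body, the `RULES` constant and the `apply_rules` helper are hypothetical stand-ins for illustration only.

```ruby
require 'rexml/document'

# Hypothetical transformer: the real guess_location would call out to a
# geocoding service or a database; here it only normalises the text.
class ItemMutators
  def guess_location(element)
    element.text = element.text.to_s.strip.upcase
  end
end

# XPath selector => transformer spec, in the shape used in the question.
RULES = {
  'rss/channel/item' => { class: 'ItemMutators', method: :guess_location }
}.freeze

# Hypothetical helper: runs every rule against every element it selects.
def apply_rules(doc, rules)
  rules.each do |xpath, spec|
    mutator = Object.const_get(spec[:class]).new
    doc.elements.each(xpath) { |el| mutator.public_send(spec[:method], el) }
  end
  doc
end

doc = REXML::Document.new('<rss><channel><item> berlin </item></channel></rss>')
apply_rules(doc, RULES)
puts doc.elements['rss/channel/item'].text
```

Loading the whole tree like this is unlikely to scale to files of ~200 MB; the sketch only shows the shape of the rule dispatch.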

However, keeping the structural transformations separate from the value conversions seems more like a hack. Are there any better solutions?


For example, with Java you can write extensions for Xalan and use them to convert values. Is there something similar for Ruby?


Thanks guys! All answers were very valuable. I'm thinking now :)


XSLT. - , Xalan Java : http://xml.apache.org/xalan-j/extensions.html

:

, XSLT , Xalan-Java . Xalan-Java , .

, -, - Ruby, xslt: http://greg.rubyfr.net/pub/packages/ruby-xslt/classes/XML/XSLT.html


Ruby, , :

1) SAX- Ruby XML, XML- /

2) DOM XML , , DOM

SAX ( DOM-!), , . - XML, XML .

DOM XML , , .

, , .. , , ; , , . XSD DTD, Ruby , " " , .

, , .

- Ruby, - , , SAX , , DOM.

> SAX?

, - elese, XML , , , Java ( , Ruby, - !):

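import java.util.ArrayList;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// DefaultHandler supplies no-op versions of the ContentHandler callbacks
// that are not overridden below.
class MySAXHandler extends DefaultHandler {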
  final static int MAX_DEPTH=512;
  final static int FILETYPE_A=1;
  final static int FILETYPE_B=2;
  String[] qualifiedNames = new String[MAX_DEPTH];
  String[] localNames = new String[MAX_DEPTH];
  String[] namespaceURIs = new String[MAX_DEPTH];
  int[] meanings = new int[MAX_DEPTH];
  int pathPos=0;
  public java.io.Writer destination;
  ArrayList<Object[]> errorList = new ArrayList<>();
  org.xml.sax.Locator locator;
  public int inputFileSchemaType;

  String currentFirstName=null;
  String currentLastName=null;

  public void setDocumentLocator(org.xml.sax.Locator l) { this.locator=l; }

  public void startElement(String uri, String localName, String qName,
    org.xml.sax.Attributes atts) throws SAXException { 

    // record current tag in stack
    qualifiedNames[pathPos] = qName;
    localNames[pathPos] = localName;
    namespaceURIs[pathPos] = uri;

    // what is the meaning of the current tag?
    int meaning = 0;
    // meaning of the parent tag (0 when we are at the document root)
    int pm = pathPos == 0 ? 0 : meanings[pathPos - 1];
    switch (inputFileSchemaType) {
           case FILETYPE_A:
      switch(pathPos) {
        // this checking can be as strict or as lenient as you like on case,
        // namespace URIs and tag prefixes
             case 0:
        if(localName.equals("document")&&uri.equals("http://xyz")) meaning=1;
      break; case 1: if (pm==1&&localName.equals("clients")) meaning=2;
      break; case 2: if (pm==2&&localName.equals("firstName")) meaning=3;
        else if (pm==2&&localName.equals("lastName")) meaning=4;
        else if (pm==2) meaning=5;
      }
      break; case FILETYPE_B:
      switch(pathPos) {
        // this checking can be as strict or as lenient as you like on case,
        // namespace URIs and tag prefixes
             case 0:
        if(localName.equals("DOC")&&uri.equals("http://abc")) meaning=1;
      break; case 1: if (pm==1&&localName.equals("CLS")) meaning=2;
      break; case 2: if (pm==2&&localName.equals("FN1")) meaning=3;
        else if (pm==2&&localName.equals("LN1")) meaning=4;
        else if (pm==2) meaning=5;
      }
    }

    meanings[pathPos]=meaning;

    // does the tag have unrecognised attributes?
    // does the tag have all required attributes?
    // record any keys in hashtables...
    // (TO BE DONE)

    // generate output
    try {
      switch (meaning) {
        case 0:
          errorList.add(new Object[]{locator.getPublicId(),
            locator.getSystemId(),
            locator.getLineNumber(), locator.getColumnNumber(),
            "Meaningless tag found: " + localName + " (" + qName +
            "; namespace: \"" + uri + "\")"});
          break;
        case 1:
          destination.write("<?xml version=\"1.0\" ?>\n");
          destination.write("<imdoc xmlns=\"http://someurl\" lang=\"xyz\">\n");
          destination.write("<!-- Copyright notice -->\n");
          destination.write("<!-- Generated by xyz -->\n");
          break;
        case 2:
          destination.write(" <cl>\n");
          currentFirstName = ""; currentLastName = "";
      }
    } catch (java.io.IOException e) {
      // Writer failures have to leave the callback as a SAXException
      throw new SAXException(e);
    }
    pathPos++;
  }
  public void characters(char[] ch, int start, int length)
            throws SAXException {
    int meaning=meanings[pathPos-1]; switch (meaning) {
    case 1: case 2:
              errorList.add(new Object[]{locator.getPublicId(),
        locator.getSystemId(),
        locator.getLineNumber(),locator.getColumnNumber(),
        "Unexpected extra characters found"});
    break; case 3:
      // APPEND to currentFirstName IF WITHIN SIZE LIMITS
    break; case 4:
      // APPEND to currentLastName IF WITHIN SIZE LIMITS
    break; default: // ignore other characters
    }
  }
  public void endElement(String uri, String localName, String qName)
    throws SAXException {
    pathPos--;
    int meaning = meanings[pathPos];
    try {
      switch (meaning) {
        case 1:
          destination.write("</imdoc>");
          break;
        case 2:
          destination.write("  <ln>" + currentLastName.trim() + "</ln>\n");
          destination.write("  <fn>" + currentFirstName.trim() + "</fn>\n");
          destination.write(" </cl>\n");
          break;
        case 3:
          if (currentFirstName == null || currentFirstName.equals(""))
            errorList.add(new Object[]{locator.getPublicId(),
              locator.getSystemId(),
              locator.getLineNumber(), locator.getColumnNumber(),
              "Invalid first name length"});
          // ADD FIELD FORMAT VALIDATION USING REGEXES / RANGE CHECKING
          break;
        case 4:
          if (currentLastName == null || currentLastName.equals(""))
            errorList.add(new Object[]{locator.getPublicId(),
              locator.getSystemId(),
              locator.getLineNumber(), locator.getColumnNumber(),
              "Invalid last name length"});
          // ADD FIELD FORMAT VALIDATION USING REGEXES / RANGE CHECKING
      }
    } catch (java.io.IOException e) {
      // Writer failures have to leave the callback as a SAXException
      throw new SAXException(e);
    }
  }
  public void endDocument() {
    // check for key violations
  }
}
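For comparison, a rough Ruby counterpart of the handler above, as a sketch only. It assumes REXML's stream parser (`REXML::StreamListener`, `REXML::Document.parse_stream`), ignores namespaces, handles just the first input format, and identifies tags by their full path instead of numeric meanings. The class name `ClientListener` is made up for the example.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Streaming handler: keeps a stack of open tags and emits output as it goes.
class ClientListener
  include REXML::StreamListener

  FIRST = %w[document clients firstName].freeze
  LAST  = %w[document clients lastName].freeze

  attr_reader :output, :errors

  def initialize
    @path = []
    @output = +''
    @errors = []
    @first = +''
    @last = +''
  end

  def tag_start(name, _attrs)
    @path.push(name)
    case @path
    when %w[document]
      @output << "<imdoc>\n"
    when %w[document clients]
      @output << " <cl>\n"
      @first = +''
      @last = +''
    when FIRST, LAST
      nil # the text itself is collected in #text
    else
      @errors << "Meaningless tag found: #{@path.join('/')}"
    end
  end

  def text(value)
    case @path
    when FIRST then @first << value
    when LAST then @last << value
    end
  end

  def tag_end(_name)
    case @path
    when %w[document clients]
      # Values are written as-is here; real output would need XML escaping.
      @output << "  <ln>#{@last.strip}</ln>\n  <fn>#{@first.strip}</fn>\n </cl>\n"
    when %w[document]
      @output << "</imdoc>\n"
    end
    @path.pop
  end
end

xml = '<document><clients><firstName>Ada</firstName>' \
      '<lastName>Lovelace</lastName></clients></document>'
listener = ClientListener.new
REXML::Document.parse_stream(xml, listener)
puts listener.output
```

Because only the tag stack and the current name buffers are held in memory, this style should cope with large inputs better than building a full tree, at the cost of hand-written state tracking.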

, ( - , ) .

SAX , XSLT. , , ...?

OTOH, XSLT, , , . {} , Xalan Ruby. , ( , !).

XML , :

& -> &amp;

> -> &gt;

< -> &lt;

Non-ascii , UTF-8

..
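The escaping of `&`, `>` and `<` mentioned above could be sketched in Ruby as follows. The helper name `escape_xml_text` and the `XML_ESCAPES` table are hypothetical; the sketch covers element text only (attribute values would also need quote escaping) and assumes the input can be transcoded to UTF-8.

```ruby
# Replacement table for characters that must not appear raw in element text.
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze

# Hypothetical helper: returns the text as UTF-8 with &, < and > escaped.
def escape_xml_text(text)
  text.encode('UTF-8').gsub(/[&<>]/, XML_ESCAPES)
end

puts escape_xml_text('Smith & Sons <Ltd>')
```

Doing the replacement in a single `gsub` pass avoids escaping the `&` of an entity that was just produced.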

, SAX Attributes , , , , , .

, , MAX_ERRORS 1000, " " .

XML , , - /, , DOM , , , , , SAX-, Google XML N .

> ~ 50 ,

> switch/case FORMAT_X .

, :

// set meaning and attributesValidationRule (avr)
if (fileFormat>=GROUP10) switch (fileFormat) {
  case GROUP10_FORMAT1: 

    switch(pathPos) {
    case 0: if (...) { meaning=GROUP10_CUSTOMER; avr=AVR6_A; }
    break; case 1: if (...) { meaning=...; avr=...; }
    ...
    }

  break; case GROUP10_FORMAT2: ...

  break; case GROUP10_FORMAT3: ...
}
else if (fileFormat>=GROUP9) switch (fileFormat) {
  case GROUP9_FORMAT1: ... 
  break; case GROUP9_FORMAT2: ...
}
...
else if (fileFormat>=GROUP1) switch (fileFormat) {
  case GROUP1_FORMAT1: ... 
  break; case GROUP1_FORMAT2: ...
}

...

result = validateAttribute(atts,avr);

if (meaning >= MEANING_SET10) switch (meaning) {
case ...:  ...
break; case ...:  ...
}
else if (meaning >= MEANING_SET9) switch (meaning) {
}
etc

, , .

> , , ,

> -

> (, Java Xalan).

, XSLT , , ?

, , , Google , , , , Google/db XML, " XML" , , , :

<?xml version="1.0" ?>
<myConsolidatedInputXmlDoc>
  <myOriginalOrIntermediateFormatDoc>
    ...
  </myOriginalOrIntermediateFormatDoc>
  <myFetchedRelatedDataFromGoogleMaps>
    ...
  </myFetchedRelatedDataFromGoogleMaps>
  <myFetchedDataFromSQL>
    ...
  </myFetchedDataFromSQL>
</myConsolidatedInputXmlDoc>

XSLT , Xalan.
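One possible way to assemble a wrapper document like the one above before handing it to the XSLT engine, as a sketch assuming REXML and small in-memory fragments. The wrapper element names are taken from the example; the `consolidate` helper and the sample fragments are invented for illustration, and fetching the Google Maps and SQL data is left out.

```ruby
require 'rexml/document'

# Hypothetical helper: wraps the original document and the pre-fetched
# lookup data in one consolidated input document.
def consolidate(original_xml, maps_xml, sql_xml)
  out = REXML::Document.new
  root = out.add_element('myConsolidatedInputXmlDoc')
  {
    'myOriginalOrIntermediateFormatDoc' => original_xml,
    'myFetchedRelatedDataFromGoogleMaps' => maps_xml,
    'myFetchedDataFromSQL' => sql_xml
  }.each do |wrapper, fragment|
    # Parse each fragment and move its root element under the wrapper.
    root.add_element(wrapper).add_element(REXML::Document.new(fragment).root)
  end
  out
end

doc = consolidate('<rss><channel/></rss>',
                  '<geo city="Berlin"/>',
                  '<rows><row>42</row></rows>')
puts doc
```

With everything in one input document, the stylesheet can reach the looked-up values with ordinary XPath and needs no extension functions.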


- Xalan-J , RPC Ruby. XSLT.

For tighter integration, you can wrap Xalan-C++ as a Ruby library. You will probably need only a small part of the Xalan API, similar to what the XalanExe command-line driver uses. With Xalan running in-process, your extensions can then access your Ruby model directly.


Source: https://habr.com/ru/post/1726360/

