XML schema location declarations and lxml

According to the official lxml documentation, if you want to validate an XML document against an XML schema document, you need to:

  • build an XMLSchema object (basically, parse a schema document)
  • build an XMLParser, passing the XMLSchema object as its schema argument
  • parse the actual XML document (the instance document) using the parser you just built
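A minimal sketch of those three steps follows. It assumes lxml is installed. The inline schema with its `note`/`to` elements is a made-up example, and `parse_validated` is a hypothetical helper name. Note that the schema is supplied from outside: nothing in the instance document says which schema to use.

```python
from lxml import etree

# Made-up example schema; in practice it would usually be parsed from a .xsd file.
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# Step 1: build an XMLSchema object by parsing the schema document.
schema = etree.XMLSchema(etree.XML(XSD))

# Step 2: build an XMLParser bound to that schema.
parser = etree.XMLParser(schema=schema)


def parse_validated(data: bytes):
    """Step 3 (hypothetical helper): parse an instance document with the
    schema-bound parser; lxml raises etree.XMLSyntaxError if it is invalid."""
    return etree.fromstring(data, parser)


root = parse_validated(b"<note><to>Alice</to></note>")
print(root.tag)  # note

try:
    parse_validated(b"<note><from>Bob</from></note>")
except etree.XMLSyntaxError as exc:
    print("invalid:", exc)
```

The same `parser` object is used for every instance document, which is why the schema has to be chosen before parsing begins.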

There may be variations, but the essence is pretty much the same no matter how you do it: the schema is specified "externally" (as opposed to being specified inside the actual XML document).

If I follow this procedure, validation is done, but if I understand it correctly, it completely ignores the whole idea of the xsi:schemaLocation and xsi:noNamespaceSchemaLocation attributes (the schema location hints).

This introduces a number of limitations, starting with the fact that I have to manage the instance ↔ schema relation myself (either store it externally, or write a hack to read the schema location from the root element of the instance document), I cannot validate a document against multiple schemas (for example, when each schema governs its own namespace), and so on.
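The "hack" mentioned above could look roughly like the following stdlib-only sketch. It is an illustration and not code from the original post, and the function name `schema_location_hints` is hypothetical. It reads xsi:schemaLocation and xsi:noNamespaceSchemaLocation from the root element only, so hints on nested elements are ignored. That is exactly one of the limitations described here.

```python
import xml.etree.ElementTree as ET

# Namespace of the xsi: attributes defined by the XML Schema spec.
XSI = "http://www.w3.org/2001/XMLSchema-instance"


def schema_location_hints(xml_text: str) -> dict:
    """Return {namespace: location} read from the root element's xsi hints.

    The xsi:noNamespaceSchemaLocation hint, if present, is stored under the key None.
    Only the root element is inspected (hypothetical helper, not a library API).
    """
    root = ET.fromstring(xml_text)
    hints = {}
    # xsi:schemaLocation is a whitespace-separated list of namespace/URI pairs.
    tokens = root.get(f"{{{XSI}}}schemaLocation", "").split()
    if len(tokens) % 2:
        raise ValueError("xsi:schemaLocation must contain namespace/URI pairs")
    for ns, loc in zip(tokens[0::2], tokens[1::2]):
        hints[ns] = loc
    no_ns = root.get(f"{{{XSI}}}noNamespaceSchemaLocation")
    if no_ns is not None:
        hints[None] = no_ns
    return hints


doc = (
    '<r xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="urn:schema1 schema1.xsd urn:schema2 schema2.xsd"/>'
)
print(schema_location_hints(doc))
# {'urn:schema1': 'schema1.xsd', 'urn:schema2': 'schema2.xsd'}
```

The returned locations would still have to be loaded and wired into a parser by hand, which is the bookkeeping the question would like lxml to do automatically.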

So the question is: am I missing something completely trivial, or am I doing it wrong? Or are my statements about lxml's limitations regarding schema validation correct?

To recap, I would like to be able to:

  • have the parser use the schema location hints in the instance document during parsing/validation
  • use multiple schemas to validate an XML document
  • declare schema locations on elements other than the root

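Is there a way to do this with lxml, and if so, how? If not, which other XML libraries for Python support it (not necessarily lxml-like ones)?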


: , lxml .

In , :

  • ​​ → , , . , .
  • Multiple schemas can be referenced through schemaLocation, since a single schemaLocation attribute may list several namespace/URI pairs: xsi:schemaLocation="urn:schema1 schema1.xsd urn:schema2 schema2.xsd".
  • , , - schemaLocation, , root. , : .
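One common workaround for validating against several namespace schemas with a single XMLSchema object is a "wrapper" schema that xs:imports each of them. This technique is not taken from the answer above. The stdlib-only sketch below only builds the wrapper text; `build_wrapper_xsd` is a hypothetical name. It assumes every key is a real namespace URI, because a no-namespace wrapper cannot import a no-namespace schema. The idea is that the result could then be saved next to the referenced .xsd files and loaded with lxml, for example via `etree.XMLSchema(file="wrapper.xsd")`, so that the relative schemaLocation values resolve. That loading step is an assumption and is not shown here.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def build_wrapper_xsd(pairs: dict) -> str:
    """Build a no-namespace XSD that xs:imports one schema per namespace.

    `pairs` maps namespace URIs to schema locations, e.g. as parsed from an
    xsi:schemaLocation attribute. Keys must be real namespace URIs (not None).
    """
    # quoteattr escapes the value and wraps it in quotes for use as an attribute.
    imports = "\n".join(
        f"  <xs:import namespace={quoteattr(ns)} schemaLocation={quoteattr(loc)}/>"
        for ns, loc in pairs.items()
    )
    return (
        '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">\n'
        f"{imports}\n"
        "</xs:schema>"
    )


pairs = {"urn:schema1": "schema1.xsd", "urn:schema2": "schema2.xsd"}
wrapper = build_wrapper_xsd(pairs)
print(wrapper)
```

Because the wrapper itself declares no elements, every element in the instance document is validated against whichever imported schema owns its namespace.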

Source: https://habr.com/ru/post/1748647/

