Is it possible to get the XML node type as it was defined in XSD?

I am parsing XML in Python and have an XSD schema for validating it. Can I get the type of a given XML node as it was defined in the XSD?

For example, my XML (small part)

 <deviceDescription>
   <wakeupNote>
     <lang xml:lang="ru"></lang>
     <lang xml:lang="en">English</lang>
   </wakeupNote>
 </deviceDescription>

My XSD (again a small part of it):

 <xsd:element name="deviceDescription" type="zwv:deviceDescription" minOccurs="0"/>

 <xsd:complexType name="deviceDescription">
   <xsd:sequence>
     <xsd:element name="wakeupNote" type="zwv:description" minOccurs="0">
       <xsd:unique name="langDescrUnique">
         <xsd:selector xpath="zwv:lang"/>
         <xsd:field xpath="@xml:lang"/>
       </xsd:unique>
     </xsd:element>
   </xsd:sequence>
 </xsd:complexType>

 <xsd:complexType name="description">
   <xsd:sequence>
     <xsd:element name="lang" maxOccurs="unbounded">
       <xsd:complexType>
         <xsd:simpleContent>
           <xsd:extension base="xsd:string">
             <xsd:attribute ref="xml:lang" use="required"/>
           </xsd:extension>
         </xsd:simpleContent>
       </xsd:complexType>
     </xsd:element>
   </xsd:sequence>
 </xsd:complexType>

During parsing, I want to know that my wakeupNote element is defined in the XSD with the complexType zwv:description. How can I do this in Python?

Why do I need this? Suppose I have many such XML files and I want to check that all of them have the English fields filled in. It would be easy to detect that <lang xml:lang="en"></lang> is empty, but the schema also allows the tag to be omitted entirely.

So the idea is to find all elements that can contain language descriptions and check that each has a <lang> child for en with non-empty content.

Update

Since my XML is validated against the XSD, the validation engine knows the types of all nodes. I asked a similar question 7 months ago, which still has no answer. I think the two are related: Validating and populating default values in XSD-based XML in Python

2 answers

If the question is how to find the type name for a given XML node, the answer is to use XPath from Python to look it up. The XPath to run against the XSD would be

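 //xsd:element[@name='wakeupNote']/@type

This should return zwv:description (the xsd prefix must be bound to the XML Schema namespace, since the declarations in the schema are namespaced). If it returns more than one type, you have to walk down from the root to tell them apart: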

 /root/foo/wakeupNote (type A)
 /root/bar/wakeupNote (type B)

Walking down from the root is tedious. You will have to handle both anonymous (inline) and named types.
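A minimal sketch of that lookup in Python, using only the standard library. It embeds a cut-down copy of the schema from the question; the zwv namespace URI and the function name `find_declared_types` are made up for the example. ElementTree supports only a subset of XPath, so the sketch selects the declarations and reads the `type` attribute with `.get()` rather than using `/@type`.

```python
import xml.etree.ElementTree as ET

# Prefix map for the query; the URI is the standard XML Schema namespace.
XSD_NS = {"xsd": "http://www.w3.org/2001/XMLSchema"}

# Cut-down copy of the schema from the question.
# The zwv namespace URI is hypothetical.
SCHEMA = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:zwv="http://example.com/zwv"
            targetNamespace="http://example.com/zwv">
  <xsd:complexType name="deviceDescription">
    <xsd:sequence>
      <xsd:element name="wakeupNote" type="zwv:description" minOccurs="0"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
"""


def find_declared_types(schema_root, element_name):
    """Return the 'type' attribute of every declaration of element_name.

    A declaration with an anonymous (inline) type has no 'type'
    attribute and is reported as None.
    """
    query = f".//xsd:element[@name='{element_name}']"
    return [decl.get("type") for decl in schema_root.findall(query, XSD_NS)]


schema_root = ET.fromstring(SCHEMA)
print(find_declared_types(schema_root, "wakeupNote"))
```

If the returned list has more than one distinct entry, the name alone is ambiguous and the path from the root has to be taken into account.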

If the question is how to find all the XML nodes of a given type: if the schema is going to change frequently, you can test the type of every node as you parse it, using the method above.

If the schema is well known and fixed, and the nodes you are looking for can be found with XPath, you can test each node directly:

 //*[@xml:lang='en']

Then use Python to check that the text of each selected element is non-empty.
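A hedged sketch of that check with the standard library. The sample document extends the one from the question with a second, hypothetical element (`inclusionNote`) so that there is something to report, and `empty_english_entries` is an illustrative name. Instead of an XPath predicate, the sketch iterates over all elements and compares the `xml:lang` attribute, which ElementTree stores under the fixed XML namespace.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Sample based on the question; inclusionNote is a hypothetical element
# added to show a blank English entry.
SAMPLE = """
<deviceDescription>
  <wakeupNote>
    <lang xml:lang="ru"></lang>
    <lang xml:lang="en">English</lang>
  </wakeupNote>
  <inclusionNote>
    <lang xml:lang="en">   </lang>
  </inclusionNote>
</deviceDescription>
"""


def empty_english_entries(root):
    """Return every element with xml:lang='en' whose text is empty or blank."""
    return [
        el for el in root.iter()
        if el.get(XML_LANG) == "en" and not (el.text or "").strip()
    ]


root = ET.fromstring(SAMPLE)
for el in empty_english_entries(root):
    print("empty English text in", el.tag)
```

Note that this only catches English entries that are present but empty. A description with no en entry at all is never selected, so that case needs a separate check on the parent elements.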

In the stable-schema case, you could also write a second XSD that enforces the criteria you are looking for.


You are right that the validator must know the type associations of all the elements and attributes it validates, and that the validator could therefore make this information accessible.

However, for better or worse, both the API between the caller and the validator, and the choice of which validation-related information is made available to the caller, are entirely implementation-defined. Some validators (Xerces J is a notable example) make the full set of validation information available; others do not.

Without knowing which validator you are using, no one can say with certainty whether the type information you are looking for is available. Since you are calling a validator, there must be an API; if type associations are available through that API, the documentation will probably say so. If the API does not provide access to them, this may be because the underlying validation engine does not expose the information, or because the creator of the API did not see the point. Your job (if you want to pursue this) will be to figure out which is the case, and then try to persuade the relevant parties that it would be useful to make this information available.

If you cannot get the information through the API, you can help yourself with a more elaborate version of the approach described in the other answer, by David B. It is a property of XSD that the governing type of any element is strictly a function of the path to that element from the validation root. So in principle (if more than a little tediously in practice), for any element in a document instance you can work out what its governing type will be when the instance is validated against a particular schema. For example, in the case you describe, it is straightforward to tell whether a given wakeupNote has deviceDescription or otherElement as an ancestor, or which is the nearest ancestor if it has both, and to infer the appropriate governing type definition from that knowledge.
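A rough sketch of that path-based lookup, assuming a single schema document with no imports. It embeds a cut-down schema modelled on the one in the question (the zwv namespace URI is hypothetical), and the helper names `local`, `child_declarations` and `governing_type` are invented for the example. It deliberately ignores element references, groups, type derivation, substitution groups, wildcards and xsi:type, and it matches type names by local name only, so treat it as an illustration of the idea rather than a general tool.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Cut-down schema modelled on the question; the zwv URI is hypothetical.
SCHEMA = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:zwv="http://example.com/zwv"
            targetNamespace="http://example.com/zwv">
  <xsd:element name="deviceDescription" type="zwv:deviceDescription"/>
  <xsd:complexType name="deviceDescription">
    <xsd:sequence>
      <xsd:element name="wakeupNote" type="zwv:description" minOccurs="0"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="description">
    <xsd:sequence>
      <xsd:element name="lang" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:simpleContent>
            <xsd:extension base="xsd:string"/>
          </xsd:simpleContent>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
"""


def local(qname):
    """Strip a namespace prefix: 'zwv:description' -> 'description'."""
    return qname.split(":")[-1]


def child_declarations(type_def):
    """Element declarations in a type's content model, without descending
    into the anonymous types of those elements."""
    found = []
    stack = list(type_def)
    while stack:
        node = stack.pop()
        if node.tag == XSD + "element":
            found.append(node)
        else:
            stack.extend(node)
    return found


def governing_type(schema_root, path):
    """Follow element names down from a global declaration and return the
    last element's type name, or None if that type is anonymous."""
    named = {t.get("name"): t for t in schema_root.findall(XSD + "complexType")}
    # The first step must match a global element declaration.
    candidates = schema_root.findall(XSD + "element")
    type_name = None
    for step in path:
        decl = next((e for e in candidates if e.get("name") == step), None)
        if decl is None:
            raise LookupError(f"no declaration for {step!r} on this path")
        type_name = decl.get("type")
        if type_name is not None:
            type_def = named.get(local(type_name))
        else:
            type_def = decl.find(XSD + "complexType")
        candidates = [] if type_def is None else child_declarations(type_def)
    return type_name


schema_root = ET.fromstring(SCHEMA)
print(governing_type(schema_root, ["deviceDescription", "wakeupNote"]))
```

For an element whose type is declared inline, such as lang here, the function returns None; a fuller version would return the type definition itself instead of its name.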

Helping yourself in this way is likely to require a non-trivial amount of work. It would help if there were general-purpose tools for computing this information and making it available in various forms, but if they exist, I do not know of them. (I do know people who could build such a tool for a fee.) So if I were you, I would try to get the information through the API first.


Source: https://habr.com/ru/post/1337071/

