Wikipedia's claim that XSD is "not 100% self-describing" is correct in the sense that not every document that conforms to the schema for schema documents (S4SD) is a valid schema. One reason is that some constraints cannot be expressed in XSD at all (for example, that the content of xpath attributes must be syntactically valid as XPath expressions); another is that XSD can only express constraints on a single document, whereas schema validity also depends on constraints that apply across documents.
If the S4SD took full advantage of the capabilities of XSD 1.1, one could come close to 100% coverage of all the XML representation rules; I had hoped to attempt this, but it never got done. There would still be a few gaps.
Your plan to write software that processes schema documents is one you should think about carefully. It is not easy to extract information from raw schema documents, because there are so many different ways in which the schema author can express the same thing. An alternative is to work with an API that offers access to the "schema components" (such as element declarations and type definitions) that the schema processor creates from the raw input documents. Several schema processors offer such an API. Saxon-EE exposes schema components in an XML representation called SCM, which is much easier to process than raw XSD documents.
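To illustrate why raw schema documents are awkward to process, here is a minimal sketch in Python using only the standard library. The three schema documents and the helper `naive_child_names` are hypothetical, made up for this example; they are not taken from any real schema or tool. All three declare a `book` element with the same content model (a `title` followed by an `author`), but a lookup that only understands one authoring style misses the other two.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Style 1: anonymous complex type written inline.
INLINE = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="author" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

# Style 2: the same content model as a named, top-level type.
NAMED_TYPE = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book" type="BookType"/>
  <xs:complexType name="BookType">
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="author" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
"""

# Style 3: the same content model factored out into a named group.
GROUP_REF = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:group ref="BookContent"/>
    </xs:complexType>
  </xs:element>
  <xs:group name="BookContent">
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="author" type="xs:string"/>
    </xs:sequence>
  </xs:group>
</xs:schema>
"""


def naive_child_names(schema_text, element_name):
    """Return child element names, looking only at an inline
    xs:complexType/xs:sequence under the top-level declaration.
    Deliberately naive: it follows no type or group references."""
    root = ET.fromstring(schema_text)
    for decl in root.findall(f"{XS}element"):
        if decl.get("name") == element_name:
            path = f"{XS}complexType/{XS}sequence/{XS}element"
            return [child.get("name") for child in decl.findall(path)]
    return []


if __name__ == "__main__":
    for label, text in [("inline", INLINE),
                        ("named type", NAMED_TYPE),
                        ("group ref", GROUP_REF)]:
        print(label, naive_child_names(text, "book"))
```

Only the inline style yields `title` and `author`; the other two produce an empty list, even though a schema processor would build an equivalent content model from all three. Handling every variation (and this sketch ignores imports, includes, substitution groups, type derivation, and more) amounts to re-implementing part of a schema processor, which is why a component-level API is usually the easier route.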