What is the difference between the PHP DOM and SimpleXML extensions?

I don't understand why we need two XML parsers in PHP.

Can someone explain the difference between the two?

php domdocument simplexml
Jan 26 '11 at 9:41

In a nutshell:

SimpleXML

  • is for simple XML and/or simple use cases
  • has a limited API for working with nodes (e.g. you cannot program to an interface that much)
  • treats all nodes as the same kind (an element node is the same kind of object as an attribute node)
  • makes nodes magically accessible, e.g. $root->foo->bar['attribute']

DOM

  • is for any XML use case you might have
  • is an implementation of the W3C DOM API (found in many languages)
  • differentiates between the various node types (more control)
  • is much more verbose due to its explicit API (you can code to an interface)
  • can parse broken HTML
  • allows you to use PHP functions in XPath queries

Both are based on libxml and can be influenced to some extent by the libxml functions.
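To make the contrast between the two lists concrete, here is a minimal sketch that reads the same attribute with both extensions and, on the DOM side, also calls a PHP function from an XPath expression. The sample document is made up for illustration, and strtoupper is just an arbitrary PHP function picked for the demo.

```php
<?php
// Hypothetical sample document, used only for this sketch.
$xml = '<root><foo><bar attribute="42">text</bar></foo></root>';

// SimpleXML: the XML structure doubles as the API (property + array access).
$root = simplexml_load_string($xml);
$viaSimpleXml = (string) $root->foo->bar['attribute'];

// DOM: the same value through the explicit W3C DOM API.
$doc = new DOMDocument();
$doc->loadXML($xml);
$bar = $doc->getElementsByTagName('bar')->item(0);
$viaDom = $bar->getAttribute('attribute');

// DOM only: call a (whitelisted) PHP function from inside an XPath expression.
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPhpFunctions('strtoupper');
$upper = $xpath->evaluate('string(php:functionString("strtoupper", //bar))');

echo $viaSimpleXml, ' ', $viaDom, ' ', $upper, PHP_EOL; // 42 42 TEXT
```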




Personally, I don't like SimpleXML very much. That's because I don't like the implicit access to nodes, e.g. $foo->bar[1]->baz['attribute']. It ties the actual XML structure to the programming interface. The one-node-type-for-everything approach is also somewhat unintuitive, because the behavior of a SimpleXMLElement magically changes depending on its contents.

For example, when you have <foo bar="1"/>, the object dump of /foo/@bar will be identical to that of /foo, but echoing them prints different results. Moreover, because both of them are SimpleXML elements, you can call the same methods on them, but they only take effect when the SimpleXMLElement supports it; e.g. trying to do $el->addAttribute('foo', 'bar') on the first SimpleXMLElement will do nothing. Of course it is correct that you cannot add an attribute to an attribute node, but the point is that an attribute node would not expose that method in the first place.
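A small sketch of the <foo bar="1"/> case, assuming a stock PHP build with SimpleXML enabled: both XPath results are objects of the same class, yet they print differently depending on which kind of node they wrap.

```php
<?php
$sx = simplexml_load_string('<foo bar="1"/>');

// Both XPath results are objects of the very same class...
$attr = $sx->xpath('/foo/@bar')[0];
$elem = $sx->xpath('/foo')[0];
echo get_class($attr), ' ', get_class($elem), PHP_EOL; // SimpleXMLElement SimpleXMLElement

// ...but they print differently, because one wraps an attribute node
// (value "1") and the other an element node without text ("").
echo '[', $attr, '] [', $elem, ']', PHP_EOL;            // [1] []
echo $attr->getName(), ' ', $elem->getName(), PHP_EOL;  // bar foo
```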

But that's just my 2c. Make up your own mind :)




On a side note, there are not just two parsers in PHP but a couple more. SimpleXML and DOM are just the ones that parse a document into a tree structure. The others are pull-based or event-based parsers / readers / writers.
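As an illustration of the event-based family, here is a sketch using PHP's XML Parser extension (ext/xml), assuming it is enabled as it is in most builds. Callbacks fire per tag and no tree is kept in memory; the document string is a made-up example.

```php
<?php
$names = [];
$parser = xml_parser_create();
// Element names are upper-cased by default; switch that off.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    // Start-tag callback: record the element name.
    function ($parser, string $name, array $attributes) use (&$names) {
        $names[] = $name;
    },
    // End-tag callback: nothing to do in this sketch.
    function ($parser, string $name) {
    }
);

// The third argument marks this as the final (and only) chunk of input.
xml_parse($parser, '<root><item id="1"/><item id="2"/></root>', true);

echo implode(',', $names), PHP_EOL; // root,item,item
```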

Also see my answer to

  • Best XML Parser for PHP
Jan 26 '11

I'm going to give the shortest answer possible so that beginners can take it away easily. I'm also simplifying things slightly for brevity's sake. Jump to the end of this answer for the overstated TL;DR version.




DOM and SimpleXML are not really two different parsers. The real parser is libxml2, which is used internally by both DOM and SimpleXML. So DOM and SimpleXML are just two ways to use the same parser, and they provide ways to convert one object into the other.

SimpleXML is meant to be very simple, so it has a small set of functions and is focused on reading and writing data. That is, you can easily read or write an XML file, you can update some values or remove some nodes (with some limitations!), and that's it. No fancy manipulation, and you don't have access to the less common node types. For example, SimpleXML cannot create a CDATA section, although it can read them.
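A minimal sketch of the read / update / remove workflow described above; the document and its element names are hypothetical.

```php
<?php
// Hypothetical config-like document for the sketch.
$sx = simplexml_load_string('<config><name>old</name><debug>1</debug></config>');

echo $sx->name, PHP_EOL;        // read a value: old
$sx->name = 'new';              // update a value
unset($sx->debug);              // remove a node
$sx->addChild('version', '2');  // add a new element

echo $sx->asXML();              // serialize the modified document
```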

DOM offers a full-fledged DOM implementation plus a couple of non-standard methods such as appendXML. If you are used to manipulating the DOM in JavaScript, you will find exactly the same methods in PHP's DOM. There is basically no limit to what you can do, and it even handles HTML. The flip side of this wealth of features is that it is more complex and more verbose than SimpleXML.
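A sketch of two things that are easy in DOM but awkward or impossible in SimpleXML: appending raw XML through DOMDocumentFragment::appendXML() (the non-standard method mentioned above) and moving an existing node. The sample markup is made up.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<list><a/><b/></list>');
$list = $doc->documentElement;

// Non-standard extension: parse a raw XML string into a fragment and append it.
$fragment = $doc->createDocumentFragment();
$fragment->appendXML('<c/><d/>');
$list->appendChild($fragment);

// Standard DOM: moving a node is just inserting it somewhere else.
$a = $list->firstChild;
$list->appendChild($a); // <a/> moves from the first to the last position

echo $doc->saveXML($list), PHP_EOL; // <list><b/><c/><d/><a/></list>
```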




Side note

People often wonder which extension they should use to process their XML or HTML content. Actually, the choice is easy, because there isn't much of a choice to begin with:

  • if you need to deal with HTML, you don't really have a choice: you have to use DOM
  • if you need to do anything fancy, such as moving nodes or appending some raw XML, again you pretty much have to use DOM
  • if all you need to do is read and/or write some basic XML (e.g. exchanging data with an XML service or reading an RSS feed), then you can use either. Or both.
  • if your XML document is so big that it doesn't fit in memory, you can't use either, and you have to use XMLReader, which is also based on libxml2, is even more annoying to use, but still plays nicely with the others
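For the large-document case, a minimal XMLReader loop might look like this. An in-memory string stands in for a big file here; for a real file you would call XMLReader::open() with its path instead. The element names are hypothetical.

```php
<?php
// XMLReader pulls one node at a time, so memory use stays flat.
$reader = new XMLReader();
$reader->XML('<feed><entry>1</entry><entry>2</entry><entry>3</entry></feed>');

$count = 0;
while ($reader->read()) {
    // Count opening <entry> tags only (skip text and end-tag nodes).
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'entry') {
        $count++;
    }
}
$reader->close();

echo $count, PHP_EOL; // 3
```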



TL;DR

  • SimpleXML is super easy to use, but only good for 90% of use cases.
  • DOM is more complex, but can do everything.
  • XMLReader is super complicated, but uses very little memory. Very situational.
Jan 27 '11

As others have noted, the DOM and SimpleXML extensions are not, strictly speaking, "XML parsers"; rather, they are different interfaces to the structure generated by the underlying libxml2 parser.

The SimpleXML interface treats XML as a serialized data structure, in the same way you would treat a decoded JSON string. So it provides quick access to the contents of a document, with an emphasis on accessing elements by name and reading their attributes and text content (including automatically folding in entities and CDATA sections). It supports documents containing multiple namespaces (primarily via the children() and attributes() methods) and can search a document using an XPath expression. It also includes support for basic manipulation of the content, e.g. adding or overwriting elements or attributes with a new string.
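A sketch of the namespace and XPath support mentioned above, assuming a hypothetical feed with a default namespace and one prefixed namespace (the URNs are placeholders):

```php
<?php
$xml = <<<XML
<feed xmlns="urn:example:feed" xmlns:ex="urn:example:extra">
  <entry ex:lang="en"><title>Hello</title></entry>
</feed>
XML;
$sx = simplexml_load_string($xml);

// children()/attributes() take a namespace URI to select what to look at.
$entry = $sx->children('urn:example:feed')->entry;
$title = (string) $entry->title;
$lang = (string) $entry->attributes('urn:example:extra')->lang;

// XPath needs a registered prefix, even for the default namespace.
$sx->registerXPathNamespace('f', 'urn:example:feed');
$titles = $sx->xpath('//f:title');

echo $title, ' ', $lang, ' ', count($titles), PHP_EOL; // Hello en 1
```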

The DOM interface, on the other hand, treats XML as a structured document, where the representation used is as important as the data represented. It therefore provides much more granular and explicit access to the different types of "node", such as entities and CDATA sections, as well as some that SimpleXML ignores, such as comments and processing instructions. It also provides a much richer set of manipulation functions, allowing you, for example, to rearrange nodes and choose how to represent text content. The tradeoff is a fairly complex API with a large number of classes and methods; since it implements a standard API (originally developed for manipulating HTML in JavaScript), it may feel less like "natural PHP", but some programmers may be familiar with it from other contexts.
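The following sketch shows the difference in granularity on a small made-up document containing a comment, a processing instruction and a CDATA section: DOM reports each as its own node class, while SimpleXML only surfaces the folded text.

```php
<?php
$xml = '<root><!-- a comment --><?app do-something?><![CDATA[raw <text>]]></root>';

$doc = new DOMDocument();
$doc->loadXML($xml);

// DOM keeps every node type and tells them apart by class.
$types = [];
foreach ($doc->documentElement->childNodes as $node) {
    $types[] = get_class($node);
}
echo implode(', ', $types), PHP_EOL;
// DOMComment, DOMProcessingInstruction, DOMCdataSection

// SimpleXML skips the comment and PI and folds the CDATA into plain text.
$sx = simplexml_load_string($xml);
echo (string) $sx, PHP_EOL; // raw <text>
```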

Both interfaces require the full document to be parsed into memory, and they effectively wrap pointers into that parsed representation; you can even switch between the two wrappers with simplexml_import_dom() and dom_import_simplexml(), for instance to add a "missing" feature to SimpleXML using a function from the DOM API. For larger documents, the pull-based XMLReader or the event-based XML Parser may be more appropriate.
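A sketch of switching wrappers to fill a gap in SimpleXML, here creating a CDATA section (which SimpleXML cannot do on its own). The element name and content are illustrative only.

```php
<?php
$sx = simplexml_load_string('<root><script/></root>');

// Borrow DOM for one step: get a DOMElement wrapping the same underlying node.
$node = dom_import_simplexml($sx->script);
$node->appendChild($node->ownerDocument->createCDATASection('if (a < b) {}'));

// The change is visible through SimpleXML, because both wrap the same tree.
echo $sx->asXML();

// And back the other way: DOM document -> SimpleXMLElement (root element).
$back = simplexml_import_dom($node->ownerDocument);
echo $back->script, PHP_EOL; // if (a < b) {}
```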

May 18 '13 at 17:55

SimpleXML is, as the name says, a simple parser for XML content and nothing more. You cannot parse, say, standard HTML content with it. It is quick and easy, and therefore a great tool for creating simple applications.

The DOM extension, on the other hand, is much more powerful. It lets you parse almost any markup document, including HTML, XHTML and XML. It lets you load and write documents and even repair broken markup, and it supports XPath and more general manipulation. Its usage is therefore more complicated, because the library is quite complex, which makes it a perfect tool for bigger projects where heavy data manipulation is required.
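To illustrate the HTML point, here is a sketch with a deliberately broken snippet (made up for this example): SimpleXML rejects it, while DOMDocument::loadHTML() repairs it into a tree. libxml_use_internal_errors() is used so the parse errors are collected instead of being printed as warnings.

```php
<?php
// Hypothetical broken HTML: unclosed <p> and <b>, no </html>.
$html = '<html><body><p>First<p>Second <b>bold</body>';

libxml_use_internal_errors(true); // collect parse errors instead of warnings

// SimpleXML refuses markup that is not well-formed XML.
$sx = simplexml_load_string($html);
var_dump($sx); // bool(false)

// DOM's HTML parser repairs it into a usable tree.
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

echo $doc->getElementsByTagName('p')->length, PHP_EOL; // 2
```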

Hope that answers your question :)

Jan 26 '11 at 9:50

What DOMNodes can be represented by SimpleXMLElement?

The biggest difference between the two libraries is that SimpleXML is essentially a single class: SimpleXMLElement. In contrast, the DOM extension has many classes, most of them subtypes of DOMNode.

So one key question when comparing these two libraries is: which of the many classes DOM offers can ultimately be represented by a SimpleXMLElement?

The following comparison table lists the DOMNode types that are actually useful when dealing with XML (useful node types). Your mileage may vary, e.g. when you need to deal with DTDs:

+-------------------------+----+--------------------------+-----------+
| LIBXML Constant         | #  | DOMNode Classname        | SimpleXML |
+-------------------------+----+--------------------------+-----------+
| XML_ELEMENT_NODE        | 1  | DOMElement               | yes       |
| XML_ATTRIBUTE_NODE      | 2  | DOMAttr                  | yes       |
| XML_TEXT_NODE           | 3  | DOMText                  | no [1]    |
| XML_CDATA_SECTION_NODE  | 4  | DOMCdataSection          | no [2]    |
| XML_PI_NODE             | 7  | DOMProcessingInstruction | no        |
| XML_COMMENT_NODE        | 8  | DOMComment               | no        |
| XML_DOCUMENT_NODE       | 9  | DOMDocument              | no        |
| XML_DOCUMENT_FRAG_NODE  | 11 | DOMDocumentFragment      | no        |
+-------------------------+----+--------------------------+-----------+

As this table shows, SimpleXML has really limited interfaces compared to DOM. Beyond the node types in the table, SimpleXMLElement also abstracts access to child elements and attribute lists, and provides traversal via element names (property access) and attributes (array access). It is also Traversable, iterating over its "own" children (elements or attributes), and offers namespaced access via the children() and attributes() methods.
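A short sketch of those access styles on a hypothetical document: property access, array access for attributes, and iteration over children() and attributes().

```php
<?php
$sx = simplexml_load_string('<list type="fruit"><item>apple</item><item>pear</item></list>');

// Property access selects elements by name; an index picks among siblings.
echo $sx->item[1], PHP_EOL;  // pear

// Array access on an element reads an attribute.
echo $sx['type'], PHP_EOL;   // fruit

// Traversable: iterate over the element's own children (key = element name)...
$items = [];
foreach ($sx->children() as $name => $child) {
    $items[] = $name . '=' . $child;
}

// ...or over its attributes (key = attribute name).
$attrs = [];
foreach ($sx->attributes() as $name => $value) {
    $attrs[$name] = (string) $value;
}

echo implode(', ', $items), PHP_EOL; // item=apple, item=pear
```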

All this magic interface is fine as far as it goes; however, it cannot be changed by extending SimpleXMLElement, so it is as limited as it is magic.

To find out which node type a SimpleXMLElement represents, see:

  • How to tell apart SimpleXML objects representing element and attribute?

DOM follows the W3C DOM Level 1 Core specification here. You can do nearly every imaginable kind of XML processing with this interface. However, it is only Level 1, so compared to more modern DOM levels such as Level 3 it is somewhat limited for some of the cooler things. SimpleXML, of course, loses out here as well.

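SimpleXMLElement allows subclassing: you can have the loader functions return your own subclass. This is quite special in PHP. DOM allows this as well (via DOMDocument::registerNodeClass()), although it takes a bit more work and you need to pick a more specific node type.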
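A minimal sketch of both subclassing mechanisms; MyElement and MyDomElement are hypothetical names, and upperName() is an arbitrary helper added only to prove that the subclass is in use.

```php
<?php
// Hypothetical subclasses, named only for this sketch.
class MyElement extends SimpleXMLElement
{
    public function upperName(): string
    {
        return strtoupper($this->getName());
    }
}

class MyDomElement extends DOMElement
{
    public function upperName(): string
    {
        return strtoupper($this->nodeName);
    }
}

$xml = '<root><child/></root>';

// SimpleXML: pass the class name to the loader; every node object uses it.
$sx = simplexml_load_string($xml, MyElement::class);
echo $sx->child->upperName(), PHP_EOL; // CHILD

// DOM: register the subclass for one specific node type (elements here).
$doc = new DOMDocument();
$doc->loadXML($xml);
$doc->registerNodeClass(DOMElement::class, MyDomElement::class);
echo $doc->documentElement->upperName(), PHP_EOL; // ROOT
```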

XPath 1.0 is supported by both; the result in SimpleXML is an array of SimpleXMLElement objects, in DOM a DOMNodeList.
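A sketch comparing the two result types on a made-up document. It also shows DOMXPath::evaluate(), which can return scalar results directly:

```php
<?php
$xml = '<root><a>1</a><a>2</a><b>3</b></root>';

// SimpleXML: xpath() returns a plain PHP array of SimpleXMLElement objects.
$sx = simplexml_load_string($xml);
$sxResult = $sx->xpath('//a');
echo count($sxResult), PHP_EOL; // 2

// DOM: DOMXPath::query() returns a DOMNodeList.
$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
$domResult = $xpath->query('//a');
echo get_class($domResult), ' ', $domResult->length, PHP_EOL; // DOMNodeList 2

// DOMXPath::evaluate() can also return a scalar, e.g. a number.
echo $xpath->evaluate('sum(//a)'), PHP_EOL; // 3
```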

SimpleXMLElement supports casting to string and to array (and therefore JSON encoding); the DOMNode classes in DOM do not. They can be cast to an array, but only like any other object (public properties as keys/values).
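A sketch of these casts on SimpleXML next to the explicit DOM equivalent, assuming simple text-only child elements (elements that mix attributes and text may not round-trip cleanly through these casts):

```php
<?php
$xml = '<r><a>1</a><b>2</b></r>';
$sx = simplexml_load_string($xml);

$text  = (string) $sx->a;   // "1"
$array = (array) $sx;       // ['a' => '1', 'b' => '2']
$json  = json_encode($sx);  // {"a":"1","b":"2"}

// DOM nodes have no string cast; read the text explicitly instead.
$doc = new DOMDocument();
$doc->loadXML($xml);
$domText = $doc->getElementsByTagName('a')->item(0)->textContent; // "1"

echo $text, ' ', $json, ' ', $domText, PHP_EOL;
```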

Common patterns for using these two extensions in PHP:

  • You usually start out with SimpleXMLElement. Your knowledge of XML and XPath is at an equally low level.
  • After fighting with the magic of its interfaces, you reach a certain level of frustration sooner or later.
  • You discover that you can import a SimpleXMLElement into DOM and vice versa. You learn more about DOM and how to use the extension to do things you could not do (or could not figure out how to do) with SimpleXMLElement.
  • You notice that you can load HTML documents with the DOM extension. And invalid XML. And format the output. Things SimpleXMLElement just cannot do, not even with dirty tricks.
  • You might even switch to the DOM extension completely, because at least you know that its interface is more differentiated and lets you get things done. You also see a benefit in learning DOM Level 1, because you can use it in JavaScript and other languages as well (a huge benefit of the DOM extension for many).
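For the output-formatting point, a minimal DOM sketch (the sample markup is made up) using the preserveWhiteSpace and formatOutput properties:

```php
<?php
$doc = new DOMDocument();
$doc->preserveWhiteSpace = false; // drop insignificant whitespace on load
$doc->loadXML('<r><a/><b>text</b></r>');
$doc->formatOutput = true;        // indent the serialized output

echo $doc->saveXML();
```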

You can have fun with both extensions, and I think you should know both. The more, the better. All the libxml-based extensions in PHP are very good and powerful. And on Stack Overflow there is a good tradition under the php tag of covering these libraries well and in detail.

Jul 09 '13 at 20:50


