PHP XML Expat parser: how to read only part of an XML document?
I have an XML document with the following structure:
    <posts>
        <user id="1222334">
            <post>
                <message>hello</message>
                <client>client</client>
                <time>time</time>
            </post>
            <post>
                <message>hello client how can I help?</message>
                <client>operator</client>
                <time>time</time>
            </post>
        </user>
        <user id="2333343">
            <post>
                <message>good morning</message>
                <client>client</client>
                <time>time</time>
            </post>
            <post>
                <message>good morning how can I help?</message>
                <client>operator</client>
                <time>time</time>
            </post>
        </user>
    </posts>

I can create a parser and print the entire document. The problem is that I want to print only the <user> node (and its children) that has a specific id attribute.
My PHP code is:

    if (!empty($_GET['id'])) {
        $id = $_GET['id'];
        $parser = xml_parser_create();

        // Called at each opening tag; Expat upper-cases element names by default
        function start($parser, $element_name, $element_attrs) {
            switch ($element_name) {
                case "USER":
                    echo "-- User --<br>";
                    break;
                case "CLIENT":
                    echo "Name: ";
                    break;
                case "MESSAGE":
                    echo "Message: ";
                    break;
                case "TIME":
                    echo "Time: ";
                    break;
                case "POST":
                    echo "--Post<br> ";
            }
        }

        // Called at each closing tag
        function stop($parser, $element_name) {
            echo "<br>";
        }

        // Called for the text content between tags
        function char($parser, $data) {
            echo $data;
        }

        xml_set_element_handler($parser, "start", "stop");
        xml_set_character_data_handler($parser, "char");

        $file = "test.xml";
        $fp = fopen($file, "r");
        while ($data = fread($fp, filesize($file))) {
            xml_parse($parser, $data, feof($fp)) or
                die(sprintf("XML Error: %s at line %d",
                    xml_error_string(xml_get_error_code($parser)),
                    xml_get_current_line_number($parser)));
        }
        xml_parser_free($parser);
    }

With the following condition in the start() function I can detect the correct node, but it does not affect what is read and printed:

    if (($element_name == "USER") && $element_attrs["ID"] && ($element_attrs["ID"] == "$id"))

Any help would be appreciated.
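One way the condition above could be made to affect the output is to remember, in a flag, whether the parser is currently inside the matching <user> element and to print character data only while that flag is set. The following is a minimal sketch under that assumption; the inline sample XML, the `$wantedId` value, and the `$inside` and `$output` variables are hypothetical names chosen for illustration.

```php
<?php
// Sample data stands in for test.xml (assumption for this sketch).
$xml = '<posts>'
     . '<user id="1"><post><message>hello</message></post></user>'
     . '<user id="2"><post><message>good morning</message></post></user>'
     . '</posts>';

$wantedId = '2';    // would come from $_GET['id'] in the question
$inside   = false;  // true while the parser is inside the matching <user>
$output   = '';     // collected text of the matching <user>

$parser = xml_parser_create();

xml_set_element_handler(
    $parser,
    // Start handler: switch the flag on when the wanted <user> opens.
    function ($parser, $name, $attrs) use (&$inside, $wantedId) {
        // Element and attribute names arrive upper-cased (case folding is on by default).
        if ($name === 'USER') {
            $inside = isset($attrs['ID']) && $attrs['ID'] === $wantedId;
        }
    },
    // End handler: switch the flag off when any <user> closes.
    function ($parser, $name) use (&$inside) {
        if ($name === 'USER') {
            $inside = false;
        }
    }
);

xml_set_character_data_handler(
    $parser,
    // Text is kept only while the flag is set.
    function ($parser, $data) use (&$inside, &$output) {
        if ($inside) {
            $output .= $data;
        }
    }
);

xml_parse($parser, $xml, true);
echo $output, "\n";
```

Expat still reads the whole document; the flag only decides which parts are printed.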
UPDATE: XMLReader works, but when using the if statement it stops working:
    foreach ($filteredUsers as $user) {
        echo "<table border='1'>";
        foreach ($user->getChildElements('post') as $index => $post) {
            if ($post->getChildElements('client') == "operator") {
                printf("<tr><td class='blue'>%s</td><td class='grey'>%s</td></tr>",
                    $post->getChildElements('message'), $post->getChildElements('time'));
            } else {
                printf("<tr><td class='green'>%s</td><td class='grey'>%s</td></tr>",
                    $post->getChildElements('message'), $post->getChildElements('time'));
            }
        }
        echo "</table>";
    }

As suggested in a comment earlier, you can use XMLReader.
The XMLReader extension is an XML pull parser. The reader acts as a cursor that moves forward through the document stream and stops at each node on the way.
It is provided as a class of the same name, XMLReader, which can open a file. You call read() to advance to the next node (next() also advances, but skips the subtree of the current node). You then check whether the current node is an element and whether it has the name you are looking for; if so, you can process it, for example by reading the element's outer XML with XMLReader::readOuterXml().
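The steps just described could look roughly like the following with plain XMLReader and no extra library. This is only a sketch: the inline sample XML, the `$wantedId` value, and the `$matches` array are assumptions for illustration, and a real script would call `$reader->open($file)` on a file instead of `XML()` on a string.

```php
<?php
// Sample data in the shape of the question's document (assumption).
$xml = <<<XML
<posts>
  <user id="1222334"><post><message>hello</message><client>client</client><time>time</time></post></user>
  <user id="2333343"><post><message>good morning</message><client>client</client><time>time</time></post></user>
</posts>
XML;

$wantedId = '2333343';  // hypothetical id to look for
$matches  = [];         // outer XML of every matching <user>

$reader = new XMLReader();
$reader->XML($xml);

$more = $reader->read();
while ($more) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'user') {
        if ($reader->getAttribute('id') === $wantedId) {
            // Serialize only this <user> element and its children.
            $matches[] = $reader->readOuterXml();
        }
        // next() skips the children of the current <user>.
        $more = $reader->next();
    } else {
        // read() steps to the next node in document order.
        $more = $reader->read();
    }
}
$reader->close();

foreach ($matches as $userXml) {
    echo $userXml, "\n";
}
```

Only the matching <user> subtree is ever turned into a string, so memory use stays small even when the file is large.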
Compared to the callbacks of the Expat parser, this is a bit cumbersome. To get more flexibility with XMLReader, I usually write my own iterators that work on the XMLReader object and provide the steps I need.
They allow you to iterate over specific elements directly with foreach. Here is an example:
    require('xmlreader-iterators.php'); // https://gist.github.com/hakre/5147685

    $xmlFile = '../data/posts.xml';

    $ids = array(3, 8);

    $reader = new XMLReader();
    $reader->open($xmlFile);

    /* @var $users XMLReaderNode[] - iterate over all <user> elements */
    $users = new XMLElementIterator($reader, 'user');

    /* @var $filteredUsers XMLReaderNode[] - iterate over elements with id="3" or id="8" */
    $filteredUsers = new XMLAttributeFilter($users, 'id', $ids);

    foreach ($filteredUsers as $user) {
        printf("---------------\nUser with ID %d:\n", $user->getAttribute('id'));
        echo $user->readOuterXml(), "\n";
    }

I created an XML file that contains a few more posts like the ones in your question, with the id attributes numbered from one upward:

    $xmlFile = '../data/posts.xml';

Then I created an array with the two ID values of the users of interest:

    $ids = array(3, 8);

It is used in the filter condition later. Next, the XMLReader is created and opens the XML file:

    $reader = new XMLReader();
    $reader->open($xmlFile);

The next step creates an iterator over all <user> elements of that reader:

    $users = new XMLElementIterator($reader, 'user');

These are then filtered by the id attribute values stored in the array earlier:

    $filteredUsers = new XMLAttributeFilter($users, 'id', $ids);

With all conditions formulated, what remains is to iterate with foreach:

    foreach ($filteredUsers as $user) {
        printf("---------------\nUser with ID %d:\n", $user->getAttribute('id'));
        echo $user->readOuterXml(), "\n";
    }

This outputs the XML of the users with IDs 3 and 8:
    ---------------
    User with ID 3:
    <user id="3">
        <post>
            <message>message</message>
            <client>client</client>
            <time>time</time>
        </post>
    </user>
    ---------------
    User with ID 8:
    <user id="8">
        <post>
            <message>message 8.1</message>
            <client>client</client>
            <time>time</time>
        </post>
        <post>
            <message>message 8.2</message>
            <client>client</client>
            <time>time</time>
        </post>
        <post>
            <message>message 8.3</message>
            <client>client</client>
            <time>time</time>
        </post>
    </user>

XMLReaderNode, which is part of the XMLReader iterators, can also provide a SimpleXMLElement if you want to read values inside the <user> element easily.
The following example shows how to get the number of <post> elements inside a <user> element:

    foreach ($filteredUsers as $user) {
        printf("---------------\nUser with ID %d:\n", $user->getAttribute('id'));
        echo $user->readOuterXml(), "\n";
        echo "Number of posts: ", $user->asSimpleXML()->post->count(), "\n";
    }

This additionally displays Number of posts: 1 for user ID 3 and Number of posts: 3 for user ID 8.
However, if the outer XML is large, you do not want to do that; instead you want to continue iterating inside the element:

    // rewind
    $reader->open($xmlFile);

    foreach ($filteredUsers as $user) {
        printf("---------------\nUser with ID %d:\n", $user->getAttribute('id'));
        foreach ($user->getChildElements('post') as $index => $post) {
            printf(" * #%d: %s\n", ++$index, $post->getChildElements('message'));
        }
        echo "Number of posts: ", $index, "\n";
    }

This produces the following output:

    ---------------
    User with ID 3:
     * #1: message 3
    Number of posts: 1
    ---------------
    User with ID 8:
     * #1: message 8.1
     * #2: message 8.2
     * #3: message 8.3
    Number of posts: 3

This example shows that, depending on how large the nested children are, you can traverse further with the iterators available through getChildElements(), or you can process a subset of the XML with a common XML parser such as SimpleXML or even DOMDocument.
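Handing only a subset of the XML to SimpleXML can also be sketched with the built-in extensions alone. The `matchingUsers()` generator below is a hypothetical helper, not part of the gist; the inline sample XML and the `$counts` array are likewise assumptions for illustration.

```php
<?php
/**
 * Hypothetical helper (not part of the gist): yields every <user> whose
 * id is listed in $ids as a SimpleXMLElement, keyed by that id.
 */
function matchingUsers(XMLReader $reader, array $ids): Generator
{
    $more = $reader->read();
    while ($more) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'user') {
            $id = $reader->getAttribute('id');
            if (in_array($id, $ids, true)) {
                // Only this <user> subtree is loaded into SimpleXML.
                yield $id => simplexml_load_string($reader->readOuterXml());
            }
            $more = $reader->next();  // skip the subtree just handled
        } else {
            $more = $reader->read();
        }
    }
}

// Sample data (assumption); a real script would use $reader->open($xmlFile).
$xml = '<posts>'
     . '<user id="3"><post><message>message 3</message></post></user>'
     . '<user id="5"><post><message>message 5</message></post></user>'
     . '<user id="8"><post><message>message 8.1</message></post>'
     . '<post><message>message 8.2</message></post></user>'
     . '</posts>';

$reader = new XMLReader();
$reader->XML($xml);

$counts = [];
foreach (matchingUsers($reader, ['3', '8']) as $id => $user) {
    $counts[$id] = count($user->post);  // number of <post> children
    printf("User %s: %d post(s)\n", $id, $counts[$id]);
}
$reader->close();
```

The ids are compared as strings because getAttribute() returns strings; the streaming reader does the filtering, and SimpleXML only ever sees one <user> at a time.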
You can use PHP Simple HTML DOM Parser (an HTML DOM parser written in PHP 5+ that makes it very easy to manipulate HTML). You can query your data with selectors, much as you would with jQuery. Since it parses HTML, it can usually handle a simple XML document like this one as well.
You can download it and read the documentation here: http://simplehtmldom.sourceforge.net/