12.4.1 Problem
You want to parse an XML file using the DOM API. This puts the file into a tree, which you can
process using DOM functions. With the DOM, it's easy to search for and retrieve
elements that fit a certain set of criteria.
12.4.2 Solution
Use PHP's DOM XML extension. Here's how to read XML from a file:
$dom = domxml_open_file('books.xml');
Here's how to read XML from a variable:
$dom = domxml_open_mem($books);
Here's how to get the root node:
$root = $dom->document_element( );
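Here's how to process every node in the document with a depth-first recursion:
function process_node($node) {
    if ($node->has_child_nodes( )) {
        foreach($node->child_nodes( ) as $n) {
            process_node($n);
        }
    }
    // process leaves
    if ($node->node_type( ) == XML_TEXT_NODE) {
        $content = rtrim($node->node_value( ));
        // compare against '' because empty( ) would also skip the text "0"
        if ($content !== '') {
            print "$content\n";
        }
    }
}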
process_node($root);
12.4.3 Discussion
The W3C's DOM provides a platform- and language-neutral method
that specifies the structure and content of a document. Using the DOM, you can
read an XML document into a tree of nodes and then maneuver through the tree to
locate information about a particular element or elements that match your
criteria. This is called tree-based parsing . In contrast, the non-DOM XML
functions allow you to do event-based parsing.
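For contrast, here is a minimal sketch of the event-based style, using PHP's expat-based XML parser functions (xml_parser_create( ) and friends). It is an illustration, not part of this recipe's solution: the collect_authors( ) helper is hypothetical, and the closures assume PHP 5.3 or later rather than the PHP 4.3 discussed here. Instead of building a tree, the parser calls your handlers as it meets each start tag, end tag, and run of text.

```php
<?php
// Hypothetical helper: gather the text of every <author> element
// by reacting to parser events instead of walking a tree.
function collect_authors($xml) {
    $authors = array();
    $in_author = false;

    $parser = xml_parser_create();

    // Case folding is on by default, so element names arrive upper-cased.
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$in_author, &$authors) {
            if ($name === 'AUTHOR') {
                $in_author = true;
                $authors[] = '';   // start a new, empty author entry
            }
        },
        function ($p, $name) use (&$in_author) {
            if ($name === 'AUTHOR') {
                $in_author = false;
            }
        }
    );

    // Text may be delivered in several chunks, so append rather than assign.
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$in_author, &$authors) {
            if ($in_author) {
                $authors[count($authors) - 1] .= $data;
            }
        }
    );

    xml_parse($parser, $xml, true);   // true: this is the final chunk
    return $authors;
}
```

The handlers have to track state themselves (here, the $in_author flag), which is the main trade-off against tree-based parsing: less memory, but no tree to navigate afterward.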
Additionally, you can modify the structure by creating,
editing, and deleting nodes. In fact, you can use the DOM XML functions to
author a new XML document from scratch; see Section 12.3.
One of the major advantages of the DOM is that by following the
W3C's specification, many languages implement DOM functions in a similar manner.
Therefore, the work of translating logic and instructions from one application
to another is considerably simplified. PHP 4.3 comes with an updated series of
DOM functions that are in stricter compliance with the DOM standard than
those in previous versions of PHP. However, the functions are not yet 100% compliant.
Future PHP versions should bring a closer alignment, but this may break some
applications, which will then need minor updates. Check the DOM XML material in
the online PHP Manual at http://www.php.net/domxml for changes. Functions from
earlier versions of PHP are still available, but deprecated.
The DOM is large and complex. For more information, read the specification at
http://www.w3.org/DOM/ or pick up a copy of XML in a Nutshell; Chapter 18 discusses
the DOM.
For DOM parsing, PHP uses libxml, developed for the GNOME project. You can download it from
http://www.xmlsoft.org. To activate it, configure PHP with --with-dom.
DOM functions in PHP are object-oriented. To move from one node to another, call
methods such as $node->child_nodes( ), which returns an array of node objects, and
$node->parent_node( ), which returns the parent node object. Each node also reports
its type, so to process a node, check its type and call the corresponding method:
// $node is the DOM parsed node <book cover="soft">PHP Cookbook</book>
$type = $node->node_type();

switch($type) {
case XML_ELEMENT_NODE:
    // I'm a tag. I have a tagname property.
    print $node->node_name();  // prints the tagname property: "book"
    print $node->node_value(); // null
    break;
case XML_ATTRIBUTE_NODE:
    // I'm an attribute. I have a name and a value property.
    print $node->node_name();  // prints the name property: "cover"
    print $node->node_value(); // prints the value property: "soft"
    break;
case XML_TEXT_NODE:
    // I'm a piece of text inside an element.
    // I have a name and a content property.
    print $node->node_name();  // prints the name property: "#text"
    print $node->node_value(); // prints the content property: "PHP Cookbook"
    break;
default:
    // another type
    break;
}
To automatically search through a DOM tree for specific
elements, use get_elements_by_tagname( ). Here's a sample document
with multiple book records:
<books>
    <book>
        <title>PHP Cookbook</title>
        <author>Sklar</author>
        <author>Trachtenberg</author>
        <subject>PHP</subject>
    </book>
    <book>
        <title>Perl Cookbook</title>
        <author>Christiansen</author>
        <author>Torkington</author>
        <subject>Perl</subject>
    </book>
</books>
Here's how to find all authors:
// find and print all authors
$authors = $dom->get_elements_by_tagname('author');
// loop through author elements
foreach ($authors as $author) {
    // child_nodes( ) holds the author values
    $text_nodes = $author->child_nodes( );
    foreach ($text_nodes as $text) {
        print $text->node_value( );
    }
    print "\n";
}
The get_elements_by_tagname( ) function returns an
array of element node objects. By looping through each element's children, you
can get to the text node associated with that element. From there, you can pull
out the node values, which in this case are the names of the book authors, such
as Sklar and Trachtenberg.
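For readers on a later PHP, the same search can be sketched with the DOM extension bundled with PHP 5 and up, whose method names follow the W3C spelling. This is an assumption about your environment, not the domxml API this recipe covers, and find_authors( ) is a hypothetical helper name used only for illustration.

```php
<?php
// Assumes PHP 5 or later with the bundled DOM extension (DOMDocument),
// not the PHP 4 domxml functions shown in this recipe.
// find_authors() is a hypothetical helper.
function find_authors($xml) {
    $dom = new DOMDocument();
    $dom->loadXML($xml);

    $names = array();
    // getElementsByTagName() returns a list of matching element nodes
    foreach ($dom->getElementsByTagName('author') as $author) {
        // textContent is the concatenated text of the element's descendants
        $names[] = $author->textContent;
    }
    return $names;
}
```

Called with the books document shown earlier as a string, find_authors( ) would return the four author names in document order.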