September 30th, 2008

XPath in SimpleXML

SimpleXML, as its name implies, is a very simple API for traversing XML, implemented specifically for PHP. It serves much the same purpose as XPath, but because its syntax is more PHP-friendly, PHP developers really like to use it.
As an example, take this XML:

<dwml>
  <data>
    <location>
      <location-key>point1</location-key>
      <point latitude="37.39" longitude="-122.07"></point>
    </location>
  </data>
  .....
</dwml>

A general XPath query to fetch the latitude is:

/dwml/data/location/point/@latitude

whereas with SimpleXML it is just a familiar PHP statement:

$simplexml->data->location->point->attributes()->latitude

You can still use XPath inside your SimpleXML code, though. You can run XPath queries by calling the xpath function on any SimpleXMLElement. It returns an array of SimpleXMLElement objects that match your query. So for the above example, the code would be something like this:

$simplexml= new SimpleXMLElement($xml);
$lats =  $simplexml->xpath('/dwml/data/location/point/@latitude');
echo $lats[0];

This simplicity lets you switch between the two methods freely, whichever best fits your application. Here are some cases where I think XPath is the easier choice.

Ability to use XPath shorthand
Take the above example XML itself. If there is only one attribute named ‘latitude’ throughout the XML, you can get its value with

//@latitude
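Here is a minimal sketch of the shorthand in action, assuming the sample document is trimmed to the single location shown earlier. The variable names ($full, $short and so on) are mine, chosen only for illustration.

```php
<?php
// Sample document from above, trimmed to one location.
$xml = <<<XML
<dwml>
  <data>
    <location>
      <location-key>point1</location-key>
      <point latitude="37.39" longitude="-122.07"></point>
    </location>
  </data>
</dwml>
XML;

$simplexml = new SimpleXMLElement($xml);

// Full path and shorthand should select the same attribute here,
// because the document holds only one 'latitude' attribute.
$full  = $simplexml->xpath('/dwml/data/location/point/@latitude');
$short = $simplexml->xpath('//@latitude');

// xpath() hands back SimpleXMLElement objects; cast to string for the text.
$fullValue  = (string) $full[0];
$shortValue = (string) $short[0];
$matches    = count($short);

echo $fullValue, ' ', $shortValue, ' ', $matches, "\n";
```

If the document had several points, //@latitude would return one entry per attribute, so the shorthand is only a safe replacement when the name is unique.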

If an XML node name or attribute name contains characters like ‘-’, which are not allowed in PHP variable names
In the example, if you want to access the value inside the ‘location-key’ node using SimpleXML, you might write

echo $simplexml->data->location->location-key;

This will not give you the expected result, because PHP parses the ‘-’ in ‘location-key’ as a minus sign and treats ‘location’ and ‘key’ as two separate tokens. So this particular code can be replaced with the xpath function.

$keys =  $simplexml->xpath('/dwml/data/location/location-key');
echo $keys[0];
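For completeness, here is a small runnable sketch of the XPath route next to an alternative that is not covered above: SimpleXML also accepts a quoted name inside braces, which keeps PHP from seeing a minus sign. The variable names are illustrative only.

```php
<?php
// Sample document from above, trimmed to the part used here.
$xml = <<<XML
<dwml>
  <data>
    <location>
      <location-key>point1</location-key>
      <point latitude="37.39" longitude="-122.07"></point>
    </location>
  </data>
</dwml>
XML;

$simplexml = new SimpleXMLElement($xml);

// XPath route: the hyphenated name is just part of the query string.
$keys = $simplexml->xpath('/dwml/data/location/location-key');
$viaXpath = (string) $keys[0];

// Property route: quote the name inside braces so PHP does not parse a minus.
$viaBraces = (string) $simplexml->data->location->{'location-key'};

echo $viaXpath, "\n";
echo $viaBraces, "\n";
```

Both lines should print the same key, so which one to use is mostly a matter of taste.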

You want to iterate through nodes with the same name in an XML

If the nodes we want to iterate over sit at regular positions in the XML (like the following one), both approaches are equally easy to use.

<root>
  <mynode>value1</mynode>
  <mynode>value2</mynode>
  <mynode>value3</mynode>
  <mynode>value4</mynode>
  <mynode>value5</mynode>
</root>

But what if ‘mynode’ appears at different locations in the XML, like this?

<root>
  <anothernode>
     <mynode>value1</mynode>
  </anothernode>
  <anotheranothernode>
     <anotheranotheranothernode>
       <mynode>value2</mynode>
     </anotheranotheranothernode>
     <mynode>value3</mynode>
  </anotheranothernode>
  <mynode>value4</mynode>
</root>

You can iterate over all the ‘mynode’ nodes with the following xpath query.

//mynode
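A minimal sketch of that iteration, using the scattered-node document above. The $values array is just a helper I introduced to collect the results.

```php
<?php
$xml = <<<XML
<root>
  <anothernode>
     <mynode>value1</mynode>
  </anothernode>
  <anotheranothernode>
     <anotheranotheranothernode>
       <mynode>value2</mynode>
     </anotheranotheranothernode>
     <mynode>value3</mynode>
  </anotheranothernode>
  <mynode>value4</mynode>
</root>
XML;

$root = new SimpleXMLElement($xml);

// '//mynode' matches every 'mynode' element, however deeply it is nested.
$values = [];
foreach ($root->xpath('//mynode') as $node) {
    $values[] = (string) $node;
}

echo implode(', ', $values), "\n";
```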

Note that this case can also be handled easily in DOM with getElementsByTagName.

To use the power of XPath functions and axes
You can use XPath functions like last() and position(), and even string manipulation functions like substring(), in an XPath statement.
For example, in the XML above, if you only want the value of the last ‘mynode’ in the document, use this expression. The parentheses matter: without them, //mynode[last()] selects the last ‘mynode’ under each parent instead.

(//mynode)[last()]
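One thing worth checking when using last() with '//': the predicate is applied per parent, so where the parentheses go changes the result. The sketch below runs both forms against the scattered-node document; the variable names are mine.

```php
<?php
$xml = <<<XML
<root>
  <anothernode>
     <mynode>value1</mynode>
  </anothernode>
  <anotheranothernode>
     <anotheranotheranothernode>
       <mynode>value2</mynode>
     </anotheranotheranothernode>
     <mynode>value3</mynode>
  </anotheranothernode>
  <mynode>value4</mynode>
</root>
XML;

$root = new SimpleXMLElement($xml);

// Without parentheses: the last 'mynode' child of EACH parent element.
$perParent = [];
foreach ($root->xpath('//mynode[last()]') as $node) {
    $perParent[] = (string) $node;
}

// With parentheses: collect every 'mynode' first, then take the last one.
$lastOverall = $root->xpath('(//mynode)[last()]');
$lastValue = (string) $lastOverall[0];

echo count($perParent), " per-parent matches\n";
echo $lastValue, "\n";
```

In this document every ‘mynode’ happens to be the last one under its own parent, so the first query returns all four, while the second returns only value4.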

You can also use the power of axes in XPath queries. If you want to iterate over all the ancestors of the current node, just use this query

'ancestor::*'
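Here is a rough sketch of the ancestor axis, assuming we start from the ‘mynode’ holding value2 in the scattered-node document. A relative query passed to xpath() is evaluated from the element it is called on. The names $start and $names are illustrative.

```php
<?php
$xml = <<<XML
<root>
  <anothernode>
     <mynode>value1</mynode>
  </anothernode>
  <anotheranothernode>
     <anotheranotheranothernode>
       <mynode>value2</mynode>
     </anotheranotheranothernode>
     <mynode>value3</mynode>
  </anotheranothernode>
  <mynode>value4</mynode>
</root>
XML;

$root = new SimpleXMLElement($xml);

// Pick the starting node: the 'mynode' whose text is 'value2'.
$found = $root->xpath('//mynode[.="value2"]');
$start = $found[0];

// Collect the element names of all ancestors of the starting node.
$names = [];
foreach ($start->xpath('ancestor::*') as $ancestor) {
    $names[] = $ancestor->getName();
}

echo implode(', ', $names), "\n";
```

I would not rely on the order in which the ancestors come back, so sort or reverse the list yourself if the order matters.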

Access elements with different namespaces

<saleItems>
   <ns1:car xmlns:ns1="http:/toyota.xxx.com">$3000</ns1:car>
   <ns2:car xmlns:ns2="http:/suziki.rrr.com">$4000</ns2:car>
</saleItems>

Say you want to extract the cars using SimpleXML. You can do it with the following code.

$simplexml= new SimpleXMLElement($xml);
$ns1_childs = $simplexml->children("http:/toyota.xxx.com");
echo $ns1_childs->car;

$ns2_childs = $simplexml->children("http:/suziki.rrr.com");
echo $ns2_childs->car;

Every time you access a different namespace you have to call the children method with the namespace as an argument.

If you use the XPath approach, you first register each namespace with a prefix and then just use those prefixes in your XPath queries.

$simplexml= new SimpleXMLElement($xml);

$simplexml->registerXPathNamespace("p1", "http:/toyota.xxx.com");
$simplexml->registerXPathNamespace("p2", "http:/suziki.rrr.com");

$toyota_cars = $simplexml->xpath('//p1:car');
$suziki_cars = $simplexml->xpath('//p2:car');

echo $toyota_cars[0];
echo $suziki_cars[0];

SimpleXML is simple and powerful in its native form. But whenever it is impossible or difficult to use, you don’t need to fall back on tedious DOM code or manual string manipulation. You can use XPath queries to get the work done within the SimpleXML environment itself.

Last week I had an opportunity to write some CGI scripts in Perl. It was like going a few years back in web development. And it gave me the answer to why PHP became the favorite over Perl among web developers. It is not just PHP’s friendly C-like syntax; the ability to write inline script in HTML may also have been a big factor.

While doing that, we had to parse the following type of XML.

<ns1:person
       xmlns:ns1="http://dimuthu.org/example/perl_xml/xsd">
<ns1:name>PQR XYZ</ns1:name>
<ns1:age>25</ns1:age>
</ns1:person>

We started with XML::Twig, and it was pretty straightforward. Here is our code.

use XML::Twig;

my $xml_str = <<E;
<ns1:person
    xmlns:ns1="http://dimuthu.org/example/perl_xml/xsd">
<ns1:name>PQR XYZ</ns1:name>
<ns1:age>25</ns1:age>
</ns1:person>
E

my $xt = XML::Twig->new();

$xt->parse($xml_str);

my $txt = $xt->root->findvalue("//ns1:name");
print $txt;

OK. It was working.

In practice, though, we found that this is not the only form in which we receive the XML: the namespace prefix can be different. When you write an XML document against an XML schema, you are free to choose your own prefixes for the namespaces. And in fact different people, programs and vendors do use different prefixes.

So our program should be able to parse the following XML too (note that the namespace prefix has changed).

<ns0:person
    xmlns:ns0="http://dimuthu.org/example/perl_xml/xsd">
<ns0:name>PQR XYZ</ns0:name>
<ns0:age>25</ns0:age>
</ns0:person>

And the same XML with a default namespace.

<person xmlns="http://dimuthu.org/example/perl_xml/xsd">
<name>PQR XYZ</name>
<age>25</age>
</person>

On both these occasions our code failed. Normally, whenever an API evaluates XPath, it lets you register namespaces. But in XML::Twig there was nothing like that. That made us jump to XML::LibXML. It is apparently a little more complicated than the simple Twig API, but it did work. (The code using XML::LibXML appears later in this post.)

The real story is that there is a way to register namespaces in Twig. In fact, it is not in XML::Twig itself but in the XML::Twig::XPath module, which is hardly documented at all. It just duplicates the Twig API, adding some functionality to deal with namespaces. Apparently not the most elegant way of designing an API. Anyway, here is the code that works, written using XML::Twig::XPath.

use XML::Twig;
use XML::Twig::XPath;

my $xml_str = <<E;

<person xmlns="http://dimuthu.org/example/perl_xml/xsd">
<name>PQR XYZ</name>
<age>25</age>
</person>

E

my $xtp = XML::Twig::XPath->new();
$xtp->parse($xml_str);

$xtp->set_namespace('ns',
       'http://dimuthu.org/example/perl_xml/xsd');

my $txt = $xtp->root->findvalue('//ns:name');
print $txt;

Note that I have registered the namespace under the prefix ‘ns’, so in XPath queries I can use this prefix to refer to the namespace.

Anyway, Perl is not bad; it has dozens of modules that do the same thing. So, just for reference, I will note them down here.

Using XML::XPath:

use XML::XPath;
my $xml_str = <<E;

<person xmlns="http://dimuthu.org/example/perl_xml/xsd">
<name>PQR XYZ</name>
<age>25</age>
</person>

E

my $xp = XML::XPath->new(xml => $xml_str);

$xp->set_namespace('ns',
       'http://dimuthu.org/example/perl_xml/xsd');
my $txt = $xp->findvalue('//ns:name'); # get name 

print $txt;

Then using XML::LibXML:

use XML::LibXML;

my $xml_str = <<E;

<person xmlns="http://dimuthu.org/example/perl_xml/xsd">
<name>PQR XYZ</name>
<age>25</age>
</person>

E

my $xl= XML::LibXML->new();

my $xml = $xl->parse_string($xml_str);

my $xpc = XML::LibXML::XPathContext->new($xml);

$xpc->registerNs('ns',
        'http://dimuthu.org/example/perl_xml/xsd');
my $txt = $xpc->findvalue('//ns:name'); # get name 

print $txt;

So that is all for this post. And there is one reason, which I didn’t tell you earlier and which is so obvious, why PHP is more popular than Perl. Look at http://php.net/dom or http://php.net/simplexml. There is great documentation for every function, down to the finest detail. If Perl or Ruby want to overtake PHP, they have to take this aspect more seriously too.