Perl XML Parsers And My Story

Last week I had an opportunity to write some CGI scripts in Perl. It felt like going a few years back in web development, and it gave me an answer to why PHP became the favorite over Perl among web developers. It is not just PHP's friendly, C-like syntax; the ability to write inline script in HTML may also have been a big factor.

While doing that, we had to parse the following type of XML.

<ns1:person
       xmlns:ns1="http://dimuthu.org/example/perl_xml/xsd">
<ns1:name>PQR XYZ</ns1:name>
<ns1:age>25</ns1:age>
</ns1:person>

We started with XML::Twig, which is pretty straightforward. Here was our code.

use XML::Twig;

my $xml_str = <<E;
<ns1:person
    xmlns:ns1="http://dimuthu.org/example/perl_xml/xsd">
<ns1:name>PQR XYZ</ns1:name>
<ns1:age>25</ns1:age>
</ns1:person>
E

my $xt = XML::Twig->new();

$xt->parse($xml_str);

my $txt = $xt->root->findvalue("//ns1:name");
print $txt;

OK. It was working.

In practice, however, we found that this is not the only form in which we receive the XML: the namespace prefix can be different. When you write an XML document against an XML schema, you are free to choose your own prefixes for the namespaces, and in practice different people, programs and vendors use different prefixes.

So our program should be able to parse the following XML too (note that the namespace prefix has changed).

<ns0:person
    xmlns:ns0="http://dimuthu.org/example/perl_xml/xsd">
<ns0:name>PQR XYZ</ns0:name>
<ns0:age>25</ns0:age>
</ns0:person>

And this one, which uses the default namespace.

<person xmlns="http://dimuthu.org/example/perl_xml/xsd">
<name>PQR XYZ</name>
<age>25</age>
</person>

In both of these cases our code failed. Normally, whenever an API evaluates XPath, it lets you register namespaces. But in XML::Twig we could not find anything like that. That made us switch to XML::LibXML. It is a little more complicated than the simple Twig API, but it did work. (The code using XML::LibXML appears later in this post.)
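To see why the prefix should not matter, here is a small sketch (an illustration using XML::LibXML, not our original code). It loads the three variants and prints the prefix, local name and namespace URI of each root element. The %variants hash and the describe_root helper are names made up for this example.

```perl
use strict;
use warnings;
use XML::LibXML;

my $uri = 'http://dimuthu.org/example/perl_xml/xsd';

# The same document written with three different prefix choices.
my %variants = (
    ns1     => qq{<ns1:person xmlns:ns1="$uri"><ns1:name>PQR XYZ</ns1:name></ns1:person>},
    ns0     => qq{<ns0:person xmlns:ns0="$uri"><ns0:name>PQR XYZ</ns0:name></ns0:person>},
    default => qq{<person xmlns="$uri"><name>PQR XYZ</name></person>},
);

# Hypothetical helper: return [prefix, local name, namespace URI] of the root.
sub describe_root {
    my ($xml_str) = @_;
    my $root = XML::LibXML->load_xml(string => $xml_str)->documentElement;
    # The default-namespace variant has no prefix, so fall back to a label.
    return [ $root->prefix || '(none)', $root->localname, $root->namespaceURI ];
}

for my $label (sort keys %variants) {
    my ($prefix, $local, $ns) = @{ describe_root($variants{$label}) };
    print "$label: prefix=$prefix local=$local uri=$ns\n";
}
```

Only the prefix differs between the three; the local name and the namespace URI are the same, and those two are what a namespace-aware query should match on.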

The rest of the story is that there actually is a way to register namespaces with Twig. It is not in XML::Twig itself but in the XML::Twig::XPath module, which is hardly documented. It just duplicates the Twig API, adding some functionality to deal with namespaces. Apparently not the most elegant way of designing an API. Anyway, here is the code that works, written using XML::Twig::XPath.

use XML::Twig;
use XML::Twig::XPath;

my $xml_str = <<E;

<person xmlns="http://dimuthu.org/example/perl_xml/xsd">
<name>PQR XYZ</name>
<age>25</age>
</person>

E

my $xtp = XML::Twig::XPath->new();
$xtp->parse($xml_str);

$xtp->set_namespace('ns',
       'http://dimuthu.org/example/perl_xml/xsd');

my $txt = $xtp->root->findvalue('//ns:name');
print $txt;

Note that I have registered the namespace under the prefix 'ns', so in XPath queries I can use this prefix to refer to the namespace.
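Plain XML::Twig also has a map_xmlns constructor option, which renames prefixes by namespace URI while parsing. The sketch below is an alternative to try rather than what we used. It assumes map_xmlns applies to elements in the default namespace as well as to prefixed ones. name_of is a hypothetical helper written for this example.

```perl
use strict;
use warnings;
use XML::Twig;

my $uri = 'http://dimuthu.org/example/perl_xml/xsd';

# Hypothetical helper: parse a document and return the text of its name element.
sub name_of {
    my ($xml_str) = @_;
    # map_xmlns asks Twig to rewrite whatever prefix the document uses
    # for $uri to the fixed prefix 'ns' (assumption: this also covers
    # the default namespace).
    my $twig = XML::Twig->new(map_xmlns => { $uri => 'ns' });
    $twig->parse($xml_str);
    return $twig->root->first_child_text('ns:name');
}

print name_of(qq{<ns0:person xmlns:ns0="$uri"><ns0:name>PQR XYZ</ns0:name></ns0:person>}), "\n";
print name_of(qq{<person xmlns="$uri"><name>PQR XYZ</name></person>}), "\n";
```

The trade-off is that the mapping is fixed when the parser is created, whereas set_namespace in XML::Twig::XPath only affects how queries are resolved.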

Anyway, Perl is not bad; it has dozens of modules to do the same thing. So just for reference I will note them down here.

Using XML::XPath:

use XML::XPath;
my $xml_str = <<E;

<person xmlns="http://dimuthu.org/example/perl_xml/xsd">
<name>PQR XYZ</name>
<age>25</age>
</person>

E

my $xp = XML::XPath->new(xml => $xml_str);

$xp->set_namespace('ns',
       'http://dimuthu.org/example/perl_xml/xsd');
my $txt = $xp->findvalue('//ns:name'); # get name 

print $txt;

Then using XML::LibXML:

use XML::LibXML;

my $xml_str = <<E;

<person xmlns="http://dimuthu.org/example/perl_xml/xsd">
<name>PQR XYZ</name>
<age>25</age>
</person>

E

my $xl = XML::LibXML->new();

# Declare $xml with "my" so the script also runs under "use strict".
my $xml = $xl->parse_string($xml_str);

my $xpc = XML::LibXML::XPathContext->new($xml);

$xpc->registerNs('ns',
        'http://dimuthu.org/example/perl_xml/xsd');
my $txt = $xpc->findvalue('//ns:name'); # get name 

print $txt;
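As a quick check that the registered prefix is independent of whatever prefix the document uses, here is a sketch that runs the same query over all three variants from this post. The find_name helper and the @variants array are names invented for the example.

```perl
use strict;
use warnings;
use XML::LibXML;

my $uri = 'http://dimuthu.org/example/perl_xml/xsd';

# The three serializations discussed in this post: ns1, ns0, default namespace.
my @variants = (
    qq{<ns1:person xmlns:ns1="$uri"><ns1:name>PQR XYZ</ns1:name><ns1:age>25</ns1:age></ns1:person>},
    qq{<ns0:person xmlns:ns0="$uri"><ns0:name>PQR XYZ</ns0:name><ns0:age>25</ns0:age></ns0:person>},
    qq{<person xmlns="$uri"><name>PQR XYZ</name><age>25</age></person>},
);

# Hypothetical helper: look up the name element by namespace URI,
# using our own prefix 'ns' regardless of the prefix in the document.
sub find_name {
    my ($xml_str) = @_;
    my $doc = XML::LibXML->load_xml(string => $xml_str);
    my $xpc = XML::LibXML::XPathContext->new($doc);
    $xpc->registerNs('ns', $uri);
    return $xpc->findvalue('//ns:name');
}

print find_name($_), "\n" for @variants;   # prints "PQR XYZ" three times
```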

That is all for this post. There is one more reason, which I didn't mention earlier and which is rather obvious, why PHP is more popular than Perl. Look at http://php.net/dom or http://php.net/simplexml: PHP has great documentation for every function, down to the last detail. Whenever Perl or Ruby think of getting past PHP, they have to take this aspect more seriously too.
