In Apache Axis2/C, AXIOM is the basic object model used to represent XML. AXIOM provides a DOM-like API that makes it easy to traverse and build XML.

Underneath, however, AXIOM differs from DOM: it uses techniques that optimize XML parsing specifically for SOAP message processing in web services. For example, a SOAP processor can validate a SOAP message by reading only some of the SOAP header fields and, if the message is not valid, skip processing the body entirely. And since AXIOM is designed to be built from a stream of data retrieved from a transport, a SOAP processor can sometimes validate the message without reading the full stream.

There should be plenty of other applications that need this optimization when parsing XML. They can easily adopt AXIOM/C. There is an AXIOM/C tutorial that covers both parsing and building XML with AXIOM. In this post I’d like to show code that retrieves an AXIOM tree from a string (char buffer), which we call deserialization.

    axiom_node_t* AXIS2_CALL
    deserialize_my_buffer (
        const axutil_env_t * env,
        char *buffer)
    {
        axiom_xml_reader_t *reader = NULL;
        axiom_stax_builder_t *builder = NULL;
        axiom_document_t *document = NULL;
        axiom_node_t *payload = NULL;

        reader = axiom_xml_reader_create_for_memory (env,
            buffer, axutil_strlen (buffer),
            AXIS2_UTF_8, AXIS2_XML_PARSER_TYPE_BUFFER);

        if (!reader)
        {
            return NULL;
        }

        builder = axiom_stax_builder_create (env, reader);

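        if (!builder)
        {
            /* The builder never took ownership of the reader, so free it here. */
            axiom_xml_reader_free (reader, env);
            return NULL;
        }
        document = axiom_stax_builder_get_document (builder, env);
        if (!document)
        {
            AXIS2_LOG_ERROR (env->log, AXIS2_LOG_SI,
                    "Document is null for deserialization");
            /* Free the builder (and the reader it owns) before bailing out. */
            axiom_stax_builder_free (builder, env);
            return NULL;
        }

        payload = axiom_document_get_root_element (document, env);

        if (!payload)
        {
            AXIS2_LOG_ERROR (env->log, AXIS2_LOG_SI,
                    "Root element of the document is not found");
            axiom_stax_builder_free (builder, env);
            return NULL;
        }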
        axiom_document_build_all (document, env);

        axiom_stax_builder_free_self (builder, env);

        return payload;
    }

Even though this piece of code has been used many times by Axis2 and by applications that use Axis2, it has never been identified as a core AXIOM function. I think it would be better to have this function as an alternative way to create an AXIOM tree.

axiom_node_t *AXIS2_CALL
axiom_node_create_from_buffer(const axutil_env_t *env, axis2_char_t *buffer);

I have already suggested this on the Axis2/C mailing list, and hopefully it will be included in the next release.

Here, when we created the AXIOM tree from the character buffer, we used the “axiom_xml_reader_create_for_memory” function. However, whenever a transport reads a data stream from the wire, it always uses the “axiom_xml_reader_create_for_io” function.

    /**
     * This create an instance of axiom_xml_reader to
     * parse a xml document in a buffer. It takes a callback
     * function that takes a buffer and the size of the buffer
     * The user must implement a function that takes in buffer
     * and size and fill the buffer with specified size
     * with xml stream, parser will call this function to fill the
     * buffer on the fly while parsing.
     * @param env environment MUST NOT be NULL.
     * @param read_input_callback() callback function that fills
     * a char buffer.
     * @param close_input_callback() callback function that closes
     * the input stream.
     * @param ctx, context can be any data that needs to be passed
     * to the callback method.
     * @param encoding encoding scheme of the xml stream
     */
    AXIS2_EXTERN axiom_xml_reader_t *AXIS2_CALL
    axiom_xml_reader_create_for_io(
        const axutil_env_t * env,
        AXIS2_READ_INPUT_CALLBACK read_callback,
        AXIS2_CLOSE_INPUT_CALLBACK close_callback,
        void *ctx,
        const axis2_char_t * encoding);

As you may have noticed, it requires us to implement a “read_callback” function. Here is an example prototype for this callback.

    int AXIS2_CALL
    some_function(
            char *buffer,
            int size,
            void *ctx);

The parser calls this function whenever it needs more of the XML from the stream.

So if your application reads data from a stream, you are recommended to use this function (i.e. “axiom_xml_reader_create_for_io”) instead of “axiom_xml_reader_create_for_memory”, so that the AXIOM model is built more efficiently.
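The idea of a parser pulling data through a read callback, and stopping as soon as it has seen enough, can be illustrated outside Axis2/C. The following is a minimal Python sketch, not Axis2/C code: it uses the standard library's XMLPullParser, and the names make_read_callback and header_is_valid, the chunk size, the Header/Token element names and the validity rule are all assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET


def make_read_callback(stream):
    """Return a callback that reads up to `size` characters (hypothetical helper)."""
    def read_callback(size):
        return stream.read(size)
    return read_callback


def header_is_valid(read_callback, chunk_size=16):
    """Pull chunks until the Header element closes, then stop reading.

    Returns (valid, chars_read). The validity rule used here, a non-empty
    Token child, is an assumption for this sketch.
    """
    parser = ET.XMLPullParser(events=("end",))
    chars_read = 0
    while True:
        chunk = read_callback(chunk_size)
        if not chunk:
            return False, chars_read  # stream ended before the header closed
        chars_read += len(chunk)
        parser.feed(chunk)
        for _event, elem in parser.read_events():
            if elem.tag == "Header":
                return bool(elem.findtext("Token")), chars_read


message = (
    "<Envelope><Header><Token>abc</Token></Header>"
    "<Body>" + "<item>payload</item>" * 50 + "</Body></Envelope>"
)
valid, chars_read = header_is_valid(make_read_callback(io.StringIO(message)))
print(valid, chars_read < len(message))
```

The decision is made after only a small prefix of the message has been pulled from the stream, which is the same benefit the callback-based reader gives a SOAP processor.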

September 30th, 2008

XPath in SimpleXML

SimpleXML, as its name implies, is a very simple API for traversing XML, implemented specifically for PHP. It is very similar to XPath, but since its syntax is more PHP-friendly, PHP developers really like to use it.
As an example, take this XML:

<dwml>
  <data>
    <location>
      <location-key>point1</location-key>
      <point latitude="37.39" longitude="-122.07"></point>
    </location>
  </data>
  .....
</dwml>

The XPath query to get the latitude, in the more general way:

/dwml/data/location/point/@latitude

Whereas with SimpleXML it is just a familiar PHP statement:

$simplexml->data->location->point->attributes()->latitude

Still, you can use XPath inside your SimpleXML code. You can execute XPath queries by calling the xpath function on any SimpleXMLElement. It returns an array of SimpleXMLElement objects that match your query. So for the above example, the XPath version would be something like this:

$simplexml= new SimpleXMLElement($xml);
$lats =  $simplexml->xpath('/dwml/data/location/point/@latitude');
echo $lats[0];

This simplicity lets you choose between the two methods interchangeably, whichever best fits your application. Here are some cases where I think XPath is easier to use.

Ability to use XPath shorthand
Take the above example XML itself. If there is only one attribute named ‘latitude’ throughout the XML, you can get its value with

//@latitude

If an XML node name or attribute name contains characters like ‘-’, which are not allowed in PHP variable names
In the example, if you want to access the value inside the ‘location-key’ node using SimpleXML, you might write

echo $simplexml->data->location->location-key;

This will not give you the expected result, because PHP treats ‘location’ and ‘key’ in ‘location-key’ as two tokens separated by a minus sign. So this particular code can be replaced with the xpath function.

$keys =  $simplexml->xpath('/dwml/data/location/location-key');
echo $keys[0];

You want to iterate through nodes with the same name in an XML

If the nodes we want to iterate over sit at regular positions in the XML (as in the following), both approaches are equally easy to use.

<root>
  <mynode>value1</mynode>
  <mynode>value2</mynode>
  <mynode>value3</mynode>
  <mynode>value4</mynode>
  <mynode>value5</mynode>
</root>

But what if the ‘mynode’ elements were in different locations in the XML, like this?

<root>
  <anothernode>
     <mynode>value1</mynode>
  </anothernode>
  <anotheranothernode>
     <anotheranotheranothernode>
       <mynode>value2</mynode>
     </anotheranotheranothernode>
     <mynode>value3</mynode>
  </anotheranothernode>
  <mynode>value4</mynode>
</root>

You can iterate over all the ‘mynode’ nodes with the following XPath query.

//mynode

Note that this case can also be handled easily in DOM with getElementsByTagName.

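To use the power of XPath functions and axes
You can use XPath functions like last() and position(), and even string manipulation functions like substring(), in an XPath statement.
For example, in the flat example above (where all the ‘mynode’ elements share one parent), if you want only the value of the last ‘mynode’, just use the expression below. Note that the predicate is applied per parent element, so for the nested document you would write (//mynode)[last()] to get the last one in document order.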

//mynode[last()]
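To see how the descendant search and the last() predicate behave on the nested document, here is a small sketch. It is written in Python with the standard library's ElementTree rather than SimpleXML, purely as an illustration; ElementTree supports only a subset of XPath, and the variable names are made up for this example.

```python
import xml.etree.ElementTree as ET

nested = ET.fromstring("""
<root>
  <anothernode>
     <mynode>value1</mynode>
  </anothernode>
  <anotheranothernode>
     <anotheranotheranothernode>
       <mynode>value2</mynode>
     </anotheranotheranothernode>
     <mynode>value3</mynode>
  </anotheranothernode>
  <mynode>value4</mynode>
</root>
""")

# './/mynode' finds every 'mynode' descendant, in document order.
all_values = [node.text for node in nested.findall(".//mynode")]

# '[last()]' is evaluated per parent: each 'mynode' here is the last
# 'mynode' child of its own parent, so all four of them match.
last_per_parent = [node.text for node in nested.findall(".//mynode[last()]")]

# The last 'mynode' in document order is the final item of the full list.
last_in_document = all_values[-1]

print(all_values)
print(last_per_parent)
print(last_in_document)
```

In the flat document every ‘mynode’ shares one parent, so there the predicate selects just the final element.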

You can also use the power of axes in XPath queries. If you want to iterate over all the ancestors of the current node, just use this query

'ancestor::*'

Access elements with different namespaces

<saleItems>
   <ns1:car xmlns:ns1="http:/toyota.xxx.com">$3000</ns1:car>
   <ns2:car xmlns:ns2="http:/suziki.rrr.com">$4000</ns2:car>
</saleItems>

Say you want to extract the cars using SimpleXML. You can do this with the following code.

$simplexml= new SimpleXMLElement($xml);
$ns1_childs = $simplexml->children("http:/toyota.xxx.com");
echo $ns1_childs->car;

$ns2_childs = $simplexml->children("http:/suziki.rrr.com");
echo $ns2_childs->car;

Every time you access a different namespace you have to call the children method with the namespace as an argument.

If you use the XPath approach, you first register the namespaces with a prefix and then just use those prefixes in your XPath queries.

$simplexml= new SimpleXMLElement($xml);

$simplexml->registerXPathNamespace("p1", "http:/toyota.xxx.com");
$simplexml->registerXPathNamespace("p2", "http:/suziki.rrr.com");

$toyota_cars = $simplexml->xpath('//p1:car');
$suziki_cars = $simplexml->xpath('//p2:car');

echo $toyota_cars[0];
echo $suziki_cars[0];
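The same "register a prefix, then query with it" pattern exists in other XML APIs. As a rough illustration only, here is how it might look in Python's standard ElementTree, where the prefix map is passed to the query instead of being registered on the document. The prefixes p1 and p2 are local to the query and the variable names are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET

xml = """
<saleItems>
   <ns1:car xmlns:ns1="http:/toyota.xxx.com">$3000</ns1:car>
   <ns2:car xmlns:ns2="http:/suziki.rrr.com">$4000</ns2:car>
</saleItems>
"""

root = ET.fromstring(xml)

# Query-side prefixes; they need not match the prefixes used in the document.
prefixes = {"p1": "http:/toyota.xxx.com", "p2": "http:/suziki.rrr.com"}

toyota_cars = root.findall(".//p1:car", prefixes)
suziki_cars = root.findall(".//p2:car", prefixes)

print(toyota_cars[0].text)
print(suziki_cars[0].text)
```

What matters is the namespace URI: the two ‘car’ elements have the same local name and are told apart only by the URI each prefix is bound to.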

SimpleXML is simple and powerful in its native form. But whenever it is impossible or difficult to use, you don’t need to fall back to tedious DOM code or manual string manipulation. You can use XPath queries to get the work done within the SimpleXML environment itself.

In a WSDL, the XML Schema section defines the message format for each operation, which eventually becomes the real API that users are interested in. It is also the trickiest part of the WSDL. Nowadays there are many tools that let you design and use WSDLs without knowing the meaning of a single line of them. But there are situations where it helps to have some knowledge of the XML Schema section and of WSDL overall.
For this post I am taking a simple example of the use of nillable="true" and minOccurs="0". Take the following example.

<xs:element name="myelements">
  <xs:complexType>
    <xs:sequence>
     <xs:element name="nonboth" type="xs:string"/>
      <xs:element minOccurs="0" name="minzero" type="xs:int"/>
      <xs:element name="nilint" nillable="true" type="xs:int"/>
      <xs:element name="nilstring" nillable="true" type="xs:string"/>
      <xs:element minOccurs="0" name="minzeronil" nillable="true" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

Ignore the meaning of the nillable and minOccurs attributes for now. You can safely say the following XML is valid for the above schema.

<myelements>
  <nonboth>I can be neither nil nor skipped</nonboth>
  <minzero>3</minzero>
  <nilint>0</nilint>
  <nilstring>I can be nil, but I can't be skipped</nilstring>
  <minzeronil>I can be skipped and can have the nil value</minzeronil>
</myelements>

Take the first element in the schema, ‘nonboth’. It has neither a minOccurs nor a nillable attribute. By default minOccurs is 1 and nillable is false. That means it can neither have a nil value nor be removed from the XML.

Is making an element nil the same as removing the element from the XML? No. Take the second element in the schema, ‘minzero’. It has minOccurs="0" but no nillable="true", which means it is non-nillable. So whenever you don’t want that element in your XML, you can’t keep it as a nil element like

  <minzero xsi:nil="true"></minzero>

But you can remove the whole element from the XML (since it is minOccurs=0).

The opposite scenario is nillable="true" with minOccurs not equal to 0. Check the ‘nilint’ element in the schema. You can’t skip the ‘nilint’ element; it has to be present, but it can hold a nil value.

  <nilint xsi:nil="true"></nilint>

or simply

  <nilint xsi:nil="true"/>

Note that the xsi prefix has to be declared somewhere, so the complete way to write the nil element is,

  <nilint xsi:nil="true" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>

You can see why the xsi:nil attribute matters when we look at the fourth element, ‘nilstring’. Say your message contains the following element

  <nilstring></nilstring>

You can say that this is not nil; it is an empty string. An empty string may count as nil in some other languages, but if we take XML Schema as a language, then for something to be nil it has to have the xsi:nil attribute set to "true" or "1".

So going back to ‘minzero’, which is non-nillable, you might expect to be able to write the following XML,

  <minzero/>

Since there is no xsi:nil="1", this is not a nil value, so the nillable="false" condition is preserved. But unlike a string, an empty element is not a valid integer: the empty string is not in the lexical space of xs:int, so a validator will reject it. So in practice, whenever a schema says an element is non-nillable, you should set a valid value.

The last one is the ‘minzeronil’ element, which has both nillable="true" and minOccurs="0". Whenever you don’t need to set a value for this element, you have the choice of either skipping the element or setting it to nil. Obviously, rather than setting a nil value, it is better to just skip the element and make the XML shorter. This matters especially in web services, where you want the payload to be as small as possible.
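The distinction between an absent element, a nil element and an empty element can be made concrete with a few lines of code. The following Python sketch uses the standard ElementTree module; it is not a schema validator, and the helper name describe and its four labels are assumptions made for this illustration. It only checks for presence, the xsi:nil attribute and text content.

```python
import xml.etree.ElementTree as ET

# ElementTree spells a namespaced attribute name as {namespace-uri}local-name.
XSI_NIL = "{http://www.w3.org/2001/XMLSchema-instance}nil"


def describe(parent, name):
    """Classify a child element as 'absent', 'nil', 'empty' or 'value'."""
    child = parent.find(name)
    if child is None:
        return "absent"          # only allowed when minOccurs="0"
    if child.get(XSI_NIL) in ("true", "1"):
        return "nil"             # only allowed when nillable="true"
    if not child.text:
        return "empty"           # present, not nil, no content
    return "value"


doc = ET.fromstring(
    '<myelements xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    "<nonboth></nonboth>"
    '<nilint xsi:nil="1"/>'
    '<nilstring xsi:nil="true"/>'
    "</myelements>"
)

results = {
    name: describe(doc, name)
    for name in ("nonboth", "minzero", "nilint", "nilstring", "minzeronil")
}
print(results)
```

Here ‘nonboth’ is reported as empty rather than nil, while ‘minzero’ and ‘minzeronil’ are simply absent, which mirrors the cases discussed above.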

Say you have to prepare the XML and you don’t have valid values for any of the elements. Then this is the shortest XML you can create.

<myelements xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <nonboth></nonboth>
  <nilint xsi:nil="1"/>
  <nilstring xsi:nil="1"/>
</myelements>

Read this nice article on developerWorks about nillable="true" and minOccurs="0" for more.

Last week I had an opportunity to write some CGI scripts in Perl. It was like going a few years back in web development. And it gave me an answer to why PHP became the favorite over Perl among web developers. It is not just PHP’s friendly C-like syntax; the ability to write inline script in HTML may also have been a big factor.

There we came across parsing the following type of XML.

<ns1:person
       xmlns:ns1="http://dimuthu.org/example/perl_xml/xsd">
<ns1:name>PQR XYZ</ns1:name>
<ns1:age>25</ns1:age>
</ns1:person>

We started with XML::Twig, and it was pretty straightforward. Here was our code.

use XML::Twig;

my $xml_str = <<E;
<ns1:person
    xmlns:ns1="http://dimuthu.org/example/perl_xml/xsd">
<ns1:name>PQR XYZ</ns1:name>
<ns1:age>25</ns1:age>
</ns1:person>
E

my $xt = XML::Twig->new();

$xt->parse($xml_str);

my $txt = $xt->root->findvalue("//ns1:name");
print $txt;

OK. It was working.

In practice, however, we found that this is not the only form in which we receive the XML. The namespace prefix can be different. When you write XML against an XML schema, you are free to choose your own prefixes for the namespaces. And in practice different people, programs and vendors do use different prefixes.

So our program should be able to parse the following XML too. (Note that the namespace prefix has changed.)

<ns0:person
    xmlns:ns0="http://dimuthu.org/example/perl_xml/xsd">
<ns0:name>PQR XYZ</ns0:name>
<ns0:age>25</ns0:age>
</ns0:person>

And with the default namespace.

<person xmlns="http://dimuthu.org/example/perl_xml/xsd">
<name>PQR XYZ</name>
<age>25</age>
</person>

In both of these cases our code failed. Normally, whenever there is an API that evaluates XPath, you can register namespaces. But in XML::Twig there seemed to be nothing like that. That made us jump to XML::LibXML. It is a little more complicated than the simple Twig API, but it did work. (The code using XML::LibXML is shown later.)

The story is that there actually is a way to register namespaces in Twig. It is not in XML::Twig itself but in the XML::Twig::XPath module, which is hardly documented at all. It just duplicates the Twig API, adding some functionality to deal with namespaces. Apparently not the most elegant way of designing an API. Anyway, here is the working code written using XML::Twig::XPath.

use XML::Twig;
use XML::Twig::XPath;

my $xml_str = <<E;

<person xmlns="http://dimuthu.org/example/perl_xml/xsd">
<name>PQR XYZ</name>
<age>25</age>
</person>

E

my $xtp = XML::Twig::XPath->new();
$xtp->parse($xml_str);

$xtp->set_namespace('ns',
       'http://dimuthu.org/example/perl_xml/xsd');

my $txt = $xtp->root->findvalue('//ns:name');
print $txt;

Note that I have registered the namespace under the prefix ‘ns’, so in XPath queries I can use this prefix to refer to the namespace.
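To make the point about prefixes concrete, here is a small sketch showing that one query bound to the namespace URI matches all three spellings of the document (the ns1 prefix, the ns0 prefix and the default namespace). It is written in Python with the standard ElementTree module, only as an illustration of the idea; the names NS, documents and names are made up for this example.

```python
import xml.etree.ElementTree as ET

# The query-side prefix 'ns' is bound to the namespace URI, not to any
# prefix that happens to appear in the incoming document.
NS = {"ns": "http://dimuthu.org/example/perl_xml/xsd"}

documents = [
    # prefix ns1
    '<ns1:person xmlns:ns1="http://dimuthu.org/example/perl_xml/xsd">'
    "<ns1:name>PQR XYZ</ns1:name><ns1:age>25</ns1:age></ns1:person>",
    # prefix ns0
    '<ns0:person xmlns:ns0="http://dimuthu.org/example/perl_xml/xsd">'
    "<ns0:name>PQR XYZ</ns0:name><ns0:age>25</ns0:age></ns0:person>",
    # default namespace, no prefix at all
    '<person xmlns="http://dimuthu.org/example/perl_xml/xsd">'
    "<name>PQR XYZ</name><age>25</age></person>",
]

# 'name' is a direct child of the root element in every variant.
names = [ET.fromstring(doc).findtext("ns:name", namespaces=NS) for doc in documents]
print(names)
```

All three documents give the same result, because the parser resolves every prefix to its URI before the query is matched.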

Anyway, Perl is not bad; it has dozens of modules that do the same thing. So just for reference I will note them down here.

Using XML::XPath,

use XML::XPath;
my $xml_str = <<E;

<person xmlns="http://dimuthu.org/example/perl_xml/xsd">
<name>PQR XYZ</name>
<age>25</age>
</person>

E

my $xp = XML::XPath->new(xml => $xml_str);

$xp->set_namespace('ns',
       'http://dimuthu.org/example/perl_xml/xsd');
my $txt = $xp->findvalue('//ns:name'); # get name 

print $txt;

Then using XML::LibXML,

use XML::LibXML;

my $xml_str = <<E;

<person xmlns="http://dimuthu.org/example/perl_xml/xsd">
<name>PQR XYZ</name>
<age>25</age>
</person>

E

my $xl= XML::LibXML->new();

my $xml = $xl->parse_string($xml_str);

my $xpc = XML::LibXML::XPathContext->new($xml);

$xpc->registerNs('ns',
        'http://dimuthu.org/example/perl_xml/xsd');
my $txt = $xpc->findvalue('//ns:name'); # get name 

print $txt;

So that is all for this post. There is one reason I didn’t mention earlier, and it is an obvious one, for why PHP is more popular than Perl. Look at http://php.net/dom or http://php.net/simplexml. There is great documentation for every function, down to the last detail. If Perl or Ruby want to overtake PHP, they have to take this aspect more seriously too.


© 2007 Dimuthu’s Blog