In this case study, “PHP Data Services Extract Content from Drupal Database”, I present how data service concepts can be applied to extract data with marketing value from a CMS database and publish it as web services. I used the Drupal instance deployed at http://wso2.org as the CMS for the use case. As the data service framework, I used the WSF/PHP data services library, as it requires minimal changes to the existing infrastructure (the LAMP stack).

The case study also covers how any third-party mashup can consume the data service to present textual or graphical views of the analyzed data. These mashups can be extended to integrate with social networks such as Facebook and Twitter to communicate back and forth with a wider community, and could even be used to track the distribution of active users using Google Maps. Simply put, it makes it easier to analyze business data and engage with the community.

In Apache Axis2/C, AXIOM is used as the basic object model to represent XML. AXIOM provides a DOM-like API that makes it easy to traverse and build XML.

Underneath, however, AXIOM differs from DOM: it uses techniques that optimize XML parsing specifically for SOAP message processing in web services. For example, a SOAP processor can validate a SOAP message by reading only some of the SOAP header fields, and if the message is not valid, it can skip processing the body entirely. And since AXIOM is designed to be built from a stream of data retrieved from a transport, a SOAP processor can sometimes validate the message without reading the full stream.

There are probably many other applications that need this kind of optimization when parsing XML, and they can easily adopt AXIOM/C. Here is an AXIOM/C tutorial that covers both parsing and building XML with AXIOM. In this post I’d like to show some code that builds an AXIOM tree from a string (a char buffer), which we call deserialization.

    axiom_node_t* AXIS2_CALL
    deserialize_my_buffer (
        const axutil_env_t * env,
        char *buffer)
    {
        axiom_xml_reader_t *reader = NULL;
        axiom_stax_builder_t *builder = NULL;
        axiom_document_t *document = NULL;
        axiom_node_t *payload = NULL;

        reader = axiom_xml_reader_create_for_memory (env,
            buffer, axutil_strlen (buffer),
            AXIS2_UTF_8, AXIS2_XML_PARSER_TYPE_BUFFER);

        if (!reader)
        {
            return NULL;
        }

        builder = axiom_stax_builder_create (env, reader);

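        if (!builder)
        {
            /* no builder took ownership of the reader, so free it here */
            axiom_xml_reader_free (reader, env);
            return NULL;
        }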
        document = axiom_stax_builder_get_document (builder, env);
        if (!document)
        {
            AXIS2_LOG_ERROR (env->log, AXIS2_LOG_SI,
                    "Document is null for deserialization");
            return NULL;
        }

        payload = axiom_document_get_root_element (document, env);

        if (!payload)
        {
            AXIS2_LOG_ERROR (env->log, AXIS2_LOG_SI,
                    "Root element of the document is not found");
            return NULL;
        }
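        /* AXIOM builds the tree lazily, so force the whole tree to be built now */
        axiom_document_build_all (document, env);

        /* free only the builder; the tree it built stays alive for the caller */
        axiom_stax_builder_free_self (builder, env);

        return payload;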
    }

Although this piece of code has been used many times by Axis2 and by applications that use Axis2, it has never been identified as a core AXIOM function. I think it would be better to have this function as an alternative way to create an AXIOM node.

    axiom_node_t *AXIS2_CALL
    axiom_node_create_from_buffer(const axutil_env_t *env, axis2_char_t *buffer);

I have already suggested this on the Axis2/C mailing list, and hopefully it will be included in the next release.

Here, when we create the AXIOM tree from the character buffer, we use the “axiom_xml_reader_create_for_memory” function. However, whenever a transport reads a data stream from the wire, it always uses the “axiom_xml_reader_create_for_io” function.

    /**
     * This create an instance of axiom_xml_reader to
     * parse a xml document in a buffer. It takes a callback
     * function that takes a buffer and the size of the buffer
     * The user must implement a function that takes in buffer
     * and size and fill the buffer with specified size
     * with xml stream, parser will call this function to fill the
     * buffer on the fly while parsing.
     * @param env environment MUST NOT be NULL.
     * @param read_input_callback() callback function that fills
     * a char buffer.
     * @param close_input_callback() callback function that closes
     * the input stream.
     * @param ctx, context can be any data that needs to be passed
     * to the callback method.
     * @param encoding encoding scheme of the xml stream
     */
    AXIS2_EXTERN axiom_xml_reader_t *AXIS2_CALL
    axiom_xml_reader_create_for_io(
        const axutil_env_t * env,
        AXIS2_READ_INPUT_CALLBACK read_callback,
        AXIS2_CLOSE_INPUT_CALLBACK close_callback,
        void *ctx,
        const axis2_char_t * encoding);

As you may have noticed, it requires us to implement a “read_callback” function. Here is an example function prototype for this callback.

    int AXIS2_CALL
    some_function(
            char *buffer,
            int size,
            void *ctx);

This function will be called by the parser as required to parse the XML read from some stream.
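
As a rough illustration, here is a minimal sketch of such a callback that serves XML from an in-memory string, standing in for a socket or a file. The names `mem_stream_t` and `mem_read_callback` are hypothetical. The sketch also assumes that the parser expects the callback to return the number of bytes copied into the buffer, with 0 meaning end of stream and -1 an error; please check the AXIS2_READ_INPUT_CALLBACK definition in your Axis2/C headers before relying on that. AXIS2_CALL is left out so the snippet compiles without the Axis2/C headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical context: a block of XML text plus a read position. */
typedef struct mem_stream
{
    const char *data;   /* XML text to serve */
    size_t length;      /* total number of bytes in data */
    size_t offset;      /* number of bytes already handed out */
} mem_stream_t;

/*
 * Copies at most `size` bytes of the stream into `buffer`.
 * Assumed convention: returns the number of bytes copied,
 * 0 at the end of the stream, and -1 on bad arguments.
 */
int
mem_read_callback(
    char *buffer,
    int size,
    void *ctx)
{
    mem_stream_t *stream = (mem_stream_t *) ctx;
    size_t remaining = 0;
    size_t to_copy = 0;

    if (!buffer || !stream || size < 0)
    {
        return -1;
    }

    assert (stream->offset <= stream->length);
    remaining = stream->length - stream->offset;
    to_copy = remaining < (size_t) size ? remaining : (size_t) size;

    /* hand out the next chunk and remember how far we have read */
    memcpy (buffer, stream->data + stream->offset, to_copy);
    stream->offset += to_copy;

    return (int) to_copy;
}
```

A pointer to a `mem_stream_t` would be passed as the `ctx` argument of “axiom_xml_reader_create_for_io”, and the parser would then call the function repeatedly until it reports the end of the stream.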

So if your application reads data from a stream, you are recommended to use this function (i.e. “axiom_xml_reader_create_for_io”) instead of “axiom_xml_reader_create_for_memory” to create the AXIOM model more efficiently.

We use the SQL ‘GROUP BY’ construct to query data while aggregating rows according to a field. For example, say your blog database stores your posts in a table called ‘Blog’ that has ‘Date’ as a field. Then

SELECT count(*) FROM Blog GROUP BY Date

will give you a set of numbers that represent the number of posts you published each day.

SELECT Date, count(*) FROM Blog GROUP BY Date

will give you a map of ‘date’ to ‘number of posts published on that date’ without much trouble.

The problem is that most blog databases don’t keep just the date of a post; they keep both the date and the time (say in a field called ‘Time’). But you still want to group by date. You can use the MySQL ‘DATE’ function to convert the date-and-time value to just a date and use it in the GROUP BY clause.

SELECT DATE(Time), count(*) FROM Blog GROUP BY DATE(Time)

If you take Drupal as the blog database, it saves the time of each blog entry as a unix timestamp. So you have to derive the date from the timestamp using the MySQL FROM_UNIXTIME function.

In Drupal, the database table that stores blog entries is ‘node’ and the field that stores the creation time is ‘created’. So your query to get statistics from Drupal would be something like this.

SELECT DATE( FROM_UNIXTIME(`created`)), count(*)
FROM `node`
GROUP BY DATE( FROM_UNIXTIME(`created`))
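
To make the conversion concrete, here is a small C sketch of what DATE( FROM_UNIXTIME(`created`)) computes for a single row. The helper name `timestamp_to_date` is hypothetical. Note that the sketch uses UTC through `gmtime`, whereas MySQL converts using the session time zone, so the two can disagree for posts created close to midnight.

```c
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical helper: writes the calendar date of a unix timestamp
 * into `out` as YYYY-MM-DD. Returns 1 on success and 0 on failure.
 * Uses UTC; MySQL's FROM_UNIXTIME uses the session time zone instead.
 */
int
timestamp_to_date(
    time_t created,
    char *out,
    size_t out_size)
{
    struct tm *utc = NULL;

    if (!out)
    {
        return 0;
    }

    utc = gmtime (&created);
    if (!utc)
    {
        return 0;
    }

    /* strftime returns 0 when the result does not fit in out_size bytes */
    return strftime (out, out_size, "%Y-%m-%d", utc) > 0;
}
```

Every post whose timestamp maps to the same date string ends up in the same group, which is what the GROUP BY clause above relies on.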

Rather than converting the timestamp to the SQL date format, you can convert the current date to a timestamp, and your SQL statement may look a little mathematical.

SELECT ROUND( (
UNIX_TIMESTAMP( NOW( ) ) - `created` ) / ( 24 *60 *60 )
), count( * )
FROM `node`
GROUP BY ROUND( (
UNIX_TIMESTAMP( NOW( ) ) - `created` ) / ( 24 *60 *60 )
)

In fact, the expression “( UNIX_TIMESTAMP( NOW( ) ) - `created` ) / ( 24 *60 *60 )” derives a number that represents the age of the post in days.
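
As a sketch of the arithmetic, the following C function mirrors the rounded age expression for one row. The function name `age_in_days` is hypothetical, and it assumes `created` is not in the future, so that adding half a day before the integer division rounds to the nearest day the way ROUND does for non-negative values.

```c
#include <time.h>

#define SECONDS_PER_DAY (24L * 60L * 60L)

/*
 * Hypothetical helper mirroring
 *   ROUND( ( UNIX_TIMESTAMP( NOW( ) ) - `created` ) / ( 24 *60 *60 ) )
 * for a single row. Assumes created <= now.
 */
long
age_in_days(
    time_t now,
    time_t created)
{
    long age_seconds = (long) (now - created);

    /* adding half a day before dividing rounds to the nearest whole day */
    return (age_seconds + SECONDS_PER_DAY / 2) / SECONDS_PER_DAY;
}
```

For example, a post created 36 hours ago gets the value 2 and a post created 11 hours ago gets 0, so every GROUP BY key is a whole number of days.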

So this way you can derive statistics from your data using the ‘GROUP BY’ construct. The ability to write complex queries like this in SQL is really useful, especially when you access a remote database through web services (i.e. Data Services) or through database drivers, where you have to keep the number of SQL queries you execute to a minimum.

Here are some of the other aggregate functions that you may use with GROUP BY: http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html


© 2007 Dimuthu’s Blog