Use of MySQL ‘GROUP BY’ to Derive Statistics

We use ‘GROUP BY’ SQL construct to query the data with aggregating some rows according to a field. For an example say if your blog database store your blogs in a table call ‘Blog’ and it has ‘Date’ as a field. If so

SELECT count(*) FROM Blog GROUP BY Date

will give you a set of numbers that represent the number of blog you posted each day.

SELECT Date, count(*) FROM Blog GROUP BY Date

will give you a map of ‘date’ to ‘number of blog posted for that date’ without much trouble. 

Anyway the problem is most of the databases of blogs don’t just keep the ‘date’ for a blog, rather it keep both ‘date and time’ (say in a field called ‘Time’). But you still want to group by date. You may use the MySQL ‘DATE’ function to convert the ‘Date and Time value’ to just ‘Date’ and use it in GROUP BY statement.

SELECT Date, count(*) FROM Blog GROUP BY DATE(Time)

If you take Drupal for a blog database, it save the time of the blog entry as a unix timestamp. So you have to derive the Date from the timestamp using the infamous FROM_UNIXTIME mysql function,

In Drupal the database table name to store blog is ‘node’ and the field name to store the create time is ‘created’ . So your query to get statistics of Drupal would be something like this.

SELECT DATE( FROM_UNIXTIME(`created`)), count(*)
FROM `node`
GROUP BY DATE( FROM_UNIXTIME(`created`))

Rather than converting the timestamp to SQL Date format, You can convert Date to timestamp and your sql statement may look little mathematical.

SELECT ROUND( (
UNIX_TIMESTAMP( NOW( ) ) - `created` ) / ( 24 *60 *60 )
), count( * )
FROM `node`
GROUP BY ROUND( (
UNIX_TIMESTAMP( NOW( ) ) - `created` ) / ( 24 *60 *60 )
)

In fact the expression “UNIX_TIMESTAMP( NOW( ) )`created` ) / ( 24 *60 *60 )” derives a number that represent the age of the post in days.

So this way you can derive statistics of your data with the use of ‘GROUP BY” construct. The ability to write complex queries in SQL syntax like this is really useful, specially when you access a remote database through a web services (i.e. Data Services) or using database drivers, you have to minimize the number of sql queries to execute as minimum as possible.

Here are some of the other aggregate function that you may use with GROUP BY, http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html

This entry was posted in DataServices, drupal, SQL, Tutorial/Guide and tagged , , . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *