Aspirin for AWStats

March 11, 2008 – 10:41 am

It’s been a while since my last post, folks. Work has been keeping me busy lately, and I haven’t had much time to write about new things going on in the open source world. Hopefully the usefulness of this post will make up for my absence.

Something I spend a significant amount of time dealing with in my line of work is website statistics. Eventually every website owner wants to know who is visiting their site and how often. It’s a gauge, to some extent, of how popular your site is, and therefore a measure of your ability to create something people will be attracted to, and in some cases buy from. The problem is that web statistics are a tricky beast. No two stats packages are going to give the same results, and no matter how perfect the setup, you are never going to get a completely accurate picture of your website traffic. At best, you’re going to get an educated guesstimate of how many people have viewed your site.

There are dozens of different statistics applications out in the wild. By far the most common and most popular is AWStats. Of all the different choices, I would say AWStats provides the most complete and accurate picture you could get short of reading the raw logs yourself. But several problems can crop up if things aren’t just so with your server. I’ve especially seen problems on Windows servers, where permissions don’t seem to work the way you expect them to. Perhaps this is down to my own lack of knowledge, but regardless, one problem in particular has vexed me for years.

If, for one reason or another, AWStats goes for several days or months without updating correctly, you cannot just run an update and expect all your stats to magically appear. AWStats will parse the most recent log file and leave everything else blank. Up to now, the solution has been to go back and manually reparse each log file one at a time, in sequential order, from the very beginning of your logs (not just the beginning of the gap). You’ll also have this problem if you are installing AWStats on a site that has already been active for a while. Obviously this process can take hours. I’ve spent as much as two days reparsing the logs for one domain. Frankly, I don’t have time for this. So this morning, when I was faced with the prospect of having to read in two years’ worth of logs for 25 domains all at once, I resolved to find an automated solution, and here is what I came up with:

<?php  // AWStats Rebuild Script © Josh Benson 2008

// On a POSIX system, change these variables to point to the folder where the log folders
// for each site are kept, and to the full path of awstats.pl
$logfolder = "/var/www/logs/";
$awstats_bin = "/usr/local/awstats/wwwroot/cgi-bin/awstats.pl";

// Get listing of folders (sites) that we need to update
$listing = array();
$command = "ls $logfolder";
exec($command, $listing, $status);

// Go through each folder and update from each log file
foreach ($listing as $folder) {
    echo "Updating $folder...\n";
    $logfiles = array();
    $failed = FALSE;  // reset per site so one failure does not carry over to the next
    $sitename = "www." . $folder;
    // "ls -tr" lists oldest first, so the logs are parsed in chronological order
    $command = "ls -tr " . $logfolder . $folder;
    exec($command, $logfiles, $status);

    foreach ($logfiles as $logfile) {
        // Parse the logfile
        echo "\tReading $logfile...\n";
        $fullpath = $logfolder . $folder . '/' . $logfile;
        $command = "$awstats_bin -update -config=$sitename -LogFile=\"$fullpath\"";
        echo $command . "\n";
        $result = array();  // exec() appends to this array, so clear it each time
        exec($command, $result, $status);
        if ($status != 0) {
            echo "Log Parse Failed!\n";
            $failed = TRUE;
            echo $status . "\n";
        }
        else {
            echo "\tLog $logfile read ok...\n";
        }
    }
    if ($failed == TRUE) {
        echo "$sitename was not updated correctly!\n";
    }
    else {
        echo "$sitename updated successfully!\n";
    }
}
?>

This script assumes a couple of things about your setup. It assumes that, instead of having a log folder under the directory for each domain, you have one log folder with subfolders for each domain on your server. This isn’t the standard setup for a lot of people, but it’s easy enough to change and it’s a more efficient configuration in the long run. It also (obviously) assumes you’re running some flavor of Linux or BSD. In my case it was Fedora Core 4 with the RPM distribution of AWStats installed, but the script should be flexible enough to accommodate just about any distribution and configuration.
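If your logs live under each domain’s own directory instead, a small helper could build the list of log folders for the script to walk. This is only a rough sketch under stated assumptions: the function name find_log_dirs and the /var/www/<domain>/logs layout are hypothetical, chosen for illustration, and are not part of the script above.

```php
<?php
// Hypothetical helper: assumes a layout like <webroot>/<domain>/logs/.
// Neither the name nor the layout comes from the original script.
function find_log_dirs($webroot, $subdir = 'logs')
{
    $dirs = array();
    $pattern = rtrim($webroot, '/') . '/*/' . $subdir;
    $matches = glob($pattern, GLOB_ONLYDIR);   // may return false on error
    foreach ($matches ? $matches : array() as $path) {
        // The domain is the name of the directory that contains the log folder
        $dirs[basename(dirname($path))] = $path;
    }
    return $dirs;   // e.g. array('example.com' => '/var/www/example.com/logs')
}
```

Each domain and path pair could then stand in for $folder and $logfolder.$folder in the main loop.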

To run this script, open your favorite text editor, paste the code in, and save it as awupdate.php (or something similar) on your server. Since it uses full paths, it can be run from anywhere on the server by issuing “php awupdate.php” from the command line. I would not recommend running this from a web browser, since it generates a lot of text output to show you the status of the update. There is no reason you could not use this script on just one domain, or on one domain at a time, if you don’t mind editing it to reflect the correct directories.
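As an alternative to editing the directories by hand, the folder listing could be narrowed by names given on the command line. The following is a minimal sketch, not part of the script above: select_sites is a hypothetical helper name, and it assumes the arguments match the folder names under the log directory exactly.

```php
<?php
// Hypothetical helper: keep only the site folders named on the command line.
// $args is expected to look like PHP's $argv, where $args[0] is the script name.
function select_sites(array $listing, array $args)
{
    $wanted = array_slice($args, 1);
    if (count($wanted) == 0) {
        return $listing;   // no arguments given: process every site
    }
    // array_intersect keeps the original keys, so reindex the result
    return array_values(array_intersect($listing, $wanted));
}

// Possible use inside the script, right after the first exec() call:
// $listing = select_sites($listing, $argv);
```

With that in place, “php awupdate.php example.com” would rebuild only the example.com folder, while running it with no arguments would behave as before.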

Inevitably I’ll be posting a Windows version of this script, since I have to perform this type of update on Windows servers far more often than on Linux. The challenge on Windows is dealing with the different way the exec() function in PHP and the DIR command work. I’m not yet sure if DIR has command line options, like the “ls” command on Unix, to sort the results by time. After I do some more research, I’ll have an update about this in a few days. In the meantime, all you Linux/Apache admins can thank me by taking a few minutes out of the hours you’ll now have freed up to donate to my beer fund.
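One way to sidestep the DIR question might be to let PHP do the sorting itself, since scandir() and filemtime() are available on both Windows and Unix. The sketch below is a minimal illustration, not the promised Windows version: list_logs_oldest_first is a hypothetical helper name, and it assumes each log file’s modification time reflects the order in which the logs were written.

```php
<?php
// Hypothetical helper: returns the file names in $dir sorted oldest-first by
// modification time, roughly what "ls -tr" provides on Unix.
function list_logs_oldest_first($dir)
{
    $files = array();
    foreach (scandir($dir) as $name) {
        $path = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $name;
        if (is_file($path)) {   // skips ".", ".." and any subfolders
            $files[] = array('name' => $name, 'mtime' => filemtime($path));
        }
    }
    usort($files, function ($a, $b) {
        if ($a['mtime'] == $b['mtime']) {
            return strcmp($a['name'], $b['name']);   // tie-break by name
        }
        return ($a['mtime'] < $b['mtime']) ? -1 : 1;
    });
    $names = array();
    foreach ($files as $file) {
        $names[] = $file['name'];
    }
    return $names;
}
```

The returned list could replace the $logfiles array that the script currently fills from “ls -tr”, leaving the AWStats call itself unchanged.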
