Saturday, 1 March 2008

Lies, damn lies and web stats

Been a little while, but lots of things have been happening web-wise so I had better leave some thoughts here in case they fade from my mind (the main purpose of this blog is to prevent loss).

In my opinion, if you have anything to do with web applications then it is absolutely vital that your marketeers, sales people and the like are aware that web statistics are invariably generalizations. They are not exact and are not usually fit to be treated in the same way as accurate financial accounts. This does not detract from their value, but it does mean that you shouldn't waste endless hours trying to account for tiny discrepancies.

Of course it depends what kind of analysis you do, but increasingly people expect more than you get from the average server log. Many ways of getting very good information about your users are not 100% reliable. For example, Google Analytics provides a whole host of useful information, but that depends on your visitor having JavaScript enabled. This is not a problem if you accept that and make allowances for it.

There are many reasons why things may not sync up exactly. One that keeps haunting me is timestamps not matching up exactly between different pieces of the system, or between the database and the software application. Other reasons, apart from the JavaScript problem mentioned above, could be cookie problems (or even the absence of cookies), proxy servers (if you are monitoring site accounts by IP address), and data and the links between data actually changing between the time of the original event and the processing of the statistics.
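To make the timestamp problem concrete, here is a minimal sketch in Python. The two-second skew and the daily report boundary are assumptions of mine, purely for illustration. It shows how the same event can land in different days depending on which clock stamped it.

```python
from datetime import datetime, timedelta


def report_day(ts):
    """Return the calendar day a daily report would file this timestamp under."""
    return ts.date().isoformat()


# The application records the event one second before midnight.
app_time = datetime(2008, 2, 29, 23, 59, 59)

# Assumed: the database clock runs two seconds ahead of the application clock.
db_skew = timedelta(seconds=2)
db_time = app_time + db_skew

app_day = report_day(app_time)  # filed under 29 February by the application
db_day = report_day(db_time)    # filed under 1 March by the database
```

One event, two different days, and a daily total that will never quite match between the two sources.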

What can you do about it? Well, if the processing is complicated and you have to tie a lot of elements together to cook your final reports, it seems that it is always more reliable to capture as much of the relevant information as you need at the source, as it happens, and then log that. For a simple example, if you have a link between an account and an IP address range, then capture the user id and put it in your logs as it happens. If you really have a pressing need to capture usage information 100% accurately, then log each occurrence as it happens rather than trying to derive it from a URL in your log.
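As a rough sketch of what capturing at the source might look like, here is some Python that resolves the account from the IP address at the moment of the request and writes it into the log record, along with a single timestamp taken once. The account ids, IP ranges and function names are all made up for illustration; a real system would look these up wherever it actually keeps them.

```python
import ipaddress
import json
from datetime import datetime, timezone

# Hypothetical account-to-IP-range mapping (documentation address ranges only).
ACCOUNT_RANGES = {
    "acct-001": ipaddress.ip_network("192.0.2.0/24"),
    "acct-002": ipaddress.ip_network("198.51.100.0/25"),
}


def resolve_account(ip):
    """Return the account id whose range contains ip, or None if there is none."""
    addr = ipaddress.ip_address(ip)
    for account_id, network in ACCOUNT_RANGES.items():
        if addr in network:
            return account_id
    return None


def build_log_record(ip, url, now=None):
    """Build the record at request time, so the account link is frozen as it was."""
    now = now or datetime.now(timezone.utc)
    return {
        "timestamp": now.isoformat(),  # one timestamp, taken once, in UTC
        "ip": ip,
        "url": url,
        "account_id": resolve_account(ip),
    }


def log_line(ip, url, now=None):
    """Serialise the record as one JSON line, ready to append to a log file."""
    return json.dumps(build_log_record(ip, url, now), sort_keys=True)
```

The point is that the report no longer has to re-derive the account from the IP address weeks later, by which time the ranges may well have changed.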

Sometimes people treat web stats as absolute. This is not wise. Recently a number of people trying to sell us tools that included mailshot facilities proudly told us that their tool could track, 100% reliably, who actually opens the email you sent out. Of course they ended up looking a little foolish, as they were talking about tracking requests for tagged images in HTML emails, which, although it might be useful, is never going to be a 100% reliable method of determining whether someone opened the mail you sent.

