Back to Basics…Making a Case for Analyzing the Web Server Logs

Sometimes it takes a long time to clearly see what you want. I’ve been struggling for months with how to see where our latency comes from in an end-to-end manner. This is a major need for the team, not just a curiosity. It’s been incredibly challenging because the data is spread across many…many sources. Ideally we would be able to get the data within LoadRunner; if it was in LoadRunner, it could easily get into Galileo. As of late, Patrick and I have been talking about bringing third-party data sources into Galileo anyway…so having the data in LoadRunner is no longer a requirement.

So this got me thinking…our friends and colleagues from WebCT did an excellent job with their performance repository. It is something that even to this day I commend the team for. One tool they wrote was called the RequestCounter. This was a simple Perl script that created a CSV listing every request found in the web server log, the number of occurrences, and the response times of those requests. If we are able to implement a similar process, we can solve the problem we have today with LoadRunner and HTTP metrics. Right now LoadRunner is unable to store all HTTP operations performed during a test; it naturally filters the data out and focuses on a small percentage of it. Last I recall it was some ridiculous number, with only about 15% of all data available.
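To make the idea concrete, here is a minimal sketch of a RequestCounter-style pass in Python (the original was Perl, which isn’t reproduced here, so the regular expression, field names and CSV columns below are my own illustration). It assumes a "combined"-format log with an optional numeric response-time field appended at the end of each line:

```python
import csv
import re
import sys
from collections import defaultdict

# Matches an Apache "combined" log line, with an optional trailing
# response-time field (as %T or %D would append).  This pattern is an
# assumption about the log layout, not the original script's.
LINE_RE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+(?: "[^"]*" "[^"]*")?(?: (?P<rtime>\d+))?'
)

def request_counts(lines):
    """Return {path: [occurrences, total_response_time]} for matching lines."""
    stats = defaultdict(lambda: [0, 0])
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that don't parse
        path = m.group("path").split("?")[0]  # fold query strings together
        stats[path][0] += 1
        if m.group("rtime"):
            stats[path][1] += int(m.group("rtime"))
    return stats

def write_csv(stats, out=sys.stdout):
    """Emit one CSV row per distinct request."""
    w = csv.writer(out)
    w.writerow(["request", "count", "total_time"])
    for path, (count, total) in sorted(stats.items()):
        w.writerow([path, count, total])
```

Feeding it a log file (`request_counts(open("access_log"))`) and calling `write_csv` would yield the same shape of CSV the WebCT team produced: one row per request with its occurrence count and accumulated response time.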

There’s a little bit of work to make this happen. First, we need to make changes to our Bb configuration after installation: both Apache and IIS need to be modified in order to capture response times. Right now Apache is giving us a little bit of a problem. The %T option logs response times only in whole seconds. Apache 2 offers another option, %D, which logs microseconds, but it is not supported in Apache 1.3. In the meantime I’ve logged a case with Covalent/Spring Source (Incident 6165) to see if they can help. I haven’t yet dipped into IIS to see how easy or hard it will be to get the information we desire. Below is what we presently have for Apache. Note that we do not make use of %T right now.

Sadly, we can only get response-time data going forward. All of the application logs we held onto from previous tests won’t offer us much help in accomplishing this goal; we will only be able to accommodate this request by issuing a new log format.

# The following directives define some format nicknames for use with
# a CustomLog directive (see below).
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{User-Agent}i\" \"%{Cookie}i\"" common
LogFormat "%{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent
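Once %D is available to us (either via Covalent’s help or a move to Apache 2), the change would be a matter of appending it to the format we already use. The nickname `combinedtime` below is just an illustration, not an existing directive in our config:

```apache
# Hypothetical: "combined" plus %D (response time in microseconds, Apache 2+)
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %D" combinedtime
CustomLog logs/access_log combinedtime
```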
What Are We Trying to Accomplish?

There are probably a dozen or two reasons to accomplish this task. The immediate number-one reason is to treat each HTTP request as a transaction. We want to look at every single HTTP request (200s, 300s and 400s)…all POSTs and GETs that get recorded in the web server logs. I would like to be able to capture statistics on each HTTP object in an identical fashion to Galileo. Most importantly, we need to be able to look at total time from an HTTP object perspective; having that data is essential to narrowing our focus for investigation. One thing we should consider is studying individual sessions. Right now it would be pretty difficult to integrate the LoadRunner results with this log data. It would be great to be able to look at individual users, or even courses, from the perspective of the data model and examine their HTTP requests individually. I imagine a report that shows each user and a few key metrics about the session:
  • User Name
  • Time of First Request (Actual Date Format)
  • Time of Last Request (Actual Date Format)
  • HTTP Requests
  • Posts vs. Gets
  • Total Time of Session
  • Other metrics (TBD)
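The per-session report above could be sketched roughly as follows. This is an assumption-laden illustration: standard web server logs only identify a user if %u (or a session cookie) is populated, so the `records` shape here, the `Session` class, and the `summarize` helper are all hypothetical names of my own:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Session:
    """Key metrics for one user's session, per the report above."""
    first: datetime = None   # time of first request
    last: datetime = None    # time of last request
    requests: int = 0        # total HTTP requests
    posts: int = 0           # POSTs vs. GETs
    gets: int = 0
    total_time: int = 0      # total time of session (microseconds, if %D is logged)

def summarize(records):
    """records: iterable of (user, timestamp, method, response_time) tuples."""
    sessions = {}
    for user, ts, method, rtime in records:
        s = sessions.setdefault(user, Session())
        s.first = ts if s.first is None else min(s.first, ts)
        s.last = ts if s.last is None else max(s.last, ts)
        s.requests += 1
        if method == "POST":
            s.posts += 1
        elif method == "GET":
            s.gets += 1
        s.total_time += rtime
    return sessions
```

A second pass over one user’s records, grouped by request path instead of by user, would give the drill-down view of per-object summary statistics described next.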
It would be great to be able to click into each user individually and get summary statistics for each HTTP object, relative to that user's session. It would be very much like Coradiant.

Links to Follow-up Research


One thought on “Back to Basics…Making a Case for Analyzing the Web Server Logs”

  1. ez

    In less than one month we will finally be off WebCT/Blackboard Vista 3 and totally on Vista 8. Ever-inflated response times, especially during peak usage, were invaluable to us DBAs/sysadmins pushing hard 20 months ago to make the move. We started over with new databases and are carrying forward only what we need, so we can shed over 11TB. (The biggest ought to be around 3TB.)

    I was under the impression Coradiant was extensively used at Blackboard? Maybe it is not able to tell the difference between slow response times in the internal network and slow response times from the server?

