
What is 90-9-1?

If you asked me what 90-9-1 was back in 1992, I might have told you it was my favorite DC radio station (WHFS). In the context of performance, specifically response times, 90-9-1 is a new way to define performance thresholds. The first data point (90) represents the 90% of response times during a performance test that return in 0 to 2 seconds. The second data point (9) represents the 9% of response times that return in 2 to 10 seconds. The third and final data point (1) covers the 1% for which we accept response times greater than 10 seconds. We could technically throw in a second 1 at the end (90-9-1-1) to represent a 1% failure rate that we may be willing to accept during a performance test.
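To make the thresholds concrete, here is a minimal sketch of how you might bucket a set of measured response times against the 90-9-1 targets. The function names and sample data are my own illustration, not part of any real test harness:

```python
# Bucket response times (in seconds) against the 90-9-1 thresholds:
#   90% should return in 0-2 s, 9% in 2-10 s, and at most 1% beyond 10 s.

def bucket_percentages(response_times):
    """Return the percentage of samples landing in each 90-9-1 band."""
    total = len(response_times)
    fast = sum(1 for t in response_times if t <= 2)         # target: 90%
    medium = sum(1 for t in response_times if 2 < t <= 10)  # target: 9%
    slow = sum(1 for t in response_times if t > 10)         # budget: 1%
    return (100.0 * fast / total, 100.0 * medium / total, 100.0 * slow / total)

def meets_90_9_1(response_times):
    """True if the run satisfies the 90% fast and 1% slow constraints."""
    fast, _, slow = bucket_percentages(response_times)
    return fast >= 90.0 and slow <= 1.0

# Illustrative run: 90 fast samples, 9 medium, 1 slow.
samples = [1.0] * 90 + [5.0] * 9 + [11.0]
print(bucket_percentages(samples), meets_90_9_1(samples))
```

The extra 1 for failure rate (90-9-1-1) would just be a fourth counter over failed requests, computed the same way.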

Take a look at a blog of mine from a few weeks back where I debate our abandonment thresholds. I've had a few weeks to really think through that entry. Since then I've had the opportunity to watch the team collect 90-9-1 metrics for our 9.0 PVT and the Dell/VMware benchmark, as well as present at BbWorld about performance forensics. The more I think about responsiveness in the application, the less willing I am to cross the 10-second barrier.

Ten seconds is an eternity. It's the equivalent of response time purgatory… Well, maybe not that bad, but 10 seconds is still a long time. Count to 10 and tell me how annoyed you are by the end. This blog post isn't even a 10-second read… well, maybe only if you can read as fast as the Micro Machines guy.

Should We Reconsider Some of Our Abandonment Metrics and ClickPath Approaches?

I had an interesting conversation with Rob from our UI team a few days back. We were talking about the book I desperately want him to read (Designing and Engineering Time: The Psychology of Time Perception in Software) and some of the author's thoughts on page-level responsiveness. Specifically, I am starting to get a little concerned about our implementation of the page wait processor. It is of course funny that I have anxiety over it, since it was developed based on my original request.

I summed up my concerns with the following example…

Imagine you are in the lobby of a really big building. You aren't quite sure how many floors the building has, but you roughly estimate more than 40. There are 4 elevators available to service your request, but none is available to you, as each is servicing requests on other floors. The elevator call panel in the lobby is pretty simplistic: a call button with an up arrow and a down arrow (the building contains a basement, which is reachable by elevator, hence the down arrow). To the side of each elevator is a nondescript light in the shape of an up or down arrow. Above each elevator is a digital display showing the elevator's current floor. It does not say whether the elevator is going up or coming down; it simply shows the floor. Two of the four displays are not functioning properly, as they show no numbers at all, yet all four elevators appear to be in service, since none shows an Out-of-Service message.

You press the up button and begin waiting. After what feels like 10 seconds, you look up and notice that one of the elevators showing floor status has climbed from the 15th floor to the 28th and stopped. The other elevator with a working display shows the 42nd floor (aha… you at least know there are 42 floors, maybe more) and is descending. It too stops, pausing at the 20th. You have no idea what's happening with the other two elevators. Because there's no status, you assume maybe they aren't working, or that they'll come when they come. And because you have no information about those two, you end up focusing all of your attention on the two that do provide status.

Another 10 seconds passes… It's starting to feel like minutes. Each second feels like an eternity. You watch as the elevator that climbed to the 28th begins to descend, stopping every other floor, and each stop brings another 10-second pause. The elevator that was coming down has begun to climb again. You feel almost cheated: you assumed that would be the first elevator to reach you, and with the poof of a wand, or better yet an impatient glance from the corner of your eye, the elevator leaves you standing more frustrated than before.

I've set you up with a pretty descriptive amount of information thus far. What you are going through is what it feels like for users who experience our wait state page. You don't know whether something is coming or going. It almost feels like you've been taken hostage. The question remains: how could we have made it better? To answer that, I come back to the elevator example.

There are multiple problems with the elevator example:

  • Not all of the status lights are operational
  • You are unable to determine which direction each elevator is going
  • There's no information estimating how long until you are serviced

I think the third would be damn near impossible to solve unless one of the elevators was a high-speed express to the lobby, and even that might not be all that accurate, since it might pick up passengers along the way; someone could hold the door-open button. Still, there are definitely ways to make this problem more tractable. One would be to identify downward-traversing elevators that are candidates to travel directly to the lobby; these would identify themselves using the up or down arrow. Another would be to identify the elevator most likely to reach the lobby first, so the prospective passenger at least knows which elevator to watch and can estimate its arrival time. The key thing in both examples is that the elevator system provides constructive feedback to the waiting passenger, rather than spinning shiny wheels and a please-wait message.

Now that I put this in perspective, I ask myself, "What was I/we thinking? This feature is just a mesmerizing black hole that makes users feel no better than they felt before making the request." What really got me thinking about this was that I recently started working with Coradiant again. I noticed that one feature of the TrueSight device is a clock that tells you how long it takes the server to boot. During a WebEx with two of the TrueSight developers, I asked how they were able to measure the boot time so accurately. Couldn't there be factors that made it faster or slower? What if the estimate wasn't just slightly inaccurate, but off significantly? In our case, the TrueSight device was set to 300 seconds. Notice I didn't say 5 minutes; it was set to 300 seconds because seconds always seem a lot shorter than minutes. Lo and behold, it literally returned within +/- 2 seconds, which is amazing.
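The TrueSight countdown pattern can be sketched simply: calibrate an expected duration from past observations, then present the remaining wait as a count of seconds rather than minutes. This is only my illustration of the idea; the function names and sample boot times are invented, and this is not TrueSight code:

```python
# Sketch of a calibrated countdown: pick a conservative estimate from
# historical durations, then phrase the remaining wait in seconds.
import math

def calibrated_estimate(past_durations, safety_percentile=0.95):
    """Choose a high-percentile value from observed durations (seconds),
    so the countdown rarely overshoots and strands the user at zero."""
    ordered = sorted(past_durations)
    index = min(len(ordered) - 1,
                math.ceil(safety_percentile * len(ordered)) - 1)
    return ordered[index]

def countdown_message(estimate_seconds, elapsed_seconds):
    """Always phrase the wait in seconds -- 300 seconds, never 5 minutes."""
    remaining = max(0, estimate_seconds - elapsed_seconds)
    return f"About {remaining} seconds remaining"

# Hypothetical historical boot times, in seconds.
boots = [288, 301, 295, 310, 299]
estimate = calibrated_estimate(boots)
print(countdown_message(estimate, elapsed_seconds=60))
```

The design choice worth copying is the high percentile: a countdown that finishes early delights, while one that hits zero and keeps spinning is worse than no countdown at all.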

So I've gone a long way to say that we need to reconsider our approach to feedback progression for our users. We need to find a happier medium in which we provide a mechanism that holds them captive, but not hostage. Holding a user captive is an entirely challenging process considering that time is not what it used to be. Back in the day (circa 1999), if something took too long, we would simply tell our users to make the application request, walk away from the computer for a coffee break, and voilà, when you return your data will be available. That no longer applies in today's InterWeb (note: my uncle calls the Internet the InterWeb; he jokes about it being a TV with interconnected tubes and knobs that present stock quotes and Hollywood gossip), where speed is absolutely key.

We seriously need to reconsider our approaches to abandonment and calibration. Anything less than 2 seconds is really considered instantaneous or immediate. The natural flow of web activity sits at 2 to 5 seconds. Once we cross the 5-second line, we need to identify how we can maintain the attentiveness of our users without holding them hostage.
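Those cutoffs map naturally onto labeled bands. A minimal sketch, where the 2-, 5-, and 10-second thresholds come from the discussion above and the band names are my own shorthand:

```python
# Map a response time onto the perception bands discussed in the text.
# Cutoffs (2 s, 5 s, 10 s) come from the post; labels are shorthand.

def responsiveness_band(seconds):
    if seconds < 2:
        return "instantaneous"  # feels immediate; flow unbroken
    if seconds <= 5:
        return "continuous"     # natural web pacing; user stays engaged
    if seconds <= 10:
        return "captive"        # needs constructive feedback to hold attention
    return "hostage"            # abandonment risk without real progress info

for t in (0.8, 3.0, 7.5, 12.0):
    print(t, responsiveness_band(t))
```

Tagging each transaction in a test run with its band is one cheap way to answer the first question in the list below: which transactions fall outside the acceptable boundaries.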

What these data points are telling me is that we need to figure a few things out with regard to responsiveness:

  • Do we know which transactions in our system fall outside the boundaries of instantaneous, immediate, and continuous?
  • Do we have ideas for better ways to keep our users' attention when they fall into the captive range?
  • What should we do beyond 10 seconds so our users feel informed rather than hostage?
  • How unforgiving will our users be if we keep them informed but the responsiveness is beyond reason? For example, is it unfair to expect our reporting framework to generate detailed reports in seconds rather than minutes?
  • Could we have done something with the workflow of report generation to keep the user from feeling held hostage by our processing framework?
  • Could we potentially calibrate how long things will take and develop an algorithm to present that data back to the user so he/she feels accurately informed?

Before I forget…

The reason I wrote this blog came from a conversation I had with Rob Shea; I mentioned that above. What I didn't mention was that Rob and I discussed our approach to defining User Abandonment, as well as our ClickPath Analysis approach. I explained how there are cognitive factors involved in our performance testing approach: we consider transactional utility as well as human patience. I mentioned that our model is a model of percentages; everything in the model is a percentage of something.

One of those somethings was clickpaths. I talked about Markovian models and how we use probability models to determine how users traverse the system. Rob brought up a good point. We were talking about how there are multiple ways to do something in the system, and I said we tend to be biased in our models, skewing toward the idea that humans will look for the shortest path to accomplish something. Rob offered a great counterargument: humans will not necessarily skew toward the shortest path. Many will take the path of familiarity. So in essence, we could have users who have been trained to do something a particular way, and that way might not be the fastest. Those who take the familiar path are the least likely to take risks, or in our case, to look for the shortest path.
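The shortest-path bias and the familiarity bias can be expressed as two Markov transition models and blended. This is a sketch of the idea, not our actual workload model; the page names and probabilities are invented for illustration:

```python
# Blend a shortest-path-biased Markov model with a familiarity-biased one,
# then sample a click path. Page names and probabilities are illustrative.
import random

# Transition probabilities skewed toward the shortest route to the gradebook.
shortest_path_model = {
    "home":   {"course": 0.7, "search": 0.3},
    "course": {"gradebook": 0.8, "announcements": 0.2},
    "search": {"course": 1.0},
}

# The same transitions re-weighted toward a trained, familiar route.
familiar_path_model = {
    "home":   {"course": 0.3, "search": 0.7},
    "course": {"gradebook": 0.8, "announcements": 0.2},
    "search": {"course": 1.0},
}

def blend(model_a, model_b, weight_a=0.5):
    """Mix two transition models; weight_a is the share given to model_a."""
    blended = {}
    for page in model_a:
        targets = set(model_a[page]) | set(model_b[page])
        blended[page] = {
            t: weight_a * model_a[page].get(t, 0.0)
               + (1 - weight_a) * model_b[page].get(t, 0.0)
            for t in targets
        }
    return blended

def sample_path(model, start, steps, rng=None):
    """Walk the chain, choosing each next page by its transition weight."""
    rng = rng or random.Random(1)
    path, page = [start], start
    for _ in range(steps):
        if page not in model:
            break  # reached a terminal page
        targets, weights = zip(*model[page].items())
        page = rng.choices(targets, weights=weights)[0]
        path.append(page)
    return path

mixed = blend(shortest_path_model, familiar_path_model, weight_a=0.5)
print(sample_path(mixed, "home", steps=3))
```

The weight is the tunable admission that we don't know the split between path-optimizers and habit-followers; log analysis (below) is one way to estimate it.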

This is really a great point. Our distribution models could be highly flawed if we didn't consider the path of familiarity. How do you identify the path of familiarity? We could still perform log analysis; at least that would provide insight into where and how users traversed the system, even if it doesn't tell you on its own what is familiar and what is not.
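One way to get closer to familiarity from logs is to look at repetition per user: a path the same user walks over and over is a reasonable proxy for a familiar one. A minimal sketch, assuming sessions have already been parsed out of the access logs (the session data here is invented):

```python
# Estimate candidate "familiar" paths from session logs by counting how
# often each user repeats the same ordered sequence of pages.
from collections import Counter

# (user, ordered list of pages visited in one session) -- illustrative data.
sessions = [
    ("alice", ["home", "search", "course", "gradebook"]),
    ("alice", ["home", "search", "course", "gradebook"]),
    ("bob",   ["home", "course", "gradebook"]),
    ("alice", ["home", "search", "course", "gradebook"]),
]

def path_frequencies(sessions):
    """How often each distinct path appears across all sessions."""
    return Counter(tuple(path) for _, path in sessions)

def habitual_paths(sessions, min_repeats=2):
    """Paths a single user repeats -- a rough proxy for familiarity."""
    per_user = Counter((user, tuple(path)) for user, path in sessions)
    return {path for (_, path), n in per_user.items() if n >= min_repeats}

print(path_frequencies(sessions).most_common(1))
print(habitual_paths(sessions))
```

It's only a proxy: raw popularity and familiarity are different things, which is exactly the gap the paragraph above points out. But per-user repetition at least separates trained habit from one-off exploration.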