Monthly Archives: August 2009

Why Query Execution is So Important to Study in an E2E

In the past I have talked about how the two easiest areas of the application stack to study are the front-end client experience and the database. In this post I wanted to cover a few thoughts about E2E from a query perspective. I’m hoping that others on the team will add to my thoughts.

There’s a reason that I call out queries rather than the database. The database is a system component, whereas a query is transactional in nature. That’s an important differentiator, as our E2Es are supposed to be about the transaction and not the system. So when we look at queries from an E2E perspective, we are truly trying to isolate latency exhibited at the persistence tier. I like to classify the culprits of latency in the following four ways:

  • Data Access or Persistence Layer Design Anti-Patterns
  • Inefficient SQL Design
  • Poor database structural design to support querying of one or more entities
  • Lack of understanding of the data set the query interacts with

Data Access or Persistence Layer Design Anti-Patterns

To the naked eye, a query like SELECT * from TABLE XXX where COLUMN YYY = :1 looks harmless. Basically, it translates to select all columns from table XXX where the predicate condition on column YYY is equal to some value. A query like this is wrong in so many ways. The first and most obvious is that by requesting all columns, we are immediately introducing a Wide Load data access pattern. We have a very simple mechanism inside of our persistence framework to filter a SimpleSelect so that we retrieve only the columns we need. I can recall a few cases where we requested all columns of a 30-column table only to make use of 2 values. It turned out that one of the columns retrieved but never used was storing binary data. We essentially return mounds of unnecessary data over the wire only to throw it away at the presentation layer. What a waste!
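To make the contrast concrete, here is a minimal sketch using an in-memory SQLite table with made-up column names (not our actual schema or persistence framework) of a Wide Load select versus a filtered select:

```python
import sqlite3

# A hypothetical 30-column table; all names here are illustrative.
conn = sqlite3.connect(":memory:")
extra_cols = ", ".join(f"col{i} TEXT" for i in range(1, 29))
conn.execute(f"CREATE TABLE orders (order_id INTEGER, status TEXT, {extra_cols})")
conn.execute("INSERT INTO orders (order_id, status) VALUES (?, ?)", (1, "OPEN"))

# Wide Load: every column comes back over the wire, even though the
# caller only cares about two of the values.
wide = conn.execute("SELECT * FROM orders WHERE order_id = ?", (1,)).fetchone()

# Filtered select: ask only for the columns the presentation layer uses.
narrow = conn.execute(
    "SELECT order_id, status FROM orders WHERE order_id = ?", (1,)
).fetchone()

print(len(wide), len(narrow))  # 30 values per row versus 2
```

On a wide table with binary columns, that difference between 30 values and 2 is paid on every single fetch over the wire.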

What else is deceiving about this query (and I didn’t mention it) is that the persistence code calling this query could be calling it N times for N records. This so-called Round-Tripping pattern has been seen throughout the years in our application. More often than not it could have been avoided by working through the query needs with greater precision. I think it’s fairly obvious that making unnecessary round trips to the database is simply a waste. It’s also a waste from a query perspective, which I will go into in a little more detail below.
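Here is a similarly hedged sketch of the Round-Tripping pattern against the same kind of throwaway SQLite table, contrasted with a single set-based fetch; the table and column names are illustrative only:

```python
import sqlite3

# Hypothetical schema and data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE line_item (order_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO line_item VALUES (?, ?)", [(i, i * 1.5) for i in range(1, 101)]
)
order_ids = list(range(1, 101))

# Round-Tripping: N separate round trips for N records.
rows = []
for oid in order_ids:
    rows += conn.execute(
        "SELECT order_id, amount FROM line_item WHERE order_id = ?", (oid,)
    ).fetchall()

# The same result set fetched in a single round trip.
placeholders = ", ".join("?" for _ in order_ids)
rows_single = conn.execute(
    f"SELECT order_id, amount FROM line_item WHERE order_id IN ({placeholders})",
    order_ids,
).fetchall()

assert sorted(rows) == sorted(rows_single)
```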

Inefficient SQL Design

The best developers understand how to write good SQL. I say that with all seriousness of intention, because I am a firm believer that OO developers aren’t just coders in their given language. The best OO developers understand data access and entity modeling. SQL to these developers isn’t a foreign language that only DBAs understand. Rather, it’s a language in which they are fluent.

Sometimes people make mistakes…we all make mistakes, right? Inefficient SQL design is no different than making a mistake. It’s not as if the developer responsible for writing a particular query was trying to be malicious. He/she simply made a mistake. Being able to identify inefficient SQL is a priority for a performance engineer.

Inefficient SQL can come in many forms. The most common forms are the following:

  • Incorrect driving table
  • Poor join order
  • Incorrect join type

There are others as well. I think it’s important for us to simplify the problem a few ways. SQL is not inefficient merely because it runs slow; we could have inefficient SQL that runs fast. The inefficiency comes from poor use of interfaces and resources. Queries that make unnecessary logical I/Os are one example of inefficient SQL; they have most likely chosen a poor access path, join order and/or join type. Queries that have to be continuously parsed are another example. A third is failing to identify predicates that are necessary for minimizing the result set.
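As a rough illustration of two of these points, the sketch below (SQLite, purely for demonstration) contrasts literal SQL, which forces a fresh parse for every distinct value, with a bind variable, and shows how an execution plan can be inspected to check the access path:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, region TEXT)")

# Literal SQL: every distinct value produces brand-new statement text,
# which is what drives continuous parsing.
conn.execute("SELECT region FROM customer WHERE customer_id = 42").fetchall()

# Bind variable: one reusable statement regardless of the value supplied.
conn.execute("SELECT region FROM customer WHERE customer_id = ?", (42,)).fetchall()

# Checking the access path: EXPLAIN QUERY PLAN is SQLite's rough analog of
# an execution plan, enough to see an index search versus a full table scan.
for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT region FROM customer WHERE customer_id = ?", (42,)
):
    print(row)
```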

Poor Database Structural Design to Support Querying of One or More Entities

It goes without saying that it’s absolutely essential to make sure that the structural aspects of the database, specifically the DDL, are appropriately defined to support queries. Developers have to be aware of their entity structures. They also have to be aware of data access patterns, data orientation (the cardinality or makeup of the data) and frequency of data access in order to make design decisions about DDL.

As part of E2E, it’s imperative that we review the supporting DDL structures for our queries to ensure efficient query execution. It’s important to exercise some executions of different use cases that make use of a given entity or set of entities. The reason for that is to understand why certain index structures have been defined. They may have been defined specifically for a given use case and not considered for another use case that you just happen to be working on.
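The sketch below, again with a hypothetical table and index, shows why that review matters: a composite index defined for one use case may be unusable by a query written for another.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task (project_id INTEGER, owner_id INTEGER, status TEXT)")

# A composite index defined with use case A in mind: lookups by project, then owner.
conn.execute("CREATE INDEX task_proj_owner ON task (project_id, owner_id)")

queries = {
    "use case A": ("SELECT status FROM task WHERE project_id = ? AND owner_id = ?", (1, 2)),
    "use case B": ("SELECT status FROM task WHERE owner_id = ?", (2,)),
}
for name, (sql, params) in queries.items():
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Use case A can seek on the composite index; use case B cannot use the
    # leading column and will typically fall back to scanning the table.
    print(name, plan)
```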

Lack of Understanding of the Data Set the Query Interacts With

This last point is what I would call “King of the Obvious” from an SPE perspective. If you get a chance to read my old blog post, plus the attached article from Karen Morton, you will really get a sense of how important it is to understand the data you are interacting with. As I mentioned above, how do you detect Wide Load anti-patterns if you don’t know what data you need? What about the value of an index? If the data doesn’t have much variation, does it really make sense to index a column for which the optimizer is most likely going to choose a Full Table Scan anyway?
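One cheap way to get a feel for the data before accepting an index is to look at the selectivity of the candidate column. A hypothetical example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_id INTEGER, active_flag TEXT)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [(i, "Y" if i % 100 else "N") for i in range(1, 100_001)],
)

# Selectivity of a candidate index column: distinct values / total rows.
# A flag column like this has almost no variation, so an index on it rarely
# beats a full table scan for the common ('Y') value.
distinct, total = conn.execute(
    "SELECT COUNT(DISTINCT active_flag), COUNT(*) FROM account"
).fetchone()
print(f"selectivity = {distinct / total:.5f}")  # 0.00002
```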

At the heart of SPE, we preach that we understand the functional and the technical. This last point is really an amalgam of the two. You really need to know your data needs (the orientation of the data you are querying) in order to move forward with a SQL optimization or SQL acceptance.

Relevance of Time on Task to SPE

Time on task is a simple way of studying the time it takes to perform a given function or task. It’s an effective way of measuring the efficiency of a workflow or design. Time on task can be quantified by studying the elapsed time from the beginning of a task until the end of a task.
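In its simplest form, the measurement is nothing more than a timer wrapped around the task. A trivial sketch, with a stand-in task:

```python
import time

def measure_time_on_task(task):
    """Return elapsed seconds from the beginning of a task until its end."""
    start = time.perf_counter()
    task()
    return time.perf_counter() - start

# Stand-in for a real task; in an actual study this would be the user's
# workflow, instrumented with a timer or a recording tool.
elapsed = measure_time_on_task(lambda: time.sleep(0.25))
print(f"time on task: {elapsed:.3f}s")
```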

More often than not the phrase is used as a measurement within usability testing and human factors. There’s a fairly basic notion that if a user can perform operation X in Y amount of time, where Y is desirable, then the experience is better. That blends well with most of my beliefs and studies about page responsiveness. The faster a page responds, the more engaged and comfortable the user becomes interacting with the page. Could you imagine a user actually complaining about how fast a page responded?

I have to admit, I have yet to work with a user who complained a page request was too fast. There have been some cases where users questioned the experience entirely; speed was an attribute of their questioning, but the context centered on the validity of the data. In the few isolated cases I can recall users questioning responsiveness, it was the data returned that was being questioned. The page request may have come back in sub-second time, but the data was disjointed or inaccurate. In those cases the experience was botched.

It’s not necessarily fair, either, to say that time on task is better when it’s faster. In an academic sense, there’s widely accepted research that suggests faster isn’t always better. I’m by no means arguing whether this is right or not. What I can say is that in many educational scenarios it makes sense to determine the appropriate boundaries of time to complete a task, rather than assuming faster is better.

So Why Exactly is Time on Task Important?

Inside the software world, it’s important for certain tasks to take minimal amounts of time. Tasks that require minimal thinking and are unlikely to exercise the brain are optimal candidates for short time-on-task measurements. Tasks that are performed repeatedly by users are additional candidates, though with repetition of a task comes the opportunity to optimize the workflow. Critical tasks that can make or break adoption of a feature set are also candidates for short time on task. I believe they are important purely from the perspective that if a task is too clunky to perform or simply takes too long, users performing that task are going to quickly become frustrated. The savviest of users will look for shortcuts. When they can’t find the shortcut, they either lose interest or abandon. Both cases directly affect adoption of a new task.

How Time on Task is Applied to Usability

Usability engineers use time on task as a core metric for observing the efficiency of a task. Elapsed time is often measured directly (a stopwatch or embedded timers) or indirectly via recording tools. Multiple samples of the same task are studied and analyzed. Often the data is presented in the same fashion we present data in Performance Engineering: mean values (specifically the geometric mean) to complete the task, as well as confidence intervals (UCI and LCI), are studied to present a statistical view of time on task.
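As a small illustration with made-up sample values, the geometric mean and an approximate 95% confidence interval (assuming a normal approximation) could be computed like this:

```python
import math
import statistics

# A handful of hypothetical time-on-task samples for one task, in seconds.
samples = [12.4, 9.8, 15.1, 11.2, 10.6, 13.9, 12.0, 14.3]

# Geometric mean: less sensitive to the occasional very slow attempt.
geo_mean = statistics.geometric_mean(samples)

# Approximate 95% confidence interval around the arithmetic mean,
# giving the upper (UCI) and lower (LCI) bounds.
mean = statistics.mean(samples)
sem = statistics.stdev(samples) / math.sqrt(len(samples))
lci, uci = mean - 1.96 * sem, mean + 1.96 * sem

print(f"geometric mean = {geo_mean:.2f}s, 95% CI = [{lci:.2f}s, {uci:.2f}s]")
```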

Initial Thoughts on Time on Task Relevance to SPE

I think the key piece of information that needs to be applied to SPE has to do with task efficiency. When it comes to responsiveness, we try to place a cognitive value on a task. The way we do that is to apply a utility value for performing the task. We combine the utility value with a patience rating of a user. The combined utility plus patience dictates the abandonment factor.

I believe time on task is in fact the missing piece of data that would make our abandonment decisions more meaningful. Come to think of it, our abandonment decisions are really guesses about what we believe will be the rational behavior of a user who becomes frustrated. The approach uses arbitrary response time factors to determine whether a user will become frustrated or not.
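Purely to illustrate the idea (this is not our actual model, and every name and weight below is made up), expected time on task could replace those arbitrary response time factors as the baseline the utility and patience values are weighed against:

```python
# A toy sketch, not our actual model: every name and weight below is made up.
# The idea is to weigh a task's utility and the user's patience against how
# far the actual response time runs past an expected time-on-task baseline.
def abandonment_risk(utility, patience, response_time, expected_time_on_task):
    if response_time <= expected_time_on_task:
        return 0.0
    overage = (response_time - expected_time_on_task) / expected_time_on_task
    tolerance = utility * patience  # high utility and patient users tolerate more
    return min(1.0, overage / max(tolerance, 0.1))

# A low-utility task that takes three times its expected time is a likely abandonment.
print(abandonment_risk(utility=0.3, patience=0.5, response_time=9.0,
                       expected_time_on_task=3.0))
```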

How exactly can we be the authority on the likelihood of abandonment if we do not have much context on expected time on task?

Some Factors Driving UI Performance Analysis

Sometimes it’s hard for performance engineers to wrap their minds around when it’s time to study a transaction in isolation versus going through a full-scale design of experiment. I wanted to put down some thoughts about what drives a UI performance analysis project. What I mean by a UI performance analysis project is a profiling project in which we study the end-to-end response time characteristics across our certified browsers. We have been calling this work E2E, which it is.

We first start by measuring end-to-end round trip response time for all of our browsers. Subsequently, we profile each browser to understand the impact of latency caused by the browser platform. Next we profile at the application tier. Finally, we study query execution at the database tier. Our end goal is that we can take a transaction and decompose latency at each layer in the end-to-end pipeline.
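The end state we are after looks something like the toy decomposition below; the layer names echo the steps above, and the numbers are purely illustrative:

```python
# A toy decomposition of one transaction's end-to-end response time into the
# layers described above; the numbers and layer names are purely illustrative.
e2e_total_ms = 1840
layer_ms = {
    "browser (parse/render/script)": 420,
    "network": 160,
    "application tier": 710,
    "database / query execution": 480,
}
unaccounted_ms = e2e_total_ms - sum(layer_ms.values())
for layer, ms in layer_ms.items():
    print(f"{layer:32s} {ms:5d} ms  ({ms / e2e_total_ms:5.1%})")
print(f"{'unaccounted':32s} {unaccounted_ms:5d} ms")
```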

I’ve decided to put down some initial thoughts about the factors that should be used by SPEs when recommending a UI Performance Analysis project. I’ll briefly summarize below.

Criteria for End-to-End Analysis

1. New Design Pattern: At the heart of SPE, we as performance engineers are to identify good design patterns and call out poor anti-patterns. By design patterns I am talking about not only API patterns, but also new approaches to workflow and client-side interaction from the UI. Whenever a new interface design pattern is introduced, we should study the behavioral characteristics of this interface across multiple browsers.

2. Variable Workload: Any time we allow the user to interact with a flexible or variable workload of data from a single page request, then we should without a doubt study how the workload affects page responsiveness.

3. Rich User Interface: Without a doubt, the richer the interface, the greater the need to study UI performance behavior.

4. Predictable Model of Concurrency: My argument about predictable models of concurrency is that use cases should be studied under non-concurrent scenarios in order to understand the service time of a single request. Once this is understood, a clearer picture can be had of the model under concurrent conditions.

5. Core State or Action: I am a firm believer that when we introduce a use case that will most likely change session behavior of a user, then it should be studied. If we essentially force users to perform a particular operation or traverse a particular page, then it should be studied.

6. Use Case Affecting User Adoption: This is a fairly broad statement. What I am getting at is that when a use case is going to increase adoption of the product or tool, then it’s a worthy candidate for study. For example, a few weeks ago 1-800-Flowers was the first to set up a commerce site on Facebook. Their ultimate goal is to drive sales for their company. The underlying goal for a company like Facebook in enabling such applications is to keep the application sticky so more users adopt and remain loyal to the platform.

7. Resource and/or Interface Intensive Transaction Hypothesized: What I mean by this is that as SPEs we hypothesize whether a transaction will be resource and/or interface intensive as part of our modeling efforts. If we have even a shred of doubt about whether a transaction will impact the system execution model, then it should be an immediate candidate for analysis.

8. Transactions Affecting Cognition: We need to call out transactions that affect how users perceive the transactions they interact with. Users have response time expectations. When those expectations are not met, users become impatient and/or abandon. Ultimately, poor responsiveness decreases adoption.

Interesting Visualization of Twitter

I stumbled across this site called Tori’s Eye, which takes search input and visualizes tweets from Twitter matching that input. It’s an interesting approach to captivating a user’s attention, though watching little birdies flash across the screen is mind-numbing to say the least. It’s something my 3-year-old daughter would love.


I think something like this would be a great teaching and communication capability for K-12…especially for 3rd grade and lower. It would be a great way to teach children to interact with computers, while also demonstrating time-based data.

There’s a similar audio experiment of the same likeness going on here as well.