Do the Chances of Performance Issues Increase when Functional Bugs Exist?
On my way home from work this evening I gave Patrick a call to gain his perspective on an idea I had. Well, it’s not really an idea, but a question. Could it be possible for us to determine whether a sub-system or even a use case is more likely to experience performance issues if the sub-system or use case contains a high number of functional bugs? It’s an interesting question and I’m hoping we can gain some insight into this matter.
My personal hypothesis on the matter, granted this is not scientific or conclusive at this point, is that we would most likely be able to draw a correlation. Why, you might ask? Well, I think it’s pretty obvious. If something is buggy, it’s most likely frail or poorly understood by the developer(s) who worked on it. Why would we expect something that is functionally buggy to be high-performing or scalable?
I’m going to spend the next few weeks trying to demystify this problem. I’m curious whether anyone out there has thoughts on the matter.
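One way to start testing the hypothesis is a rank correlation between functional bug counts and performance issue counts per sub-system. Here is a minimal Python sketch. The sub-system names and counts are invented purely for illustration, and Spearman's rank correlation is only my assumption of a reasonable first measure, not something we have settled on.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation computed on the ranks.

    Raises ZeroDivisionError if either input has no variation at all.
    """
    rx, ry = ranks(xs), ranks(ys)
    mx, my = mean(rx), mean(ry)
    covariance = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return covariance / (var_x * var_y) ** 0.5


# Hypothetical data: (functional bugs, performance issues) per sub-system.
counts = {
    "assessments": (42, 9),
    "gradebook": (30, 7),
    "discussions": (12, 2),
    "announcements": (5, 1),
    "calendar": (8, 3),
}

bugs = [pair[0] for pair in counts.values()]
perf_issues = [pair[1] for pair in counts.values()]
print(f"Spearman rho = {spearman(bugs, perf_issues):.2f}")
```

A value near 1 would support the hypothesis and a value near 0 would not, though with only a handful of sub-systems the result is suggestive at best, and a correlation would not tell us which problem causes the other.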
Could there be a home for Episodes in our product…How about Jiffy?
A little over a year ago, Steve Souders put together a presentation about a web page timing framework called Episodes. I guess that very early in Souders’ career at Google either he asked to write about the idea or someone put the bug in his ear. Either way, Souders appears to be one of the only people out there talking about Episodes. My personal belief is that when Souders speaks, we should all listen.
According to Souders…
Episodes is a web performance measurement framework that solves these issues. It has the following key features:
- Supports measuring Web 2.0 applications by having the timing instrumentation integrated with the application’s client code.
- Separates the instrumentation from the data collection. This reduces the work for the application developer, allows multiple services to consume and report the information, and results in a lighter weight implementation.
- Is Open Source, gathering the best practices from across the industry without bias to any company or organization.
- Provides a single framework that can be used by web developers, tool developers, browser developers, and web metrics service providers.
The goal is to make Episodes the industrywide solution for measuring web page load times. This is possible because Episodes has benefits for all the stakeholders. Web developers only need to learn and deploy a single framework. Tool developers and web metrics service providers get more accurate timing information by relying on instrumentation inserted by the developer of the web page. Browser developers gain insight into what’s happening in the web page by relying on the context relayed by Episodes.
Most importantly, users benefit by the adoption of Episodes. They get a browser that can better inform them of the web page’s status for Web 2.0 apps. Since Episodes is a lighter weight design than other instrumentation frameworks, users get faster pages. As Episodes makes it easier for web developers to shine a light on performance issues, the end result is an Internet experience that is faster for everyone.
What I find interesting about the idea is the programmatic nature of embedding a timing framework into a web request. This is not new; in fact, Jiffy was one of the first to do such a thing. Either framework, in my opinion, is worth investigating.
I haven’t seen much web traffic on either Jiffy or Episodes in a while, so I ended up sending a quick note to Steve Souders to get his perspective on what’s happening next with either framework. Hopefully he will respond…
Either way I would love for us to investigate…
Impending Evaluation of Doloto
For the past year I’ve talked about a tool coming from Microsoft called Doloto. Doloto is an optimization tool for AJAX applications. Doloto makes AJAX applications run faster by stubbing out unnecessary code so that the application can start fast, and then downloading the necessary code on-demand if it is actually needed. This way application execution and code download is interleaved. It also means that with Doloto, rarely used code is rarely downloaded. I’m not sure if this could really apply to our code base, but it’s worth taking a look and evaluating.
It’s hard to say what Doloto is. If you take a look at this recent briefing, you get the sense that it could be a browser performance solution, but really it’s a server side optimization framework. It looks like Doloto takes large JavaScript functions and decomposes them into smaller pseudo code blocks which can be downloaded on demand. As you dig deeper, it looks like Doloto is simply some form of optimization proxy between the browser and the application server.
According to this blog Doloto does the following:
- Doloto profiles your application. Doloto performs profiling by running a local proxy on your machine that intercepts JavaScript files and instruments them to capture timestamps at runtime for every JavaScript function in a browser-independent manner.
- Profiling information is used to calculate code coverage and a clustering strategy. This determines which functions are stubbed out and which are not and groups functions into batches which are downloaded together, called clusters.
- Doloto rewrites JavaScript code. It then saves it to disk so that you can upload it to the server. The entire process happens on your machine, without needing access to the server. This way, you can profile and optimize the JavaScript of any third-party site without special access to their servers. When you are satisfied with Doloto’s results, you can deploy the rewritten files to the server.
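To make the stubbing idea concrete for myself, here is a small Python analogy. This is not how Doloto is implemented (Doloto rewrites JavaScript, and I have not looked at its output); `make_stub`, `fake_loader` and the `render_report` function are all hypothetical names I made up to show the general pattern of replacing a function with a stub that fetches the real body on first use.

```python
def make_stub(name, loader, namespace):
    """Register a stub under `name` that fetches the real function on first call."""
    def stub(*args, **kwargs):
        real = loader(name)        # stands in for the on-demand code download
        namespace[name] = real     # swap the stub out so later calls are direct
        return real(*args, **kwargs)

    namespace[name] = stub
    return stub


downloads = []  # records which function bodies were actually fetched


def fake_loader(name):
    """Hypothetical loader; a real one would fetch code over the network."""
    downloads.append(name)
    bodies = {"render_report": lambda rows: f"{len(rows)} rows rendered"}
    return bodies[name]


functions = {}
make_stub("render_report", fake_loader, functions)

print(downloads)                              # nothing is fetched at start-up
print(functions["render_report"]([1, 2, 3]))  # first call triggers the fetch
print(functions["render_report"]([4]))        # later calls use the real code
print(downloads)                              # the body was fetched only once
```

The point of the sketch is the trade-off we would be evaluating: start-up gets cheaper because rarely used code is never fetched, but the first call to a stubbed function pays the download cost.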
What am I Looking to Do?
First, I want someone to get a better understanding about what exactly Doloto is. Second, I would love for someone to play around with it, set it up and design a few experiments that make use of it. It could be that we find out very quickly that our application will struggle to work inside of a Doloto proxy and subsequently fail to leverage the advice coming from the profile.
The Kitchen Sink is Overflowing
So I would like to apologize to my few, but much appreciated readers for my nearly 2 month absence from blogging. I can promise that a new blog will be showing up regularly. I will try my best to get something posted shortly. I figured I would use this blog as a chance to clear the mind.
To say I’ve been busy is an understatement. On the personal front, my 8-month old still is not sleeping through the night. She wakes up at either midnight and 5am or 2am and 6am. Either way, it’s a total downer. As much as “Little McGoo” is cute as a button, sadly I had to admit to my wife that the best part about getting to travel is the opportunity to get a full night’s sleep. The lack of sleep is wearing thin on my patience. Sadly, it’s taking a toll on my wife and my oldest daughter.
The work front is what’s truly wearing on me. I normally don’t write about my work situation. I love my job, my team and my company. I’m just a little tired from work. I’ll use the analogy “the kitchen sink is overflowing” to describe what’s going on. My company is very ambitious…I’m very ambitious as well. Sometimes juggling the volume of work, plus the personal/family side of life simply wears you down. I’ve simply got too many things going on.
What I lack these days is some me time. I’m sure my wife feels the same…she’s right in that she’s lacking the same thing. Maybe it’s the rain as of late (it feels like it has rained every waking moment for the past 3 weeks). Or maybe it’s the fact that I’ve been slacking at working out. I’m not really sure what it is besides being a little overwhelmed.
I did have one parting story…so yesterday I went to Barnes and Noble to see if there are any new books out there and in some ways to get some quiet time. I went to Barnes and Noble probably 8 times throughout the summer. Each time I did my normal route:
- Went to New Non-Fiction
- Strolled over to B&N Recommends
- Wandered over to the Computer/Software Section
- Next to the Economics, Management and Business Profile Racks
- Took the escalator up to the Books on CD
- Jumped around the corner to the Science/Psychology Section
For the 9th consecutive time I went home empty handed. It just feels like no good, new books are coming out. Maybe it’s me…or maybe I’ve just read them all. Regardless, the fact that I don’t have a good book to read and haven’t since last Spring is probably one of the factors affecting me. I always feel as though I have balance in my life when I make time to read.
Why Query Execution is So Important to Study in an E2E
In the past I have talked about the two easiest areas of the application stack to study are the front-end client experience and the database. I wanted to cover a few thoughts about E2E from a query perspective in this blog. I’m hoping that others on the team will add to my thoughts.
There’s a reason that I call out queries rather than the database. The database is a system component whereas a query is transactional in nature. That’s an important differentiator as our E2E’s are supposed to be about the transaction and not the system. So when we look at queries from an E2E perspective, we are truly trying to isolate latency exhibited at the persistence tier. I like to classify culprits of latency in the following four ways:
- Data Access or Persistence Layer Design Anti-Patterns
- Inefficient SQL Design
- Poor database structural design to support querying of one or more entities
- Lack of understanding of data set the query interacts with
Data Access or Persistence Layer Design Anti-Patterns
To the naked eye a query like SELECT * from TABLE XXX where COLUMN YYY = :1 looks harmless. Basically, it translates to select all columns from table XXX where the predicate condition on column YYY is equal to some value. A query like this is so wrong in so many ways. The first and most obvious is that by requesting all columns, we are immediately introducing a Wide Load data access pattern. We have a very simple mechanism inside of our persistence framework to filter a SimpleSelect in order to retrieve only columns we need. I can recall a few cases where we requested all columns of a 30-column table only to make use of 2 values. It turns out one of the columns not used, but retrieved, was storing binary data. We essentially return mounds of unnecessary data over the wire only to throw it away at the presentation layer. What a waste!
What else is deceiving about this query (and I didn’t mention it) is that the persistence code calling this query could be calling it N times for N records. This so-called Round-Tripping pattern has been seen throughout the years in our application. More often than not it could have been avoided by working through the query needs with greater precision. I think it’s fairly obvious that making unnecessary round trips to the database is simply a waste. It’s also a waste from a query perspective. I will go into this in a little more detail below.
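Here is a minimal sketch of the two anti-patterns side by side, using Python’s built-in sqlite3 module rather than our own persistence framework. The `course_item` table, its columns and both function names are hypothetical and exist only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE course_item ("
    "id INTEGER PRIMARY KEY, title TEXT, owner TEXT, payload BLOB)"
)
conn.executemany(
    "INSERT INTO course_item (id, title, owner, payload) VALUES (?, ?, ?, ?)",
    [(i, f"item-{i}", "steve", b"x" * 1024) for i in range(1, 6)],
)


def titles_round_tripping(conn, ids):
    """Anti-pattern: one SELECT * per id, so wide rows and N round trips."""
    result = {}
    for item_id in ids:
        row = conn.execute(
            "SELECT * FROM course_item WHERE id = ?", (item_id,)
        ).fetchone()
        result[row[0]] = row[1]  # only two of the four columns are ever used
    return result


def titles_single_query(conn, ids):
    """One statement that names only the columns the caller needs."""
    placeholders = ", ".join("?" for _ in ids)
    sql = f"SELECT id, title FROM course_item WHERE id IN ({placeholders})"
    return dict(conn.execute(sql, list(ids)).fetchall())


wanted = [1, 3, 5]
print(titles_round_tripping(conn, wanted) == titles_single_query(conn, wanted))
```

Both functions return the same answer. The difference is that the second makes one trip to the database and never pulls the binary `payload` column over the wire.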
Inefficient SQL Design
The best developers understand how to write good SQL. I say that with all seriousness because I am a firm believer that OO developers aren’t just coders in their given language. The best OO developers understand data access and entity modeling. SQL to these developers isn’t a foreign language that only DBAs understand. Rather, it’s a language in which they are fluent.
Sometimes people make mistakes…we all make mistakes, right? Inefficient SQL design is no different than making a mistake. It’s not like the developer responsible for writing a particular query was trying to be malicious. He/she simply made a mistake. Being able to identify inefficient SQL is a priority of a performance engineer.
Inefficient SQL can come in many forms. The most common form would be any of the following:
- Incorrect driving table
- Poor join order
- Incorrect join type
There are others as well. I think it’s important for us to simplify the problem in a few ways. SQL is inefficient not necessarily because it runs slow. We could have inefficient SQL that runs fast. The inefficiency comes from poor use of interfaces and resources. Queries that make unnecessary logical I/Os are an example of inefficient SQL. They most likely have chosen a poor access path, join order and/or join type. Queries that have to be continuously parsed are another example of inefficient SQL. A third is failing to identify predicates that are necessary for minimizing our available result set.
Poor Database Structural Design to Support Querying of One or More Entities
I think it goes without saying that it’s absolutely essential to make sure that the structural aspects of the database, specifically the DDL, are appropriately defined to support queries. Developers have to be aware of their entity structures. They also have to be aware of data access patterns, data orientation (cardinality or makeup of the data) and frequency of data access in order to make design decisions about DDL.
As part of E2E, it’s imperative that we review the supporting DDL structures for our queries to ensure efficiency of query execution. It’s important to exercise some executions of different use cases that make use of a given entity or set of entities. The reason for that is to understand why certain index structures have been defined. They may have been defined specifically for a given use case and not considered for another use case that you just happen to be working on.
Lack of Understanding of Data Set the Query Interacts With
This last point is what I would call “King of the Obvious” from an SPE perspective. If you get a chance to read my old blog post, plus the attached article from Karen Morton, you will really get a sense of how important it is to understand the data you are interacting with. As I mentioned above, how do you detect wide load antipatterns if you don’t know what data you need? What about the value of an index? If the data doesn’t have much variation, does it really make sense to index a column on which the optimizer is most likely going to choose a Full Table Scan?
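A quick way to get a feel for whether a column has enough variation to deserve an index is to look at its selectivity. The sketch below is a rough heuristic only: the `status` and `user_id` columns are hypothetical, and a real optimizer decides based on its own statistics and cost model, not on these two numbers.

```python
from collections import Counter


def selectivity(values):
    """Distinct values divided by total rows; 1.0 means every row is unique."""
    if not values:
        raise ValueError("need at least one row")
    return len(set(values)) / len(values)


def largest_value_share(values):
    """Fraction of rows matched by the most common value."""
    counts = Counter(values)
    return counts.most_common(1)[0][1] / len(values)


# Hypothetical column contents for a 100-row table.
status = ["ACTIVE"] * 95 + ["DISABLED"] * 5
user_id = list(range(100))

print(selectivity(status), largest_value_share(status))    # 0.02 0.95
print(selectivity(user_id), largest_value_share(user_id))  # 1.0 0.01
```

An equality predicate on `status = 'ACTIVE'` would touch 95% of the rows, so an index on that column buys little. The same predicate on `user_id` touches a single row, which is where an index pays for itself.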
At the heart of SPE, we preach that we understand the functional and the technical. This last point is really an amalgam of the two. You really need to know your data needs (the orientation of the data you are querying) in order to move forward with a SQL optimization or SQL acceptance.
Relevance of Time on Task to SPE
Time on task is a simple way of studying the time it takes to perform a given function or task. It’s an effective way of measuring the efficiency of a workflow or design. Time on task can be quantified by studying the elapsed time from the beginning of a task until the end of a task.
More often than not the phrase is represented as a measurement within usability testing and human factors. There’s a fairly basic notion that if a user can perform X operation in Y amount of time, where Y is desirable, then the experience is better. That pretty much blends well with most of my beliefs and studies about page responsiveness. The faster a page responds, the more engaged and comfortable the user becomes with interacting with the page. Could you imagine a user actually complaining about how fast a page responded?
I have to admit, I have yet to work with a user who complained a page request was too fast. There have been some cases where users questioned the experience entirely. Speed was an attribute of their questioning. The context centered around the validity of data. In the few isolated cases that I can recall users questioning responsiveness, it was that the data returned was questioned. The page request may have come back in sub-second, but the data was disjointed or inaccurate. In this case the experience was muffed.
It’s not necessarily fair either to say that time on task is better when it’s faster. From an academic sense, there’s widely accepted research that suggests faster isn’t better. I’m by no means going to argue whether this is right or not. What I can say is that in many educational scenarios, it makes sense to me to determine the appropriate boundaries of time to complete a task rather than assume faster is better.
So Why Exactly is Time on Task Important?
Inside the software world, it’s important for certain tasks to take minimal amounts of time. Tasks which require minimal thinking and are unlikely to exercise the brain are optimal candidates for short time on task measurements. Tasks that are performed redundantly by users are additional candidates, though with redundancy of task comes the opportunity for optimization of workflow. Critical tasks that can make or break adoption of a feature set are also candidates for short time on task. I believe they are important purely from the perspective that if a task is too kludgy to perform or simply takes too long, users performing that task are going to quickly become frustrated. The savviest of users will look for shortcuts. When they can’t find the shortcut, they either lose interest or abandon. Both cases directly affect adoption of a new task.
How Time on Task is Applied to Usability
Usability engineers use time on task as a core metric for observing the efficiency of a task. Elapsed time is often measured directly (stopwatch or embedded timers) or indirectly via recording tools. Multiple samples of the same task are studied and analyzed. Often the data is presented in the same fashion we present data in Performance Engineering. Mean values (specifically the Geometric Mean) to complete the task, as well as confidence intervals (UCI and LCI), are studied to present a statistical view of time on task.
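For anyone who has not computed these before, here is a minimal sketch of a geometric mean with lower and upper confidence bounds. It works on the logarithms of the task times and uses a normal approximation for the interval, which is my simplifying assumption; with the small sample sizes typical of usability studies, a t-distribution critical value would be the more careful choice. The sample times are invented.

```python
import math
from statistics import NormalDist, mean, stdev


def geometric_mean_ci(times, confidence=0.95):
    """Return (LCI, geometric mean, UCI) for positive task times in seconds."""
    logs = [math.log(t) for t in times]
    center = mean(logs)
    std_error = stdev(logs) / math.sqrt(len(logs))
    # Two-sided critical value from the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return (
        math.exp(center - z * std_error),
        math.exp(center),
        math.exp(center + z * std_error),
    )


# Hypothetical time-on-task samples, in seconds, for one task.
samples = [38.0, 41.5, 45.2, 52.0, 60.3, 95.0]
lci, gmean, uci = geometric_mean_ci(samples)
print(f"LCI={lci:.1f}s  geometric mean={gmean:.1f}s  UCI={uci:.1f}s")
```

The geometric mean is popular for this kind of data because a single slow participant (the 95-second sample here) pulls it up far less than it would pull up an arithmetic mean.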
Initial Thoughts on Time on Task Relevance to SPE
I think the key piece of information that needs to be applied to SPE has to do with task efficiency. When it comes to responsiveness, we try to place a cognitive value on a task. The way we do that is apply a utility value for performing the task. We combine the utility value with a patience rating of a user. The combined utility + patience dictates the abandonment factor.
I believe time on task is in fact the missing piece of data that would make our abandonment decisions more meaningful. Come to think of it, really our abandonment decisions are guesses on what we believe will be the rational behavior of a user who becomes frustrated. It uses arbitrary response time factors to determine whether a user will become frustrated or not.
How exactly can we be the authority on likelihood of abandonment if we do not have much context on expected time on task?
Some Factors Driving UI Performance Analysis
Sometimes it’s hard for performance engineers to wrap their minds around when it’s time to study a transaction in isolation versus going through a full-scale design of experiment. I wanted to put some thoughts down about what drives a UI performance analysis project. What I mean by a UI performance analysis project is a profiling project in which we are studying the end-to-end response time characteristics across our certified browsers. We have been calling this work E2E, which it is.
We first start by measuring end-to-end round trip response time for all of our browsers. Subsequently, we profile each browser to understand the impact of latency caused by the browser platform. Next we profile at the application tier. Finally, we study query execution at the database tier. Our end goal is that we can take a transaction and decompose latency at each layer in the end-to-end pipeline.
I’ve decided to put some initial thoughts about factors that should be used by SPE’s for recommending a UI Performance Analysis project. I’ll briefly summarize below.
Criteria for End-to-End Analysis
1. New Design Pattern: At the heart of SPE, we as performance engineers are to identify good design patterns and call-out poor anti-patterns. By design patterns I am talking not only API patterns, but also new approaches to workflow and client-side interaction from the UI. Whenever a new interface design pattern is introduced, we should study the behavioral characteristics of this interface across multiple browsers.
2. Variable Workload: Any time we allow the user to interact with a flexible or variable workload of data from a single page request, then we should without a doubt study how the workload affects page responsiveness.
3. Rich User Interface: Without a doubt, the richer the interface, the greater the need to study UI performance behavior.
4. Predictable Model of Concurrency: My argument about predictable models of concurrency is that use cases should be studied under non-concurrent scenarios in order to understand the service time of a single request. Once this is understood, a clearer picture can be had of the model under concurrent conditions.
5. Core State or Action: I am a firm believer that when we introduce a use case that will most likely change session behavior of a user, then it should be studied. If we essentially force users to perform a particular operation or traverse a particular page, then it should be studied.
6. Use Case Affecting User Adoption: This is a fairly broad statement. What I am getting at is that when a use case is going to increase adoption of the product or tool, then it’s a worthy candidate for studying. For example, a few weeks ago, 1-800-Flowers was the first to set-up a commerce site in Facebook. Their ultimate goal is to drive sales for their company. The underlying goal for a company like Facebook to enable such applications is to keep the application sticky so more users adopt and remain loyal to the application platform.
7. Resource and/or Interface Intensive Transaction Hypothesized: What I mean by this is that as SPE’s we hypothesize whether a transaction will be resource and/or interface intensive as part of our modeling efforts. If we have a shred of doubt that a transaction will have impact on the system execution model, then it should be an immediate candidate for analysis.
8. Transactions Affecting Cognition: We need to call-out transactions that affect how users perceive the transaction they interface with. Users have response time expectations. When those expectations are not achieved, users become impatient and/or abandon. Ultimately, poor responsiveness decreases adoption.
Interesting Visualization of Twitter
I stumbled across this site called Tori’s Eye which takes search input and visualizes tweets from Twitter using the search input. It’s an interesting approach to captivating users’ attention, though watching little birdies flash across the screen is mind numbing to say the least. It’s something my 3 year old daughter would love.
I think something like this would be a great teaching and communication capability for K-12…especially for 3rd grade and lower. This would be a great way for teaching children to interact with computers, while also demonstrating time based data.
There’s a similar audio experiment going on as well here of the same likeness.
When Did This Java Issue Creep Up on Windows
I normally wouldn’t post something like this, but I think it’s weird that we are seeing this problem…plus I already posted the issue on Sun’s Java Forums.
Yesterday I ran into Mesfin as I was heading to the eye doctor. He mentioned that he was unable to start-up his developer build when his -Xmx parameter was set above 1024m. He said that other teammates reported the same issue on their Windows devpods. I sat with him at his machine for a few minutes to figure out what’s up. There’s a couple of things we know:
- Java on 32-bit Windows should support around a 1.7GB heap
- In Mesfin’s case, he was starting a VM with 1250mb + 256 MaxPerm + 70mb in stack memory, so it was clearly well below 1.7GB
- He could start up using 1024mb without issue, so it appears there’s an artificial barrier of around 1.3GB
- I recently had Windows Updates applied on both W2k3 Server and XP
- Apparently in the past there have been issues of these heap problems after Windows Updates.
- Someone else wrote about this a few months back, but did not come to a resolution as to why.
- Our stack size recently was changed to 320kb, up from 160kb
- I don’t think this is really the issue as I’ve dropped the stack size down a bunch.
- Only solution for keeping a 1250mb heap was to drop the PermSize below 10mb, though it was totally unscientific and more analysis between 256mb and 10mb would need to be explored.
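Just to keep the arithmetic from the list above honest, here is the sum of the memory settings in both cases. The sketch only adds up the three numbers mentioned (heap, MaxPerm and stack memory) and ignores anything else living in the process address space, so treat it as a sanity check and not as a model of how the JVM actually reserves memory.

```python
MB_PER_GB = 1024


def footprint_mb(heap_mb, max_perm_mb, stack_mb):
    """Sum of the memory settings discussed above, in megabytes."""
    return heap_mb + max_perm_mb + stack_mb


failing = footprint_mb(1250, 256, 70)  # 1576 MB, the configuration that fails
working = footprint_mb(1024, 256, 70)  # 1350 MB, the configuration that starts
expected_limit = 1.7 * MB_PER_GB       # 1740.8 MB, the rough 32-bit ceiling

print(f"failing: {failing} MB ({failing / MB_PER_GB:.2f} GB)")
print(f"working: {working} MB ({working / MB_PER_GB:.2f} GB)")
print(f"headroom under 1.7GB for the failing case: {expected_limit - failing:.1f} MB")
```

The working configuration comes out to roughly 1.32GB, which is where the "around 1.3GB" barrier comes from, while the failing one is still about 165MB short of 1.7GB.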
So I decided to rebuild my machine with the latest and greatest mainline. I used the following parameters below with all of the other defaults with the product.
    # Initial Java Heap Size
    wrapper.java.initmemory=1250m
    # Maximum Java Heap Size
    wrapper.java.maxmemory=1250m
I received the following exception:
    INFO | jvm 1 | 2009/07/28 13:39:12 | Error occurred during initialization of VM
    INFO | jvm 1 | 2009/07/28 13:39:12 | Could not reserve enough space for object heap
    INFO | jvm 1 | 2009/07/28 13:39:12 | Could not create the Java virtual machine.
    ERROR | wrapper | 2009/07/28 13:39:12 | JVM exited while loading the application.
What’s crazy is that we have been doing tests non-stop with values above 1024m for months. So why are we seeing this now? Why hasn’t anyone else seen this issue out on the web (recently that is)? Why haven’t any customers reported this issue? If they used the tuning set from 9.0 SP1 on 32-bit they would have experienced it first hand.
I’m going to do a little more research. Curious if anyone has thoughts…
Back From Some Research
OK…so this is messed up. I tried launching the JVM with 1mb of PermSize and heap sizes of 1.4g, 1.2g, 1.1g, 1050mb and even 1025mb. Yes that’s right, 1mb more than 1GB, and guess what happened? I’m not kidding you when I say the JVM could not reserve the heap and would not start.
I then wanted to see what would happen if I used an MX of 1024m, but a PermSize larger. I tried PermSizes of 600m, 512m, 300m and 256m. The only one that would start without a heap reservation issue was the 256mb.
So we have a problem to say the least. It’s clearly not an application issue, but either a JDK issue or a Windows issue. Either way, the best folks to resolve this would be Sun. I’m going to submit a ticket and see what happens. In the meantime, I’m going to make support and engineering services aware of this issue.
Steve On SPE Voting of Clickpath Probability Models
I’m by no means attempting to be Joel Spolsky based on the title of this blog. I just wanted a simple way to say I’m really interested in talking about clickpath probability models. I think we need to consider a more effective manner of gathering input from our sprint teams with regards to our probability models that PE delivers.
Our main challenge is finding the time from others to be more interactive with our SPE efforts. Everyone is busy, so finding the time to get all of these different constituents to provide input is darn near impossible. So maybe we have to figure out a better way to use their time more effectively for the inputs that we really need.
From an SPE perspective, the two areas where we really need input from our team constituents are the lifecycle of our data model and our probability models. I think the data model discussion is a little too complex for me to start writing about here. So for simplicity’s sake, I will focus on our probability models. Below is an example of a probability model taken from 9.1 about Assessment creation using question discovery.
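[Figure: probability model for Assessment creation using question discovery, taken from 9.1; image not available]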
Assessment authoring is by far one of the more critical content authoring activities in the system. It is also one of the most expensive from a resource and storage perspective. The process of authoring a question is treated with a lot of TLC by content creators. How exactly do I know this? Well, I don’t have exact empirical data that says 84% of all assessment creation activity requires intense critical thinking and lasts as long as 3 hours end-to-end. Whoa…just throwing out an example. I do understand that assessments are instruments for measuring academic performance. Given that our culture is to put more utility in the measurement process, it seems likely that this would be an important activity.
I’m a victim of the problem I’ve described above, as well as a contributor. I could have easily solicited input from our team constituents looking for their guidance about the critical nature of assessments. Some of those folks would have given an answer like mine, meaning they would believe this is a critical process given the nature of measurement in academic worlds. Some of them, and hopefully the right group of people, would give me the answers that they have heard first hand from the users of the system, the ones who really matter.
Getting that information is really hard. Asking the question isn’t necessarily tough. Getting the answer is tough, especially when the interviewee doesn’t have time to answer you. So I want to propose that we open this issue up a little further to the team. We need to figure out how to get more precision in our probability model. Precision comes from input from our team constituents who are closer to the real data. The real data we want will shape our probability models.
As SPE engineers we should be able to read requirements and then specifications to build out all of the states, decision points and actions in order to identify the possible model paths. We should also be able to infer from the artifacts our teams give us such as the MRD and specifications our own pass at the probability models that shape the weighting of our clickpaths. What we really need is a way to receive input about these models in an easy to interpret and participate manner.
I would love for our team to rack their brain on possible solutions. At the end of our SPE day, we have to feel confident that we:
a) Receive the correct input and feedback about how users will engage with the application
b) Improve our transparency so that teams value our contributions and efforts toward the development of the product
c) Encourage our team members to consider probability models as an important component of software development
Steve’s Thoughts on Making This Happen
There’s no way that I am going to write a 2-page blog without providing my own input on a way to solicit others’ input. As I stated above, I think one of the challenges facing Product Managers, Designers and QA Analysts is a lack of time. For us to expect them to review our documents asynchronously is not too wise. For us to expect them to have time to meet face to face or over WebEx isn’t all that likely either. So we have to find a way to give them an easy way to review our work in a meaningful workflow that demands little time and not too much thought (as a means of reducing time and effort). They need a way to approve or question our probability models. Most importantly, we should consider building a tool set that in due time they would want to adopt themselves and ultimately seek value from (Refer to The Dip by Seth Godin).
The tool I think of is inspired by a blog that Nakisa herself wrote back in March when she talked about a tool that’s out there on the web called Web Sequence Diagrams. This is an easy to use visualization tool that models UML interactively from simple Wiki markup and subsequently creates UML notations.
Another tool that has inspired me is Cuzillion from Steve Souders which is another simple modeling tool, but this time for modeling the ordering and sequence of web parts of an HTML page.
Both tools provide a very simple interface to construct models for their purpose. I envision that we need a tool that gives us the ability to model states, decision points and actions. We then need a way to highlight the critical paths. In essence, each path needs a weighting system to differentiate probability. We need the modeling aspect and the visualization aspect, but we also need a way to solicit input. One way is through a publish and vote process where constituents could be sent a workflow to review the probability model and then vote on its accuracy. We could also give the ability to fix the model and then compare the two or more models head to head.
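To give the debate something concrete to poke at, here is a minimal sketch of the data such a tool might hold: states with weighted transitions, the probability of each full clickpath, and a tally of votes on the model. Every state name, weight and vote value is made up for illustration, the model is assumed to have no loops, and none of this reflects an existing design.

```python
from collections import Counter

# Hypothetical model: state -> {next state: probability of taking that step}.
model = {
    "open_assessment": {"discover_questions": 0.7, "create_question": 0.3},
    "discover_questions": {"add_to_assessment": 0.8, "abandon": 0.2},
    "create_question": {"add_to_assessment": 0.9, "abandon": 0.1},
    "add_to_assessment": {},
    "abandon": {},
}


def is_valid(model):
    """Each non-terminal state's outgoing weights should sum to 1."""
    return all(
        abs(sum(transitions.values()) - 1.0) < 1e-9
        for transitions in model.values()
        if transitions
    )


def path_probabilities(model, start):
    """Map every path from `start` to a terminal state onto its probability."""
    paths = {}

    def walk(state, path, probability):
        transitions = model[state]
        if not transitions:
            paths[" > ".join(path)] = probability
            return
        for next_state, weight in transitions.items():
            walk(next_state, path + [next_state], probability * weight)

    walk(start, [start], 1.0)
    return paths


def tally(votes):
    """Count constituent votes on a published model."""
    return Counter(votes)


paths = path_probabilities(model, "open_assessment")
critical_path = max(paths, key=paths.get)
print(critical_path, round(paths[critical_path], 2))
print(tally(["accurate", "accurate", "needs_fix"]))
```

A reviewer who disagrees could submit an edited copy of `model`, and the two sets of path probabilities could then be compared head to head.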
If I didn’t mention it before, if we built a tool like this, it would go into Galileo (our home grown system) and be used for other data oriented projects.
As I said before and will say again, we really need input from the team about how to solve the problem of getting input. The manner in which we get the data is up for debate. Let’s start debating…