Monthly Archives: August 2008

Old Blog Post: SQL Server 2005 Performance Dashboard Reports

Originally Posted on December 17, 2007

Let me start off by saying this is by no means new. Microsoft released the SQL Server 2005 Performance Dashboard Reports back in early March of 2007. It took me until now to stumble across the tool, mainly because I've been out of the thick of things from a benchmark perspective. I spent the better part of the day playing with the report set. It's quite impressive and easy to configure.

The Performance Dashboard Reports are targeted toward SQL Server administrators and other users; the objective of the report set is to act as both a health monitoring and diagnostic tool. Although it relies upon Reporting Services definition files (.rdl), Reporting Services does not need to be installed to use the Performance Dashboard Reports. The custom report set uses SQL Server's dynamic management views (DMVs) as its data source, providing the wealth of data the DMVs contain while insulating viewers from the views and the structures underlying them. No additional data sources, capture or tracing are required to access and use this storehouse of performance information. Other obvious benefits of using these prefabricated views are the constant availability of the information they contain and their inexpensive nature (from the tandem perspective of collection and querying) as a source of server monitoring.
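Just to make the "DMVs as a data source" idea concrete, here is a rough sketch of the kind of query this sort of reporting pulls from under the hood. This is my own illustration, not part of the report set; the driver, server name and the choice of sys.dm_os_wait_stats are placeholder assumptions, so adjust for your environment.

```python
# Minimal sketch: querying a SQL Server dynamic management view directly.
# Assumes pyodbc is installed, Windows authentication, and a local default
# instance; the DRIVER/SERVER values are placeholders.
import pyodbc

conn = pyodbc.connect("DRIVER={SQL Server};SERVER=localhost;Trusted_Connection=yes")
cursor = conn.cursor()

# sys.dm_os_wait_stats is one of the DMVs this kind of reporting leans on.
cursor.execute(
    """
    SELECT TOP 10 wait_type, waiting_tasks_count, wait_time_ms
    FROM sys.dm_os_wait_stats
    ORDER BY wait_time_ms DESC
    """
)
for wait_type, tasks, wait_ms in cursor.fetchall():
    print(f"{wait_type:<40} {tasks:>12} {wait_ms:>12} ms")

conn.close()
```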

The report set comes with a primary dashboard report file, as we shall see in the hands-on installation procedure that follows. This report file is loaded directly as a custom report in SQL Server Management Studio. The other Performance Dashboard Reports are accessed via the Reporting Services drill-through mechanism, each path of which is initially entered when the user clicks a navigation link on the main page. The linkages are pre-constructed, and, once the primary dashboard report is loaded as a Custom Report in Management Studio, the rest of the reports work “out of the box” automatically, without any additional setup.

You have to start by installing the add-on, which takes about 20 seconds. Once you have run the installer, go to the directory where it placed the files. There you will find a SQL script called setup.sql. Run this against the SQL Server instance you want to report on. The instructions are a little misleading: they make it seem as if you have to run the script for every schema in your 2005 instance. That's not the case; you run it once for each named instance you have installed. From the same directory, open the performance_dashboard_main.rdl file; it will open as plain XML. Close that file and you are now ready to play with the dashboard. To open the dashboard, open SQL Server Management Studio, right-click the named instance, then select Reports followed by Custom Reports. If you navigate to your install directory, you will see the performance_dashboard_main.rdl file again. Open this and voilà, you have your report.

Check out this article for screenshots.

Start with this article from William Pearson. He breaks down each and every aspect of the report. Another article from Brad McGehee on SQL-Server-Performance.com is not as descriptive as the first article, but is pretty good. While I was on the SQL-Server-Performance.com site I came across other links worth taking a look at.
Other Interesting Links

* SQL Server 2005 Waits and Queues
* DBCC SHOWCONTIG Improvements in SQL Server 2005 and comparisons to SQL Server 2000
* Troubleshooting Performance Problems in SQL Server 2005
* Script Repository: SQL Server 2005
* Top 10 Hidden Gems in SQL Server 2005
* Top SQL Server 2005 Performance Issues for OLTP Applications
* Storage Top 10 Best Practices

Quality is Free

I've been thinking a lot lately about why everything I have to assemble for my daughter comes with extra parts. It used to be that you would buy something requiring assembly and by the time it arrived at your house it was DOA because of a missing or broken part. Nowadays it's pretty difficult to find a doll house or a toy car requiring assembly that arrives broken or with missing parts. Why, you might ask? Because it's a lot harder to get away with that in today's consumer marketplace. If the quality of a product or good isn't up to snuff, then the consumer is going to go elsewhere.

Do we feel as though we are immune to consumers making another choice? I don't think we do so intentionally, but we often neglect to realize that Quality is Free. (Note to self: I didn't invent the phrase Quality is Free; it's the title of a book by Philip Crosby.) Well, maybe it's not totally free, but it's a whole lot cheaper.

Let's take an example. Today we received a series of emails from Engineering Services asking us to benchmark a Solaris cluster of Vista 4.2 because of a reported issue seen only on Solaris. Ordinarily this kind of request wouldn't have been too outrageous. Had the request been for Linux, it could have been taken care of in minutes. Because it was for Solaris, for which we have limited equipment, it required some juggling of hardware and re-arranging of schedules so Anand could work on the problem. Needless to say, as I write this blog Anand is still having trouble getting an environment up and running.

There's nothing out of the ordinary about this example, unless you ask the question, "Why are 10 clients reporting this issue when we never saw it in our own lab?" If you understand the equipment on hand (limited Solaris gear) and the amount of time a Solaris cycle takes, you would know that we have done very little testing on Solaris. Most of our Unix work has been on Linux, mainly because our Solaris environments cost 2 to 4 times more than our Linux environments.

So let's add up the costs. If we had purchased Solaris PVT servers, we would have needed a minimum of 5 at roughly $9,000 each, for a total of $45,000. Factor in running PVT cycles for Solaris, which would cost about $3,000 to have one engineer perform what amounts to an additional month of work during a release. Throw in another $2,000 in miscellaneous expenses and the grand total comes to about $50,000.

From a cost perspective, $50,000 isn't all that much considering we spent several hundred thousand dollars on hardware as a department. What does it cost to handle these issues after the fact? Let's forget about all of the expenses that Support Engineers and Engineering Services had to put up with. Let's also forget that one of our resources had to stop working on his current assignment in favor of this one.

If we focus solely on the effect this issue could have on contract value, let's hypothesize that it becomes the straw that breaks the camel's back. We've had 10 clients (large and small) report this issue affecting their semester. If we take a low-end average of $10,000 per contract (we all know these contracts are probably 2 to 5 times larger, but for fun we will go with $10k), then we stand to lose $100,000 in year one, or $50,000 net of the money we would have spent. If the contract value compounds over four years (which is the life we usually get out of our hardware), we would have lost $350,000, assuming no other clients left and these particular clients did not change their license level or fees. We wouldn't have spent more than $50,000 on the equipment, and according to FASB accounting standards we could amortize that capital expenditure over the four years, or about $12,500 per year. So instead of the net loss being $50k in year one, it's more like $87.5k.
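For what it's worth, here is the back-of-the-envelope arithmetic written out; the figures are the same rough assumptions as above, and netting the lost revenue against the cost we avoided is just one way to read it.

```python
# Back-of-the-envelope using the assumed figures above (not real budget numbers).
servers = 5 * 9_000                # minimum of 5 Solaris PVT servers at ~$9,000 each
pvt_cycles = 3_000                 # one engineer, roughly an extra month of PVT work in a release
misc = 2_000
upfront_cost = servers + pvt_cycles + misc            # ~$50,000

clients = 10
contract_value = 10_000                                # deliberately low-end average
lost_per_year = clients * contract_value               # $100,000

year_one_net_expensed = lost_per_year - upfront_cost          # $50,000
amortized_per_year = upfront_cost / 4                         # $12,500 over a 4-year life
year_one_net_amortized = lost_per_year - amortized_per_year   # $87,500

four_year_net = 4 * lost_per_year - upfront_cost              # $350,000

print(upfront_cost, year_one_net_expensed, year_one_net_amortized, four_year_net)
```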

There are some people who are going to read this blog and say to themselves that our problem is we didn't budget money to handle Solaris PVTs. That's not the whole message I am trying to convey. Rather, I am trying to say that making the effort to address quality long before our product reaches the consumer market is a heck of a lot cheaper than waiting until the fire engulfs us. So then why do we neglect quality? We do it every day…without even realizing it…we forget to put instructions in our packaged materials…we put that cracked piece of wood at the bottom of the box…or we forget to include all of the screws and washers.

Ask yourself every chance you get…”What do we get by sacrificing quality?”

Old Blog Series: Sun’s Virtual World

Originally Posted on July 15, 2007 23:56

I forgot to mention that at the conference I met up with Kevin Roebuck from Sun Microsystems. Kevin has been a big fan of Blackboard over the years and a key advocate of the Performance Engineering team. He has personally funded nearly $100k in hardware, software and services since I arrived at Blackboard. Sun has a large gaming initiative and has been a big advocate of Linden Lab's Second Life over the past few years. They are building their own 3D virtual community and have made it available as an open source project. Project Wonderland and Project Darkstar are the underlying projects of the 3D virtual initiative. The big difference between this effort and Second Life is that the community is not public, but rather privately hosted, so an undesirable lurker or streaker wandering into a learning environment is unlikely. If someone did invade a community space, the user could be identified and disciplined ASAP.

The 3D software is built on top of the Project Darkstar server infrastructure. Project Darkstar, a platform designed for massively multiplayer games, provides a scalable and secure multi-user infrastructure well suited for enterprise-grade applications.

For the client, we will use the Project Wonderland 3D engine for creating the world as well as the avatars and animations within the world. This open source project provides the ability to create a virtual world with live, shared applications as well as audio. As you explore the environment, you will hear people, music, or videos in much the same way as you would walking around the physical world. The initial prototype supports the sharing of Java and X applications, but the vision is to eventually be able to use, edit, and share all desktop applications within the virtual world.

I’m going to have someone from the team take a look at Sun’s projects and get an instance up and running.

Old Blog Series: The Digital Library Project at Berkeley

Originally posted March 23, 2007 @ 11:29 PM

The University of California at Berkeley is a pretty amazing research university. While working on my graduate school project on Internet search technologies, I ran across a number of interesting sites:

* http://www2.sims.berkeley.edu/research/projects/
* http://elib.cs.berkeley.edu/
* http://bailando.sims.berkeley.edu/index.html
* http://metadata.sims.berkeley.edu/

All of these sites are good idea-generation sites for building out a more robust repository of digital assets (which is one of the PerfEng goals for 2007). Not much to this entry other than keeping track of these sites for later review.

Old Blog Series: Very Interesting Search Engine and Web 2.0 Site

Originally Posted March 13, 2007 on my internal blog:

While doing some research this weekend for my graduate school term paper, I came across an interesting site called Quintura. In my quest to find better visualization on the web, here’s a site that is all about visualization. Searches are presented as tag clouds on the left and search results on the right. As you hover over a piece of metadata from the cloud, the results on the right change.

5 Problems with Today’s Search (From Quintura Site)

1. Too many search results and too many irrelevant search results. After spending time on the first few pages of the search results, you don’t have time or patience to go beyond those pages.
2. No ability to manage the results by defining context or meaning. It is not easy to build an advanced search query. Lack of hints for search.
3. No user-friendly visual management with mouse click.
4. Documents are ranked by a search engine according to an algorithm specific to that search engine, and not specific to your interests.
5. Over-ranked commercial and under-ranked non-commercial search results.

Old Blog Series: Different Ways to Look at Problems

This is taken from my old internal blog…I’m slowly moving this stuff over here:

For the last few weeks, I've been introducing different ways to visualize our work: for example Apdex, Lego blocks, state models, the NG reference architecture and pipelines. I'm a fairly visual learner, so I tend to be a more effective learner when I can work from a picture or diagram. I also tend to be a better communicator when I can work with a picture or diagram rather than bulleted or narrative text.

Some of you have had the pleasure of reviewing the MIT Timeline project (http://simile.mit.edu/timeline/) I passed on a few weeks back, as well as another example called Gapminder (http://tools.google.com/gapminder/) that I forwarded earlier today. If you haven't seen either of these, please take a look. They are highly effective visualizations for understanding sequential and spatial data representations. We could use these tools within our own practice for materializing performance and scalability models, correlating multi-dimensional data representations, and digesting massive quantities of data.

Which gets me to why I wrote this email. I came across this site (http://infosthetics.com/); it's more of a blog than anything, and it's all about using visualization. The sole mission of the site is to identify and comment on visualizations of a concept or of data. For example, take a look at http://themulife.com/?p=553 or http://infosthetics.com/archives/2007/01/2007_trend_map.html.

I would like to see how we can represent data (single and multi-dimensional) via correlations in a more abstract manner. Writing 20-page reports, bulleted PowerPoints and massive spreadsheets simply doesn't do an effective job of representing our work.

I’m interested in your thoughts …

Steve

10,000 Hours to Performance Mastery

I’m on a Malcolm Gladwell kick as of late. A week ago I got a Google Alert that his new book was coming out in November. Ever since then I’ve been scouring the Internet for lectures and presentations that I can find. There’s something about his message that absolutely appeals to me. The most recent video I watched was called Genius 2012.

Gladwell observes, ‘Modern problems require persistence more than genius, and we ought to value quantity over quality when it comes to intelligence… When you’re dealing with something as complex and as difficult as Fermat’s last theorem, you’re better off with a large number of smart guys than a small number of geniuses.’

The point of interest is that he advocates taking problems slowly, noting that expertise comes with approximately 10,000 hours of training. He also identifies the 'mismatch problem', which is simply the idea that the standards used to judge or predict success in a given field don't match what it takes to be successful in that field. Below is a transcription from Gladwell's speech:

“But here we’re saying the critical part of what it means to be good, to succeed at the very specific and critical task at finding colon cancers, has nothing to do with speed of facility – on the contrary, it depends on those who are willing to take their time and willing to very very painstakingly go through something that seems like it can be done in a minute. In other words, that’s a mismatch: we select on a cognitive grounds for people being fast at things, but what we really want is a personality characteristic that allows people to be slow at critical things. Here we have the same thing with Wiles in a certain sense. We have erected in our society a system that selects people for tasks like solving Fermat’s or tackling big modern problems on the basis of their intelligence and the smarter they seem to be, the more we push them forward. But what we’re saying with Wiles is, that the critical issue here was not his intellectual brilliance, it was his stubbornness, it was the notion that he was willing to put everything else aside and spend 10,000 hours on a problem no-one else thought could be solved. So, this is the question: Are we actually selecting people for stubbornness? I don’t think we are.”

A lot can be accomplished in 10,000 hours. It's been said throughout the psychology community that applying yourself to a craft, any craft, for 10,000 hours gets you closer to mastery of that craft. Now, I've been at the "craft" of Performance Engineering formally for 7 years and informally for 10 years. If we assume a steady 40-hour week for, let's say, 48 weeks a year (I'm being generous, since I often read about PE on my vacation), that's 13,440 hours just for the 7 formal years. I'm well past the 10,000-hour mark. So how come I don't feel like a master of PE?

I’ve got a lot more in me to achieve mastery of PE. 10 more years won’t be enough to get me to mastery…

Old Blog Series: More Ways to Look at Numbers

This is one of my old blogs that I'm moving over to WordPress, from April 01, 2007:

Eureka…Eureka, said Archimedes when discovering the principle of buoyancy. Well, I've had a similar moment myself. A few weeks back, in February, I blogged about different ways to look at problems. As part of that post, I added the RSS feed for infosthetics.com. This past month infosthetics.com posted a very interesting entry about an acquisition Google made. Two very interesting sites were showcased in the post: Swivel and Many Eyes. Coincidentally, Many Eyes is an IBM alphaWorks site. It's pretty cool: the site allows you to upload up to 5 MB of your own data and create your own visualization sets.

Swivel looks to be the more advanced of the two sites and the one that more closely integrates with Google. My early thought is that we could create some interesting Apdex visuals, as well as PAR visuals, from Swivel by integrating our data points with Google Spreadsheets.

Take a look…you will definitely enjoy!

Why the Transaction Matters

This morning I took my daughter to our neighborhood Safeway to pick up groceries. One of the items on our list was sliced turkey for lunches. So, like the typical dad that I am, I picked up the pre-packaged meat you find over in the meats and cheeses. When we approached the deli counter, my daughter felt the urge to tell me that the place mommy gets turkey is over here. I decided to heed my daughter's advice and proceeded to wait in line at the deli counter.

There was literally one person in front of me. It appeared as if he had finished his order and was chit-chatting with the clerk. I didn't want to come off as obnoxious, so I was willing to wait until their conversation ended. Before that happened, an older gentleman (probably 80+ years old) jumped ahead of me in line. The clerk ended her conversation. Even though she had made eye contact with me, and even gestured that she would be one second while talking to the other person, she proceeded to help the older gentleman who had so rudely stepped in line in front of me.

I obviously was frustrated and decided to abandon the line. Was the turkey really that important to me or my daughter? Not really…good customer service is more important to me at the end of the day.

Should I blame the first customer for engaging in a conversation well past the customary service time? Should I fault the older gentleman for jumping in line in front of me? Should we blame the clerk for pretty much everything?

I think at the end of the day I won't blame any of the three, but I will think twice before going back to the deli counter again. In the world of software performance, an experience such as this can be incredibly frustrating. Waiting for an assessment to load or a gradebook to render is no different than my example of waiting at the deli counter.

The key point is that every transaction matters. I kid around that I might not go back to the deli counter. I'm only half-joking; I seriously will have reservations about going back. If I see the same clerk, or see that there's a line, chances are I won't get in line. I might even avoid the grocery store entirely if my first thought or memory when I have to go grocery shopping is of this tainted experience. You might laugh, but cognitively speaking this is a realistic thought, and one with a high probability of coming true.

The real question is why this should matter to the grocery store. Well, I am somewhat of a connector when it comes to grocery shopping. I like to talk about my experiences with my friends…especially friends who live near me and share a lot of the same experiences. I'm not suggesting that I will cause a Tipping Point of sorts with the deli counter at the Safeway, but I will go as far as to say that grocery stores are afraid of people like me for just this reason.

Does it make sense yet? It might not, but essentially what I am saying is that the transaction matters at the end of the day for two reasons. First, we don't want to impair the perception (as well as the support) of our users to the point that they question whether they will come back to the use case or, much worse, the system. Second, we have to be incredibly mindful (somewhat fearful) that affected users can influence other users to abandon use cases or the system as a whole. At the end of the day, our performance credibility is what we stand on.

Different Approach to Focus Performance Testing

In the past I have discussed an approach for performance testing individual clickpaths by taking a steady-state approach in which we introduce users in bunches and then increase the bunch size over time until the system breaks down. I then stated that we identify the aggregate workload from the sum of executions that occurred during the steady state. Once we have an idea of the peak concurrent workload in the steady state, plus the aggregate workload from the entire test, we can evaluate the maximum workload.

I hypothesize that the maximum workload falls between the peak workload and the aggregate workload.

This can be seen in the visual below:

I definitely think this type of test provides important insight into concurrency testing. I am not convinced, however, that we can place much confidence in response time metrics when we consider only one sample per batch. My main argument is based on a point Cary Millsap (formerly of Hotsos) made in his paper about skew.


Example: All of the following lists have a sum of 10, a count of 5, and thus a mean of 2:

A = (2, 2, 2, 2, 2) Has No Skew
B = (2, 2, 3, 1, 2) Has a Little Skew
C = (0, 0, 10, 0, 0) Has a High Skew

Essentially, if we don’t understand our skew factor, whether it be for response times or resource instrumentation metrics, then we are not effectively looking at our data.
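To make the point concrete, here is a small sketch (my own illustration, not Millsap's code). It treats "skew" loosely as nonuniformity and uses the standard deviation as a rough indicator of how much the mean hides; the formal third-moment skewness statistic is a different measure.

```python
# Same sum, same count, same mean -- very different distributions.
from statistics import mean, pstdev

lists = {
    "A (no skew)":     [2, 2, 2, 2, 2],
    "B (little skew)": [2, 2, 3, 1, 2],
    "C (high skew)":   [0, 0, 10, 0, 0],
}

for name, values in lists.items():
    # pstdev is the population standard deviation: 0 means the mean tells the
    # whole story, while larger values mean the mean is hiding a lot.
    print(f"{name}: mean={mean(values)} stdev={pstdev(values):.2f}")

# Prints (approximately):
#   A (no skew): mean=2 stdev=0.00
#   B (little skew): mean=2 stdev=0.63
#   C (high skew): mean=2 stdev=4.00
```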

I would like to consider a test that uses multiple samples so that we can have a greater confidence factor and determine how much skew exists across the sample response times. I would also like the test to start at the lowest workload possible (i.e., 1 user). Here's how I would see the test executing:

* (1) VUser executes Use Case X for (5) samples (each sample followed by a short (1) minute recovery time)
* (5) VUsers execute Use Case X for (5) samples (each sample followed by a short (1) minute recovery time)
* (25) VUsers execute Use Case X for (5) samples (each sample followed by a short (1) minute recovery time)
* (50) VUsers execute Use Case X for (5) samples (each sample followed by a short (1) minute recovery time)
* (N) VUsers execute Use Case X for (5) samples (each sample followed by a short (1) minute recovery time)

My thoughts are that a population of 5 samples will be enough to determine the skew factor of a given transaction. Although one (1) minute is not an especially long recovery period, we may find that it provides little relief at higher workloads.
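Here is a rough sketch of what that stepped schedule and its per-step summary might look like. The names, structure and sample numbers are purely illustrative assumptions on my part; any real load tool would have its own vocabulary for steps, samples and recovery time.

```python
# Hypothetical outline of the stepped test described above: each step runs
# N virtual users against Use Case X for 5 samples, with a short one-minute
# recovery between samples.
from statistics import mean, pstdev

STEPS = [1, 5, 25, 50]        # extend with higher N as capacity allows
SAMPLES_PER_STEP = 5
RECOVERY_SECONDS = 60         # the pause a driver would insert between samples

def summarize(step_results):
    """step_results maps VUser count -> list of per-sample mean response times (s)."""
    for vusers, samples in sorted(step_results.items()):
        print(f"{vusers:>4} VUsers: mean={mean(samples):.2f}s "
              f"stdev={pstdev(samples):.2f}s over {len(samples)} samples")

# Made-up numbers, just to show the shape of the analysis:
summarize({
    1:  [0.8, 0.9, 0.8, 0.8, 0.9],
    5:  [1.1, 1.0, 1.2, 1.1, 1.3],
    25: [2.4, 2.2, 5.9, 2.3, 2.5],   # one outlier sample -- exactly the skew we care about
})
```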

The new model would look like this:

There are multiple considerations in deciding when a test is complete. First, we should have a quantifiable performance objective defined. The most obvious objectives are a response time threshold and an error rate percentage. Typically, we want to define three dimensions of response time: Satisfied, Tolerating and Frustrated. This would be based on our Apdex philosophies.

Assume we use a response time range of 0 to 10 seconds as our threshold definition. Using Apdex, we might define T=2.5s and 4T=10s. This basically means that we are satisfied when response times are below 2.5s, tolerating when they fall between 2.5s and 10s, and frustrated when they go beyond 10s. Beyond 10s we essentially expect abandonment.
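For reference, the standard Apdex arithmetic with those thresholds (T=2.5s, 4T=10s) looks like this; the sample response times below are made up.

```python
# Apdex score = (satisfied + tolerating / 2) / total samples,
# where satisfied means <= T and tolerating means <= 4T.
def apdex(response_times, t=2.5):
    satisfied = sum(1 for r in response_times if r <= t)
    tolerating = sum(1 for r in response_times if t < r <= 4 * t)
    return (satisfied + tolerating / 2) / len(response_times)

# Made-up sample: 6 satisfied, 3 tolerating, 1 frustrated.
samples = [1.2, 0.9, 2.0, 2.4, 1.7, 2.5, 3.0, 6.5, 9.0, 12.0]
print(f"Apdex(T=2.5s) = {apdex(samples):.2f}")   # 0.75
```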

The same kind of philosophy could be applied to error rate. Ideally we would want 0 errors; that would be considered successful and acceptable. There is a chance that we might accept some failure, so we might create a concept similar to Apdex based on errors, setting thresholds of E=1% and 4E=4%. We would subsequently graph workloads at 0, 1% and 4%.
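The error-rate analogue is speculative, but if we did codify it, a sketch might look like the following; the thresholds mirror the E=1% and 4E=4% idea above, and the function and names are hypothetical.

```python
# Hypothetical error-rate bucketing, mirroring the Apdex-style thresholds above.
def error_bucket(error_rate_pct, e=1.0):
    if error_rate_pct <= e:
        return "satisfied"       # within the acceptable error budget
    if error_rate_pct <= 4 * e:
        return "tolerating"
    return "frustrated"

print(error_bucket(0.0), error_bucket(2.5), error_bucket(6.0))
# satisfied tolerating frustrated
```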

This doesn't necessarily tell us when to end the test. Until we begin defining performance requirements, we may have to make assumptions about whether we believe a use case is scalable or not.