One of the greatest challenges facing my team each day is “Getting to Zero”. Alone, that phrase doesn’t mean much, but when the context is having automated tests with zero failures, it’s a completely relevant phrase. We run system performance tests every day, and rarely do we have a day with zero script failures.
We have, you could say, three degrees of failure. The first is what we call 100% failures: an automated script fails every run due to a change in application behavior, a change in our data model, simply poor scripting, or time-outs. We could gather the statistics, but I’m pretty sure we haven’t had two consecutive days of “0 failures” in this category in the last 100 builds, if not longer. The second is automated scripts that fail intermittently, around 10% of the time or more. Ideally we want a 0% failure rate. When a script fails 9 out of 100 runs, there could be a variety of issues, but most likely it’s a data model issue, poor scripting, or the system being under duress. If it’s the last condition, forensically it’s great, because we can start hunting down the performance issue. More often than not, though, it’s the first or second condition (data or scripts). The third degree of failure is a hybrid of the first two: HTTP 400/500 exceptions. Shall I dare to go into that? I won’t, for the sake of brevity…
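As a rough illustration of these three degrees, the bucketing might be sketched like this. The function, category names and thresholds are my own invention for the example, not our actual tooling:

```python
# Hypothetical sketch: classify a script's recent run history into the
# three degrees of failure described above. Names/thresholds are illustrative.

def classify(passes: int, failures: int, http_errors: int) -> str:
    """Bucket one automated script's history into a failure category."""
    runs = passes + failures
    if runs == 0:
        return "no-data"
    if http_errors > 0:
        return "http-4xx-5xx"      # third degree: HTTP 400/500 exceptions
    if failures == runs:
        return "100-percent"       # first degree: fails every single run
    if failures / runs >= 0.10:
        return "intermittent"      # second degree: fails ~10%+ of the time
    return "at-zero" if failures == 0 else "below-threshold"
```

A dashboard built on something like this would make “Getting to Zero” a yes/no question every morning rather than a debate.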
Because our build process takes longer than we would like, rather than testing system performance with every check-in, we get a nightly build with a variety of changes. That alone can make debugging an automated script unnecessarily complicated. I’m not one for excuses, and that’s a big excuse. So let me go into a little diatribe about a bigger question.
Do We Have the Same Goals?
Should I even ask this question? I think so, because deep down inside I’m scared of what the answer might really be. While our goal is to yield the best responsiveness and scale with our application, if the instrument we use to measure those attributes is not reliable, then how confident can we be in the result? Our instruments should seek “functional” perfection. If we want these tests to be reliable, then we have to establish goals of “perfection”. Once those goals are established, achieved and stabilized, they become norms of our business going forward, and in the long term they can be managed.
So what am I saying? Pretty clearly I’m saying that our functional instruments (load scripts, data model, system configuration, hardware, etc…) need to be 100% reliable. We need to “Get to Zero” every day.
Now that seems like a lofty goal. It appears we could very quickly get into a routine of chasing our tails. Fear not… If you look at the task of “Getting to Zero” as a mundane act of plugging a hole in the wall with bubblegum, newspaper and spit, then you are not being very strategic. You certainly aren’t solving problems.
In order to make this goal into a norm, we need to minimize the complexity of the problem. The cop-out answer would be to get the process to yield a build and a test with every check-in. While that’s important, it still leaves exposure to all of the other issues I mentioned (poor scripts, data model, configuration or load problems). We kind of have to chase our tail for a “short while” as we gather data around the problem.
Each day we need to “Get to Zero”, whatever effort it takes. It’s going to slow us down, no doubt. With each issue, we have to mark it, categorize it, and then strategize how to minimize and eventually eliminate the theme of the issue. We have to make it such that our functional instruments are not the cause of the problem. Once we have confidence that we can “Get to Zero” each day, our confidence in our instruments is going to be higher. And once our instruments are no longer the issue, it’s an opportunity to focus on “Getting Down to the Changelist”.
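The mark-and-categorize ritual above could start as simply as this sketch; the category labels and data shape are assumptions on my part, not our actual process:

```python
# Hypothetical sketch: record each failure with a category, then check
# whether today's run "Got to Zero". Category names are illustrative.
from collections import Counter

def got_to_zero(issues):
    """issues: list of (script_name, category) pairs from today's run.

    Returns (at_zero, counts_by_category) so recurring themes can be
    tracked over time and eventually eliminated.
    """
    by_category = Counter(category for _, category in issues)
    return len(issues) == 0, dict(by_category)
```

Against an empty list it reports we are at zero; otherwise the per-category counts tell us which theme (data, scripts, configuration) to strategize against first.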
If we were testing per changelist rather than a daily build, we would be killing ourselves chasing our tails. Slowing the cadence down to a daily build is the limit I would go to. We can’t span multiple days.
Are We Doing this to Ourselves?
There are a lot of reasons to speculate as to why it’s difficult to “Get to Zero” each day. I started with the number one above: do we share the same goal of getting to zero each day, and will we do everything in our power to meet it? It’s bigger than having that one goal, as I mentioned above. We need to get our functional instruments into a stable state.
Scripts
It’s more than just setting code standards. Two years ago I wrote a blog inside my company’s Confluence site pushing for my team to make progress on the quality of our automation code. Our automation code is written in C, and there are static analysis tools that embed into our Sonar environment. We simply didn’t get the ball rolling on this one. Second, we need to do a better job auditing and reviewing code. Our script engineers need to follow the same habits as our developers. They need to submit changelists with their tickets. They need to justify their code changes in blogs. They need to have their code audited in Crucible reviews. We need to maintain statistics on code quality issues and then enforce code quality improvement initiatives. We can’t neglect our code. It’s a BIG job, but someone has to do it.
Data Model
Our data set is the “Center of Our Universe”, at least from my perspective. There are three initial things that need to happen. First, the definitions we use when we create the model have to be correct. We need to define them correctly, and we need to verify that they are correct. If it’s too tedious to validate, then we have a problem with this instrument. Second, when we generate the model, we need to confirm that everything was generated correctly. We have been asking for that validation for years, yet we have never prioritized it. It’s clear to me that it’s a big problem. The third thing, and definitely not the final one, is that as the scaled model is used with other testing instruments, it can get altered by them. It could get altered by a script, a date issue, a corruption of a backup… you name it. My point is that it needs to be re-verified.
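The three validation points might look like the following sketch. The expected counts and field names are purely hypothetical, stand-ins for whatever our real model definitions contain:

```python
# Hypothetical sketch of the three checks above: (1) validate the model
# definitions themselves, (2) validate the generated model against them,
# and (3) re-run the same check after other instruments have used the data.

EXPECTED = {"accounts": 100_000, "orders_per_account": 5}  # invented example

def bad_definitions(defs):
    """Point 1: flag definitions that are not positive integer counts."""
    return [k for k, v in defs.items() if not isinstance(v, int) or v <= 0]

def model_mismatches(defs, actual_counts):
    """Points 2 and 3: flag entities whose counts drifted from the definitions."""
    return [k for k in defs if actual_counts.get(k) != defs[k]]
```

Running the same `model_mismatches` check both right after generation and again after each test cycle is what turns re-verification from a wish into a habit.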
Resources
I’ll ask the question out loud. Would the resources we have for delivering and managing these instruments be better served focusing on “Getting to Zero” versus explaining performance anomalies and performing forensic investigations? Could we have other team members support the latter? Could we beef up our automated forensics capabilities to support that need more in the short-term?
Tracking and Measuring
We have the most sophisticated instruments in place, all capable of being tracked, measured and analyzed. Are we even doing that? If not, when will we start? How do we make this a habit? How do we gain momentum on this?
Last Question…
Can we do something dramatic? Let’s think out of the box…