Monthly Archives: May 2013

Getting to Zero…Continuously

One of the greatest challenges facing my team each day is “Getting to Zero”. Alone that phrase doesn’t mean much. When the context is about having automated tests with zero failures, it’s a completely relevant phrase. We run system performance tests every day. Rarely do we ever have a day in which we are at Zero script failures.

We have three degrees of failure, you could say. The first are what we call 100% failures. This is when an automated script fails due to a change in application behavior, a change in our data model, simply poor scripting or time-outs. We could gather the statistics, but I’m pretty sure that we haven’t had two days of successful “0 failures” in this category in the last 100 builds, if not longer. The second are automated scripts that fail more than 10% of the time. Ideally we want 0% failure. When a script fails 9 out of 100 times, there could be a variety of issues, but most likely it’s related to a data model issue, poor scripting or a third condition, which is the system being under duress. If we get that last condition, forensically it’s great because we can start hunting down the performance issue. More often than not it’s because of the first or second condition (data or scripts). The third degree of failure is kind of a hybrid of the first two: HTTP 400/500 exceptions. Shall I dare to go into that? I won’t for the sake of brevity…
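To make the degrees of failure a little more concrete, here is a minimal Python sketch of how one nightly run could be bucketed. Everything in it is an assumption for illustration: the `ScriptResult` shape, the bucket names, the order in which the checks are applied and the idea that HTTP 400/500 responses are counted per script. It is not taken from our actual tooling.

```python
from dataclasses import dataclass


@dataclass
class ScriptResult:
    """Outcome of one automated script in a nightly run (hypothetical shape)."""
    name: str
    runs: int             # how many times the script executed
    failures: int         # how many of those executions failed
    http_errors: int = 0  # HTTP 400/500 responses seen while it ran


def classify(result: ScriptResult, threshold: float = 0.10) -> str:
    """Bucket a script into one of the failure degrees described above."""
    if result.runs <= 0:
        raise ValueError(f"{result.name}: no executions recorded")
    if result.failures == result.runs:
        return "total"        # the "100% failure" category
    if result.failures / result.runs > threshold:
        return "frequent"     # fails more than 10% of the time
    if result.http_errors > 0:
        return "http"         # HTTP 400/500 exceptions
    if result.failures > 0:
        return "occasional"   # below the threshold, but still not zero
    return "clean"


def at_zero(results: list[ScriptResult]) -> bool:
    """True only when every script in the run is clean."""
    return all(classify(r) == "clean" for r in results)
```

Under these assumptions a script that fails 9 out of 100 times lands in the "occasional" bucket, which still keeps the run from being at zero.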

Because our build process takes longer than we would like, rather than testing system performance with every check-in, we get a nightly build with a variety of changes. That alone can make debugging an automated script unnecessarily complicated. I’m not one for excuses and that’s a big excuse. So let me go into a little diatribe about

Do We Have the Same Goals?

Should I even ask this question? I think so because deep down inside I’m scared what the answer might really be. While our goal is to yield the best responsiveness and scale with our application, if the instrument to measure those attributes is not reliable, then how confident can we be with the result? Our instruments should seek “functional” perfection. If we want these tests to be reliable then we have to establish goals of “perfection”. Those goals ultimately become norms of our business going forward once they are established, achieved and stabilized so that in the long-term they can become managed.

So what am I saying? Pretty clearly I’m saying that our functional instruments (load scripts, data model, system configuration, hardware, etc…) need to be 100% reliable. We need to “Get to Zero” every day.

Now that seems like a lofty goal. It appears that we could very quickly get into a routine of chasing our tails. Fear not…If you look at the task of “Getting to Zero” as a mundane act of plugging a hole in the wall with bubblegum, newspaper and spit, well then you are not being real strategic. You certainly aren’t solving problems.

In order to make this goal into a norm, we need to minimize the complexity of the problem. The cop-out answer would be to get the process to yield a build and a test with every check-in. While that’s important, it still leaves exposure to all of the other issues I mentioned as well (poor scripts, data model, configuration or load problems). We kind of have to chase our tail for a “short while” as we gather data around the problem.

Each day we need to “Get to Zero” with whatever effort it takes. It’s going to slow us down no doubt. With each issue, we have to mark it, categorize it and then make an effort to strategize how to minimize and eventually eliminate the theme of the issue. We have to make it such that our functional instruments are not the cause of the problem. Once we have confidence that we can “Get to Zero” each day, our confidence in our instruments is going to be higher. If our instruments are quickly no longer the issue…then it’s an opportunity to focus on “Getting Down to the Changelist”.

If we were testing per changelist rather than a daily build, we would be killing ourselves chasing our tails. Slowing the cadence down to a daily build is the limit I would go to. We can’t span multiple days.

Are We Doing this to Ourselves?

There are a lot of reasons to speculate as to why it’s difficult to “Get to Zero” each day. I started with the number one above, which is: do we share the same goal of getting to zero each day, and will we do everything in our power to meet that goal? It’s bigger than having that one goal as we mentioned above. We need to get our functional instruments in a stable

Scripts

It’s more than just setting code standards. Two years ago I wrote a blog inside of my company’s Confluence site pushing for my team to make progress on code quality with our automation code. Our automation code is written in C. There are static analysis tools that embed into our Sonar environment. We simply didn’t get the ball rolling on this one. Second we need to do a better job auditing and reviewing code. Our script engineers need to follow the same habits as our developers. They need to submit changelists with their tickets. They need to justify their code changes in blogs. They need to have their code audited in Crucible reviews. We need to maintain statistics on code quality issues and then enforce code quality improvement initiatives. We can’t neglect our code. It’s a BIG job, but someone has to do it.

Data Model

Our data set is the “Center of Our Universe”, well at least from my perspective. There are 3 initial things that need to happen. First, our definitions when we create the model have to be correct. We need to define them correctly. We need to verify that they are correct. If it’s too tedious to validate, then it means we have a problem with this instrument. Second, when we generate the model, we need to confirm that everything was generated correctly. We have been asking for validation for years. We have yet to prioritize said validation. It’s clear to me that it’s a big problem. The third, and definitely not the final, is that as the scaled model is used with other testing instruments, it can definitely get altered because of those instruments. It could get altered by a script, a date issue, a corruption of a backup…you name it. My point is that it needs to be re-verified.
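As a rough illustration of what that validation could look like, here is a minimal Python sketch. It assumes the model definition can be reduced to expected row counts per entity, which is a simplification made only for this example. The entity names, `validate_counts` and `fingerprint` are hypothetical helpers, not part of any existing tool.

```python
import hashlib
import json


def validate_counts(expected: dict[str, int], actual: dict[str, int]) -> list[str]:
    """Compare what the definition asked for against what was generated."""
    problems = []
    for entity, want in sorted(expected.items()):
        got = actual.get(entity)
        if got != want:
            problems.append(f"{entity}: expected {want}, found {got}")
    # Anything generated that the definition never mentioned is also suspect.
    for entity in sorted(set(actual) - set(expected)):
        problems.append(f"{entity}: unexpected entity with {actual[entity]} rows")
    return problems


def fingerprint(counts: dict[str, int]) -> str:
    """Stable hash of the counts, used to notice later alteration."""
    payload = json.dumps(counts, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

The idea would be to run `validate_counts` right after generation, store the `fingerprint`, and compare it again before each test run so that an alteration by a script or a bad restore shows up before the test does.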

Resources

I’ll ask the question out loud. Would the resources we have for delivering and managing these instruments be better served focusing on “Getting to Zero” versus explaining performance anomalies and performing forensic investigations? Could we have other team members support the latter? Could we beef up our automated forensics capabilities to support that need more in the short-term?

Tracking and Measuring: We have the most sophisticated instruments in place, all capable of being tracked, measured and analyzed. Are we even doing that? If not, when will we start? How do we make this habit? How do we gain momentum on this?
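One small way to start the habit is to track how often, and for how long, we actually stay at zero. The Python sketch below assumes we already have a list of total script failures per nightly build, oldest first. The function names and the input shape are hypothetical and only meant to show the idea.

```python
def longest_zero_streak(failures_per_build: list[int]) -> int:
    """Longest run of consecutive builds that finished with zero failures."""
    best = current = 0
    for failures in failures_per_build:
        current = current + 1 if failures == 0 else 0
        best = max(best, current)
    return best


def zero_rate(failures_per_build: list[int]) -> float:
    """Fraction of builds that finished at zero (0.0 when there is no data)."""
    if not failures_per_build:
        return 0.0
    zeros = sum(1 for failures in failures_per_build if failures == 0)
    return zeros / len(failures_per_build)
```

Fed with the last 100 builds, a longest streak below 2 would confirm the hunch above that we have not had two consecutive days at zero.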

Last Question…

Can we do something dramatic? Let’s think out of the box…

Book Review Time: Someone Could Get Hurt

I personally don’t know Drew Magary, but I’ve been a big fan of Drew for quite a while. I read his witty and always awesome articles on Deadspin as fast as he puts them out…follow him on Twitter and of course, since I’m probably a stone’s throw from where Magary lives, I listen to him from time to time on the Sports Reporters, on our local ESPN radio network. Magary never disappoints in any medium, though my preference is to read his articles as they are as vivid as a blue sky night. He says the things that guys like me are thinking…you know, in their late 30’s with a wife and a couple of kids…but will only joke about amongst the guys, my wife or occasionally a co-worker or two after knocking back a few beers.

When I first heard that Magary was publishing a new book about the challenges and tribulations of being a husband and dad in the 21st century, I immediately signed up on Amazon for a pre-order. I didn’t even have to read a review or hear Magary pitch his writing online or on the radio.

My kindle version was available about a week ago Thursday, which I immediately pulled down. Heck, I live about 10 miles out of DC. The Caps lost again in the first round. The Wizards perpetually stink. The Nats and O’s were amidst losing streaks. I was pretty much tired of listening about the progress on RG3’s knee. Reading Magary’s book was a no brainer.

Let’s start off with the logistics. It’s a quick read. Since it’s on Kindle, I don’t know the exact paper pages, though if I got off my fat ass and looked on Amazon, I would know that it’s 256 pages of pure humor and bliss. I probably could have read it cover to cover, but let’s face it…like Magary I have 2 kids of my own. So I’m not littered with hours and hours of free time on my hands to read a book uninterrupted. I split my reading across 4 or 5 sessions right before bed and while I was waiting with my pre-schooler for her school to start after dropping off my first grader.

Magary bookends his tales of parenthood with a really serious story about his third child born premature and dealing with a major intestinal issue. I didn’t face the agony of premature childbirth like Magary and his wife went through, but my first daughter spent 2+ days in an incubator under the bilirubin lights. Definitely small potatoes compared to what Magary and his wife went through. In between the chapters there is endless delight and humor about the foibles and challenges that Magary went through in raising his oldest daughter and middle son.

My favorite chapter was definitely the story about his daughter’s first Halloween. Magary appropriately labeled the title of the chapter “Slow Guy” mocking a sign he and his wife put up in front of their house to slow traffic down from speeding on his street. As most people would conclude, there’s always an ulterior motive in anything Magary does…so the costume was definitely a cheap, yet funny joke. Kind of reminds me of the time I was in my early 20’s and I thought it would be hilarious to dress up like Warren from There’s Something About Mary. As you could probably guess, most people didn’t find me too funny unless they were obliterated or high. It didn’t hit me that my costume was completely insensitive until the morning after…

I definitely could relate to so many of Magary’s stories. I remember the first time my oldest daughter said a curse word. My wife and I definitely dropped $200+ on the Lice Lady. I fight daily with my youngest about brushing her teeth. I have buckets of toys that my kids have thrown aside like a dirty rag doll. Sadly I miss each passing phase my daughters go through as they get older. It makes me want a 3rd kid from time to time.

Things or stories I wished Magary had written…god knows he’s probably got loads of material about these topics:

Wine Drinking Wives: My wife loves wine. It was definitely not her go-to drink before we had kids. That would be a captain and coke, an occasional Cider or a giant swig from a bottle of Champagne. Since having kids, my wife and nearly every mother of 2+ kids I know have made wine their afternoon delight.

First Kiss: I nearly crashed my car the time my 6 year-old told me she kissed a boy. My immediate response was “where” in hopes of finding out if she kissed him on the lips or some other body part. Her innocent answer was “at school” and the location was the eyebrow.

10 Hour Road Trips with Your Wife and Kids in the Mini-Van: The picture below is taken from my recent road trip to Kentucky last Labor Day. This is apparently what they keep in bathrooms and rest stops in West Virginia. I definitely heard banjos…

Image

Shots at the Doctor’s Office: Magary talks about his youngest getting needles poked into his body, which was sad and certainly not intended to be funny. It would have been great to hear a story like the time my wife called me to meet her at the pediatrician’s office. Our oldest, who at the time was 2 had the lead test on her finger. According to my wife it was like a scene from the Exorcist with spinning heads and blood everywhere. Once I arrived at the doctor’s office, it was my job to pin her down. I’m a modest 210 lbs and my daughter may have been 30 lbs at the time. She had superhuman strength that day…

St. Patrick’s Day or Cinco de Mayo: It’s like a rite of passage to take your first born to a bar on some amateur drinking holiday like St. Patrick’s Day, Cinco de Mayo or Flag Day. I remember the first time I took my kid I did the obligatory baby bjorn with beer in hand move.

Crazy Art Work Your Kids Bring Home: I can’t tell if this was a death threat from my kid or her finest winter snowman. Either way I might have dozens of skull art like the one below…

Image

Note to Drew…you only get 1 shot I guess at writing a book about parenthood, that is unless you are Bill Cosby and you forgot that you wrote the same book three times before. If by chance you do write another book of similar genre or better yet using the same tone and writing style, you definitely have a reader and supporter in me…

– Steve

Why The Mechanical Turk Just Doesn’t Work

I’m on a tear these days talking about my frustrations with offshoring versus outsourcing. By tear, I mean I wrote one blog a few weeks back so in the natural law of thinking that makes me a pseudo expert of sorts right? Well maybe not an expert, but I do have over 12 years of experience working with outsourced teams and about 7 years working with offshore salaried teams. There is no jury to be had. I can pretty much tell you that I’m siding with offshore salaried teams over outsourced contractor teams ten times over. I’ll throw in “Sunday” just for grins. Those who know me often joke that I mix up figures of expression all of the time, so why not do it in this blog.

Let me start off by saying I have a problem. I know I have a problem and sadly I and my teammates created the problem. We didn’t mean to create the problem. In fact, we had the best of intentions when we set about our work. No matter how we look at it, we created the problem. 

It all started back in 2006 when my company made a big acquisition. We ended up acquiring our largest competitor at the time. I was given some investment to make use of some outsourced resources from a service provider in India. I won’t use their name, but I’m sure if you look through some of my past blogs you ‘may’ be able to figure it out. We brought on a small team of two engineers. They came to my office in Washington, DC. I even had them sit in my personal office, which at the time was small and uncomfortable. It didn’t have good air ventilation or air conditioning for that matter. It was a room with 4 walls and a white board. That kind of sucked for me, but the two guys were used to rolling power outages and work conditions far worse than a humid DC summer in a stuffy office building. I brought in a fan…everyone was fine.

These guys stayed with me on-site for about 3 months. It was great as each day I really spent my time teaching about Performance Engineering, which is by far my absolute favorite topic to teach others and talk about with others. Even though these guys were contractors, I treated them like they were teammates. Heck, they sat next to me for 90 days, how could I not? When the 3 months were over, they went back to India and we continued the working relationship remotely. These two guys were great as one stuck on our team for two years and the other stayed with us for 3 years.

Over time we would add team members. I think at one point the team got as big as 7. Today it’s 4, which has pretty much been the sweet spot in terms of what my North American team could manage day to day. Having 7 teammates remote is tough unless you have good leadership. The best leadership we ever got was with those first two teammates. You can tell from that last statement that after year three, everything went downhill. I’ve been thinking a lot about why we have ended up with a struggling contractor team. The obvious reason is that these teammates are/were contractors and not officially part of our team. While we try/tried to make them feel as though they are/were part of our team, their solidarity is/was with their own company. They go to their company’s headquarters. They have bosses in India and only points of contact in the states. It’s not like I give them raises or bonuses. I tell them good work or you were awesome today. That’s basically it… I’ve been able to narrow down three reasons in greater detail as to why this model of offshore contracting has been unsuccessful for me and my team.

The first and obvious is the difficulty to find good leadership to run the team’s day to day operations on the ground. The most talented resources who can do this are identified and hand-picked very early on by these large professional services companies. They become good candidates for on-shore work because they have good presence and can represent the company better. I could be totally off-base on this comment as I’m really only basing this conclusion on about five teammates I’ve had over the years that left our team in an offshore capacity to find work onshore. I am pretty confident that a sample size of five is pretty strong statistically so I’ll stand by my argument.

Second is that I’ve found that many of my contractors took a position in consulting because they wanted the diversity of changing projects. That’s not what I want or need from my consultants. I need a consultant who’s going to stick with my team for many years and advance their career and pay on our team. I need someone who doesn’t get bored after two months on a project. Financially it’s still more cost effective for me to have this kind of model because I can cut overhead costs and still have a team member that can stay with us for a long time. That doesn’t happen in this world of outsourced and offshore contractors. You really are lucky to have someone stick around for a year. If you are really lucky you can keep someone around for 18 months. If someone stays longer than 18 months, it has a negative effect on their resume and future job prospects. It’s like a budding young wannabe NBA star who hasn’t declared for the draft by the age of 4. By 5 years old you are over the hill. So you better declare for the draft by the time you know how to tie your shoes before you become stale and old. That’s the same deal here in this part of the world in the land of consulting.

The third problem is something that me and my team created. Many years ago we started this automation project we called Fusion. We put a front-end UI on it called Galileo. It’s been a huge asset for our team. Maybe one day we might even open source the code and give this stuff away for free. As we were building this, I remember one of my DC engineers joked and named it “Automaton” to reflect how it could potentially make all of our teammates into robots. Awkwardly, I think he felt like it would open up the doors for us to replace staff with robots. That was never its intention. The intention was to make our business processes more reliable, timely and predictable. Most importantly it was designed to make our team scale without having to make loads of investment in resources.

 Galileo has been an incredibly successful project for my team. I am very proud of it and the team that has spent years building, maintaining and promoting the platform to others at my company. What has happened with my outsourced contractors is that they lost their way of contributing to the team in a meaningful sense. They used to be adequate performance engineers. They used to spend time digging into issues. They used to do more work sadly. With Galileo, the team has had a propensity to become push button operators. When Galileo has a problem they stop working like a bunch of office workers when the Internet is down. They have turned themselves into Mechanical Turks. 

“Bummer…looks like the Internet is down. Let’s go to Starbucks, get some coffee and smoke a couple of cigarettes before Phil in IT resets the switch and we have to get back to work!”

I’m not talking about Amazon’s Mechanical Turk program when I talk about my contractor team. Heck if I used Amazon it would be a whole heck of a lot cheaper, but all I would get are fake twitter feeds, email spam and blog comments. The term Mechanical Turk came out of the late 18th century in Europe. Someone had invented a fake chess-playing machine to impress some royalty in Austria. It was all an elaborate hoax based on a human chess player inside of the machine making moves via magnets. The robot itself was a lifeless puppet that would make the moves required by a master chess player hiding in the box.

I’m not saying Galileo has become a hoax. It’s not at all. I am saying that the team (our contractors) who primarily interact with it have become mechanical turks in their own right pushing buttons and watching spinning wheels and status pages as they wait for tasks to complete. They completely lost their priorities. The whole vision behind Galileo was that it would do all of the repeatable, brute force work in an automated fashion. It would do all of the work that the contractors had historically performed years back. It would create “opportunities” for the team to become better chefs in our kitchen. They could assemble Galileo templates like recipes and do some cooking. While the cooking was happening, they could do more analysis of performance issues. Unfortunately that never materialized. Rather than becoming better performance engineers, our contractors turned themselves into pushbutton operators. They knew how to do one thing well, run a test. It didn’t matter that the machine was running it for them. 

You Can’t Have it Both Ways: Offshoring Can’t Be Done via Outsourcing Companies

So the title of this blog most likely yields a giant “no kidding” response from my loyal readership, which is totally expected. A few of my teams work with an offshore consulting company. I won’t mention them by name, but I will say that they are big enough and global enough that most people in and out of the tech circle know who they are. Like most offshore companies they have their marketing propaganda that breaks down their value by industry vertical, area of specialization and global presence. They gloat about their amazing “Centers of Excellence” which differentiate themselves from their competitors. News flash…nearly every global consulting firm touts a Center of Excellence, which is code for “Sometimes we show customers a room with computers, monitors and white boards. This room is where all of the magic and innovation of INSERT MAJOR SKILL HERE happens.” Of course we all know that these CoE’s are just marketing fronts that really don’t exist.

Today I read an email from one of our contacts of this offshore service provider informing me of not one, but two team members who are “moving on to greener pastures”. This has been a theme for us now for roughly two years. Back when the economy was tanking in 2007/2008 we agreed to expand our investment in this firm. We made it clear that we were not outsourcing an entire responsibility, but rather extending our team through third-party team members overseas. The service provider wanted our business and made it clear that they would be able to help us find and retain talent. We agreed that we would do our part to invest in the skill set and agree to incrementally increase compensation. If at any point we were at risk of losing a team member, the 3rd party would do its part to shadow resources and seamlessly replace the team member who left via attrition.

Well as I said, the last two years haven’t been anything like the first four years. We have had more folks come in and out of the team than in the previous four years combined. In this particular part of the world, job jumping has returned to Silicon Valley-like conditions in 1998/1999. We are lucky if we see a teammate come on-board for 8 months without either the threat of leaving or actually picking up and leaving. For a while I thought it was us that was the problem. I thought it was our work. I thought it was compensation issues. I thought it was quality of work/life balance. For the most part I know that we do our part to address what we can control: the type of work, conditions of work and the opportunities for learning. We have little control over compensation, though we push our 3rd party service provider to address promotions and role leveling on a frequent enough basis that I am confident we influence each team member’s position and standing in their organization.

Don’t get me wrong…we have had issues ourselves. Forces above me have influenced contract pricing. We have fought rate increases and expenses. The folks on our business side definitely see the relationship as pure outsourcing (professional services) and not offshoring. They are right to do so as this 3rd party company deals with all of the headaches that I and my company don’t have to deal with.

The lesson here is that you can’t have it both ways. You really can’t use an outsourcing company to act as your offshore team, no matter how each of you spins it to the other. As the recipient of the service, you have to maintain a vested financial interest in your offshore team. You have to be able to directly influence the culture of the organization and not just the team. You have to be able to influence the quality of work/life balance and conditions of work. While controlling the type of work and norms of the team is important, it’s not the same unless you can also influence the organization. Probably the most important point is that your own organization has to view your team members who reside offshore as colleagues and teammates of the company. Just because they live in another country with a different economic system doesn’t mean they should be held to a constant pay rate and salary.