Monthly Archives: October 2012

Interesting Agile Project Management Software Tool

In the last two days I’ve had independent conversations about agile Project Management tools with both Nori and Bob. My first conversation was with Nori in which he demonstrated a tool he’s been using for a few weeks called Trello. The second conversation was with Bob in which he showed me a tool called Pivotal Tracker.

From what I can tell both tools have value. Trello is free. It’s very free-form in what you can do from a content perspective. Trello integrates with Gmail, which is a plus, and it has a very cool user experience, much like a hybrid of Twitter and Facebook.

Take a look at the screen shot…It’s from Nori’s project where he compares the various systems…

From the Trello FAQ

Are you going to screw me later and make me pay for something I got hooked on?
No. You have my word that we will not give away our free service to you as a trick and then later make you pay. First, we make good money on our existing suite of products already, so we have no temptation to change our minds. Also, that sort of trickery would cost us all of the goodwill we’ve built up over the last 11 years of running this company.

We do eventually plan to monetize the service when we have a bazillion users, but it won’t be by charging you for what we’re offering now. Think freemium, or app store models…

Pivotal Tracker is not free. It’s not all that expensive, but it does cost money. For less than the cost of a cup of coffee each day you get 10 collaborators with unlimited viewers, projects and file storage. It ends up costing $600 a year, which is not too bad.


Some Good Habits We Need to Create In Learn DevOps

For many years I coached competitive swimming, mainly 8- to 12-year-olds across my different age groups. There was one constant we always talked about: creating good habits and eliminating bad habits. I remember we used to say it takes 6 to 8 weeks to break a bad habit, but only minutes to create a new one. When little kids hear that expression, they understand that 6-8 weeks is a long time. It really doesn’t take that long, by the way. We said it because we wanted to emphasize that bad habits require time to give up and that we shouldn’t look for shortcuts to make it happen. At the same time, we didn’t want to discourage our swimmers into thinking it took forever to create new habits when in fact they could begin developing those habits immediately…

So why do I share this story? Well, first of all, I don’t think I could get away with telling anyone on any of my teams that it takes 6 to 8 weeks to break a habit. No sane adult would believe me. The main reason for bringing these points up is that the Learn DevOps team doesn’t know me all that well. They certainly don’t know the various manifestos that I produce with every team and every year of change.

Interview Your Customers Reporting Issues

This is possibly the MOST IMPORTANT HABIT that needs to be formed. We need to be the best at providing customer service to our end users. When someone reports an issue, we need to reach out to that user, whether it be by phone, email, JIRA, communicator, Skype, tin cans on strings, Pony Express, carrier pigeon or even a message in a bottle. Let me put some context around this. Let’s say we file a ticket with Perforce or with any of the Atlassian products. Nobody from any of those companies contacts us. How are we going to feel about that?

Why is interviewing our customers so important? Well, most of the time the process of working a support ticket requires back-and-forth communication. Let’s try our hardest to minimize the downtime, but at the same time talk to your customers first before jumping into the water head first. You never know, there might be piranhas in that water.

Question Why You Are Being Asked To Do Something Different

We are all technologists. Let’s face it, we are pretty smart people who understand software, hardware, etc…That’s why we are systems engineers at the end of the day. So when a vendor (or even a boss) tells us how to do something and we have doubts, we shouldn’t just do it. We should ask the question “why”. Asking the question is what I’m saying; don’t just blatantly ignore or delay because you are suspicious of the request. Ask the question right then. If you are not comfortable with the answer, then ask the person to spend a minute or two capturing more information so that you can get comfortable with the answer. If the person you are working with doesn’t give you a good answer, it’s on you to go research and find out whether others are asking the same question. Trust me…others are asking the question.

Remember the example about deploying Perforce on bare metal and local storage.

Every Administrator Should Be Curious About Performance and Scalability

You can’t take the performance guy out of me regardless of what I’m doing. Everybody is affected by performance. If something is slow, people will get upset. I read a great book about this many years ago. In fact, I wrote a blog about it. No kidding.

So here are my points about this…

  • If we are running software from a vendor, we need to research as MUCH information as we can find from the vendor about running the system for the fastest performance (responsiveness) and greatest scalability (concurrency and throughput).
  • If we are running software from a vendor that uses common components (e.g., Java; an RDBMS like SQL Server, Oracle, MySQL or Postgres; a web server like Apache or IIS), we need to understand the profile of the application and take the tuning process further than what the vendor provides. It is our responsibility, whether you like it or not!
  • If we built the application and it’s as slow as a dog…go back and refactor it ASAP!
  • Turn on monitoring
  • Ask your users about responsiveness
  • Test for yourself! (see the sketch after this list)
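To make that last point concrete, here is a minimal sketch of a do-it-yourself responsiveness probe. The endpoints and the two-second budget are made-up placeholders, not vendor guidance; the point is simply to measure for yourself rather than wait for a complaint.

```python
import time
import urllib.request

# Hypothetical endpoints to spot-check; substitute the URLs of your own system.
ENDPOINTS = [
    "http://localhost:8080/",
    "http://localhost:8080/webapps/login/",
]

# An arbitrary example budget for illustration, not a vendor-published number.
SLOW_THRESHOLD_SECONDS = 2.0

def probe(url):
    """Time one request and flag it if it exceeds the responsiveness budget."""
    start = time.time()
    with urllib.request.urlopen(url, timeout=30) as response:
        response.read()
    elapsed = time.time() - start
    status = "SLOW" if elapsed > SLOW_THRESHOLD_SECONDS else "ok"
    print(f"{status:4} {elapsed:6.2f}s  {url}")

if __name__ == "__main__":
    for endpoint in ENDPOINTS:
        probe(endpoint)
```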

Stop the Bleeding…Cut Off Access to Your Users…Then Try to Figure Out the Issue

We run a bunch of systems that a lot of people around the world have access to. When I say a bunch, more than one is enough. Even if all you have is one system, that’s enough. So what do I mean by “Stop the Bleeding”? This can mean a lot, but for the most part it’s about getting a system to stabilize first, before fixing it. If a process is running crazy, a log file is blowing up with error messages, or users are complaining about functionality being broken, do what you can to contain the issue even before you have a resolution. That may mean performing a graceful shutdown of the application or, at worst, killing a process. CAUTION: Make sure you know which processes can be killed and which cannot. You don’t want to corrupt any data.

Once you stop the bleeding, then before bringing the system back online, disable it so none of your users can jump in. This was something that plagued us during the Perforce, Crucible, Crowd and JIRA outage, as users were coming in while we were attempting to debug. At this point only you and maybe a small audience of others on your team should have access. Then the investigation can resume, including turning the application/process back on.
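As a rough sketch of that order of operations (stabilize, block users, then investigate), here is what the habit might look like in a small ops script. The service name and the maintenance flag path are invented for illustration; the real steps depend entirely on the application, and you still need to confirm a process is safe to stop before touching it.

```python
import subprocess
from pathlib import Path

# Hypothetical flag file that the front end or load balancer checks;
# while it exists, regular users get a maintenance page instead of the app.
MAINTENANCE_FLAG = Path("/var/run/exampleapp.maintenance")

def stop_the_bleeding(service="exampleapp"):
    """Stabilize first: graceful stop, cut off user access, then investigate."""
    # 1. Prefer a graceful shutdown over killing processes -- only kill
    #    something you have confirmed is safe to kill (data corruption risk).
    subprocess.run(["service", service, "stop"], check=True)

    # 2. Cut off user access so nobody jumps in while the team debugs.
    MAINTENANCE_FLAG.touch()

    # 3. Bring the service back up for the small investigation audience only.
    subprocess.run(["service", service, "start"], check=True)

def reopen_for_users():
    """Re-enable regular user access once the investigation is complete."""
    MAINTENANCE_FLAG.unlink(missing_ok=True)
```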


SOS Has Many Meanings

I’m probably going out on a limb to think that everybody reading my post knows what the term SOS means. I’m sure some of you are thinking of the slang meaning, which I’m not referring to. Rather, I’m referring to the Morse code S.O.S. used by ships in distress.

From Wikipedia

SOS is the commonly used description for the international Morse code distress signal (· · · — — — · · ·). This distress signal was first adopted by the German government in radio regulations effective April 1, 1905, and became the worldwide standard under the second International Radiotelegraphic Convention, which was signed on November 3, 1906 and became effective on July 1, 1908. SOS remained the maritime radio distress signal until 1999, when it was replaced by the Global Maritime Distress and Safety System. (With this new system in place, it became illegal for a ship to signal that she is in distress in Morse code.) SOS is still recognized as a visual distress signal.

From the beginning, the SOS distress signal has really consisted of a continuous sequence of three-dits/three-dahs/three-dits, all run together without letter spacing. In International Morse Code, three dits form the letter S, and three dahs make the letter O, so “SOS” became an easy way to remember the order of the dits and dahs. In modern terminology, SOS is a Morse “procedural signal” or “prosign”, and the formal way to write it is with a bar above the letters: SOS.

In popular usage, SOS became associated with such phrases as “save our ship”, “save our souls” and “send out succour”. These may be regarded as mnemonics, but SOS is not an abbreviation, acronym or initialism. In fact, SOS is only one of several ways that the combination could have been written; VTB, for example, would produce exactly the same sound, but SOS was chosen to describe this combination. SOS is the only 9-element signal in Morse code, making it more easily recognizable, as no other symbol uses more than 8 elements.

I remember when I was a kid we would play pretend games, like commanding a submarine and, surprisingly, piloting a fighter plane (seems odd to say SOS from a fighter plane, right?). We would pretend to issue radio commands (not even Morse code)…I never said we were the smartest kids back then. We would pretend that we were under attack and about to go down.

So what do S.O.S. and my childhood games have to do with the price of tea in China? Well, it’s about having an awareness that some help is needed…some rescuing, in fact, is needed. You could say we need to issue an S.O.S. because the issue is that we are doing the same old stuff (the other SOS) all the time and we need a little change. Nice play on words, right?

Some Change…Some Energy…Some Magic

It’s time for a little magic, team. We need some creativity…some flair. Yeeeeaaaahh, I’m gonna need you to come in on Saturday….


I’m not really going to need you to come in on Saturday. But I need each and every one of you to think about how to save the ship…how to change things up so it’s a little less of the same old stuff.

remember: ALL IDEAS ARE WELCOME

Creating Quality Habits as Software Developers

I have a number of teams that today produce working code for either Blackboard or internal systems. As a development team we invest in a lot of classic software quality efforts as a means of producing and maintaining solid code.

Code Quality Evaluation and Metrics

I’m a big fan of static code analysis. It was a huge initiative for the Performance and Security Engineering teams dating back to early 2010. Engineering then joined the movement in 2011 and incorporated it into their daily working process as part of 2012. We use a number of tools for evaluation purposes that integrate with a reporting framework called Sonar. We have legitimate executive buy-in to make SCA a major check-and-balance as part of our code quality initiatives.

In late 2012, the Release Engineering team started to play around with SCA, bringing their code projects into Sonar for evaluation purposes. The team hasn’t done much with Sonar as of yet, but I’m expecting to see an emphasis on evaluation, code coverage and measurement.

Code Reviews are Critical

The previous topic brings me to my point about this blog. About a year ago I wrote a blog about Crucible Reviews. I felt like we had a lot of issues with Crucible Reviews. The issues varied from getting all members of the staff to participate in reviews to changing the direction and content of a code review. I was adamant then and am still adamant today that we need to train developers (new and experienced) about how to give a good code review. We need good examples. We need formulas for input. We can’t just assume that everyone knows what good feedback is all about.

Here’s a quick excerpt from my blog from back in 2011 in which I called out my views on Code Reviews…

  So what’s my point? Well, code reviews were considered a means to an end. The end being better coding practices and improved collaboration. I’m sure we can say that there’s anecdotal evidence that quality has improved and quite possibly code reviews were a contributing factor. Code reviews are very subjective. They have a huge dependency on human interference. They are dramatically influenced by the person giving the review, specifically the subject-matter-expertise and skill set of that individual. They are influenced by the environmental conditions impacting the reviewer. For example, what if the reviewer is rushed for time because he/she has 4 other reviews that need to be completed before the end of the day. It is common for the same person to be included on multiple reviews. I know for certain that whenever Nori’s developers are working on Course Delivery refactoring, they without a doubt always include the Course Delivery Architect (Lance) on every review. A lot of times we will have as many as 2 to 3 reviews out at any time. So it’s not that far off of an example.

Code reviews are about reading code. They should be used to verify that designs were implemented. In fact, design traceability should be one of the most important outcomes of a code review. In some cases, CRs can be leveraged as a way to suggest alternative approaches to implementation. We want our code to be manageable, and it should therefore be evaluated for manageability during a CR. Code reviews can be used to find bugs. They can be used for mentoring, specifically for helping someone gain experience with the code base, as well as providing a medium for sharing feedback that improves the coding abilities of team members.

So What’s My Point?

All of my teams (Performance, Security, DBAs, Systems Engineering and Release Engineering) need to learn how to request a code review via Crucible and how to perform a code review in Crucible. We need to define Code Review policies and perform measurements on both the act of requesting and giving a code review, as well as the quality of content provided in a code review.

I’m looking for our team leads and managers to take this to the next level…

Sometimes Bad Things Happen to Good People

It’s 4:30pm on Wednesday, September 19th. Perforce has been non-operational since yesterday morning. Technically, it wasn’t all that operational on Monday either, given the issues we were seeing with Crucible. The team (Anatoliy, Nikolai, Robert S., Kohn and Robert M.) has been working around the clock for two days. Pretty much nobody on that list has slept in 36 hours and counting. It sucks that it has come down to this, but at the end of the day it’s been one of the best team building exercises I’ve ever been a part of. Nobody goes out of their way to find situations of adversity, but the best teams volunteer to meet it head on. Those are our dream teams!

Quick Note About Sleep

I’ve been with the team for a little over two months. When I first took over the group I met with each person. If they recall, I asked them the question “What do you lose sleep over at night?”…I followed up each of their answers with “I want you to know that at the end of the day, I’m not going to lose sleep over your items. I have different items that I lose sleep over which you do not have to…”

Well, let me take back that phrase. I’m losing sleep over this. I’m losing sleep over the situation, of course, because it’s affecting the business. I’m also losing sleep over the impact on each of the people on my team. Why the last point? Well, it comes down to this…It’s tough to see the human spirit challenged in such a way that vulnerabilities are exposed. We have vulnerabilities that deep down inside we considered risks, but we had never had to deal with the adversity of an incident like the one we are dealing with at the present moment.

Thanks to the entire DIRE team for the hard work, dedication and resilience during this epic event!

An Idea…Not Necessarily a Tool Replacement

A few months back at the offsite I pulled Patrick and Chris aside to discuss the notion of moving away from LoadRunner. At the time I had a lot of reasons, but the main ones were tied to frustration with the LoadRunner/PC product, as well as the costs of maintaining a product that we rarely update. The more I think about it, the more I realize that there are so many other aspects of LoadRunner that frustrate me.

First off, the language is C, and while C is a simple language, it’s not ideal for managing reusable libraries and utilities with the ease of use that an object-oriented language would give you. I can’t actually recall a true IDE designed for C. I use Visual SlickEdit, which is great for reading individual C files, Perl or SQL scripts, but in general SlickEdit isn’t an IDE. Second, debugging LoadRunner is a complete pain in the @$$. Granted, much of the LoadRunner debugging trouble we have is self-inflicted by our approach to codeline management. You would think the LoadRunner community would be more interested in our approach to codeline management than in their rickety ways. Third, the wlrun and analysis engines are really unreliable. We have had more issues with those utilities than anything.

Truth is…I’m not necessarily blaming LoadRunner for everything. It’s an OK tool. It gets the job done. We have invested a lot of money in it to date. So why get rid of it?

This is where I get really lost in my thoughts. So please be patient with me in this blog. I hope to have my thoughts organized by the end.

What we do with LoadRunner and performance testing in general is different from any other organization in the world. I can say without a doubt that we have by far one of the most advanced testing approaches and frameworks ever known to the software world. You could put our framework in a room with the 10 best software companies and none of them would come close to any of our capabilities. We not only have an advanced scripting framework, customized to a tee…we also have a robust integrated data generation framework suitable as a load testing tool in its own right. Our distribution modeling capabilities (the Servlet) set us apart from any load test or benchmark. The fact that we have reverse engineered the LoadRunner schema in order to extract and transform performance metrics is another differentiator (no one else is doing this). Then you factor in our Galileo statistical capabilities, which allow so much data analysis. I’m not even counting the Fusion framework that manages and conducts all of the work. Seriously…nobody in their right mind is doing what we are doing.

But it’s not enough

I have a lot of thoughts about what we are not doing and things we should consider doing better…as well as things that would continue to set us apart even further from the competition.

Open Up the Network…We are Way too Closed

So this isn’t really a new idea. There are a couple of ways to handle this notion of being closed. I use the term closed primarily in that we build, maintain and run the performance test automation independently. Developers can’t plug into our network. They can request a load test, but (a) they can’t contribute to the code base, (b) they don’t have access to run a test, and (c) they don’t have the environments to run the test.

I think we need to give our customers (Engineering) more independent capabilities. They should have the ability to leverage what we have already built with little or no effort. They should have the ability to contribute new pieces to our framework in an open and constructive manner. We need to be more flexible, even if Engineering isn’t asking us to be flexible. The more flexible and open we are, the more willing Engineering will be to take us up on leveraging or contributing to our framework.

Too Much Junk in the Trunk

The root of all evil in our framework is our ClickPaths and the Servlet tool. The ClickPaths were intended to be disposable. They are hardly disposable. In fact, they are a giant mess with little to no rhyme or reason in how they are managed and maintained. I don’t even have a handle on how they are defined anymore. That’s more an artifact of my reduced day-to-day involvement.

The Servlet might be in an even bigger mess than the ClickPaths. The Servlet is antiquated. It’s unnecessarily confusing, very unreliable and a maintenance nightmare. While it gives us control of percentages, it doesn’t give us the true distribution we want. We need more control in our tests. We need greater configuration control with better checks and balances. Heck, we need this to be 100% automated.
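To be clear about what I mean by distribution control, here is a minimal sketch of the idea, not a replacement for the Servlet. The click path names and percentages are invented; the point is a single configuration that drives the mix and that can be verified automatically.

```python
import random

# Hypothetical click paths and target percentages for a test (must sum to 100).
DISTRIBUTION = {
    "login_and_browse": 40,
    "submit_assignment": 25,
    "grade_assignment": 20,
    "run_report": 15,
}

def pick_click_path(rng=random):
    """Choose the next click path for a virtual user according to the target mix."""
    paths = list(DISTRIBUTION)
    weights = [DISTRIBUTION[path] for path in paths]
    return rng.choices(paths, weights=weights, k=1)[0]

if __name__ == "__main__":
    # Check-and-balance: over many iterations the observed mix should
    # converge on the configured percentages.
    counts = {path: 0 for path in DISTRIBUTION}
    for _ in range(10_000):
        counts[pick_click_path()] += 1
    for path, count in counts.items():
        print(f"{path:20} target {DISTRIBUTION[path]:3}%   observed {count / 100:.1f}%")
```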

We Need a True IDE…Because We Need a Code Library…Not a Script Library

We tried to make our C code as close to a library of reusable components as possible. In the end it’s a gigantic mess. We have redundant functions. We have major standards violations. We have no static analysis of our code. We don’t have any code metrics. We have no development tool set (IDE) that makes the code easier to build and manage.

Our problem isn’t necessarily that we don’t have an IDE. Rather, I think our problem is that we are using a scripting language that’s very atomic in nature. I personally think our testing language should outweigh the tool decision. We should decide on the programming language, our coding standards, utility classes, debugging capabilities, extensibility, etc…One of the inputs into the programming language decision should be the IDEs available for coding. Another input should be the test engines that support the language we are most comfortable with. If there are none, that opens the door to either dropping the language in favor of another, or considering building our own test engine.

Need to support multiple testing types (HTTP, Browser, API, SQL, Web Services)

I’ve been pushing hard for us to be able to support multiple test types. When I say that, I mean from the same test ID. It doesn’t need to be the same test tool. For example, I think it’s important that when running an HTTP test, you have the ability to sample a full page load via a browser. There should be hooks in our automation that present the browser transaction in the same view as our HTTP transactions, but obviously with filtering capabilities. The same could be said for running an API request, a SQL transaction or even a web services request.

I’m a believer that to be able to do something like this, you need more granular control of your test process: control over the request that performs the test and over the process that extracts the results.
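One way to picture it is a common result shape that every test type reports into, so HTTP, browser, API and SQL samples can land in the same view and be filtered by kind. This is only a sketch of the idea; the class and field names are invented, not part of Galileo or Fusion.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
import time
import urllib.request

@dataclass
class TransactionResult:
    """Common shape for every sample, regardless of which engine produced it."""
    test_id: str
    kind: str            # "http", "browser", "api", "sql", "webservice"
    name: str
    elapsed_seconds: float
    passed: bool

class TestStep(ABC):
    kind = "unknown"

    @abstractmethod
    def execute(self) -> bool:
        """Perform the step and return True on success."""

    def run(self, test_id: str, name: str) -> TransactionResult:
        start = time.time()
        passed = self.execute()
        return TransactionResult(test_id, self.kind, name, time.time() - start, passed)

class HttpStep(TestStep):
    kind = "http"

    def __init__(self, url: str):
        self.url = url

    def execute(self) -> bool:
        with urllib.request.urlopen(self.url, timeout=30) as response:
            return response.status == 200

# Usage: results from different step kinds share one view, filterable by `kind`.
# result = HttpStep("http://localhost:8080/").run(test_id="T-123", name="home page")
```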

We Need More Conductor Control

So I’m referring to the conductor as an extension of Galileo/Fusion. I see Galileo as the input engine, Fusion as the coordinator engine/framework, and some third or fourth component as the worker bee. In our world today, you could probably say that LoadRunner is that worker bee. What I’m imagining is an abstraction of the worker bee in which many worker bees can work in parallel with granular assignments. We could have load tests over HTTP mixed with browser load tests. The test results could ultimately be presented in the Galileo test details with some differentiation. We might even create new modules…
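In code, the conductor idea might look something like the sketch below: a coordinator that hands granular assignments to heterogeneous worker bees, runs them in parallel, and collects their results under one test ID. The worker functions and their return values are made up purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical worker bees; in practice these might wrap an HTTP load engine,
# a browser-driving engine, an API runner, and so on.
def http_worker():
    return {"worker": "http", "transactions": 1200}

def browser_worker():
    return {"worker": "browser", "transactions": 45}

def conduct(test_id, assignments):
    """Run each worker bee in parallel and collect their results for one test ID."""
    with ThreadPoolExecutor(max_workers=len(assignments)) as pool:
        futures = [pool.submit(worker) for worker in assignments]
        return {"test_id": test_id, "results": [f.result() for f in futures]}

if __name__ == "__main__":
    print(conduct("T-123", [http_worker, browser_worker]))
```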

Build a browser plugin and server agent for HTTP

I’ve spent all of this morning getting to this last part. I’m in the camp of “building a tool” over leveraging an open source project. We built all of this customization within Galileo and Fusion to begin with, plus the Servlet, datagen, etc…

First off, I’m imagining a tool that makes the coding of a load test a lot simpler. I think it would be cool to create a plugin for Firefox (Firebug) and/or Chrome that captures HTTP requests and inputs for automation. The plugin would have the ability to search a code repository to see if any previous code existed to perform the function/request.

For example, imagine you open up your browser and log into a Blackboard release. The plugin would detect which version you are running, and hence identify the code branch you are running on. It would also evaluate the URI parameters (for HTTP). If it detected that the code already existed in the library, it would present a dialogue or tab informing you that the code was available and could be reused. You could even have another tab that allowed you to re-run the request (like a play tab). The whole premise of this plugin is that it would replace VUgen, but give substantially more features and control. If new code was needed, it would help organize and identify new packages, classes and methods. Then it would allow for the creation of variables and declarations for the purpose of parameterization.
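The matching piece of that idea might look roughly like this: take a captured request, check an index of existing library code keyed by URI path, and surface the query parameters as parameterization candidates. The index format, paths and function names here are all invented for illustration; a real version would be built by scanning the branch that matches the detected release.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical index of existing library code, keyed by URI path. In practice
# this would be generated by scanning the code branch for the detected version.
SCRIPT_INDEX = {
    "/webapps/login/": "lib.auth.login",
    "/webapps/blackboard/content/listContent.jsp": "lib.content.list_content",
}

def lookup_captured_request(url):
    """Report whether reusable code exists for a captured request and list
    its query parameters as candidates for parameterization."""
    parsed = urlparse(url)
    params = sorted(parse_qs(parsed.query))
    existing = SCRIPT_INDEX.get(parsed.path)
    if existing:
        return {"reuse": existing, "parameters": params}
    return {"reuse": None, "parameters": params,
            "suggestion": f"create new package/class/method for {parsed.path}"}

if __name__ == "__main__":
    captured = ("http://localhost/webapps/blackboard/content/listContent.jsp"
                "?course_id=_1_1&content_id=_2_1")
    print(lookup_captured_request(captured))
```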

The browser plugin wouldn’t just talk to our source code repository. I would recommend that we also build an agent within the JVM that allows the plugin to instrument the execution of code. I’m still trying to figure out all of the reasons for doing this. One reason off the bat is the ability to handle code coverage and mapping. I’m also thinking we may be able to handle the server-side requests more appropriately by having the server-side data alongside the client/HTTP request. Then of course there are other benefits, such as having awareness of what code can be used for verification (at an API or even directly in the DB). As I said…this idea is not baked. It’s just a thought…

If you build all of this to facilitate recording, then you most likely will need to build something that can act as the harness…I haven’t thought that one out yet. I’m looking for ideas.