For many years I used to coach competitive swimming. I mainly had 8- to 12-year-olds in my different age groups. There was one constant we always talked about: creating good habits and eliminating bad ones. I remember that we used to say it takes 6 to 8 weeks to break a bad habit, but only minutes to create a new one. When little kids hear that expression, they understand that 6 to 8 weeks is a long time. It doesn't really take that long, by the way. We said it because we wanted to emphasize that giving up bad habits takes time and that we shouldn't look for shortcuts to make it happen. At the same time, we didn't want to discourage our swimmers by making them think it took forever to create new habits, when in fact they could begin developing those habits immediately.
So why do I share this story? Well, first of all, I don't think I could get away with telling anyone on any of my teams that it takes 6 to 8 weeks to break a habit. No sane adult would believe me. The main reason for bringing these points up is that the Learn DevOps team doesn't know me all that well. They certainly don't know the manifestos I produce for every team and every year of change.
Interview Your Customers Reporting Issues
This is possibly the MOST IMPORTANT HABIT that needs to be formed. We need to be the best at providing customer service to our end users. When someone reports an issue, we need to reach out to that user, whether by phone, email, JIRA, Communicator, Skype, tin cans on strings, Pony Express, carrier pigeon or even a message in a bottle. Let me put some context around this. Let's say we file a ticket with Perforce or with Atlassian about any of its products, and nobody from either company contacts us. How are we going to feel about that?
Why is interviewing our customers so important? Well, most of the time, working a support ticket requires a lot of back-and-forth communication. Let's try our hardest to minimize downtime, but at the same time, talk to your customers before jumping into the water headfirst. You never know, there might be piranhas in that water.
Question Why You Are Being Asked To Do Something Different
We are all technologists. Let's face it, we are pretty smart people who understand software, hardware and so on. At the end of the day, that's why we are systems engineers. So when a vendor (or even a boss) tells us how to do something, we shouldn't just do it if we have doubts. We should ask "why?" To be clear, I'm saying ask the question; don't blatantly ignore the request or delay it because you are suspicious of it. Ask the question right then. If you are not comfortable with the answer, ask the person to spend a minute or two capturing more information so that you can get comfortable with it. If the person you are working with doesn't give you a good answer, it's on you to do the research and find out whether others are asking the same question. Trust me… others are asking it.
Remember the example of deploying Perforce on bare metal with local storage.
Every Administrator Should Be Curious About Performance and Scalability
You can't take the performance guy out of me, regardless of what I'm doing. Everybody is affected by performance. If something is slow, people get upset. I read a great book about this many years ago. In fact, I wrote a blog post about it. No kidding.
So here are my points about this…
- If we are running software from a vendor, we need to research as MUCH information as we can find from that vendor about running the system for the fastest performance (responsiveness) and greatest scalability (concurrency and throughput).
- If we are running software from a vendor that uses common components (e.g., Java; an RDBMS like SQL Server, Oracle, MySQL or Postgres; a web server like Apache or IIS), we need to understand the application's profile and take the tuning process further than what the vendor provides. It is our responsibility, whether we like it or not!
- If we built the application and it's as slow as a dog… go back and refactor it ASAP!
- Turn on monitoring
- Ask your users about responsiveness
- Test for yourself!
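Testing for yourself doesn't have to be elaborate. Here is a minimal sketch, assuming the operation you care about (a page load, a query, a checkout) can be wrapped in a Python callable. The `measure_latency` and `percentile` helpers and the stand-in workload are hypothetical illustrations, not part of any vendor's tooling. Looking at the 95th percentile rather than the average matters because the slow outliers are what users complain about.

```python
import math
import time
from typing import Callable, Dict, List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Nearest-rank: smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def measure_latency(operation: Callable[[], object], runs: int = 50) -> Dict[str, float]:
    """Call `operation` repeatedly and summarize wall-clock latency in milliseconds."""
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        timings_ms.append((time.perf_counter() - start) * 1000)
    return {
        "p50_ms": percentile(timings_ms, 50),
        "p95_ms": percentile(timings_ms, 95),
        "max_ms": max(timings_ms),
    }


if __name__ == "__main__":
    # Hypothetical stand-in workload; swap in a real request against your system.
    summary = measure_latency(lambda: sum(range(100_000)), runs=20)
    for name, value in summary.items():
        print(f"{name}: {value:.2f}")
```

Running the same measurement before and after a tuning change gives you numbers to compare against what your users and your monitoring are telling you.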
Stop the Bleeding…Cut Off Access to Your Users…Then Try to Figure Out the Issue
We run a bunch of systems that a lot of people around the world have access to. When I say a bunch, more than one counts. Even if all you have is one system, that's enough. So what do I mean by "Stop the Bleeding"? This can mean a lot, but for the most part it's about stabilizing a system before fixing it. If a process is running wild, a log file is blowing up with error messages, or users are simply complaining that functionality is broken, do what you can to stop the issue, even before you have a resolution. Ideally that means a graceful shutdown of the application; at worst, it means killing a process. CAUTION: Make sure you know which processes can be killed and which cannot. You don't want to corrupt any data.
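For a process you launched yourself, the "graceful first, kill only at worst" order can be sketched in Python as below. This assumes you hold a `subprocess.Popen` handle; `stop_gracefully` and its 10-second default timeout are hypothetical choices. For vendor services such as Perforce or JIRA, prefer the vendor's documented shutdown procedure, which this sketch does not cover.

```python
import subprocess
import sys


def stop_gracefully(proc: subprocess.Popen, timeout: float = 10.0) -> str:
    """Ask a process to exit; escalate to a hard kill only if it ignores the request."""
    if proc.poll() is not None:
        return "already exited"
    # terminate() sends SIGTERM on POSIX, giving the app a chance to flush and clean up.
    proc.terminate()
    try:
        proc.wait(timeout=timeout)
        return "terminated"
    except subprocess.TimeoutExpired:
        # kill() sends SIGKILL on POSIX: last resort, in-flight writes may be lost.
        proc.kill()
        proc.wait()
        return "killed"


if __name__ == "__main__":
    # Hypothetical runaway process: a child Python interpreter that just sleeps.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(stop_gracefully(child, timeout=5.0))
```

The return value tells you whether you had to escalate, which is worth noting in the incident write-up because a hard kill is exactly where data corruption risk comes from.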
Once you have stopped the bleeding, and before bringing the system back online, disable user access so none of your users can jump in. This plagued us during the Perforce, Crucible, Crowd and JIRA outage, as users kept coming in while we were attempting to debug. At this point, only you and perhaps a few others on your team should have access. Then the investigation can resume, which may include turning the application or process back on.
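How you lock users out depends on the product, and each of the tools named here has its own mechanism that is not shown. As a generic sketch, assuming you control a request-handling layer in front of the service, a maintenance gate that admits only the investigating team might look like this. The `MaintenanceGate` class, the usernames and the choice of HTTP 503 are hypothetical illustrations.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class MaintenanceGate:
    """Hypothetical gate placed in front of a service during an incident."""
    enabled: bool = False
    # Only these users (e.g. the on-call team) get through while the gate is enabled.
    allowed_users: Set[str] = field(default_factory=set)

    def check(self, user: str) -> Tuple[int, str]:
        """Return an (HTTP status, message) pair for a request from `user`."""
        if not self.enabled or user in self.allowed_users:
            return 200, "ok"
        # 503 Service Unavailable tells clients the outage is temporary.
        return 503, "Down for maintenance; please try again later."


if __name__ == "__main__":
    gate = MaintenanceGate(allowed_users={"oncall-admin"})
    gate.enabled = True  # Lock everyone else out before bringing the app back up.
    print(gate.check("oncall-admin"))  # (200, 'ok')
    print(gate.check("regular-user"))  # (503, 'Down for maintenance; please try again later.')
```

Once the investigation is finished, setting `enabled` back to `False` reopens the system to everyone.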