Monthly Archives: October 2020

The 70% Theory on Specifications

Original Internal Post: March 2020

For the last few weeks a number of us in Engineering and Product have been participating in a book club, reading Specification by Example by Gojko Adzic. In this past week’s session we talked about the concept of achieving a “shared understanding” of a problem.

One can simply parse the two words “shared understanding” with relative ease and conclude that a shared understanding is when two or more parties discuss a problem/topic and those parties have an equal or specific understanding of the problem.

The 70% Theory (which I’m coining) is that when a conversation starts, the information presented and immediately understood between two parties likely covers roughly 70% of a problem. The goal of the two people having the conversation is to elaborate with examples (concrete and discrete) to raise shared understanding from 70% to 90% or more.

We often take for granted whether we have a true understanding of a problem. For example, I may send an email to one of my teammates regarding concerns about the assignment of another team member to a project. I may try to articulate my concerns in the email. My teammate might write me back and say they have it under control.

Do I have certainty that we have a shared understanding? There’s a good chance we might. But my concern might be related to schedule conflicts with another project that my teammate doesn’t know about and that I failed to communicate. She might assume my concern is over a skill gap, which has nothing to do with my actual concern.

While the example above is somewhat weak and obvious, I’m using it to illustrate a point. We often fail to realize that there is information that is known that we fail to communicate. There is information that is unknown that needs to be discovered. There are implications to information that need to be evaluated and discussed. That last point might not be too obvious to most people. We will come back to it later in this blog.

The Game of Telephone

As a young engineer, I was taught that so much of my work was predicated on my ability to:

  • Derive more complete specifications from the known (decoding the unknown).
  • Identify, isolate and mitigate risk at the interfaces and integration points.
  • Measure more frequently with the goal of eliminating/cutting less frequently.
  • Increase my confidence in a solution using the scientific method of exploration and testing.

I often wrote very detailed design documents with elaborate specifications. I might spend 5 to 20 business days writing a document for which I had spent 30 to 90 days capturing notes from interviews and research.

Often those interviews involved talking with external and internal stakeholders. My stakeholders were usually representing other stakeholders. Communication streamed in one direction. It passed through a chain from person to person. Most of the time it would hit the mark, but a good number of times I found myself iterating back and forth. It was like a daisy chain or game of telephone.

A good system engineer, product manager, designer, etc…learns not only to ask questions, but to use examples to clarify. These examples have to be concrete and discrete. They are not effective when they are abstract and generalized.

The first phase in working with stakeholders is for the product representatives and stakeholders to come to a shared understanding. This is the first of what could be many shared understandings between parties when addressing business problem(s) and goal(s).

The game of telephone doesn’t stop once the stakeholder and product representatives come to this first shared understanding. Next the product representative(s) needs to articulate the problem(s) and goal(s) in some declarative form. The daisy chain continues to one or more members of the delivery team.

The anti-pattern of “the telephone game” can continue, or worse, restart, if the next set of conversations is not given a chance to iterate with the use of concrete and discrete examples. Let’s talk about this in more detail…

The Technical Translation

When the product representative(s) meets with the delivery team to discuss a customer’s problem(s) and goal(s), we have to go back to the 70% theory. We may have a high degree of confidence that our needs analysis and requirements vetting stage addressed the 30% gap. In all likelihood we may have written an elaborate design and requirements specification that articulates all of this wrapped in a pretty little bow.

There are two problems that we have put out there. The first is that the delivery team is dependent on the product representative(s) to bridge the communication gap and accelerate the shared understanding of a problem. The second is that reading a specification, a set of emails or Jira tickets will likely only get the delivery team part of the way there. Even if you are the best and greatest author, it is safe to assume that this new set of conversations has a new baseline of 70%.

We have to ensure that there is a shared understanding between the next parties involved. I recommend that teams use a requirements or specification workshop to discuss known discrete, concrete examples to better understand the problem/goal at hand. Assume the documentation is enough to convey 70%. The workshops are effective at bumping that score to maybe 80% or 85%. What happens at those workshops is a series of exercises in referencing and deriving more examples. Examples simplify the process of learning and understanding.

Getting to 80-85% is great, but it’s not complete. This is where teams need to be able to work together collaboratively to fill the remaining gap of 15 to 20%. I use the word collaboratively, but what I’m really intending to call out is how we address the two points I made above: “…information that is unknown that needs to be discovered. There are implications to information that need to be evaluated and discussed.”

Workshops can be used to address and articulate the information that is known and present a forum to review and create examples to convert the “unknowns” into “knowns” in order to have a more thorough shared understanding. The workshop can also be used for elaboration on scope that the delivery team may present back to the product representative(s) that wasn’t discussed or considered. This might be a technical detail or a business risk that was overlooked in the first set of discussions with the stakeholders.

The workshop often leads to a technical discovery session. Remember that product delivery folks (engineers, designers, analysts) are pretty analytical. They generally need time to process information and come back with questions, ideas and concerns. This workshop → discovery loop helps fill that remaining gap through evaluation and more discussion.

Taking This Back to Your Teams

There is a lot of information to digest in this post. While I encourage everyone to read Specification by Example, the cliff notes version might be to spend an hour watching this video.

The key take-aways from this blog:

  • With any conversation the goal is to achieve a shared understanding.
  • Shared understanding comes from iterating back and forth using concrete and discrete examples.
  • It’s our responsibility as product representative(s) and product delivery members to iterate on discussions consistently to obtain a shared understanding.
  • We can’t just be order takers if we want to be effective product makers.

The next time you start a sprint where you are accepting new work, whether that be a bug or a feature, make sure to iterate back and forth with the product representative(s) until you both have a shared understanding. Assume you only have 70% of the information. You might find that you have more. More likely, you will find that you have less. Don’t assume you have it all. Engage in conversation…challenge assumptions…use examples that are concrete and discrete…agree as representatives and delivery members that you have a shared understanding of the problem.

I would be remiss if I didn’t reiterate the visual above. Examples truly elaborate meaning to requirements and specifications. Those examples should become discrete tests. Those tests are used to verify the requirements.

Think about that lifecycle of examples → requirements → tests the next time you work on a bug or a feature.

Using Examples as Part of the Forensic Process

This week I’m going to be starting our next book club on Gojko Adzic’s Specification by Example. This is one of my favorite books that I’ve probably read 11 or 12 times over the past 7+ years. The premise of the book is about using examples to establish a shared understanding of a business goal or problem that will be solved in a software system. Think of it like this…Examples are used to elaborate Requirements. Examples can become Tests. Tests verify Requirements.

Most teams that make use of SBE do so as they are building a product or feature for the first time. It’s often used to minimize risk, so that you build the product or feature right the first time and ensure functional acceptance on behalf of the user upon release or deployment. Good teams that make use of SBE go the distance and create a living, executable specification (aka automated tests).

Bugs Happen and Sometimes The Tests are Missing or Wrong

We talk a lot as a team about the fact that bugs will happen. A potential byproduct of every software deployment is new software defects that are found in production, either during regression / acceptance testing or (unfortunately) by our users or customers. It is inevitable that we will find new application errors, performance problems, quirky user experience problems, or some other issue. It is just the nature of the process. The goal is to find these defects and issues before they get to production.

One of the key metrics that our team tracks is how many defects are found prior to deployment/release versus in production. This ratio of where defects are found can help us create and track our defect escape rate.

It’s very easy for us to determine when and where a defect happens. We do a pretty good job tagging and labeling in JIRA. While we find more issues before release than after, we still have a lot of defects per artifact (Java Agent, .NET Agent, Node Agent, TeamServer, etc…). The costs, whether time or reputation, add up.
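As a rough sketch of how that ratio could be computed per artifact, the snippet below assumes the escape rate is defined as defects first found in production divided by all defects found. That definition, the `DefectCounts` shape and the counts themselves are illustrative assumptions, not our actual JIRA data or reporting code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DefectCounts:
    """Defects logged for one artifact in one release window (hypothetical shape)."""
    found_before_release: int
    found_in_production: int


def defect_escape_rate(counts: DefectCounts) -> float:
    """Share of all known defects that were first found in production."""
    total = counts.found_before_release + counts.found_in_production
    if total == 0:
        return 0.0  # nothing found anywhere, so nothing escaped
    return counts.found_in_production / total


# Made-up numbers for illustration only, not real project data.
example_counts = {
    "Java Agent": DefectCounts(found_before_release=30, found_in_production=10),
    "TeamServer": DefectCounts(found_before_release=45, found_in_production=5),
}

for artifact, counts in example_counts.items():
    print(f"{artifact}: {defect_escape_rate(counts):.0%}")
```

Tracking the number per artifact rather than as one global figure keeps a noisy component from hiding behind a quiet one.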

Applying SBE to Forensics

When bugs do happen, we want to move swiftly. So one of the first things I recommend our engineers do when working a bug is to come up with an actual, working example. It might mean a deeper conversation with the customer. You might have to Slack a teammate in Support or on your Engineering team to come up with an example. It’s imperative that you come up with an example before you start anything with the ticket.

Sadly, this assumes that either the user acceptance test doesn’t exist or the one that does isn’t really good. As such, having a shared understanding of the user’s business goals/objectives to start is better than simply understanding the steps to reproduce the problem.

Remember…bugs aren’t just about a workflow or scenario failing to happen in sequential steps. The bug might be a result of something bigger or simply due to confusion in the technical implementation.

Start by understanding the User Story. If you can’t find it in JIRA, write some notes down about the persona and what they are trying to accomplish. At this point you want to write down a concrete example of what the user is trying to accomplish. You can do this using the Given – When – Then ubiquitous language (see below).

  Given a vulnerability [type] is Reported
  When a route is exercised
  And the code is refactored using the Contrast Remediation Guidance
  And the same route is exercised again
  And the route remediation policy is enabled
  Then the vulnerability [type] will change to Remediated: Auto-Verified

Not only do you want to write your GWT statement, but you also want to put together a working concrete example in the form of a test. For our example above, you might want to create a working scenario where a route contains vulnerable code like a SQLI or XSS. You would also want a refactored version of the same code to exercise. At this point, you only have a single scenario. You may want to elaborate different “WHEN” conditions. The permutations of the “WHEN” conditions are simply additional examples of the business objective of auto-remediating the verification process. So it’s important that you explore a variety of scenarios to elaborate your example above.

The end result is that your example(s) can and should become your tests. You will use the tests to recreate the bug, as well as test your refactored code. Don’t be surprised if your test fails from the start. Ultimately, if you work this bug correctly, the example will help you gain context on the requirements. When you start looking into the code after having a test that successfully recreates your bug, it is a whole lot easier to refactor the code because of that shared context or understanding about the defect. In all likelihood, you will reach a conclusion and finish the refactoring effort faster, and resolve the issue with greater accuracy.
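To make the idea of examples becoming tests a little more tangible, here is a minimal sketch of the Given – When – Then scenario above written as plain Python tests. `Vulnerability`, `ToyTracker` and `run_scenario` are hypothetical stand-ins invented for illustration; they are not the product’s API. The behavior in the two extra “WHEN” permutations (policy disabled, code not refactored) is my assumption about what the requirement would say, which is exactly the kind of thing to confirm with the product representative.

```python
from dataclasses import dataclass, field


@dataclass
class Vulnerability:
    """Toy stand-in for a tracked vulnerability; not a real product API."""
    vuln_type: str
    route: str
    status: str = "Reported"


@dataclass
class ToyTracker:
    """Hypothetical model of the auto-verification rule in the scenario."""
    remediation_policy_enabled: bool
    vulnerabilities: list = field(default_factory=list)

    def exercise_route(self, route: str, code_is_vulnerable: bool) -> None:
        for vuln in self.vulnerabilities:
            if vuln.route != route or vuln.status != "Reported":
                continue
            # Assumed rule: a clean pass through the route auto-verifies
            # the fix, but only when the policy is switched on.
            if not code_is_vulnerable and self.remediation_policy_enabled:
                vuln.status = "Remediated: Auto-Verified"


def run_scenario(policy_enabled: bool, refactored: bool) -> str:
    # Given a vulnerability [type] is Reported
    vuln = Vulnerability(vuln_type="SQLI", route="/search")
    # And the route remediation policy is enabled (or not, per permutation)
    tracker = ToyTracker(remediation_policy_enabled=policy_enabled,
                         vulnerabilities=[vuln])
    # When a route is exercised (the code is still vulnerable here)
    tracker.exercise_route("/search", code_is_vulnerable=True)
    # And the code is refactored, and the same route is exercised again
    tracker.exercise_route("/search", code_is_vulnerable=not refactored)
    # Then the caller checks the resulting status
    return vuln.status


def test_refactored_route_is_auto_verified():
    assert run_scenario(policy_enabled=True, refactored=True) == "Remediated: Auto-Verified"


def test_policy_disabled_leaves_status_reported():
    assert run_scenario(policy_enabled=False, refactored=True) == "Reported"


def test_unrefactored_code_leaves_status_reported():
    assert run_scenario(policy_enabled=True, refactored=False) == "Reported"
```

Each test function is one concrete example. When working a real bug, the first version of a test like this would be expected to fail, since it recreates the defect.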

We Should Run a Premortem

Original Internal Post: Oct 08, 2014

About a year ago I was asked to participate in a meeting with a few colleagues before a big deployment of one of our cloud products. The meeting was called by a member of our PMO organization (note this was a past company). She had a ton of concerns bubbling up from the development team, the product management team, the support team and the operations team. The only voice she hadn’t heard from was that of our end-users, and that’s because this product had no customers.

She called the meeting to order. There was no real agenda. Pretty much everyone in the room and on the phone had no idea why we were being called together. Most of us figured this was some form of roll-up meeting for her to share status or task out assignments. Our PMO organization often used our meetings to play the role of air traffic control. I personally thought the time could be used better.

She started by saying, “I’m getting a lot of concerns about the upcoming launch and release of Project Sphinx. Each of you has casually mentioned to me and to others that you have concerns. None of these concerns were documented in our Risk Plan. So if they are not in the Risk Plan, how can we mitigate them?”

At this point, I remember a few people attempted to interrupt her by calling out several places where they had written their concerns down, on our Wiki page as well as in JIRA where our stories resided. I also remember a couple of people calling out a few email threads that had 20+ responses sent to a mass distribution list. All I could think was “boy we are screwed if we can’t get the group on the same page.”

The meeting went on for another 45 or 50 minutes. It turned into a live email thread where the PMO person went one by one to each member of the group. People were constantly interrupting. There were a lot of upset individuals. I distinctly remember talking with our PMO after the meeting. She asked me how I thought the meeting went. I got the sense that she thought the meeting was awesome, so I tempered my response. I suggested that we have a follow-up meeting, but try to do it differently. I felt like it was too serialized and not flowing. She disagreed and said our Risk Plan grew by 40 items, so it had to be effective. I think she even put her fist out for a fist bump. I cautiously gave her a bump…

I remember thinking there had to be a better way for the team to talk about concerns in a format that led to more than just a risk mitigation document. My rationale was that a risk mitigation document would simply turn into a checklist manifesto for the PMO team. The development team wouldn’t feel like they worked through their issues as a group. In fact, that’s what happened over the course of the next two weeks. I noticed that the sprint assignments included new items to tackle the so-called risk items. They were assigned in each case to the person who brought up the issue. In most cases, the person who brought up the issue was looking for a group conversation and potentially a swarming activity by a few developers. That didn’t happen…

A Better Way…

I came across a presentation in my Twitter feed a couple of days after that debacle of a PMO meeting that introduced the notion of a pre-mortem. At the time I was connecting with a new set of influencers. One such influencer was an agile coach named Jabe Bloom. We invited him to talk with our development team about practicing kanban in a more succinct manner. I followed Jabe on Twitter as well as several other practitioners.

We used post-mortems already, though we called them retrospectives and changed the format to be more inclusive of each member and more actionable in terms of defining how we as a group move forward. The notion of a pre-mortem made sense. Essentially, you gather the team connected to the project (not just the doers but all of the stakeholders) in the same fashion as a post-mortem. Basically, you imagine that the product launched and it failed miserably. Think big disaster like the Titanic sinking because the Hindenburg collided with it.

Every member would have to participate. It wasn’t optional, but nobody on the team considered not participating either. Everyone had a voice…two voices. The first voice was to share all of those horrible thoughts that went through each person’s mind during the day or at night when they woke up in a cold sweat. The second voice was to collaborate (not commiserate) as a group about why X, Y and Z were going to fail. We had to come to some form of consensus that at our current trajectory or attention, we would indeed fail with a particular item, task, component and/or the entire system.

The difference that I saw in this form of meeting versus the previous one I described is that the team had to take a step back and come to agreement that a risk was indeed a risk. There was consensus amongst the staff that an issue was indeed an issue, versus the previous meeting, where issues were simply put on a task list and then assigned back to the person who raised the issue. The previous way of managing risk discouraged people from stepping up and calling out an issue. Why…because the team would simply put the task on the person who raised the issue. Without anyone realizing it, this took the accountability for the risk away from the team and put it on the individual. How stressful…imagine the cold sweats that ensued the following day!

Premortems aren’t necessarily the silver bullet to save a project or product. Often the team is super heads down and won’t compromise to make changes mid-stream. That’s a great example of when a premortem would be perfect. The team is likely on a death march. Half the team realizes it and half of the team doesn’t realize it. Oh boy!

Premortems can be effective at bridging communication among teammates who struggle to communicate. They can be helpful for over-confident teams. They help get the team to think about how their success will be measured. That last point might raise a few eyebrows. Think about it…teams often focus on the positives when they start something from scratch or do a kick-off to a new effort or project. It’s human nature to think positively.

So what am I saying? “Don’t think positively”? Not at all. Am I saying “The sky is falling!”? Not at all. I’m really saying “Keep an open mind. Don’t be scared to work through a problem before it has happened.”