I first read The Tipping Point years ago, and I still remember Gladwell’s fascinating story about the NYC police and how they cracked down on crime by replacing broken windows and painting over graffiti every single day or week. If a window was broken by a stone, a brick, a bullet, whatever…it was replaced as quickly as possible. If graffiti showed up on a building wall or a city vehicle, it was quickly painted over. The core idea was that if vandalism was left in place, disorder would invite even more disorder: a small deviation from the norm would set in motion a cascade of more vandalism. Addressing it in a timely manner established a standard that vandalism would not be tolerated.
So what does that have to do with Blackboard Learn?
Good question, Steve. Let’s start by talking about exceptions and log messages. I’ve written two blog posts, one about the cost of an exception and a second about using Dynatrace to evaluate exceptions and logs. Both discuss the great insight we can get quite easily with Dynatrace.
In the visual below you have a case of neglect. Yuck! We have broken windows everywhere. The visual is a snapshot of almost 1 million exceptions thrown during our PVT test, which runs for about an hour. This test had about 4,100 unique user sessions that performed 56,000+ page requests. You saw what I wrote, right? I said 1 million exceptions from an hour’s worth of activity by only 4,100 unique user sessions. That’s over 230 exceptions a second. Holy smokes, Batman!
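To put those numbers in perspective, here is a quick back-of-the-envelope sketch in Java. It assumes exactly 1,000,000 exceptions and exactly 3,600 seconds, which are rounded stand-ins for "almost 1 million" and "about an hour", so the results are ballpark only. The class and method names are made up for illustration.

```java
// Back-of-the-envelope rates from the rounded PVT figures quoted above.
// All inputs are approximations, so treat the output as ballpark only.
class PvtExceptionRates {

    // Plain ratio helper; the name is hypothetical.
    static double ratio(long count, long per) {
        return (double) count / per;
    }

    public static void main(String[] args) {
        long exceptions = 1_000_000L;  // "almost 1 million" (assumed rounded up)
        long seconds = 3_600L;         // "about an hour" (assumed exactly 60 minutes)
        long sessions = 4_100L;        // "about 4100 unique user sessions"
        long pageRequests = 56_000L;   // "56,000+ page requests" (assumed rounded down)

        System.out.printf("exceptions per second:       %.1f%n", ratio(exceptions, seconds));
        System.out.printf("exceptions per user session: %.1f%n", ratio(exceptions, sessions));
        System.out.printf("exceptions per page request: %.1f%n", ratio(exceptions, pageRequests));
    }
}
```

With these rounded inputs, that works out to roughly 240 exceptions per user session and close to 18 per page request.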
I call out the broken windows theory because I see this problem of throwing exceptions as epidemic in the product. Fortunately, I’ve got the Flex team starting to look into why we are throwing so many exceptions. We are wasting a ton of memory and CPU time by throwing them. Our goal is to get to no exceptions. This could, should and will become the standard going forward.
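To make the waste concrete: in Java, constructing an exception normally captures a stack trace, and the JVM does that work whether or not anyone ever reads it. Below is a minimal sketch of the pattern, not code from Learn. The class and method names are hypothetical, and the timing loop is crude (no warm-up, no benchmark harness), so read the printed numbers as a rough indication only.

```java
// Illustrative sketch only: compares exception-driven control flow with a
// check-first approach. All names here are hypothetical, not from Learn.
class ExceptionCostSketch {

    // Exception-driven version: every bad input allocates a
    // NumberFormatException (stack trace included) before we recover.
    public static int parseThrowing(String s, int fallback) {
        try {
            return Integer.parseInt(s);
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    // Check-first version: rejects bad input without throwing.
    // For simplicity it only accepts 1 to 9 ASCII digits (no sign).
    public static int parseChecked(String s, int fallback) {
        if (s == null || s.isEmpty() || s.length() > 9) {
            return fallback;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return fallback;
            }
        }
        return Integer.parseInt(s); // at most 9 digits, so this fits in an int
    }

    public static void main(String[] args) {
        String bad = "not-a-number";
        int iterations = 1_000_000;
        long sink = 0; // accumulate results so the loops are not optimized away

        long t0 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += parseThrowing(bad, -1);
        }
        long throwingMs = (System.nanoTime() - t0) / 1_000_000;

        long t1 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += parseChecked(bad, -1);
        }
        long checkedMs = (System.nanoTime() - t1) / 1_000_000;

        System.out.println("throwing:    " + throwingMs + " ms");
        System.out.println("check-first: " + checkedMs + " ms");
        System.out.println("(sink = " + sink + ")");
    }
}
```

Both methods return the same fallback for bad input; the difference is that the first one pays for an exception object every time it does. When the bad case is routine rather than exceptional, checking first is usually the cheaper choice.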
The visual below is from the same PVT test. It shows all of the WARN, ERROR and SEVERE log messages (with stack traces) that were raised during the test. That’s about 1,100 log messages, or about 1 log message every 4 seconds. Not good…and yes, another standard that we need to address.