We have started to make use of Dynatrace in our day-to-day lives as performance engineers, and I'm trying to do my part so that I can contribute to the test analysis lifecycle. Today I'm going to start by providing an overview of two types of Run-Time Events: Exceptions and Logging. Both are critical features of Dynatrace because they provide aggregate statistics on problems that are, so to speak, "under the covers". Exceptions provide error-handling capabilities for applications; Dynatrace uses run-time bytecode instrumentation to capture the throwable class, message, and stack trace of Java classes that extend java.lang.Exception. Logging APIs and frameworks let the developer control which logging information is processed, at arbitrary granularity. The Logging dashlet shows an aggregated view of the logging events, grouped by the class and method that issued the logging event, the Dynatrace Agent that recorded it, the message string, severity, and name of the logger.
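To make the "arbitrary granularity" point concrete, here is a minimal java.util.logging sketch (the logger name is hypothetical, not from our application): setting the logger's level to WARNING means FINE and INFO events are never processed, so they never show up as Logging run-time events, while WARNING and above do.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

public class LoggingGranularity {
    public static void main(String[] args) {
        // Hypothetical logger name, for illustration only.
        Logger logger = Logger.getLogger("com.example.gradecenter");
        logger.setUseParentHandlers(false);
        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(Level.ALL);
        logger.addHandler(handler);

        // Only WARNING and above are processed; FINE/INFO are filtered out
        // before any handler (or monitoring agent) ever sees them.
        logger.setLevel(Level.WARNING);

        logger.fine("detailed trace - suppressed at this level");
        logger.info("informational - suppressed at this level");
        logger.warning("this is the kind of event the Logging dashlet aggregates");
    }
}
```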
The visual you are looking at below comes from a Windows PVT run. It shows nearly 800k exceptions generated during the course of a single test, and the top two exception types alone account for roughly 550k of them. We don't yet know what an individual exception costs us; Dynatrace is still working that problem, and so am I. I did run some informal correlations, though, and there are a couple of data points that I'm willing to share.
Let’s say you take two identical PurePaths and perform a PurePath comparison. The first PurePath is riddled with exceptions and the second one is completely clean, meaning no exceptions. In my example I was looking at a recorded session called “Win20minscalib_run”, specifically at one transaction: “LS Grade Center: Grade Detail Attempt Save And Continue Action”. We had 171 samples of this transaction according to Dynatrace. The worst sample took 5.7s and had a total of 17 exceptions (6 “key not found” and 11 constraint violations for NULL values).
What’s amazing in my mind is that when you review the exceptions in question, none of the columns throwing exceptions are actually “NOT NULL” columns. Every one of them is a completely wasted exception. Here is the SQL*Plus output for the two tables involved:
SQL> desc attempt;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 GRADEBOOK_GRADE_PK1                       NOT NULL NUMBER(38)
 PK1                                       NOT NULL NUMBER(38)
 QTI_RESULT_DATA_PK1                                NUMBER(38)
 LINKREFID                                          VARCHAR2(255)
 SCORE                                              NUMBER(15,5)
 GRADE                                              NVARCHAR2(32)
 STUDENT_COMMENTS                                   NCLOB
 INSTRUCTOR_COMMENTS                                NCLOB
 FEEDBACK_FORMAT_TYPE                               CHAR(1)
 COMMENT_PUBLIC_IND                        NOT NULL CHAR(1)
 EXEMPT_IND                                NOT NULL CHAR(1)
 OVERRIDE_IND                              NOT NULL CHAR(1)
 INSTRUCTOR_NOTES                                   NCLOB
 NOTES_FORMAT_TYPE                                  CHAR(1)
 STATUS                                             NUMBER(38)
 ATTEMPT_DATE                              NOT NULL DATE
 DATE_ADDED                                         DATE
 DATE_MODIFIED                                      DATE
 LATEST_IND                                         CHAR(1)
 GROUP_ATTEMPT_PK1                                  NUMBER(38)
 STUDENT_SUBMISSION                                 NCLOB
 TEXT_FORMAT_TYPE                                   CHAR(1)

SQL> desc gradebook_grade;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 GRADEBOOK_MAIN_PK1                        NOT NULL NUMBER(38)
 PK1                                       NOT NULL NUMBER(38)
 COURSE_USERS_PK1                          NOT NULL NUMBER(38)
 COMMENTS                                           NCLOB
 NOTES_FORMAT_TYPE                                  CHAR(1)
 FOR_STUDENT_COMMENTS                               NCLOB
 FEEDBACK_FORMAT_TYPE                               CHAR(1)
 DATE_ADDED                                         DATE
 DATE_MODIFIED                                      DATE
 STATUS                                    NOT NULL NUMBER(38)
 HIGHEST_ATTEMPT_PK1                                NUMBER(38)
 LOWEST_ATTEMPT_PK1                                 NUMBER(38)
 LAST_ATTEMPT_PK1                                   NUMBER(38)
 FIRST_ATTEMPT_PK1                                  NUMBER(38)
 EXEMPT_ATTEMPT_IND                        NOT NULL CHAR(1)
 AVERAGE_SCORE                                      NUMBER(15,5)
 MANUAL_GRADE                                       NVARCHAR2(32)
 MANUAL_SCORE                                       NUMBER(15,5)
 VERSION                                   NOT NULL NUMBER(38)
 EXEMPT_IND                                NOT NULL CHAR(1)
 EXCLUDED_IND                              NOT NULL CHAR(1)
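I haven't seen the application code that throws the "key not found" exceptions, but the general pattern usually looks something like the sketch below (all names here are hypothetical, not from the actual codebase): one style turns a routine lookup miss into a thrown exception, which Dynatrace's instrumentation dutifully counts, while the guarded style treats the miss as an ordinary return value.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class KeyLookup {
    private final Map<String, String> entities = new HashMap<>();

    public void put(String key, String value) {
        entities.put(key, value);
    }

    // Exception-driven style: a routine cache miss becomes a thrown (and
    // instrumented) exception, even when the caller treats a miss as normal flow.
    public String getOrThrow(String key) {
        String value = entities.get(key);
        if (value == null) {
            throw new IllegalStateException("Entity type not found for key: " + key);
        }
        return value;
    }

    // Guarded style: the miss is an ordinary value; no stack trace is built
    // and there is nothing for the Exceptions dashlet to aggregate.
    public Optional<String> find(String key) {
        return Optional.ofNullable(entities.get(key));
    }
}
```

The same reasoning applies to the NULL constraint-violation exceptions: a nullability check before the save is an ordinary branch, whereas catching the violation after the fact pays for a full exception.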
The second transaction is identical from a use-case perspective, but it was smaller in object size (13 versus 163 for the previous one — another data point to consider, by the way), it was faster (only 104.52ms), and it used less CPU time (5.8ms versus 264.27ms). This one had only a single exception, an exception, mind you, that did not show up in the first PurePath: “Entity type not found for key: caliper”.
Below is a quick PurePath comparison of the two PurePaths in question. You can see a completely different structure between the two. I’m sure some of it was data related, but when you see that many more method calls you have to ask why. There could be a lot of reasons: an application error, a LoadRunner error, or an instrumentation error. The conclusion I’m inferring is that there is a penalty for all of the exceptions, with the extra method calls incurred to account for them.
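Since we don't yet have a hard number for what an exception costs, here is a rough, self-contained microbenchmark sketch of the kind of informal correlation I've been doing (the grade-lookup names are made up for illustration). On a JVM, each throw pays for fillInStackTrace(), so the exception path is typically far slower than the guarded path for the same number of misses; treat the printed timings as ballpark only, not a rigorous benchmark.

```java
import java.util.HashMap;
import java.util.Map;

public class ExceptionCost {
    static final Map<String, Integer> GRADES = new HashMap<>();
    static { GRADES.put("quiz1", 95); }

    // Exception-based lookup: every miss constructs an exception and
    // captures a stack trace.
    static int lookupThrowing(String key) {
        Integer v = GRADES.get(key);
        if (v == null) throw new IllegalArgumentException("key not found: " + key);
        return v;
    }

    // Guard-based lookup: a miss is just a null return.
    static Integer lookupGuarded(String key) {
        return GRADES.get(key);
    }

    public static void main(String[] args) {
        final int misses = 100_000;

        long t0 = System.nanoTime();
        int caught = 0;
        for (int i = 0; i < misses; i++) {
            try { lookupThrowing("nope"); } catch (IllegalArgumentException e) { caught++; }
        }
        long throwNanos = System.nanoTime() - t0;

        t0 = System.nanoTime();
        int nulls = 0;
        for (int i = 0; i < misses; i++) {
            if (lookupGuarded("nope") == null) nulls++;
        }
        long guardNanos = System.nanoTime() - t0;

        System.out.println("misses handled: throw=" + caught + " guard=" + nulls);
        System.out.println("throw path ns: " + throwNanos);
        System.out.println("guard path ns: " + guardNanos);
    }
}
```

Multiply even a modest per-throw cost by the ~800k exceptions in the run above and the penalty stops being theoretical.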
During the course of the run we issued close to 800 WARN, WARNING, and ERROR messages via log4j and java.util.logging. These are obviously a lot more manageable than the exceptions. You can literally sort through every single message if necessary and tie it to a PurePath, which can then help you tie it to a LoadRunner transaction (CPID and VUser). From a forensic perspective this is awesome, as we can open a ticket and have someone from Chris’ team dig into each of these.
I say Chris and AppLabs as a starting point (Tier 1) to review the issues because most of these appear to be scripting inefficiencies. When you dig into the exceptions, they look as though we are trying to do something we aren’t allowed to do. A fair number of them also look like simple defects, such as the “Not Found” resources listed at the bottom.
My approach was to isolate the PurePaths responsible for issuing a logging message. The best way to do that is the following:
- Open Up the Diagnose Events –> Logging for your recorded session
- Sort Descending by Count
- Right Mouse Click on the Logging Event and Drill Into PurePaths
For the vast majority you will see that very specific transactions are issuing the same logging messages. I would start by replaying the transaction in question via LoadRunner to see if the exception can be generated on demand. I would then evaluate the same use case by hand, with careful inspection, to ensure I am functionally completing the scenario. I would also evaluate the LoadRunner code to verify that it’s in fact posting/getting everything correctly. If I can reproduce the same problem by hand, meaning I’m unable to pin it on LoadRunner, I would open up a bug.