My intention with this blog is to share the process I’ve grown accustomed to using when analyzing results in Galileo and Dynatrace as part of a performance optimization exercise. It’s by no means the holy grail of forensic investigations, but from my perspective it’s a solid approach to narrowing down the problem.
There’s really no reason to create a “Chicken or Egg” kind of problem with this investigation. For simplicity, I’m going to say that I have a transaction in question. I’ve done an analysis of “Transaction X”, comparing my baseline results and my regression test results. I see that “Transaction X” took a significant hit: it doubled from 3s to 6s. Not a big deal in absolute terms, but big enough for it to be on my top regression issues list.
First thing I want to do is map the data in Dynatrace to the data in Galileo. What’s interesting is that Galileo presents the Timestamp and Response Time, where the Timestamp looks to be the end time. Dynatrace shows the start time plus the Duration in the container (which should always be equal to or less than Galileo’s response time). You have to do a little bit of math to figure out whether the PurePath in question matches the individual sample. Why am I doing this? A couple of reasons. First, it gives me confidence that the data is consistent across both systems. Second, it gives me both the CPID and the user in question, so that I can use that information for profiling. I would always profile with the system at rest. Even that’s not strictly necessary, since we can check the “Suspension Duration” in the Dynatrace profile to see whether we were waiting on garbage collection. We can also look at the method time to see whether we are ever waiting to get a database connection.
Below is my Galileo details view as well. I match the Start/End times plus the duration. Now I have the user’s information and the CPID.
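The matching math above can be sketched in a few lines. This is a hypothetical helper, not anything either tool provides: it assumes Galileo records the sample’s end timestamp and Dynatrace records the PurePath’s start time and container duration, and checks whether start plus duration lines up with the Galileo timestamp within a small tolerance.

```python
from datetime import datetime, timedelta

def matches(purepath_start, purepath_duration_s, galileo_end, tolerance_s=0.5):
    """Return True if the PurePath's computed end time lines up with the
    Galileo sample's end timestamp. A small tolerance absorbs clock skew
    and the fact that the container duration can be slightly less than
    the response time Galileo measures."""
    computed_end = purepath_start + timedelta(seconds=purepath_duration_s)
    skew_s = (galileo_end - computed_end).total_seconds()
    return abs(skew_s) <= tolerance_s

# Illustrative values only: a 6s PurePath starting at 10:15:03 should
# match a Galileo sample stamped at 10:15:09.
start = datetime(2013, 5, 1, 10, 15, 3)
print(matches(start, 6.0, datetime(2013, 5, 1, 10, 15, 9)))  # → True
```

The tolerance is the judgment call here: too tight and clock skew between the two systems causes false mismatches, too loose and you risk pairing the PurePath with a neighboring sample from the same user.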
Before I go into any kind of profiling exercise, I want to review as much of the data as I have available in the Dynatrace recording. I start with the PurePath, of course. This gives me an idea of the hotspots to investigate. In this case I can see that the class ViewQuestionsAction is where I’m spending my time. Interestingly enough, it’s a Struts call, so I most likely don’t have enough information yet to make a call as to where my problem is or is not.
Next I dig into the tier breakdown so I can be confident of where to spend my time. If I see a lot of time in Struts, but a ton of time in JDBC as well, it could be that my Struts action is being masked by database latency. In the example below it looks like I’m spending about 5s in DB time, but close to 79s in Struts time. I wonder why.
I want to know where to look next. I obviously could go do a deeper YourKit profile with an emphasis on ViewQuestionsAction, but I want to be absolutely sure of the data, so I dig into the Methods view next for confirmation. Interestingly enough, I see 1 execution which accounted for 84 out of 89s of runtime. The key is understanding what this code is doing and why.
I still want to see where my database time is being spent.
Unfortunately I have an issue where I can’t see which statement the time is being spent in. I do have the knowledge that we always reference Proxy9 when we prepare and Proxy10 when we execute. We can do a quick filter on the word Proxy and voilà, we see the statements in question. Note that we always prepare before we execute, so it’s a guarantee that a Proxy10 entry with no text belongs to the statement from the preceding Proxy9.
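The prepare-before-execute pairing above is easy to sketch. This is an illustrative helper over made-up method-list entries, not real Dynatrace output: it filters on “Proxy” and attributes each Proxy10 execute (which carries no SQL text) to the most recent Proxy9 prepare.

```python
def pair_executes_with_statements(entries):
    """entries: list of (method_name, sql_text_or_None) in call order.
    Returns (execute_method, sql) pairs, pulling the SQL from the
    preceding Proxy9 prepare when the execute carries no text."""
    proxies = [e for e in entries if "Proxy" in e[0]]  # quick filter on "Proxy"
    pairs, last_prepared = [], None
    for method, sql in proxies:
        if "Proxy9" in method:        # prepare: remember the statement text
            last_prepared = sql
        elif "Proxy10" in method:     # execute: inherit text if none recorded
            pairs.append((method, sql or last_prepared))
    return pairs

# Illustrative entries only; names and SQL are hypothetical.
entries = [
    ("Proxy9.prepareStatement", "SELECT q.* FROM questions q"),
    ("Proxy10.execute", None),       # no text recorded on the execute
    ("SomeOtherMethod.run", None),   # filtered out, no "Proxy" in the name
]
print(pair_executes_with_statements(entries))
```

This relies on the same guarantee the text does: every execute is immediately preceded (in call order) by its prepare, so a single “last prepared” slot is enough.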
So now I have the 4s+ of JDBC time narrowed down to the query referenced above, and I have my class in question. That gives me 2 optimization exercises to focus on. The 84s is obviously my priority, but the 4s query is nearly as alarming.
As I mentioned above, there really are 2 optimization exercises. We need to get to the bottom of the latency exhibited in the ViewQuestionsAction. Obviously part of my problem is that I’m using an Extreme data set, so I have some kind of workload variation anti-pattern that gets worse and worse as the data set grows. If the problem were more uniform or static in nature, I wouldn’t see the transaction in question perform better with smaller data sets, which is exactly what we do see.
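One quick way to put a number on that workload-variation hunch is to estimate a scaling exponent from two measurements at different data-set sizes. This is a rough sketch with illustrative numbers, not figures from the investigation: assuming time grows roughly like size**k, k near 1 suggests uniform (linear) behavior, while k well above 1 points at the anti-pattern where cost compounds as the data set grows.

```python
import math

def scaling_exponent(size_a, time_a, size_b, time_b):
    """Estimate k assuming time ~ size**k from two (size, time) samples.
    k ~= 1: roughly linear in data size; k >> 1: superlinear, i.e. the
    transaction will look fine on small data sets and blow up on big ones."""
    return math.log(time_b / time_a) / math.log(size_b / size_a)

# Illustrative: 1,000 rows -> 0.8s, but 10,000 rows -> 84s.
k = scaling_exponent(1_000, 0.8, 10_000, 84.0)
print(round(k, 2))  # → 2.02, i.e. roughly quadratic in the data size
```

Two points only give a crude fit, but it’s enough to distinguish “slow but linear” (a tuning problem) from “compounding with data size” (an algorithmic anti-pattern worth the refactoring exercise described below).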
I’m definitely going to run a YourKit profile with as much emphasis as possible placed on that class and related classes. I could do the same with Dynatrace, but that’s not really Dynatrace’s intended purpose; the Dynatrace tool helps us narrow down where to investigate forensically. Not only would I do a deeper profile with YourKit, I would also do a deep code review to better understand what we are doing functionally.
At this point I’m focused on pinning down which anti-pattern has been introduced, plus what functional problem we are trying to solve. If I truly understand what we are doing from a code perspective, I would then begin a refactoring exercise. If I didn’t understand both functionally and technically what we were doing, I would bring the problem to my architect or development colleagues for their input.
With regards to the second optimization exercise, I would start by investigating the query in question further. We kind of luck out because it’s a single query execution that consumes the 4.2s of time, so it’s not like I have to look at DAO anti-patterns. I can basically focus my attention on a query-tuning exercise.