I thought about just putting my notes in bulleted form like we did on the wall, but where’s the fun in that? Several of our teammates weren’t able to attend this session because of module conflicts, and a bunch of bullet points isn’t going to offer them much context. So if I’m going to provide context, I might as well provide my perspective, eh?
We had a lot of opinions about what we believed to be the problems with Datagen, and it was actually quite refreshing to hear so many of them. Nori kicked us off as the session leader with his perspective. He called out reliability and debugging (along with error handling) as his top two concerns. Let’s be clear about what we mean by reliability. We are not talking about Fusion reliability. Rather, we are talking about Datagen reliability. We cross our fingers every time we kick off Datagen. It’s not repeatable, it’s not reliable and, worst of all, it’s not timely. Our confidence is quite low, to be frank. Because debugging is so poor, we are less able to depend on Datagen.
In just the last few months, I’ve watched Geoff develop data models faster in SQL. I’ve seen Patrick work with DIRE on cross-server restores. These are loud clues that there is a high degree of dissatisfaction with Datagen. Just look at the control chart below. Here are the two things I’m inferring. First, we are seeing fewer executions of Datagen over time. Second, when we do increase our usage of Datagen, we are more likely to fail. Sounds like a problem to me.
There are lots of other problems which various team members called out. I will put a brief statement or two around them:
- Non-Functional Requirements: If my notes aren’t failing me, I’m referring to the NFR that datagen should scale in minutes and hours, not days and weeks. We are struggling to meet the NFR we established.
- Complexity: We all agreed that datagen was unnecessarily complex. Entity relationships are not well understood. Take into account the highly complex XML definition structure we have and you can really confuse any outsider, as well as insider. It’s simply unlikely that anyone would want to use Datagen outside of our team, let alone our department.
- Validation: We never took the time to solve the validation problem. If we want 1000 courses, why aren’t we validating, as part of the datagen process, that we did indeed create 1000 courses? Just another reason for us to lose confidence in datagen.
- Missing Entities: Learn is a big system, so it makes sense that we are missing entities. The gaps mostly reflect the fact that we as SPEs didn’t need those entities, or didn’t realize we needed them.
- Efficiency: We were definitely talking about efficiency from a resource perspective. We get out-of-memory exceptions all of the time. We constantly go into Full GCs. This is definitely a sign that our own datagen development is lacking efficiency.
- Performance: Datagen is simply too slow. It goes back to the NFRs above. We can’t predict when datagen will finish, yet we often need datagen to finish in minutes. Why can’t it?
- Recovery: Today datagen cannot reliably recover in the middle of a task. It’s not efficient from a clean-up perspective either. So it’s not as simple as clean-up and recover. You are better off running datagen step by step with snapshots between. Yet another confidence problem…
- Codeline Management: Datagen is supposed to call LEARN APIs. Often it’s just a local copy of a LEARN API. Not good…not good at all. This is a huge management nightmare for us.
- Date/Time Issue: This is something we could solve, but we have never considered it a high priority. The basic point is that entities carry the time stamp of when datagen ran, which doesn’t account for legacy or historical data.
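To make the validation point above a bit more concrete, here is a minimal sketch of a post-run count check. Everything in it is hypothetical: the function name `validate_counts`, the entity names, and the assumption that requested and actual counts are available as plain dictionaries. How the actual counts would be gathered (SQL, Learn APIs, something else) was not something we settled in the session.

```python
def validate_counts(requested, actual):
    """Compare requested entity counts with what actually exists.

    Both arguments are dicts of entity name -> count (hypothetical shape).
    Returns a list of human-readable mismatches; an empty list means the
    run created exactly what was asked for.
    """
    problems = []
    for entity, wanted in requested.items():
        found = actual.get(entity, 0)  # a missing entity counts as zero
        if found != wanted:
            problems.append(f"{entity}: requested {wanted}, found {found}")
    return problems


if __name__ == "__main__":
    # Made-up numbers purely for illustration.
    requested = {"courses": 1000, "users": 5000}
    actual = {"courses": 987, "users": 5000}
    for line in validate_counts(requested, actual):
        print(line)
```

The idea is simply that a run which asked for 1000 courses and produced 987 should say so loudly instead of leaving us to find out later.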
From these points, we highlighted the four biggest areas to focus on:
- Code Maintenance
- Error Handling and Recoverability
We had a solid discussion about where to go with the four biggest areas covered above. Our conclusion was that we really needed to move away from Datagen the way we see it today. Success is possibly best achieved if the code is truly tied to a branch in LS. This depends heavily on LS having a formalized Unit Testing framework for CRUD on all entities. We would then add some additional abstraction via a wrapper to handle logging, monitoring and quite possibly a validation framework.
The key is the Unit Testing Framework. We don’t really have a sound UTF right now. We can’t guarantee appropriate or consistent coverage.
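As a rough illustration of what such a wrapper might look like, here is a small sketch. It is an assumption on my part, not a design we agreed on: `run_step`, the logger name, and the callables passed in are all hypothetical stand-ins for whatever the unit testing framework would actually expose for creating and counting entities.

```python
import logging
import time

log = logging.getLogger("datagen_wrapper")  # hypothetical logger name


def run_step(name, create, count_existing, expected):
    """Run one entity-creation step with logging, timing and a count check.

    name           -- label for the step, used in log lines
    create         -- callable that creates the entities (stand-in for a
                      CRUD unit test or API call)
    count_existing -- callable returning how many entities now exist
    expected       -- how many entities the step was asked to create
    """
    start = time.monotonic()
    log.info("step %s: starting, expecting %d", name, expected)
    try:
        create()
    except Exception:
        # Log the full traceback, then let the caller decide what to do.
        log.exception("step %s: failed", name)
        raise
    elapsed = time.monotonic() - start
    found = count_existing()
    log.info("step %s: %.2fs, found %d of %d", name, elapsed, found, expected)
    return {"step": name, "seconds": elapsed, "found": found,
            "valid": found == expected}


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    store = []  # in-memory stand-in for the real system
    result = run_step("courses", lambda: store.extend(range(3)),
                      lambda: len(store), expected=3)
    print(result["valid"])
```

The point of keeping the wrapper this thin is that the creation logic stays in the branch-tied code, while logging, timing and validation live in one place.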
We definitely have some big barriers to overcome. A couple were covered above. The key is that we don’t have a UTF in place that’s followed. So if we are going to make any progress moving from Datagen to UTF, we really need to a) put the UTF in place and b) gauge our entity gaps. This won’t happen unless we can get some significant buy-in from our teammates within Engineering. Next, we have to deal with NFRs for the UTF. No list of barriers would be complete without the ones tied to our LoadRunner dependencies. One additional barrier is finding a project champion to drive this initiative.