Top 10 Users…Why This is Important

Added by Stephen Feldman, last edited by Stephen Feldman on Mar 23, 2011

I’ve been meaning to write this blog for quite a while. Procrastination has definitely held me back…that and the fact that what I want to do is no simple task. When you want a complex task to be done, you better have your thoughts organized.

Over the years, I’ve considered so many different ways to forensically study what people do in the system. I’ve looked at logs. I’ve looked our ACTIVITY_ACCUMULATOR table. I’ve looked at aggregates of data as well. I’ve brought in tools like Coradiant Dynatrace and Quest User Performance Management. None of these tools has ever met my real needs. The reason is that I haven’t been able to articulate what I am really in search of.

I think I’ve had a few eureka moments as of late with what I’m interested in seeing. I know that I want to see what is being done when in our product. I know that I want to understand the sequence of events and the orientation of where events happen in the system. I want to understand the probability of something happening. I want to see the frequency of something happening. In the case of frequency, I want to understand the volume related to frequency. I think all of this data is relavent because it will give us more insight into predicting patterns of usage of a system.

Where a lot of this has come from centers around coversations I’ve had recently about Assessment performance. A lot of customers have been complaining about high-stakes assessments in which they have hundreds of students taking tests all within a lab. They have been complaining about both memory issues (makes sense) and I/O issues (inserts/updates on QTI_RESULT_DATA) which also makes sense. In the case of I/O they didn’t really call them out. Rather, after discussing, I called-out that there likely were some I/O issues based on the behavior an assessment. One of the things I’ve been suggesting to customers was to query the QTI_RESULT_DATA table to get a resultset of row’s inserted versus modified. Then put it in a scatter plot (from an isolated period of time) to see the volumes of inserts versus updates to see when the timeslices of these events were occuring. From that data, then go into their I/O sub-system and graph their IOps for those same periods of time and overlay the two charts…

SQL> desc QTI_RESULT_DATA;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 QTI_ASI_DATA_PK1                          NOT NULL NUMBER(38)
 PK1                                       NOT NULL NUMBER(38)
 POSITION                                           NUMBER(38)
 ASI_PLIRID                                         VARCHAR2(255)
 ASI_TITLE                                          NVARCHAR2(255)
 DATA                                               BLOB
 BBMD_RESULTTYPE                                    NUMBER(38)
 PARENT_PK1                                         NUMBER(38)
 BBMD_DATE_ADDED                                    DATE
 BBMD_DATE_MODIFIED                                 DATE
 BBMD_GRADE                                         NVARCHAR2(32)

 

Back to My Point

So all of this talk about using scatter plots to isolate time of when certain events happened in mass, got me thinking about why I wasn’t getting what I really wanted (aka…my rambling above). What I really wanted to create an identity of a user. I didn’t care about their name, just their role. I would call them “Insanely Ambitious Student” or “Constantly Connected Teacher”. It really doesn’t matter. What matters is that you can start building profiles about these users. Before you can build the profile, you have to have a starting point.

My starting point is to look at every entity in the system. I would like to be able to directly or indirectly trace back a row of data to a user. It’s not as simple as you think. First off, not every table has a foreign key relationship to USERS. Some tables have a tie back to COURSE_USERS, which is not a problem per se, but it’s not a straight-up look at each table with USER_PK1 foreign keys.

As a starting point, I would like to do a gap analysis to determine what entities can be directly tied back to the user. From that, we need to know whether the row entry can be presented as time/date value. In some cases, the entity can even show the initial INSERT versus an UPDATE. We really need to understand this system-wide, which means yes we could/would touch the monster ACTIVITY_ACCUMULATOR table.

We could even start with a single entity as a starting point. I would even compromise for an entity that stores USER_PK1 in it. It has to be a table that can present a many to one reference of rows to a user. A good example might be MSG_MAIN as a starting point since it covers all of the criteria.

We could easily look at time series data by user, as well as aggregate statistics. Both are relevant, but obviously time series is a little more visual. I think you need aggregate statistics or at a minimum binned data (binned by time series per user) like aggregate counts by user over each week as a key data point.

SQL> desc MSG_MAIN;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 PK1                                       NOT NULL NUMBER(38)
 DTCREATED                                 NOT NULL DATE
 DTMODIFIED                                         DATE
 POSTED_DATE                                        DATE
 LAST_EDIT_DATE                                     DATE
 LIFECYCLE                                 NOT NULL VARCHAR2(64)
 TEXT_FORMAT_TYPE                                   CHAR(1)
 POST_AS_ANNON_IND                         NOT NULL CHAR(1)
 CARTRG_FLAG                               NOT NULL CHAR(1)
 THREAD_LOCKED                             NOT NULL CHAR(1)
 HIT_COUNT                                          NUMBER(38)
 SUBJECT                                            NVARCHAR2(300)
 POSTED_NAME                                        NVARCHAR2(255)
 LINKREFID                                          VARCHAR2(255)
 MSG_TEXT                                           NCLOB
 BODY_LENGTH                                        NUMBER(38)
 USERS_PK1                                          NUMBER(38)
 FORUMMAIN_PK1                             NOT NULL NUMBER(38)
 MSGMAIN_PK1                                        NUMBER(38)
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s