Daily Archives: July 7, 2009

A Little Behind on My Blog Reader…New Java Optimization Tool from IBM

IBM Real Time Application Execution Optimizer for Java

IBM Real Time Application Execution Optimizer for Java helps to optimize and verify a compiled Java application, preparing the application for deployment in specific environments. It is a command line tool that can operate on any compiled Java application, whether standard edition, micro-edition, or real-time. The tool provides the following functions:

  • Escape analysis of objects per method invocation
  • Control flow analysis that splits an application into archives according to thread accessibility
  • Control flow analysis that detects potential occurrences of the real-time Java runtime errors MemoryAccessError, IllegalAssignmentError, and IllegalThreadStateException
  • Control flow analysis that determines entry points into an application
  • Addition of stackmaps to Java class files
  • Verification of Java class files
  • Auto-generation of classes that will load and initialize all other classes within the same archive
  • Specialized packaging of Java applications into deployable archives by packaging all referenced classes from a dual class path
  • Removal of unwanted attributes from Java class files

Please note this was taken directly from the IBM source…just wanted to capture notes and share the info!

Day Two of Velocity 2009

Morning Keynote

The funniest intro started the morning. They led with a YouTube clip of Conan O’Brien and Louis C.K. about people appreciating technology. It was absolutely hilarious. Shortly after Steve Souders talked, a guy named Jeremy Bingham from DailyKos.com came up to talk about surviving the 2008 elections. It was quite possibly the worst presentation in the history of mankind. I thought the guy was going to freeze on stage and someone would have to rescue him, Tarzan-like, swinging from one end of the stage to the other. I haven’t jumped on Twitter, but I’m almost certain the tweets flogged him.

Then the real speaker came up: a guy named Jonathan Heiliger from Facebook, who talked about Facebook scalability. A couple of takeaways from this talk:

  • Facebook tackled i18n in a completely different way than any other company has.
  • FB defines an active user as a user who has come to the site within the last 30 days.
  • FB doesn’t have QA, but rather ENG is responsible for all test case development, execution and even deployment.
    • They do have an Ops team that works with ENG to assist with deployment
  • FB has a suite of tools for Performance: still looking for documentation on this
  • FB has a performance engineering team
  • One major point the speaker made is that we really need to consider testing with real users and not just depend on automated performance tests.

Microsoft and Google Co-Present

Eric Schurman from the Microsoft Bing team and Jake Brutlag from Google put together a joint presentation on the effects of artificial delays injected into page responsiveness and their impact on user behavior with search engines. The two teams worked on this more or less independently and came to the conclusion that their data had a lot of similarities. They studied three things: server delays, page weight increases, and progressive rendering.

They determined that server delays of anywhere from 50 ms to 2 s had a drastic effect on behavior. Users quickly lost interest in working with the search site and often abandoned their work. They also found that page weight had little impact. They made changes of 1.05x to 5x page size. For higher-bandwidth users it simply didn’t make a difference.

Progressive rendering, which is based on chunked transfer encoding, provided a positive experience for users. Users felt more engaged and subsequently kept working within the application. This is definitely something we need to investigate further.

Key Take-Aways:

  • Delays under 0.5 s impact business
  • The number of bytes in a response is less important than what they are and when they are sent
  • Progressive rendering should be used in order to get quick feedback to users
  • Make an investment in experimentation platforms

Next Web Challenges from Keynote

This was just a marketing presentation about Keynote’s new product called Transaction Perspective 9. It was cool, but definitely too much markitechture.

The best part of this presentation was the opening. They showed the YouTube clip for Cool Guys Don’t Look at Explosions. It’s a must watch…

John Adams from Twitter

I will keep this short. One thing this guy talked about was an idea at Twitter called Whale Watching. Apparently Twitter has had some scalability issues over the past year. They try to keep their whales per second (HTTP 503 errors) below the whale threshold.

Another interesting thing they have done is make their website performance completely transparent. Take a look here for an uptime report.

Page Speed with Bryan McQuade and Richard Rabbat

I’ll keep this brief as well. Page Speed is cool. It’s not a replacement for YSlow, Fiddler or HTTPWatch. It tries to be the replacement, but fails to do just that. I see it as providing similar, yet different, data to those other tools. One thing it does is optimize images for you to place back in your code. It also tells you which JS is wasted and which can be deferred. It also minifies for you…

The team built in rules for determining inefficient CSS selectors, based on David Hyatt’s CSS best practices that I talked about in yesterday’s blog. That’s pretty cool…They also have an activity panel that will soon show reflow (paint events).

IE vs. FF vs Chrome Session

The three top dogs from IE, FF and Chrome went back-to-back-to-back with sessions about their browsers. It was cool, as I got to meet Christian Stockwell from the IE team in person. He worked with me on some of our Grade Center issues last year. Mike Belshe from Chrome and Christopher Blizzard from Firefox also spoke.

Highlights:

  • IE team says they focused on layout, JScript and Networking with IE8 improvements.
  • Chrome team says they focused on rendering, JS and Network with Chrome 3 improvements
  • FF team says they focused on Network Performance (HTTP stack) and DNS prefetching
  • IE 8 has native JSON support, raised parallel connections per host from 2 to 6, and added a new Selectors API
  • Chrome is based on 3 processes (Browser, Renderer and Plugin)
  • Chrome uses WebKit for rendering
    • Uses V8 as its scripting engine
  • FF uses TraceMonkey (JS engine) and Gecko for the DOM (rendering)

One last point…need to look at Chrome’s community page

MySpace’s Performance Tracker

Ok…I have to admit I’ve never been on MySpace. I don’t have an account, nor have I ever logged in. I’ve seen shots from my wife’s computer and in the news, but I never made the jump. The PE team from MySpace presented a tool they wrote called MSFast. The tool looks cool…it’s a JS injector. I doubt I would use it. I’ll give it a spin when I get back and make a final judgment.

Doug Crockford’s AJAX Performance

So the father of AJAX performance from Yahoo presented. Sadly, I walked away from the presentation disappointed. I think the only thing I walked away with was that IE doesn’t implement arrays as true arrays, but rather uses linked lists.

Blackboard Questions

  • How big is our cache at Bb for the typical customer?
  • Have we considered HTTP chunking?

What I Did on My Summer Vacation…Just Kidding…Day 1 at Velocity Part Two

The Fast and the Fabulous: 9 Ways Engineering and Design Come Together to Make your Site Slow

Nicole Sullivan from Yahoo gave her presentation on Object-Oriented CSS. It looks like she just launched a new site about OOCSS. She will also have an article in Layers Magazine later next month about the topic.

Her main argument is that CSS has started to grow into the weeds and is very difficult to manage. She is pushing for a component library for manageability and reuse. She has four examples (default CSS, grids, modules and content) worth looking at. As she put it, we should be separating the following:

  • Container and Content
  • Structure and Skin
  • Contour and Background
  • Objects and Mixins

Other Notes to Reference

  • Take a look at this article from alistapart.
  • Keep selectors fast: define default values and style classes (not objects)…except for defaults (globals)
  • Avoid specifying location
  • Avoid overly specific classes
  • Use mixins to avoid repeating code
  • Encapsulation: don’t access sub-nodes of objects directly
  • Outcomes: Measure your results

What I Did on My Summer Vacation…Just Kidding…Day 1 at Velocity

Day 1 at Velocity is officially in the books. It was a really exciting day meeting other web performance specialists. The best part of the day had to be sitting next to Steve Souders and even getting the chance to talk to him for a bit. The guy reminds me a lot of Cary Millsap. He’s incredibly hands-on, which is awesome.

So where to begin…

The day started with Souders giving a presentation called Website Performance Analysis. Before he spoke a single word, he had YSlow running against the Alexa top 20 sites automatically using its auto-run mode. He encouraged all of the attendees to look at Google Page Speed. Surprisingly, he wasn’t part of the development team; he offered some consulting, but that’s it.

He also announced his new book, Even Faster Web Sites, which came out last week. My plan is definitely to pick up a copy this week at the conference, and yes, I will get a signed copy. What’s unique about this book is that he has several authors contributing chapters. We are talking about authors like Doug Crockford, Nicholas Zakas, Stoyan Stefanov and Nicole Sullivan. The last two are the developers of Smush.it.

Souders plans to announce a new tool tomorrow. Can’t wait to hear what tool he is talking about. He didn’t say who the author of the tool is, though my bet is on Google. He said this new tool is supposed to address the gaps that you see when JavaScript is executing, you know…the so-called white spaces or empty spaces in an HTTP timeline chart.

Why Focus on JavaScript

This was definitely the most important part of the session. Souders brought Nicholas Zakas on immediately after his session to do an entire session on JavaScript performance. Essentially, scripts block downloading and rendering because they have to execute in order. It isn’t the download of the JS that does the blocking, but rather its initial execution. He basically conceded that the problem is the browsers, which force this blocking.

He showed a slide comparing old browsers (IE 6/7, FF 3, Chrome 1 and Safari 3) with the new browsers (IE 8, FF 3.5, Chrome 2 and Safari 4). At best we see a 20% improvement; new browsers still block. Souders then demonstrated a tool he wrote called Cuzillion, which gives developers the chance to model their rendering edge cases by creating mock pages built from HTML objects such as scripts, images, CSS, etc…I definitely think it’s worth having the team look at the tool.

Splitting the Initial Payload

Functions that don’t need to execute before the onload event are candidates for lazy loading. Souders brought this point up, which was covered in this AJAXIAN article, as a means for minimizing script blocking latency.

His main argument about splitting up the payload is trying to determine what’s necessary to render vs. everything else. One example he cited was MSN, which has some kind of secret sauce for doing this. I did some research and found an article that articulates the MSN example as a good design pattern for JavaScript loading.

I also came across a tool that Souders referenced in his slides, but never verbally acknowledged, called Doloto, from Microsoft, which analyzes which workloads can be better split. The tool was presented last year at Velocity. Doloto is a system that analyzes application workloads and automatically performs code splitting of existing large Web 2.0 applications. After being processed by Doloto, an application will initially transfer only the portion of code necessary for application initialization. The rest of the application’s code is replaced by short stubs; their actual function code is transferred lazily in the background or, at the latest, on demand at first execution. Since code download is interleaved with application execution, users can start interacting with the Web application much sooner, without waiting for the code that implements extra, unused features.

Sharding Workload

This was really interesting, but probably not relevant to us, as I believe we are limited to only 1 domain for all of our code and then N domains for external content. Now that I think about it, it may apply for these external edge cases. This process is called domain sharding. Essentially, the browser will perform more than its minimum number of parallel downloads. Let’s say you have to download 4 objects from one domain and 4 from another. The modern browsers can parallelize both domains’ downloads in one waterfall, whereas the older browsers would be more staircase-oriented. I’ve attached his presentation on the topic and would love for the team to review it.
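To make the math concrete, here’s a rough back-of-the-envelope sketch (in Java, since that’s what we live in) of why sharding matters more for the older browsers. The connection limits and object counts are just illustrative numbers, not anything measured from our pages.

```java
// Back-of-the-envelope illustration of why domain sharding helps older
// browsers: with a per-host connection limit, spreading objects across
// hosts reduces the number of serialized "rounds" of downloads.
public class ShardingMath {

    static int rounds(int objects, int hosts, int connectionsPerHost) {
        int parallel = hosts * connectionsPerHost;
        return (int) Math.ceil((double) objects / parallel);
    }

    public static void main(String[] args) {
        // 8 objects, an older browser allowing 2 connections per host:
        System.out.println("1 host:          " + rounds(8, 1, 2) + " rounds"); // 4
        System.out.println("2 hosts:         " + rounds(8, 2, 2) + " rounds"); // 2
        // A newer browser (6 connections per host) needs sharding far less:
        System.out.println("1 host, 6 conns: " + rounds(8, 1, 6) + " rounds"); // 2
    }
}
```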

Flushing the Document

I’m not sure if this is something we are capable of doing in Java. Essentially, Souders talks about how you can flush the HTML document early to create a waterfall-like scenario that speeds up the download of content. He cites examples in PHP, Ruby, etc…but no mention of Java. It looks like there is a similar flush() method in Java, but I’m not certain. I’ll add it to our list.
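For reference, here’s a minimal sketch of what an early flush could look like in a plain servlet. The class name and the buildExpensiveBody() helper are made up for illustration; the point is just that flushBuffer() pushes the head (and its CSS/JS references) to the browser before the slow server-side work finishes, and the container typically switches to chunked transfer encoding when we flush before the content length is known.

```java
import java.io.IOException;
import java.io.PrintWriter;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical servlet illustrating an early flush: the <head> (with its
// CSS/JS references) is sent to the browser before the expensive body work.
public class EarlyFlushServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        resp.setContentType("text/html;charset=UTF-8");
        PrintWriter out = resp.getWriter();

        // Send the head immediately so the browser can start downloading
        // stylesheets and scripts in parallel with our server-side work.
        out.println("<html><head>");
        out.println("<link rel=\"stylesheet\" href=\"/styles/main.css\"/>");
        out.println("</head><body>");
        resp.flushBuffer();   // forces the container to send what we have so far

        String body = buildExpensiveBody();  // placeholder for slow work (DB calls, etc.)
        out.println(body);
        out.println("</body></html>");
    }

    private String buildExpensiveBody() {
        return "<p>Main page content</p>";
    }
}
```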

Investing More Time in Understanding Our CSS

I’ll talk about CSS in a later blog covering Nicole Sullivan’s perspective on Object-Oriented CSS. Souders brought up CSS from the context of understanding rules and elements. He referenced an article by David Hyatt from 9 years ago that is an absolute must-read. Souders himself wrote a blog about simplifying CSS selectors, which is something we must read as well.

Sprites

Ok UI team, we really need to talk and act. Sprites aren’t just a craze…they are legit. Every major web site has moved away from GIF and JPG in favor of sprites and, occasionally, PNG (as a replacement for GIF). We need to make a serious move to sprites ASAP. There’s a great project out there that will do the conversion.

Blackboard Specific Notes I Captured

  • Souders talked about how YSlow used to be a GreaseMonkey script that interacted with Firefox. What about the idea of developing a Bb extension acting like a Bb Perf plugin? I’m not sure what it would be, but the idea of us developing tools for the team sounds intriguing and is almost certainly where our next steps as a team need to go.
  • JavaScript is an area of weakness for PE in general and something we need heavier investment in before 9.2
  • Should we make the investment and buy HTTPWatch for the team?
  • We need to investigate the Doloto Tool
  • How many DNS lookups do we have for a Bb load?
  • Can we perform a flush call in Java?
  • We really need to do an isolated analysis of our CSS. If our designers are doing our CSS and not developers, we run a bigger risk that they are not as aware of performance. Not only should we do a CSS audit of the code, but do a browser analysis for performance. This should be a part of every release in which we make changes to the CSS and/or introduce a new browser.
  • We really need to make a serious push to Sprites, as well as encourage content authors in Bb to use them.
    • I’m serious in that I think we should write a special document specifically for content authors about making their pages more usable and responsive.
  • We also need to abandon GIFs and JPGs in favor of Sprites and PNG. More on this in a later blog.

Couple Thoughts on Abandonment in Prep for Dell

I’ve been running through Galileo studying how some of the most recent calibration efforts have been going. I wanted to get a sense of what lies ahead for the benchmark team as we prepare for the Dell benchmark next week. What’s interesting about the recent way we have been calibrating is the length of time it takes to run. In a recent PVT for SP1, it took us 265 to 285 minutes (~5 hours). That’s not bad when you compare it to the old way of calibrating, which could take upwards of 8+ hours to complete. Still, I don’t want to waste 5 hours calibrating when I think the answer can be obtained in 3 hours. So I would like to consider an alternative calibration method. Don’t worry Anand and Nakisa, I’ve done a sample run or two to prove it out. Before I get into this possible new calibration method, let me explain the old and current methods.

Old Calibration Method: POC, AOC and LOC

Our old method of calibration was based on our original concepts of user abandonment. We believed there was a peak of concurrency (POC) and a level of concurrency (LOC). In this model, abandonment began when user response time thresholds couldn’t be maintained. The LOC was where the rate of arrival equaled the rate of departure. The AOC was our view of the midpoint between the POC and LOC; essentially, the AOC was relevant because we felt there had to be a workload worth studying between the peak and the level.

Once we determined the POC, AOC and LOC, we would then run a simulation with abandonment disabled. All simulations were 70 minutes in length, and each run took about 2 hours start to finish, hence the whole process took a little more than 8 hours to complete. A long while ago we found fault with this approach. I’m feverishly looking for my old blog post about it, but can’t find it. Basically, my frustration with this effort is that we settled on the AOC and LOC results most of the time. I don’t have statistics on it, which I should; I’m basing this mostly on anecdotal memory.

There were other faults as well. It turned out that the workload we settled on often wasn’t sufficient to saturate our systems, causing our PARs to be off substantially. So from this old method came the current one…

Current Calibration: Steady-State 90-9-1

Our current calibration approach is derived from steady-state workloads. We essentially run a staircase of workloads over a period of time, with a 2 minute recovery period between workloads. Each workload is responsible for 10 samples after a short ramp-up period. What’s interesting about this type of test is that we get to look at several different workloads in a single run.

I have 2 problems right now with this type of calibration. The first issue is that the process takes upwards of 5 hours to complete. That is way too long to calibrate, because if it fails we have at best 2 shots a day at running this. The second issue is related to faults I find with arrivals. We ramp up in each cycle. So if we have a workload of 20 VUsers with a 2 minute ramp, we are essentially ramping 1 VUser every 8 seconds. If the sample period is based on 10 iterations, which occur over 20 minutes, the first 2 minutes skew our data. We are likely to see 90-9-1 acceptable responses during that 2 minute bin, and the data might be interpreted incorrectly.

So with that I’ve come to propose a third type of calibration…

Proposed Calibration: Continuous Arrival

I’ve only had the chance to test this twice at the time of this blog, but I’m already interested in what I’m seeing. Basically, I’ve designed a very simple scenario in which we take an extraordinary number of available VUsers into one pool. Let’s say for grins we take 500 VUsers. We then arrive that pool of users over roughly an hour; in our case it was about 58 minutes, arriving 1 VUser every 7 seconds. Abandonment was on.
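As a sanity check on the arithmetic, here’s a tiny sketch of the arrival schedule I’m describing. The class name is made up and this isn’t pulled from our scenario files; it just shows that 500 VUsers at 1 every 7 seconds lands right around the ~58 minute ramp.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the continuous-arrival schedule: one big pool of VUsers trickled
// in at a fixed interval, with abandonment left on.
public class ContinuousArrivalSchedule {

    public static void main(String[] args) {
        int poolSize = 500;            // total available VUsers
        int arrivalIntervalSec = 7;    // 1 VUser every 7 seconds

        List<Integer> startOffsets = new ArrayList<Integer>();
        for (int v = 0; v < poolSize; v++) {
            startOffsets.add(v * arrivalIntervalSec);
        }

        int lastOffset = startOffsets.get(poolSize - 1);
        System.out.printf("Last VUser arrives at %d seconds (~%.0f minutes)%n",
                lastOffset, lastOffset / 60.0);
        // 499 * 7 = 3493 seconds, roughly the ~58 minute ramp described above.
        // Abandonment stays enabled, so the active-user curve flattens on its
        // own once departures start to outnumber arrivals.
    }
}
```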

Here’s what we learned. First off, at some point we see diminishing returns. From an abandonment perspective, those going in start to get outnumbered by those coming out. The same can be said about hits/second: we eventually reach a max, and that’s all we can do.

In our scenario, our problem was unique to us. We were CPU bound on an 8-core system. I couldn’t believe my eyes when I saw that, and we got there with only 125 VUsers. That’s crazy when you think about it. We should have been memory bound like we’ve been in the past, but with the recent JVM changes the balance of resources has shifted from memory to CPU. This could be problematic at Dell, or it could be a blessing.

What’s interesting about calibration is that I still think you have to bring the data into 90-9-1. The difference is that we are really looking at a true VUser curve rather than a VUser cardiograph.

The most complete list of -XX options for Java 6 JVM

I stumbled across this link today while doing analysis of non-standard (-XX) options. I think the most interesting set for us to pay attention to are the Product and Diagnostic flags. I have to say it’s quite amazing how many flags are available.

Given some recent tests on the Concurrent Collector (UseConcMarkSweepGC), it’s quite remarkable how many CMS options we will need to explore.
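One quick trick while we explore those flags: ask the JVM which collectors it actually ended up with. Below is a small sketch using the standard java.lang.management API; the launch flags in the comment are only an example of the sort of thing we would pass, not a tuning recommendation.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Launch with something like:
//   java -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails GcReporter
// and the bean names below should report the CMS collectors
// (e.g. "ParNew" and "ConcurrentMarkSweep") instead of the defaults.
public class GcReporter {

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("Collector: %-25s collections=%d time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```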

Here’s a slightly older document.

I’ve also come across this pretty interesting presentation about the G1 collector.

Fiddlerhook for Firefox 3

I might be a little late in publishing this note. I upgraded to Firefox 3 this week after getting my new desktop. I had been patiently waiting for an upgrade and finally got one on Friday, after my desktop of 4 years finally crapped out on me. I’ve been setting up my development environment somewhat frantically, trying to get the latest and greatest of everything.

I did my Firefox upgrade to FF3 this morning. I did a YSlow install, as well as a Fiddler2 install, independently. After restarting Firefox so it would recognize YSlow, lo and behold, an integration between Fiddler and Firefox had been installed automatically. The new integration is called FiddlerHook. It’s been out since 3/31/09. What’s great is that you can easily toggle Fiddler capturing on and off from within Firefox via the Tools menu or the Fiddler icon.

Happy Fiddling!!!

Reconsidering Our Approach to Oracle Statistics

I might be foolish thinking I can capture my notes as eloquently as I would like given I only have 30 minutes or so to post this entry. So I’m going to try my best to get my thoughts down on the topic of Oracle statistics and see what comes out in this blog.

A few weeks back I went to the 2009 Hotsos Symposium. While I openly admit this year’s was not very good compared to last year’s, I have to say Karen Morton’s presentation on Managing Statistics for Optimal Query Performance was the best paper and presentation overall. In fact, it was so good I would put it in my top 5 of all the Oracle papers and presentations I’ve ever taken in.

Morton’s presentation really got me thinking about how poorly we deal with statistics. First off, we gather statistics only at the schema level, so we take a one-size-fits-all approach. Second, and probably most concerning, we have not performed any design of experiments to validate that what we recommend has a positive effect. Surprisingly, our test results often reveal that query performance gets worse after we gather statistics. Third, I don’t think anyone (including myself) has really spent the necessary time studying everything there is to know about manual gathering of statistics. We have taken a very ignorant approach to Automatic Statistics Gathering, which in my opinion is foolish considering we truly don’t understand how Oracle goes about it.

That leads me to where we are now. Statistics help the Oracle CBO (Cost Based Optimizer) make better decisions about access paths, join methods and join order. As Morton says, “Statistics matter!” Our approach acts as though they don’t; we sort of put our middle finger up at the CBO and say “let’s play a little roulette.” That’s no way to run a performance engineering team.

So I would like us to address statistics, as well as histograms (considering they are incredibly relevant to the topic of statistics) in the form of a formalized design of experiment. Here’s what I would like to do…

First, I would like for someone to take this project on for the next few weeks and champion the design of experiment. This person would be responsible for reading Morton’s paper and summarizing her main arguments without copying word for word what she wrote. I’m seriously looking for someone to interpret and understand her core arguments as though we are trying to argue them in a court of law or a public debate.

Once you are finished reading and reviewing the article (in a blog, of course), I would then like you to argue what we are doing from either a positive or negative vantage point. Specifically, we approach statistics gathering in a fairly universal (one-size-fits-all) way. In the same regard, we do very little from an initialization perspective to manage CBO parameters. Third, we have a weak understanding of the use of histograms. I think we understand our data (from a scaled data model perspective), but we ignore our own knowledge of that data in order to minimize scaling and set-up times. Ask yourself this: if statistics should be updated after a 10% change in data, shouldn’t we reconsider the role of statistics as part of clp-datagen? Could a lack of statistics be altering our scaling times?
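If we did decide to fold statistics gathering into clp-datagen, the call itself is easy enough to issue over JDBC. Here’s a sketch; the schema/table names are placeholders, and the estimate_percent/method_opt settings are just examples of the knobs Morton’s paper digs into, not a recommendation.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

// Sketch of gathering table-level statistics from Java via DBMS_STATS.
// Schema and table names here are placeholders, and the sampling/histogram
// settings are only examples of the options we would need to evaluate.
public class GatherStats {

    public static void gatherTableStats(Connection conn, String owner, String table)
            throws SQLException {
        String plsql =
            "begin " +
            "  dbms_stats.gather_table_stats(" +
            "    ownname          => ?, " +
            "    tabname          => ?, " +
            "    estimate_percent => dbms_stats.auto_sample_size, " +
            "    method_opt       => 'FOR ALL COLUMNS SIZE AUTO', " +
            "    cascade          => true); " +
            "end;";
        CallableStatement cs = conn.prepareCall(plsql);
        try {
            cs.setString(1, owner);
            cs.setString(2, table);
            cs.execute();
        } finally {
            cs.close();
        }
    }
}
```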

I actually thought about that last question a lot while at the conference. Suppose we scaled a system using the PVT model. One thing we see is that as we progress from the XS to the M3 dimension, scaling gets slower and slower. We hypothesize that the cause of the slowdown has to do with the size of the entities we are creating. What if we challenged that notion by doing a few scaling exercises:

  • Scenario A: Scale XS, ST and XL in a sequential manner
  • Scenario B: Scale XL, ST and XS in a sequential manner
  • Scenario C: Scale XS individually followed by ST individually followed by XL individually
  • Scenario D: Scale XL individually followed by ST individually followed by XS individually

The idea behind these four scenarios is to measure whether scaling these dimensions in any particular order, in the absence of updated statistics, has any effect on scaling times. I should be able to compare Scenario A against B, as well as Scenario C against D. If needed, I could compare any of the 4 scenarios against each other. Depending on what we find, we might choose to introduce two additional scenarios:

  • Scenario E: Scale XS individually and run statistics, scale ST individually and run statistics, followed by scaling XL individually
  • Scenario F: Scale XL individually and run statistics, scale ST individually and run statistics, followed by scaling XS individually

Of course we would have to make some decisions about our statistics approach. We could assume ignorance and use our current methods. Or we could use this time to really understand the best approach to statistics gathering.

Upon completion of the article and the experiment, we need to tackle our lack of knowledge with regard to statistics gathering in a more scientific manner. At this point I would suggest reviewing the following presentation from Oracle Open World in 2005.

Our goal is to address two fundamental questions:

  • What’s the best way for us to gather statistics as part of the datagen process?
  • What’s the best approach for us to use for gathering statistics with the PVT data model?

We might need to address more questions. What I mean by that is we can’t take a one-size-fits-all approach to this discovery project either. We are going to have to dig a little deeper into our application and identify a series of use cases (our most critical and complex) to get a better understanding of a few key points:

  • How affected are these queries from a CBO perspective when we take a universal approach to gathering statistics?
  • How uniform are the data sets for the key entities for these queries? Specifically, what’s the cardinality of the data set?
    • Would these entities benefit from the use of histograms due to variance and skew in the data?
  • Would these entities and their schema objects benefit from a more focused statistics gathering approach specific to these entities?

I’m going to propose a few use cases in the JIRA ticket I create for whoever works on this assignment. Most likely the queries will be Grade Center, Discussion Board, Assessment, Content and Layout (My Institution or Course) use cases.

Whoever takes on this project will be responsible for designing multiple experiments, including the datagen scenario above, some form of PVT execution scenario measuring throughput- and resource-oriented metrics, and then some focused test metrics for the key use cases. I would expect 10053 tracing to be used as a measurement tool. Execution plans and estimated plans should be presented within the artifacts. Finally, an argument should be made about what we should do next for this project. I am not sure if we will be able to complete all of our goals within such a short window.
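For the 10053 piece, the trace can be toggled per session, so whoever takes this on could wrap the key queries in something like the sketch below. The query passed in is a placeholder, and it assumes the test session has the privilege to set events; the trace file ends up in the server’s user_dump_dest.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch of capturing a 10053 (CBO) trace around a single hard parse.
// The query is a placeholder; the tracefile_identifier just makes the
// resulting trace file easier to find on the database server.
public class Cbo10053Trace {

    public static void traceQuery(Connection conn, String sql) throws Exception {
        Statement stmt = conn.createStatement();
        try {
            stmt.execute("alter session set tracefile_identifier = 'pvt_10053'");
            stmt.execute("alter session set events '10053 trace name context forever, level 1'");

            // The statement must be hard parsed for the optimizer trace to be
            // written, so in practice use a literal or slightly altered text.
            ResultSet rs = stmt.executeQuery(sql);
            while (rs.next()) {
                // consume rows; we only care about the trace on the server side
            }
            rs.close();

            stmt.execute("alter session set events '10053 trace name context off'");
        } finally {
            stmt.close();
        }
    }
}
```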

Has Anyone Ever Heard of PaperCube?

I just stumbled across a post on Ajaxian about a very interesting Web 2.0 application that is part search, part visualization and heavily semantic. The application is called PaperCube, and it’s very much an alpha application, containing just a small repository of academic papers from Penn State right now. The killer technology behind it is called SproutCore.

There’s a great video showing you how to interact with PaperCube.

Some Great Postings out There on Java and Memory Leaks

It feels like it’s been a while since I’ve heard anyone discuss memory leaks. So when I came across these two postings, I definitely felt compelled to put together a short post for searchability purposes later on. I’m thinking of opening up a ticket for members of the team to review the pieces…

The first blog is called Memory Leaks are Easy to Find. The second is really a sample workshop or self-paced tutorial called "How to Fix Memory Leaks in Java".
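For anyone who hasn’t chased one of these down recently, the shape of leak those articles keep coming back to is a long-lived collection that only ever grows. A contrived example (names made up) of what to look for in a heap dump:

```java
import java.util.HashMap;
import java.util.Map;

// Contrived example of the most common Java "leak": a static, ever-growing
// cache that pins objects for the life of the JVM. Nothing is broken in the
// C sense; the GC simply can never reclaim what the map still references.
public class SessionCache {

    private static final Map<String, byte[]> CACHE = new HashMap<String, byte[]>();

    public static void remember(String sessionId) {
        // Entries are added on every call but never evicted or expired,
        // so heap usage climbs until an OutOfMemoryError.
        CACHE.put(sessionId, new byte[64 * 1024]);
    }
}
```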