So late night last night…not enough time to write a really detailed blog about the morning plenary sessions. We have our entire morning shared, so I anticipate frustration over the wireless…YEAH!!!
The first two sessions are really 4 mini-sessions, and there’s no debate: I’m sitting in those without a doubt. They offer in-depth coverage of the browsers themselves: developers and leads from Chrome, Firefox, IE, and Opera will each present a 20-minute session on what’s new in their browser and how to optimize for it.
So all 3 sessions could have value.
I’ll discuss many of the issues that arise when servicing web requests takes not one machine but tens or hundreds, and the battle-tested solutions that allow those systems to communicate rapidly, redundantly, and securely.
Topics will include:
Current technologies for interconnecting systems (Thrift, Protobuf, Memcache, Zookeeper …and more!)
Understanding timeouts in the request/response pipeline
Redundancy and service discovery
Abuse mitigation at large scale
Load balancer concurrency and connection rates
Dealing with an avalanche of logging data
Low level TCP debugging to understand system performance
Monitoring and reporting of cluster performance
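The timeouts topic above lends itself to a concrete sketch. One common pattern is propagating a single deadline through the whole request/response pipeline so that retries share one time budget instead of each getting a fresh timeout. Here is a minimal Python illustration; `call_with_budget` and the backend call are hypothetical names, not code from the talk:

```python
import time

def call_with_budget(fn, total_budget_s=1.0, attempts=3):
    """Retry fn, passing it the time remaining in the overall budget,
    so one slow backend call cannot consume the whole request deadline."""
    deadline = time.monotonic() + total_budget_s
    last_err = None
    for _ in range(attempts):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # budget exhausted; stop retrying
        try:
            return fn(timeout=remaining)
        except TimeoutError as err:
            last_err = err
    raise last_err or TimeoutError("request budget exhausted")
```

The key design choice is that each retry is offered only the *remaining* budget, so the caller’s overall deadline is honored no matter how the attempts are distributed.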
We know Velocity Conference stands for Faster, Bigger, Cooler Web Operations. Chances are that, along with the evolution of your technical architecture and infrastructure, your organization will evolve as well: new people join or leave, knowledge needs to be transferred, processes become more complex and structured, and alignment becomes harder to attain. As engineers we are trained to work at the technical level, but to achieve maximal throughput in our work, the technical and the human level need to be in balance with each other as well as with the business.
So how do you keep your organization scalable, adaptable, secure, and performing? How do you check its health at the human level, or ‘measure’ the devops gap to make sure collaboration stays top-notch?
Most of us are engineers, and when we are faced with a problem, we build a model of the world and start measuring things to verify our thinking. We will therefore show the similarities between the technical and the human level, and explore different metrics you can collect and use for monitoring the health of your organization.
This session draws heavily on various concepts of debt (technical debt, financial debt, …) and will explore how common metrics such as code debt or incident tracking can be used as indicators of the organizational health of web operations. It will illustrate the power of doing so with examples from recent client engagements.
Our work on a cutting-edge HTML5-based game library exposed several flaws in canvas implementations, both between browsers and between operating systems. Canvas is a unique API in that it provides extremely low-level access to a drawing context, which means that performance and functionality are determined entirely by the underlying implementation. This has interesting consequences, such as differing implementations across operating systems even in the same browser version. Developers gain the ability to draw arbitrary shapes on the screen, but they lose features such as high-level mouse events and other interaction paradigms, and they lack easy access to features that other drawing libraries expose, such as multiple graphics buffers, double buffering and animation support, and fast raw-pixel APIs.

We present common problems and solutions that developers working with canvas will face. Topics include image manipulation techniques, browser and OS differences, upcoming HTML5 APIs, and JavaScript optimization techniques for high-performance drawing. We discuss in detail some surprising performance characteristics of image compositing, clipping, scaling, and rotation, as well as how the different canvas rendering backends contribute to overall canvas performance. Finally, we offer suggestions for implementation and optimization strategies that leverage existing and future canvas capabilities.
In this session, we will discuss how operations teams can extract key information from user-perceived performance measurements in real time to make key operational assessments and decisions. Network routing collapses, CDN nodes fail, DNS Anycast horizons shift, and it can all be shown to you by your users. Oftentimes, however, this information isn’t correctly segmented and aggregated, and the gems remain undiscovered. By looking at real-time user performance data and adding a good deal of magic meta-information, we can uncover serious operational problems before they subtly manifest in business reports later.
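The segment-and-aggregate idea described above can be sketched concretely: group per-user timing beacons by some piece of meta-information (here a CDN node, purely as an example) and flag segments whose median load time looks unhealthy. This is a hedged Python sketch with hypothetical field names (`cdn_node`, `load_ms`), not actual code from the session:

```python
from collections import defaultdict
from statistics import median

def flag_slow_segments(beacons, threshold_ms=800):
    """Group user timing beacons by segment key (e.g. the CDN node that
    served the user) and return segments whose median load time exceeds
    the threshold, along with that median."""
    by_segment = defaultdict(list)
    for b in beacons:
        by_segment[b["cdn_node"]].append(b["load_ms"])
    return {node: median(times)
            for node, times in by_segment.items()
            if median(times) > threshold_ms}
```

The point of the sketch is that an aggregate across all users can look fine while one segment (one CDN node, one region) is badly degraded; segmentation is what surfaces the gem.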
Sixty-five engineers at Etsy deploy code to our production servers more than 30 times a day. We keep this process safe with a suite of unit tests, integration tests, and a large number of application-centric dashboards written by engineers. We capture metrics in Ganglia, Cacti, and Graphite, and these metrics range from technical aspects like outgoing bandwidth and web server requests per second to business aspects like new registrations and gross sales.
I plan to present an overview of the tools we use for collecting metrics and the code we use to quickly build one-page dashboards for different aspects of our site (e.g. general health, image storage, search infrastructure). The underlying theme is that these tools are not difficult to use, but typically lie in the “operations” domain. At Etsy, we’ve gone to great lengths to get engineers excited about contributing to metrics and dashboards, and to make these things dead simple to do quickly so they don’t impact engineers’ ability to meet deadlines.
Between now and the summer, we’ll be releasing some of the tools we are using for metrics collection and dashboard building on GitHub. I will be going into some technical detail (read: real code!) on how we integrate these tools.
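As a rough illustration of why Graphite in particular is easy for engineers to feed, its plaintext protocol takes one line per metric: a dotted metric path, a value, and a Unix timestamp, sent over TCP (port 2003 by default). Here is a minimal Python sketch; the host name is hypothetical and this is not Etsy’s actual tooling:

```python
import socket
import time

def graphite_line(path, value, timestamp=None):
    """Format one metric in Graphite's plaintext protocol:
    '<metric.path> <value> <unix_timestamp>\n'."""
    ts = int(timestamp if timestamp is not None else time.time())
    return f"{path} {value} {ts}\n"

def send_metric(path, value, host="graphite.example.com", port=2003):
    # One TCP connection per metric keeps the sketch simple; a real
    # collector would batch lines and reuse connections.
    with socket.create_connection((host, port), timeout=2) as sock:
        sock.sendall(graphite_line(path, value).encode("ascii"))
```

Because the wire format is just a line of text, an engineer can emit a new metric from almost any part of the stack without touching the graphing side at all.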
Having been with the company for 4+ years, Jeremy will walk you through the history of reddit’s growth, detailing the kinds of problems they have run into and how they solved them. Not known for pulling punches, he’ll give you the lowdown on Amazon EC2 and how it has both helped and hindered reddit’s success.
Web performance and ad performance go hand in hand, and ads often impact web performance far more than we realize. Engineers spend months or years optimizing for capacity and performance, but ads are usually outside developer control and visibility. This presentation covers the display ad ecosystem, how display ads are integrated into web applications, and their impact on operations, latency, user experience, and the bottom line. It details how to measure the contribution of ads to overall page latency for specific ad positions and/or specific markets. Methodologies for ad troubleshooting, instrumentation, and monitoring are discussed, as are attempts to translate ad performance impact to the bottom line. Finally, techniques for holistically improving the user experience, and most importantly the perceived performance, of pages with ads are shared.
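One way to make “contribution of ads to overall page latency for specific ad positions and/or specific markets” concrete is to average, per (position, market) pair, the share of total page load time spent on the ad call. This is a hedged Python sketch with hypothetical field names, not the measurement method from the talk:

```python
from collections import defaultdict

def ad_latency_share(records):
    """For each (ad_position, market) pair, compute the average fraction
    of total page load time attributable to the ad call.
    Each record is assumed to carry: position, market, ad_ms, page_ms."""
    sums = defaultdict(lambda: [0.0, 0])  # key -> [sum of shares, count]
    for r in records:
        key = (r["position"], r["market"])
        sums[key][0] += r["ad_ms"] / r["page_ms"]
        sums[key][1] += 1
    return {key: total / n for key, (total, n) in sums.items()}
```

A breakdown like this is what lets you say, for instance, that the top banner in one market is eating a third of perceived load time while the same position elsewhere is negligible.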