author: wiltzius@
Intro
Rendering performance is a huge topic of great interest to the Chrome project and web developers. As the web platform is essentially an application development framework, ensuring it has the capability to handle input and put new pixels on the screen at a speed that keeps up with the display’s refresh rate is critical.
“Performance” is an oft-cited problem area for web developers. We spend a lot of time investigating and (hopefully) improving performance of the platform, often guided by specific bad examples. This is the pursuit and elimination of jank.
But what does jank mean? What are people complaining about when they say a web page “feels slow”? This document attempts to categorize the various symptoms that all get put under the “jank” category, for the purposes of accurate identification and precise discussion.
To understand the descriptions of potential causes of jank, it's important to understand roughly how rendering works. See The Rendering Critical Path for a primer; these two documents should be read together.
Finally, a note on the videos of examples -- video playback performance can tragically mask subtle performance problems as it frequently isn't rendered at 60Hz. The examples here should be obvious enough that the problem is still visible, but the originals of these videos are attached to the page, which you can download for slightly more accurate recordings.
Symptom Taxonomy
Incomplete content
Checkerboarding
Brief: Parts of the page not showing up, particularly during a fast scroll or animation, and instead patches of either a gray checkerboard pattern or the background of the page showing through.
Some background, for the curious. The checkerboarding problem is a very fundamental one for a document viewer: do you preserve responsiveness to user input or do you only show fully correct content? Chrome has developed a lot of (expensive) machinery to combat this problem, favoring responsiveness.
Checkerboarding occurs mainly during fast scrolling, when the page cannot be rasterized quickly enough to put up a new viewport’s worth of content every frame. When this happens Chrome will continue to scroll in an attempt to preserve the physical metaphor of the page under the user’s finger... but in place of the missing content there will just be a blank space. On desktop platforms Chrome will draw a greyish checkerboard pattern; on Android (where this is more common) it draws the background color of the page.
To try to avoid having to checkerboard, Chrome will pre-rasterize content around the viewport and keep it as bitmaps in memory. This policy trades CPU and memory resource consumption for a greater likelihood that the page content will be ready if/when the user starts scrolling. It’s essentially a giant buffer to guard against how slow software rasterization can be.
This approach has several limitations. For one, it’s a bit of a resource hog. It also isn’t under the control of the app developer at all, so if for instance they would prefer to jank (not produce a frame) instead of checkerboard when the scroll hits an unrasterized region of the page, they’re out of luck. It’s also ineffective if parts of the page that are pre-rasterized get invalidated by JavaScript -- now that pre-rasterized content is stale. It’s still useful to have around, since at least Chrome can display the pre-rasterized content until the new content is ready (this is the whole pending tree vs. active tree architecture; see the compositing design doc for an explanation), but it isn’t a replacement for fast rasterization. This is one of the (many) reasons motivating the move to GL-based rasterization (Ganesh), and when using Ganesh Chrome pre-paints only a very small buffer around the screen.
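The buffer trade-off described above can be illustrated with a toy model. This is only a sketch, not Chrome's tiling code: it assumes the page is a vertical strip of fixed-height tiles, and the tile size, buffer size, and all function names are hypothetical.

```typescript
// Hypothetical model: the page is a vertical strip of fixed-height tiles.
// None of these names correspond to real Chrome APIs.
const TILE_HEIGHT = 256; // px, assumed tile size

/** Indices of tiles overlapping the pixel range [top, top + height). */
function tilesInRange(top: number, height: number): number[] {
  const first = Math.max(0, Math.floor(top / TILE_HEIGHT));
  const last = Math.floor((top + height - 1) / TILE_HEIGHT);
  const tiles: number[] = [];
  for (let i = first; i <= last; i++) tiles.push(i);
  return tiles;
}

/** Tiles rasterized ahead of time: the viewport plus `bufferPx` above and below it. */
function prerasterize(scrollTop: number, viewportHeight: number, bufferPx: number): Set<number> {
  const top = Math.max(0, scrollTop - bufferPx);
  const bottom = scrollTop + viewportHeight + bufferPx;
  return new Set(tilesInRange(top, bottom - top));
}

/** Tiles visible at `scrollTop` but missing from `rasterized`; these would checkerboard. */
function checkerboardedTiles(
  rasterized: Set<number>,
  scrollTop: number,
  viewportHeight: number,
): number[] {
  return tilesInRange(scrollTop, viewportHeight).filter((t) => !rasterized.has(t));
}

// A 768px viewport at the top of the page, with a 512px pre-raster buffer.
const ready = prerasterize(0, 768, 512);
// A short scroll stays inside the buffer, so nothing is missing.
const shortScroll = checkerboardedTiles(ready, 400, 768); // → []
// A fast fling outruns the buffer, so every visible tile is missing.
const fastFling = checkerboardedTiles(ready, 1500, 768); // → [5, 6, 7, 8]
```

A larger `bufferPx` shrinks the set of missing tiles at the cost of rasterizing and storing more tiles up front, which is the CPU and memory trade described above.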
Framerate
Low sustained framerate during any kind of animation (“janky animations”)
Brief: animation with a mostly-steady framerate that’s below 60Hz.
There’s typically a consistent bottleneck in the system when this is the case, and about:tracing is designed for diagnosing exactly this case. If encountered, a standard trace (no need for Frame Viewer) of the low-framerate animation should show which operations are the long pole for the animation.
It’s worth noting that there are two types of animation: those handled entirely by the compositor thread (so-called “accelerated” animations) and those that need to synchronously call into Blink or V8. The only accelerated animations are CSS animations (and CSS transitions and Web Animations) on opacity, transform, and certain CSS filters, plus scrolling and pinch/zoom. Everything else goes through Blink. Accelerated animations will never be bottlenecked on anything in the RendererMain thread, but non-accelerated animations can be.
Delayed beginning to any animation
This is commonly caused by accelerated animations that are set up but require rasterization before beginning. E.g. creating a new layer and setting a CSS animation on its transform -- the animation will not start until the new layer is done being painted. The timeline delay here is designed to prevent animations beginning immediately only to drop a number of frames. The result is typically a much smoother-looking interaction overall, but it can be surprising.
Low framerate during swipe input (throughput)
Brief: special case of low-framerate animation, specifically when the animation is moving something in response to a touch movement. All touch-event-driven animations are non-accelerated animations.
Unlike the delay at the beginning of an animation, which can be stretched out tens of milliseconds without the user really noticing, the only acceptable frame time for a running animation is the vsync interval (e.g. 16ms). Touch input can only be programmatically handled by JavaScript on the RendererMain thread, which means that all visual touch-input-based effects are technically non-accelerated animations. Note that they can still avoid painting by using an accelerated animation property, but they’ll be subject to jank from all other activity on the main thread because the JavaScript input events can be blocked.
Isolated long pauses (“jank”) during JS-driven animations (incl. touch)
Brief: Single long pauses, rather than a consistent low framerate, during animations.
Sometimes an animation is mostly fine, but stutters at one point. This is often harder to isolate, but about:tracing and the Dev Tools timeline are invaluable for figuring out what went wrong. Special cases of this include the delay at the beginning of animations. All non-accelerated animations are subject to jank of this type from any other activity on the Blink main thread (e.g. JavaScript timers executing).
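The difference between a sustained low framerate and isolated long pauses can be made concrete from frame timestamps alone. The following is a rough sketch, assuming a 60Hz display; the thresholds, the `Diagnosis` labels, and the function names are illustrative choices, not anything Chrome or about:tracing defines.

```typescript
// Assumed 60Hz display; the thresholds below are illustrative, not Chrome's.
const VSYNC_MS = 1000 / 60;

type Diagnosis = "smooth" | "isolated-pauses" | "sustained-low-framerate";

/** Classify an animation from the timestamps (in ms) at which its frames were presented. */
function diagnose(frameTimesMs: number[]): Diagnosis {
  const intervals: number[] = [];
  for (let i = 1; i < frameTimesMs.length; i++) {
    intervals.push(frameTimesMs[i] - frameTimesMs[i - 1]);
  }
  if (intervals.length === 0) return "smooth";
  // An interval counts as "long" if it clearly missed a vsync (50% slack for timing noise).
  const long = intervals.filter((dt) => dt > VSYNC_MS * 1.5);
  if (long.length === 0) return "smooth";
  // Mostly-long intervals suggest a consistent bottleneck; a few suggest isolated jank.
  return long.length / intervals.length > 0.5 ? "sustained-low-framerate" : "isolated-pauses";
}

/** Build presentation timestamps from a list of frame intervals, starting at t = 0. */
function timesFromIntervals(intervalsMs: number[]): number[] {
  const times = [0];
  for (const dt of intervalsMs) times.push(times[times.length - 1] + dt);
  return times;
}

// Every frame takes two vsync intervals: a steady ~30fps animation.
const steady30fps = diagnose(timesFromIntervals(new Array<number>(20).fill(33.3)));
// Mostly on budget, with a single 120ms stall in the middle.
const oneHitch = diagnose(timesFromIntervals([16.7, 16.7, 16.7, 120, 16.7, 16.7]));
// Every frame on budget.
const onBudget = diagnose(timesFromIntervals(new Array<number>(20).fill(16.7)));
```

The first case is the kind a standard trace tends to explain directly, since the same bottleneck shows up in every frame; the second usually requires finding the one frame where something else ran on the main thread.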
Latency
Long input latency (rather than low framerate)
Brief: Most jank is related to interruptions in frame production, but latency represents another class of problem manifesting as longer delays between input events entering the system and corresponding new frames being output. It’s possible to have a good / high framerate but bad / long latency.
Some examples of high latency are covered in the framerate examples, for instance delays at the beginning of an animation that’s kicked off in response to input. Generally if the entire pipeline takes long enough that it doesn’t fit into frame budget, we categorize it as a framerate rather than a latency issue. However, there are some edge cases where latency can be increased by seemingly unrelated issues.
High scrolling latency or delays beginning scrolling when the document has touch event handlers
Brief: Calling preventDefault on a touch move event is spec’d to prevent scrolling from happening. Chrome therefore tries to give JavaScript a chance to run any registered touch move event handlers before scrolling the page. If JavaScript takes a while to respond, however, it can increase input latency during the scroll.
There’s a delicate balance between honoring the contract with touch event handlers the page has registered and staying responsive. In particular, because JavaScript touch event handlers are stuck on the main thread with everything else, completely unrelated activity such as XHR parsing or style recalculation or JavaScript timers that run on the main thread will all block touch event handlers from running (none of these tasks currently yield, and as of this writing Blink’s entire event loop is a FIFO queue with no prioritization, although that’s changing with the advent of the Blink scheduler).
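To see how a FIFO queue with no prioritization delays a touch handler, consider this small simulation. It is a sketch only: the `Task` shape, the task names, and the durations are invented for illustration and do not reflect Blink's actual event loop implementation.

```typescript
// Hypothetical model of a single-threaded FIFO task queue with no prioritization.
// Task names and durations are made up for illustration.
interface Task {
  name: string;
  enqueuedAtMs: number; // when the task was posted to the queue
  durationMs: number; // how long it runs once started
}

interface Scheduled extends Task {
  startedAtMs: number;
  queueDelayMs: number; // time spent waiting behind earlier tasks
}

/** Run tasks strictly in posting order; no task yields or is preempted. */
function runFifo(tasks: Task[]): Scheduled[] {
  const ordered = [...tasks].sort((a, b) => a.enqueuedAtMs - b.enqueuedAtMs);
  let clockMs = 0;
  return ordered.map((task) => {
    const startedAtMs = Math.max(clockMs, task.enqueuedAtMs);
    clockMs = startedAtMs + task.durationMs;
    return { ...task, startedAtMs, queueDelayMs: startedAtMs - task.enqueuedAtMs };
  });
}

const schedule = runFifo([
  { name: "timer callback", enqueuedAtMs: 0, durationMs: 20 },
  { name: "style recalculation", enqueuedAtMs: 5, durationMs: 100 },
  { name: "touchmove handler", enqueuedAtMs: 10, durationMs: 2 },
]);
const touch = schedule.find((t) => t.name === "touchmove handler");
// touch.startedAtMs is 120 and touch.queueDelayMs is 110.
```

In this model the 2ms touch handler waits 110ms behind unrelated work before it gets a chance to run, even though it would fit easily within a frame on its own.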
The result is that if style recalculation runs for 100ms right when the user is dragging the page around, the scroll won’t move for 100ms: style recalc has to finish, and the touch event handler has to run without calling preventDefault, before the page can scroll.
Note that browser behavior here is wildly under-specified. Chrome’s behavior has evolved over time, and changed significantly recently with the advent of asynchronous touch move processing. One of the more egregious behavior hacks here is that Chrome typically gave touch events a ~200ms deadline to run, and if the event wasn’t processed by then it would get dropped and the page would scroll anyway. This was designed to preserve basic responsiveness even in the face of adversarial content.
Scroll-linked effects out of sync
Brief: visual effects linked to the scroll position can become 1-2 frames out of sync from the actual scroll.
This is somewhat complicated, but boils down to two different modes of operation in Chrome’s multi-stage rendering pipeline. In low-latency mode each stage of the pipeline will complete fast enough that all of the stages complete within 16ms. If any of the stages run long, Chrome will fall out of low-latency mode into high-latency mode. At that point there will be an extra frame of latency in the entire rendering pipeline, which effectively increases frame budget by allowing thread parallelism but at the cost of latency.
Note that technically there is currently a low/high latency mode for both the main thread > compositor thread step and the compositor thread > browser UI thread step. The problem described here is specific to the main thread > compositor thread step.
The synchronization issue is that scrolling is a compositor-thread operation whereas the scroll-linked effects are necessarily main thread operations.
This means that in high latency mode the main thread may be 1 frame behind the compositor thread, which means that the scroll position that the compositor is using to position the page and the scroll position Blink/JavaScript is currently aware of will be 1 step out of sync. Other than keeping the page in low latency mode by never blowing frame budget, there’s currently nothing a page can do about this. It’s also worth noting, though, that input latency is rarely a visible problem outside of these scroll-linked effects (like pull to refresh, or poorly implemented parallax, etc). The platform’s biggest problem remains that it’s incredibly difficult to maintain a high framerate.
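The one-frame mismatch for scroll-linked effects can be sketched numerically. This is a simplified model under stated assumptions: `lagFrames` is a hypothetical stand-in for low-latency mode (0) versus high-latency mode (1), and the function name and scroll positions are invented for illustration.

```typescript
// Hypothetical model; `lagFrames` stands in for low-latency (0) vs high-latency (1) mode.

/**
 * For each frame, the compositor positions the page at scrollY[frame], while a
 * main-thread effect was computed from the scroll position `lagFrames` earlier.
 * Returns the per-frame mismatch in pixels between the two.
 */
function scrollEffectError(scrollY: number[], lagFrames: number): number[] {
  return scrollY.map((y, frame) => {
    const seenByMain = scrollY[Math.max(0, frame - lagFrames)];
    return y - seenByMain;
  });
}

// Compositor scroll positions for a scroll moving 30px per frame.
const positions = [0, 30, 60, 90, 120];
const lowLatency = scrollEffectError(positions, 0); // → [0, 0, 0, 0, 0]
const highLatency = scrollEffectError(positions, 1); // → [0, 30, 30, 30, 30]
```

In this model the mismatch equals the distance scrolled during the lagging frames, so a faster scroll makes the effect visibly further out of place.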