Within Chromium, tasks are regularly dispatched between various threads, by "posting a task," or between processes, by receiving an IPC directive. This code structure can make it difficult for a standard instruction profiler to understandably attribute execution costs. It also makes it difficult to follow the flow of code in a debugger.
At the about:profiler internal URL, Chromium provides a profiling and debugging infrastructure: It provides answers to questions such as: What tasks are taking a "long" time? What tasks are called "too often?" What tasks are suffering "significant" queueing delay? How busy is each thread? etc.
This document describes some of the features supported by the about:profiler infrastructure.
You need M17
The automatic tracking of tasks is now supported in the default Release builds of Chromium as of version M17 (Chrome Dev and Canary channel 11/1/2011, which is 17.0.925.0 or later). [Warning: Windows XP may requireh the "--enable-profiler" command line flag due to an allocator bug. This restriction will be lifted ASAP].
The infrastructure described has three basic uses:
Posted-tasks and IPC tasks are tracked, along with their performance characteristics, across the life-cycle of each task. Information is maintained about where each task was constructed/posted (function name, file and line number), along with the thread that was active when the task created. After each task has been executed, information about the run of the task is tallied. Information includes where it executed (thread, process, etc.), as well as counts and durations (time, queueing time, etc.) All that information is compactly accumulated in real time, while Chromium is running.
The accumulated performance characteristics are all available during a any session. Additional reference snapshots can be taken at any time during the execution of the browser. The results, as well as deltas between snapshots, can be viewed, sorted, aggregated, and filtered, all within the about:profiler page.
For learning about the codebase, is is very educational to perform some browsing actions, and then look and see what collection of chromium tasks were fired on various threads. Rather than pouring over the codebase, it is often more educational to look at these lists of tasks. The about:tracing support is also extremely useful for more detailed investigation in this regard.
Common Profiling Activities When Trying to Identify Performance Issues
Overall Performance: Task Execution Counts
The easiest way to improve performance is to do less work. If you wish to identify tasks that are running "too often," you should bring up the about:profiler page, which will by default sort all entry by the number of times each task was performed (its "count"). The page will have a header above a table that looks like:
The red asterisks in the column heading "Count" indicate that the results are sorted by that column. The row above the column headings provides an aggregate view of the table that appears below the he
Keep in mind that although high-frequency tasks (tasks with high "Counts") may have a short run times, the overhead for such tasks is not included in their run times. Overhead includes construction, posting to a queue (via locks), getting time slices that other tasks could have enjoyed, cleaning up after a task is executed, etc. As a result, reduction in the number of tasks, for very high frequency tasks may be very beneficial to performance.
CPU Utilization: Where Is The Time Spent?
If you wish to look for hot spots, or tasks that are using up a lot of the CPU time, click on the "Total Run Time" column heading to sort on that column. All times shown are in milliseconds.
The sorted table will highlight tasks which have total execution time that is large. That statistic means that the product of the number of executions, and the execution per run, was high. If that task can be made more efficient, it may have a positive impact on overall CPU utilization. Caveat: by default, Execution Time is measured with a wall clock, not based on activity of the CPU on the thread. Beware of tasks that block, such as waiting for a DNS resolution.
Major Jank or Delay In Responsiveness
When the browser is not responsive, the most common cause is that a task is running too long, and has blocked the ability of some critical thread to respond. Most obviously, this can happen on the CrBrowserMain (i.e., the UI thread in the browser process). It is also common to see this caused by a slow task on the Chrome_IOThread (i.e., the thread used for IPC communications with the renderer). At times, some threads send messages to other threads to get work results, and may be waiting (without officially "blocking") for a posted response, so almost any blocked thread *might* be able to cause jank, but the Chrome_FileThread (i.e., the thread that reads and writes from disk) is a most common culprit.
To see what tasks have taken an unusually long time to execute, click to sort on the "Max Run Time" column.
It is often helpful to view the results on various processes separately, as Jank tends to be caused by delays in the Browser process. To separate out the results, by process, find the drop-down box labeled "Group By" at the top of the page. Select "Process Type" to cause the browser process to be isolated in its own table. Once you select that, a second drop down will appear to the immediate right of the first. You should then select "Exec Thread" in that pull-down. The top of the page will then look like:
Separate tables will now be visible to make it more visible, for each thread, what tasks is taking a "long time." You'll also notice that since each table only list a single process type and a single execution thread, that the corresponding columns are no longer shown in each table.
After you have identified a thread that appears to have abnormally slow processes (that might cause jank), it is often helpful to see what tasks were delayed (an that thread!) by such excesses.
To see what tasks have been forced to wait, click to sort on the Max Queue Time. Assuming the observed execution time was long, you'll tyically see a number of tasks that have a commensurate queuing delay.
Quickly Drill Down
If you know what you wan to look at (example: a specific file; a specific thead; etc), you can instantly filter all your results to only show rows that contain an arbitrary string. The text box at the top-right of the about:profiler page provides automatic case-insensitive filtering.
For example, if you wanted to only see results that contain the word "browser" (perhaps, items running on the "browser" thread), simply enter the text "browser" (without the quotes) into the box at the top right. Similary, the box could be used to illuminate only lines that contain "sync," or lines that contain only "AnimationContainer::SetMinTimerInterval".
Understanding Profiler Results from about:profiler
When you visit the URL about:profiler, you'll get a set of lines which might begin with the following (entry and select boxes don't seem to appear in this documentation):
Before describing the top line, it is significant to clarify what each of the columns presented in any table are. The columns are:
Top Line Elements
The first line provides pull down boxes to organize and filter what appears in the table(s). There are currently 4 parts on this headline, such as the following (pulldown and text entry boxes do not appear in this documentation :-( ):
. The parts are:
Each table consists of 4 groups of rows. In addition, if multiple tables are presented, after a "Group by" pulldown is selected, there may be a title for the table. The title for table indicates what rows have been selected for presentation in the table. For example, a selection to Group by "Exec thread" was selected, then one possible title is "Exec thread = CrBrowserMain".
The following are the 4 groups of rows in each table (with sample lines taken from the table shown above):
1194 rows (1184 hidden) Show none Show more Show all
Final Link: Reset Tracking data
At the very bottom of the about:profile page is a link for Reseting Tracking data. This link sulfaces a useful hack until snapshot functionality can provide the maximum value between two snapshots. The link can be used to force all data entries in all rows of the browser to zero. The next snapshot (or page reload) will show all new data, an dwill, as a result, show the max value since the last reset. The reset is done using various hackery, and minor miscounts are possible. As a result, this is really a hack. Once the snapshot facility better supports max, this link will be removed from the UI.
I'd like to see just the tasks that appear between two distinct points in time. How do I avoid seeing all the other data?
On the top line, select the Snapshot link. When that is selected you will see the data and time of each data collection effort. Instead of hitting reload, you can click the snapshot button, and acquire another image of the profiler state of the browser. If you click on any two lines, then you will see the changes that have taken place between those two snapshots. When you click on only one line, then you'll see the data from startup through that point in time.
note: Max is not currently supported in the delta between two points. That feature will be supported RSN.
How much does this profiling and analysis impact performance?
There is very little measurable performance impact. If you'd like to disable the tracking, use the command line switch --enable-tracking=0
The implementation attempts to gather all this information across all threads with asymptotically zero locking, zero atomic operations, and a very small amount of memory (that does not grow after all birth locations and associated destruction threads have been observed at least once). The details of how this achieved is provided in a long comment atop the file src/base/tracked_objects.h, as well as a forth-coming video and associated slide presentation.
Basic Debugging Assistance: How did *this* task get posted?
When a breakpoint is hit in a debugger on a given thread, the stack often starts with a message loop running a task, and there is very little information about what part of the codebase truly caused that task to be run. When a newbie developer is trying to follow the flow of events in Chromium, it is often difficult to understand the connection between (for example) the IO Thread, the File Thread, and the Main UI Thread. The infrastructure for task tracking provides basic information to see through these posted task transitions.
Assuming you've reached a breakpoint of interest, but the top of the call stack shows a message loop, and the calling of a Run() method, then you are perfectly set up to see the source of any task. Most debuggers will traverse to the following base class of interest seamlessly, but here is the more complete description of where to look to find the answer (a specific file and line number).
Please address comments and suggestions to Jim Roskind a.k.a., jar at chromium.org.