Page Cycler Tests

This page details the internals of the page cycler test system. If you are looking for information on how to update the expectations as a result of a new, acceptable performance regression, see the page regarding performance plots.

Quickstart

1. Get a Chromium source checkout and build a copy of Chromium.
2. Run Chromium as: ./chromium --enable-file-cookies --js-flags="--expose_gc"
3. Open src/tools/page_cycler/acid3/start.html in Chromium.
4. Press the Start button.

Overview at a glance

A page cycler test consists of three pieces: the run_measurement program, a page set, and the page cycler data you're testing (which is stored in src-internal; sorry, non-Googlers).  The test program accesses the test data as a file:// web page and writes the test results to stdout.  Chrome's build and test infrastructure is responsible for running that test program with the test data.  That same infrastructure then parses the test output to produce perf charts and measure our perf against a list of perf expectations stored in-tree.  If the actual results differ significantly from the expected results, an alert is triggered on the build waterfall.

The test program

The run_measurement program is located in src/tools/perf/.  The program understands how to launch your Chrome build, navigate to a set of URLs, wait for some condition that signals the test is complete, and retrieve the test results from the current Chrome process.

Sample output

$ tools/perf/run_measurement --browser=release page_cycler tools/perf/page_sets/typical_25.json

Pages: [http___www.nick.com_games,http___www.rei.com_,...,http___www.fda.gov]
HISTOGRAM V8.MemoryExternalFragmentationTotal_by_url: http___www.nick.com_games= {"buckets":[{"count":1,"high":45,"low":44},{"count":1,"high":49,"low":48},{"count":1,"high":52,"low":51},{"count":1,"high":61,"low":60}],"count":4,"flags":1,"name":"V8.MemoryExternalFragmentationTotal","params":{"bucket_count":100,"max":101,"min":1,"type":"HISTOGRAM"},"pid":16178,"sum":203.0}
Avg V8.MemoryExternalFragmentationTotal_by_url: 50.923247percent
Sd V8.MemoryExternalFragmentationTotal_by_url: 5.898667percent
...
RESULT times_by_url: http___www.nick.com_games= [915,501,515,424,421,411,471,531,428,526] ms
Avg times_by_url: 514.300000ms
Sd  times_by_url: 148.326255ms
...
WARNING:root:Errored pages:
http://arstechnica.com/

Tips

  • You might also find the --pageset-repeat option useful if you need more or fewer iterations.
  • typical_25 is a good default page set to start with because it is an up-to-date collection of 25 typical web sites. There are a lot of other suites in that directory, and the older, classic suites like moz and morejs are in tools/perf/page_sets/page_cycler/.
  • To get stable results:
    1. Disable frequency scaling / power boost in the BIOS
    2. Run nothing other than a terminal, text editor, and the chrome instance
    3. Navigate to the start page of a test suite (e.g. data/page_cycler/alexa_us/start.html), set "Iterations" to 1 and click Start. (Without throwing away a first run, the variance was much higher as things such as font caches got loaded into the process.)
    4. Navigate the same tab back to start.html (thus keeping within the same render process), set "Iterations" to 5 and click Start
    5. Use those results
  • If you want to ignore the first (noisy) result of each iteration, you can hack tools/perf/measurements/page_cycler.py to pass discard_first_result=True (I wish we had a command line option to do that).
  • To debug a specific test, run run_measurement with --interactive; the browser window then stays open so you can interact with it.
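
As a rough illustration of why discarding the first result matters, the sketch below post-processes a times array after the fact rather than changing page_cycler.py. It assumes the first entry of each array is the cold (noisy) load; the helper name drop_first is made up for this example, and the numbers are the nick.com times from the sample output above.

```python
import statistics


def drop_first(samples):
    """Return the samples without the first (assumed cold) entry."""
    return samples[1:]


# times_by_url for http___www.nick.com_games, copied from the sample output.
times = [915, 501, 515, 424, 421, 411, 471, 531, 428, 526]
warm = drop_first(times)

print("all samples:   avg=%.2f sd=%.2f"
      % (statistics.mean(times), statistics.stdev(times)))
print("first dropped: avg=%.2f sd=%.2f"
      % (statistics.mean(warm), statistics.stdev(warm)))
```

With these particular samples the standard deviation drops sharply once the 915 ms first load is removed, which is the effect the tip describes.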

Output meanings

The Pages: output lists a URL for each test page. We measure each such URL multiple times.
We have HISTOGRAM and RESULT outputs.
Many outputs have a _by_url variant that appears for each URL.
Each output with multiple samples shows an Avg (arithmetic mean) and Sd (standard deviation) for its samples.
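
A minimal sketch of how a RESULT line with a list of samples could be parsed and summarized. The regular expression and the function name parse_result_list are assumptions based only on the sample output above, not the parser the infrastructure actually uses. For the nick.com times, the arithmetic mean and the sample (n - 1) standard deviation come out at about 514.3 ms and 148.3 ms, in line with the Avg and Sd lines shown in the sample output.

```python
import re
import statistics

# Matches lines such as:
#   RESULT times_by_url: http___www.nick.com_games= [915,501,515] ms
RESULT_RE = re.compile(r"^\*?RESULT (\S+): (\S+)= \[([\d.,]+)\] ?(\S*)$")


def parse_result_list(line):
    """Return (metric, trace, samples, units), or None if the line does not match."""
    match = RESULT_RE.match(line.strip())
    if match is None:
        return None
    metric, trace, raw, units = match.groups()
    samples = [float(x) for x in raw.split(",")]
    return metric, trace, samples, units


line = ("RESULT times_by_url: http___www.nick.com_games= "
        "[915,501,515,424,421,411,471,531,428,526] ms")
metric, trace, samples, units = parse_result_list(line)
avg = statistics.mean(samples)
sd = statistics.stdev(samples)  # sample standard deviation (divides by n - 1)
print("Avg %s: %f%s" % (metric, avg, units))
print("Sd  %s: %f%s" % (metric, sd, units))
```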

Kind       Output                                    Meaning
HISTOGRAM  V8.MemoryExternalFragmentationTotal       Total external memory fragmentation after each GC, in percent.
HISTOGRAM  V8.MemoryHeapSampleTotalCommitted         The total size of committed memory used by V8 after each GC, in KB.
HISTOGRAM  V8.MemoryHeapSampleTotalUsed              The total size of live memory used by V8 after each GC, in KB.
RESULT     times                                     Time from "pressing enter" in the URL bar to the onload event?
RESULT     vm_final_size_browser                     Virtual Memory Size (address space allocated) of the browser process
RESULT     vm_resident_set_size_final_size_browser   Resident Set Size (physically resident memory) of the browser process
RESULT     vm_peak_size_browser                      The peak Virtual Memory Size (address space allocated) reached by the browser process
RESULT     resident_set_size_peak_size_browser       The peak Resident Set Size (physically resident memory) reached by the browser process
RESULT     vm_final_size_renderer                    Virtual Memory Size (address space allocated) of the renderer process
RESULT     vm_resident_set_size_final_size_renderer  Resident Set Size (physically resident memory) of the renderer process
RESULT     vm_final_size_total                       Virtual Memory Size (address space allocated) of all processes
RESULT     vm_resident_set_size_final_size_total     Resident Set Size (physically resident memory) of all processes
RESULT     commit_charge                             System commit charge (committed memory pages)
RESULT     processes                                 Number of processes used by Chrome
RESULT     read_operations_browser                   Number of IO read operations by the browser process
RESULT     write_operations_browser                  Number of IO write operations by the browser process
RESULT     read_bytes_browser                        Number of IO bytes read by the browser process
RESULT     write_bytes_browser                       Number of IO bytes written by the browser process
RESULT     read_operations_renderer                  Number of IO read operations by the renderer process
RESULT     write_operations_renderer                 Number of IO write operations by the renderer process
RESULT     read_bytes_renderer                       Number of IO bytes read by the renderer process
RESULT     write_bytes_renderer                      Number of IO bytes written by the renderer process

The HISTOGRAMS are formatted as a JSON string. For example:
{
    "count": 5,
    "name": "V8.MemoryHeapSampleTotalUsed",
    "buckets": [
        {
            "count": 1,
            "low": 10281,
            "high": 11702
        },
        {
            "count": 1,
            "low": 11702,
            "high": 13320
        },
        {
            "count": 1,
            "low": 13320,
            "high": 15161
        },
        {
            "count": 1,
            "low": 15161,
            "high": 17257
        },
        {
            "count": 1,
            "low": 17257,
            "high": 19642
        }
    ],
    "pid": 3124,
    "params": {
        "max": 500000,
        "bucket_count": 50,
        "type": "HISTOGRAM",
        "min": 1000
    },
    "flags": 1,
    "sum": 109271
}
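
Since the payload is plain JSON, a HISTOGRAM line can be split with a regular expression and handed to a JSON parser. The sketch below uses the nick.com line from the sample output; the regular expression and the name parse_histogram are assumptions made for this example. The sum/count mean it prints is just one simple summary and is not claimed to be how the Avg line is computed.

```python
import json
import re

# Matches: HISTOGRAM <metric>: <trace>= <json object>
HISTOGRAM_RE = re.compile(r"^HISTOGRAM (\S+): (\S+)= (\{.*\})$")


def parse_histogram(line):
    """Return (metric, trace, histogram dict), or None if the line does not match."""
    match = HISTOGRAM_RE.match(line.strip())
    if match is None:
        return None
    metric, trace, payload = match.groups()
    return metric, trace, json.loads(payload)


line = ('HISTOGRAM V8.MemoryExternalFragmentationTotal_by_url: '
        'http___www.nick.com_games= '
        '{"buckets":[{"count":1,"high":45,"low":44},'
        '{"count":1,"high":49,"low":48},'
        '{"count":1,"high":52,"low":51},'
        '{"count":1,"high":61,"low":60}],'
        '"count":4,"flags":1,"name":"V8.MemoryExternalFragmentationTotal",'
        '"params":{"bucket_count":100,"max":101,"min":1,"type":"HISTOGRAM"},'
        '"pid":16178,"sum":203.0}')

metric, trace, hist = parse_histogram(line)
bucket_total = sum(bucket["count"] for bucket in hist["buckets"])
mean = hist["sum"] / hist["count"]
print(metric, trace, "samples:", bucket_total, "mean:", mean)
```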

The params describe how the buckets are arranged, but I can't decipher how (e.g. each of the visible buckets has the same ratio between high and low, ~1.138, but 1.138^50 ≈ 641, which isn't related to 500000).  Also, what are the flags?
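
One reading that fits the numbers, offered as an inference and not something this page confirms: the buckets are spaced geometrically between min and max, with two of the 50 buckets set aside for underflow and overflow, leaving 48 steps. The quantity to compare against is then max/min = 500 rather than 500000, and 500^(1/48) is roughly 1.138. The sketch below checks that guess against the visible bucket boundaries; the 48-step assumption and the helper boundary are hypothetical.

```python
import math

minimum, maximum, bucket_count = 1000, 500000, 50

# Bucket boundaries visible in the example histogram above.
visible = [10281, 11702, 13320, 15161, 17257, 19642]

# Observed ratio between neighbouring boundaries (each is about 1.138).
ratios = [b / a for a, b in zip(visible, visible[1:])]

# ASSUMPTION: 48 geometric steps between min and max, i.e. two of the
# 50 buckets are reserved for underflow and overflow.
steps = bucket_count - 2
ratio = (maximum / minimum) ** (1.0 / steps)


def boundary(k):
    """Boundary after k geometric steps from the minimum (no rounding)."""
    return minimum * ratio ** k


# Which step would the first visible boundary correspond to?
first_k = round(math.log(visible[0] / minimum) / math.log(ratio))

print("assumed ratio: %.4f" % ratio)
for offset, seen in enumerate(visible):
    k = first_k + offset
    print("step %d: seen %d, predicted %.0f" % (k, seen, boundary(k)))
```

The predicted values differ from the listed ones by a unit or two, which would be consistent with boundaries being rounded to integers as they are generated.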

The times arrays in this run have 10 entries, but the histograms usually have fewer samples (and occasionally more). What's up with that?

About the previous version of the page cycler

The test data

The URL Chrome is navigated to by the page_cycler_test program is where the page cycler data comes in.  Examples of this page cycler data include:
- the database page cycler test, which measures times to run Web SQL DB operations
- the acid3 page cycler, which measures the time to run the Acid3 tests to completion

More page cyclers are run in Chromium's standard tests, but they are unavailable to the public.  These include the moz, intl1, intl2, morejs, and dhtml page cyclers.  Each of these page cyclers measures the time from when navigation started to when the onload handler runs; the onload handler also forces layout, so the time required to run layout is included in the total.  The acid3 and database page cyclers differ: they watch for other events rather than triggering from onload and forcing layout.

If you run your local copy of Chrome with the flags '--enable-file-cookies --js-flags="--expose_gc"' and navigate to a page cycler's start.html page, you can run these tests on your machine.

Test data directory structure
 - page_cycler/
    - common/ : a required directory with reusable code for page cycler data sets
      - start.js : used to set up the page cycling, store cookies and load webpages
      - head.js : required for report.html to properly generate data
      - report.html : template for timing reports upon completion of tests
    - sample/ : a sample page cycler data set
      - pages.js : contains directory paths (e.g. page1, page2, ...) for each URL included in
                   the cycle
      - start.html : loads pages.js and start.js; load this file to begin this page cycle.
                     If you open start.html?auto=1&iterations=5, it will automatically begin
                     the cycle; otherwise a form is presented where you can manually enter the
                     number of iterations.
      - start.js
      - page1/ : a directory represents a url to be included in the cycle, should be listed in
                 pages.js
        - index.html : URL that gets loaded by the page cycler, must include head.js 
        - ... : Other files needed to properly render index.html
      - page2/ : ...
      - page3/ : ... 
      - page4/ : ... 
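
The layout above can be sanity-checked with a short script. This is a sketch under stated assumptions: the function names check_data_set and start_url are invented for the example, page directories are discovered by listing the data set directory (the format of pages.js is not described here, so it is not parsed), and "must include head.js" is approximated by a plain substring check.

```python
from pathlib import Path


def check_data_set(data_set_dir):
    """Return a list of problems found in a page cycler data set directory."""
    root = Path(data_set_dir)
    problems = []
    for required in ("start.html", "pages.js"):
        if not (root / required).is_file():
            problems.append("missing %s" % required)
    page_dirs = sorted(p for p in root.iterdir() if p.is_dir()) if root.is_dir() else []
    for page_dir in page_dirs:
        index = page_dir / "index.html"
        if not index.is_file():
            problems.append("%s: missing index.html" % page_dir.name)
        elif "head.js" not in index.read_text(errors="replace"):
            problems.append("%s: index.html does not reference head.js" % page_dir.name)
    return problems


def start_url(data_set_dir, iterations):
    """Build a file:// URL for start.html that starts the cycle automatically."""
    start = (Path(data_set_dir) / "start.html").resolve()
    return "%s?auto=1&iterations=%d" % (start.as_uri(), iterations)
```

For example, check_data_set("page_cycler/sample") returns an empty list when every page directory has an index.html that mentions head.js, and start_url("page_cycler/sample", 5) yields a URL ending in start.html?auto=1&iterations=5.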

Page cycler test data output

Here is an example of the final report we see if we run the acid3 page cycler in our browser directly:

Summary

iterations            3
pages                 1
milliseconds          2429
mean per set          809.67
mean per page         809.67
timer lag             395.00
timer lag per page    131.67

Complete Statistics

Site                   Min      Max        Mean      Std.d    Runs
acid3.acidtests.org    85.00    2079.00    175.00    90.00    2079, 265, 85
totals:                85.00    2079.00    175.00    90.00
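
The figures in this report can be cross-checked with a little arithmetic, reading the Runs column as the three runs 2079, 265 and 85 ms. The sketch below is an inference from the numbers, not a description of report.html: in particular, the Mean and Std.d columns only match if the slowest run is excluded and a population standard deviation is used, and with a single page the exact divisor for "timer lag per page" cannot be pinned down.

```python
import statistics

runs = [2079, 265, 85]   # the three runs for acid3.acidtests.org, in ms
iterations, pages = 3, 1
timer_lag = 395.00

total_ms = sum(runs)                              # "milliseconds"
mean_per_set = total_ms / iterations              # "mean per set"
mean_per_page = mean_per_set / pages              # "mean per page"
lag_per_page = timer_lag / (iterations * pages)   # ASSUMPTION: divisor is a guess

# ASSUMPTION: Mean and Std.d ignore the slowest run.
trimmed = sorted(runs)[:-1]

print(total_ms, round(mean_per_set, 2), round(mean_per_page, 2), round(lag_per_page, 2))
print(min(runs), max(runs), statistics.mean(trimmed), statistics.pstdev(trimmed))
```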

The page cycler test program receives this data in a different format (using the UITest framework and the automation proxy).  The program then writes the result data to stdout in a simple format:
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from PageCyclerTest
[ RUN      ] PageCyclerTest.Acid3File
*RESULT vm_peak_b: vm_pk_b= 13615104 bytes
*RESULT ws_peak_b: ws_pk_b= 33026048 bytes
*RESULT vm_peak_r: vm_pk_r= 67616768 bytes
*RESULT ws_peak_r: ws_pk_r= 66859008 bytes
*RESULT vm_single_peak_r: vm_spk_r= 67616768 bytes
*RESULT ws_single_peak_r: ws_spk_r= 66859008 bytes
RESULT vm_final_b: vm_f_b= 12734464 bytes
RESULT ws_final_b: ws_f_b= 29773824 bytes
RESULT vm_final_r: vm_f_r= 67166208 bytes
RESULT ws_final_r: ws_f_r= 62820352 bytes
RESULT vm_final_t: vm_f_t= 79900672 bytes
RESULT ws_final_t: ws_f_t= 92594176 bytes
RESULT processes: proc_= 2 
RESULT read_op_b: r_op_b= 23896 
RESULT write_op_b: w_op_b= 12383 
RESULT other_op_b: o_op_b= 11728 
*RESULT total_op_b: IO_op_b= 48007 
RESULT read_byte_b: r_b= 29044 kb
RESULT write_byte_b: w_b= 7802 kb
RESULT other_byte_b: o_b= 193 kb
*RESULT total_byte_b: IO_b= 37040 kb
RESULT read_op_r: r_op_r= 7518 
RESULT write_op_r: w_op_r= 22756 
RESULT other_op_r: o_op_r= 410 
*RESULT total_op_r: IO_op_r= 30684 
RESULT read_byte_r: r_r= 1967 kb
RESULT write_byte_r: w_r= 6036 kb
RESULT other_byte_r: o_r= 124 kb
*RESULT total_byte_r: IO_r= 8128 kb
RESULT commit_charge: cc= 72880 kb
Pages: [acid3.acidtests.org]
*RESULT times: t= [100,106,80,...,120,118] ms
[       OK ] PageCyclerTest.Acid3File (23047 ms)
[----------] 1 test from PageCyclerTest (23047 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (23047 ms total)
[  PASSED  ] 1 test.
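
A sketch of how the RESULT lines in this older output could be picked out from the surrounding test-runner lines. The regular expression and the name parse_result are assumptions derived from the sample alone; this page does not say what the leading * means, so the sketch only records whether it is present.

```python
import re

# <optional *>RESULT <graph>: <trace>= <value or [list]> <optional units>
LINE_RE = re.compile(r"^(\*?)RESULT (\S+): (\S+)= (\[[^\]]*\]|\S+) ?(\S*)$")


def parse_result(line):
    """Return a dict describing one RESULT line, or None for any other line."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    starred, graph, trace, value, units = match.groups()
    return {
        "starred": starred == "*",
        "graph": graph,
        "trace": trace,
        "value": value,   # kept as text; may be a number or a [list]
        "units": units,   # empty when the line has no units
    }


output = """\
[ RUN      ] PageCyclerTest.Acid3File
*RESULT vm_peak_b: vm_pk_b= 13615104 bytes
RESULT processes: proc_= 2
Pages: [acid3.acidtests.org]
*RESULT times: t= [100,106,80,...,120,118] ms
"""

results = [r for r in map(parse_result, output.splitlines()) if r is not None]
for result in results:
    print(result)
```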

The perf charts and expectations

Our build and test infrastructure runs the page cycler test program and parses the resulting output.  That infrastructure writes the perf data it finds in the output to Chrome's perf test plots.  While the test infrastructure has this data, it compares the results to our perf expectations and alerts us if there is a significant regression or improvement.  More information about both the perf test plots and expectations can be found on our dev site.

Each line in the expectations file qualifies a particular metric. For each metric, there's a delta and a var expectation. The delta is the current result minus the reference result. The var (variance) is how much the delta is allowed to vary in either direction.
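
Read literally, the delta and var rule amounts to a range check. The sketch below is one interpretation of the paragraph above, not the infrastructure's code; the function name check_expectation and the numbers in the usage lines are hypothetical.

```python
def check_expectation(current, reference, expected_delta, var):
    """Return (delta, ok), where ok means delta is within expected_delta +/- var."""
    delta = current - reference
    low, high = expected_delta - var, expected_delta + var
    return delta, low <= delta <= high


# Hypothetical numbers: expected delta of 15 with a var of 10.
print(check_expectation(current=520, reference=500, expected_delta=15, var=10))
print(check_expectation(current=540, reference=500, expected_delta=15, var=10))
```

Whether a delta outside the range counts as a regression or an improvement depends on the metric (lower is better for times, for instance), so the sketch only reports in-range or out-of-range.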

What type of files are recommended for test data?

In general:
- all page cyclers should use standard web page assets (html, js, css, images)
- use relative references instead of absolute references
- complex web apps testing internal APIs need to ensure access is available using the file:// scheme

While writing your page cycler, ask yourself:
- Which APIs do you want to test that aren't covered by the current page cyclers?
- Is it related to the onload event, or some other signal that you can watch for within JS?

Finally, any new test data should be stored in src/tools/page_cycler, and the page cycler test program should be updated to support the new test data.

Analyzing Results

When the performance numbers regress, how do you track down where?  See Analyzing Page Cycler Results for one approach.