GPU Testing

This set of pages documents the setup and operation of the GPU bots and try servers, which verify the correctness of Chrome's graphically accelerated rendering pipeline.

Overview

The GPU bots run a different set of tests than the majority of the Chromium test machines. The GPU testers are specifically focused on tests which exercise the graphics processor, and whose results are likely to vary between graphics card vendors.

Most of the tests on the GPU bots are run via the Telemetry framework. Telemetry was originally conceived as a performance testing framework, but has proven valuable for correctness testing as well. Telemetry directs the browser to perform various operations, like page navigation and test execution, from external scripts written in Python. The GPU bots launch the full Chromium browser via Telemetry for the majority of the tests. Using the full browser to execute tests, rather than custom test harnesses like content_browsertests, has yielded several advantages: better testing of the code that is actually shipped, improved reliability, and improved performance.

A subset of the tests, called "pixel tests", grab screen snapshots of the web page in order to validate Chromium's rendering architecture end-to-end. Where necessary, GPU-specific results are maintained for these tests. Some of these tests verify just a few pixels, using handwritten code, in order to use the same validation for all brands of GPUs.

The GPU bots use the Chrome infrastructure team's new recipes framework for describing what tests to execute. Compared to the legacy buildbot infrastructure, recipes make it easy to add new steps to the bots, change the bots' configuration, and run the tests locally in the same way that they are run on the bots. There is a separate page about the GPU recipe describing how it works and how to modify it.

The GPU bots use the isolated testing framework to transmit the binaries and data files from the builder to the tester. Running tests via isolates eliminates the Chromium src/ checkout on the test machines, achieving better utilization of the physical hardware containing GPUs.

The bots on the chromium.gpu.fyi waterfall are configured to always test top-of-tree ANGLE. This setup is done with a few lines of code in the GPU recipe module; search the code for "ANGLE".

These aspects of the bots are described in more detail below, and in linked pages.

Using the GPU Bots

Most Chromium developers interact with the GPU bots in two ways:
  1. Sending try jobs to them.
  2. Observing the bots on the waterfalls.
The GPU try servers only support try jobs sent from a Git checkout via git cl. Subversion try jobs sent via gcl are not supported, nor are jobs sent via "git try". You must first upload your CL to the codereview server.

git cl try

Sends your job to the default set of try servers. win_gpu, linux_gpu, and mac_gpu, along with their associated triggered test jobs, are included in that default set for both Chromium and Blink CLs.

To explicitly send a try job to the GPU try servers, use:

git cl try -m tryserver.chromium.gpu -b win_gpu -b linux_gpu -b mac_gpu

(It's unfortunate that the -m argument is needed; Issue 352461 tracks inferring it.)

Alternatively, the Rietveld UI can be used to send a patch set to these try servers.

The GPU try bots explicitly support try jobs against the Chromium (src/) and Blink (src/third_party/WebKit/) trees. If you find it necessary to try patches against other sub-repositories, please file a bug with label Cr-Internals-GPU-Testing.

Running the GPU Tests Locally

All of the GPU tests running on the bots can be run locally from a Chromium build. However, figuring out the exact command line to use can be a little tricky. The Release GPU bots all run their tests via isolates, so the logs from a run on the bots will contain something like
run_isolated.py -H b83c4a0bce1c..... -I https://isolateserver.appspot.com
which doesn't help much when trying to figure out the command line needed to run the test locally.

Many of the GPU tests are run via Telemetry. In order to run them, just build the chrome target and then invoke src/content/test/gpu/run_gpu_test.py with the appropriate argument. The tests this script can invoke are in src/content/test/gpu/gpu_tests/ .
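
For example, assuming a Ninja-based Release build in out/Release (which is what the --browser=release flag below refers to), you would first build the browser:

ninja -C out/Release chrome

and then run the desired harness: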

run_gpu_test.py context_lost --browser=release
run_gpu_test.py memory_test --browser=release
run_gpu_test.py webgl_conformance --browser=release --webgl-conformance-version=1.0.2
run_gpu_test.py maps --browser=release

The Maps test requires you to authenticate to cloud storage in order to access the Web Page Replay archive containing the test. See Cloud Storage Credentials for documentation on setting this up.

Most of the remaining tests are simple executables which can be built and run from the command line:

angle_unittests
content_gl_tests
gl_tests
gles2_conform_test
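
For example, assuming a Ninja-based Release build, one of these targets can be built and run like so (shown for Linux; the binary name and path differ slightly on other platforms):

ninja -C out/Release gl_tests
./out/Release/gl_tests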

You can find the isolates for the various tests in src/chrome/; the .isolate file names generally match the test targets listed above (for example, angle_unittests.isolate). The isolates contain the full or partial command line for invoking the target. The complete command line for any test can be deduced from the contents of the isolate plus the stdio output from the test's run on the bot.
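
If it helps, the embedded command can be dumped with a few lines of Python. This is only a convenience sketch (it is not part of the Chromium tree), and it assumes the .isolate file parses as the usual GYP-style Python literal:

import ast
import sys

def find_commands(node, found):
    # Recursively collect every 'command' list, wherever it appears
    # (top-level 'variables' or nested inside 'conditions' blocks).
    if isinstance(node, dict):
        if 'command' in node:
            found.append(node['command'])
        for value in node.values():
            find_commands(value, found)
    elif isinstance(node, list):
        for item in node:
            find_commands(item, found)

with open(sys.argv[1]) as f:  # e.g. a .isolate file under src/chrome/
    isolate = ast.literal_eval(f.read())
commands = []
find_commands(isolate, commands)
for command in commands:
    print(' '.join(command))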

Adding New Tests to the GPU Bots

The goal of the GPU bots is to avoid regressions in Chrome's rendering stack. To that end, let's add as many tests as possible that will help catch regressions in the product. If you see a crazy bug in Chrome's rendering which would be easy to catch with a pixel test running in Chrome and hard to catch in any of the other test harnesses, please, invest the time to add a test!

As of this writing it isn't as easy as desired to add a new test to one of the Telemetry-based harnesses. See http://crbug.com/352807. Let's collectively work to address that issue. It would be great to reduce the number of steps on the GPU bots, or at least to avoid significantly increasing it. The WebGL conformance tests should probably remain a separate step, but some of the smaller Telemetry-based tests (context_lost_tests, memory_test, etc.) should probably be combined into a single step.

If you are adding a new test to one of the existing harnesses (e.g., pixel_test), all you need to do is make sure that your new test runs correctly via isolates. See the documentation from the GPU recipe on testing your own isolates for the GYP_DEFINES and authentication needed to upload isolates to the isolate server. Most likely the new test will be Telemetry based, and included in the telemetry_gpu_test_run isolate. You can then invoke it via:

./src/tools/swarming_client/run_isolated.py -H [hash] -I https://isolateserver.appspot.com -- [test name] [test arguments]
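
For instance, running the WebGL conformance suite out of an uploaded isolate might look roughly like the following; the hash here is purely hypothetical, and the arguments simply mirror the local invocation shown earlier:

./src/tools/swarming_client/run_isolated.py -H 0123456789abcdef0123456789abcdef01234567 -I https://isolateserver.appspot.com -- webgl_conformance --webgl-conformance-version=1.0.2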

Updating and Adding New Pixel Tests to the GPU Bots

Adding new pixel tests which require reference images is a slightly more complex process than adding other kinds of tests which can validate their own correctness. There are a few reasons for this.
  • Reference image based pixel tests require different golden images for different combinations of operating system, GPU, driver version, OS version, and occasionally other variables.
  • The reference images must be generated by the main waterfall. The try servers are not allowed to produce new reference images, only consume them. The reason for this is that a patch sent to the try servers might cause an incorrect reference image to be generated. For this reason, the main waterfall bots upload reference images to cloud storage, and the try servers download them and verify their results against them.
  • The try servers will currently fail if they run a pixel test requiring a reference image that doesn't exist in cloud storage. This is deliberate, but needs more thought; see Issue 349262.
  • As a consequence, once the GPU try servers are part of the main waterfall, it won't be possible to CQ a patch which adds a new pixel test that requires a new reference image.
If a reference image based pixel test's result is going to change because of a change in ANGLE or Blink (for example), updating the reference images is a slightly tricky process. Here's how to do it:
  • Mark the pixel test as failing in the pixel test's test expectations (see the sketch after this list)
  • Commit the change to ANGLE, Blink, etc. which will change the test's results
    • Note that without the failure expectation, this commit would turn some bots red; a Blink change will turn the GPU bots on the chromium.webkit waterfall red, and an ANGLE change will turn the chromium.gpu.fyi bots red
  • Wait for Blink/ANGLE/etc. to roll
  • Commit a change incrementing the revision number associated with the test in the page set
  • Commit a second change removing the failure expectation, once all of the bots on the main waterfall have generated new reference images. This change should go through the commit queue cleanly.
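
For reference, marking a test as failing is typically a one-line entry in the harness's expectations file under src/content/test/gpu/gpu_tests/. The following is only a hypothetical sketch: the test name, platform condition, and bug number are made up, and the exact file and API may differ in your checkout.

# In the relevant *_expectations.py file, inside the expectations setup:
self.Fail('Pixel.SomeNewTest', ['mac'], bug=123456)
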
When adding a brand new pixel test that uses a reference image, the steps are similar, but simpler:
  • Mark the test as failing in the same commit which introduces the new test
  • Wait for the reference images to be produced by all of the GPU bots on the waterfalls (see the cloud storage bucket)
  • Commit a change un-marking the test as failing
When making a Chromium-side change which changes the pixel tests' results:
  • In your CL, both mark the pixel test as failing in the pixel test's test expectations and increment the test's version number in the page set (see above)
  • After your CL lands, land another CL removing the failure expectations. If this second CL goes through the commit queue cleanly, you know reference images were generated properly.
In general, when adding a new pixel test, it's better to spot check a few pixels in the rendered image rather than using a reference image per platform. The GPU rasterization test is a good example of a recently added test which performs such spot checks.
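
To illustrate the idea, a spot check might look roughly like the following. This is a hypothetical sketch, not the actual GPU rasterization test; the Pixel accessor and the tolerance value are made-up assumptions.

def check_pixels(screenshot, expectations, tolerance=2):
    # 'screenshot' is assumed to expose a Pixel(x, y) accessor returning an
    # (r, g, b) tuple; 'expectations' maps (x, y) -> expected (r, g, b).
    # A small per-channel tolerance absorbs minor differences between GPUs
    # and drivers without needing per-platform reference images.
    for (x, y), expected in expectations.items():
        actual = screenshot.Pixel(x, y)
        for actual_channel, expected_channel in zip(actual, expected):
            if abs(actual_channel - expected_channel) > tolerance:
                raise AssertionError('Pixel (%d, %d): expected %r, got %r' %
                                     (x, y, expected, actual))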

The GPU bots' recipe

The GPU bots are using the Chrome infrastructure team's new "recipe" infrastructure, which makes it dramatically easier than before to make configuration changes to the bots. See the documentation on the GPU bots' recipe for details on how to add new steps to the bots and modify their existing steps.

Stamping out Flakiness

It's critically important to aggressively investigate and eliminate the root cause of any flakiness seen on the GPU bots. The bots have been known to run reliably for days at a time, and any flaky failures that are tolerated on the bots translate directly into instability of the browser experienced by customers. Critical bugs in subsystems like WebGL, affecting high-profile products like Google Maps, have escaped notice in the past because the bots were unreliable. After much re-work, the GPU bots are now among the most reliable automated test machines in the Chromium project. Let's keep them that way.

Flakiness affecting the GPU tests can come in from highly unexpected sources. Here are some examples:
  • Intermittent pixel_test failures on Linux where the captured pixels were black, caused by the Display Power Management System (DPMS) kicking in. Disabled the X server's built-in screen saver on the GPU bots in response.
  • GNOME dbus-related deadlocks causing intermittent timeouts (Issue 309093 and related bugs).
  • Windows Audio system changes causing intermittent assertion failures in the browser (Issue 310838).
  • Enabling assertion failures in the C++ standard library on Linux causing random assertion failures (Issue 328249).
  • V8 bugs causing random crashes of the Maps pixel test (V8 issues 3022 and 3174).
  • TLS changes causing random browser process crashes (Issue 264406).
  • Isolated test execution flakiness caused by failures to reliably clean up temporary directories (Issue 340415).
  • The Telemetry-based WebGL conformance suite caught a bug in the memory allocator on Android not caught by any other bot (Issue 347919).
  • context_lost test failures caused by the compositor's retry logic (Issue 356453).
  • Multiple bugs in Chromium's support for lost contexts causing flakiness of the context_lost tests (Issue 365904).
  • Maps test timeouts caused by Content Security Policy changes in Blink (Issue 395914).
  • Weak pointer assertion failures in various webgl_conformance_tests caused by changes to the media pipeline (Issue 399417).
  • A change to a default WebSocket timeout in Telemetry causing intermittent failures to run all WebGL conformance tests on the Mac bots (Issue 403981).
  • Chrome leaking suspended sub-processes on Windows, apparently a preexisting race condition that suddenly showed up (Issue 424024).
If you notice flaky test failures either on the GPU waterfalls or try servers, please file bugs right away with the label Cr-Internals-GPU-Testing and include links to the failing builds and copies of the logs, since the logs expire after a few days. GPU pixel wranglers should give the highest priority to eliminating flakiness on the tree.

Swarming Transition for Tryservers

As of this writing, the GPU tryservers are being transitioned from a traditional waterfall model to Chromium's swarming infrastructure, with an expected completion date of January 2015. The benefits of this transition are manifold:
  1. The binaries will be built by the regular Chromium trybot builders, eliminating the duplication of the GPU-specific linux_gpu, mac_gpu and win_gpu configurations, and reclaiming some 90 virtual machines.
  2. The sharding of the test execution will be finer-grained. Each individual test may be run on a separate machine, leading to better parallelism and faster cycle times.
Concretely, the GPU tryserver test bots (for example, mac_gpu_triggered_tests) are being put into the swarming pool, and the GPU tests are being triggered from the regular Chromium and Blink tryservers.
The trybots analyze the incoming CLs and only build and test those targets which may be affected by that CL. Thus it might be necessary to scan through multiple builds in order to find one which has run the GPU tests. The GPU tests are those whose description contains the specific GPU on which the test was run: for example, "webgl_conformance on NVIDIA GPU on Mac Retina (with patch) on Mac-10.9".

When the transition is completed, the tryserver.chromium.gpu waterfall will be decommissioned, meaning that there will no longer be a single centralized place to examine the GPU tryservers' results. Instead, the following tools should be used to check their health:
  1. The chromium-try-flakes tool. Look for tests that run on the GPU trybots, and the names of the bots which are currently running the GPU tests. If you see a flake, file a bug like this one. This is the most targeted tool for discovering flakiness on the trybots and should be heavily relied upon.
  2. The Swarming Server Stats tool. Examine the bots whose dimensions contain a non-empty GPU. Check the activity on these bots to ensure the number of pending jobs seems reasonable according to historical levels. If you have an @google.com account, you can drill down into the individual bots' history to see the success and failure rates of tests on an individual bot.
  3. As a last resort, the individual bots like mac_chromium_rel_ng can be examined to see if there are any patterns in the failures, such as a particular trybot failing tests all of the time.
