Tour of the Chromium Buildbot Waterfall

Chromium uses buildbot to run continuous builds and tests. The most useful master is the Chromium Continuous master; its top section links to the other masters. Each master can display a waterfall view and a console view. The Chromium Buildbot waterfall shows the status of many tests, and that much information can be hard to understand at first, so let's take a quick tour.
The buildbot master process watches the Subversion and git repositories, tells slaves to start building and testing new revisions, and serves the "waterfall" page that shows all the results. (For network reasons, Chromium actually has a separate machine showing the waterfall, delayed by up to two minutes. Usually this doesn't matter, but every once in a while it can lead to confusion.) Every time a new revision is discovered, the master triggers the builders, each of which serves a different purpose. "Testing" builders are triggered when the "building" builder that does the compilation archives its build. For example, all "XP Tests (dbg)(N)" and "Vista Tests (dbg)(N)" builders are triggered when "Chromium Builder (dbg)" is done archiving its build.

Slaves 

Let's take a look at the waterfall page. For now, skip over the header (green, red, yellow, etc.) and the box at the top with lots of links and horizontal stripes (we'll come back to them later), and look down at the row of grey boxes with "changes" at the left. Those boxes show one column per build slave.

The main waterfall page shows only the builders: the machines that build the executables and then distribute them to numerous other machines, which run various kinds of tests.



"Memory" slaves run tests using ASAN to check memory correctness, "Perf" slaves are running performance tests, and "GPU" slaves run tests related to GPU usage. "Tree closers" automatically close the tree (preventing people from committing new changes) when they fail; "FYI" testers are considered less important, so failures there shouldn't close the tree.

Click on one of the platform names next to a row of colored boxes in the big grey box at the top, and then on "waterfall" at the left, to see the testing slaves working on that platform, for example Windows. The set of slaves changes pretty often, so any list we could put here would go out of date quickly, but the names are generally pretty good at describing what kinds of machines they are and what kinds of tests they're running. A "dbg" slave is running a Debug build rather than a Release build. Some machines both build and test, but "builder" machines only build the executables, then upload them to "tests" machines to run the tests.





Build Steps 

Now look down one of the columns, starting at the grey box with the slave's name. Each box shows one step in that slave's build/test sequence, with the oldest one at the bottom and the current or latest one just below the name box. Just above the name box is the slave's current activity, and above that is the final result of the last build/test sequence that finished. 

A yellow build step is still in progress, green finished successfully, red finished with errors, orange finished with warnings, and purple had an internal Buildbot error. 
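If you are scripting against the waterfall, the color legend above can be written down as a small lookup table. This is only a sketch of the legend as described here; STEP_COLORS and describe_step are hypothetical names, not part of Buildbot.

```python
# Color legend for build steps, as described in the text above.
STEP_COLORS = {
    "yellow": "still in progress",
    "green": "finished successfully",
    "red": "finished with errors",
    "orange": "finished with warnings",
    "purple": "internal Buildbot error",
}


def describe_step(color):
    """Translate a waterfall box color into its meaning."""
    return STEP_COLORS.get(color.strip().lower(), "unknown color")


for color in ("green", "Orange", "blue"):
    print(color, "->", describe_step(color))
```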

Step name | Script | Description | When is it orange? | When it is red...
--- | --- | --- | --- | ---
svnkill | taskkill | Kill any leftover svn processes before starting a new test cycle. | N/A | N/A; contact trooper
update scripts | gclient sync | Update the internal buildbot scripts on the slave. | N/A | svn server failure; contact trooper
update | gclient sync | Update the checkout with gclient. This also runs the hooks like gyp to generate the make/vcproj/xcodeproj files. | N/A | svn server failure or DEPS breakage; contact trooper
taskkill | kill_processes.py | Kill a bunch of other possible leftover processes (test_shell.exe, ui_tests.exe, etc.) that would interfere with a clean run. | N/A | N/A; contact trooper
gclient_revert | gclient_safe_revert.py | Revert any changes if a checkout exists. | N/A | svn server failure or broken checkout; contact trooper
cleanup_temp | cleanup_temp.py | Remove any leftover temp files. | N/A | N/A; contact trooper
check deps | checkdeps.py | Ensure that source dependencies stay clean. It's done by parsing .cc and .h files according to rules to DEPS files. | N/A | A bad change; revert
compile | compile.py | Build the executables | N/A | A bad change; revert
archive build | archive_build.py | Save the executables and symbols into "snapshots" | N/A | N/A
extract build | extract_build.py | Extract an archive build on a tester from the corresponding "builder" | Failed to fetch the url requested. The last archived build is used instead. | Failed to fetch any build; the slave probably needs to be restarted, contact a trooper
(various tests) | runtest.py, debuger_unittests.py, chrome_tests.py, etc | See testing information | Only FLAKY_ tests failed | Tests failed; revert.
layout tests | webkit_test.py | Run html based tests from webkit | Tests marked as FAIL passed, no test unexpectedly failed. See test_expectations in layout test doc. | Unexpected layout test failure. It's usually related to a Webkit roll.
BVT tests | wait_for_bvt.sh and others | Run tests on actual ChromiumOS hardware | N/A | Tests failed or machine broke.
Reliability tests | reliability_tests.py | Run distributed tests to find non-deterministic crashes. It is green when only "known crashers" happens | Fails to grab the summary of the test runs for the expected build. | New stack traces appeared in crashes.

If you click on the "stdio" link for a step, you can see the exact command that was run and its environment in blue, any output it produced in black, and stderr in red.

Most of the tests are pretty straightforward, but performance test output can be complicated at times. See the Guide to Perf Test Plots for more about that. 

Builders vs Testers

A tester doesn't compile, so you can't clobber it. It simply extracts its build from a builder. Testers are triggered when the corresponding builder finishes.

Changes

Each time someone commits a change to the svn repository, the build master tries to get the slaves to start their build and test sequences. (If they're already busy, the change is queued, which means that more than one change is included in a single run if they're coming in faster than the slaves can test.) The "changes" column at the left of the waterfall shows who committed a patch and when. If you hover over that link, you'll see a summary of the change; click on it to see a little more information. The times at the very left are in Pacific time. 
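The queueing behaviour described above, where several changes end up in one run, can be modelled with a toy queue. This is a sketch under simple assumptions (one slave, one pending list), not how the Buildbot scheduler is actually implemented; SlaveQueue and its methods are hypothetical names.

```python
from collections import deque


class SlaveQueue:
    """Toy model of one slave: changes pile up while a run is in progress."""

    def __init__(self):
        self.pending = deque()
        self.busy = False

    def on_commit(self, revision):
        """Record a newly committed revision."""
        self.pending.append(revision)

    def start_next_run(self):
        """Start a run covering every queued change; return that batch."""
        if self.busy or not self.pending:
            return []
        batch = list(self.pending)
        self.pending.clear()
        self.busy = True
        return batch

    def finish_run(self):
        """Mark the current run as finished."""
        self.busy = False


queue = SlaveQueue()
queue.on_commit(101)
print(queue.start_next_run())  # [101]
queue.on_commit(102)
queue.on_commit(103)
print(queue.start_next_run())  # [] because the slave is still busy
queue.finish_run()
print(queue.start_next_run())  # [102, 103]
```

The last run covers two revisions, which is why a failure there cannot be pinned on a single change without looking closer.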

At the start of each run (that is, at the bottom of each series of steps for a slave), there's a yellow box holding the build number. Clicking on that build-number link shows more information about the run, including in principle the "blamelist" of changes that went into it. Every now and then, though, that list of changes is off by one. If you need to know for sure, look at the results of the "update" step to see exactly what gclient sync pulled in. 

Tree state

The "tree" is the sum of the various source repositories used to build the project (Chromium, ChromiumOS, NativeClient, etc.). In Chromium's case, it's chrome/src/ plus everything listed in its DEPS file, plus a bit more for Google Chrome, such as trademarked graphics. The tree can be "open", "closed" or "throttled". The normal state is open. When tests break, the tree is closed by putting the word "closed" in the tree status; PRESUBMIT.py checks the status and blocks commits while the build sheriff works to fix the tree. When the tree is throttled, commits are only allowed with specific permission from the build sheriff, generally because the sheriff wants to make sure the tree is stable before opening it up to unlimited commits.
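The three tree states can be sketched as a small check on the status message. This is not the real PRESUBMIT.py logic. The text only says that the word "closed" in the status closes the tree; detecting "throttled" by keyword and the sheriff_approved flag are assumptions made for illustration, as are the function names.

```python
def tree_state(status_message):
    """Classify a tree status message as 'closed', 'throttled', or 'open'."""
    text = status_message.lower()
    if "closed" in text:
        return "closed"
    if "throttled" in text:
        # Assumption: throttling is signalled by a keyword, like closing.
        return "throttled"
    return "open"


def commit_allowed(status_message, sheriff_approved=False):
    """Decide whether a commit may land under the given tree status."""
    state = tree_state(status_message)
    if state == "closed":
        return False
    if state == "throttled":
        # Throttled commits need specific permission from the sheriff.
        return sheriff_approved
    return True


print(tree_state("Tree is CLOSED (compile failure)"))
print(commit_allowed("Tree is throttled", sheriff_approved=True))
```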

Seeing More and Seeing Less

Tucked away down at the very bottom of the waterfall page is a set of small links, of which two are particularly useful. The [next page] link takes you to the next screenful of the waterfall, which is backward in time to earlier builds. The [help] link, among other things, shows you a list of all the slaves attached to this master. You can choose which ones you'd like to see, then bookmark the resulting URL so you can get that view easily next time. 
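A bookmarkable view like the one described above is just a waterfall URL with the chosen builders in its query string. The sketch below builds one, assuming the waterfall accepts a repeated builder= parameter; the host name is a placeholder, and waterfall_url is a hypothetical helper rather than anything shipped with Buildbot.

```python
from urllib.parse import urlencode


def waterfall_url(base_url, builders):
    """Build a waterfall URL restricted to the given builder names."""
    # Assumption: the waterfall page takes one "builder" parameter per name.
    query = urlencode([("builder", name) for name in builders])
    url = base_url.rstrip("/") + "/waterfall"
    return url + "?" + query if query else url


# "build.example.org" is a placeholder, not the real master address.
print(waterfall_url("http://build.example.org",
                    ["Chromium Builder (dbg)", "XP Tests (dbg)(1)"]))
```

Spaces and parentheses in builder names are percent-encoded by urlencode, so the resulting link can be pasted into a bookmark as-is.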

Banner and Box

Now back to the top of the waterfall. At the very top is a banner showing the current state of the tree. If the tree is closed because of build or test failures, it should be mentioned here. If there's an announcement about a new build process, expected downtime, or some other aspect of Chromium development, it'll generally be shown here, too. 

Below that is an oval box. It has a number of handy links at the left -- try them to see where they go. It also has four rows of colored boxes, which show the pass/fail status of the last completed runs for several categories of slaves. If you click on one of those category names, you'll go to a partial waterfall view that shows only the related slaves. Hovering over a colored box shows you the name of the slave it's summarizing, and clicking on the box will go to a waterfall view with only that one slave. 

Sheriffs 

The last thing in the oval is a list of this week's build sheriffs. Although every developer is responsible for running tests before committing patches and watching the tree for problems afterward, the sheriffs have overall responsibility in case someone else is away or not paying attention. 

Sources

All the source for Chromium's buildbot setup is found in our Subversion repository. See Getting the Buildbot Source if you'd like to take a look or help develop it.

Builder setup

Some builders have no corresponding trybots. To debug compile failures on them, you need to set up a similar compile environment locally.

Other views

The console view is the default one. Marc-Antoine is also looking for people willing to use the JSON interface to create a cool live interface in JavaScript. Contact him for more details.

Glossary

It's important to use the right words so here are the official definitions:
  • buildbot master: the process that observes the source code repository, triggers builds on the slaves and serves the waterfall and console web pages.
  • buildbot builder: a column in the waterfall or console view, doing a series of build steps. The build steps involve compiling and/or running tests.
  • buildbot slave: an actual machine (often a VM) connected to a builder. In the case of the try server, multiple slaves are connected to one builder. In general there is a 1:1 mapping between a slave and a builder so they can usually be used interchangeably.
  • build step: a shell invocation like compile or update sources. Each builder has a determined series of build steps that are executed on the slave.
  • "a tester": a buildbot builder that only runs tests; it gets its binaries from the extract step.
  • "a builder": this term has a double meaning. It can mean any buildbot builder, or only those buildbot builders that run the compile step.
  • "an incremental builder": a builder that does incremental compiles.
  • "a full builder": a builder that does full compiles; it does a clobber on every compile (deleting all build products), starting fresh.
  • clobber: the act of doing a full build, versus an incremental build. E.g.: rm -rf out; make
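
As a rough illustration of the "rm -rf out" half of a clobber, here is a minimal sketch in Python. The clobber function is a hypothetical helper, not the script the buildbot slaves actually run, and it leaves the following build step (the "make" half) to the caller.

```python
import shutil
from pathlib import Path


def clobber(out_dir):
    """Delete all build products so the next compile starts fresh.

    Returns True once the output directory no longer exists.
    """
    path = Path(out_dir)
    if path.exists():
        shutil.rmtree(path)
    return not path.exists()
```

Calling clobber("out") on a directory that is already gone is harmless, which makes it safe to run at the start of every full build.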