
Sheriff details: NaCl


What to watch

  • Failures only waterfall
  • Console view, to make sure the bots aren't too far behind (try the "collapse" and "merge" links to get a more compact view that shows what has been tested together)
  • #Chromium IRC on freenode (you're the point person for Chromium-NaCl build failures on the Chrome tree)
  • Be available on IM
  • Periodically check the waterfalls that aren't on the main page (they're linked from the upper left of the main waterfall page):
    • Toolchain
    • Chrome Integration
    • SDK (NOTE: The 'trout' column refers to the current release branch and is only built when someone checks in a change to the trout branch)
    • NaCl Ports
    • Trybots (look for repeated failures due to infrastructure issues - contact a trooper if you see one)

When to close the tree

Whether you close the tree or not, be sure to update the tree status message with the relevant details: whether you expect the bot to cycle green, or who's investigating the problem.
  • A test went red: Tree maybe closed
    • If the cause is obvious (the FooShouldWork test broke, and someone just checked in changes to foo_utils.cc), the tree can stay open. Revert the change and send the review to the change's author.
    • If the cause isn't obvious, close the tree. Ask everyone on the blamelist to help track it down, and revert the patch as soon as it's found.
  • A test occasionally goes red: Tree open
    • This is a flaky test. If the culprit change is obvious, revert it.
    • If the change isn't obvious, disable the test and file a bug. See "Handling failing & flaky tests" below for details.
  • One category of bot fails to build or has a swarm of test failures: Tree closed
  • One bot went red: Tree open
    • If only one buildbot is having problems (can't update, can't compile, exploding in some other way), the tree can stay open while it's fixed. We have reasonable redundant coverage now. Ask a trooper for help.
  • A slave is hung at a step: Tree maybe closed
    • If a slave hangs, sometimes just cancelling the build may not work. In that case call a trooper.
  • Bot warns about performance improvement: Tree open
    • If it is obvious why, update the nacl_perf_expectations.json to set a new bound.
    • See the note below about the README for how to update nacl_perf_expectations.json.
    • Check the graph to see if it is stable.
      • To see the graph, click the [results] link which is right next to the PERF_IMPROVE message.
  • Bot warns about performance regression: Tree maybe closed
    • Check the [results] link right next to the PERF_REGRESS message. See how large the spike in timings is, and whether it is a flake.
    • If it is not a flake:
      • If the cause is obvious and the regression is unacceptable, revert the change. Consult with the CL author.
      • If the cause is obvious and the regression is considered acceptable (e.g., for security reasons), update nacl_perf_expectations.json to set a new bound. Consult with the CL author.
    • NOTE: the performance results link may occasionally break. It usually looks something like http://build.chromium.org/f/client/perf/nacl-lucid_64-newlib-x86_32-pnacl-spec/spec2k/report.html?history=150
      • The link has "client" instead of "chromium"!
    • You can also check older results in the performance graph by changing "history=150" to "history=X", where X > 150, and you can change "rev=-1" to a different ending revision.
    • There is a newer performance dashboard linked from the "[Results Dashboard]" link. The old [results] link will be deprecated later.
      • You must be logged in with a google.com account to access it.
    • To look for the buildbot test log output, search for "RESULT ${TestName...}". E.g., "RESULT TestThreadCreateAndJoin" under the buildbot stdio links.
  • One of the bots on one of the non-main-page waterfalls is broken: Tree maybe closed
    • Toolchain: TBD
    • Chrome Integration: TBD
    • SDK: TBD
    • NaCl Ports: TBD
  • TBD: fill in more NaCl-specific issues
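Changing the history and rev values described above only requires editing the query string of the results URL. The following is a minimal sketch that uses only the Python standard library. The helper name `with_history` is hypothetical, and the code assumes that the report page reads `history` and `rev` as ordinary query parameters, which is what the example link suggests.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_history(url, history, rev=None):
    """Return `url` with its `history` (and optionally `rev`) query parameter replaced.

    Hypothetical helper: it only rewrites the query string and does not
    check that the report page honours the new values.
    """
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))  # keeps the original parameter order
    params["history"] = str(history)
    if rev is not None:
        params["rev"] = str(rev)  # ending revision of the graph
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    report = ("http://build.chromium.org/f/client/perf/"
              "nacl-lucid_64-newlib-x86_32-pnacl-spec/spec2k/report.html?history=150")
    # Show 400 points ending at an arbitrary example revision.
    print(with_history(report, 400, rev=9000))
```

Parameters that are already in the URL keep their position, and a missing `rev` is appended at the end.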

Common Tasks

Reverting Changes

  • RECOMMENDED: Use drover:
    • http://dev.chromium.org/developers/how-tos/drover
    • Set up a directory containing a drover.properties file with:
        BASE_URL = "svn://svn.chromium.org/native_client"
        TRUNK_URL = "svn://svn.chromium.org/native_client/trunk"
        BRANCH_URL = BASE_URL + "/branches/$branch"
        SKIP_CHECK_WORKING = True
        FILE_PATTERN = r"[ ]+([MADUC])[ ]+/((?:trunk|branches/\d+)(.*)/(.*))"
    • Then run:
        drover --revert CLNUM

  • Other methods:
    • The easiest way: (from a git checkout) git revert <hash> && git cl dcommit --tbr --bypass-hooks
    • The hard way: (from a svn checkout) gclient sync ; svn merge -c -1234 . ; gcl change XXXX ; gcl commit XXXX
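You can check which changed paths the FILE_PATTERN setting accepts by testing it against sample lines with Python's `re` module. This sketch assumes that the pattern is applied with `re.match` to the changed-path lines that `svn log -v` prints (for example `   M /trunk/...`). The sample lines are invented for illustration, and the helper `parse_changed_path` is hypothetical and not part of drover.

```python
import re

# FILE_PATTERN copied verbatim from the drover.properties settings above.
FILE_PATTERN = r"[ ]+([MADUC])[ ]+/((?:trunk|branches/\d+)(.*)/(.*))"


def parse_changed_path(line):
    """Return (action, path, directory, filename) if `line` matches, else None."""
    match = re.match(FILE_PATTERN, line)
    if match is None:
        return None
    return match.groups()


if __name__ == "__main__":
    samples = [
        "   M /trunk/src/native_client/DEPS",
        "   A /branches/1234/src/native_client/tests/foo_test.cc",
        "   M /branches/experimental/src/DEPS",  # non-numeric branch directory
        "   R /trunk/src/native_client/DEPS",    # 'R' is not in [MADUC]
    ]
    for sample in samples:
        print(repr(sample), "->", parse_changed_path(sample))
```

In this pattern, only trunk paths and numbered `branches/<digits>` paths produce a match.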

Compile Failure

  • REVERT
  • Waiting for a fix is not a good idea. Just revert until it compiles again.
  • If it's not clear why the compile failed, contact a trooper.
  • NOTE: For NaCl toolchain breakage, fix persistent problems by reverting the DEPS change; for flakiness, file a bug.

Handling failing & flaky tests

In recent experience, most flakiness is caused by flaky bot infrastructure rather than flaky tests. If a failure on a bot doesn't repeat across consecutive runs or across multiple bots, try forcing a build on that bot to see whether the failure is repeatable.

If no recent commit is the source of the problem, proceed with the steps below.
  • Head over to code.google.com/p/nativeclient/issues and file a bug describing the failure and stating that the test has been disabled. Make sure to include sample output from the test, because the buildbots don't keep the data forever. Make sure to assign an owner, usually whoever last modified the test according to svn blame.
  • Disable the test. (Unfortunately, we don't have a flaky-test mechanism like Chromium's.)
  • In the change description, add the line "BUG=xyz", where xyz is the number of the bug you filed.
  • Ping the owner directly. If you have time and relevant knowledge, help the owner track down the flake.
Use the issue tracker to track progress on resolving flakiness: http://code.google.com/p/nativeclient/issues/list?can=2&q=label%3AFlakeyBots
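The repeatability check described above can be expressed as a small decision rule. The following sketch is hypothetical and is not an existing NaCl tool. It assumes you have the test's pass/fail results from consecutive runs on one bot, plus the set of bots that show the same failure. It counts the failure as "repeatable" when there are two consecutive failures or when two or more bots fail, and that threshold is an arbitrary assumption.

```python
from typing import AbstractSet, Sequence


def triage_failure(runs: Sequence[bool], failing_bots: AbstractSet[str]) -> str:
    """Suggest a next step for a test failure (hypothetical helper).

    runs: results for the test on one bot, oldest first (True = passed).
    failing_bots: names of every bot currently showing the same failure.
    Returns "no-data", "green", "force-build" or "repeatable".
    """
    if not runs:
        return "no-data"
    if runs[-1]:
        return "green"  # the latest run passed
    # Count how many of the most recent runs failed in a row.
    consecutive_failures = 0
    for passed in reversed(runs):
        if passed:
            break
        consecutive_failures += 1
    # Assumed threshold: 2+ consecutive failures or 2+ failing bots.
    if consecutive_failures >= 2 or len(failing_bots) >= 2:
        return "repeatable"  # look for a culprit commit; otherwise file a bug and disable
    return "force-build"  # force a build on that bot and see whether it fails again


if __name__ == "__main__":
    print(triage_failure([True, True, False], {"bot-a"}))  # force-build
```

A single failure on a single bot leads to "force-build", which matches the advice to force a build before filing a bug.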

Updating the DEPS

The following DEPS entries should be kept up to date:
  • NaCl's revision in Chrome. It is not essential for the sheriff to do this; see below.
  • Chrome's revision in NaCl. Since this is only used for testing, it is relatively low priority.
  • The NaCl toolchain (to a revision after the Chrome revision has been updated; this means you should wait for the toolchain to build after the previous DEPS update).
  • NaCl's and the toolchain's revisions in the NaCl SDK (this could be done before or in parallel with updating NaCl's revision in Chrome).
Note: The ordering of these updates used to be significant, but that is no longer the case now that the NaCl toolchain no longer contains libraries (such as libppruntime) that need to be synchronized with the NaCl plugin.

Updating Chrome's revision in NaCl

  • Check out NaCl.
  • Find the appropriate revision of Chrome.
  • Update "chrome_rev" in native_client/DEPS.
  • Upload the change and verify that the change doesn't break the build on NaCl's trybots.
  • Get the CL reviewed. Bradley Nelson, Noel Allen, Nick Bray, or David Sehr are good candidates.
  • Submit the CL
  • Verify that the CL does not break the build on NaCl's build bots.
    • If it does, revert the change (see above for details). Reopen the tree if necessary.
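The usual way to change chrome_rev is to edit it by hand, but the edit is easy to script if you prefer. This minimal sketch assumes that the entry appears in native_client/DEPS as a quoted string in the vars dictionary (for example `"chrome_rev": "123456",`). The helper `set_deps_var` and all revision numbers are hypothetical.

```python
import re


def set_deps_var(deps_text, name, value):
    """Return `deps_text` with the quoted value of the `name` entry replaced by `value`.

    Assumes entries look like  "name": "value",  and raises ValueError unless
    exactly one such entry exists, so a typo cannot silently leave DEPS unchanged.
    """
    pattern = re.compile(r'("%s"\s*:\s*")[^"]*(")' % re.escape(name))
    # A function replacement avoids backreference ambiguity such as "\1" + "123".
    new_text, count = pattern.subn(lambda m: m.group(1) + value + m.group(2), deps_text)
    if count != 1:
        raise ValueError("expected exactly one %r entry, found %d" % (name, count))
    return new_text


if __name__ == "__main__":
    sample = 'vars = {\n  "chrome_rev": "123456",\n  "tools_rev": "7000",\n}\n'
    print(set_deps_var(sample, "chrome_rev", "123789"))
```

To use it on a real checkout, read native_client/DEPS, pass its text through `set_deps_var`, write the result back, and review the diff before uploading.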


Updating the NaCl perf expectations

This should only happen if there are performance improvements or regressions (e.g., PERF_REGRESS messages).

At this point you've decided that the regression should be accepted, and you've read through the "When to close the tree" section above. All that remains is to update the test expectations. To do so, see native_client/tools/nacl_perf_expectations/README. The basic steps are:
  1. Update the nacl_perf_expectations.json file with a new range of revisions [revA, revB] reflecting the new expected performance.
  2. Run the tool to automatically grab the improve/regress bounds. This will update the .json file.
  3. Get a CL reviewed to check in the new .json file.
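The README describes the real tool. The following sketch only shows the idea behind step 2, under hypothetical assumptions. It takes the timings measured in [revA, revB], uses the best and worst observed values, and widens them by a tolerance to produce bounds. It then flags a later timing as PERF_REGRESS if it is above the regress bound and PERF_IMPROVE if it is below the improve bound, which treats lower timings as better. The function names, the tolerance, and the numbers are hypothetical.

```python
from typing import Dict, Mapping


def compute_bounds(timings: Mapping[int, float], rev_a: int, rev_b: int,
                   tolerance: float = 0.05) -> Dict[str, float]:
    """Derive improve/regress bounds from timings (lower is better) in [rev_a, rev_b].

    Hypothetical rule: widen the observed min/max by `tolerance` (a fraction).
    """
    window = [t for rev, t in timings.items() if rev_a <= rev <= rev_b]
    if not window:
        raise ValueError("no data points in revision range [%d, %d]" % (rev_a, rev_b))
    return {"improve": min(window) * (1 - tolerance),
            "regress": max(window) * (1 + tolerance)}


def classify(value: float, bounds: Mapping[str, float]) -> str:
    """Label one measurement against the bounds, mirroring the bot messages."""
    if value > bounds["regress"]:
        return "PERF_REGRESS"
    if value < bounds["improve"]:
        return "PERF_IMPROVE"
    return "OK"


if __name__ == "__main__":
    # Made-up timings in seconds, keyed by revision.
    timings = {100: 10.0, 101: 10.4, 102: 9.8, 200: 12.0}
    bounds = compute_bounds(timings, 100, 102)
    print(bounds, classify(timings[200], bounds))
```

When you choose the revision range, cover only revisions that reflect the new expected performance, because any outlier inside the range widens the bounds.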

Updating the NaCl toolchain

  • Check out NaCl.
  • Find a valid toolchain using the script, specifying the highest revision number at which to start searching:
    • nacl/native_client$ python build/find_toolchain_revisions.py -s <Starting_Rev_Number>
  • While not recommended, you can find toolchains manually by checking the toolchain bot output and looking at the archive directories such as:
  • Update the "arm_toolchain_version" and "x86_toolchain_version" lines in native_client/DEPS.
    • If all toolchain files are in place, you may be able to set the arm and x86 versions to the same number. Otherwise, you might have to set them to slightly different versions.
  • Get the CL reviewed. Bradley Nelson, Noel Allen, and David Sehr are good candidates. Make sure to also include someone from the Moscow office, such as Victor Khimenko, as a reviewer; they can help point out cases where you are trying to move to a toolchain that is not good.
  • Make sure to do a "gcl try" and verify that the CL does not break the build on NaCl's trybots.
  • If you have an LGTM and green trybots, submit the CL.
  • Verify that the CL does not break the build on NaCl's build bots.
    • If it does, revert the change (see above for details). Reopen the tree if necessary.
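You can think of the search that find_toolchain_revisions.py performs as walking down from the starting revision until every required toolchain archive exists. This sketch is not that script. The platform names and the `available` mapping (revision to set of archived platforms) are hypothetical stand-ins for whatever the real script checks on the archive server.

```python
from typing import Iterable, Mapping, Optional, Set

# Hypothetical archives that a usable toolchain revision must have.
REQUIRED_PLATFORMS = {"linux_x86", "mac_x86", "win_x86", "linux_arm"}


def find_toolchain_revision(available: Mapping[int, Set[str]], start: int,
                            required: Iterable[str] = REQUIRED_PLATFORMS) -> Optional[int]:
    """Return the highest revision <= start whose archives cover all `required` platforms."""
    needed = set(required)
    for rev in sorted((r for r in available if r <= start), reverse=True):
        if needed <= available[rev]:  # subset test: every needed archive is present
            return rev
    return None


if __name__ == "__main__":
    available = {
        120: {"linux_x86", "mac_x86", "win_x86", "linux_arm"},
        125: {"linux_x86", "mac_x86"},  # other builders not finished yet
        130: {"linux_x86", "mac_x86", "win_x86", "linux_arm"},
    }
    print(find_toolchain_revision(available, 128))  # 120
```

Running the search once with only the x86 platforms and once with only the ARM platform models the case above, where arm_toolchain_version and x86_toolchain_version end up at slightly different revisions.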

Updating NaCl's revision in Chrome

Note that it is no longer necessary for the NaCl sheriff to do this. mseaborn@chromium.org has a script that automates it (nacl_deps_bump.py in https://github.com/mseaborn/nacl-dev-tools), which is run every weekday.
  • Take a look at the NaCl integration bots. If any of them is failing, updating NaCl's revision in Chrome will probably break Chrome's build. Proceed only if you know what's causing the failure and why it will not happen in Chrome. (As of May 16, this is not strictly true, as NaCl tests are mostly turned off. If your try succeeds, you will likely be able to update DEPS.)
  • Check out and build Chrome (Windows, Mac OS X, and Linux).
  • Update Native Client's revision (nacl_revision) and the tools version (nacl_tools_revision) in src/DEPS.
    • nacl_tools_revision should match tools_rev from native_client/DEPS, not the toolchain version.
  • Run gclient runhooks to get the new hashes for the nacl_irt_hash* entries in src/DEPS. Update those entries, then run gclient runhooks again to verify that the hashes match. If the Python script fails, it's likely because the IRT files do not exist yet; they sometimes take a while to get uploaded.
  • Upload the change and verify that it doesn't break the build on the Chromium trybots.
    • On Linux, the incremental build is overly aggressive, and stale build products can cause your compile to fail when it shouldn't. If you suspect this is the case, submit your try job with a clobber (gcl try foo -c). If that succeeds, you will probably have to do the same on the Chrome bots when you commit; just let the Chrome sheriffs know you are doing this. About 7 Linux builders will go red: click the top Bot tab and force a rebuild with a clobber.
    • If you get a seemingly unrelated failure on a Chrome test, there's a good chance it's flaky. Ask someone on the Chrome team.
  • Get the CL reviewed. Bradley Nelson, Noel Allen, and David Sehr are good candidates.
  • Make sure you're on #chromium on IRC in case there are failures.
  • Submit the CL. (If you check the "commit" box on the review site, it will run a try job and commit the change for you if the try job passes. This is handy, particularly if the Chrome tree is red.)
  • Verify that the CL does not break the build on Chrome's build bots.
    • If the build breaks, revert the change (using the drover script from depot_tools is recommended). Tell the IRC channel, and work with the sheriff to reopen the tree.
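nacl_tools_revision in Chrome's src/DEPS must match tools_rev in native_client/DEPS, not the toolchain version. A quick check before uploading can catch a mix-up. DEPS files use Python syntax, so this sketch evaluates each file with `exec` and reads its `vars` dictionary, using stub versions of the `Var` and `From` helpers that DEPS files may call. It assumes that both entries are plain values in `vars`, and the sample contents are invented. Because `exec` runs the file, only use this on DEPS files you trust.

```python
def read_deps_vars(deps_text):
    """Evaluate DEPS file text and return a copy of its `vars` dictionary.

    Minimal stubs for Var() and From() let the text evaluate. exec() runs
    the file, so only use this on DEPS files you trust.
    """
    scope = {}
    scope["Var"] = lambda name: scope["vars"][name]  # look up an earlier vars entry
    scope["From"] = lambda *args: None  # ignore From() dependencies entirely
    exec(deps_text, scope)
    return dict(scope.get("vars", {}))


def tools_revisions_match(chrome_deps_text, nacl_deps_text):
    """True if Chrome's nacl_tools_revision equals NaCl's tools_rev."""
    chrome_vars = read_deps_vars(chrome_deps_text)
    nacl_vars = read_deps_vars(nacl_deps_text)
    return chrome_vars["nacl_tools_revision"] == nacl_vars["tools_rev"]


if __name__ == "__main__":
    # Made-up DEPS contents; real files have many more entries.
    chrome_deps = (
        'vars = {"nacl_revision": "9000", "nacl_tools_revision": "8800"}\n'
        'deps = {"src/native_client": "svn://example/native_client@" + Var("nacl_revision")}\n'
    )
    nacl_deps = 'vars = {"tools_rev": "8800", "x86_toolchain_version": "8950"}\n'
    print(tools_revisions_match(chrome_deps, nacl_deps))
```

If the check fails because nacl_tools_revision was set to the x86_toolchain_version value, that is exactly the mix-up this step warns about.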

Updating the NaCl and toolchain revisions in the NaCl SDK

  • Check out a writable version of the NaCl SDK source code. If you don't have write access, either request access from the owners or coordinate with an SDK team member to submit the change.
  • In src/DEPS, update native_client_version to the same version that you used when updating Chrome. Also, update x86_toolchain_version to the same version that went into the NaCl DEPS file.
  • Submit a try job with the change (e.g., 'gcl try' or 'git try') and make sure the try succeeds.
  • Get the CL reviewed (good candidates: dspringer, mball, mlinck) and check it in, or forward the patch to an SDK member for checking in.