
Sheriff details: NaCl


What to watch

  • Failures only waterfall
  • Console view, to make sure the bots aren't too far behind (try the "collapse" and "merge" links to get a more compact overall view that shows you which changes were tested together)
  • #Chromium IRC on freenode (you're the point person for Chromium-NaCl build failures on the Chrome tree)
  • Be available on IM
  • Periodically check the waterfalls that aren't on the main page (they are linked from the upper left of the main waterfall page)

What to ignore 

    These are current known issues and flakiness that are being investigated and do not require a closed tree. If you are reading this and know of something that is no longer current, please update this list.

    As of 2011/10/03, these are the issues that can be ignored:
    • NaCl waterfall
      • Mac 10.5 chrome_browser_tests failures when the browser crashes (bradchen investigating, may clear up on next DEPS roll http://codereview.chromium.org/8071003/):
        [SERVER_ERROR] Browser process ended during test (return code -10)
      • Flaky untrusted_crash_test on ARM (http://code.google.com/p/nativeclient/issues/detail?id=2142)
    • Chrome Integration
    • NaCl ports
      • NONE
    • Toolchain bots
      • arm trusted TC ( http://code.google.com/p/nativeclient/issues/detail?id=2248 )
    • ppapi integration
      • NONE (this bot is useful for detecting when a Chrome/PPAPI proxy change violates our compiler warning settings)


    When to close the tree

    Whether or not you close the tree, be sure to update the tree status message with the relevant details: whether you expect the bot to cycle green, or who is investigating the problem.
    • A test went red: Tree maybe closed
      • If the cause is obvious (the FooShouldWork test broke, and someone just checked in changes to foo_utils.cc), the tree can stay open. Revert the change and send the revert for review to the change's author.
      • If the cause isn't obvious, close the tree. Ask everyone on the blamelist to help track it down, and revert the offending patch as soon as it is found.
    • A test occasionally goes red: Tree open
      • This is a flaky test. If the change that introduced the flakiness is obvious, revert it.
      • If it isn't obvious, disable the test and file a bug. See "Handling failing & flaky tests" below for details.
    • One category of bot fails to build or has a swarm of test failures: Tree closed
    • One bot went red: Tree open
      • If only one buildbot is having problems (can't update, can't compile, exploding in some other way), the tree can stay open while it's fixed. We have reasonable redundant coverage now. Ask a trooper for help.
    • A slave is hung at a step: Tree maybe closed
      • If a slave hangs, cancelling the build may not be enough. In that case, call a trooper.
    • Bot warns about performance improvement: Tree open
      • If the cause is obvious, update nacl_perf_expectations.json to set a new bound.
    • Bot warns about performance regression: Tree maybe closed
      • Check the [results] link right next to the PERF_REGRESS message. Look at how large the spike in timings is and whether it is a flake.
      • If it is not a flake:
        • If the cause is obvious and the regression is unacceptable, revert the change. Consult with the CL author.
        • If the cause is obvious and the regression is considered acceptable (e.g., for security reasons), update nacl_perf_expectations.json to set a new bound. Consult with the CL author.
    • One of the bots on one of the non-main-page waterfalls is broken: Tree maybe closed
      • Toolchain: TBD
      • Chrome Integration: TBD
      • PPAPI Integration: TBD
      • SDK: TBD
      • NaCl Ports: TBD
    • TBD: fill in more NaCl-specific issues
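When a PERF_REGRESS warning fires, the first questions above are how big the spike is and whether it persists. The Python sketch below shows one way to reason about that from a few timings read off the [results] page. The fractional threshold and the "last two runs must both regress" rule are hypothetical choices for illustration, not the bots' actual logic; the real bounds live in nacl_perf_expectations.json.

```python
from typing import Sequence


def relative_change(baseline_ms: float, current_ms: float) -> float:
    """Fractional change versus the baseline (0.25 means 25% slower)."""
    if baseline_ms <= 0:
        raise ValueError("baseline must be a positive timing")
    return (current_ms - baseline_ms) / baseline_ms


def is_sustained_regression(
    timings_ms: Sequence[float],
    baseline_ms: float,
    regress_fraction: float,
    runs: int = 2,
) -> bool:
    """True if each of the last `runs` timings exceeds the baseline by more
    than `regress_fraction`.

    A single spike followed by a normal run is treated as a likely flake.
    `runs` is a hypothetical policy knob, not a buildbot setting.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    if len(timings_ms) < runs:
        return False  # Not enough cycles yet; wait for the bot to run again.
    recent = timings_ms[-runs:]
    return all(relative_change(baseline_ms, t) > regress_fraction for t in recent)


if __name__ == "__main__":
    # Hypothetical timings (ms) for one benchmark, oldest first.
    print(is_sustained_regression([100, 130, 101], baseline_ms=100, regress_fraction=0.2))  # False
    print(is_sustained_regression([100, 130, 135], baseline_ms=100, regress_fraction=0.2))  # True
```

Requiring more than one regressed run trades a slower reaction for fewer false alarms; on a bot with noisy timings you may want a larger `runs`.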

    Common Tasks

    Reverting Changes

    • RECOMMENDED: Use drover:
      • http://dev.chromium.org/developers/how-tos/drover
      • Set up a directory containing a drover.properties file with:
          BASE_URL = "svn://svn.chromium.org/native_client"
          TRUNK_URL = "svn://svn.chromium.org/native_client/trunk"
          BRANCH_URL = BASE_URL + "/branches/$branch"
          SKIP_CHECK_WORKING = True
          FILE_PATTERN = r"[ ]+([MADUC])[ ]+/((?:trunk|branches/\d+)(.*)/(.*))"
      • Then run:
          drover --revert CLNUM

    • Other methods:
      • The easiest way: (from a git checkout) git revert <hash> && git cl dcommit --tbr --bypass-hooks
      • The hard way: (from a checkout) gclient sync ; svn merge -c -1234 . ; gcl change XXXX ; gcl commit XXXX
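The FILE_PATTERN in the drover.properties listed under the drover instructions is dense. It is shaped to match the changed-path lines that `svn log -v` prints (that drover applies it to exactly those lines is our assumption), with groups for the action letter, the repository path, its directory part, and the file name. The Python sketch below only exercises the pattern on made-up sample lines so you can see what it captures before editing it; it does not call drover.

```python
import re

# FILE_PATTERN copied verbatim from the drover.properties on this page.
FILE_PATTERN = r"[ ]+([MADUC])[ ]+/((?:trunk|branches/\d+)(.*)/(.*))"

# Illustrative lines in the format printed by `svn log -v`.
# The paths are invented for this example.
SAMPLE_LINES = [
    "   M /trunk/src/native_client/src/trusted/foo.cc",
    "   A /branches/123/tests/bar/bar_test.c",
    "Changed paths:",
]


def parse_changed_path(line):
    """Return (action, repo_path, directory, filename), or None if no match."""
    match = re.match(FILE_PATTERN, line)
    if match is None:
        return None
    # Greedy (.*) makes `directory` run up to the last "/" in the path.
    return match.groups()


if __name__ == "__main__":
    for line in SAMPLE_LINES:
        print(repr(line), "->", parse_changed_path(line))
```

Header lines such as "Changed paths:" do not start with spaces, so they fall through as None.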

    Compile Failure

    • REVERT
    • Waiting for a fix is not a good idea. Just revert until it compiles again.
    • If it's not clear why compile failed, contact a trooper.
    • NOTE: For a NaCl toolchain failure, fix persistent problems by reverting the DEPS change; for flakiness, file a bug.

    Handling failing & flaky tests

    In recent experience, most flakiness is due to flaky bot infrastructure, not flaky tests. If a failure does not repeat across consecutive runs on a bot, or does not show up on multiple bots, try forcing a build on that bot to see whether the failure is repeatable.

    If no recent commit is the source of the problem then proceed with the steps below.
    • Head over to code.google.com/p/nativeclient/issues and file a bug describing the failure and noting that the test has been disabled. Include sample output from the test, since the buildbots don't keep the logs forever. Assign an owner, usually whoever last modified the test according to svn blame.
    • Disable the test. (Unfortunately, we don't have a flaky-test mechanism like Chromium's.)
    • In the change description, add the line "BUG=xyz", where xyz is the number of the bug you filed.
    • Ping the owner directly. If you have time and relevant knowledge, help the owner track down the flake.
    Use the issue tracker to track progress on resolving flakiness: http://code.google.com/p/nativeclient/issues/list?can=2&q=label%3AFlakeyBots
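The "is the failure repeatable?" question can be made mechanical. The Python sketch below classifies one bot's recent results, oldest first, as you read them off the waterfall. The labels and the two-consecutive-failures threshold are hypothetical conventions for illustration, not part of any buildbot tooling.

```python
from typing import Sequence


def _longest_failure_run(results: Sequence[bool]) -> int:
    """Length of the longest run of consecutive failed builds."""
    longest = current = 0
    for passed in results:
        current = 0 if passed else current + 1
        longest = max(longest, current)
    return longest


def classify_bot_history(results: Sequence[bool], persistent_threshold: int = 2) -> str:
    """Classify one bot's recent builds, oldest first (True means green).

    Hypothetical labels:
      "persistent" - the last `persistent_threshold` builds all failed;
                     look for a culprit commit.
      "retry"      - the latest build failed, but not enough times in a row
                     yet; force another build to see if it repeats.
      "green"      - every build in the window passed.
      "recovered"  - a sustained breakage earlier in the window, now passing.
      "flaky"      - isolated failures that did not repeat.
    """
    if not results:
        raise ValueError("need at least one build result")
    if persistent_threshold < 1:
        raise ValueError("persistent_threshold must be at least 1")
    trailing_failures = 0
    for passed in reversed(results):
        if passed:
            break
        trailing_failures += 1
    if trailing_failures >= persistent_threshold:
        return "persistent"
    if trailing_failures > 0:
        return "retry"
    if all(results):
        return "green"
    if _longest_failure_run(results) >= persistent_threshold:
        return "recovered"
    return "flaky"
```

A "flaky" result is the case where filing a bug with the FlakeyBots label, as described above, is most likely to apply.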

    Updating the DEPS

    DEPS files should be updated:
    • NaCl's revision in Chrome. It is not essential for the sheriff to do this; see below.
    • Chrome's revision in NaCl. Since this is only used for testing, it is relatively low priority.
    • NaCl toolchain (to a revision after the Chrome revision has been updated; this means you should wait for the toolchain to build after the previous DEPS update)
    • NaCl's and the toolchain's revision in the NaCl SDK (could be done before or in parallel with updating NaCl's revision in Chrome)
    Note:  The ordering of these updates used to be significant, but that is no longer the case, now that the NaCl toolchain no longer contains libraries (such as libppruntime) that need to be synchronized with the NaCl plugin.

    Updating Chrome's revision in NaCl

    • Check out NaCl
    • Find a revision of Chrome that has prebuilt binaries for every platform by running:
      • nacl/native_client$ python build/find_chrome_revisions.py
    • Update "chrome_rev" in native_client/DEPS.
      • It is not necessary for Chrome's revision to exactly match the Chrome revision you submitted which updated the Native Client revision in Chrome's src/DEPS (below). Any recent, good Chromium build is probably ok as long as the revision number is greater than or equal to the revision where you updated src/DEPS last.
      • Check that this file is available at your rev:  http://commondatastorage.googleapis.com/chromium-browser-continuous/Win/####/chrome-win32.zip (where #### is the chrome_rev)
    • Upload the change and verify that the change doesn't break the build on NaCl's trybots.
      • If the build breaks, you'll need to fix things, usually in the same CL as the native_client/DEPS file change
      • Some things are very tricky because they live in the Chrome tree
        • If the file lives in a directory other than native_client/src/... or native_client/tests/... then it probably needs to be fixed in a different repository, usually in Chrome
        • Chrome is compiled with "normal" compilation flags while Native Client is compiled with the -pedantic compilation flag. This can lead to some very weird errors. For example, type conversion between int32 and int32_t can be done implicitly in Chrome but must be explicitly cast in Native Client
        • Therefore, some fixes might require:
          • Change a type or add a typecast in Chrome
          • Submit a Chrome CL
          • Wait until all the bots have finished building the Chrome CL. (Really -- they all need to be finished)
          • Use the new Chrome CL (or something past it) as the new Chrome version in native_client/DEPS
          • Start over
      • Changes that affect the toolchain are also very hard
        • You cannot do complete builds at your desk because you will continue to get the current toolchain instead of your new toolchain. The trybots will do a partial SDK build to pick up your toolchain changes, so their results should be representative of the state of your change
        • Once you submit a change that affects the toolchain, all of the non-toolchain bots will go red because they rely on the new toolchain but it is not built yet
          • Once the toolchain bots are complete, you must restart builds on all the other bots. They should go green since the new toolchain is now built and available
    • Get the CL reviewed. Bradley Nelson, Noel Allen, Nick Bray or David Sehr are good candidates
    • Submit the CL
    • Verify that the CL does not break the build on NaCl's build bots.
      • If it does, revert the change (see above for details). Reopen the tree if necessary.
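build/find_chrome_revisions.py is the supported way to pick chrome_rev. To spot-check a single revision against the Windows archive URL given in the steps above, a minimal Python sketch using only the standard library could look like this. It assumes the continuous-build bucket still uses that URL layout and answers HEAD requests; archive names for other platforms are not listed on this page, so only Windows is covered.

```python
import urllib.error
import urllib.request

# URL layout copied from the "Check that this file is available" step above.
WIN_ARCHIVE_URL = (
    "http://commondatastorage.googleapis.com/"
    "chromium-browser-continuous/Win/{rev}/chrome-win32.zip"
)


def win_archive_url(chrome_rev: int) -> str:
    """Return the Windows continuous-build archive URL for a Chrome revision."""
    if chrome_rev <= 0:
        raise ValueError("chrome_rev must be a positive SVN revision")
    return WIN_ARCHIVE_URL.format(rev=chrome_rev)


def win_archive_exists(chrome_rev: int, timeout: float = 10.0) -> bool:
    """Send a HEAD request; True on HTTP 200, False on an HTTP error status.

    Network failures (DNS, timeouts) raise urllib.error.URLError so they
    are not mistaken for "this revision has no archive".
    """
    request = urllib.request.Request(win_archive_url(chrome_rev), method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        return False
```

For example, `win_archive_exists(<chrome_rev>)` returning False on a 404 suggests that revision's archive was never uploaded, so pick a different revision.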


    Updating the NaCl perf expectations

    This is only needed when the bots report performance improvements or regressions (e.g., PERF_REGRESS messages).
    See native_client/tools/nacl_perf_expectations/README. The basic steps are:
    1. Update the nacl_perf_expectations.json file with a new range of revisions [revA, revB] reflecting the new expected performance.
    2. Run the tool to automatically grab improve/regress bounds. This will update the .json file.
    3. Get a CL reviewed and check in the new .json file.
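Step 1 can be scripted. The Python sketch below points selected entries at a new [revA, revB] range and leaves the improve/regress bounds for the tool in step 2. ASSUMPTION: it guesses that each entry is a dict with "reva"/"revb" fields, as in Chromium's perf_expectations.json; check the NaCl README for the real layout before using it. The entry name in the test data is invented.

```python
import json
from typing import Any, Dict, Iterable

Expectations = Dict[str, Dict[str, Any]]


def set_revision_range(
    expectations: Expectations, keys: Iterable[str], rev_a: int, rev_b: int
) -> Expectations:
    """Set the baseline revision range on each named entry, in place.

    ASSUMED field names "reva"/"revb"; other fields are left untouched.
    """
    if rev_a > rev_b:
        raise ValueError("rev_a must not be greater than rev_b")
    for key in keys:
        if key not in expectations:
            raise KeyError(f"no expectation named {key!r}")
        expectations[key]["reva"] = rev_a
        expectations[key]["revb"] = rev_b
    return expectations


def update_file(path: str, keys: Iterable[str], rev_a: int, rev_b: int) -> None:
    """Load the .json file, update the range, and write it back."""
    with open(path) as handle:
        data = json.load(handle)
    set_revision_range(data, keys, rev_a, rev_b)
    with open(path, "w") as handle:
        json.dump(data, handle, indent=2, sort_keys=True)
        handle.write("\n")
```

`json.dump` rewrites the whole file's formatting; if that makes the review diff noisy, edit the two fields by hand instead.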

      Updating the NaCl toolchain

      • Check out NaCl
      • Find a valid toolchain by running the script, passing the highest revision number to start searching from:
        • nacl/native_client$ python build/find_toolchain_revisions.py -s <Starting_Rev_Number>
      • While not recommended, you can find toolchains manually by checking the toolchain bot output and looking at the archive directories such as:
      • Update "arm_toolchain_version" and "x86_toolchain_version" lines in the native_client/DEPS. 
        • If all toolchain files are in place, you may be able to set arm and x86 versions to the same number. Otherwise, you might have to set them to slightly different versions.
      • Get the CL reviewed. Bradley Nelson, Noel Allen, and David Sehr are good candidates. Also include someone from the Moscow office, such as Victor Khimenko, as a reviewer; they can help point out cases where you are trying to move to a toolchain that is not good.
      • Make sure to do a "gcl try" and verify that the CL does not break the build on NaCl's trybots.
      • If you have an LGTM and green trybots, then submit the CL.
      • Verify that the CL does not break the build on NaCl's build bots.
        • If it does, revert the change (see above for details). Reopen the tree if necessary.
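Editing the two version lines by hand is usually simplest. If you bump toolchains often, a small Python sketch like the one below rewrites both values in one step. It assumes each variable appears exactly once in native_client/DEPS as a quoted key/value pair (for example "x86_toolchain_version": "1234"); `bump_toolchains` is a hypothetical helper name, and the revision numbers in the test data are made up.

```python
import re


def set_deps_var(deps_text: str, name: str, value: str) -> str:
    """Return deps_text with the quoted value of `name` replaced by `value`.

    Matches entries like "name": "old" or 'name': 'old', and refuses to
    guess if the variable is missing or appears more than once.
    """
    pattern = re.compile(
        r"""((["'])%s\2\s*:\s*(["']))[^"']*(\3)""" % re.escape(name)
    )
    # A replacement function inserts `value` literally; a template such as
    # r"\1" + value would misparse values that start with digits.
    new_text, count = pattern.subn(
        lambda m: m.group(1) + value + m.group(4), deps_text
    )
    if count != 1:
        raise ValueError(f"expected one {name!r} entry in DEPS, found {count}")
    return new_text


def bump_toolchains(deps_path: str, arm_rev: str, x86_rev: str) -> None:
    """Hypothetical helper: update both toolchain versions in one DEPS file."""
    with open(deps_path) as handle:
        text = handle.read()
    text = set_deps_var(text, "arm_toolchain_version", arm_rev)
    text = set_deps_var(text, "x86_toolchain_version", x86_rev)
    with open(deps_path, "w") as handle:
        handle.write(text)
```

Passing separate ARM and x86 values keeps the "slightly different versions" case from the step above easy to express.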

      Updating NaCl's revision in Chrome

      Note that it is no longer necessary for the NaCl sheriff to do this. mseaborn@chromium.org has a script that automates it (nacl_deps_bump.py in https://github.com/mseaborn/nacl-dev-tools), which runs every weekday.
      • Take a look at the NaCl integration bots. If any of them is failing, updating NaCl's revision in Chrome will probably break Chrome's build. Proceed only if you know what's causing the failure and why it will not happen in Chrome. (As of May 16, this is not strictly true, as NaCl tests are mostly turned off. If your try job succeeds, you will likely be able to update DEPS.)
      • Check out and build Chrome (Windows, Mac OS X, and Linux).
      • Update Native Client's revision (nacl_revision) and the tools version (nacl_tools_revision) in src/DEPS
        • nacl_tools_revision should match tools_rev from native_client/DEPS, not  the toolchain version
      • Run gclient runhooks to get the new hashes for nacl_irt_hash* entries in src/DEPS. Update those entries. Run gclient runhooks again to verify the hashes match. If the python script fails, it's likely because the IRT files do not exist. It sometimes takes a while for them to get uploaded.
      • Upload the change and verify that the change doesn't break the build on the Chromium trybots.
        • (Optional, until Chrome gets an x64 trybot) Get access to an x64 Linux machine and run the build locally. The nacl-linux1 machine is available for this purpose and has all the required packages installed. Be sure to set the GYP_DEFINES environment variable to 'target_arch=x64' before running gyp to get the right project settings (gyp generates a 32-bit project by default). Mysterious build failures can occur (e.g., a 32-bit library was left around and not rebuilt); running hammer -c clears things up.
        • On Linux, the incremental build is overly aggressive and stale build products can cause your compile to fail when it shouldn't. If you suspect this is the case, submit your try with a clobber (gcl try foo -c). If that succeeds, you will probably have to do the same on the Chrome Bots when you commit. Just let the Chrome sheriffs know you are doing this. There are about 7 Linux builders that will go red. Click on the top Bot tab and force a rebuild with a clobber.
        • If you get a seemingly unrelated failure on a Chrome test, there's a good chance it's flaky. Ask someone on the Chrome team.
      • Get the CL reviewed. Bradley Nelson, Noel Allen, David Sehr are good candidates
      • Make sure you're on #chromium on IRC in case there are failures
      • Submit the CL  (if you check the "commit" box on the review site, it will run a try job and commit it for you if the try job passes.  This is handy, particularly if the Chrome tree is red.)
      • Verify that the CL does not break the build on Chrome's build bots.
        • If the build breaks, revert the change (using drover script from depot_tools is recommended). Tell the IRC channel.  Work with the sheriff to reopen the tree.
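The rule that nacl_tools_revision must match tools_rev (not the toolchain version) is easy to get wrong when copying numbers between files. The Python sketch below checks it before you upload. It assumes both DEPS files declare these variables once each as quoted key/value pairs; the revision numbers in the test data are invented. It does not touch the nacl_irt_hash* entries, which gclient runhooks reports.

```python
import re


def read_deps_var(deps_text: str, name: str) -> str:
    """Return the quoted string value of `name` from a DEPS file's text."""
    pattern = re.compile(
        r"""(["'])%s\1\s*:\s*(["'])([^"']*)\2""" % re.escape(name)
    )
    matches = pattern.findall(deps_text)  # list of (quote, quote, value)
    if len(matches) != 1:
        raise ValueError(f"expected one {name!r} entry, found {len(matches)}")
    return matches[0][2]


def check_tools_revision(chrome_deps_text: str, nacl_deps_text: str) -> str:
    """Compare Chrome's nacl_tools_revision with NaCl's tools_rev.

    Returns the shared value, or raises ValueError on a mismatch.
    """
    chrome_tools = read_deps_var(chrome_deps_text, "nacl_tools_revision")
    nacl_tools = read_deps_var(nacl_deps_text, "tools_rev")
    if chrome_tools != nacl_tools:
        raise ValueError(
            f"src/DEPS nacl_tools_revision={chrome_tools} but "
            f"native_client/DEPS tools_rev={nacl_tools}"
        )
    return chrome_tools
```

To use it, read src/DEPS and native_client/DEPS with `open(...).read()` and pass the two strings. References such as Var("nacl_tools_revision") are not matched, because the pattern requires a colon after the quoted name.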

      Updating the NaCl and Toolchain revision in the NaCl SDK

      • Check out a writable version of the NaCl SDK source code.  If you don't have write access, either request access from the owners, or coordinate with an SDK team member for submitting the change.
      • In src/DEPS, update native_client_version to the same version that you used in updating Chrome.  Also, update x86_toolchain_version to the same version as what went into the NaCl DEPS file.
      • Submit a try job with the change (i.e., 'gcl try' or 'git try') and make sure the try succeeds.
      • Get the CL reviewed (good candidates: dspringer, mball, mlinck) and check it in, or forward the patch to an SDK member for checking in.