This page has details to help Chromium sheriffs. For more information and procedures, see Tree Sheriffs and read the theses on Sheriffing.
For passing the torch, you can also leave notes here.
What to watch
- Failures-only waterfall. It will show you only the bots a sheriff would need to look at.
(A builder is considered failing if the last finished build was not successful, a step in the current build(s) failed, or if the builder is offline.)
- Console view, to make sure testing has not fallen too far behind.
- Some sheriffs don't look at the waterfall at all; instead they open this console and choose [merge] at the bottom.
- LKGR status. Make sure it moves forward relatively often, as other trees depend on it.
- IRC #chromium on freenode.
- Be available on IM.
- Do not ignore the Reliability tester. It's very important for Chromium stability.
- Do not ignore ChromeOS bots. These bots build and run Chrome for ChromeOS on Linux and ChromiumOS respectively and are as important as win/mac/linux bots. If you're not sure how to fix an issue, feel free to contact ChromiumOS sheriffs.
- Do not ignore ASan bots. They live on the "Memory waterfall", but the regular sheriffs are nevertheless required to watch them. Bugs reported by ASan usually cause memory corruption in the wild, so do not hesitate to revert or disable the failing test (ASan does not support suppressions).
- It's up to the main sheriffs to keep an eye on the Official waterfall.
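The rule the Failures-only waterfall uses to decide whether a builder is failing can be sketched in a few lines of Python. This is only an illustration of the three conditions listed above, not the waterfall's actual code; the `Builder` class and its field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Builder:
    """Hypothetical snapshot of one builder's state (not a real buildbot type)."""
    name: str
    last_finished_ok: bool                      # did the last finished build succeed?
    current_failed_steps: List[str] = field(default_factory=list)
    offline: bool = False


def is_failing(builder: Builder) -> bool:
    """A builder is failing if any one of the three conditions holds."""
    return (
        not builder.last_finished_ok            # last finished build was not successful
        or bool(builder.current_failed_steps)   # a step in a current build failed
        or builder.offline                      # the builder is offline
    )


def failures_only(builders: List[Builder]) -> List[str]:
    """Names of the builders a sheriff would need to look at."""
    return [b.name for b in builders if is_failing(b)]
```

Note that a builder whose last finished build was green still counts as failing if a step in its running build has already gone red, so it appears on the failures-only view before that build completes.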
When to close the tree
All bots that you need to watch are "tree closers". If any of them fails, the tree will be automatically closed by the gatekeeper (it will become red, and you'll see a "tree closed" message from trungl-bot in IRC). Then, you need to act:
- A test went red: Tree maybe closed
- If the cause is obvious (the FooShouldWork test broke, and someone just checked in changes to foo_utils.cc), the tree can stay open. Revert the change and send the review to the author of that change.
- If the cause isn't obvious, close the tree. Ask everyone on the blamelist to help track it down and revert the patch as soon as found.
- A test occasionally goes red: Tree open
- This is a flaky test. If the change that made it flaky is obvious, revert the change.
- If the change isn't obvious (or the test is new), keep the tree open but disable the test and file a bug. See below for details.
- webkit_tests went red or got new regressions: Tree maybe closed
- Layout tests are just like other kinds of tests, except that sometimes we file and mark their new failures rather than fixing them right away. See below for details.
- One category of bot fails to build or has a swarm of test failures: Tree closed
- If all the debug, release, Vista, XP, etc. builds go red, act as with a single test going red.
- One bot went red: Tree open
- If only one buildbot is having problems (can't update, can't compile, exploding in some other way), the tree can stay open while it's fixed. We have reasonable redundant coverage now. Ask a trooper for help.
- An update failed: Tree maybe closed
- Try again from the internal waterfall. Ask a colleague for the URL. If it keeps failing or gives a worrisome error, contact a trooper.
- "extract build" is orange, or fails once: Tree open
- Orange "extract build" means it's using the latest built revision and not the one it's supposed to. If it does not work the second time, contact a trooper.
- A slave is hung at a step: Tree maybe closed
- If a slave hangs, cancelling the build sometimes does not work. In that case, call a trooper.
- Small insects crawling on stems and leaves seem to be eating sap: Tree infested
- The tree probably has aphids. Release ladybugs nearby to eat them.
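The symptom list above reads like a lookup table, and can be sketched as one. This is a rough Python illustration only: the symptom keys and the `suggested_state` helper are made-up names, and the real decision always rests on the sheriff's judgment.

```python
from enum import Enum


class TreeState(Enum):
    OPEN = "open"
    MAYBE_CLOSED = "maybe closed"
    CLOSED = "closed"


# Hypothetical symptom keys, one per entry in the list above.
RULES = {
    "test_red": TreeState.MAYBE_CLOSED,
    "test_flaky": TreeState.OPEN,
    "webkit_tests_red": TreeState.MAYBE_CLOSED,
    "bot_category_red": TreeState.CLOSED,
    "single_bot_red": TreeState.OPEN,
    "update_failed": TreeState.MAYBE_CLOSED,
    "extract_build_orange": TreeState.OPEN,
    "slave_hung": TreeState.MAYBE_CLOSED,
}


def suggested_state(symptom: str, cause_obvious: bool = False) -> TreeState:
    """Return the tree state the list above suggests for a symptom.

    For a red test, the list resolves "maybe closed": the tree stays open
    when the cause is obvious and is closed when it is not.
    """
    if symptom == "test_red":
        return TreeState.OPEN if cause_obvious else TreeState.CLOSED
    return RULES[symptom]
```

For example, `suggested_state("test_red", cause_obvious=True)` yields `TreeState.OPEN`, matching the case where the offending change is obvious and can simply be reverted.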
$ cd $TMP_DIR; drover --revert 12345
$ git checkout trunk; git pull; gclient sync
$ git svn find-rev r12345 # -> a git hash
$ git checkout -b revert_foo trunk
$ cd $SRC # a gcl/svn repo
- Unless this is Incredibuild flakiness, REVERT.
- If this is Incredibuild flakiness, just force a clobber.
- Waiting for a fix is not a good idea. Just revert until it compiles again.
- If it's not clear why compile failed, contact a trooper.
Handling failing perf expectations (like the sizes step)
When a step turns red because perf expectations weren't met, follow the instructions on the perf sheriffs page to handle it. It can also help to contact the developer who landed the change, along with the current perf sheriff, to decide how to proceed.
Coordinating WebKit breakages / fixes
Tips and Tricks
How to read the tree status at the top of the waterfall
The memory sheriff helps with tending the Memory FYI tree, and the webkit sheriff helps out with the Webkit bots.
- Chromium / Webkit / Modules rows contain all the bots on the main waterfall.
- Official and Memory bots are on separate waterfalls, but the view at the top shows their status.
Checking whether a test is flaky
Merging the console view
If you want to know when revisions have been tested together, open the console view and click the "merge" link at the bottom.
- Open a GChat session with your fellow sheriffs. This is useful for coordinating outside of IRC (e.g. lunch breaks, who will pursue what).
- Open a shared GDoc and use it to track open issues. For example, if a test starts flaking, drop in the dashboard links. Take notes about your discoveries, CLs, crbugs, owners, etc. If anything outlasts your shift, put it in the Sheriff Log.
NOTE: If your shift spans a weekend, you aren't expected to sheriff on the weekend (you do have to sheriff on the other days, e.g. Friday and Monday). The same applies for holidays in your office.