
Sheriffing Bug Queues

tl;dr Sheriffing bug queues contain the issues a sheriff should investigate. The sheriffs' goal is to keep the tree green and make sure that changelists can land through the commit queue reliably.

Sheriffing bug queues are shown in sheriff-o-matic and correspond to crbug.com/?q=label:Sheriff-*. The Sheriff-Chromium label is shown at https://sheriff-o-matic.appspot.com/chromium when there are bugs in the queue. You can use this queue to track work as a sheriff (including communicating unfinished tasks to the next sheriff) and other folks can file bugs in this queue to put them on the sheriff's radar.

The sheriff's goal should be to end each day with 0 bugs in the queue. Apart from bugs that are intentionally passed on to the next sheriff, bugs should not sit in the queue for days. If a sheriff has already triaged a bug and an owner is assigned, it is OK to remove the Sheriff label.

For any issues that are infra failures, the Infra-Troopers label should be added together with a priority representing the urgency of the issue (see goto.google.com/bug-a-trooper).

Triaging Auto-filed flakiness bugs

Bugs for flaky tests are filed when the system detects a test/step that both failed and passed in two different tryjob runs for the same patchset. A failure that the developer introduced in one patchset and fixed in a follow-up patchset is not treated as a flake. A sheriff is not required to fix the flaky tests, but needs to act on them as soon as possible.
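The detection rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not the actual detection system: the `TryjobResult` record and the `find_flakes` function are hypothetical names, and the real system's data model is not documented on this page.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Set, Tuple


class TryjobResult(NamedTuple):
    """One test/step outcome from one tryjob run (hypothetical record shape)."""
    patchset: str  # identifier of the patchset the run was for
    run_id: str    # identifier of the tryjob run
    test: str      # test or step name
    passed: bool


def find_flakes(results: Iterable[TryjobResult]) -> Set[Tuple[str, str]]:
    """Return (patchset, test) pairs that failed in one run and passed in another."""
    passed_runs = defaultdict(set)  # (patchset, test) -> run IDs where it passed
    failed_runs = defaultdict(set)  # (patchset, test) -> run IDs where it failed
    for r in results:
        key = (r.patchset, r.test)
        (passed_runs if r.passed else failed_runs)[key].add(r.run_id)

    flakes = set()
    for key, failed in failed_runs.items():
        passed = passed_runs.get(key, set())
        # A flake needs a failing run and a *different* passing run
        # for the same patchset.
        if any(f != p for f in failed for p in passed):
            flakes.add(key)
    return flakes
```

Because results are grouped by patchset, a test that fails on one patchset and passes on a later one is not reported, which matches the "fixed in a follow-up patchset" case.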

If it's an infra failure:
  1. Apply the Infra-Troopers label and remove the Sheriff-Chromium label.
  2. Not the Chromium sheriff's problem anymore. No need to follow up.
If it's a different sheriff's problem (e.g. it's a memory bot):
  1. Assign to that sheriff, apply the appropriate Sheriff-* label, and remove the Sheriff-Chromium label.
  2. Not the Chromium sheriff's problem anymore. No need to follow up.
If it's a test flake:
  1. Try to find the patch that caused the flake. In all likelihood it is recent (e.g. from the last day or two).
  2. If you find the patch, revert it, especially if the flake comes from a new test introduced in that patch. If the culprit CL is more than a couple of hours old, it is probably a good idea to run try jobs before landing the revert, in case more recent patches already depend on the culprit CL.
  3. Close the bug.
If it's a test flake and you can't find the culprit patch within 30 minutes:
  1. Disable the test (how?).
  2. Find appropriate devs (e.g. via git blame).
  3. Assign the bug to one of them and CC the others.
  4. Remove the Sheriff-Chromium label to remove the bug from the sheriff queue.
  5. Not the Chromium sheriff's problem anymore. No need to follow up.
Flaky tests should be dealt with as soon as possible. It's not ok to ignore them, or to disable them and not file a bug, or to file a bug without finding an owner for it.
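The triage cases above can be summarized as a small decision function. This is a sketch for orientation only: `Cause` and `triage_steps` are hypothetical names, and no such tool is described on this page. The sheriff still classifies the failure by hand; the function only returns the matching checklist.

```python
from enum import Enum
from typing import List


class Cause(Enum):
    """How the sheriff has classified the failure (hypothetical categories)."""
    INFRA = "infra failure"
    OTHER_SHERIFF = "another sheriff's area"
    TEST_FLAKE = "test flake"


def triage_steps(cause: Cause, culprit_found: bool = False) -> List[str]:
    """Return the sheriff's checklist for an auto-filed flakiness bug."""
    if cause is Cause.INFRA:
        return [
            "Apply the Infra-Troopers label",
            "Remove the Sheriff-Chromium label",
        ]
    if cause is Cause.OTHER_SHERIFF:
        return [
            "Assign to the responsible sheriff",
            "Apply the appropriate Sheriff-* label",
            "Remove the Sheriff-Chromium label",
        ]
    if culprit_found:
        return ["Revert the culprit patch", "Close the bug"]
    # Test flake whose culprit was not found within the 30-minute budget.
    return [
        "Disable the test",
        "Find appropriate devs (e.g. via git blame)",
        "Assign the bug to one of them and CC the others",
        "Remove the Sheriff-Chromium label",
    ]
```

For example, `triage_steps(Cause.TEST_FLAKE, culprit_found=True)` returns the revert-and-close checklist, while the default `culprit_found=False` returns the disable-and-assign checklist.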

Issues that have not been updated for 3 days are returned to the Sheriff queue. Even if the test has been disabled and no further flakes are reported, the sheriffs should still find an owner to fix the flaky test and enable it again. If the owner decides that the test is not needed, please ask them to remove the test from the codebase and mark the issue as WontFix.
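As a rough illustration of the 3-day rule, the check might look like the sketch below. The `stale_issues` function and its input shape are hypothetical, and treating exactly 3 days as stale is an assumption; the page does not say how the real system handles the boundary.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# 3 days comes from the text above; the inclusive boundary is an assumption.
STALE_AFTER = timedelta(days=3)


def stale_issues(last_updated: Dict[int, datetime], now: datetime) -> List[int]:
    """Return the IDs of issues with no update for at least STALE_AFTER, sorted."""
    return sorted(
        issue_id
        for issue_id, updated in last_updated.items()
        if now - updated >= STALE_AFTER
    )
```

Passing `now` explicitly, rather than calling `datetime.now()` inside the function, keeps the check deterministic and easy to test.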

Known issues

The display in sheriff-o-matic has a delay, so changes you make to bugs only show up after the backend has cycled (~1-2 minutes).

If you have a suggestion or find something broken, please file a bug.