Triage Best Practices

  • Roles
    • TPMs/Product/QA - Classify bugs into buckets and, for P0/ReleaseBlock issues, find owners.
      • TPMs/Product may step in from time to time to assist w/ triage when help is needed (e.g. setting up meetings, sitting in on triage, etc.).
    • Engineers - Should own triage for a given label, and ensure that issues are reviewed in a timely manner, per the guidelines below.
    • Managers - Are responsible for their Engineers and hold them accountable (through OKRs) for ensuring that issues are triaged in a timely manner.
      • TPMs will assist w/ weekly activity reports.
  • Priorities
    • P0 - *EMERGENCY* - Drop everything else and work on this issue to conclusion.  This issue needs immediate attention and resolution.
    • P1 - Work on these issues first, assuming no P0s are on your plate.  All regressions, including top crashers, should be considered P1.  These issues should ideally be completed in the span of the current milestone.  Generally speaking these issues are highly important to users (e.g. breaks usage of a feature, causes stability problems, etc.).
    • P2 - Semi-critical work, could span multiple releases.  This is work that would be nice to try and fit into this release, but can safely be punted if the deadline passes.
    • P3 - Non-critical work or work that will never gate a release.
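A minimal sketch of how the priority buckets above could drive work ordering. The function names and the dictionary layout are hypothetical, and treating "regressions are P1" as a floor (so that an emergency regression can still be P0) is an assumption, not something the guidelines state.

```python
# Priorities in the order they should be worked on, per the list above.
PRIORITY_ORDER = ["P0", "P1", "P2", "P3"]


def default_priority(is_regression, requested="P2"):
    """Return the priority to file an issue under.

    Assumption: regressions are raised to at least P1, but a regression
    that is already P0 stays P0.
    """
    if requested not in PRIORITY_ORDER:
        raise ValueError(f"unknown priority: {requested}")
    if is_regression and PRIORITY_ORDER.index(requested) > PRIORITY_ORDER.index("P1"):
        return "P1"
    return requested


def work_order(issues):
    """Sort issues so P0s come first, then P1s, and so on.

    `issues` is a list of dicts with a "priority" key (hypothetical shape).
    sorted() is stable, so issues of equal priority keep their input order.
    """
    return sorted(issues, key=lambda issue: PRIORITY_ORDER.index(issue["priority"]))
```

For example, `default_priority(True, "P3")` yields "P1", while a non-regression filed as P3 stays P3.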
  • 30 Second Triage Checklist
    • Load up your base query (bookmark it):
      • internals:network -feature:spdy,preload -label:spdy
    • Step 0: Check label:releaseblock issues (Regularly if possible)
      • append ‘label:releaseblock’ to base query
      • sort order doesn’t matter; all should be reviewed
      • Objective:
        • Make sure these issues are assigned and being worked on w/ urgency.
    • Step 1: Check status:untriaged issues (Target: 25-50% of time)
      • append ‘status:untriaged’ to base query
      • sort down (i.e. process requests LIFO)
        • Newest issues typically have the most impact
      • Objective:
        • HIT REGRESSIONS HARD (tackle them first)
        • Assign owners and milestones to these bugs.
        • Any nice-to-have bugs should simply be classified (e.g. Feature-Foo) and marked as available, adding any appropriate folks on cc
    • Step 2: Check mstone:<current>, <next> (Target: 25% of time)
      • append ‘mstone:N,N+1’ to base query (e.g. ‘mstone:13,14’)
      • Objective:
        • Make sure all issues in the current mstone are assigned
        • Punt issues that will not get completed
    • Step 3: Check status:available issues (Target: 25% of time, time permitting)
      • append ‘status:available’ to base query
      • sort up (start w/ the oldest)
      • Objective:
        • If people aren’t fully loaded, help them out
        • If a bug is out of date, close it
        • If a bug needs more info to be actionable, find out how to get that info
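The checklist above amounts to appending one filter at a time to the bookmarked base query. A minimal sketch of generating those query strings follows; the helper name, the `TriageStep` type, and the sort hints are hypothetical, and the code only builds strings (it does not call any issue tracker API).

```python
from typing import List, NamedTuple, Optional

# Base query copied from the checklist above.
BASE_QUERY = "internals:network -feature:spdy,preload -label:spdy"


class TriageStep(NamedTuple):
    name: str
    query: str
    sort_hint: Optional[str]  # None where the checklist gives no sort order


def triage_queries(current_mstone: int, base: str = BASE_QUERY) -> List[TriageStep]:
    """Build the query string for each checklist step, in order."""
    next_mstone = current_mstone + 1
    return [
        TriageStep("Step 0: release blockers",
                   f"{base} label:releaseblock", "any"),
        TriageStep("Step 1: untriaged",
                   f"{base} status:untriaged", "newest first"),
        TriageStep("Step 2: current and next milestone",
                   f"{base} mstone:{current_mstone},{next_mstone}", None),
        TriageStep("Step 3: available",
                   f"{base} status:available", "oldest first"),
    ]


if __name__ == "__main__":
    for step in triage_queries(13):
        print(f"{step.name}: {step.query}")
```

Passing the current milestone as a parameter keeps the Step 2 query in sync from release to release, so only one number changes.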
  • Best Practices
    • Have Auto-cc rules set up for your Feature-Label group so that engineers are pinged as soon as an issue is raised/classified
      • Default issue owners can be useful
    • Triage at regular times, at least once a week
    • Do your best to make sure that Mstone-N and Mstone-N+1 issues have owners.  The sooner people know they own something the sooner they’ll start looking at it.
      • Ownership accuracy is less important 2 releases (i.e. 12 weeks) out
    • Even if a bug is targeted at a future release, prefer assigning it to someone rather than marking it Available
      • It’s more likely that someone will look at it, even if the bug ultimately needs to be re-assigned.
    • Keep in mind that each release cycle is 6 weeks long.  Plan your work pessimistically; you can always pull stuff back into a release.
    • Feel free to break planning down into smaller chunks/iterations (e.g. ChromeOS uses 2 week sprints). Such a practice can help w/ planning and scoping, and is generally successful so long as the team is cognisant of the higher level release milestones.
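The release arithmetic used above (6-week cycles, 2 releases out being 12 weeks, 2-week sprints) can be sketched as follows. The function names are hypothetical; the only inputs taken from the text are the 6-week cycle and the 2-week sprint example.

```python
# Release cycle length stated in the best practices above.
RELEASE_CYCLE_WEEKS = 6


def weeks_out(releases_ahead: int) -> int:
    """Weeks until a milestone that is `releases_ahead` releases away."""
    if releases_ahead < 0:
        raise ValueError("releases_ahead must be non-negative")
    return releases_ahead * RELEASE_CYCLE_WEEKS


def sprints_per_release(sprint_weeks: int = 2):
    """Return (full sprints, leftover weeks) that fit in one release cycle."""
    if sprint_weeks <= 0:
        raise ValueError("sprint_weeks must be positive")
    return divmod(RELEASE_CYCLE_WEEKS, sprint_weeks)
```

With 2-week sprints, three full sprints fit in a cycle with no leftover weeks, which is one way to keep sprint boundaries aligned with release milestones.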