the Chromium logo

The Chromium Projects

Breakage and Flake Policy

While in an ideal world, the commit queue should help us keep the tree green, occasionally, it's possible that breakages can get introduced. This is no good, as it blocks everyone from being able to get their work done.

How can bad CLs land in the tree?

Bad CLs can land a number of ways:

  1. The CL was chumped.
  2. Due to a CI configuration, a builder was marked as either informational or post-submit only, and was skipped during the commit queue run of the bad CL. Note that we intentionally don't run certain builders on certain files/repositories to save on testing resources and developer time, as our CI configuration considers those files/repositories low risk to the builders.
  3. Two CLs land which introduce conflicting functionalities separately via the Parallel CQ. Note that this is actually rather rare, but has happened.
  4. A CL introduces a build or test flake to the tree. Any flake can land through the CQ with enough tries.

Reverting Bad CLs

ChromiumOS is a revert-first project.

This means, whenever it's possible to solve a breakage simply by reverting a CL or two, this is the preferred way to solve the breakage.

Why revert first instead of fixing forward? Breakages impact everyone's productivity. Generally, reverting a recently-landed CL which is known to cause a failure is so safe that the revert can be chumped, giving us many valuable hours during the time a commit queue run would have taken for a forward fix. Of course, please coordinate with the sheriff before chumping.

If you are sent a revert for a CL you wrote or reviewed, don't feel ashamed, it happens to the best of us. Instead, help the sheriff verify that reverting your change will fix the issue, and you can reland it later with the fix applied.

Making Builders/Tests Informational

If the cause of the failing builder or test cannot be narrowed down to a CL that can be reverted, the sheriffs and on-callers may have no other choice but to disable it, or mark the builder/test as informational, experimental, or non-critical.

Generally, this is done as a last-resort step, as it allows new breakages to slip in. A reverted CL would be preferred much more to this.

In most cases, we do not allow making builders or tests informational in order to land a change which introduces a breakage to that builder or test, unless it is coordinated by or closely with the owners of the build/test.

Flakes

There's a special variety of breakages called "flakes," breakages that only happen occasionally. Even flakes can cost us many developer hours.

For example, consider a hypothetical build flake which happens 1 out of every 1000 builds. This may seem harmless at first, but lets run some numbers:

Hopefully this goes to show that flakes should be treated as high priority issues, just like total breakages. Given this, similar guidelines should apply:

If the sheriff or on-call staff reaches out to you to help with solving a flake, please treat this as a high priority issue, and don't forget that a revert is always preferred over a forward fix.