The purpose of a postmortem is to understand how failures happen, so that future occurrences can be prevented through education and process changes.
A postmortem is expected for any tree closure lasting longer than 4 hours, and should be written within 72 hours of the outage.
The postmortem should be written by someone involved with detecting and correcting the issue, preferably someone who can take responsibility for the followup.
Please use the postmortem template found here (file -> make a copy).
Your postmortem should include the following sections:
- Summary of the event
- Full timeline
- Root cause(s)
- What worked and what didn't (a.k.a. lessons learned)
- Action items (followup bugs assigned to specific people)
Whenever possible, postmortems should be accessible to the entire Chromium community. If you are a Google employee, and your postmortem contains internal details, see the internal infrastructure team's postmortem site instead.
- Using your chromium.org account, write it in a Google Doc and set the sharing permissions to “Anyone who has the link can comment”.
- Add it to the list below.
- Send the link to email@example.com or firstname.lastname@example.org, as relevant.
|SPDY/QUIC Connection Pooling Bug Postmortem|
|chromium.perf tests on android userdebug builds failing for 7 days||2015-01-30|
|No data from some android bots on chromium.perf for 2 weeks||2014-12-22|
|Grit Compile Errors Require Clobber||2014-07-25|
|Swarming Postmortem: 2014-12-04||2014-12-04|
|.dbconfig files loss on master3||2014-09-19|
|Postmortem: 15 hour tree closure by a file named "about"||2014-05-09|
|Swarming Postmortem: Undeleteable directories||2015-11-22|