Sheriff Log: Chromium OS (go/croslog)

10/24- 10/31
Sheriffs: kirtika, mka, semenzato (honorary)
    Ongoing Issues on canaries:
  • SetupBoard failure, last ~10 parrot canaries failed. 
  • Provision failure with error "Devserver portfile does not exist".
  • provision failure | DUT runs out of space due to overfull /tmp

10/17- 10/23
Sheriffs: cychiang, briannorris, semenzato (honorary)
Gardeners: dshi, jrbarnette
        Ongoing Issues on canaries:
  • autoupdate_EndToEndTest, many different failures
  • autoupdate_Rollback
  • provision_Autoupdate.double
  • other provisioning failures (rsync errors, timeouts, error 37)
PFQ (gardening) issues:
  • New Issues:
  • - lakitu cloud_SystemServices flakiness
  • - autotest-web-tests build errors are too opaque
    • Filed, noted a potential fix
  • - Not enough falco_li DUTs in the lab.
  • - kunimitsu-release: build_packages failed on autotest-deps-ltp with undefined ltp_syscall; happened once.
  • - guado_moblab-paladin: moblab_RunSuite: FAIL: Unhandled AttributeError: '_CrosVersionMap' object has no attribute 'get_stable_version'
  • - celes-release, gandof-release: signing failed due to gsutil/ssl timeout
  • - pre-cq failed because nyan_freon is removed
  • - x86-mario-release: security_ModuleLocking timed out
  • - Falco device chromeos2-row4-rack5-host7 is flaky in provision
  • - multiple paladins: security_ptraceRestrictions: DUT rebooted during the test run.
    • Caused by bad CLs that made it through for
    • Poor Kernel 3.10 HW coverage:
    • Bad CL in 3.10 has been reverted, but still flushing out of some canaries (2016-10-20)
  • - Nearly all canaries failed: paygen and AUtest fail to install device image.
  • - chell signing/paygen failing due to new kernel cmdline flag
  • - jetstream_LocalApi failure
  • - wolf + veyron_speedy DUT availability
  • - kunimitsu build failures
    • Still not resolved; there's no paladin?
  • Resolved Issues:
  • - Chrome PFQ manifest errors
    • Waiting for next PFQ runs to come through
  • build_packages fail on almost all release builders, some paladin builders.
  • - security_SandboxedServices failure "One or more processes failed sandboxing"
  • - canary build failure because of minijail tree change. uprev of ebuild chumped. Fix to security_SandboxedServices chumped.
  • - autotest-web-tests issues on guado_moblab-paladin (experimental)
  • root caused to libcups/icedtea-bin - fix is in flight
  • cave-release: Fail to resolve host name for cros-beefy19-c2
  • b/32292437 - DUTs in pool crosperf are all 'repair failed'
  • Need to push change to autotest shard.

10/10 - 10/16
Sheriffs: chirantan, julanhsu, kinaba
Gardeners: lpique, dbehr

PFQ (gardening) issues:
  •  New Issues:
  • - guado_moblab: Repair failing. Happened once, didn't reoccur
  • - falco-chrome-pfq failing since build 4821 with apparent network issues after updating. Filed after digging into one of the failures on falco and noticing that in one case the infra didn't reconnect to the DUT after it was provisioned. Possibly related to falco becoming unpingable during provisioning.
  • - select_to_speak exists build error. Occurred once.
  • - Microcode SW error detected. Occurred once.
  • - [bvt-inline] security_SandboxedServices failure on lumpy-chrome-pfq (flake). "awk cannot open /proc/xxx/status" because the process ended between when the filename was generated and when awk tried to open it.
  •  Ongoing Issues:
  • [falco-chrome-pfq] almost always red
    • - provision failure "Device XXX is not pingable". This has plagued the falco-chrome-pfq builder, and is one of the main reasons we didn't automatically uprev Chrome this week.
  • [x86-generic-tot-asan-informational] almost always red
    • - login_Cryptohome fails nearly constantly on x86-generic-tot-asan-informational.
  • [ChromeOS Buildspec] red for M54 builds
    • - browser tests failing M54 builds on ChromeOS Buildspec builder. Landed a fix on the M54 branch that was made after the branch was cut, and was otherwise missed. For the builds to go green, we need a new M54 release though, since the builder pulls the current stable version release.
  • [Chrome4CROS Packages] always red
  • [lumpy-chrome-pfq] occasionally red
    • lumpy-chrome-pfq HWTest [bvt-inline] timed out waiting for json_dump. This is still happening, as the build time is too long occasionally. Added a note to the bug about certain tests taking much longer than the mean according to the gathered statistics when this occurs.
  •  Resolved Issues:
  • - Manually uprevved Chrome to 56.8891.0.0 for Chrome OS, since we otherwise would not have done so at all this week.
    • Actually there happened to be a green master run late Friday, for the first time in nine days.
  • - BuildPackages broken in multiple chrome-pfq builders. The CL for the fix landed and the builds were fixed Monday.
  • - (New) "Media.VideoCaptureGpuJpegDecoder.InitDecodeSuccess not loaded or histogram bucket not found or histogram bucket found at < 100%". Caused failures on peach-pit. The fix landed early Thursday.
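
Note on the security_SandboxedServices flake above: the awk failure is a race between listing /proc and opening /proc/<pid>/status for a process that has since exited. A minimal sketch of a scan that tolerates this, assuming a Python helper rather than the awk pipeline the test actually uses (read_proc_status, collect_statuses, and the proc_root parameter are hypothetical names for illustration, not autotest code):

```python
import os


def read_proc_status(pid, proc_root="/proc"):
    """Return the text of <proc_root>/<pid>/status, or None if the process is gone."""
    path = os.path.join(proc_root, str(pid), "status")
    try:
        with open(path, "r") as f:
            return f.read()
    except (FileNotFoundError, ProcessLookupError):
        # The process exited between listing the directory and opening
        # (or reading) the file. Treat it as "no longer running".
        return None


def collect_statuses(proc_root="/proc"):
    """Map pid -> status text for every process still alive when it is read."""
    statuses = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        text = read_proc_status(entry, proc_root)
        if text is not None:
            statuses[int(entry)] = text
    return statuses
```

The point is only that a vanished pid is skipped instead of failing the whole check; whether the real test should ignore such processes is a separate decision.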

10/3- 10/9
Sheriffs: rajatja, denniskempin
Gardeners: ihf, glevin
  • DebugSymbols error. Happens occasionally across boards:
  • AU Retry issues:
  • message_types_by_name error in dev_server:
  • buddy_release has been failing for weeks: need to investigate
  • gandof-release:
  • GSUtil timeout issues:
  • sentry-release: Some odd issues with HWTest need to investigate
  • bots failing graphics_Gbm check during hwtest

    PFQ (gardening) issues:
  •  New Issues:
  • - BuildPackages broken in multiple chrome-pfq builders.  There's a CL  for the fix, but it hasn't been committed yet.
  • - AboutTracingIntegrationTest.testBasicTraceRecording failing on x86-generic-telemetry and amd64-generic-telemetry.  CL to disable the test currently under review.
  • - Autobugs for occasional HWTest provision flakes, mostly masked by 653900 since Thursday.
  • - falco- and tricky-chrome-pfq's failed w/timeouts during  Occasional flake, but no logs, no work done.
  • - lumpy-chrome-pfq HWTest [bvt-inline] timed out waiting for json_dump.  Flaked once, didn't recur.
  •  Ongoing Issues:
  • - Chrome4CROS Packages builder still broken (3+ weeks)
  • - Still happening on x86-generic-tot-asan-informational, with occasional successes slipping through.
  • - Occasional flake in PageLoadMetricsBrowserTest.FirstMeaningfulPaintNotRecorded
  • - HWTest[bvt-inline] : "security_NetworkListeners FAIL: Found unexpected network listeners".  Single flake, waiting to see if it recurs.
  •  Resolved Issues:
  • - [VMTest - SimpleTestVerify] failing on cyan-tot-chrome-pfq-informational : "Could not access KVM kernel module".  Reverted offending CL, builder green since then.
  • - Linux ChromiumOS Tests (dbg) failure of two DevToolsAgentTest.* tests.  Issue contains cause, revert, and subsequent fix.
  • - Linux ChromeOS Buildspec Tests failed intermittently for weeks.  Failure not seen since 10/7, when issue comment suggested that potential fix had landed.
  • - Multiple generic pfq builders failing with "Invalid ebuild name".  Fixed.

9/26 - 10/2
Sheriffs: dbasehore, akahuang
Gardeners: jdufault, glevin

9/19 - 9/25
Sheriffs: apronin, charliemooney, vpalatin
Gardeners: stevenjb
  • chromiumos-sdk failed to build (missing efi.h) - fixed, build CL at fault CL to fix
  • Cyan has broken/flaky test performance in ToT, was causing CQ failures bug here
  • DataLinkManager crashing and breaking Canaries bug here (fixed: CL reverted)
  • Surfaceflinger crashing on oak bug here
  • Paladins fail to connect to MySQL instance bug here
  • Canaries were failing with "no attribute 'SignedJwtAssertionCredentials'" bug here (workaround CL submitted)
  • arc_mesa builds broken on auron, buddy, gandof, lulu, bug here, mostly fixed, buddy still fails as of buddy/428
  • manifest generation fails w/binary data in commit messages (e.g. CL:387905)
  • libmtp roll broke build packages due to autotools regen (fixed in CL:389031)
  • Root FS is over the limit for glimmer bug here
  • Reef builds were broken (unit tests failed to build), fixed here
  • Gru builds are broken (fail during uploading command stats) due to this CL, bug here, CL to fix
  • Some CLs are not marked as merged in Gerrit after a CQ run bug here
  • Tests that succeeded but left crashdumps frequently aborted on crashdump collection timeouts bug here, crashdump symbolication turned off if tests passed (here)
PFQ (gardening) issues:
  • Chrome4CROS Packages builder failing in compile -
  • login_Cryptohome fails nearly constantly on x86-generic-tot-asan-informational -
  • login_OwnershipNotRetaken fails regularly on PFQ. -
    • Ongoing investigation
  • Shutdown crash in ~ScreenDimmer > SupervisedUserURLFilter::RemoveObserver -
    • Fixed
  • Several PFQ failures due to timeouts -
    • Some timeouts are triaged, but some still need investigation

9/10 - 9/18
Sheriffs: cernekee, kkunduru, chinyue

9/5 - 9/9
Sheriffs: jdiez, dhendrix, mcchou, josephsih
Gardeners: achuith
  • Mostly having issues that affect many builders.
  • Canaries failing due to "HWTest did not complete due to infrastructure issues (code 3)", suspect b/31011610. May file more bugs...
  • Several builders failing due to misconfigured cheets_CTS test:
  • Kevin failing badly:
  • master-paladin infra failures (build 12292): this CL broke several paladin builds. Told the CL owner not to mark ready before fixing problems.
  • master-paladin infra failures (build 12294): failed 4 consecutive times. 20 paladins did not start in CommitQueueCompletion. Similar to build 12281 yesterday but build 12283 passed later.
  • provision_AutoUpdate.double ABORT: Timed out, did not run.
    • master-paladin infra failures (builds 12301, 12302): failed in these 2 builds
    • Looked similar to crbug/593423: Need to watch this as more builders were broken due to the timeout issue.
    • Build 12303 passed. Flaky?
  • signers failing while signing android apks:

8/29 - 9/4
Sheriffs: kitching, bleung, yixiang@
Gardeners: michaelpg, afakhry
  • CQ paladin build #12207 failed due to whirlwind-paladin #5640 HWTest jetstream_ApiServerAttestation failing, but passes in #5641
  • CQ paladin build #12215 failed due to many repo sync errors (example: daisy_skate-paladin), looks like subsequent builds do not exhibit repo sync problems
  • CQ paladin build #12216 failed due to:
  • CQ paladin build #12218 failed due to "No room left in the flash". vpalatin@ knows about it and is looking for ways to make it fit.
  • - Slave frozen, needed to be restarted.
  • - Timeout on Paygen curl /list_suite_controls (auron-release)
  • - Timeout on Paygen curl /stage (banon-release)
  • - Paygen suite job timed out despite all PASSED
  • - buddy-release: Paygen suite job timed out, all tests FAILED/ABORT
  • Top Issue on 8/31 - - lab database problem
  • b/31011610 - ATL14 packet loss bringing down ChromeOS Commit Queue
  • - guado_moblab broken due to testing outage
  • -  nyan_freon-paladin timed out during p2p unittest
  • - gru-paladin attestation unittest failure. Possibly flaky test. apronin@ looking at fixing test. Also affects gale-paladin
  • - All paladins failed during CommitQueueSync.  akeshet@ theory is that backlog of CLs (especially on kernel repo) overwhelmed GoB. akeshet@ put in a CL to temporarily limit CQ volume to 50 : TODO: Revert this once the backlog is cleared. nxia@ also added this mitigation :
8/22 - 8/28
Sheriffs: bhthompson, nya, walker
Gardeners: jennyz, lpique
    8/15 - 8/21
    Sheriffs: benzh, sureshraj, yoshiki
    Gardeners: jamescook, domlaskowski
    • security_StatefulPermissions failures on canaries: 
    • provision_AutoUpdate.double failures on chrome pfq informational: 
    • SyncChrome failures due to "Repository does not yet have revision" on chrome informational pfq -> infra, ongoing flake
    • Chrome telemetry failures due to missing system salt file -> reverted
    • cyan chrome pfq informational builder cros-beefy191-c2 is out of disk space building chrome -> infra
    • pool: bvt, board: falco in a critical state -> infra
    • Chrome4CROS Packages builder failing in bot_update "fatal: reference is not a tree" -> infra
    • VMTest failing on telemetry bots due to telemetry_UnitTests_perf -> bug in test script?, disabled
    • cros amd64-generic Trusty builder failing to start goma in gclient runhooks step -> networking flake?
    • login_CryptohomeIncognito -> flaky, but real failure
    • cheets_NotificationTest failure on Cyan PFQ -> real failure in chrome (crash in shelf)
    • falco-full-compile-paladin has failed to start with exception setup_properties
    • x86-generic-tot-asan-informational failures in tpm_manager (odr-violation) and attestation (leaks) -> new target added to cros build that had failures, reverted
    • Kernel panics on Cyan PFQ -> ???
    • link-paladin BuildPackages failure with SSLError The read operation timed out
    • AUTest failed on most canaries due to no test configurations
    8/8 - 8/14
    Sheriffs: davidriley, vprupis, takaoka, smbarber (Mon afternoon only)
    • Continued UnitTest failures on canaries and release branches:
    • lakitu failures:
    • edgar missing duts:
    • kevin firmware prebuilt:
    • x86_alex and veyron_rialto pool health:
    • Chumped change broke everything (eg pre-CQ, CQ, canaries) until revert was chumped in
    • infrastructure flake
      • celes-release/289, setzer-release/292 (build interrupted) ->
      • nyan-release/293, wolf-release/1294 (sudo access) ->
      • pre-cq (gerrit quota limits) ->
    • Friday: lab downtime affected builds for much of the day
    8/1 - 8/8
    Gardeners: stevenjb@, khmel@

    7/29 Notes for the next sheriffs from aaboagye, kirtika: 
    • Major issues we are seeing, format is <Impact: Issue: Links>:
      • Tree closure, fixed now: "No space left on device" for cheets builds: aaboagye@'s post-mortem here.
      • CQ failures: We've been seeing intermittent failures due to hitting git fetch limits with gerrit (commit queue sync step doesn't work). The current CQ run failed due to this, would not be surprised if the next one does too.
      • Several canaries failing: Unit-test times out, possibly due to overloaded machines:
      • Android-PFQ failures: adb is not ready in 60 seconds:
    • Minor issues, work-in-progress
      • Android-PFQ: mmap_min_addr not right on samus/x86:
      • Paygen/signing issues.
      • Autoupdate-rollback (likely network SSH issue): example

    2016-07-25 thru 2016-07-29
    Sheriff: aaboagye, kirtika, hidehiko (non-PST)

    • PST
      • Canaries
        • kevin-release was broken, but a fix is on the way. (wfrichar@ knows)
      • CQ
    • Non-PST:

    • PST
      • Canaries
        • Still seeing the error in the unittest phase. See
        • Paygen issue still affecting some canaries (x86_alex-he -
        • Saw a failure with auron_yuna canary with an error parsing a JSON response. See
        • samus failed with platform_OSLimits Found incorrect values: mmap_min_addr. Filed
      • CQ
        • Closed the tree because the CQ would just reject people's changes because of the no-disk-space error.
      • Chrome PFQ
        • Still seeing some failures in the login_CryptohomeIncognito test. See
    • Non-PST
      • CQ:
        • RED.
        • samus-paladin is failing due to no-disk-space error.
        • cheets tests failed twice with an actual error. Being fixed.
      • Chrome PFQ:
      • Android PFQ:

    • PST
      • Canaries
        • Seems like nearly all the canaries failed during HWTest stage apparently due to Infra issues.
      • CQ
        • On one run, some of the paladins failed during the CommitQueueSync step due to git rate limiting.
      • Android PFQ
        • An overloaded devserver is causing provisioning to fail for cyan-cheets-android-pfq and veyron_minnie-android-pfq (wolf-tot-paladin too).
    • (Non-PST)
      • CQ:
        • Master paladin looks flaky due to various reasons.
          • CQ limit hitting
          • HWtest time out
          • kOmahaErrorInHTTPResponse: looks like a tracking issue.
        • These are not always reproducible, and some runs pass successfully.
      • Chrome PFQ:
        • Finally passed at #3175.
      • Android PFQ:
        • Failing in the latest several runs, though for a variety of reasons. Looks just too flaky.

    7/26 (18:20 PST)
    • Canary Failure Classification: Lots of canary failures (~50%) this afternoon, so listing unique causes here to track down tomorrow: 
      • x86-zgb: Pool-health issue, infra (kevcheng@) looking into it, may be back up next canary run? 
      • x86-mario: Not sure if the manifestversionedsync is a real issue or not, filed anyway. 
      • Paygen failures: falco, falco_li, gru, jecht, kip, lumpy, ninja, parrot, peppy, samus, smaug, x86_alex-he, stumpy. TBD: Update more details here. 

    • (PST)
      • Canaries
        • Still some errors on nyan_blaze and nyan_kitty caused by the vboot_firmware CL.
          • Fixes posted to gerrit and making their way through the CQ.
        • Still some unittest failures. There's a CL that just landed to reduce the parallelism. Will be following to see if the situation improves.
          • That CL did not seem to resolve the issues.
        • Saw a few canaries yesterday (celes this morning) that had issues when uploading debug symbols. dgarrett@ is working on a fix.
        • security_StatefulPermissions is pretty flaky, veyron_minnie canary failing on it. wmatrix is all red: Investigating
        • There was a canary failure on lars-release which reported all the DUTs in the pool as dead, but they seem to be up now.
        • x86-zgb pool health is poor - most devices down. kevcheng@ taking a look.
        • Towards the end of the day, a larger number of canaries were failing at the paygen step. I think what may be happening is network flakiness, but I wonder why we don't just retry again?
      • CQ
        • panther_embedded-minimal-paladin has been down for quite some time now. Pinged the bug to see if there are any updates.
          • A restart of the master has been scheduled. Need to check back later today if that fixes things.
        • No elm devices in pool:cq making elm-paladin fail. kevcheng@ taking a look. No bug yet. 
      • Android PFQ 
        • The harmony_java_math CTS test is causing android-pfq failures with "cts test does not exist". Filed b/30413761. Ping ihf@ if it doesn't get better.
      • Chrome PFQ 
    • (Non-PST)
      • Canaries
        • platform_FilePerms issue was fixed by yusukes@.
        • Investigated the UnitTest failure a bit more. Have not yet reached the root cause.
      • CQ
        • Looks flaky: Sometimes failing ErrorCode=37 (OmahaErrorInHTTPResponse).
      • Chrome PFQ:
        • Looks flaky. Sometimes failing due to login error, but the failing boards vary.
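
Note on the paygen retry question above: if the paygen-step failures really are transient network flakiness, a bounded retry with exponential backoff is the usual mitigation. A minimal sketch, assuming the flaky step can be wrapped in a Python callable (the retry helper and its parameters are hypothetical and are not taken from the paygen or chromite code):

```python
import time


def retry(func, attempts=3, initial_delay=1.0, backoff=2.0,
          exceptions=(Exception,), sleep=time.sleep):
    """Call func() up to `attempts` times, sleeping between failed tries.

    The delay starts at `initial_delay` seconds and is multiplied by
    `backoff` after each failure. The last failure is re-raised so that
    a persistent problem still turns the build red.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: surface the real error
            sleep(delay)
            delay *= backoff
```

Passing a narrow `exceptions` tuple (for example only timeout errors) keeps real failures from being retried; the cap on attempts keeps a genuine outage from stretching the stage timeout.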

    • Canaries
      • Several of the canaries were failing in the platform_FilePerms HWTest.
        • This was seen on cyan, elm, lulu, oak, samus, and veyron_minnie.
        • Appears to be missing expectations for ARC containers.
        • Filed
      • The unittest stage seems to be timing out fairly often now.
      • nyan-big is failing on a vboot_firmware CL not building. Filed; fix is in CQ now.
    • CQ 
      • Generally okay today. There was one issue regarding a failure in VMTest, but that was caught.

    2016-07-18 thru 2016-07-24
    Sheriff: wuchengli

    • 628990: DebugSymbolsUploadException: Failed to upload all symbol
    • 593461: Chrome failed to reach login screen within 120 seconds
    • 628494: chromeos-bootimage build failures in canary builds
    • 609931: 'chromite.lib.parallel.ProcessSilentTimeout'>: No output from <_BackgroundTask(_BackgroundTask-5:6:7:3, started)> for 8610 seconds
    • 629094: cannot find source stateful.tgz

    2016-06-27 thru 2016-07-01
    Sheriff: mojahsu
    • 624744: Canary Build: Exception on build packages.
    • Try to fix error by rebooting chromeos4-row6-rack9-host14.cros.
    • 624328: Canary Build: cros_sdk:enter_chroot: Not mounting chrome source: could not find CHROME_ROOT/src dir.
    • 598779(lumpy), 623803(stumpy): NotEnoughDutsError: (DUTs are expected to be back online by noon Tuesday, 6/28)
    • Canary build parrot_ivb: NotEnoughDutsError: skip_duts_check option is on and total number DUTs in the pool is less than the required minimum avaialble DUTs.
    • 623873: Canary: ERROR: ** HWTest did not complete due to infrastructure issues (code 3) **
    • 623880: Canary: No output from <_BackgroundTask(_BackgroundTask-5:7:4, started)> for 8640 seconds
    • 623448: unknown target 'khronos_glcts_test' (daisy_skate, nyan, peach_pit, veyron_minnie, veyron_pinky, veyron_rialto, x86-alex)
    • 622789: StageControlFileFailure: Failed to stage
    • 623502: Unable to create project hosting api client: Cannot get apiclient library.
    2016-06-20 thru 2016-06-26
    Sheriff: zhihongyu, reinauer

    • 623116 OOM-induced kernel panic when running hardware_RamFi
    • FIXED 623115 chromeos-base/libchrome fails to emerge
    • 622789 StageControlFileFailure: Failed to stage tricky-chrome-pfq/R53-8490.0.0-rc2
    • 617666 CQ could not update Gerrit: many CLs are now in limbo?
    • 622365 Missing permission_status.mojom.h on Chrome OS (flakey?)
    • 621971 PFQ: HWTest: Failure summary not always reported in autofiled issues
    • 517995 ./build_packages fails for amd64-generic board target when kernel-3_18 is selected in board definition
    • 621396: gclient runhooks => TypeError: unhashable type: 'list'
    • 622293: sdk bot failing after perl upgrade
    • 621676 stop drones from using devserver for static content

    2016-06-13 thru 2016-06-17
    Sheriff: tbroch, puneetster

    • 620015: sync_chrome: Unhandled exception: OSError: [Errno 2] No such file or directory: '/b/cbuild/internal_master/.cache/distfiles/target/chrome-src-internal'
    • 609886: beaglebone-release: GCE / repo-cache issue? Failed to import ts_mon, monitoring is disabled: cannot import name gce
    • 619615: tricky-*: network_DefaultProfileCreation fails
    • 619980: veyron_gus-release: No such configuraton target: "veyron_gus-release"
    • 619754: buildPackages failing on internal builders : chromeos-chrome: png->io_ptr
    Sheriff: tfiga
    • 618916: panther_embedded-minimal-release builders failing with "No such configuraton target"
    • 618919: veyron_gus-release builds failing with "No such configuraton target"
    • 618923: Canaries fail AUtest/HWtest due to infrastructure issues (not only canaries actually)
    2016-06-07 thru 2016-06-08
    Sheriff: shawnn, vpalatin
    • 617979: samus failing signing stage
    • 618020: Provision failure on braswell devices
    • 618131: BuildPackages failure due to socket timeouts / FileNotFound
    • 618159: pre-cq-launcher failures
    • 618523: cryptohome unit test failures
    Sheriff: smbarber, abhishekbh
    • 617704 - EC change needed to be reverted since DUTs were no longer reporting AC power
    • 617979: samus failing signing stage
    Sheriff: smbarber, abhishekbh
    • Chrome PFQ had manual uprev, accidentally broke some canaries. Canaries manually restarted.
    Sheriff: moch, scollyer
    • 54007 - cyan-cheets-paladin failed (due to some necessary chumped changes). Tree closed but later throttled as Android container fix is being landed.
      Sheriff: moch, scollyer, djkurtz
      • 605181 - veyron_speedy-paladin/peppy-paladin: The HWTest [bvt-cq] stage failed: ** HWTest did not complete due to infrastructure issues (code 3) ** - Flaky provisioning issues
        Sheriff: djkurtz, ejcaruso, zachr
        • (ongoing) 615730: Rialto build break in libpayload: multiple definition of `video_console_init'
        • 615997 - 18:05 canary runs failing bvt-inline on many systems
        • 615993 - video_VideoSanity / video_ChromeHWDecodeUsed / video_ChromeRTCHWDecodeUsed on arm chrome PFQs: (daisy_skate, peach_pit, veyron_minnie-cheets)
        • 616015 - veyron_jerry-release - buildpackages fails - chromeos-base/chromeos-ec-0.0.1-r3046 - No room left in the flash
        • 616236 - CL with anonymous owner crashes buildbot
        • 616238 - Normal buildbot failures are sometimes reported to gerrit as timeouts

        Sheriff: wnhuang
        • [RESOLVED] veyron_speedy-paladin: The HWTest [bvt-cq] stage failed: ** HWTest did not complete due to infrastructure issues (code 3) **
          • Filesystem became read-only due to an error. Fixed by rebooting.
        • cyan-cheets-paladin: The HWTest [arc-bvt-cq] stage failed: ** Suite timed out before completion **
          • chromeos4-row6-rack9-host1 repair failed, scheduled another repair.

        Sheriff: jrbarnette, waihong, wnhuang
        • 615474: x86-alex-paladin HWTest timeout abort
        • 615151: guado_moblab: failing provision because moblab-scheduler-init isn't running

        Sheriff: djkurtz
        Gardeners: slavamn, puthik
          • 614579: [bvt-inline] security_ASLR Failure on daisy_skate-chrome-pfq/R53-8368.0.0-rc2
          • 614606: nyan-release consistently failing signing
          • 615029: minnie failing to sign
          Sheriff: littlecvr
          Gardeners: stevenjb, levarum
          • 613868: build141-m2 had been swapped, but a restart is needed. The restart has been scheduled at the EOD (PDT time).
          • 614261: build141-m2 had been replaced by build257-m2, but build257-m2 died again.

          Sheriff: littlecvr
          Gardeners: stevenjb, levarum
          • 613868: build141-m2 is offline and there is no backup.
          • 612688: KioskTests are flaky on ChromiumOS bots.
          • 611405 ASan builders failed when building update_engine
          • 614040: cyan-cheets continues to fail with PoolHealthBug

          Sheriff: martinroth, wfrichar
          Deputy: akeshet
          • p/53507 VMTests have been failing for several days in the canary builds due to crashing DisplayLinkManager.

          Sheriff: robotoboy, dtor
          • 611405 ASan builders failed when building update_engine - deymo@: A CL on AOSP landed to fix that last week, there's an uprev blocked on some CQ issues that I'll get to today.

          Sheriff: ravisadineni, zqiu
          Deputy: shuqianz
          • [ONGOING] mccloud-release, stumpy-release [Issue 609926]: FAIL: Powerwash count didn't increase after powerwash cycle
          • [FLAKE] paygen issue [Issue 605181, Issue 606071]: paygen_au_dev,autoupdate_EndToEndTest.paygen_au_dev_full,Failed to receive a download finished notification (download_finished) within 600 seconds. This could be a problem with the updater or a connectivity issue. For more details, check the update_engine log (in sysinfo or on the DUT, also included in the test log).
          • [RESOLVED] relm-release [Issue 611528]: doins failed.

          Sheriff: reveman, sonnyrao, tbroch
          Deputy: shuqianz
          • [RESOLVED] everything : manifestversionedsync: GoB quota issue (611084, b/28721585, b/28720367)
            • veyron_pinky-release
            • samus-paladin : Tried fetch locally and it worked.
              • RunCommandError: return code: 128; command: git fetch -f refs/changes/27/258727/1
              • fatal: remote error: Git repository not found
          • [FLAKE] zako-release : paygen : (605181)
            • paygen_au_dev,autoupdate_EndToEndTest.paygen_au_dev_full,Failed to receive a download finished notification (download_finished) within 600 seconds. This could be a problem with the updater or a connectivity issue. For more details, check the update_engine log (in sysinfo or on the DUT, also included in the test log).
          • [EXPECTED] gru-release : chromeos-initramfs emerge fails (605597)
          • [RESOLVED] master-paladin : daisy_skate-paladin: The HWTest [bvt-inline] stage failed: ** HWTest did not complete due to infrastructure issues (code 3) ** 
            • provision_AutoUpdate.double [ FAILED ] 
            • provision_AutoUpdate.double ABORT: None
              • [FLAKE] test running successfully but suite aborted at ~30min.  Says it should run for 90min however.

          Sheriff: reveman, sonnyrao, tbroch
          Deputy: shuqianz
          • [ONGOING] guado_moblab - [provision]: FAIL: Moblab has 0 Ready DUTs, completed successfully (610727, repair: b/28690294)
          • [RESOLVED] devserver issue: (b/28704856)
          • [INFO] buildbot slave shutdowns on 5/9 for emergency maintenance having some fallout (paladins)
          • [FLAKE] ninja-release - [bvt-inline]: FAIL 62794807-chromeos-test/chromeos4-row3-rack9-host6/provision_AutoUpdate provision
            • Unhandled AutoservSSHTimeout: ('ssh timed out', * Command:
            • flake?  Host is fine now.
          • [RESOLVED] veyron_speedy-paladin, daisy_skate-release - [bvt-cq]: Exception waiting for results, JSONRPCException: Error decoding JSON response (606071)
          • [RESOLVED] *-cheets-android-pfq [buildPackages]: autotest-cheets-* import error: No module named (b/28694363)
          • [RESOLVED] Lars builds down for hardware swap

          2016-05-06 to 09
          Sheriff: mruthven, rspangler, kcwu
          Gardener: jennyz

          More detailed notes on our shift are here.

          Stuff that broke and was fixed:
          • Lots of other release builders failing with "timed out", "didn't start", or on Sync-Chrome on Friday.  dgarrett@ said the release builders are being reorganized and will be highly unreliable.  Cleared up over the weekend.
          • CQ failed on multiple paladins HWTest with two types of failure, but both seem to have the same underlying cause in the logs.  Filed 610000 and throttled tree.
            • [bvt-inline] - logging_CrashSender: retry_count: 2, FAIL: Simple minidump send failed
            • [bvt-cq] - logging_UserCrash: FAIL: Did not find version 8288.0.0-rc2 in log output
            • Cause: CL 342574 (fixed)
          • [bvt-cq] - graphics_Gbm: FAIL: Gbm test failed().  Bad CL has been identified and fixed.
          Stuff that's still broken:
          • veyron_rialto-release fails: BuildPackages: Cannot find prebuilts for chromeos-base/chromeos-chrome.  (590784)
          • stout-paladin builder (build126-m2) is offline (609682)
          • daisy_skate-release - AUTest misconfigured (610088)
          • CQ failed with CommitQueueSync errors on multiple paladins (server hung up unexpectedly), but passed on the next run.  Seems to happen in the afternoon.

          Sheriff: groeck, furquan
          Gardener: jennyz
          • 609610: MobLab ToT not showing network bridge
          Sheriff: johnylin
          • 609054: M52: Failed to update the status for master-release
            • Error message: "fatal: could not read Username for '': No such device or address"
            • Many CQ/PFQ build failures are related to this as well
              • CQ: failed CommitQueueSync
              • PFQ: failed MasterSlaveLKGMSync
          • 608838: Some video/media tests are temporarily waived on veyron
            • Workaround needs to be reverted once this is fixed

          Sheriff: johnylin
          • Powerwash flakes on Canaries 605325:
            •  => almost never passed
            •  => almost never passed
          • Paygen flakes on Canaries 516795:
          • Build failures on lakitu-release:
          • HWTest flakes on Canaries:
          • Some autoupdate rollback failures in terra / wizpig / reks / celes / ultima. Lab network issue? 596262
          • Not enough disk space on veyron-b-release-group 605601
          • CQ:
            • veyron_rialto is failing with "ERROR: Cannot find prebuilts for chromeos-base/chromeos-chrome on veyron_rialto"
              • Has been failing for a long time. Tracked in 590784

          Sheriff: drinkcat
          • Lab issue? 605464
            • I think it got better
          • CQ:
            • One instance of a "nyan-full-compile-paladin did not start": seems like random flake
          • Canaries
            • Paygen on link almost never passes 605849
          • Chrome-PFQ:
            • BuildImage: ERROR: test_elf_deps: Failed dependency check (chromium-pfq on arm/x86 platforms) 605851
              • Chumped a revert, but the bug the original CL was fixing is also P0: 601854, please coordinate with ihf & gardener.
          • Android-PFQ:
            • chrome gs handler issue: some files do not have a md5 sum 605861

          Sheriff: drinkcat, denniskempin, dbasehore
          • Lab issue? 605464
            • wolf-paladin fail: wolf-tot-paladin/builds/6443 wolf-paladin/builds/10777
            • A number of HWTest timeout:
              • => no, different stuff
            • Paygen failure:
          • CQ
            • *-cheets-paladin fail in HWTest: 605309
          • Canaries
            • rambi-release BuildPackages timeout: 605402 . Likely a flake.
            • Minor guado_moblab-release BuildPackages issue: 605408
            • guado_moblab-release HWTest: 605409
            • x86-alex/x86-zgb/x86-alex_he: chromeos-kernel-3_8: undefined reference to `watchdog_dev_unregister': 605458
            • cros_make_image_bootable is failing 605587
            • veyron_rialto still failing due to lack of chrome prebuilt: 597966
            • More powerwash failures 605325
            • More autoupdate failures 605181
            • More /dev/loop0 issues 605176
            • security_test_image failing on amd64-generic-goofy-release 605595
            • Gru is failing to build chromeos-initramfs 605597
            • Not enough disk space on veyron-b-release-group 605601
            • Enguarde builds packages for >7 hours, gets killed. 605608
            • gale-release failing to build chromeos-bootimage 605638
          • PFQ
            • Chrome fails to build on all PFQs 605592

          Sheriff: jcliang, denniskempin, dbasehore
          • CQ
            • veyron_rialto has been failing for ages due to lack of chrome prebuilt: 597966
          • Canaries
            • stumpy pool health bug: 596647
            • Powerwash is still failing on multiple boards: 589030
            • Intermittent au test failures on multiple boards. Looks like infra flakes.
            • auron-release-group and ivybridge-release-group keep failing paygen 605159
            • auron-b-release-group fails to build image 605155
            • daisy-release-group failing hwtest 535795
            • Hosts not returning after powerwash 589089
            • rambi-e-release-group is having issues with /dev/loop0 605176
            • Tons of failures of autoupdate_EndToEndTest.paygen 605181
            • Veyron-b builders still out of space: bug
            • AutoservRunError on guado_moblab-paladin 605241
          • PFQ
            • nyan-chrome-pfq fails to build packages 605202
          Sheriff: jcliang, puneetster, charliemooney
          • Powerwash is still failing on multiple boards: 589030
          • panther pool health bug: 597744
          • Veyron-b builders running out of space: bug
          • It looks like the master builder crashed and took out several slaves, but then recovered gracefully.
          • Chrome PFQ did not update over the weekend.  Working with dimu@ and ketaki@ to figure out why
          • Lulu Cheets failing to sign bug
          • "Timeout deadline set by master" error in PayGen for Auron bug
          • Alex's missing in the pool bug

          Sheriff: puneetster, charliemooney
          • Powerwash continues to fail bug
          • Not enough builders in the pool, killing some canaries bug
          • Generic SSH (255) errors continue bug

          Sheriff: bleung, briannorris, cywang
          • CQ
            • Pool Health Bug (almost all boards are affected, peppy-paladin:601988, wolf-paladin:603450, veyron-speedy-paladin:603455, daisy-skate-paladin:603456, ...)
              • Machines in pool:{cq, bvt} are all marked as 'Repair Failed', so no bvt-cq or bvt-inline suites can be executed.
              • Clicked the 'Repair' button on a failed DUT, but in vain.
          • Canaries
          • PFQ
            • issue 603169 : extensions_to_load has been moved to browser options, a hiccup during transition on Chrome PFQ, fixed by achuith
          Sheriff: bleung, briannorris, cywang
          • CQ
          • Canaries
          • PFQ
          • Other
            • issue 603248 : gizmo-paladin and gizmo-release builders were removed yesterday, but they still appear on the waterfall, failing again and again. Waterfall may need to be restarted.

          Sheriff: cychiang, bleung, adlr
          Sheriff: cychiang

          Sheriff: shchen, briannorris

          Redirects to log files are now working again.  No more hand-modifying urls :).

          • CQ
            • There is an ongoing provisioning error (598517) that's hitting the CQ with the error: FAIL: Failed to install device image using payload at http:// on chromeos4-row7-rack13-host11. SSH reports a generic error (255) which is probably a lab network failure.  It is still under investigation.  I think that I saw it at least 5 times during my sheriffing shift.
            • veyron_speedy had failed three times, twice with a provisioning error: "Update failed. Returned update_engine error code: ERROR_CODE=49, ERROR_MESSAGE=ErrorCode::kNonCriticalUpdateInOOBE. Reported error: AutoservRunError". This is a known issue: 600737.
            • veyron_minnie-cheets failed with a timeout error.  I checked the individual tests in the suite and they seemed to all pass (nothing aborted).  I contacted the deputy and he added more DUTs to the pool for minnie to hopefully rectify this situation.
          • PFQ
            • HWTest and VMTest failures on daisy_skate and lumpy possibly caused by dev tools regression: 601533
            • veyron_minnie-cheets failed with the same timeout error described above.  Hopefully the additional DUTs will resolve this situation.
            • cyan-cheets failed 6/8 runs due to timeouts.  Many jobs were aborted, so it seems that there was a significant shortage of machines for this platform.  Informed the deputy and he increased the allocation of DUTs from 6 to 11.
          • Canaries
            • The canaries look sad.  About half are failing for various reasons below:
              • Timeouts: ivybridge (during paygen), rambi-c (during paygen), rambi (during buildpackages), strago-b (this is due to cyan-cheets, which just upped its allocation), veyron (during paygen)
              • Powerwash (host did not return from reboot): jecht, beltino-a
              • Powerwash (Powerwash count didn't increase after powerwash cycle): beltino-b
              • autoupdate_Rollback (host did not return from reboot): kunimitsu
              • build_image: pineview
              • tar: chromiumos_base_image.bin: file changed as we read it: rambi-b
              • slippy and strago failing with "TestLabException: Number of available DUTs for board falco_li pool bvt is 0, which is less than the minimum value 1."  Bugs 590398 and 590522 were automatically filed, but do not seem to have been triaged.  Pinged deputy.
          Sheriff: shchen, adlr

          Log files are still broken.  Workaround described in b/27653354.  Solution is to take and append the test name to it.
          • CQ
            • 601224:  buildpackages error on glados, strago, cyan-cheets.  Error in iwl7000 wireless driver.  The merge has been reverted.
              • So apparently merges do not show up in the change list on the builder pages.  I had an instance where a merge occurred (without me knowing) and I could not figure out what was causing the error from the waterfall pages.  It was in the kernel code, but there was only 1 kernel CL that was unrelated.  What I ended up having to do was find the hash used for the build.  It looks like:

                <project name="chromiumos/third_party/kernel" path="src/third_party/kernel/v3.18" revision="b850f41a01164fe1eb4cf76b5178194d53394130"/>

                and matching that up with a commit in 
            • whirlwind has failed three times in a row with jetstream test failures.  Deputy is trying to track down the error at 593404.  This problem seems to have been fixed.
            • cyan-cheets is failing due to a timeout: "ERROR: Timeout occurred- waited 13461 seconds, failing. Timeout reason: Slave reached the timeout deadline set by master."
              • Dug into log and seems like the provisioning stage never connected to the machine because it was down.  Checked on state of machine in the lab and it seems to be up and running again.  Will keep an eye on the test to make sure that it doesn't happen again today/tomorrow.
                • To find the error, go to the cyan-cheets builder and click on the last build, in this case #37.  Scroll down to the test (HWTest) and click "link to suite", which will take you to the Autotest results.  There, find the failed job and click on that test; from that page you can find the logs to search through.
                • Currently, the log redirect links are failing, so you need to get to them with the instructions in the Notes above.  Take the test name (found in parentheses next to the job name) and append it to the link above.  So, you'll end up going to:  You'll see a folder for the hostname of the machine.  The logs are in <hostname>/debug/.
                • To check status of host, click on the hostname (to the left of the currently broken log links).  This will take you to the host's page and you can check the status of it there.
          • PFQ
            • veyron_rialto is failing with "ERROR: Cannot find prebuilts for chromeos-base/chromeos-chrome on veyron_rialto"
              • This has been failing forever.  I checked back 200 builds (as far as I could) and they're all failing with the same error.
              • 597966 has already been filed for it, but it remains untriaged.  Pinged for update.
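
The manifest lookup described in the CQ notes above (finding the kernel hash used for a build) can be scripted. This is a minimal sketch, assuming the build's manifest is available as XML text containing `<project>` elements like the one quoted above. The `<manifest>` wrapper element and the function name `find_revision` are assumptions made for illustration, not part of any builder tooling.

```python
import xml.etree.ElementTree as ET

# The <project> line is the one quoted in the log entry above.
# The enclosing <manifest> element is assumed for illustration.
MANIFEST = """<manifest>
  <project name="chromiumos/third_party/kernel" path="src/third_party/kernel/v3.18" revision="b850f41a01164fe1eb4cf76b5178194d53394130"/>
</manifest>"""


def find_revision(manifest_xml, project_name, path=None):
    """Return the pinned revision for a project, or None if it is absent."""
    root = ET.fromstring(manifest_xml)
    for project in root.iter("project"):
        if project.get("name") != project_name:
            continue
        # Several kernel versions can share one project name, so allow
        # narrowing the match by checkout path.
        if path is not None and project.get("path") != path:
            continue
        return project.get("revision")
    return None


revision = find_revision(
    MANIFEST,
    "chromiumos/third_party/kernel",
    path="src/third_party/kernel/v3.18",
)
print(revision)
```

The printed hash can then be matched against the kernel repository's history to spot a merge that the builder's change list did not show.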

          Sheriff: aaboagye, abhishekbh

          To next sheriff(s):
          !! There's an issue where trying to get the logs from a test suite returns "Not Found". See b/27653354 for more details. !!
          I expect the lakitu incremental builders to both go green once CL:337302 lands.
          Canaries will probably fail due to timeouts (it's a known issue), but check the slave builder for any non-timeout related failure like Paygen or AUTest.
          PFQs should go green since the DUTs were repaired or replaced. Watch 598517 for updates regarding the generic SSH errors.
          amd64-generic-asan will continue to fail until this CL lands.
          • CQ
            • The link paladin failed during the provision step with an error that says "provision: FAIL: Failed to install device image using payload". It appears to be an update_engine error with an error code of "kNonCriticalUpdateInOOBE".
              • was filed to track it.
              • It doesn't seem to be related to any one board.
          • PFQ
            • Yesterday the PFQs were green, but today there seem to be some issues present.
            • There are at least two different issues here that both occur during the provision step:
              • "SSH reports a generic error (255) which is probably a network failure" ->
              • "Update failed. The priority of the inactive kernel partition is less than that of the active kernel partition." ->
          • Canaries
            • Previous runs of the canary builders were still timing out. There was also one run where they just failed to start, but the timeout issues are still prevalent.
            • Towards the end of the day,  powerwash issues surfaced for beltino-a, jecht, and rambi canaries. ->
          • Incremental
            • Since build #7794, the lakitu-incremental builder has been failing VMTest for the test logging_UserCrash.
              • Since it was an autotest failure, searching for the string "END FAIL" in the stdio leads to the error message pretty quickly.
              • Filed
              • This seemed to be caused by an inadvertent change due to rebasing and patch shuffling. Unfortunately, it wasn't caught by the CQ since the CQ doesn't run VMTests.
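
The "END FAIL" search mentioned above is easy to run over a saved copy of a builder's stdio. This is a minimal sketch: the helper name `first_failure` is hypothetical, and the sample text is an invented stand-in, not real builder output.

```python
import io

MARKER = "END FAIL"  # the string the log entry above says to search for


def first_failure(lines, marker=MARKER):
    """Return (line_number, text) of the first line containing marker, else None."""
    for number, line in enumerate(lines, start=1):
        if marker in line:
            return number, line.rstrip("\n")
    return None


# Invented stand-in for a saved stdio log; real output will look different.
sample_stdio = io.StringIO(
    "placeholder line 1\n"
    "placeholder line 2\n"
    "END FAIL placeholder details\n"
    "placeholder line 4\n"
)
print(first_failure(sample_stdio))
```

Because the function accepts any iterable of lines, an open file object can be passed directly, so a large log does not need to be read into memory first.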

          Sheriff: aaboagye, abhishekbh, vapier

          This morning there were a bunch of paladin failures, with the CQ master failing 26(!) consecutive times. Because of this I throttled the tree, since there's no use in trying new changes until the CQ is actually finishing correctly.
          • Guado moblab is one of the main offenders. To get to the debug logs, I clicked on one of the failed builds, scrolled to the HWTest section, clicked on "Link to suite".
            • Once there, I clicked on the failed test, provision. (Shows up in a purple box) Then on the new page that opened, clicked on "view all logs".
            • From there, navigated to the debug directory and took a look at "autoserv.DEBUG".
              • Searched until I found the string "Autotest caught exception when running test". Just above that line, it shows the command that was attempted. In this case it was "/tmp/stateful_update: Permission denied".
            • Filed
              • The infrastructure team notes that it's helpful to include the hostnames in the bug as well: the hostname of the DUT and of the buildslave.
          • daisy-full failed SimpleChromeWorkflow. The buildslave also appears to have been offline since last Friday.
            • To find the logs, I clicked on the failed build, and clicked the "stdio" link under SimpleChromeWorkflow. Scrolled down to find the traceback && STEP_FAILURE.
            • Looks like a couple different errors: a read-only filesystem, and an ImportError for no module named apport.fileutils.
            • Filed
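
The autoserv.DEBUG search described above (find the exception marker, then read the lines just above it) can be sketched as follows. The helper name `context_before` is hypothetical, and the sample lines are an invented stand-in for a real log. Only the marker string and the "Permission denied" message are taken from this log.

```python
from collections import deque

MARKER = "Autotest caught exception when running test"


def context_before(lines, marker=MARKER, before=5):
    """Return up to `before` lines preceding the first line containing marker."""
    recent = deque(maxlen=before)  # keeps only the most recent lines seen
    for line in lines:
        if marker in line:
            return list(recent)
        recent.append(line.rstrip("\n"))
    return []  # marker never appeared


# Invented stand-in for autoserv.DEBUG; real log lines carry more detail.
sample = [
    "placeholder: earlier output\n",
    "bash: /tmp/stateful_update: Permission denied\n",
    "Autotest caught exception when running test\n",
]
for line in context_before(sample, before=2):
    print(line)
```

The same helper works for the SimpleChromeWorkflow stdio by passing a different `marker`, such as `STEP_FAILURE`.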
          amd64-generic-asan has been failing for a _very_ long time. I found where some progress is being made. The primary CL is still pending review.

          For triaging canary failures, I first take a look at each release group. Most of the failures seem to be due to the suite timing out. However, there are a few other issues. You can check these by viewing the "stdio" link under the step that's yellow or red.
          In the afternoon, the internal waterfall seemed dead. Infra deputy filed to a trooper.

          Sheriff: moch, zachr
          • 599674: glados-release-group chell and cave fail buildpackages
          • 599982: daisy-paladin did not start

          Sheriff: kitching, moch, zachr
          • 597866: Cannot find prebuilts for chromeos-base/chromeos-chrome on veyron_rialto
            • Apparently nothing to worry about since important=False
          • Almost all CQ builders are timing out, but seems like current builds are succeeding
          • 596630: "Failed to install device image using payload" during provision errors (x86-zgb-paladin)
          • 579119: unittest timeout (peach-pit-paladin)

          Sheriff: kitching
          • 589885: failure in desktopui_ScreenLocker still showing up in amd64-generic
          • 598967: LeakSanitizer: detected memory leaks in update_engine-0.0.3-r1895 UnitTest (deymo@ investigating)
          • [from akeshet@, fixed] 598960: storm and whirlwind paladins failing consistently in vox unittest
          • 598980, 51703: rialto-services use of ReadFileToString needs updating
          • 598517: "SSH reports a generic error" failures during provision
          • strago board issues:
            • 583014: Strago boards don't have bvt results for the last week
            • 51482: Braswell systems are repeatedly failing to install in the autotest lab with eMMC failures
          Chromeos Gardener: jennyz
          Sheriff: bsimonnet, gwendal, wuchengli
          • amd64-generic
            • 589885:  failure in desktopui_ScreenLocker
            • 556785:  Reduce parallelism during unittests - unit test fail.
          • autoupdate failure:
            • 557106:  File system corruption on samus DUTs.
          • 598224:  Several CQ/paladin builders offline.
          • 596150:  pineview-release-group fails InitSDK
          • 593565:  Paygen failure (FAIL: Unhandled TypeError: expected string or buffer)
          Chromeos Gardener: jennyz
          Sheriff: dgreid, josephsih
          • 597213: [bvt-cq] platform_Perf Failure on tricky-chrome-pfq/R51-8100.0.0-rc3: Could not find build id for a DSO. 
          • 597183: provision_AutoUpdate.double_SERVER_JOB Failure on tricky-chrome-pfq/R51-8100.0.0-rc2: assigned to dfang.
          • 597111:  SitePerProcessBrowserTest.PagePopupMenuTest flaky on Linux_Chromeos_Test bot: kenrb fixed it today.
          Chromeos Gardener: jennyz
          Sheriff: tbroch, scollyer
            • 594336: network_DefaultProfileCreation Failure on tricky-tot-chrome-pfq-informational/R51-8053.0.0-b61: assigned to zqiu@.
            • 597111:  SitePerProcessBrowserTest.PagePopupMenuTest flaky on Linux_Chromeos_Test bot.
            • 536061 : (non-closer, builds fixed on retry) debugd:missing dependency. fixed by olofj
            Sheriff: tbroch, scollyer
            • 595274 : (tree-closer) webRTC HW Decode/Encode crashes tab
            • 595988 : (flaky/non-closer) network_DefaultProfileCreation Failure on tricky-tot-chrome-pfq-informational
            • 596150 : (non-closer) pineview-release-group fails InitSDK
            Sheriff: djkurtz, marcheu, shawnn
            • 51123: oak-release-group: elm: SignerTest fails: security_test_image failed == "CHROMEOS_RELEASE_BOARD: Value 'elm' was not recognized"
            • 594556: x86-generic-paladin: VMTest: desktopui_ScreenLocker fails => Screen is locked
            • 594565: mttools: BuildPackages fails on first attempt
            • 594571 veyron_rialto-paladin: BuildPackages fails: Cannot find prebuilts for chromeos-base/chromeos-chrome on veyron_rialto
            • 594622 veyron_minnie-cheets paladin consistently failing
            • 594592 lakitu-incremental builder failing gcetest
            • 594699 samus vmtest failures

            Deputy: shuqianz, Sheriff: marcheu, shawnn
            • 594176: daisy_skate-chrome-pfq provision failing
            • 593926: Lars devices in lab going down
            • 592766: chromeos-bootimage build failures
            • 594233: paladin builders offline

            2016-03-04 wiley, drinkcat (honorary), aaboagye
            • PreCQ
              • 592143: PreCQ: Failing InitSDK (Fixed due to chumping some python changes.)
            • PFQ failures
              • 591401: BuildImage step failing on PFQ "No space left on device" (Fixed with a revert.)
              • 554222: AutoservSSHTimeout PFQ failures
              • 582477: video_ChromeHWDecodeUsed is flaky on CQ
            • Canary
              • 591965: guado_moblab-paladin: HWTest fails "bash: /tmp/stateful_update: Permission denied"
                • A following run also failed, but apparently for a different reason.
              • 591957: smaug-paladin: BuildPackages failure "sys-fs/udev[gudev(-)] ("sys-fs/udev[gudev(-)]" is blocking dev-libs/libgudev-230)"
            • CQ
              • 592148: chromeos-test-testauthkeys-mobbase failed to build due to collisions.
              • 592182: guado_moblab moblab_RunSuite failure in CQ run.

            Sheriff: cywang, aaboagye, wiley
            • Chrome PFQ failures
              • 590762: Broken CrOS build of telemetry autotests - still happening
              • 591731: chromeos-chrome: build failure 'ppapi_example_video_decode': No such file
                • See 591782 and 59140 for the background for this bug.
                • Basically, trying to add earlier failures for file operations in the chromeos-chrome.ebuild.
                • The 1st change was submitted, but led to the ppapi_example_video_decode error. Change was then reverted.
                  • At a later time the cleanup will land.
                • This may cause the telemetry failures to pop up again.
            • CQ
              • 591639: graphics_GLBench(graphics_utils) failed in HWTest - fix submitted
              • 591837: prebuilts failing to upload on certain paladins. GS flake? (lakitu, guado)
            • Canary
              • 591656: security_AccountBaseline failed on lulu - fix submitted
              • 591658: security_StatefulPermissions Failure on lulu - fix submitted
              • 583014: strago release groups red since December 2015 (~2% pass rate)
            • Misc
              • 591853: public waterfall is missing the status boxes
            Sheriff: bfreed, charliemooney, cywang
            • PFQ failures
              • 591308: ChromeSDK failed in Chromium PFQ
              • 590762: Broken CrOS build of telemetry autotests - force another chromium PFQ build
              • 591401: Builders failing in BuildImage step because they run out of storage
              • 376372: about 8 canaries hit a HWTest "Suite timed out" error.
              • 590372: A few builders died trying to sync the source (error: Exited sync due to gc errors)
            Sheriff: bfreed, charliemooney
            • 591097: shill and dhcpd flake causing HWTest infrastructure failures and 10 straight CQMaster failures.
            • 591231: samus canary timeout in paygen stage while trying to copy a gsutil file.
            • 589135: rambi-c-group canary failed in Archive: "tar: chromiumos_test_image.bin: file changed as we read it"
            • 591256: peach group canary failed in Paygen with LockNotAcquired error
            • 583364: Veyron Paygen downloading failures
            Sheriff: drinkcat
            • 590113: x86-generic incremental VMTest security_ASLR fails (once in VMTest, a bit strange)
            • Closed the tree for 1 minute, false alarm: CQ-master page gave me the impression that the build failed because of rialto
            • PFQ failures
              • 590133: amd64-generic chromium PFQ: fatal error: ui/accessibility/ax_enums.h: No such file or directory
              • 590114: [bvt-cq] provision Failure on daisy_skate-chrome-pfq/R50-7966.0.0-rc2 (autofiled)
            • 584542: toybox build is flaky, but never caused an actual build failure. Local fix on gerrit, started upstream discussion about fix
            Sheriff: jrbarnette, quiche
            • 590065: toybox build is flaky
            • 589879: Build failures on "Lumpy (Chrome)" and "Alex (Chrome)"
            • 589905: Lumpy timing out in afdodatagenerate
            • 589885: desktopui_ScreenLocker failure on chromiumos.chromium
            • 589844: CQ failure due to HWTest failure on veyron_minnie-cheets-paladin
            Sheriff: drinkcat
            • 589690: CQ fails at CommitQueueSync, other builders in Sync (Cannot fetch chromiumos/third_party/arm-trusted-firmware)
              • Chumped manifest change to pin 
              • 589713: third_party/arm-trusted-firmware: Figure out which branch to track (Follow up on underlying issue)
            • p/50460: oak-full build failure
            • 589777: lakitu: security_AccountsBaselineLakitu Baseline mismatch
            • 2 Sync issues:
            Sheriff: jrbarnette, quiche
            • 588834: audio_CrasSanity fails: "CRAS stream count is not matching with number of streams"
              • This can cause failures in the CQ.  All boards seem to be affected.
              • Reverted three CLs; it's not yet known whether that will stop the problems.
            • 589641 graphics_Sanity failing on veyron boards 
              • This has caused some failures in the CQ.  So far, only veyron shows the problem.
            • 589623 Pre-CQ cannot uprev and rejects new CLs
              • A bad CL was chumped in without review.
              • Chumped in a fix to go with it.
            Sheriff: ejcaruso, waihong
            • 588739: Timed out going through login screen. Cryptohome not mounted.
            • 588834: audio_CrasSanity fails: "CRAS stream count is not matching with number of streams"
            • 588921: Some builders suffered a virtual drive failure.

            Sheriff: wnhuang
            • 587411: Multiple CQ build failures due to infrastructure issue
            Gardeners: jennyz
            Sheriff: wfrichar, davidriley, kcwu
            • 558983: daisy-skate PFQ occasionally failed for this issue. The pending CL to fix this has not landed yet. guidou@ is working on it.
            • 585973: daisy-skate PFQ occasionally failed for this issue.
            Gardeners: stevenjb
            Sheriff: dtor, avakulenko
            • 586180: Pre-CQ and CQ masters failed due to git outage during source sync
            • 586179: Canaries fail due to provision timeout (SuitePrep: ABORT due to timeout)
            Gardeners: stevenjb / afakhry
            Sheriff: scollyer, furquan

            Gardener: stevenjb
            • 584722 chromeos-chrome build failure: "No package 'gtk+-2.0' found" while running pkg-config with media.gyp

            Sheriff: dhendrix
            • 584542: sys-apps/toybox failing to compile on amd64-generic
            • 473899: paygen "Not all images for this build are uploaded", smaug has been seeing this for months.
            • 569358: pool: bvt, board: x86-mario in a critical state. (assigned now)
            • 584447: pool: bvt, board: veyron_mickey in a critical state. (assigned)
            • 571757: [sanity] provision Failure on expresso-release/R49-7760.0.0. Note: This manifested itself as a swarming failure when I updated the bug (#68).
            Sheriff: johnylin, grundler, dbasehore
            • 561036: FIXED: paygen timing out: dshi appears to have fixed this
            • 574915: VMTest failures in desktopui_ScreenLocker - jdufault investigating
            • 578771: GPT Header Issue
            • 579119: Unittest timeout
            • 581639: IGNORE: lakitu_mobbuild fails cloud_SpinyConfig: turning down this build (sosa)
            • 582144: FIXED: security_ASLR: reverting the change fixed the problem (
            • 582325: veyron-b: rialto-services emerge fail
            • 582521: FIXED? error in gsutil: samus canary builds succeeded on Feb 02 19:15. Also seen on daisy.
            • 583081: FIXED: autotest-chrome build failures (
            • 583535: FIXED: login_* test failures: reverted (achuith, dup:583382)
            • 583684: FIXED: CommitQueueSync repo sync: manifest referred to a tag instead of branch
            Sheriff: grundler, dbasehore
            • 561036: paygen timing out on release builders
            • 574915: VMTest failures in desktopui_ScreenLocker (later forked into three bugs)
            • 581639 - lakitu_mobbuild fails cloud_SpinyConfig (known issue)
            • 582521 - samus canary failed because of error in gsutil
            • 583375: provision thrashing causing canary/beta build timeouts (kevcheng)
            • 583382: login_* tests failing (may be dup of 574915 or others)

            Sheriff: bleung, puthik
            • 582531 - flaky HWTest for Pineview/ strago-b / sandybridge
            • 583375 - canary and beta builds can cause provision thrashing which can cause hwtests to time out

            Sheriff: bleung, puthik
            • 582521 - samus canary failed because of error in gsutil
            • 581639 - lakitu_mobbuild fails cloud_SpinyConfig
            • 576879 - pool: bvt, board : candy in a critical state.
            • 582325 - veyron-b: rialto-services emerge fail

            Sheriff: bhthompson, shchen, hychao
            • 582144: security_ASLR test failing on glados, strago, strago-b with Unhandled TypeError

            Sheriff: bhthompson, shchen, jchuang
            • 581598: archive stage failure at BuildAndArchiveFactoryImages 
            • 581624: gd-2.0.35 build failed on guado_moblab
            • 581630: docker build failed on lakitu_next
            • 543649: smaug paygen failing with "Not all images for this build are uploaded, don't process it yet" (does not cause canary failure, low priority)
            • 581631: cheets_SettingsBridge: Timed out waiting for condition: Android font size set to smallest
            • 581639: GCETest fail at 01-cloud_SpinyConfig on lakitu_mobbuild

            Sheriff: robotboy, semenzato, jchuang
            • 580184: PFQ failed to build related to chromeos/ime/input_methods.h missing
            • 561036: paygen timing out on release builders
            • 581382: perf_dashboard_shadow_config.json syntax error led to parse job failure (causing several timeout)

            Sheriff: littlecvr
            • 486098: Builder failure HWTest Code 3 - not enough detail to debug
            • 561036: paygen timing out on release builders
            • 547055: Jecht Group Failed Archive Step

            Sheriff: littlecvr
            • 547055: Jecht Group Failed Archive Step
            • 578771: Paygen error: GPT_ERROR_INVALID_HEADERS
            • 558266: [au] autoupdate_Rollback Failure on ultima-release/R49-7655.0.0
            • 580184: Master: PFQ failed to build related to chromeos/ime/input_methods.h missing
            • 580261: Update/provisioning timeouts during tests due to slow network
            • 579811: lakitu-release build continuously failed at GCETest

            Sheriff: deymo, zqiu, hungte
            Chromeos Gardener: jennyz
            • 580184: Master: PFQ failed to build, related to missing chromeos/ime/input_method.h

            Sheriff: stevefung, dlaurie, hungte
            Chromeos Gardener: jennyz
            • 579565: M49: PFQ Failing chromite unit testing on lumpy.

            Sheriff: stevefung, dlaurie
            • 322443: M49 PFQ failing unit tests
            Sheriff: vapier, zeuthen
            • 577549: lakitu_mobbuild_paladin fails at mariadb
            • 577542: build_packages fails at chromeos-mrc on strago canary and paladin build
            • 577836: lakitu_mobbuild_paladin fails at serf
            Sheriff: cychiang
            • 576905: pool: bvt, board: veyron_mighty in a critical state.
            • 576992: util-linux-2.25.1-r1 build failure on cyan canary build
            • 577025: TestFailure(paygen_au_dev,autoupdate_EndToEndTest.paygen_au_dev_full,Failed to perform stateful update on chromeos2-row2-rack10-host9)
            • 571747: TestFailure(sanity,provision,Failed to perform stateful update on chromeos4-row2-rack3-host1)
            • 505744: TestFailure(sanity,provision,Unhandled AutoservSSHTimeout: ('ssh timed out', * Command: )
            • 571884: [bvt-inline] security_ASLR Failure: No such file or directory: '/proc/32189 32187/maps'. (on PFQ)
            • 577549: lakitu_mobbuild_paladin fails at mariadb
            • 577542: build_packages fails at chromeos-mrc on strago canary and paladin build
            Sheriff: cychiang
            • 576525: chromeos-bootimage build failure on nyan_blaze: Unknown blob type 'boot' required in flash map
            • 576526: cheets_PerfBootServer failure at wait_for_adb_ready
            • 529612: lakitu_mobbuild: cloud_CloudInit fails in VMTest
            • 576549: lakitu_mobbuild canary build fails at GCE test because of quota exceeded
            • 576545: rambi-a-release group clapper build_packages fails at net-misc/strongswan
            • 571749: TestFailure(sanity,provision,Failed to perform stateful update on chromeos4-row5-rack8-host11)
            • 571747: TestFailure(sanity,provision,Failed to perform stateful update on chromeos4-row2-rack3-host1)
            • 505744: TestFailure(sanity,provision,Unhandled AutoservSSHTimeout: ('ssh timed out', * Command: )
            • 576608: security_AccountsBaselineLakitu fails with Baseline mismatch
            Sheriff: moch, zachr
            • 572745: [bvt-cq] graphics_GpuReset Failure on falco-chrome-pfq
            • 574870: [sanity] dummy_PassServer.sanity_SERVER_JOB Failure on veyron-b-group canary
            • 574915: VMTest failures in desktopui_ScreenLocker, security_ASLR, login_LoginSuccess
            • 574303: provision Failure on cyan-release

              Sheriff: moch, zachr
              • 574501: amd64-generic ASAN vmtests failing (desktopui_ScreenLocker, buffet_InvalidCredentials, buffet_IntermittentConnectivity)

              • 574197 Peach group Canary failing since 12/29
              Gardener: stevenjb@/jdufault@
              • 574104 : LKGM builder needs to be updated to git
              • 573961 : Peach pit failures
                • Forcing a rebuild, looks like it might be infra flake: 'Failed to install device image using payload at...'
              • 574198 : PFQ flake, security_SandboxStatus
              OLDER ENTRIES MOVED TO THE ARCHIVE so this page doesn't take forever to load.  See Sheriff Log: Chromium OS (ARCHIVE!)