Sheriff Log: Chromium OS (go/croslog)


9/26 - 10/2
Sheriffs: dbasehore, akahuang
Gardeners: jdufault

9/19 - 9/25
Sheriffs: apronin, charliemooney, vpalatin
Gardeners: stevenjb
  • chromiumos-sdk failed to build (missing efi.h) - fixed, build CL at fault CL to fix
  • Cyan has broken/flaky test performance in ToT, was causing CQ failures bug here
  • DataLinkManager crashing and breaking Canaries bug here (fixed: CL reverted)
  • Surfaceflinger crashing on oak bug here
  • Paladins fail to connect to MySQL instance bug here
  • Canaries were failing with "no attribute 'SignedJwtAssertionCredentials'" bug here (workaround CL submitted)
  • arc_mesa builds broken on auron, buddy, gandof, lulu, bug here, mostly fixed, buddy still fails as of buddy/428
  • crbug.com/649582: manifest generation fails w/binary data in commit messages (e.g. CL:387905)
  • crbug.com/649592: libmtp roll broke build packages due to autotools regen (fixed in CL:389031)
  • Root FS is over the limit for glimmer bug here
  • Reef builds were broken (unit tests failed to build), fixed here
  • Gru builds are broken (fail during uploading command stats) due to this CL, bug here, CL to fix
  • Some CLs are not marked as merged in Gerrit after a CQ run bug here
  • Tests that succeeded but left crashdumps frequently aborted on crashdump collection timeouts bug here, crashdump symbolication turned off if tests passed (here)
PFQ (gardening) issues:
  • Chrome4CROS Packages builder failing in compile - crbug.com/648308
  • login_Cryptohome fails nearly constantly on x86-generic-tot-asan-informational - crbug.com/648665
  • login_OwnershipNotRetaken fails regularly on PFQ. - crbug.com/618392
    • Ongoing investigation
  • Shutdown crash in ~ScreenDimmer > SupervisedUserURLFilter::RemoveObserver - crbug.com/648723
    • Fixed
  • Several PFQ failures due to timeouts - crbug.com/647303
    • Some timeouts are triaged, but some still need investigation

9/10 - 9/18
Sheriffs: cernekee, kkunduru, chinyue
Gardeners: afakhry



9/5 - 9/9
Sheriffs: jdiez, dhendrix, mcchou, josephsih
Gardeners: achuith
  • Mostly having issues that affect many builders.
  • Canaries failing due to "HWTest did not complete due to infrastructure issues (code 3)", suspect b/31011610. May file more bugs...
  • Several builders failing due to misconfigured cheets_CTS test: crbug.com/641208
  • Kevin failing badly: crbug.com/644908
  • master-paladin infra failures (build 12292): this CL broke several paladin builds. Told the CL owner not to mark ready before fixing problems.
  • master-paladin infra failures (build 12294): failed 4 consecutive times. 20 paladins did not start in CommitQueueCompletion. Similar to build 12281 yesterday but build 12283 passed later.
  • provision_AutoUpdate.double ABORT: Timed out, did not run.
    • master-paladin infra failures (builds 12301, 12302): failed in these 2 builds
    • Looked similar to crbug.com/593423: Need to watch this as more builders were broken due to the timeout issue.
    • Build 12303 passed. Flaky?
  • signers failing while signing android apks: crbug.com/645628

8/29 - 9/4
Sheriffs: kitching, bleung, yixiang@
Gardeners: michaelpg, afakhry
  • CQ paladin build #12207 failed due to whirlwind-paladin #5640 HWTest jetstream_ApiServerAttestation failing, but passes in #5641
  • CQ paladin build #12215 failed due to many repo sync errors (example: daisy_skate-paladin), looks like subsequent builds do not exhibit repo sync problems
  • CQ paladin build #12216 failed due to:
  • CQ paladin build #12218 failed due to "No room left in the flash". vpalatin knows about it and is looking for ways to make it fit.
  • crbug.com/642478 - Slave frozen, needed to be restarted.
  • crbug.com/642608 - Timeout on Paygen curl /list_suite_controls (auron-release)
  • crbug.com/642616 - Timeout on Paygen curl /stage (banon-release)
  • crbug.com/642611 - Paygen suite job timed out despite all PASSED
  • crbug.com/642617 - buddy-release: Paygen suite job timed out, all tests FAILED/ABORT
  • Top Issue on 8/31 - crbug.com/641290 - lab database problem
  • b/31011610 - ATL14 packet loss bringing down ChromeOS Commit Queue
  • crbug.com/643278 - guado_moblab broken due to testing outage
  • crbug.com/643300 -  nyan_freon-paladin timed out during p2p unittest
  • crosbug.com/p/56862 - gru-paladin attestation unittest failure. Possibly flaky test. apronin@ looking at fixing test. Also affects gale-paladin
  • crbug.com/643452 - All paladins failed during CommitQueueSync.  akeshet@ theory is that backlog of CLs (especially on kernel repo) overwhelmed GoB. akeshet@ put in a CL to temporarily limit CQ volume to 50 : https://chromium-review.googlesource.com/#/c/380457/ TODO: Revert this once the backlog is cleared. nxia@ also added this mitigation : https://chromium-review.googlesource.com/#/c/380343/2
8/22 - 8/28
Sheriffs: bhthompson, nya, walker
Gardeners: jennyz, lpique
    8/15 - 8/21
    Sheriffs: benzh, sureshraj, yoshiki
    Gardeners: jamescook, domlaskowski
    • crbug.com/637868 security_StatefulPermissions failures on canaries: 
    • crbug.com/593423 provision_AutoUpdate.double failures on chrome pfq informational: 
    • crbug.com/637962 SyncChrome failures due to "Repository does not yet have revision" on chrome informational pfq -> infra, ongoing flake
    • crbug.com/637960 Chrome telemetry failures due to missing system salt file -> reverted
    • crbug.com/637900 cyan chrome pfq informational builder cros-beefy191-c2 is out of disk space building chrome -> infra
    • crbug.com/637472 pool: bvt, board: falco in a critical state -> infra
    • crbug.com/637931 Chrome4CROS Packages builder failing in bot_update "fatal: reference is not a tree" -> infra
    • crbug.com/637938 VMTest failing on telemetry bots due to telemetry_UnitTests_perf -> bug in test script?, disabled
    • crbug.com/638348 cros amd64-generic Trusty builder failing to start goma in gclient runhooks step -> networking flake?
    • crbug.com/631640 login_CryptohomeIncognito -> flaky, but real failure
    • crbug.com/638656 cheets_NotificationTest failure on Cyan PFQ -> real failure in chrome (crash in shelf)
    • crbug.com/638980 falco-full-compile-paladin has failed to start with exception setup_properties
    • crbug.com/638968 x86-generic-tot-asan-informational failures in tpm_manager (odr-violation) and attestation (leaks) -> new target added to cros build that had failures, reverted
    • crbug.com/639102 Kernel panics on Cyan PFQ -> ???
    • crbug.com/639107 link-paladin BuildPackages failure with SSLError The read operation timed out
    • crbug.com/639314 AUTest failed on most canaries due to no test configurations
    8/8 - 8/14
    Sheriffs: davidriley, vprupis, takaoka, smbarber (Mon afternoon only)
    • Continued UnitTest failures on canaries and release branches: crbug.com/627881
    • lakitu failures: crbug.com/635562
    • edgar missing duts: crbug.com/596262
    • kevin firmware prebuilt: crbug.com/635598
    • x86_alex and veyron_rialto pool health: crbug.com/634471 and crbug.com/592002
    • Chumped change broke everything (eg pre-CQ, CQ, canaries) until revert was chumped in
    • infrastructure flake
      • celes-release/289, setzer-release/292 (build interrupted) -> crbug.com/602565
      • nyan-release/293, wolf-release/1294 (sudo access) -> crbug.com/616206
      • pre-cq (gerrit quota limits) -> crbug.com/624460
    • Friday: lab downtime affected builds for much of the day
    8/1 - 8/8
    Gardeners: stevenjb@, khmel@

    7/29 Notes for the next sheriffs from aaboagye, kirtika: 
    • Major issues we are seeing, format is <Impact: Issue: Links>:
      • Tree closure, fixed now: "No space left on device" for cheets builds: aaboagye@'s post-mortem here. crbug.com/630426
      • CQ failures: We've been seeing intermittent failures due to hitting git fetch limits with gerrit (commit queue sync step doesn't work). The current CQ run failed due to this, would not be surprised if the next one does too. crbug.com/632065.
      • Several canaries failing: Unit-test times out, possibly due to overloaded machines: crbug.com/627881
      • Android-PFQ failures: adb is not ready in 60 seconds: crbug.com/632891
    • Minor issues, work-in-progress
      • Android-PFQ: mmap_min_addr not right on samus/x86: crbug.com/632526.
      • Paygen/signing issues.
      • Autoupdate-rollback (likely network SSH issue): example crbug.com/596262



    2016-07-25 thru 2016-07-29
    Sheriff: aaboagye, kirtika, hidehiko (non-PST)

    7/29
    • PST
      • Canaries
        • kevin-release was broken, but a fix is on the way. (wfrichar@ knows)
      • CQ
    • Non-PST:

    7/28
    • PST
      • Canaries
        • Still seeing the error in the unittest phase. See crbug.com/627881
        • Paygen issue still affecting some canaries (x86_alex-he - crbug.com/629094).
        • Saw a failure with auron_yuna canary with an error parsing a JSON response. See crbug.com/632433.
        • samus failed with platform_OSLimits Found incorrect values: mmap_min_addr. Filed crbug.com/632526.
      • CQ
        • Closed the tree because the CQ would just reject people's changes because of the no-disk-space error. crbug.com/630426.
      • Chrome PFQ
        • Still seeing some failures in the login_CryptohomeIncognito test. See crbug.com/631640.
    • Non-PST
      • CQ:
        • RED.
        • samus-paladin is failing due to no-disk-space error. crbug.com/630426
        • cheets tests failed twice with a real error (https://chrome-internal-review.googlesource.com/#/c/270781/). Being fixed.
      • Chrome PFQ:
      • Android PFQ:

    7/27
    • PST
      • Canaries
        • Seems like nearly all the canaries failed during HWTest stage apparently due to Infra issues.
      • CQ
        • On one run, some of the paladins failed during the CommitQueueSync step due to git rate limiting.
      • Android PFQ
        • An overloaded devserver is causing provisioning to fail for cyan-cheets-android-pfq and veyron_minnie-android-pfq (wolf-tot-paladin too).
    • (Non-PST)
      • CQ:
        • Master paladin looks flaky for various reasons:
          • Hitting the CQ limit
          • HWTest timeout
          • kOmahaErrorInHTTPResponse: crbug.com/621148 looks like the tracking issue.
        • These are not always reproducible; some runs pass successfully.
      • Chrome PFQ:
        • Finally passed at #3175.
      • Android PFQ:
        • Failing in the last several runs, though for a variety of reasons. Looks just too flaky.

    7/26 (18:20 PST)
    • Canary Failure Classification: Lots of canary failures (~50%) this afternoon, so listing unique causes here to track down tomorrow: 
      • x86-zgb: Pool-health issue, infra (kevcheng@) looking into it, may be back up next canary run? 
      • x86-mario: Not sure if the manifestversionedsync is a real issue or not, filed crbug.com/631867 anyway. 
      • Paygen failures: falco, falco_li, gru, jecht, kip, lumpy, ninja, parrot, peppy, samus, smaug, x86_alex-he, stumpy. TBD: Update more details here. 

    7/26
    • (PST)
      • Canaries
        • Still some errors on nyan_blaze and nyan_kitty caused by the vboot_firmware CL. crbug.com/631192
          • Fixes posted to gerrit and making their way through the CQ.
        • Still some unittest failures. There's a CL that just landed to reduce the parallelism. Will be following to see if the situation improves. crbug.com/627881.
          • That CL did not seem to resolve the issues.
        • Saw a few canaries yesterday (celes this morning) that had issues when uploading debug symbols. dgarrett@ is working on a fix. crbug.com/212437.
        • security_StatefulPermissions is pretty flaky, veyron_minnie canary failing on it. wmatrix is all red: https://wmatrix.googleplex.com/retry_teststats/?days_back=30&tests=security_StatefulPermissions. Investigating crbug.com/604606
        • There was a canary failure on lars-release which reported all the DUTs in the pool as dead, but they seem to be up now. crbug.com/631530.
        • x86-zgb pool health is poor - most devices down. kevcheng@ taking a look. crbug.com/590653.
        • Towards the end of the day, a larger number of canaries were failing at the paygen step. I think what may be happening is network flakiness, but I wonder why we don't just retry again?
      • CQ
        • panther_embedded-minimal-paladin has been down for quite some time now. Pinged the bug to see if there are any updates. crbug.com/630494.
          • A restart of the master has been scheduled. Need to check back later today if that fixes things.
        • No elm devices in pool:cq making elm-paladin fail. kevcheng@ taking a look. No bug yet. 
      • Android PFQ 
        • harmony_java_math CTS test is causing android-pfq failures with "cts test does not exist". Filed b/30413761. Ping ihf@ if it doesn't get better.
      • Chrome PFQ 
    • (Non-PST)
      • Canaries
        • platform_FilePerms issue was fixed by yusukes@. crbug.com/631080
        • Investigated the UnitTest failure a bit more. Have not yet found the root cause. crbug.com/627881.
      • CQ
        • Looks flaky: sometimes fails with ErrorCode=37 (OmahaErrorInHTTPResponse).
      • Chrome PFQ:
        • Looks flaky. Sometimes fails due to a login error, but the failing boards vary.

    7/25
    • Canaries
      • Several of the canaries were failing in the platform_FilePerms HWTest.
        • This was seen on cyan, elm, lulu, oak, samus, and veyron_minnie.
        • Appears to be missing expectations for ARC containers.
        • Filed crbug.com/631080.
      • The unittest stage seems to be timing out fairly often now.
      • nyan-big is failing on a vboot_firmware CL not building. Filed crbug.com/631192. Fix is in CQ now. 
    • CQ 
      • Generally okay today. There was one issue regarding a failure in VMTest, but that was caught.

    2016-07-18 thru 2016-07-24
    Sheriff: wuchengli
    7/19


    7/18
    • 628990: DebugSymbolsUploadException: Failed to upload all symbol
    • 593461: Chrome failed to reach login screen within 120 seconds
    • 628494: chromeos-bootimage build failures in canary builds
    • 609931: 'chromite.lib.parallel.ProcessSilentTimeout'>: No output from <_BackgroundTask(_BackgroundTask-5:6:7:3, started)> for 8610 seconds
    • 629094: cannot find source stateful.tgz

    2016-06-27 thru 2016-07-01
    Sheriff: mojahsu
    6/30
    • 624744: Canary Build: Exception on build packages.
    6/29
    • Try to fix error by rebooting chromeos4-row6-rack9-host14.cros.
    • 624328: Canary Build: cros_sdk:enter_chroot: Not mounting chrome source: could not find CHROME_ROOT/src dir.
    6/28
    • 598779(lumpy), 623803(stumpy): NotEnoughDutsError: (DUTs are expected to be back online by noon Tuesday, 6/28)
    • Canary build parrot_ivb: NotEnoughDutsError: skip_duts_check option is on and total number DUTs in the pool is less than the required minimum avaialble DUTs.
    • 623873: Canary: ERROR: ** HWTest did not complete due to infrastructure issues (code 3) **
    • 623880: Canary: No output from <_BackgroundTask(_BackgroundTask-5:7:4, started)> for 8640 seconds
    6/27
    • 623448: unknown target 'khronos_glcts_test' (daisy_skate, nyan, peach_pit, veyron_minnie, veyron_pinky, veyron_rialto, x86-alex)
    • 622789: StageControlFileFailure: Failed to stage
    • 623502: Unable to create project hosting api client: Cannot get apiclient library.
    2016-06-20 thru 2016-06-26
    Sheriff: zhihongyu, reinauer

    6/24
    • 623116 OOM-induced kernel panic when running hardware_RamFi
    • FIXED 623115 chromeos-base/libchrome fails to emerge
    6/23
    • 622789 StageControlFileFailure: Failed to stage tricky-chrome-pfq/R53-8490.0.0-rc2
    • 617666 CQ could not update Gerrit: many CLs are now in limbo?
    6/22
    • 622365 Missing permission_status.mojom.h on Chrome OS (flakey?)
    6/21
    • 621971 PFQ: HWTest: Failure summary not always reported in autofiled issues
    • 517995 ./build_packages fails for amd64-generic board target when kernel-3_18 is selected in board definition
    6/20
    • 621396: gclient runhooks => TypeError: unhashable type: 'list'
    • 622293: sdk bot failing after perl upgrade
    • 621676 stop drones from using devserver for static content

    2016-06-13 thru 2016-06-17
    Sheriff: tbroch, puneetster

    6/14
    • 620015: sync_chrome: Unhandled exception: OSError: [Errno 2] No such file or directory: '/b/cbuild/internal_master/.cache/distfiles/target/chrome-src-internal'
    • 609886: beaglebone-release: GCE / repo-cache issue? Failed to import ts_mon, monitoring is disabled: cannot import name gce
    • 619615: tricky-*: network_DefaultProfileCreation fails
    • 619980: veyron_gus-release: No such configuraton target: "veyron_gus-release"
    • 619754: buildPackages failing on internal builders : chromeos-chrome: png->io_ptr
    6/13
    2016-06-10
    Sheriff: tfiga
    • 618916: panther_embedded-minimal-release builders failing with "No such configuraton target"
    • 618919: veyron_gus-release builds failing with "No such configuraton target"
    • 618923: Canaries fail AUtest/HWtest due to infrastructure issues (not only canaries actually)
    2016-06-07 thru 2016-06-08
    Sheriff: shawnn, vpalatin
    • 617979: samus failing signing stage
    • 618020: Provision failure on braswell devices
    • 618131: BuildPackages failure due to socket timeouts / FileNotFound
    • 618159: pre-cq-launcher failures
    • 618523: cryptohome unit test failures
    2016-06-06
    Sheriff: smbarber, abhishekbh
    • 617704 - EC change needed to be reverted since DUTs were no longer reporting AC power
    • 617979: samus failing signing stage
    2016-06-03
    Sheriff: smbarber, abhishekbh
    • Chrome PFQ had manual uprev, accidentally broke some canaries. Canaries manually restarted.
    2016-06-02
    Sheriff: moch, scollyer
    • 54007 - cyan-cheets-paladin failed (due to some necessary chumped changes). Tree closed but later throttled as Android container fix is being landed.
      2016-06-01
      Sheriff: moch, scollyer, djkurtz
      • 605181 - veyron_speedy-paladin/peppy-paladin: The HWTest [bvt-cq] stage failed: ** HWTest did not complete due to infrastructure issues (code 3) ** - Flaky provisioning issues
        2016-5-31
        Sheriff: djkurtz, ejcaruso, zachr
        • (ongoing) 615730: Rialto build break in libpayload: multiple definition of `video_console_init'
        • 615997 - 18:05 canary runs failing bvt-inline on many systems
        • 615993 - video_VideoSanity / video_ChromeHWDecodeUsed / video_ChromeRTCHWDecodeUsed on arm chrome PFQs: (daisy_skate, peach_pit, veyron_minnie-cheets)
        • 616015 - veyron_jerry-release - buildpackages fails - chromeos-base/chromeos-ec-0.0.1-r3046 - No room left in the flash
        • 616236 - CL with anonymous owner crashes buildbot
        • 616238 - Normal buildbot failures are sometimes reported to gerrit as timeouts

        2016-5-30
        Sheriff: wnhuang
        • [RESOLVED] veyron_speedy-paladin: The HWTest [bvt-cq] stage failed: ** HWTest did not complete due to infrastructure issues (code 3) **
          • Filesystem became read-only due to an error. Fixed by rebooting.
        • cyan-cheets-paladin: The HWTest [arc-bvt-cq] stage failed: ** Suite timed out before completion **
          • chromeos4-row6-rack9-host1 repair failed, scheduled another repair.

        2016-05-26-27
        Sheriff: jrbarnette, waihong, wnhuang
        • 615474: x86-alex-paladin HWTest timeout abort
        • 615151: guado_moblab: failing provision because moblab-scheduler-init isn't running

        2016-05-25
        Sheriff: djkurtz
        Gardeners: slavamn, puthik
          • 614579: [bvt-inline] security_ASLR Failure on daisy_skate-chrome-pfq/R53-8368.0.0-rc2
          • 614606: nyan-release consistently failing signing
          • 615029: minnie failing to sign
          2016-05-24
          Sheriff: littlecvr
          Gardeners: stevenjb, levarum
          • 613868: build141-m2 had been swapped, but a restart is needed. The restart has been scheduled for EOD (PDT).
          • 614261: build141-m2 had been replaced by build257-m2, but build257-m2 died again.

          2016-05-23
          Sheriff: littlecvr
          Gardeners: stevenjb, levarum
          • 613868: build141-m2 is offline and there is no backup.
          • 612688: KioskTests are flaky on ChromiumOS bots.
          • 611405 ASan builders failed when building update_engine
          • 614040: cyan-cheets continues to fail with PoolHealthBug

          2016-05-18
          Sheriff: martinroth, wfrichar
          Deputy: akeshet
          • p/53507 VMTests have been failing for several days in the canary builds due to crashing DisplayLinkManager.

          2016-05-16
          Sheriff: robotoboy, dtor
          Deputy:
          • 611405 ASan builders failed when building update_engine - deymo@: A CL on AOSP landed to fix that last week, there's an uprev blocked on some CQ issues that I'll get to today.

          2016-05-12
          Sheriff: ravisadineni, zqiu
          Deputy: shuqianz
          • [ONGOING] mccloud-release, stumpy-release (Issue 609926): FAIL: Powerwash count didn't increase after powerwash cycle
          • [FLAKE] paygen issue (Issue 605181, Issue 606071): paygen_au_dev,autoupdate_EndToEndTest.paygen_au_dev_full,Failed to receive a download finished notification (download_finished) within 600 seconds. This could be a problem with the updater or a connectivity issue. For more details, check the update_engine log (in sysinfo or on the DUT, also included in the test log).
          • [RESOLVED] relm-release (Issue 611528): doins failed.

          2016-05-11
          Sheriff: reveman, sonnyrao, tbroch
          Deputy: shuqianz
          • [RESOLVED] everything : manifestversionedsync: GoB quota issue (611084, b/28721585, b/28720367)
            • veyron_pinky-release
            • samus-paladin : Tried fetch locally and it worked.
              • RunCommandError: return code: 128; command: git fetch -f https://chrome-internal-review.googlesource.com/chromeos/ap-daemons refs/changes/27/258727/1
              • fatal: remote error: Git repository not found
          • [FLAKE] zako-release : paygen : (605181)
            • paygen_au_dev,autoupdate_EndToEndTest.paygen_au_dev_full,Failed to receive a download finished notification (download_finished) within 600 seconds. This could be a problem with the updater or a connectivity issue. For more details, check the update_engine log (in sysinfo or on the DUT, also included in the test log)
          • [EXPECTED] gru-release : chromeos-initramfs emerge fails (605597)
          • [RESOLVED] master-paladin : daisy_skate-paladin: The HWTest [bvt-inline] stage failed: ** HWTest did not complete due to infrastructure issues (code 3) ** 
            • provision_AutoUpdate.double [ FAILED ] 
            • provision_AutoUpdate.double ABORT: None
              • [FLAKE] test running successfully but suite aborted at ~30min.  Says it should run for 90min however.

          2016-05-10
          Sheriff: reveman, sonnyrao, tbroch
          Deputy: shuqianz
          • [ONGOING] guado_moblab - [provision]: FAIL: Moblab has 0 Ready DUTs, completed successfully (610727, repair: b/28690294)
          • [RESOLVED] devserver issue: (b/28704856)
          • [INFO] buildbot slave shutdowns on 5/9 for emergency maintenance having some fallout (paladins)
          • [FLAKE] ninja-release - [bvt-inline]: FAIL 62794807-chromeos-test/chromeos4-row3-rack9-host6/provision_AutoUpdate provision
            • Unhandled AutoservSSHTimeout: ('ssh timed out', * Command:
            • flake?  Host is fine now.
          • [RESOLVED] veyron_speedy-paladin, daisy_skate-release - [bvt-cq]: Exception waiting for results, JSONRPCException: Error decoding JSON response (606071)
          • [RESOLVED] *-cheets-android-pfq [buildPackages]: autotest-cheets-* import error: No module named cros.graphics.drm (b/28694363)
          • [RESOLVED] Lars builds down for hardware swap

          2016-05-06 to 09
          Sheriff: mruthven, rspangler, kcwu
          Gardener: jennyz

          More detailed notes on our shift are here.

          Stuff that broke and was fixed:
          • Lots of other release builders failing with "timed out", "didn't start", or on Sync-Chrome on Friday.  dgarrett@ said the release builders are being reorganized and will be highly unreliable.  Cleared up over the weekend.
          • CQ failed on multiple paladins HWTest with two types of failure, but both seem to have the same underlying cause in the logs.  Filed 610000 and throttled tree.
            • [bvt-inline] - logging_CrashSender: retry_count: 2, FAIL: Simple minidump send failed
            • [bvt-cq] - logging_UserCrash: FAIL: Did not find version 8288.0.0-rc2 in log output
            • Cause: CL 342574 (fixed)
          • [bvt-cq] - graphics_Gbm: FAIL: Gbm test failed().  Bad CL has been identified and fixed.
          Stuff that's still broken:
          • veyron_rialto-release fails: BuildPackages: Cannot find prebuilts for chromeos-base/chromeos-chrome.  (590784)
          • stout-paladin builder (build126-m2) is offline (609682)
          • daisy_skate-release - AUTest misconfigured (610088)
          • CQ failed with CommitQueueSync errors on multiple paladins (server hung up unexpectedly), but passed on the next run.  Seems to happen in the afternoon.

          2016-05-05
          Sheriff: groeck, furquan
          Gardener: jennyz
          • 609610: MobLab ToT not showing network bridge
          2016-05-03
          Sheriff: johnylin
          • 609054: M52: Failed to update the status for master-release
            • Error message: "fatal: could not read Username for 'https://chrome-internal.googlesource.com': No such device or address"
            • Many CQ/PFQ build failures are related to this as well
              • CQ: failed CommitQueueSync
              • PFQ: failed MasterSlaveLKGMSync
          • 608838: Some video/media tests are temporarily waived on veyron
            • Workaround needs to be reverted once this is fixed

          2016-05-03
          Sheriff: johnylin
          • Powerwash flakes on Canaries 605325:
            • https://uberchromegw.corp.google.com/i/chromeos/builders/beltino-b-release-group
            • https://uberchromegw.corp.google.com/i/chromeos/builders/jecht-release-group  => almost never passed
            • https://uberchromegw.corp.google.com/i/chromeos/builders/rambi-d-release-group/builds/1760
            • https://uberchromegw.corp.google.com/i/chromeos/builders/sandybridge-release-group  => almost never passed
          • Paygen flakes on Canaries 516795:
            • https://uberchromegw.corp.google.com/i/chromeos/builders/enguarde-release/builds/124
          • Build failures on lakitu-release:
          • HWTest flakes on Canaries:
            • https://uberchromegw.corp.google.com/i/chromeos/builders/rambi-c-release-group/builds/2225
            • https://uberchromegw.corp.google.com/i/chromeos/builders/rambi-d-release-group/builds/1760
            • https://uberchromegw.corp.google.com/i/chromeos/builders/rambi-e-release-group/builds/1063
            • https://uberchromegw.corp.google.com/i/chromeos/builders/slippy-release-group
          • Some autoupdate rollback failures in terra / wizpig / reks / celes / ultima. Lab network issue? 596262
            • https://uberchromegw.corp.google.com/i/chromeos/builders/strago-b-release-group
            • https://uberchromegw.corp.google.com/i/chromeos/builders/strago-release-group/builds/1135
          • Not enough disk space on veyron-b-release-group 605601
          • CQ:
            • veyron_rialto is failing with "ERROR: Cannot find prebuilts for chromeos-base/chromeos-chrome on veyron_rialto"
              • Has been failing for a long time. Tracked in 590784


          2016-04-22
          Sheriff: drinkcat
          • Lab issue? 605464
            • I think it got better
          • CQ:
            • One instance of a "nyan-full-compile-paladin did not start": seems like random flake
          • Canaries
            • Paygen on link almost never passes 605849
          • Chrome-PFQ:
            • BuildImage: ERROR: test_elf_deps: Failed dependency check (chromium-pfq on arm/x86 platforms) 605851
              • Chumped a revert, but the bug the original CL was fixing is also P0: 601854, please coordinate with ihf & gardener.
          • Android-PFQ:
            • chrome gs handler issue: some files do not have a md5 sum 605861

          2016-04-21
          Sheriff: drinkcat, denniskempin, dbasehore
          • Lab issue? 605464
            • wolf-paladin fail: wolf-tot-paladin/builds/6443 wolf-paladin/builds/10777
            • A number of HWTest timeout:
              • https://uberchromegw.corp.google.com/i/chromeos/builders/auron-b-release-group/builds/1473
              • https://uberchromegw.corp.google.com/i/chromeos/builders/beltino-a-release-group/builds/2087
              • https://uberchromegw.corp.google.com/i/chromeos/builders/beltino-b-release-group/builds/2100
              • https://uberchromegw.corp.google.com/i/chromeos/builders/enguarde-release/builds/89
              • https://uberchromegw.corp.google.com/i/chromeos/builders/rambi-d-release-group/builds/1725
              • https://uberchromegw.corp.google.com/i/chromeos/builders/slippy-release-group/builds/3631
              • https://uberchromegw.corp.google.com/i/chromeos/builders/strago-b-release-group/builds/549
              • https://uberchromegw.corp.google.com/i/chromeos/builders/veyron-b-release-group/builds/1473
              • https://uberchromegw.corp.google.com/i/chromeos/builders/kunimitsu-release-group/builds/796 => no, different stuff
              • https://uberchromegw.corp.google.com/i/chromeos/builders/slippy-release-group/builds/3632
            • Paygen failure:
              • https://uberchromegw.corp.google.com/i/chromeos/builders/beltino-a-release-group/builds/2087
              • https://uberchromegw.corp.google.com/i/chromeos/builders/jecht-release-group/builds/1404
              • https://uberchromegw.corp.google.com/i/chromeos/builders/jecht-release-group/builds/1405
              • https://uberchromegw.corp.google.com/i/chromeos/builders/nyan-release-group/builds/2750
              • https://uberchromegw.corp.google.com/i/chromeos/builders/rambi-b-release-group/builds/2297
              • https://uberchromegw.corp.google.com/i/chromeos/builders/rambi-c-release-group/builds/2190
              • https://uberchromegw.corp.google.com/i/chromeos/builders/rambi-d-release-group/builds/1725
              • https://uberchromegw.corp.google.com/i/chromeos/builders/rambi-e-release-group/builds/1029
              • https://uberchromegw.corp.google.com/i/chromeos/builders/strago-c-release-group/builds/415
              • https://uberchromegw.corp.google.com/i/chromeos/builders/smaug-release/builds/1037
              • https://uberchromegw.corp.google.com/i/chromeos/builders/glados-release-group/builds/910
              • https://uberchromegw.corp.google.com/i/chromeos/builders/auron-release-group/builds/1822
              • https://uberchromegw.corp.google.com/i/chromeos/builders/ivybridge-release-group/builds/1917
          • CQ
            • *-cheets-paladin fail in HWTest: 605309
          • Canaries
            • rambi-release BuildPackages timeout: 605402 . Likely a flake.
            • Minor guado_moblab-release BuildPackages issue: 605408
            • guado_moblab-release HWTest: 605409
              • https://uberchromegw.corp.google.com/i/chromeos/builders/guado_moblab-release/builds/883
            • x86-alex/x86-zgb/x86-alex_he: chromeos-kernel-3_8: undefined reference to `watchdog_dev_unregister': 605458
            • cros_make_image_bootable is failing 605587
            • veyron_rialto still failing due to lack of chrome prebuilt: 597966
            • More powerwash failures 605325
              • https://uberchromegw.corp.google.com/i/chromeos/builders/rambi-d-release-group/builds/1726
              • https://uberchromegw.corp.google.com/i/chromeos/builders/sandybridge-release-group/builds/2363
              • https://uberchromegw.corp.google.com/i/chromeos/builders/beltino-b-release-group/builds/2101
            • More autoupdate failures 605181
              • https://uberchromegw.corp.google.com/i/chromeos/builders/daisy-release-group/builds/4887
              • https://uberchromegw.corp.google.com/i/chromeos/builders/beltino-a-release-group/builds/2088
            • More /dev/loop0 issues 605176
              • https://uberchromegw.corp.google.com/i/chromeos/builders/pineview-release-group/builds/2300
            • security_test_image failing on amd64-generic-goofy-release 605595
              • https://uberchromegw.corp.google.com/i/chromeos/builders/amd64-generic-goofy-release/builds/231
            • Gru is failing to build chromeos-initramfs 605597
              • https://uberchromegw.corp.google.com/i/chromeos/builders/gru-release-group/builds/60
            • Not enough disk space on veyron-b-release-group 605601
              • https://uberchromegw.corp.google.com/i/chromeos/builders/veyron-b-release-group/builds/1474
            • Enguarde builds packages for >7 hours, gets killed. 605608
              • https://uberchromegw.corp.google.com/i/chromeos/builders/enguarde-release/builds/90
            • gale-release failing to build chromeos-bootimage 605638
              • https://uberchromegw.corp.google.com/i/chromeos/builders/gale-release/builds/57
              • https://uberchromegw.corp.google.com/i/chromeos/builders/gale-release/builds/58
          • PFQ
            • Chrome fails to build on all PFQs 605592

          2016-04-20
          Sheriff: jcliang, denniskempin, dbasehore
          • CQ
            • veyron_rialto has been failing for ages due to lack of chrome prebuilt: 597966
          • Canaries
            • stumpy pool health bug: 596647
            • Powerwash is still failing on multiple boards: 589030
            • Intermittent au test failures on multiple boards. Looks like infra flakes.
            • auron-release-group and ivybridge-release-group keep failing paygen 605159
            • auron-b-release-group fails to build image 605155
            • daisy-release-group failing hwtest 535795
            • Hosts not returning after powerwash 589089
            • rambi-e-release-group is having issues with /dev/loop0 605176
            • Tons of failures of autoupdate_EndtoEndTest.paygen 605181
            • Veyron-b builders still out of space: bug
            • AutoservRunError on guado_moblab-paladin 605241
          • PFQ
            • nyan-chrome-pfq fails to build packages 605202
          2016-04-19
          Sheriff: jcliang, puneetster, charliemooney
          • Powerwash is still failing on multiple boards: 589030
          • panther pool health bug: 597744
          • Veyron-b builders running out of space: bug
          • It looks like the master builder crashed and took out several slaves, but then recovered gracefully.
          • Chrome PFQ did not update over the weekend.  Working with dimu@ and ketaki@ to figure out why
          • Lulu Cheets failing to sign bug
          • "Timeout deadline set by master" error in PayGen for Auron bug
          • Alex's missing in the pool bug

          2016-04-18
          Sheriff: puneetster, charliemooney
          • Powerwash continues to fail bug
          • Not enough builders in the pool, killing some canaries bug
          • Generic SSH (255) errors continue bug

          2016-04-14
          Sheriff: bleung, briannorris, cywang
          • CQ
            • Pool Health Bug (almost all boards are affected: peppy-paladin:601988, wolf-paladin:603450, veyron-speedy-paladin:603455, daisy-skate-paladin:603456, ...)
              • machines in pool:{cq, bvt} are all marked as 'Repair Failed', no bvt-cq bvt-inline suites can be executed.
              • clicked 'Repair button' on a failed DUT but in vain.
          • Canaries
          • PFQ
            • issue 603169 : extensions_to_load has been moved to browser options, a hiccup during transition on Chrome PFQ, fixed by achuith
          2016-04-13
          Sheriff: bleung, briannorris, cywang
          • CQ
          • Canaries
          • PFQ
          • Other
            • issue 603248 : gizmo-paladin and gizmo-release builders were removed yesterday, but they still appear on the waterfall, failing again and again. Waterfall may need to be restarted.

          2016-04-12
          Sheriff: cychiang, bleung, adlr
          2016-04-11
          Sheriff: cychiang


          2016-04-07
          Sheriff: shchen, briannorris

          Redirects to log files are now working again.  No more hand-modifying urls :).

          • CQ
            • There is an ongoing provisioning error (598517) that's hitting the CQ with the error: FAIL: Failed to install device image using payload at http://100.107.160.2:8082/update/peach_pit-paladin/R51-8162.0.0-rc2 on chromeos4-row7-rack13-host11. SSH reports a generic error (255) which is probably a lab network failure.  It is still under investigation.  I think that I saw it at least 5 times during my sheriffing shift.
            • veyron_speedy had failed three times, twice with a provisioning error: "Update failed. Returned update_engine error code: ERROR_CODE=49, ERROR_MESSAGE=ErrorCode::kNonCriticalUpdateInOOBE. Reported error: AutoservRunError". This is a known issue: 600737.
            • veyron_minnie-cheets failed with a timeout error.  I checked the individual tests in the suite and they seemed to all pass (nothing aborted).  I contacted the deputy and he added more DUTs to the pool for minnie to hopefully rectify this situation.
          • PFQ
            • HWTest and VMTest failures on daisy_skate and lumpy possibly caused by dev tools regression: 601533
            • veyron_minnie-cheets failed with the same timeout error described above.  Hopefully the additional DUTs will resolve this situation.
            • cyan-cheets failed 6/8 runs due to timeouts.  There were many jobs aborted so it seems that there was a significant shortage of machines for this platform.  Informed the deputy and he increased the allocation of DUTs from 6 to 11.
          • Canaries
            • The canaries look sad.  About half are failing for various reasons below:
              • Timeouts: ivybridge (during paygen), rambi-c (during paygen), rambi (during buildpackages), strago-b (this is due to cyan-cheets, which just upped its allocation), veyron (during paygen)
              • Powerwash (host did not return from reboot): jecht, beltino-a
              • Powerwash (Powerwash count didn't increase after powerwash cycle): beltino-b
              • autoupdate_Rollback (host did not return from reboot): kunimitsu
              • build_image: pineview
              • tar: chromiumos_base_image.bin: file changed as we read it: rambi-b
              • slippy and strago failing with "TestLabException: Number of available DUTs for board falco_li pool bvt is 0, which is less than the minimum value 1."  Bugs 590398 and 590522 were automatically filed, but do not seem to have been triaged.  Pinged deputy.
          2016-04-06
          Sheriff: shchen, adlr

          Notes:
          Log files are still broken.  Workaround described in b/27653354.  Solution is to take https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/ and append the test name to it.
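          As a minimal sketch of that workaround, the URL can be assembled in a few lines of Python. The base URL is the one given in the note; the function name results_url and the host name "example-host" are made up for illustration, and the <hostname>/debug/ layout is taken from the cyan-cheets notes further down in this entry.

```python
# Base URL from the note above; everything else here is an illustrative helper.
RESULTS_BASE = "https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/"


def results_url(test_name, hostname=None):
    """Build the results-browser URL for a test name such as '59112285-chromeos-test'.

    If hostname is given, point at that host's debug/ folder, where the logs live.
    """
    url = RESULTS_BASE + test_name.strip("/")
    if hostname:
        url += "/" + hostname.strip("/") + "/debug/"
    return url


if __name__ == "__main__":
    print(results_url("59112285-chromeos-test"))
    # "example-host" is a placeholder, not a real DUT name.
    print(results_url("59112285-chromeos-test", "example-host"))
```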
          • CQ
            • 601224:  buildpackages error on glados, strago, cyan-cheets.  Error in iwl7000 wireless driver.  The merge has been reverted.
              • So apparently merges do not show up in the change list on the builder pages.  I had an instance where a merge occurred (without me knowing) and I could not figure out what was causing the error from the waterfall pages.  It was in the kernel code, but there was only 1 kernel CL that was unrelated.  What I ended up having to do was find the hash used for the build.  It looks like:

                <project name="chromiumos/third_party/kernel" path="src/third_party/kernel/v3.18" revision="b850f41a01164fe1eb4cf76b5178194d53394130"/>

                and matching that up with a commit in 
                https://chromium.googlesource.com/chromiumos/third_party/kernel/+/chromeos-3.18.
            • whirlwind has failed three times in a row with jetstream test failures.  Deputy is trying to track down the error at 593404.  This problem seems to have been fixed.
            • cyan-cheets is failing due to a timeout: "ERROR: Timeout occurred- waited 13461 seconds, failing. Timeout reason: Slave reached the timeout deadline set by master."
              • Dug into log and seems like the provisioning stage never connected to the machine because it was down.  Checked on state of machine in the lab and it seems to be up and running again.  Will keep an eye on the test to make sure that it doesn't happen again today/tomorrow.
                • To find the error, go to the cyan-cheets builder and click on the last build, in this case #37.  Scroll down to the test (HWTest) and click "link to suite", which will take you to the Autotest results.  There, find the failed job and click on that test; from its page you can reach the logs to search through.
                • Currently, the log redirect links are failing, so you need to get to them with the instructions in the Notes above.  Take the test name (found in parentheses next to the job name) and append it to the link above.  So, you'll end up going to: https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/59112285-chromeos-test.  You'll see a folder for the hostname of the machine.  The logs are in <hostname>/debug/.
                • To check status of host, click on the hostname (to the left of the currently broken log links).  This will take you to the host's page and you can check the status of it there.
          • PFQ
            • veyron_rialto is failing with "ERROR: Cannot find prebuilts for chromeos-base/chromeos-chrome on veyron_rialto"
              • This has been failing forever.  I checked back 200 builds (as far as I could) and they're all failing with the same error.
              • 597966 has already been filed for it, but it remains untriaged.  Pinged for update.
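          The manifest lookup described under 601224 above (finding the hash the build used for the kernel project) can be sketched in Python. This assumes you have saved the build's manifest as XML text; the helper name project_revision is made up, and the sample wraps the single <project> line quoted above in a <manifest> root element purely for illustration.

```python
import xml.etree.ElementTree as ET


def project_revision(manifest_xml, name, path=None):
    """Return the pinned revision for a project in a repo manifest, or None.

    path is optional and helps when several checkouts share one project name.
    """
    root = ET.fromstring(manifest_xml)
    for project in root.iter("project"):
        if project.get("name") != name:
            continue
        if path is not None and project.get("path") != path:
            continue
        return project.get("revision")
    return None


# Sample built from the <project> line quoted in the 601224 note.
SAMPLE_MANIFEST = """<manifest>
  <project name="chromiumos/third_party/kernel" path="src/third_party/kernel/v3.18" revision="b850f41a01164fe1eb4cf76b5178194d53394130"/>
</manifest>"""

if __name__ == "__main__":
    print(project_revision(SAMPLE_MANIFEST,
                           "chromiumos/third_party/kernel",
                           path="src/third_party/kernel/v3.18"))
```

          The returned hash is what you then match against the commit list on the chromeos-3.18 branch page.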

          2016-04-05
          Sheriff: aaboagye, abhishekbh

          To next sheriff(s):
          !! There's an issue where trying to get the logs from a test suite returns "Not Found". See b/27653354 for more details. !!
          I expect the lakitu incremental builders to both go green once CL:337302 lands.
          Canaries will probably fail due to timeouts (it's a known issue), but check the slave builder for any non-timeout related failure like Paygen or AUTest.
          PFQs should go green since the DUTs were repaired or replaced. Watch 598517 for updates regarding the generic SSH errors.
          amd64-generic-asan will continue to fail until this CL lands.
          • CQ
            • The link paladin failed during the provision step with an error that says "provision: FAIL: Failed to install device image using payload". It appears to be an update_engine error with an error code of "kNonCriticalUpdateInOOBE".
              • crbug.com/600737 was filed to track it.
              • It doesn't seem to be related to any one board.
          • PFQ
            • Yesterday the PFQs were green, but today there seem to be some issues present.
            • There are at least two different issues here that both occur during the provision step:
              • "SSH reports a generic error (255) which is probably a network failure" -> crbug.com/598517
              • "Update failed. The priority of the inactive kernel partition is less than that of the active kernel partition." -> crbug.com/599893
          • Canaries
            • Previous runs of the canary builders were still timing out. There was also one run where they just failed to start, but the timeout issues are still prevalent.
            • Towards the end of the day,  powerwash issues surfaced for beltino-a, jecht, and rambi canaries. -> crbug.com/600892
          • Incremental
            • Since build #7794, the lakitu-incremental builder has been failing VMTest for the test logging_UserCrash.
              • Since it was an autotest failure, searching for the string "END FAIL" in the stdio, leads to the error message pretty quickly.
              • Filed crbug.com/600774.
              • This seemed to be caused by an inadvertent change due to rebasing and patch shuffling. Unfortunately wasn't caught by the CQ since the CQ doesn't run VMTests.
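          The "END FAIL" search mentioned for the lakitu-incremental failure can also be done on a saved copy of the stdio. The sketch below is only an illustration: the helper name find_marker_lines is made up, and the sample text is invented and is not real builder output.

```python
def find_marker_lines(log_text, marker="END FAIL"):
    """Return (line_number, line) pairs for every line containing marker."""
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if marker in line
    ]


# Invented sample lines, for illustration only.
SAMPLE_STDIO = "\n".join([
    "START logging_UserCrash",
    "some test output",
    "END FAIL logging_UserCrash",
    "END GOOD another_Test",
])

if __name__ == "__main__":
    for number, line in find_marker_lines(SAMPLE_STDIO):
        print(number, line)
```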

          2016-04-04
          Sheriff: aaboagye, abhishekbh, vapier

          This morning there were a bunch of paladin failures, with the CQ master failing 26(!) consecutive times. Because of this I throttled the tree because there's no use in trying new changes until we get the CQ actually finishing correctly.
          • Guado moblab is one of the main offenders. To get to the debug logs, I clicked on one of the failed builds, scrolled to the HWTest section, clicked on "Link to suite".
            • Once there, I clicked on the failed test, provision. (Shows up in a purple box) Then on the new page that opened, clicked on "view all logs".
            • From there, navigated to the debug directory and took a look at "autoserv.DEBUG".
              • Searched until I found the string "Autotest caught exception when running test". Just above that line, it shows the command that was attempted. In this case it was "/tmp/stateful_update: Permission denied".
            • Filed crbug.com/600403.
              • The infrastructure team notes that it's helpful to include the hostnames of both the DUT and the buildslave in the bug as well.
          • daisy-full failed SimpleChromeWorkflow. The buildslave also appears to have been offline since last Friday.
            • To find the logs, I clicked on the failed build, and clicked the "stdio" link under SimpleChromeWorkflow. Scrolled down to find the traceback && STEP_FAILURE.
            • Looks like a couple different errors: a read-only filesystem, and an ImportError for no module named apport.fileutils.
            • Filed crbug.com/600413.
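          For the guado moblab triage above, the useful part of autoserv.DEBUG is whatever sits just above the "Autotest caught exception when running test" line. A minimal sketch of pulling out that context from a saved copy of the log follows; the helper name lines_before is made up and the sample log is invented, reusing only the two strings quoted in this entry.

```python
def lines_before(log_text, needle, count=3):
    """Return up to count lines that precede the first line containing needle."""
    lines = log_text.splitlines()
    for index, line in enumerate(lines):
        if needle in line:
            return lines[max(0, index - count):index]
    return []


# Invented sample; only the two quoted strings come from the notes above.
SAMPLE_DEBUG_LOG = "\n".join([
    "running setup",
    "bash: /tmp/stateful_update: Permission denied",
    "Autotest caught exception when running test",
    "cleanup",
])

if __name__ == "__main__":
    needle = "Autotest caught exception when running test"
    for line in lines_before(SAMPLE_DEBUG_LOG, needle, count=1):
        print(line)
```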
          amd64-generic-asan has been failing for a _very_ long time. I found crbug.com/589885 where some progress is being made. The primary CL is still pending review.

          For triaging canary failures, I first take a look at each release group. Most of the failures seem to be due to the suite timing out. However, there are a few other issues. You can check these by viewing the "stdio" link under the step that's yellow or red.
          In the afternoon, the internal waterfall seemed dead. Infra deputy filed crbug.com/600526 to a trooper.


          2016-04-01
          Sheriff: moch, zachr
          • 599674: glados-release-group chell and cave fail buildpackages
          • 599982: daisy-paladin did not start

          2016-03-31
          Sheriff: kitching, moch, zachr
          • 597866: Cannot find prebuilts for chromeos-base/chromeos-chrome on veyron_rialto
            • Apparently nothing to worry about since important=False
          • Almost all CQ builders are timing out, but seems like current builds are succeeding
          • 596630: "Failed to install device image using payload" during provision errors (x86-zgb-paladin)
          • 579119: unittest timeout (peach-pit-paladin)

          2016-03-30
          Sheriff: kitching
          • 589885: failure in desktopui_ScreenLocker still showing up in amd64-generic
          • 598967: LeakSanitizer: detected memory leaks in update_engine-0.0.3-r1895 UnitTest (deymo@ investigating)
          • [from akeshet@, fixed] 598960: storm and whirlwind paladins failing consistently in vox unittest
          • 598980, 51703: rialto-services use of ReadFileToString needs updating
          • 598517: "SSH reports a generic error" failures during provision
          • strago board issues:
            • 583014: Strago boards don't have bvt results for the last week
            • 51482: Braswell systems are repeatedly failing to install in the autotest lab with eMMC failures
          2016-03-25
          Chromeos Gardener: jennyz
          Sheriff: bsimonnet, gwendal, wuchengli
          • amd64-generic
            • 589885:  failure in desktopui_ScreenLocker
            • 556785:  Reduce parallelism during unittests - unit test fail.
          • autoupdate failure:
            • 557106:  File system corruption on samus DUTs.
          • 598224:  Several CQ/paladin builders offline.
          • 596150:  pineview-release-group fails InitSDK
          • 593565:  Paygen failure (FAIL: Unhandled TypeError: expected string or buffer)
          2016-03-23
          Chromeos Gardener: jennyz
          Sheriff: dgreid, josephsih
          • 597213: [bvt-cq] platform_Perf Failure on tricky-chrome-pfq/R51-8100.0.0-rc3: Could not find build id for a DSO. 
          • 597183: provision_AutoUpdate.double_SERVER_JOB Failure on tricky-chrome-pfq/R51-8100.0.0-rc2: assigned to dfang.
          • 597111:  SitePerProcessBrowserTest.PagePopupMenuTest flaky on Linux_Chromeos_Test bot: kenrb fixed it today.
          2016-03-22
          Chromeos Gardener: jennyz
          Sheriff: tbroch, scollyer
            • 594336: network_DefaultProfileCreation Failure on tricky-tot-chrome-pfq-informational/R51-8053.0.0-b61: assigned to zqiu@.
            • 597111:  SitePerProcessBrowserTest.PagePopupMenuTest flaky on Linux_Chromeos_Test bot.
            • 536061 : (non-closer, builds fixed on retry) debugd:missing dependency. fixed by olofj
            2016-03-21
            Sheriff: tbroch, scollyer
            • 595274 : (tree-closer) webRTC HW Decode/Encode crashes tab
            • 595988 : (flaky/non-closer) network_DefaultProfileCreation Failure on tricky-tot-chrome-pfq-informational
            • 596150 : (non-closer) pineview-release-group fails InitSDK
            2016-03-14
            Sheriff: djkurtz, marcheu, shawnn
            • 51123: oak-release-group: elm: SignerTest fails: security_test_image failed == "CHROMEOS_RELEASE_BOARD: Value 'elm' was not recognized"
            • 594556: x86-generic-paladin: VMTest: desktopui_ScreenLocker fails => Screen is locked
            • 594565: mttools: BuildPackages fails on first attempt
            • 594571 veyron_rialto-paladin: BuildPackages fails: Cannot find prebuilts for chromeos-base/chromeos-chrome on veyron_rialto
            • 594622 veyron_minnie-cheets paladin consistently failing
            • 594592 lakitu-incremental builder failing gcetest
            • 594699 samus vmtest failures

            2016-03-11
            Deputy: shuqianz, Sheriff: marcheu, shawnn
            • 594176: daisy_skate-chrome-pfq provision failing
            • 593926: Lars devices in lab going down
            • 592766: chromeos-bootimage build failures
            • 594233: paladin builders offline

            2016-03-04 wiley, drinkcat (honorary), aaboagye
            • PreCQ
              • 592143: PreCQ: Failing InitSDK (Fixed due to chumping some python changes.)
            • PFQ failures
              • 591401: BuildImage step failing on PFQ "No space left on device" (Fixed with a revert.)
              • 554222: AutoservSSHTimeout PFQ failures
              • 582477: video_ChromeHWDecodeUsed is flaky on CQ
            • Canary
              • 591965: guado_moblab-paladin: HWTest fails "bash: /tmp/stateful_update: Permission denied"
                • A subsequent run also failed, but apparently for a different reason.
              • 591957: smaug-paladin: BuildPackages failure "sys-fs/udev[gudev(-)] ("sys-fs/udev[gudev(-)]" is blocking dev-libs/libgudev-230)"
            • CQ
              • 592148: chromeos-test-testauthkeys-mobbase failed to build due to collisions.
              • 592182: guado_moblab moblab_RunSuite failure in CQ run.

            2016-03-03
            Sheriff: cywang, aaboagye, wiley
            • Chrome PFQ failures
              • 590762: Broken CrOS build of telemetry autotests - still happening
              • 591731: chromeos-chrome: build failure 'ppapi_example_video_decode': No such file
                • See 591782 and 59140 for the background for this bug.
                • Basically, trying to add earlier failures for file operations in the chromeos-chrome.ebuild.
                • The 1st change was submitted, but led to the ppapi_example_video_decode error. Change was then reverted.
                  • At a later time the cleanup will land.
                • This may cause the telemetry failures to pop up again.
            • CQ
              • 591639: graphics_GLBench(graphics_utils) failed in HWTest - fix submitted
              • 591837: prebuilts failing to upload on certain paladins. GS flake? (lakitu, guado)
            • Canary
              • 591656: security_AccountBaseline failed on lulu - fix submitted
              • 591658: security_StatefulPermissions Failure on lulu - fix submitted
              • 583014: strago release groups red since December 2015 (~2% pass rate)
            • Misc
              • 591853: public waterfall is missing the status boxes
            2016-03-02
            Sheriff: bfreed, charliemooney, cywang
            • PFQ failures
              • 591308: ChromeSDK failed in Chromium PFQ
              • 590762: Broken CrOS build of telemetry autotests - force another chromium PFQ build
              • 591401: Builders failing in BuildImage step because they run out of storage
              • 376372: about 8 canaries hit a HWTest "Suite timed out" error.
              • 590372: A few builders died trying to sync the source (error: Exited sync due to gc errors)
            2016-03-01
            Sheriff: bfreed, charliemooney
            • 591097: shill and dhcpd flake causing HWTest infrastructure failures and 10 straight CQMaster failures.
            • 591231: samus canary timeout in paygen stage while trying to copy a gsutil file.
            • 589135: rambi-c-group canary failed in Archive: "tar: chromiumos_test_image.bin: file changed as we read it"
            • 591256: peach group canary failed in Paygen with LockNotAcquired error
            • 583364: Veyron Paygen downloading failures
            2016-02-26
            Sheriff: drinkcat
            • 590113: x86-generic incremental VMTest security_ASLR fails (once in VMTest, a bit strange)
            • Closed the tree for 1 minute, false alarm: CQ-master page gave me the impression that the build failed because of rialto
            • PFQ failures
              • 590133: amd64-generic chromium PFQ: fatal error: ui/accessibility/ax_enums.h: No such file or directory
              • 590114: [bvt-cq] provision Failure on daisy_skate-chrome-pfq/R50-7966.0.0-rc2 (autofiled)
            • 584542: toybox build is flaky, but never caused an actual build failure. Local fix on gerrit, started upstream discussion about fix
            2016-02-25
            Sheriff: jrbarnette, quiche
            • 590065: toybox build is flaky
            • 589879: Build failures on "Lumpy (Chrome)" and "Alex (Chrome)"
            • 589905: Lumpy timing out in afdodatagenerate
            • 589885: desktopui_ScreenLocker failure on chromiumos.chromium
            • 589844: CQ failure due to HWTest failure on veyron_minnie-cheets-paladin
            2016-02-25
            Sheriff: drinkcat
            • 589690: CQ fails at CommitQueueSync, other builders in Sync (Cannot fetch chromiumos/third_party/arm-trusted-firmware)
              • Chumped manifest change to pin 
              • 589713: third_party/arm-trusted-firmware: Figure out which branch to track (Follow up on underlying issue)
            • p/50460: oak-full build failure
            • 589777: lakitu: security_AccountsBaselineLakitu Baseline mismatch
            • 2 Sync issues:
            2016-02-24
            Sheriff: jrbarnette, quiche
            • 588834: audio_CrasSanity fails: "CRAS stream count is not matching with number of streams"
              • This can cause failures in the CQ.  All boards seem to be affected.
              • Reverted three CLs; it's not yet known whether that will stop the problems.
            • 589641 graphics_Sanity failing on veyron boards 
              • This has caused some failures in the CQ.  So far, only veyron shows the problem.
            • 589623 Pre-CQ cannot uprev and rejects new CLs
              • A bad CL was chumped in without review.
              • Chumped in a fix to go with it.
            2016-02-22/23
            Sheriff: ejcaruso, waihong
            • 588739: Timed out going through login screen. Cryptohome not mounted.
            • 588834: audio_CrasSanity fails: "CRAS stream count is not matching with number of streams"
            • 588921: Some builders suffered a virtual drive failure.

            2016-02-17/18
            Sheriff: wnhuang
            • 587411: Multiple CQ build failures due to an infrastructure issue
            2016-02-16
            Gardeners: jennyz
            Sheriff: wfrichar, davidriley, kcwu
            • 558983: daisy-skate PFQ occasionally failed for this issue. The pending CL to fix this has not landed yet. guidou@ is working on it.
            • 585973: daisy-skate PFQ occasionally failed for this issue.
            2016-02-10/11
            Gardeners: stevenjb
            Sheriff: dtor, avakulenko
            • 586180: Pre-CQ and CQ masters failed due to git outage during source sync
            • 586179: Canaries fail due to provision timeout (SuitePrep: ABORT due to timeout)
            2016-02-09
            Gardeners: stevenjb / afakhry
            Sheriff: scollyer, furquan

            2016-02-05
            Gardener: stevenjb
            • 584722 chromeos-chrome build failure: "No package 'gtk+-2.0' found" while running pkg-config with media.gyp

            2016-02-04
            Sheriff: dhendrix
            • 584542: sys-apps/toybox failing to compile on amd64-generic
            • 473899: paygen "Not all images for this build are uploaded", smaug has been seeing this for months.
            • 569358: pool: bvt, board: x86-mario in a critical state. (assigned now)
            • 584447: pool: bvt, board: veyron_mickey in a critical state. (assigned)
            • 571757: [sanity] provision Failure on expresso-release/R49-7760.0.0. Note: This manifested itself as a swarming failure when I updated the bug (#68).
            2016-02-03
            Sheriff: johnylin, grundler, dbasehore
            • 561036: FIXED: paygen timing out: dshi appears to have fixed this
            • 574915: VMTest failures in desktopui_ScreenLocker - jdufault investigating
            • 578771: GPT Header Issue
            • 579119: Unittest timeout
            • 581639: IGNORE: lakitu_mobbuild fails cloud_SpinyConfig: turning down this build (sosa)
            • 582144: FIXED: security_ASLR: reverting the change fixed the problem (https://chromium-review.googlesource.com/324950)
            • 582325: veyron-b: rialto-services emerge fail
            • 582521: FIXED? error in gsutil: samus canary builds succeeded on Feb 02 19:15. Also seen on daisy.
            • 583081: FIXED: autotest-chrome build failures (https://chrome-internal-review.googlesource.com/#/c/247126/)
            • 583535: FIXED: login_* test failures: reverted https://codereview.chromium.org/1646223002 (alchuith, dup:583382)
            • 583684: FIXED: CommitQueueSync repo sync: manifest referred to a tag instead of branch
            2016-02-02
            Sheriff: grundler, dbasehore
            • 561036: paygen timing out on release builders
            • 574915: VMTest failures in desktopui_ScreenLocker (later forked into three bugs)
            • 581639 - lakitu_mobbuild fails cloud_SpinyConfig (known issue)
            • 582521 - samus canary failed because of error in gsutil
            • 583375: provision thrashing causing canary/beta build timeouts (kevcheng)
            • 583382: login_* tests failing (may be dup of 574915 or others)

            2016-02-01
            Sheriff: bleung, puthik
            • 582531 - flaky HWTest for Pineview/ strago-b / sandybridge
            • 583375 - canary and beta builds can cause provision thrashing which can cause hwtests to time out

            2016-01-29
            Sheriff: bleung, puthik
            • 582521 - samus canary failed because of error in gsutil
            • 581639 - lakitu_mobbuild fails cloud_SpinyConfig
            • 576879 - pool: bvt, board : candy in a critical state.
            • 582325 - veyron-b: rialto-services emerge fail

            2016-01-28
            Sheriff: bhthompson, shchen, hychao
            • 582144: security_ASLR test failing on glados, strago, strago-b with Unhandled TypeError

            2016-01-27
            Sheriff: bhthompson, shchen, jchuang
            • 581598: archive stage failure at BuildAndArchiveFactoryImages 
            • 581624: gd-2.0.35 build failed on guado_moblab
            • 581630: docker build failed on lakitu_next
            • 543649: smaug paygen failing with "Not all images for this build are uploaded, don't process it yet" (does not cause canary failure, low priority)
            • 581631: cheets_SettingsBridge: Timed out waiting for condition: Android font size set to smallest
            • 581639: GCETest fail at 01-cloud_SpinyConfig on lakitu_mobbuild

            2016-01-26
            Sheriff: robotboy, semenzato, jchuang
            • 580184: PFQ failed to build related to chromeos/ime/input_methods.h missing
            • 561036: paygen timing out on release builders
            • 581382: perf_dashboard_shadow_config.json syntax error led to parse job failure (causing several timeouts)

            2016-01-25
            Sheriff: littlecvr
            • 486098: Builder failure HWTest Code 3 - not enough detail to debug
            • 561036: paygen timing out on release builders
            • 547055: Jecht Group Failed Archive Step

            2016-01-22
            Sheriff: littlecvr
            • 547055: Jecht Group Failed Archive Step
            • 578771: Paygen error: GPT_ERROR_INVALID_HEADERS
            • 558266: [au] autoupdate_Rollback Failure on ultima-release/R49-7655.0.0
            • 580184: Master: PFQ failed to build related to chromeos/ime/input_methods.h missing
            • 580261: Update/provisioning timeouts during tests due to slow network
            • 579811: lakitu-release build continuously failed at GCETest

            2016-01-21
            Sheriff: deymo, zqiu, hungte
            Chromeos Gardener: jennyz
            • 580184: Master: PFQ failed to build, related to missing chromeos/ime/input_method.h

            2016-01-20
            Sheriff: stevefung, dlaurie, hungte
            Chromeos Gardener: jennyz
            • 579565: M49: PFQ Failing chromite unit testing on lumpy.

            2016-01-14
            Sheriff: stevefung, dlaurie
            • 322443: M49 PFQ failing unit tests
            2016-01-14
            Sheriff: vapier, zeuthen
            • 577549: lakitu_mobbuild_paladin fails at mariadb
            • 577542: build_packages fails at chromeos-mrc on strago canary and paladin build
            • 577836: lakitu_mobbuild_paladin fails at serf
            2016-01-13
            Sheriff: cychiang
            • 576905: pool: bvt, board: veyron_mighty in a critical state.
            • 576992: util-linux-2.25.1-r1 build failure on cyan canary build
            • 577025: TestFailure(paygen_au_dev,autoupdate_EndToEndTest.paygen_au_dev_full,Failed to perform stateful update on chromeos2-row2-rack10-host9)
            • 571747: TestFailure(sanity,provision,Failed to perform stateful update on chromeos4-row2-rack3-host1)
            • 505744: TestFailure(sanity,provision,Unhandled AutoservSSHTimeout: ('ssh timed out', * Command: )
            • 571884: [bvt-inline] security_ASLR Failure: No such file or directory: '/proc/32189 32187/maps'. (on PFQ)
            • 577549: lakitu_mobbuild_paladin fails at mariadb
            • 577542: build_packages fails at chromeos-mrc on strago canary and paladin build
            2016-01-12
            Sheriff: cychiang
            • 576525: chromeos-bootimage build failure on nyan_blaze: Unknown blob type 'boot' required in flash map
            • 576526: cheets_PerfBootServer failure at wait_for_adb_ready
            • 529612: lakitu_mobbuild: cloud_CloudInit fails in VMTest
            • 576549: lakitu_mobbuild canary build fails at GCE test because of quota exceeded
            • 576545: rambi-a-release group clapper build_packages fails at net-misc/strongswan
            • 571749: TestFailure(sanity,provision,Failed to perform stateful update on chromeos4-row5-rack8-host11)
            • 571747: TestFailure(sanity,provision,Failed to perform stateful update on chromeos4-row2-rack3-host1)
            • 505744: TestFailure(sanity,provision,Unhandled AutoservSSHTimeout: ('ssh timed out', * Command: )
            • 576608: security_AccountsBaselineLakitu fails with Baseline mismatch
            2016-01-06
            Sheriff: moch, zachr
            • 572745: [bvt-cq] graphics_GpuReset Failure on falco-chrome-pfq
            • 574870: [sanity] dummy_PassServer.sanity_SERVER_JOB Failure on veyron-b-group canary
            • 574915: VMTest failures in desktopui_ScreenLocker, security_ASLR, login_LoginSuccess
            • 574303: provision Failure on cyan-release

              2016-01-05
              Sheriff: moch, zachr
              • 574501: amd64-generic ASAN vmtests failing (desktopui_ScreenLocker, buffet_InvalidCredentials, buffet_IntermittentConnectivity)

              2016-01-04
              • 574197: Peach group Canary failing since 12/29
              Gardener: stevenjb@/jdufault@
              • 574104: LKGM builder needs to be updated to git
              • 573961: Peach pit failures
                • Forcing a rebuild, looks like it might be infra flake: 'Failed to install device image using payload at...'
              • 574198: PFQ flake, security_SandboxStatus
              OLDER ENTRIES MOVED TO THE ARCHIVE so this page doesn't take forever to load.  See Sheriff Log: Chromium OS (ARCHIVE!)