Sheriff Log: Chromium OS (go/croslog)


Please update go/cros-sheriff-playbook when you find a build/infra failure and can map it to what action the sheriff should take for it.

9/18-9/24
Sheriffs: puneetster, amstan, 

9/11-9/17
Sheriffs: yueherngl, seobrien, josephsih
    • 762579: p2p_ShareFiles fails - "Expected exported file ...". This fixed the flaky test failure of nyan_kitty-paladin.
    • 765145: CommitQueueCompletion: The master destructed itself and stopped waiting for the following slaves
      • A series of 7 consecutive CommitQueueCompletion errors of this kind.

    9/2-9/10
    Sheriffs: jintao, tbroch, yamaguchi
    • 763290: MasterSlaveLKGMSync failed on 3 devices in nyc-android-pfq
    • 762883: veyron_mickey-release:1470 failed
    • 762865: crostestutils.au_test_harness.au_test.AUTest failing at SimpleTestVerify
    • 762826: HWTest [sanity] failing on Canary for many boards by "No JSON object could be decoded"
    • 762812: master-paladin failing by "losetup: could not find any free loop device"
    • 762525: AUTest flaky by autoupdate_EndToEndTest_npo_delta_9915.0.0 timeout/abort
    • 762400: autoupdate_EndToEndTest.paygen_au_dev_full flaky on multiple canary builds
    • 762393: banon-release:1466 failed: timeout reached running bvt-arc on 3 DUTS
    • b/65478658: gsutil flaking during upload with 412 Precondition Failed
    8/28-9/1
    Sheriffs: mka, walker, kitching
    • 758247: Issue with EXPERIMENTAL keyword in tree status
    • 757510: veyron_tiger-android-pfq:608 failed: Multiple Android versions: set([u'4289255', '4287917']) - ignore, M builders will be removed eventually (b/64821099)
    • 755060: Eve trackpad FW regression possibly still causing issues with bvt DUT(s) in the lab (the DUT in question is working again)
    • 759450: HWTest timing out in relm-release builds
    • 759976: repeated ADB timeouts on eve-release
    • 759977: relm-release failing due to lack of BVT DUTs in lab (required: 4, found: 3)
    • 760011: enguarde-release failing due to lack of BVT DUTs in lab (required: 4, found: 3)
    • 760016: poppy-release builds failing due to swarming timeout
    • 760254: provision_FirmwareUpdate is devouring shard inodes
    • 760314: Chell PFQ build failed (gsutil failed)
    • 757943: Devservers are having trouble (intermittently?) resolving DUT names
    • 760739: Runaway processes on chromeos-skunk*
    • 760789: whirlwind: missing include file "ap-daemons/vorlon/client/dbus-proxies.h"
    • 760843: /b/c/cbuild/repository/chroot/usr/bin/xz: /lib/x86_64-linux-gnu/liblzma.so.5: version `XZ_5.2' not found
    • 761169: OperationalError: (2013, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0")
    • 761271: PaygenBuild* steps failing due to missing bspatch binary
    • 761422: Archive step failed: timeout while copying artifacts
    • 761513: bluestreak-pre-cq sanity build failure
    • 761471: Flaky cheets_StartAndroid.stress in veyron_minnie
    8/21-8/25
    Sheriffs: briannorris, apronin
    • 757351: HWTest failed on most builds
    • 757510: veyron_tiger-android-pfq:608 failed: Multiple Android versions: set([u'4289255', '4287917'])
      • "Just ignore it"
    • 757599, 757658: moblab_RunSuite fails on guado_moblab-paladin
      • Bad CL landed in moblab tests last week. Reverted.
    • 757824: chromiumos-sdk failing to set up chroot
      • Reverted CL. Investigation continued on 757147
      • Investigation / re-landing to be continued on original bug: 756240
    • 757866 (-> 755914): testReapAllInOrder: AssertionError: False is not true
      • Relatively new, flaky test; failed at least twice on canaries today
    • 756957: amd64-generic-asan failing in build_packages
      • Autotest eclasses weren't packaging the tarball right; CL:634627 is in flight for fixing this
    • x86-generic - builders to be deleted from waterfall (729645):
      • The builders are to retire soon - ignore the errors.
      • 757929: x86-generic-full: build_packages: media-libs/mesa: econf failed: LLVM target 'amdgpu' not enabled
      • 757934: x86-generic-incremental: build_packages fails: Cannot find prebuilts for chrome
    • amd64-generic-tot-asan-informational failures:
      • 748216: bluez ASAN failures heap-buffer-overflow and global-buffer-overflow
      • 757921: cryptohome ASAN failure: detected memory leaks in libchaps (cryptohome_testrunner)
    • 757958: provision_Autoupdate.double failures on reef-paladin
    • 758036: nyan_kitty-paladin:2691 failed
    • 757943 (-> 712682): Devservers are having trouble (intermittently?) resolving DUT names
    • 758251: coral-paladin failing build-packages
      • bad chump. Reverted.
    • b/64842314: coral release builder failing firmware signing
    • b/65013136: guado_moblab-release fails paygen
    • 758665: login_MultipleSessions failed on link-paladin
    • 759039: cheets_PlayStoreTest fails with Unhandled TimeoutException: Timed out while waiting 20s for IsJavaScriptExpressionTrue.
      • Revert bad CL
    • 759093: CQ submitted a change via strategy:cq-submit-partial-pool-cq-history that broke HWTest on multiple platforms
      • CQ really shouldn't have accepted the bad CL from bug 759039

    8/14-8/18
    Sheriffs: tfiga
    • 755051: CrOS Git commands fail because Git bundle is missing PERL regexp support. (e.g. PublishUprevChanges on master-paladin)
    • 755060: Eve bvt DUTs in the lab timing out
    • 755080: daisy_skate-release builders frequently hang in test server job (dup into 736393)
    • 755461: cheets_StartAndroid.stress (and other) tests failing on Intel boards due to graphics driver timeout when restarting chrome
    • 755470: lakitu-release/lakitu-gpu-release failing cloud_KonletStartup test
    • CL:608907: file collision due to missing blocker breaks CQ
    • CL:614310: ec-utils, coreboot breakage due to eclass change
    • 755699: peppy-paladin: desktopui_ExitOnSupervisedUserCrash: Timed out waiting for condition: Session stopped.
    • 755843: gandof-release: Tests failing likely due to DUT malfunction
    • 755882: reef-uni-release: Not enough DUTs for board: reef, pool: bvt-uni; required: 4, found: 3
    • 755906: canary: Big series of build timeouts
    • 755914: poppy-release: chromite UnitTests flaky?
    • 755917: poppy-release: chromeos-ec UnitTests failure (dup of 715011)

    7/31-8/4
    Sheriffs: dbasehore, jkwang, hidehiko
    • b/62087733: daisy_spring devices in the lab are having issues
    • 751285: many paygentest[canary, dev] failures on specific boards like quawks
    • 751315: arc-networkd causing security_SandboxedServices failures on newbie
    • 751895: Issue causing moblab provisioning to fail
    • 752176: X11 libva file collisions breaking the build
    • CL:600543: Blew up the builder
    • 752656: Timeouts on AUTest for some canaries
    • 751762: Temp shortcut for CtsAccountManagerTestCases
    • 752269: jetstream_BluetoothBeaconing failure
    • 752562: Temp fix for https://bugs.chromium.org/p/chromium/issues/detail?id=752562

    7/17-7/23
    Sheriffs: wuchengli
    • 744212: autoupdate_EndToEndTest.paygen_au_dev_delta: Unhandled DevServerException: CrOS auto-update failed
    • 744569: Flaky PreCQLauncher GOBError
    • 743292: Cannot build gralloctest for eve-paladin
    • 746230: cheets_CTS_N.arm.CtsAppUsageHostTestCases failures
    • 746327: ASyncHWTest is forgiven
    • 746336: TestLabException: Not enough DUTs for board: zako
    • 746347: Sometimes autoupdate_Rollback needs retry to pass
    • 653048: cheets_KeyboardTest fails with "Timed out waiting for condition: Expected text entered"
    • 746548: Android boot test fails for strago boards
    • 746808: Paygen failure: CatFail AccessDeniedException 403
    • 746814: BuildPackages failed in chromeos-base/cryptohome.
    • 747211: Canary passed but no payload was produced
    • 747254: Chrome PFQ failing -- Wrong file version in settings_resources.pak
    • 747278: reks: SSHConnectionError: Connection to 100.115.202.94 timed out

    6/26-6/30
    Sheriffs: cernekee, sjg
    • crbug.com/736807 turned out to be due to the toolchain binutils uprev from last week (e.g. breaking auron_paine). This was reverted and many builds became green.
    • Filed various bugs for broken builders. Seems like we should look into update_manager (e.g. crbug.com/578270)
    6/19 ~6/23
    Sheriffs: kirtika, pmalani

    State of the test lab (6/19)
    • Chrome pinned due to a bad CL (not yet identified) that brought down kitty & blaze last week. Partly happened because we don't have any coverage for nyan on the PFQ. (crbug.com/734074)
    • hwtest lab will be shut down tonight and tomorrow night.
    • kirtika@ to do a kernel merge (6/20 afternoon) - Intel wifi driver drop.  
    • Infra team: 4+ bugs to be filed for servo and repair (1 specific to servo on kitty, 3 others about how the servo repair process failed in the case of the kitty errors). 

      6/5 ~6/11
      Sheriffs: bleung, mcchou, mojahsu
      • 729766: autoupdate_EndToEndTest.paygen_au_canary_full, on a bunch of boards. Suspect AU from R53 8530.96.0
      • 730272: hostinfo attributes refer to incorrect job_repo_url, causing tests to fail
      • 731253: Skylake and Elm CQ failed due to drm patch from Intel. Error message is misleading, suggesting a failure in the infrastructure rather than a HWTest system that legitimately did not boot with the new image. crbug.com/731274 was filed to improve the messaging.

      5/26~5/30
      Sheriffs: benzh, vprupis, seanpaul
      • 727685: all bots failing in SyncChrome stage
      • 729016: many bots failing in TestSimpleChrome with clang++ bad indent
      5/22~5/26
      Sheriffs: aaboagye, grundler, hashimoto
      • 720192: beaglebone-release has been offline since Apr 20
      • 723645: CQ failures [elm, cave?]: HWTest failed due to RPC layer timeouts
        • Most recently seen on guado_moblab-paladin
      • 725152: ec-utils broken for bob
      • 708679: Some shards stop taking RPC calls.
        • This has manifested in boards appearing as "repair failed" (e.g. - buddy bvt DUTs)
      • 718083: x86-generic-asan builder has been failing. May need to be removed since x86 machines are EOL'd.
      • 677293: Re-occurrence of a failure when running `git clone`.
      • 725586: banjo, banon, cave, daisy_spring, parrot_ivb, and zako failed HWTest this morning due to not enough DUTs in the BVT pool.
      • 725856: authpolicy build failure when USE=-cros-debug (broke all release builders)
      • 726134: Many canaries failing with "SSL connection error: ASN: bad other signature confirmation"
      • 715011: nvmem ec test crashes resurface
      • 726383: bob-release failed cheets_StartAndroid
      • b/62087733: Broken daisy_spring DUTs.
      • 714330: guado_moblab-paladin fails a lot with "host did not return from reboot".
        • It has been moved to experimental, but still needs to be root caused.
      • 726757: Many canaries failing in chromeos-base/authpolicy unit test.
      • 726835: gale-release: BuildPackages failure in sys-boot/depthcharge
      5/15~5/21
      Sheriffs: davidriley
      • 722599: provision failures due to bad prod push
      • 722961: cave (and celes/lars) HWTest stages slow to run, each test running slow
      • 723645: AFE RPC timeouts causing HWTest to fail
      • 723026: Chrome puppet roll out caused git failure
      • 723964: LXC artifact download issues
      5/8~5/14
      Sheriffs: rbhagavatula, jwerner, hirono(non-PST):
      • 719342: guado-release has been broken for a month
      • 721855: Broken DUT chromeos2-row7-rack10-host11 caused elm-release failures
      • 719786: Seems that RootfsUpdateError and paygen issues on Braswell boards were caused by a kernel crash triggered by cbmem
      • 720087: AFE outage caused all paladin HWTests to time out
      • 720005: Swarming outage killed all HWTest with "Waiting for results from the following shards: 0 N/A: 3606683bbab9bf10 None"
      • 689105: Lots of autoupdate_EndToEndTest.paygen_au_* failures.
      • 717746: cheets_StartAndroid.stress Failure on bob-release/R60-9516.0.0. Tracking in the ARC++ issue entry.
      • 719342: security_AccountsBaseline consistently failed on guado since Apr 6

      5/1~5/7
      Sheriffs: dnojiri, sduvvuri, hychao(non-PST):
      • 717061: some shards are missing lxc, failing ssp container setup
      • 716913: Upgrade openvpn package to v2.4.1
      • 718355: Packages failed in ./build_packages: chromeos-base/autotest-tests-cheets

      4/22-4/28
      Sheriffs: chadversary, oliverwen, mnissler(non-PST):
      • 710492: TPM2 does not work inside VMTest: eve-pre-cq VMTests are failing, apparently unrelated to CLs being tested. CLs are blocked.
        • As a potential fix, chumped CL:477090 eve: don't run VMTests in Pre-CQ
      • 715855: buildbot timeout in BuildPackages sentry-release on autotest-tests-cheets-0.0.1-r375.ebuild
      • 714571: HWtest swarming.py invocation times out
      • 714598: Image signing step fails on release builders
      • 714601: logging_UserCrash fails on x86-generic-incremental
      • 714608: vmtest timeout on ASAN bots
      • 714451: sumo/ninja failing to sign due to maxcpus=2 [FIXED]
      • 715011: nvmem ec test crashes
      • 715066: HWtest failure return code -9 / code 247, but all tests pass
      • 715108: Build step failures (code 1) with missing AFE output
      • 715012: paladin failed HWTest stage due to post-suite JSON decode on chromite side | ValueError: No JSON object could be decoded
      • 716399: PFQ builders fail TestSimpleChromeWorkflow with "Could not run pkg-config."
      • 716412: UnitTest failures on amd64-generic-asan

      4/17-4/21
      Sheriffs: tbroch, zhihongyu, owenlin(non-PST):

      • 697274: daisy-skate-chrome-pfq: running hwtest but cq doesn't. CL to remove hwtest from pfq
      • CQ:14360: arc-camera breakage -> reverts 14364
      • 713531: security_SandboxStatus fails.  Remove from bvt-inline for now
      • 713004: Tests passed but got aborted by AutotestAbort
        • 713856: network load likely suspect. 10x increase in file size since 4/12
        • 4/20: unpin chrome; not clear it can be blamed
        • 4/19: pin chrome to 59.0.3064.0_rc-r1 as workaround
      • 713226: boost, python-gflags, other pkgs changed on mirror mismatch manifest.
      • 712679: canary builders failing : long build time for chromeos-chrome
      • 712297: chrome PFQ failures, goma enablement side-effect for TestSimpleChromeWorkflow
      • 712102: veyron_minnie-chrome-pfq bots look full-disk.
      • 712109: cyan-chrome-pfq is failing due to libstdc++ version mismatch.
      • 685889: (dup) veyron_mighty-paladin, winky-paladin failed due to (IntegrityError: "Duplicate entry for key 'buildbucket_id_index'")
      • 712505: Fizz Paladin: Failed steps failed cbuildbot: failed androidmetadata
      • 689105: /usr/bin/python: bad interpreter: No such file or directory in autoupdate_EndToEndTest
      • 697967: ASAN failures : no space left during build_images

        4/03-4/07
        Sheriffs: shchen, philipchen, itspeter(non-PST):
        • 689105: /usr/bin/python: bad interpreter: No such file or directory in autoupdate_EndToEndTest
        • 708679: Some shards stop taking RPC calls
        • 708715: Frequent pre-cq failures on caroline.
        • 708429: 4/5: suspect pre-cq-launcher has a permission issue; closing the tree
        • Suspect CL:465488 breaks pre-cq-launcher #8925, revert and restart pre-cq.
        • 707696: itspeter@ believes Builder master-paladin Build #14172 is marked incorrectly. It should be green based on issue.
        • 707629: master-paladin failed continuously, suspect a slave is missing python package but not able to investigate further. Looks flaky as it passed on guado_moblab-paladin #5520
        3/20~3/24
        Sheriffs: pberny, norvez, wnhuang (non-PST)
        • 689105 /usr/bin/python: bad interpreter: No such file or directory in autoupdate_EndToEndTest
        • 703914 platform_MemCheck is flaky => flaky test
        • 699353 desktopui_ScreenLocker FAIL: Unhandled DevToolsClientUrlError => flaky test (Chrome crash)
        • 703789 graphics_Gbm: DUT Rebooted unexpectedly nyan_kitty. => flaky test
        • 690307 swap shard workload => Fixed
        • 704669 (resolved) Reef derivative canaries have broken linux firmware
        • 704381 "Report" build step doesn't time out refreshing access tokens for gsutil
        • 704194 (Two) Asuka devices not coming up after reboot during AUTest
        • 705247 android image signing failing due to out of space

        3/13-3/17
        Sheriffs: smbarber, hoegsberg, shunhsingou (non-PST)
        • 701400 Repair flow no longer working for guado_moblab
        • 701693 SSH connection fails for veyron_speedy-paladin/veyron_mighty-paladin
        • 689105 /usr/bin/python: bad interpreter: No such file or directory in autoupdate_EndToEndTest

        3/6-3/10
        Sheriffs: leecy, scollyer (PST), littlecvr (non-PST)
        • 699353 desktopui_ScreenLocker FAIL: Unhandled DevToolsClientUrlError
        • 698825 caroline gets canceled because the build takes too long to finish
        • 700021 gsutil issue in container setup causes "missing lockfile" failure
        • 695287 Slowness and 502 errors from cautotest AFE because of cautotest mysql slowness

        2/27-3/3
        Sheriffs: moch, marcheu (PST)
        • 696606 devserver load may contribute to some provision failures
        • 696696 desktopui_MashLogin | FAIL: Autotest client terminated unexpectedly: DUT rebooted during the test run.
        • 698096 some canaries are running out of time
        2/20-2/24
        • 694081 ARC availability check
        • 693610 tko_parser error
        • 694642 missing autoserv logs
        • 690822 CTS scheduling
        • 694755 chromeos.branch dying
        • 695172 cyan-chrome-pfq stuck
        • 695733 chrome re-pin
        • 695641 pre-cq-launcher failures due to oauth token invalidation
        • 695529 excessive provisioning errors
        • 696039 several jetstream flakes
        • 695940 kevin FW re-update
        • 639301 cyan stuck on shutdown
        2/13-2/17
        Sheriffs: ejcaruso, mqg (PST), adurbin (non-PST)
        Infra: shuqianz
        • Generally swarming issues and network problems have been a huge problem this week.
        • reef, snappy, and pyro release builders were all marked important on 2/14
        • 693734 guado_moblab: AndroidMetadata failure; no ebuilds to satisfy "x11-base/xorg-server"
        • 693691 falco-release: suite timeouts (maybe network related? logs are bad, this is also happening on other boards)
        • 693597 nyan_kitty: CQ test failure
        • 693331 nyan_kitty: all CQ DUTs failed to provision
        • 693318 peppy: generic_RebootTest failure
        • 693313 breakpad compile failure from -Werror,-Winconsistent-missing-override
        • 693310 guado_moblab: broken CL made it past the CQ somehow
        • 693101 lab DHCP server configuration update took out the whole lab
        • 692342 kevin: provision failure loops (possible eMMC failures?)
        • 692236 falco_li: not enough DUTs to test canary
        • 692232 peppy: failed to provision
        • 692214 caroline: canary paygen issues
        • 692206 clapper: VMTest broken
        • 692129 snappy: no good repair build (unstable ToT)
        • 691729 kevin: unable to reach devserver
        • 690616 caroline: failed to perform stateful update (continued from last week)
        • 690286 reef: cs50-updater causing reboots and rollbacks (continued from last week)
        • 690232 candy: dbus issues causing canary failures (continued from last week)
        Maintenance:
        • 692240 setzer was moved between servers, resulting in some planned throttling
        Gardener: jennyz
        • 692247 falco-chrome-pfq, daisy_skate-chrome-pfq: failed to connect to DUT after AU
        • 687248 falco-chrome-pfq: flakiness in provisioning prevents chrome uprevs (continued from last week)

        2/6-2/10
        Sheriffs: jinjingl, waihong
        • 691009 daisy_skate CQ: Devserver call failed: "http://100.108.1.152:8082/check_health?" => Restarted devserver.
        • 690616 Caroline canary: Failed to perform stateful update
        • 690232 Candy: The name org.chromium.UpdateEngine was not provided by any .service files
        • 690286 no green build for reef family
        • 689794 samus-android-pfq failing HWTest - CrOS auto-update failed
        • 689694 CQ Failing Gerrit Unittests - gaierror => Fixed the test
        • 689105 multiple autoupdate_EndToEndTest failures at about 6:40 => Reverted CL
        • 689072 build_image failing again in canary archive step with cryptic error => Reverted CLs
        Gardeners: michaelpg, glevin
        • crbug.com/687248: frequent falco-chrome-pfq failures. Suspect DUT replaced, but issue ongoing.
        • crbug.com/688568: VideoPlayerBrowserTest.OpenSingleVideoOnDrive still flaky.
        • crbug.com/685340 (fix in review): LKGM builder fails 50% of nights. Uploaded CL, ensured a run succeeded, and updated YAQS.
        • crbug.com/691058 (resolved): depot_tools CL breaks SyncChrome step on canaries and PFQ; quickly reverted by dpranke@
        • crbug.com/685313 (resolved): linker failure on amd64-generic-tot-asan-informational
        • crbug.com/689264 (resolved): piex_loader.js is noisy in chromium browser_tests
        • various (resolved): flaky tests on Linux ChromiumOS. CLs reverted.
        • ketakid: PFQ failure for samus on 57 branch (fix here)
        1/30-2/5
        Sheriffs: uekawa
        Infra:
         - crbug.com/686940 stateful.tgz missing from caroline dev release builds. -- manually fixed
         - crbug.com/687237 devserver down due to disk full, cleanup script wasn't running due to manifest. -- resolved and pushed.
         - crbug.com/687402 dhcp outage caused lots of ssh connection timeout. -- should be resolved.
         - crbug.com/687935 lakitu-paladin failing with GS upload failure. -- ACL was fixed.
         - crbug.com/687437 lakitu-gpu-incremental has never succeeded -- a change went in.
         - crbug.com/687248 falco-chrome-pfq failure. -- tried locking 
         - crbug.com/686854 signers timing out
         - crbug.com/685313 libinstallattributes failing with asan build. now fails with another failure.
         - seems to be failing all builders now with failing to uprev, what!?


        1/23 - 1/25
        Gardeners: jamescook, warx
        • 683977 git lockfiles breaking chromeos amd64-generic Trusty builder. Resolved.
        • 683640 cheets_GTS.google.admin: FAIL: Test did not complete due to Chrome or ARC crash. Java version issue. Disabled.
        • 684044 "All devservers are currently down" - incorrectly blaming *all devservers* when a single devserver call flakes. WontFix.
        • 683304 falco-chrome-pfq failures. Infra / test problem. Fix in flight.
        • 674209 constant video_ChromeHWDecodeUsed failures in tricky/peach_pit informational pfq. Reverted.
        • 685313 linker failure on chromeos asan in libbrillo, "may overflow at runtime; recompile with -fPIC". Toolchain? Still failing.
        • 685340 chromeos Chrome LKGM builder failing in cros_best_revision, git cl land failure. Flaky. Infra?
        • 685424 scheduler: Aborting large number of bvt-prebuild request from past canary causes slowdown, CQ failure. Ongoing.
        • 683828 Chrome compile failure, openh264 cpu architecture. Reverted.
        • 685675 Manually uprev Chrome to 58.0.2993.0 for Chrome OS
        • BuildPackages failure due to camera HALv2 autotest (not chrome), https://chromium-review.googlesource.com/#/c/433383/ Reverted.
        • 685269 [VMTest] cheets_CTS.6.0_r14.x86.android.core.tests.libcore.package.harmony_java_math fails on cyan-tot-chrome-pfq-informational. Chrome / ARC incompatible. Fixed on ARC side.
        • 686193 amd64-generic-telemetry failure in vmtest telemetry_UnitTests SimpleTestVerify, PlayActionTest.testPlayWaitForPlayTimeout and webservd crash. Flaky.
        • 686265 Frequent exceptions (timeout) on Linux ChromiumOS Tests (dbg). Flaky.
        • 686266 Chrome OS PFQ annotator marks passing PFQ runs as failed if chrome didn't need to update. Tool issue.
        1/16-1/22
        Sheriffs: abhishekbh, adlr, kcwu
        Infra: dshi

        Ongoing:
        Resolved issues:
        • crbug.com/680843: ts_mon outage for builders ('module' object has no attribute 'RetriableHttp')
        1/9-1/15
        Sheriffs: snanda, rspangler
        Infra: dgarrett

        Ongoing:
        • crbug.com/679580: canary paygen failures with "no JSON object could be decoded"
        • crbug.com/681052: build329-m2, build315-m2, build318-m2 can't sync (makes trybots somewhat unreliable)
        • crbug.com/681096: [cyan-chrome-pfq] [veyron_minnie-chrome-pfq] failed HWTest [arc-bvt-cq] (swarming timeouts?)
        Not resolved, but not on fire either:
        • crbug.com/679002: generic_RebootTest should not cause autotest to complain that it lost connectivity to the DUT
        • crbug.com/679487: one unresponsive DUT caused CQ to fail (dup of crbug.com/632886)
        • crbug.com/679878: several builders failed HWTest with "Android did not boot!" errors (may have 2 different root causes)
        • crbug.com/680220: stumpy-paladin failed with "Couldn't resolve host 'chromium-review.googlesource.com'"
        • crbug.com/680532: archive step failure should report root error (lots of GS failures on canaries Wed night)
        • crbug.com/680658: no space left on device for kefka-release during paygenbuild
        • crbug.com/680793: Very long delay between tests complete & stage end during bvt-inline stage
        • crbug.com/681198: what was chromeos4-row4-rack12-host15 doing between 14:06 and 14:30?
        Resolved Issues:
        • crbug.com/676152: reef-paladin is failing with out-of-space error in rootfs (temp workaround in place; testing longer-term fix)
        • crbug.com/678643: update_engine unittest stuck for 25+ minutes (temporarily marked auron-paladin not important)
        • crbug.com/679213: [bvt-cq] desktopui_MashLogin Failure on x86-alex-release/R57-9163.0.0 (root cause likely crbug.com/679840 and crbug.com/659741; just disable that test on old platforms)
        • crbug.com/679410: whirlwind paladins are failing: lack of healthy DUTs? (restarted scheduler)
        • crbug.com/679452: guado_moblab paladin is failing HWTest; DUTs in repair failed state (fixed bad switch in lab)
        • crbug.com/680362: 'class ash::ShelfWidget' has no member named 'SetShelfVisibility' (broke Chrome PFQ)
        • crbug.com/680601: kip shard (chromeos-server42.cbf) is down
        • crbug.com/680849: many canaries keep failing at archive step

        1/2-1/8
        Sheriffs: johnylin
        Infra: 

        Ongoing:
        • crbug.com/678271: some PFQ builders are timing out in HWTest (bvt-inline)
        • crbug.com/678811: gale, whirlwind: BuildPackages fail (liblightcontrol make fail)
        • crbug.com/668968: Not enough DUTs for falco_li in lab (under request), see also b/33249596
        • crbug.com/676433: asuka, auron-yuna, banon, celes, gandof, lulu, failed at Paygen stage with [Errno 28] No space left on device
        • crbug.com/677982: chromeos-bmpblk broken for poppy --> poppy-paladin broken
        Resolved Issues:

        12/26-1/1
        Sheriffs: dtor, martinroth, yhanada
        Infra: kevcheng

        Ongoing:
        Resolved Issues:

        12/19-12/26
        Sheriffs: dianders, itspeter
        Infra: sbasi

        Ongoing:
        Resolved Issues:

        12/12-12/19
        Sheriffs: sonnyrao, benchan, mtomasz
        Gardeners: stevenjb

            Ongoing Issues:
        • crbug.com/673363: canaries fail gsutil uploads with AccessDeniedException 403

        • crbug.com/655758: tricky-chrome-pfq build 2448 failed due to provisioning error

        • crbug.com/673455: intermittent failure in cheets_CTS.6.0_r12.arm.android.core.tests.libcore.package.harmony_java_math

        • crbug.com/673587: random tests failing in canaries with no individual test logs

        • crbug.com/673584: GoB rejecting manifest pushes ("failed to lock" error)


            Resolved Issues:


        12/04-12/11
        Sheriffs: jinsong, puthik, hungte
        Gardeners: 

            Ongoing Issues:
        • None


            Resolved Issues:

        11/28-12/04

        Sheriffs: mcchou, mruthven, ravisadineni
        Gardeners: 

        Internal Waterfall:
           Ongoing Issues:

        • crbug.com/669298: Lumpy provision failed due to Unhandled DevServerException: CrOS auto-update failed for host chromeos6-row2-rack7-host12
          (b/33185795 is filed for tracking the offline status of this bot.)

        • crbug.com/654934: Paygen issue on arkham-release builder seems to be a recurrence

        • crbug.com/660413: lakitu-release builder GCEtest failed

        • crbug.com/662625: guado_moblab-paladin failed at HWTest stage with moblab_RunSuite: FAIL: Unhandled AutoservRunError: command execution error

        • crbug.com/670132: arkham-release builder failed at Paygen stage with cannot find source stateful.tgz error

        • crbug.com/607514: veyron_speedy, wizpig failed at AUTest stage with image installation failure

        • crosreview.com/415591 and crosreview.com/415550 broke master-paladin

        • crbug.com/670878: oak-release builder failed at HWTest stage with "(2006, 'MySQL server has gone away')" error

        • crbug.com/646812: falco_li-release failed due to lack of DUT

        • crbug.com/670911: sentry-release, Inconsistent propagation for the same test failures.

        • crbug.com/667999: provision failure, Unhandled DevServerException: CrOS auto-update failed for host chromeos2-row3-rack1-host21

        • crbug.com/668968: falco-chrome-pfq failed due to network issue
          b/33249596 P0 filed for syslab to troubleshoot

        • crbug.com/670430: build_packages error due to authpolicy on x86-generic

           Resolved Issues:

        Public Waterfall:

           Ongoing Issues:


        11/21-11/28
        Sheriffs: drinkcat, groeck, furquan
        Gardeners: 

            Ongoing Issues:

        Please follow up on these, at least:
        • crbug.com/668568: Lots of -paladin builders failures during ImageTest (libwidevinecdmadapter.so contains unsatisfied symbols). Had to pin Chrome.
        • crbug.com/668418: VMTest in GCE instances?!
        • crbug.com/668127: squawks pool:bvt unbalanced (please check what's going on?)
        • crbug.com/668627: cros-beefy23-c2 out of disk space
        • crbug.com/662625: guado_moblab: bad DUT
        • crbug.com/665474 - Inadequate DUTs for falco_li
          • May not be fixed in the immediate future (we are short on HW)
        • crbug.com/666070 - wizpig/terra-release builders fail during HWTest: An operational error occured during a database operation: (2006, 'MySQL server has gone away')
        Less critical:
            Issues from last week:
        • crbug.com/665235 - invalid oauth credentials. Some slaves were unable to retrieve images from google storage resulting in AUTest failures on the Canary waterfall.
        • crbug.com/666414 - ssp picks random devserver.  Patches in place to mitigate.
            Resolved Issues:
        • crbug.com/668562: terra-release. Bad DUT
        • crbug.com/667143 - kevin-tpm2 keeps failing (jwerner has a fix)
        • crbug.com/667555 - wizpig-release HWTest has been failing continuously for a few days.
          • Bad DUT
        • crbug.com/667184 - glados-release SignerTest failure (should be fixed)
        • crbug.com/667087 - pool: bvt, board: x86-mario in critical state (should be fixed)
        • crbug.com/667075 - x86-{mario/alex}-{paladin/release/chrome-pfq} failure (also seems to affect other x86 3.8 boards like peppy/falco/lumpy/etc)
        • crbug.com/667145 - veyron_minnie-android-pfq not running (builder offline)
        • crbug.com/665531 - sentry-release experiencing test timeouts (probably duplicate of 666070)
        • crbug.com/667195 - cros-beefy70-c2: Disk almost full, glimmer-cheets-release Paygen failures
            PFQ (gardening) issues:
        • None?
        11/14-11/21
        Sheriffs: skau, ntang, pgeorgi
        Gardeners: jennyz

            Ongoing Issues:
        • crbug.com/665235 - invalid oauth credentials. Some slaves were unable to retrieve images from google storage resulting in AUTest failures on the Canary waterfall.
        • crbug.com/665286 - Bad DUT for guado_moblab-paladin
        • crbug.com/646812 - Inadequate DUTs for falco_li
        • crbug.com/665531 - sentry-release experiencing test timeouts
        • crbug.com/666414 - ssp picks random devserver.  Patches in place to mitigate.

            Resolved Issues:
        • crbug.com/664994 - x86-alex-paladin reports DUT unplugged. Actually, bad firmware CL in CQ.
        • crbug.com/665061 - Not enough DUTs for buddy-release
        • crbug.com/665073 - Lab restarted overnight. Caused 2 wedged slaves.
        • crbug.com/665139 - Perceived lab slowness. Shard schedulers required restart.
        • crbug.com/665721 - oak-paladin and reef-paladin failed due to bad restart of slaves
        • crbug.com/666116 - peppy-release running client jobs as server jobs due to a bad image from devserver.
        • crbug.com/666355 - No cyan boards for hw_video_acc_enc_vp8.  Misread debug message as error message.  Failure is expected.
        • crbug.com/666372 - Multiple canaries failing due to overnight Ganeti restart.
        • crbug.com/666460 - daisy_skate-paladins failing provision_AutoUpdate.double
            PFQ (gardening) issues:
        • None?

        10/31- 11/06
        Sheriffs: tfiga, dlaurie, yueherngl, semenzato (honorary)
        Gardeners: jamescook

            Ongoing Issues:

          • crbug.com/653362: StageControlFileFailure due to DownloaderException
          • crbug.com/660409: Canary runs fail with "DevServerException: stage_artifacts timed out"
          • related: crbug.com/660896: Chrome LKGM is stale due to parrot-release failures
          • crbug.com/660520: drone cannot connect to cloudSQL
          • crbug.com/648665: login_Cryptohome fails nearly constantly on x86-generic-tot-asan-informational -> address space exhaustion on 32-bit Intel ASAN
          • b/32653128 - veyron_speedy-paladin constantly failing on an ARC++ related HWTest

              Resolved Issues:


            10/24- 10/31
            Sheriffs: kirtika, mka, deanliao, semenzato (honorary)
                 
                Ongoing Issues on canaries
            • crbug.com/656205: SetupBoard failure, last ~10 parrot canaries failed. 
            • crbug.com/658374: Provision failure with error "Devserver portfile does not exist".
            • crbug.com/657548: AUTest fails with kOmahaErrorInHTTPResponse (37)
            • crbug.com/609931: No output from BackgroundTask for 8640 seconds
            • To look into: guado paladin caused consecutive master paladin failures on Friday
                
                PFQ (gardening) issues
               
            • New issues:
            • crbug.com/659277 - Last AU on this DUT failed, The python interpreter is broken, completed successfully (happened once)
            • crbug.com/659894 - HWTest security_SandboxStatus failed twice on the elm and veyron_mighty paladins.

            • Ongoing Issues:
            • crbug.com/591097 - MobLab Failures in the CQ: dhcpd is not running. Crashing on shill restart (single occurrence)

            • Resolved issues:
            • b/32420834 - Slow UI with 500 Internal Server Error on a CL with many comments (pre-cq-launcher failed to fetch the CL)


            10/17- 10/23
            Sheriffs: cychiang, briannorris, semenzato (honorary)
            Gardeners: dshi, jrbarnette
                    
                    Ongoing Issues on canaries:
            • autoupdate_EndToEndTest, many different failures
            • autoupdate_Rollback
            • provision_AutoUpdate.double
            • other provisioning failures (rsync errors, timeouts, error 37)
            PFQ (gardening) issues:
            • New Issues:
            • crbug.com/656766 - lakitu cloud_SystemServices flakiness
            • crbug.com/656812 - autotest-web-tests build errors are too opaque
              • Filed, noted a potential fix
            • crbug.com/656872 - Not enough falco_li DUT in the lab.
            • crbug.com/656873 - kunimitsu-release: build_packages failed on autotest-deps-ltp with undefined ltp_syscall; happened once.
            • crbug.com/657274 - guado_moblab-paladin: moblab_RunSuite: FAIL: Unhandled AttributeError: '_CrosVersionMap' object has no attribute 'get_stable_version'
            • crbug.com/657278 - celes-release, gandof-release: signing failed due to gsutil/ssl timeout
            • crbug.com/657313 - pre-cq failed because nyan_freon is removed
            • crbug.com/657330 - x86-mario-release: security_ModuleLocking timed out
            • crbug.com/657730 - Falco device chromeos2-row4-rack5-host7 is flaky in provision
            • crbug.com/657746 - multiple paladins: security_ptraceRestrictions: DUT rebooted during the test run.
              • Caused by bad CLs that made it through for crbug.com/657609
              • Poor Kernel 3.10 HW coverage: crbug.com/657967
              • Bad CL in 3.10 has been reverted, but still flushing out of some canaries (2016-10-20)
            • crbug.com/658214 - Nearly all canaries failed: paygen and AUTest fail to install device image.
            • crbug.com/658291 - chell signing/paygen failing due to new kernel cmdline flag
            • crbug.com/658338 - jetstream_LocalApi failure
            • crbug.com/658473 - wolf + veyron_speedy DUT availability
            • crbug.com/658506 - kunimitsu build failures
              • Still not resolved; there's no paladin?
            • Resolved Issues:
            • crbug.com/656726 - Chrome PFQ manifest errors
              • Waiting for next PFQ runs to come through
            • build_packages fail on almost all release builders, some paladin builders.
            • crbug.com/656903 - security_SandboxedServices failure "One or more processes failed sandboxing"
            • crbug.com/657352 - canary build failure because of minijail tree change. uprev of ebuild chumped. Fix to security_SandboxedServices chumped.
            • crbug.com/656717 - autotest-web-tests issues on guado_moblab-paladin (experimental)
            • root caused to libcups/icedtea-bin - fix is in flight
            • crbug.com/657218 cave-release: Fail to resolve host name for cros-beefy19-c2
            • b/32292437 - DUTs in pool crosperf are all 'repair failed'
            • Need to push change https://chromium-review.googlesource.com/#/c/401299/ to autotest shard.


            10/10 - 10/16
            Sheriffs: chirantan, julanhsu, kinaba
            Gardeners: lpique, dbehr


            PFQ (gardening) issues:
            •  New Issues:
            • crbug.com/654820 - guado_moblab: Repair failing. Happened once, didn't reoccur
            • crbug.com/655330 - falco-chrome-pfq failing since build 4821 with apparent network issues after updating. Filed after digging into one of the failures on falco and noticing that in one case the infra didn't reconnect to the DUT after it was provisioned. Possibly related to crbug.com/652207, where falco becomes unpingable during provisioning.
            • crbug.com/655750 - select_to_speak exists build error. Occurred once.
            • crbug.com/655758 - Microcode SW error detected. Occurred once.
            • crbug.com/656066 - [bvt-inline] security_SandboxedServices failure on lumpy-chrome-pfq (flake). "awk cannot open /proc/xxx/status" because the process ended between when the filename was generated and when awk tried to open it.
            •  Ongoing Issues:
            • [falco-chrome-pfq] almost always red
              • crbug.com/652207 - provision failure "Device XXX is not pingable". This has plagued the falco-chrome-pfq builder, and is one of the main reasons we didn't automatically uprev Chrome this week.
            • [x86-generic-tot-asan-informational] almost always red
              • crbug.com/648665 - login_Cryptohome fails nearly constantly on x86-generic-tot-asan-informational.
            • [ChromeOS Buildspec] red for M54 builds
              • crbug.com/654561 - browser tests failing M54 builds on ChromeOS Buildspec builder. Landed a fix on the M54 branch that was made after the branch was cut, and was otherwise missed. For the builds to go green, we need a new M54 release though, since the builder pulls the current stable version release.
            • [Chrome4CROS Packages] always red
            • [lumpy-chrome-pfq] occasionally red
              • crbug.com/653238 - lumpy-chrome-pfq HWTest [bvt-inline] timed out waiting for json_dump. This is still happening, as the build occasionally takes too long. Added a note to the bug that, when this occurs, certain tests take much longer than the mean according to the gathered statistics.
            •  Resolved Issues:
            • crbug.com/655800 - Manually uprev Chrome to 56.8891.0.0 for Chrome OS, since we otherwise would not have done so at all this week.
              • Actually there happened to be a green master run late Friday, for the first time in nine days.
            • crbug.com/653900 - BuildPackages broken in multiple chrome-pfq builders. The CL for the fix landed and the builds were fixed Monday.
            • crbug.com/655228 - (New) "Media.VideoCaptureGpuJpegDecoder.InitDecodeSuccess not loaded or histogram bucket not found or histogram bucket found at < 100%". Caused failures on peach-pit. The fix landed early Thursday.


            10/3- 10/9
            Sheriffs: rajatja, denniskempin
            Gardeners: ihf, glevin
            • DebugSymbols error. Happens occasionally across boards: crbug.com/649791
            • AU Retry issues: crbug.com/649713
            • message_types_by_name error in dev_server: crbug.com/652169
            • buddy-release has been failing for weeks: need to investigate
            • gandof-release: crbug.com/639314
            • GSUtil timeout issues: crbug.com/642986
            • sentry-release: Some odd issues with HWTest need to investigate
            • crbug.com/654245: bots failing graphics_Gbm check during hwtest

              PFQ (gardening) issues:
            •  New Issues:
            • crbug.com/653900 - BuildPackages broken in multiple chrome-pfq builders.  There's a CL  for the fix, but it hasn't been committed yet.
            • crbug.com/654044 - AboutTracingIntegrationTest.testBasicTraceRecording failing on x86-generic-telemetry and amd64-generic-telemetry.  CL to disable the test currently under review.
            • crbug.com/652195, crbug.com/652807, crbug.com/653006, crbug.com/653031 - Autobugs for occasional HWTest provision flakes, mostly masked by 653900 since Thursday.
            • crbug.com/652824 - falco- and tricky-chrome-pfq's failed w/timeouts during swarming.py.  Occasional flake, but no logs, no work done.
            • crbug.com/653238 - lumpy-chrome-pfq HWTest [bvt-inline] timed out waiting for json_dump.  Flaked once, didn't recur.
            •  Ongoing Issues:
            • crbug.com/648308 - Chrome4CROS Packages builder still broken (3+ weeks)
            • crbug.com/648665 - Still happening on x86-generic-tot-asan-informational, with occasional successes slipping through.
            • crbug.com/651870 - Occasional flake in PageLoadMetricsBrowserTest.FirstMeaningfulPaintNotRecorded
            • crbug.com/651593 - HWTest[bvt-inline] : "security_NetworkListeners FAIL: Found unexpected network listeners".  Single flake, waiting to see if it recurs.
            •  Resolved Issues:
            • crbug.com/652316 - [VMTest - SimpleTestVerify] failing on cyan-tot-chrome-pfq-informational : "Could not access KVM kernel module".  Reverted offending CL, builder green since then.
            • crbug.com/639852 - Linux ChromiumOS Tests (dbg) failure of two DevToolsAgentTest.* tests.  Issue contains cause, revert, and subsequent fix.
            • crbug.com/643238 - Linux ChromeOS Buildspec Tests failed intermittently for weeks.  Failure not seen since 10/7, when issue comment suggested that potential fix had landed.
            • crbug.com/653672 - Multiple generic pfq builders failing with "Invalid ebuild name".  Fixed.

            9/26 - 10/2
            Sheriffs: dbasehore, akahuang
            Gardeners: jdufault, glevin

            9/19 - 9/25
            Sheriffs: apronin, charliemooney, vpalatin
            Gardeners: stevenjb
            • chromiumos-sdk failed to build (missing efi.h) - fixed (build, CL at fault, CL to fix)
            • Cyan has broken/flaky test performance in ToT, was causing CQ failures bug here
            • DataLinkManager crashing and breaking Canaries bug here (fixed: CL reverted)
            • Surfaceflinger crashing on oak bug here
            • Paladins fail to connect to MySQL instance bug here
            • Canaries were failing with "no attribute 'SignedJwtAssertionCredentials'" bug here (workaround CL submitted)
            • arc_mesa builds broken on auron, buddy, gandof, lulu, bug here, mostly fixed, buddy still fails as of buddy/428
            • crbug.com/649582: manifest generation fails w/binary data in commit messages (e.g. CL:387905)
            • crbug.com/649592: libmtp roll broke build packages due to autotools regen (fixed in CL:389031)
            • Root FS is over the limit for glimmer bug here
            • Reef builds were broken (unit tests failed to build), fixed here
            • Gru builds are broken (fail during uploading command stats) due to this CL, bug here, CL to fix
            • Some CLs are not marked as merged in Gerrit after a CQ run bug here
            • Tests that succeeded but left crashdumps frequently aborted on crashdump collection timeouts bug here, crashdump symbolication turned off if tests passed (here)
            PFQ (gardening) issues:
            • Chrome4CROS Packages builder failing in compile - crbug.com/648308
            • login_Cryptohome fails nearly constantly on x86-generic-tot-asan-informational - crbug.com/648665
            • login_OwnershipNotRetaken fails regularly on PFQ. - crbug.com/618392
              • Ongoing investigation
            • Shutdown crash in ~ScreenDimmer > SupervisedUserURLFilter::RemoveObserver - crbug.com/648723
              • Fixed
            • Several PFQ failures due to timeouts - crbug.com/647303
              • Some timeouts are triaged, but some still need investigation

            9/10 - 9/18
            Sheriffs: cernekee, kkunduru, chinyue
            Gardeners: afakhry



            9/5 - 9/9
            Sheriffs: jdiez, dhendrix, mcchou, josephsih
            Gardeners: achuith
            • Mostly having issues that affect many builders.
            • Canaries failing due to "HWTest did not complete due to infrastructure issues (code 3)", suspect b/31011610. May file more bugs...
            • Several builders failing due to misconfigured cheets_CTS test: crbug.com/641208
            • Kevin failing badly: crbug.com/644908
            • master-paladin infra failures (build 12292): this CL broke several paladin builds. Told the CL owner not to mark ready before fixing problems.
            • master-paladin infra failures (build 12294): failed 4 consecutive times. 20 paladins did not start in CommitQueueCompletion. Similar to build 12281 yesterday but build 12283 passed later.
            • provision_AutoUpdate.double ABORT: Timed out, did not run.
              • master-paladin infra failures (builds 12301, 12302): failed in these 2 builds
              • Looked similar to crbug.com/593423. Need to watch this, as more builders were broken due to the timeout issue.
              • Build 12303 passed. Flaky?
            • signers failing while signing android apks: crbug.com/645628

            8/29 - 9/4
            Sheriffs: kitching, bleung, yixiang@
            Gardeners: michaelpg, afakhry
            • CQ paladin build #12207 failed due to whirlwind-paladin #5640 HWTest jetstream_ApiServerAttestation failing, but passes in #5641
            • CQ paladin build #12215 failed due to many repo sync errors (example: daisy_skate-paladin), looks like subsequent builds do not exhibit repo sync problems
            • CQ paladin build #12216 failed due to:
            • CQ paladin build #12218 failed due to "No room left in the flash". vpalatin knows about it and is looking for ways to make it fit.
            • crbug.com/642478 - Slave frozen, needed to be restarted.
            • crbug.com/642608 - Timeout on Paygen curl /list_suite_controls (auron-release)
            • crbug.com/642616 - Timeout on Paygen curl /stage (banon-release)
            • crbug.com/642611 - Paygen suite job timed out despite all PASSED
            • crbug.com/642617 - buddy-release: Paygen suite job timed out, all tests FAILED/ABORT
            • Top Issue on 8/31 - crbug.com/641290 - lab database problem
            • b/31011610 - ATL14 packet loss bringing down ChromeOS Commit Queue
            • crbug.com/643278 - guado_moblab broken due to testing outage
            • crbug.com/643300 -  nyan_freon-paladin timed out during p2p unittest
            • crosbug.com/p/56862 - gru-paladin attestation unittest failure. Possibly flaky test. apronin@ looking at fixing test. Also affects gale-paladin
            • crbug.com/643452 - All paladins failed during CommitQueueSync.  akeshet@ theory is that backlog of CLs (especially on kernel repo) overwhelmed GoB. akeshet@ put in a CL to temporarily limit CQ volume to 50 : https://chromium-review.googlesource.com/#/c/380457/ TODO: Revert this once the backlog is cleared. nxia@ also added this mitigation : https://chromium-review.googlesource.com/#/c/380343/2
            8/22 - 8/28
            Sheriffs: bhthompson, nya, walker
            Gardeners: jennyz, lpique
              8/15 - 8/21
              Sheriffs: benzh, sureshraj, yoshiki
              Gardeners: jamescook, domlaskowski
              • crbug.com/637868 security_StatefulPermissions failures on canaries: 
              • crbug.com/593423 provision_AutoUpdate.double failures on chrome pfq informational: 
              • crbug.com/637962 SyncChrome failures due to "Repository does not yet have revision" on chrome informational pfq -> infra, ongoing flake
              • crbug.com/637960 Chrome telemetry failures due to missing system salt file -> reverted
              • crbug.com/637900 cyan chrome pfq informational builder cros-beefy191-c2 is out of disk space building chrome -> infra
              • crbug.com/637472 pool: bvt, board: falco in a critical state -> infra
              • crbug.com/637931 Chrome4CROS Packages builder failing in bot_update "fatal: reference is not a tree" -> infra
              • crbug.com/637938 VMTest failing on telemetry bots due to telemetry_UnitTests_perf -> bug in test script?, disabled
              • crbug.com/638348 cros amd64-generic Trusty builder failing to start goma in gclient runhooks step -> networking flake?
              • crbug.com/631640 login_CryptohomeIncognito -> flaky, but real failure
              • crbug.com/638656 cheets_NotificationTest failure on Cyan PFQ -> real failure in chrome (crash in shelf)
              • crbug.com/638980 falco-full-compile-paladin has failed to start with exception setup_properties
              • crbug.com/638968 x86-generic-tot-asan-informational failures in tpm_manager (odr-violation) and attestation (leaks) -> new target added to cros build that had failures, reverted
              • crbug.com/639102 Kernel panics on Cyan PFQ -> ???
              • crbug.com/639107 link-paladin BuildPackages failure with SSLError The read operation timed out
              • crbug.com/639314 AUTest failed on most canaries due to no test configurations
              8/8 - 8/14
              Sheriffs: davidriley, vprupis, takaoka, smbarber (Mon afternoon only)
              • Continued UnitTest failures on canaries and release branches: crbug.com/627881
              • lakitu failures: crbug.com/635562
              • edgar missing duts: crbug.com/596262
              • kevin firmware prebuilt: crbug.com/635598
              • x86_alex and veyron_rialto pool health: crbug.com/634471 and crbug.com/592002
              • Chumped change broke everything (eg pre-CQ, CQ, canaries) until revert was chumped in
              • infrastructure flake
                • celes-release/289, setzer-release/292 (build interrupted) -> crbug.com/602565
                • nyan-release/293, wolf-release/1294 (sudo access) -> crbug.com/616206
                • pre-cq (gerrit quota limits) -> crbug.com/624460
              • Friday: lab downtime affected builds for much of the day
              8/1 - 8/8
              Gardeners: stevenjb@, khmel@

              7/29 Notes for the next sheriffs from aaboagye, kirtika: 
              • Major issues we are seeing, format is <Impact: Issue: Links>:
                • Tree closure, fixed now: "No space left on device" for cheets builds: aaboagye@'s post-mortem here. crbug.com/630426
                • CQ failures: We've been seeing intermittent failures due to hitting git fetch limits with gerrit (commit queue sync step doesn't work). The current CQ run failed due to this, would not be surprised if the next one does too. crbug.com/632065.
                • Several canaries failing: Unit-test times out, possibly due to overloaded machines: crbug.com/627881
                • Android-PFQ failures: adb is not ready in 60 seconds: crbug.com/632891
              • Minor issues, work-in-progress
                • Android-PFQ: mmap_min_addr not right on samus/x86: crbug.com/632526.
                • Paygen/signing issues.
                • Autoupdate-rollback (likely network SSH issue): example crbug.com/596262



              2016-07-25 thru 2016-07-29
              Sheriff: aaboagye, kirtika, hidehiko (non-PST)

              7/29
              • PST
                • Canaries
                  • kevin-release was broken, but a fix is on the way. (wfrichar@ knows)
                • CQ
              • Non-PST:

              7/28
              • PST
                • Canaries
                  • Still seeing the error in the unittest phase. See crbug.com/627881
                  • Paygen issue still affecting some canaries (x86_alex-he - crbug.com/629094).
                  • Saw a failure with auron_yuna canary with an error parsing a JSON response. See crbug.com/632433.
                  • samus failed with platform_OSLimits Found incorrect values: mmap_min_addr. Filed crbug.com/632526.
                • CQ
                  • Closed the tree because the CQ would just reject people's changes because of the no-disk-space error. crbug.com/630426.
                • Chrome PFQ
                  • Still seeing some failures in the login_CryptohomeIncognito test. See crbug.com/631640.
              • Non-PST
                • CQ:
                  • RED.
                  • samus-paladin is failing due to no-disk-space error. crbug.com/630426
                  • cheets tests failed twice with an actual error (https://chrome-internal-review.googlesource.com/#/c/270781/). Being fixed.
                • Chrome PFQ:
                • Android PFQ:

              7/27
              • PST
                • Canaries
                  • Seems like nearly all the canaries failed during HWTest stage apparently due to Infra issues.
                • CQ
                  • On one run, some of the paladins failed during the CommitQueueSync step due to git rate limiting.
                • Android PFQ
                  • An overloaded devserver is causing provisioning to fail for cyan-cheets-android-pfq and veyron_minnie-android-pfq (wolf-tot-paladin too).
              • (Non-PST)
                • CQ:
                  • Master paladin looks flaky due to various reasons.
                    • Hitting the CQ limit
                    • HWTest timeout
                    • kOmahaErrorInHTTPResponse: crbug.com/621148 looks like a tracking issue.
                  • These are not always reproducible, and some runs pass successfully.
                • Chrome PFQ:
                  • Finally passed at #3175.
                • Android PFQ:
                  • Failing in the last several runs, though for a variety of reasons. Looks just too flaky.

              7/26 (18:20 PST)
              • Canary Failure Classification: Lots of canary failures (~50%) this afternoon, so listing unique causes here to track down tomorrow: 
                • x86-zgb: Pool-health issue, infra (kevcheng@) looking into it, may be back up next canary run? 
                • x86-mario: Not sure if the ManifestVersionedSync failure is a real issue or not, filed crbug.com/631867 anyway. 
                • Paygen failures: falco, falco_li, gru, jecht, kip, lumpy, ninja, parrot, peppy, samus, smaug, x86_alex-he, stumpy. TBD: Update more details here. 

              7/26
              • (PST)
                • Canaries
                  • Still some errors on nyan_blaze and nyan_kitty caused by the vboot_firmware CL. crbug.com/631192
                    • Fixes posted to gerrit and making their way through the CQ.
                  • Still some unittest failures. There's a CL that just landed to reduce the parallelism. Will be following to see if the situation improves. crbug.com/627881.
                    • That CL did not seem to resolve the issues.
                  • Saw a few canaries yesterday (celes this morning) that had issues when uploading debug symbols. dgarret@ is working on a fix. crbug.com/212437.
                  • security_StatefulPermissions is pretty flaky, veyron_minnie canary failing on it. wmatrix is all red: https://wmatrix.googleplex.com/retry_teststats/?days_back=30&tests=security_StatefulPermissions. Investigating crbug.com/604606
                  • There was canary failure on lars-release which reported all the DUTs in the pool as dead, but they seem to be up now. crbug.com/631530.
                  • x86-zgb pool health is poor - most devices down. kevcheng@ taking a look. crbug.com/590653.
                  • Towards the end of the day, a larger number of canaries were failing at the paygen step. I think what may be happening is network flakiness, but I wonder why we don't just retry again?
                • CQ
                  • panther_embedded-minimal-paladin has been down for quite some time now. Pinged the bug to see if there are any updates. crbug.com/630494.
                    • A restart of the master has been scheduled. Need to check back later today if that fixes things.
                  • No elm devices in pool:cq making elm-paladin fail. kevcheng@ taking a look. No bug yet. 
                • Android PFQ 
                  • harmony_java_math CTS test is causing android-pfq failures: "cts test does not exist". Filed b/30413761. Ping ihf@ if it doesn't get better. 
                • Chrome PFQ 
              • (Non-PST)
                • Canaries
                  • platform_FilePerms issue was fixed by yusukes@. crbug.com/631080
                  • Investigated the UnitTest failure a bit more. Have not yet reached the root cause. crbug.com/627881.
                • CQ
                  • Looks flaky: Sometimes failing ErrorCode=37 (OmahaErrorInHTTPResponse).
                • Chrome PFQ:
                  • Looks flaky. Sometimes failing due to a login error, but the failing boards vary.

              7/25
              • Canaries
                • Several of the canaries were failing in the platform_FilePerms HWTest.
                  • This was seen on cyan, elm, lulu, oak, samus, and veyron_minnie.
                  • Appears to be missing expectations for ARC containers.
                  • Filed crbug.com/631080.
                • The unittest stage seems to be timing out fairly often now.
                • nyan-big is failing on a vboot_firmware CL not building. Filed crbug.com/631192. Fix is in CQ now. 
              • CQ 
                • Generally okay today. There was one issue regarding a failure in VMTest, but that was caught.

              2016-07-18 thru 2016-07-24
              Sheriff: wuchengli
              7/19


              7/18
              • 628990: DebugSymbolsUploadException: Failed to upload all symbol
              • 593461: Chrome failed to reach login screen within 120 seconds
              • 628494: chromeos-bootimage build failures in canary builds
              • 609931: 'chromite.lib.parallel.ProcessSilentTimeout'>: No output from <_BackgroundTask(_BackgroundTask-5:6:7:3, started)> for 8610 seconds
              • 629094: cannot find source stateful.tgz

              OLDER ENTRIES MOVED TO THE ARCHIVE so this page doesn't take forever to load.  See Sheriff Log: Chromium OS (ARCHIVE!)