Sheriff Log: Chromium OS (go/croslog)


Please update go/cros-sheriff-playbook when you find a build/infra failure and can map it to what action the sheriff should take for it.

4/16-4/20
Sheriffs:
nshai, lepton, hychao
Ongoing issues:
  • 831624: eve-arcnext-chrome-pfq is failing in the PFQ due to libassistant compilation failure
  • 832880: rmi4update kills chrome on nautilus
  • 831732: scarlet-release: Not enough DUTS available - need to remove temp. experimental from chromiumos-status when fixed!
  • 827347: security_SandboxedServices test failure on stout due to root run dbus-monitor in rf-led-handler.conf (EOL device)
  • 795128: evemu-device did not die resulting in security_SandboxedServices failing
  • 796254: Archive stage fails in build_image
  • 833478: some autotest_lib.site_utils.lxc.* unittests are flaky.
  • 547055: tar: file changed as we read it.
Resolved issues:
  • 833886: Database error caused most builder errors.
  • 833504: reef release broken.
  • 833456: Chrome crashes when running in mash on device.
  • 833499: nami release broken.
  • 83032167: Caroline and Terra builds are RED; Chrome crashes at boot
  • 829780: fizz R66 hwtest failures (waiting for devices in lab to get new base image flashed) - need to remove temp. experimental from chromiumos-status when fixed
  • 831391: tatl-paladin:3250 failed: cannot find wayland-scanner
  • 832280: android-container-nyc failing BuildPackages on an unrelated CL
4/9-4/14
Sheriffs:
pberny, dianwa, sheckylin
Ongoing issues:
  • 83032167: Caroline and Terra builds are RED; Chrome crashes at boot
  • 829780: fizz R66 hwtest failures (waiting for devices in lab to get new base image flashed) - need to remove temp. experimental from chromiumos-status when fixed!
  • 831391: tatl-paladin:3250 failed: cannot find wayland-scanner
  • 832880: rmi4update kills chrome on nautilus
  • 831732: scarlet-release: Not enough DUTS available - need to remove temp. experimental from chromiumos-status when fixed!
  • 831624: eve-arcnext-chrome-pfq is failing in the PFQ due to libassistant compilation failure
  • 832280: android-container-nyc failing BuildPackages on an unrelated CL
  • 827347: security_SandboxedServices test failure on stout due to root run dbus-monitor in rf-led-handler.conf (EOL device)
Resolved issues:
  • 830865: PFQ builds failing - SyncChrome failure: ParseDepsFile:Found recursedeps
4/2-4/6
sheriff: smbarber, zhengpan, mnissler
Ongoing issues:
  • 828371: Not enough DUTs for board: eve
  • 828054: sporadic build_image failures
  • 829055: whirlwind build flakiness
  • 829289: CanaryCompletion failing on multiple builders
  • 824808: intermittent failures due to Permission denied: '/proc/sys/fs/binfmt_misc/qemu-arm'
  • 829780: fizz R66 hwtest failures

3/19-3/23
sheriff: eugenegonzalez, shawnn, akahuang
Ongoing issues:


    3/5-3/12
    sheriffs: mgild, nvaccaro, itspeter
    Ongoing issues:
    • 813916: "Not enough DUTs for board: expresso"
    • 815308: coral-release fails HWTests; probably compound flakiness from having to pass on 13 different boards in the lab. Unsure what we can do about this.
    • Investigate master-paladin failure. It fails with CL:*581566; ask the author to take a look.
      • Update: the owner submitted again after the other dependent CLs merged. It is merged as well now.
    • 819576: moblab-generic-vm-paladin seems flaky and is failing master-paladin
    • 819695: Merged 819576 into this one, as this one is being actively investigated.
      • moblab-paladin flake: ERROR: Unhandled UpstartServiceNotRunning: Upstart service moblab-gsoffloader-init not in running state. 

    2/26-3/2
    sheriffs: ejcaruso, matthewmwang, wnhuang
    Ongoing issues:
    • 789058: CreateTarball race in DebugSymbols step of several release builders
    • 813916: "Not enough DUTs for board: expresso"
    • 814343: daisy_skate-release: NotEnoughDutsError
    • 814347: falco_li-release: NotEnoughDutsError
    • 814352: nautilus-release fails on cheets test ( adb is not ready in 60 seconds.)
    • 814500: scarlet-release: NotEnoughDutsError
    • 815308: coral-release fails HWTests; probably compound flakiness from having to pass on 13 different boards in the lab. Unsure what we can do about this.
    • 817074: chromeos2-row8-rack2-host5 is in a bad state, causing setzer-release failures
    • 817437: cros-beefy280-c2 is offline
    • 817948: soraka usb/ethernet instability, likely product issue
    Resolved issues:
    • 814345: falco_release: NotEnoughDutsError
    • 815250: newbie-release fails Signing stage and no email notification is generated
    • 816563: ultima-release: NotEnoughDutsError
    • 816584: enguarde-release: NotEnoughDutsError
    • 816983: guado-release failed from one DUT having issues updating
    • 816986: reef-release won't get past CleanUp stage after build crash
    • 817022: veyron_tiger-release is not running
    • 817063: poppy DUTs in verify-repair loop
    • 817126: chromeos2-row3-rack1-host13 is in a bad state, causing soraka-release failures
    • 817478: 2-3-1-9 is down, causing soraka-release failures
    • 817925: fizz EC fails to sign

    2/19-2/23
    sheriffs: norvez, waihong, rongchang
    Ongoing issues:
    • 814340: platform_addPrinter is flaky (low frequency), sometimes hits the paladins
    • 812425: quipper build seems to be failing unittests occasionally
    • 814500: scarlet-release: NotEnoughDutsError
    • 814343: daisy_skate-release: NotEnoughDutsError
    • 814345: falco_release: NotEnoughDutsError
    • 814347: falco_li-release: NotEnoughDutsError
    • 813916: "Not enough DUTs for board: expresso"
    • 814352: nautilus-release fails on cheets test ( adb is not ready in 60 seconds.)
    • 815250: newbie-release fails Signing stage
    • 815308: coral-release fails HWTests - DUTs look really unstable
    • 798618: Intermittent paladin failures with cheets_StartAndroid.stress
    • 813791: Changes to chromeos-base/trunks cause build failures in CQ
    Resolved issues:
    • 814514: veyron_rialto-release: No module named telemetry.core

    2/12-2/16
    sheriffs: snanda, mqg, littlecvr
    Ongoing issues:
    • 811210: HWTest failed due to infra issue (code 3)
      • 811149: widespread DUT pool shortfalls | shards unable to resolve DUTs | dhclient_conf is not set to "yes" on new shards
    • 811217: eve-arcnext-mst-android-pfq:78 failed (PackageBuildFailure: Packages failed in ./build_packages: x11-libs/arc-libdrm)
    • b/72697187: quota increase request for ChromeOS Infrastructure
    • 811402: Master scheduler crashlooping because of malformed HQE
    • 811878: cheets_KeyboardTest fails frequently on multiple boards: retry_count: 2, FAIL: Unhandled AssertionError
      • 569819: temporarily remove cheets_KeyboardTest from CQ <- already reverted
    • 812425: quipper build seems to be failing unittests occasionally
    • 812949: DUT not rebooted in provisioning
    • 798618: Intermittent paladin failures with cheets_StartAndroid.stress
    • 812581: PackageBuildFailure: Packages failed in ./build_packages: chromeos-base/chromeos-chrome
    • 812848: DUTs get stuck in un-abortable state
    • 811697: Cyan HQEs queuing up catastrophically in shard (M66: Caroline, Cyan build is RED for 2 days)

    2/5-2/9
    sheriffs: craigb, adlr, abhishekbh, shenghao
    Ongoing issues:
    • 808923: Chrome PFQ failed due to being unable to download files from GS
    • 808945: HWTest failed due to infra issue (code 3)
    • 809570: edgar-paladin can't mount chroot
    • 809670: CQ blocker: nyan_kitty-paladin fails due to video test errors
    • 810247: wolf-paladin fails: Start browser timeout
    • 810255: cheets_ContainerMount failed and is blocking CQ
    • 810667: coral-paladin failed on pack_firmware_unittest.py

    1/29-2/2
    sheriffs: vpalatin, ravisadineni, abhishekbh
    Ongoing issues:
            
    • 808434: Canaries are failing at the build_packages stage due to a python error.
    • 808563: bvt-arc suite in Canaries builds failing as the server-side tests do not start after scheduling.

    1/22-1/28
    Sheriffs: bmgordon, shapiroc, marcochen

    Ongoing issues:
    • 784914: provision failures: DUT cannot reboot at pre-setup of rootfs update
    • 806287: chromeos4-row11-rack10-host13 is failing to provision
    • 806013: libbrillo has new ASAN error
    • 796275: bvt-arc times out across boards
    • 804977: guado_moblab-paladin is failing for HW tests
    • 782832: not enough daisy_skate devices to keep bvt pool alive
    • b/72397774: intermittent failures to connect to git/gerrit
      • 805928: Release builders failing during ManifestVersionSync stage
        • This looks like potentially a transient git issue
    • 782034: autotest artifacts persist between CQ runs
    Resolved issues:
    • libc++ related changes
      • 805691: qhull fails to build with libc++
      • 805619: buildpackages failing on numerous packages in bare precq run
      • 805722: Clobber incremental builders
      • 805657: arc-camera3-hal-intel-ipu3 fails to build with libc++
    • TKO/mysql issues
      • 806019: tko query pileup | tko restart takes over an hour
      • 805337: TKO database reached maximum size.
      • 804127: shard outage for board:leon, board:nyan_big
      • 805724: job_reporter died, causing passing test to appear aborted
      • 804425: shard outage for board:orco
      • 806011: chromeos-server133 afe serving lots of 5XXs
    • 806106: lumpy-incremental-paladin failure
    • 806107: Not enough lumpy DUTs
    • 806196: Uprev stage fails on cros_mark_as_stable
    • 805517: Betty android pfq tests are failing due to VM AU issue
    • 805710: gale-paladin fails in ap-daemons unit tests
    • 804513: eve-paladin failed to rebuild previously removed ebuild
    • 804372: Missing alerts in Sheriff-o-Matic

    1/8-1/12
    Sheriffs: ahassani, dianders, hiroh

    NOTE: Trying idea of just keeping the week's log in a Google doc.

    Ongoing Issues:
    • 782832: not enough daisy_skate devices to keep bvt pool alive
    • 795902: fizz-release failing since Dec 8
    • 800831: provisioning issue across the board
    • 800943: ls-remote-gerrit rate limit exceeded
    • 800949: canaries had a problem pushing to gerrit
    Resolved Issues:
    • 800426: race between the Android PFQ and the CQ
    • 800132: infrastructure issue caused provisioning failures
    • 800886: moblab-paladin autotest failure

    1/1-1/5
    Sheriffs: frankhu, sarthakkukreti, johnylin

    Ongoing Issues:
    • 782832: not enough daisy_skate devices to keep bvt pool alive
    • 795902: fizz-release failing since Dec 8
    • 788584: kefka/coral-release/paladin: Linksys USB3GIGV1 Ethernet adapter fails to enumerate (r8152, usb X-Y: device not accepting address Z, error -62)
    • 797849: paygen tests from previous canary delayed bvt-arc from canary, causing timeout
    • 789058: UnitTest/Archive steps race: WARNING: CreateTarball: tar: source modification time changed
    • 799669: banon-release: build fails security_SandboxedServices
    • 799604: Lakitu-gpu release: intermittently fails GCETest
    • 798540: jetstream_ApiServerDeveloperConfiguration flake: Timed out waiting for AP to appear operational
    Resolved Issues:
    • 798558: factory-strago-7458.B build failure since 7458.357.0
    • 798618: veyron_minnie-paladin builds fail with cheets_StartAndroid.stress
    • 798649: chromite.scripts.merge_logs_unittest failed due to crossing the year boundary
    • 798273: HWTest bvt-arc aborted cheets_CTS_N* and cheets_GTS* tests stuck on "waiting for cache lock"
    • 797620: Build failure chrome-pfq: login_CryptohomeIncognito and security_ProfilePermissions.guest are failing across multiple boards

    12/25-12/29
    Sheriffs: jcliang, semenzato, renyi

    Ongoing Issues:
    • 783832: cheets_StartAndroid.stress fails on release builders and CQ.  Trying to reproduce.
    • 797620: Build failure chrome-pfq: login_CryptohomeIncognito and security_ProfilePermissions.guest are failing across multiple boards
    • 796254: Archive stage fails in build_image
    • 796275: bvt-arc times out across boards
    • 789058: UnitTest/Archive steps race: WARNING: CreateTarball: tar: source modification time changed - seen across many canaries.
    • 795128: evemu-device did not die resulting in security_SandboxedServices failing
    • 794242: chromeos6-row22-jetstream-host5 repeatedly failing tests
    • 796684: coral-paladin can't finish very often
    • 796737: coral-release hasn't succeeded since build 599
    • 782832: not enough daisy_skate devices to keep bvt pool alive
    • 795912: kefka-release:1758-1773 failed
    • 784914: DUT cannot reboot at pre-setup of rootfs update
    • 715011: nvmem ec test crashes
    • 795902: fizz-release failing since Dec 8
    Resolved Issues:
    • 797314: cheets_MediaPlayerVideoHWDecodeUsed failing across boards
    • 797599: Build failure on coral-release builder: build_packages failed

    12/18-12/22
    Sheriffs: bmgordon, martinroth, chenghan

    Ongoing issues:
    • 797314: cheets_MediaPlayerVideoHWDecodeUsed failing across boards
    • 796254: Archive stage fails in build_image
    • 796275: bvt-arc times out across boards
    • 789058: UnitTest/Archive steps race: WARNING: CreateTarball: tar: source modification time changed - seen across many canaries.
    • 795128: evemu-device did not die resulting in security_SandboxedServices failing
    • 794242: chromeos6-row22-jetstream-host5 repeatedly failing tests
    • 796684: coral-paladin can't finish very often
    • 796737: coral-release hasn't succeeded since build 599
    • 782832: not enough daisy_skate devices to keep bvt pool alive
    • 795912: kefka-release:1758-1773 failed
    • 784914: DUT cannot reboot at pre-setup of rootfs update
    • 715011: nvmem ec test crashes
    • 795902: fizz-release failing since Dec 8
    Resolved issues:
    • 796212: graphics_Idle: Unhandled ZeroDivisionError
    • 796916: reef unibuild config doesn't validate

    12/11-12/15
    Sheriffs: dtor, ecgh, deanliao

    12/04-12/08
    Sheriffs: hungte, rspangler, caveh

    Ongoing issues:
    • 767953: cheets_StartAndroid.stress: FAIL: Android did not boot! (first reported 22-Sep; maybe recurring now)
    • 789077: release: RootfsUpdateError: Update failed with unexpected update status: UPDATE_STATUS_IDLE'
    • 792262: Chrome Pre-Flight Exceptions on M64 Branch
    • 792592: ap-daemons unit test failing with dbus errors on several canaries
    • 792667: CQ failure: Moblab AFE timeout
    • 793356: peach-pit-chrome-pfq failed HWTest because of no DUTs
    • 793447: M63 builders failing with INVALID_BUILD_DEFINITION on stabilize branch
    • 793499: Hwtest provision error on several chrome PFQs and informational PFQs
    Resolved Issues:
    • 791600: Master scheduler is down
    • 791786: caroline-tot-chrome-pfq-informational failed HWTest security_OpenFDs
    • 791916: Master scheduler down with NoHostIdError
    • 792115 -> 791643: TestSimpleChromeWorkflow stage failing due to gsutil creds not updating
    • 792565: desktopui_ScreenLocker failing on betty (--> removed from smoke test)
    • 792753: chromeos-firmware-coral build issues
    • 792985: CQ failure: MySQL Cannot execute statement
    • 757625: smbprovider unit tests failing ASAN builds
    Misc:
    • 792536: Need coral testing for branch builds (--> just turned on; may result in new bugs)


    11/27-12/01
    Sheriffs: drinkcat, athilenius, slavamn

    Ongoing issues

    Needs attention:
    Assigned but not fixed (?):
    • 789062: guado_moblab-paladin failed due to "lxc-clone: command not found"
    • 789451: novato-arc64-release: Target image has run out of space
    • 740408: sheriffing rotation: No sheriff displayed on Monday morning TPE time
      • Patch needs OWNERS review
    • 788628: HWTest bvt-arc keeps timing out on a few boards
    Deputy stuff:
    • 784914: provision failures: DUT cannot reboot at pre-setup of rootfs update
      • 788584: kefka/coral-release/paladin: Linksys USB3GIGV1 Ethernet adapter fails to enumerate (r8152, usb X-Y: device not accepting address Z, error -62)
      • 788589: kefka-release: cannot recover from reboot at post check of stateful update // pre-setup of rootfs update  (duped to above)
    • Missing DUT sadness:
      • 782832: not enough daisy_skate devices to keep bvt pool alive
      • 788586: daisy_spring: Not enough DUTs for board: pool: bvt; required: 4, found: 3
      • 788596: veyron_rialto: No good devices in pool:bvt
      • 780738: M64: FAIL builds of veyron_tiger since 10/28
      • 789352: enguarde: Not enough DUTs for board: enguarde, pool: bvt; required: 4, found: 2
      • 789420: pyro-release: bvt-arc suite timeout
    Resolved issues
    • 788455: lxc-start failing in HWTest for electro and basking
      • 788595: pyro-release: lxc-start failing in HWTest for pyro
      • ultima-release as well
    • 788925: File dir-ROOT-A/opt/google/chrome/libwidevinecdm.so contains unsatisfied symbols: set(['\x07\x01'])
      • Reverted libwidevinecdm change, hmchen and xhwang are looking
      • AI: Could we possibly run ImageTest in Chrome PFQ to avoid this issue next time?
    • 789839: chromium-pfq: BuildPackages: chromeos-chrome: Command 'lsb_release -a' returned non-zero exit status 3
      • Broke -master and pfq for a few builds...
    Misc:
    • 789461: eve-release: cheets_ContainerMount: Mount points are mismatched with the expected list
    • 788017: falco-release times-out at BuildPackage
    • 788592: nefario-release: The BuildPackages [afdo_use] stage failed: Packages failed in ./build_packages: sys-boot/depthcharge
    Flakes and other issues (not fixed but not consistently failing either):
    • 789077: -release: RootfsUpdateError: Update failed with unexpected update status: UPDATE_STATUS_IDLE'
    • 788591: mccloud: graphics_GLMark2: crash in i915_gem_retire_requests_ring/i915_gem_object_move_to_inactive

      11/13-11/17
      Sheriffs: benchan, nsanders, hiroh

      Ongoing issues
      • 784462: Provision failure spike in the lab
        • (Duplicated) 784222: PaygenTestDev failed on multiple canary builds
      • 784225: TestLabException: Not enough DUTs on Chrome-PFQ, Android-PFQ and canary build
      • 784686: veyron_rialto-paladin failed at BuildImage staging due to package: chromeos-base/telemetry
      • 786159: ImportError: No module named lockfile
      • 786159: HWTest failed due to INVALID_OPTIONS
      • 786159: AFE is down: google-sso enforced a new config requirement, breaking our apache servers
      • 786167: auto-update failed with StatefulUpdateError
      • 786395: CQ master failed to push a change with 'git log' errors
      • 786487: reef-uni-paladin failed due to no valid hosts for board:reef-uni
      • 785552: provision failures: DUT cannot recover from reboot at post check of rootfs update

      11/6-11/10
      Sheriffs: puthik, ddavenport, cywang

      Resolved issues
      • 782509: video_ChromeHWDecodeUsed mse tests failed because crosvideo.appspot.com is down.
      • 781845: desktopui_ScreenLocker failing on amd64-generic and betty
      • 781302: slow queries on shards | chromeos-server98 and 104 tick rate is really low
      • 783312: video_ChromeHWDecodeUsed failing on tricky, caroline, lumpy, peppy
      • 781852: CQ failure when there are no CLs in the CQ run
      • 783449: unittest flake in autotest_lib.site_utils.lxc.container_pool.client_unittest.ClientTests.testConnection
      Ongoing issues
      • 776997: cheets_StartAndroid.stress fails and chrome / kernel crashes
      • 783832: cheets_StartAndroid.stress timeout

      10/30-11/6
      Sheriffs: teravest, justincarlson, cywang
      • 782509: widespread "Media.GpuVideoDecoderInitializeStatus not loaded or histogram bucket not found or histogram bucket found at < 100%" - the root cause is "404 in crosvideo.appspot.com". hiroh@ is helping to make a workaround to redirect requests to crosvideo2.appspot.com temporarily.
      • 782577: incorrect dependencies of media-libs/arc-camera3-libcamera_jpeg (Fixed)

      10/30-11/3
      Sheriffs: teravest, justincarlson, fukino
      • 777920: [kernel 3.18] veyron_speedy provision failure: USB enumeration of ethernet adapter fails with "can't set config #1, error -71"
      • 768542: DUT fails to bring up USB ethernet adapter after reboot in provision (chromeos kernel 4.4)
      • 779583: General Protection Fault in kernel-list_move_tail called from i915
        • Causes graphics_Idle failures
      • 780515: daisy_skate-release:1910 failed
        • Paygen failures
      • 780045: BuildPackages failing to build chromeos-chrome
        • This should be resolved, but keep an eye on the next goma update.
      • 780503: cave-release:1635 failed
      • 765686: wizpig-paladin Provision failed: Post-provision check for "system-services" being "start/running" can fail
        • This needs more attention and debugging.

      10/23-10/27
      Sheriffs: akahuang, jinsong, mruthven
      • 777250: HWTest failed to provision on peach_pit and veyron_minnie; let the Chrome gardener triage
      • 776919: lakitu-gpu, lakitu, lakitu paladin failed at build_package, should be fixed by CL:735061 and CL:737773
      • 766259: buildstart stage failing with IntegrityError, a flaky failure.
      • 777829: Most paladins raised exception "process killed by signal 9"

        10/16-10/20
        Sheriffs: groeck, xiaochu, fukino, tetsui
        • 775872: M64: Cyan, Eve, Kefka, Samus build is RED for 4 days

        10/9-10/13
        Sheriffs: jclinton, furquan, posciak
        • 773185: All Chrome PFQ bots failing starting from 63.0.3237.0 due to a syntax error in DEPS
        • 772568: lumpy, peppy, tricky Chrome PFQ failures in vmtest; manual uprev via 773446

        10/2-10/8
        Sheriffs: ntang, djkurtz, phobbs
        • 771396: Lab DNS failure caused widespread master-paladin failure.
        • 771236: Provision failure due to version '9999'
        • 772582: Puppet run may interrupt the ssh_config and cause ssh connection failure.
        • 770778: A few cases of shard apache process death, which needs alerting.
        • 770865: Shard db inconsistent with master db causes shard_client crashloop
        • 770715: Quite a few graphics_drm failures (fixed).
          9/25-10/1
          Sheriffs: chinyue, vbendeb, mxt
          • 769099: autotest-server & autotest-web-frontend circular dep
          • 769334: betty-arc64-paladin failed VMTest
          • 768280: build_image run out of space

            9/18-9/24
            Sheriffs: puneetster, amstan, 

            OLDER ENTRIES MOVED TO THE ARCHIVE so this page doesn't take forever to load.  See Sheriff Log: Chromium OS (ARCHIVE!)