Sheriff Log: Chromium OS (go/croslog)


Please update go/cros-sheriff-playbook when you find a build/infra failure and can map it to what action the sheriff should take for it.

2/12-2/16
sheriffs: snanda, mqg, littlecvr
Ongoing issues:
  • 811210: HWTest failed due to infra issue (code 3)
    • 811149: widespread DUT pool shortfalls | shards unable to resolve DUTs | dhclient_conf is not set to "yes" on new shards
  • 811217: eve-arcnext-mst-android-pfq:78 failed (PackageBuildFailure: Packages failed in ./build_packages: x11-libs/arc-libdrm)
  • b/72697187: quota increase request for ChromeOS Infrastructure
  • 811402: Master scheduler crashlooping because of malformed HQE
  • 811878: cheets_KeyboardTest fails frequently on multiple boards: retry_count: 2, FAIL: Unhandled AssertionError
    • 569819: temporarily remove cheets_KeyboardTest from CQ <- already reverted
  • 812425: quipper build seems to be failing unittests occasionally
  • 812949: DUT not rebooted in provisioning
  • 798618: Intermittent paladin failures with cheets_StartAndroid.stress
  • 812581: PackageBuildFailure: Packages failed in ./build_packages: chromeos-base/chromeos-chrome
  • 812848: DUTs get stuck in un-abortable state
  • 811697: Cyan HQEs queuing up catastrophically in shard (M66: Caroline, Cyan build is RED for 2 days)

2/5-2/9
sheriffs: craigb, adlr, abhishekbh, shenghao
Ongoing issues:
        808923: Chrome PFQ failed because it could not download files from GS
        808945: HWTest failed due to infra issue (code 3)
        809570: edgar-paladin can't mount chroot
        809670: CQ blocker: nyan_kitty-paladin fails due to video test errors
        810247: wolf-paladin fails: Start browser timeout
        810255: cheets_ContainerMount failed and is blocking CQ
        810667: coral-paladin failed on pack_firmware_unittest.py

1/29-2/2
sheriffs: vpalatin, ravisadineni, abhishekbh
Ongoing issues:
        808434: Canaries are failing at the build_packages stage due to a python error.
        808563: bvt-arc suite in canary builds is failing because the server-side tests do not start after scheduling.

1/22-1/28
Sheriffs: bmgordon, shapiroc, marcochen

Ongoing issues:
  • 784914: provision failures: DUT cannot reboot at pre-setup of rootfs update
  • 806287: chromeos4-row11-rack10-host13 is failing to provision
  • 806013: libbrillo has new ASAN error
  • 796275: bvt-arc times out across boards
  • 804977: guado_moblab-paladin is failing for HW tests
  • 782832: not enough daisy_skate devices to keep bvt pool alive
  • b/72397774: intermittent failures to connect to git/gerrit
    • 805928: Release builders failing during ManifestVersionSync stage
      • This looks like potentially a transient git issue
  • 782034: autotest artifacts persist between CQ runs
Resolved issues:
  • libc++ related changes
    • 805691: qhull fails to build with libc++
    • 805619: buildpackages failing on numerous packages in bare precq run
    • 805722: Clobber incremental builders
    • 805657: arc-camera3-hal-intel-ipu3 fails to build with libc++
  • TKO/mysql issues
    • 806019: tko query pileup | tko restart takes over an hour
    • 805337: TKO database reached maximum size.
    • 804127: shard outage for board:leon, board:nyan_big
    • 805724: job_reporter died, causing passing test to appear aborted
    • 804425: shard outage for board:orco
    • 806011: chromeos-server133 afe serving lots of 5XXs
  • 806106: lumpy-incremental-paladin failure
  • 806107: Not enough lumpy DUTs
  • 806196: Uprev stage fails on cros_mark_as_stable
  • 805517: Betty android pfq tests are failing due to VM AU issue
  • 805710: gale-paladin fails in ap-daemons unit tests
  • 804513: eve-paladin failed to rebuild previously removed ebuild
  • 804372: Missing alerts in Sheriff-o-Matic

1/8-1/12
Sheriffs: ahassani, dianders, hiroh

NOTE: Trying idea of just keeping the week's log in a Google doc.

Ongoing Issues:
  • 782832: not enough daisy_skate devices to keep bvt pool alive
  • 795902: fizz-release failing since Dec 8
  • 800831: provisioning issue across the board
  • 800943: ls-remote-gerrit rate limit exceeded
  • 800949: canaries had a problem pushing to gerrit
Resolved Issues:
  • 800426: race between the Android PFQ and the CQ
  • 800132: infrastructure issue caused provisioning failures
  • 800886: moblab-paladin autotest failure

1/1-1/5
Sheriffs: frankhu, sarthakkukreti, johnylin

Ongoing Issues:
  • 782832: not enough daisy_skate devices to keep bvt pool alive
  • 795902: fizz-release failing since Dec 8
  • 788584: kefka/coral-release/paladin: Linksys USB3GIGV1 Ethernet adapter fails to enumerate (r8152, usb X-Y: device not accepting address Z, error -62)
  • 797849: paygen tests from previous canary delayed bvt-arc from canary, causing timeout
  • 789058: UnitTest/Archive steps race: WARNING: CreateTarball: tar: source modification time changed
  • 799669: banon-release: build fails security_SandboxedServices
  • 799604: Lakitu-gpu release: intermittently fails GCETest
  • 798540: jetstream_ApiServerDeveloperConfiguration flake: Timed out waiting for AP to appear operational
Resolved Issues:
  • 798558: factory-strago-7458.B build failure since 7458.357.0
  • 798618: veyron_minnie-paladin builds fail with cheets_StartAndroid.stress
  • 798649: chromite.scripts.merge_logs_unittest failed when crossing the year boundary
  • 798273: HWTest bvt-arc aborted cheets_CTS_N* and cheets_GTS* tests stuck on "waiting for cache lock"
  • 797620: Build failure chrome-pfq: login_CryptohomeIncognito and security_ProfilePermissions.guest are failing across multiple boards

12/25-12/29
Sheriffs: jcliang, semenzato, renyi

Ongoing Issues:
  • 783832: cheets_StartAndroid.stress fails on release builders and CQ.  Trying to reproduce.
  • 797620: Build failure chrome-pfq: login_CryptohomeIncognito and security_ProfilePermissions.guest are failing across multiple boards
  • 796254: Archive stage fails in build_image
  • 796275: bvt-arc times out across boards
  • 789058: UnitTest/Archive steps race: WARNING: CreateTarball: tar: source modification time changed - seen across many canaries.
  • 795128: evemu-device did not die resulting in security_SandboxedServices failing
  • 794242: chromeos6-row22-jetstream-host5 repeatedly failing tests
  • 796684: coral-paladin can't finish very often
  • 796737: coral-release hasn't succeeded since build 599
  • 782832: not enough daisy_skate devices to keep bvt pool alive
  • 795912: kefka-release:1758-1773 failed
  • 784914: DUT cannot reboot at pre-setup of rootfs update
  • 715011: nvmem ec test crashes
  • 795902: fizz-release failing since Dec 8
Resolved Issues:
  • 797314: cheets_MediaPlayerVideoHWDecodeUsed failing across boards
  • 797599: Build failure on coral-release builder: build_packages failed

12/18-12/22
Sheriffs: bmgordon, martinroth, chenghan

Ongoing issues:
  • 797314: cheets_MediaPlayerVideoHWDecodeUsed failing across boards
  • 796254: Archive stage fails in build_image
  • 796275: bvt-arc times out across boards
  • 789058: UnitTest/Archive steps race: WARNING: CreateTarball: tar: source modification time changed - seen across many canaries.
  • 795128: evemu-device did not die resulting in security_SandboxedServices failing
  • 794242: chromeos6-row22-jetstream-host5 repeatedly failing tests
  • 796684: coral-paladin can't finish very often
  • 796737: coral-release hasn't succeeded since build 599
  • 782832: not enough daisy_skate devices to keep bvt pool alive
  • 795912: kefka-release:1758-1773 failed
  • 784914: DUT cannot reboot at pre-setup of rootfs update
  • 715011: nvmem ec test crashes
  • 795902: fizz-release failing since Dec 8
Resolved issues:
  • 796212: graphics_Idle: Unhandled ZeroDivisionError
  • 796916: reef unibuild config doesn't validate

12/11-12/15
Sheriffs: dtor, ecgh, deanliao

12/04-12/08
Sheriffs: hungte, rspangler, caveh

Ongoing issues:
  • 767953: cheets_StartAndroid.stress: FAIL: Android did not boot! (first reported 22-Sep; maybe recurring now)
  • 789077: release: RootfsUpdateError: Update failed with unexpected update status: UPDATE_STATUS_IDLE'
  • 792262: Chrome Pre-Flight Exceptions on M64 Branch
  • 792592: ap-daemons unit test failing with dbus errors on several canaries
  • 792667: CQ failure: Moblab AFE timeout
  • 793356: peach-pit-chrome-pfq failed HWTest because of no DUTs
  • 793447: M63 builders failing with INVALID_BUILD_DEFINITION on stabilize branch
  • 793499: Hwtest provision error on several chrome PFQs and informational PFQs
Resolved Issues:
  • 791600: Master scheduler is down
  • 791786: caroline-tot-chrome-pfq-informational failed HWTest security_OpenFDs
  • 791916: Master scheduler down with NoHostIdError
  • 792115 -> 791643: TestSimpleChromeWorkflow stage failing due to gsutil creds not updating
  • 792565: desktopui_ScreenLocker failing on betty (--> removed from smoke test)
  • 792753: chromeos-firmware-coral build issues
  • 792985: CQ failure: MySQL Cannot execute statement
  • 757625: smbprovider unit tests failing ASAN builds
Misc:
  • 792536: Need coral testing for branch builds (--> just turned on; may result in new bugs)


11/27-12/01
Sheriffs: drinkcat, athilenius, slavamn

Ongoing issues

Needs attention:
Assigned but not fixed (?):
  • 789062: guado_moblab-paladin failed due to "lxc-clone: command not found"
  • 789451: novato-arc64-release: Target image has run out of space
  • 740408: sheriffing rotation: No sheriff displayed on Monday morning TPE time
    • Patch needs OWNERS review
  • 788628: HWTest bvt-arc keeps timing out on a few boards
Deputy stuff:
  • 784914: provision failures: DUT cannot reboot at pre-setup of rootfs update
    • 788584: kefka/coral-release/paladin: Linksys USB3GIGV1 Ethernet adapter fails to enumerate (r8152, usb X-Y: device not accepting address Z, error -62)
    • 788589: kefka-release: cannot recover from reboot at post check of stateful update // pre-setup of rootfs update  (duped to above)
  • Missing DUT sadness:
    • 782832: not enough daisy_skate devices to keep bvt pool alive
    • 788586: daisy_spring: Not enough DUTs for board: pool: bvt; required: 4, found: 3
    • 788596: veyron_rialto: No good devices in pool:bvt
    • 780738: M64: FAIL builds of veyron_tiger since 10/28
    • 789352: enguarde: Not enough DUTs for board: enguarde, pool: bvt; required: 4, found: 2
    • 789420: pyro-release: bvt-arc suite timeout
Resolved issues
  • 788455: lxc-start failing in HWTest for electro and basking
    • 788595: pyro-release: lxc-start failing in HWTest for pyro
    • ultima-release as well
  • 788925: File dir-ROOT-A/opt/google/chrome/libwidevinecdm.so contains unsatisfied symbols: set(['\x07\x01'])
    • Reverted libwidevinecdm change, hmchen and xhwang are looking
    • AI: Could we possibly run ImageTest in Chrome PFQ to avoid this issue next time?
  • 789839: chromium-pfq: BuildPackages: chromeos-chrome: Command 'lsb_release -a' returned non-zero exit status 3
    • Broke -master and pfq for a few builds...
Misc:
  • 789461: eve-release: cheets_ContainerMount: Mount points are mismatched with the expected list
  • 788017: falco-release times-out at BuildPackage
  • 788592: nefario-release: The BuildPackages [afdo_use] stage failed: Packages failed in ./build_packages: sys-boot/depthcharge
Flakes and other issues (not fixed but not consistently failing either):
  • 789077: -release: RootfsUpdateError: Update failed with unexpected update status: UPDATE_STATUS_IDLE'
  • 788591: mccloud: graphics_GLMark2: crash in i915_gem_retire_requests_ring/i915_gem_object_move_to_inactive

    11/13-11/17
    Sheriffs: benchan, nsanders, hiroh

    Ongoing issues
    • 784462: Provision failure spike in the lab
      • (Duplicated) 784222: PaygenTestDev failed on multiple canary builds
    • 784225: TestLabException: Not enough DUTs on Chrome-PFQ, Android-PFQ and canary build
    • 784686: veyron_rialto-paladin failed at BuildImage staging due to package: chromeos-base/telemetry
    • 786159: ImportError: No module named lockfile
    • 786159: HWTest failed due to INVALID_OPTIONS
    • 786159: AFE is down: google-sso enforced a new config requirement, breaking our apache servers
    • 786167: auto-update failed with StatefulUpdateError
    • 786395: CQ master failed to push a change with 'git log' errors
    • 786487: reef-uni-paladin failed due to no valid hosts for board:reef-uni
    • 785552: provision failures: DUT cannot recover from reboot at post check of rootfs update

    11/6-11/10
    Sheriffs: puthik, ddavenport, cywang

    Resolved issues
    • 782509: video_ChromeHWDecodeUsed mse tests fail because crosvideo.appspot.com is down.
    • 781845: desktopui_ScreenLocker failing on amd64-generic and betty
    • 781302: slow queries on shards | chromeos-server98 and 104 tick rate is really low
    • 783312: video_ChromeHWDecodeUsed failing on tricky, caroline, lumpy, peppy
    • 781852: CQ failure when there are no CLs in the CQ run
    • 783449: unittest flake in autotest_lib.site_utils.lxc.container_pool.client_unittest.ClientTests.testConnection
    Ongoing issues
    • 776997: cheets_StartAndroid.stress fails and chrome / kernel crashes
    • 783832: cheets_StartAndroid.stress timeout

    10/30-11/6
    Sheriffs: teravest, justincarlson, cywang
    • 782509: widespread "Media.GpuVideoDecoderInitializeStatus not loaded or histogram bucket not found or histogram bucket found at < 100%" failures. The root cause is a 404 from crosvideo.appspot.com. hiroh@ is working on a temporary workaround that redirects requests to crosvideo2.appspot.com.
    • 782577: incorrect dependencies of media-libs/arc-camera3-libcamera_jpeg (Fixed)

    10/30-11/3
    Sheriffs: teravest, justincarlson, fukino
    • 777920: [kernel 3.18] veyron_speedy provision failure: USB enumeration of ethernet adapter fails with "can't set config #1, error -71"
    • 768542: DUT fails to bring up USB ethernet adapter after reboot in provision (chromeos kernel 4.4)
    • 779583: General Protection Fault in kernel-list_move_tail called from i915
      • Causes graphics_Idle failures
    • 780515: daisy_skate-release:1910 failed
      • Paygen failures
    • 780045: BuildPackages failing to build chromeos-chrome
      • This should be resolved, but keep an eye on the next goma update.
    • 780503: cave-release:1635 failed
    • 765686: wizpig-paladin Provision failed: Post-provision check for "system-services" being "start/running" can fail
      • This needs more attention and debugging.

    10/23-10/27
    Sheriffs: akahuang, jinsong, mruthven
    • 777250: HWTest failed to provision on peach_pit and veyron_minnie; left for the Chrome gardener to triage
    • 776919: lakitu-gpu, lakitu, lakitu paladin failed at build_package, should be fixed by CL:735061 and CL:737773
    • 766259: buildstart stage failing with IntegrityError, a flaky failure.
    • 777829: Most paladins raised exception "process killed by signal 9"

      10/16-10/20
      Sheriffs: groeck, xiaochu, fukino, tetsui
      • 775872: M64: Cyan, Eve, Kefka, Samus build is RED for 4 days

      10/9-10/13
      Sheriffs: jclinton, furquan, posciak
      • 773185: All Chrome PFQ bots failing starting from 63.0.3237.0 due to a syntax error in DEPS
      • 772568: lumpy, peppy, tricky Chrome PFQ failures in vmtest; manual uprev via 773446

      10/2-10/8
      Sheriffs: ntang, djkurtz, phobbs
      • 771396: Lab DNS failure caused widespread master-paladin failure.
      • 771236: Provision failure due to version '9999'
      • 772582: Puppet run may disrupt ssh_config and cause ssh connection failures.
      • 770778: A few cases of shard apache process death, which needs alerting.
      • 770865: Shard db inconsistent with master db causes shard_client crashloop
      • 770715: Quite a few graphics_drm failures (fixed).

        9/25-10/1
        Sheriffs: chinyue, vbendeb, mxt
        • 769099: autotest-server & autotest-web-frontend circular dep
        • 769334: betty-arc64-paladin failed VMTest
        • 768280: build_image runs out of space

          9/18-9/24
          Sheriffs: puneetster, amstan, 

          OLDER ENTRIES MOVED TO THE ARCHIVE so this page doesn't take forever to load.  See Sheriff Log: Chromium OS (ARCHIVE!)