Sheriff Log: Chromium OS (go/croslog)

2016-05-26 to 27
Sheriff: jrbarnette, waihong, wnhuang
  • 615474: x86-alex-paladin HWTest timeout abort
  • 615151: guado_moblab failing provision because moblab-scheduler-init isn't running

2016-05-25
Sheriff: djkurtz
Gardeners: slavamn, puthik
    • 614579: [bvt-inline] security_ASLR Failure on daisy_skate-chrome-pfq/R53-8368.0.0-rc2
    • 614606: nyan-release consistently failing signing
    • 615029: minnie failing to sign
    2016-05-24
    Sheriff: littlecvr
    Gardeners: stevenjb, levarum
    • 613868: build141-m2 has been swapped, but a restart is needed. The restart is scheduled for EOD (PDT).
    • 614261: build141-m2 was replaced by build257-m2, but build257-m2 died as well.

    2016-05-23
    Sheriff: littlecvr
    Gardeners: stevenjb, levarum
    • 613868: build141-m2 is offline and there is no backup.
    • 612688: KioskTests are flaky on ChromiumOS bots.
    • 611405: ASan builders failed when building update_engine
    • 614040: cyan-cheets continues to fail with PoolHealthBug

    2016-05-18
    Sheriff: martinroth, wfrichar
    Deputy: akeshet
    • p/53507: VMTests have been failing for several days in the canary builds because DisplayLinkManager is crashing.

    2016-05-16
    Sheriff: robotoboy, dtor
    Deputy:
    • 611405: ASan builders failed when building update_engine - deymo@: A CL landed on AOSP last week to fix that; there's an uprev blocked on some CQ issues that I'll get to today.

    2016-05-12
    Sheriff: ravisadineni, zqiu
    Deputy: shuqianz
    • [ONGOING] mccloud-release, stumpy-release (Issue 609926): FAIL: Powerwash count didn't increase after powerwash cycle
    • [FLAKE] paygen issue (Issue 605181, Issue 606071): paygen_au_dev,autoupdate_EndToEndTest.paygen_au_dev_full,Failed to receive a download finished notification (download_finished) within 600 seconds. This could be a problem with the updater or a connectivity issue. For more details, check the update_engine log (in sysinfo or on the DUT, also included in the test log.
    • [RESOLVED] relm-release (Issue 611528): doins failed.

    2016-05-11
    Sheriff: reveman, sonnyrao, tbroch
    Deputy: shuqianz
    • [RESOLVED] everything : manifestversionedsync: GoB quota issue (611084, b/28721585, b/28720367)
      • veyron_pinky-release
      • samus-paladin : Tried fetch locally and it worked.
        • RunCommandError: return code: 128; command: git fetch -f https://chrome-internal-review.googlesource.com/chromeos/ap-daemons refs/changes/27/258727/1
        • fatal: remote error: Git repository not found
    • [FLAKE] zako-release : paygen : (605181)
      • paygen_au_dev,autoupdate_EndToEndTest.paygen_au_dev_full,Failed to receive a download finished notification (download_finished) within 600 seconds. This could be a problem with the updater or a connectivity issue. For more details, check the update_engine log (in sysinfo or on the DUT, also included in the test log
    • [EXPECTED] gru-release : chromeos-initramfs emerge fails (605597)
    • [RESOLVED] master-paladin : daisy_skate-paladin: The HWTest [bvt-inline] stage failed: ** HWTest did not complete due to infrastructure issues (code 3) ** 
      • provision_AutoUpdate.double [ FAILED ] 
      • provision_AutoUpdate.double ABORT: None
        • [FLAKE] Tests were running successfully, but the suite aborted at ~30 min, even though it says it should run for 90 min.

    2016-05-10
    Sheriff: reveman, sonnyrao, tbroch
    Deputy: shuqianz
    • [ONGOING] guado_moblab - [provision]: FAIL: Moblab has 0 Ready DUTs, completed successfully (610727, repair: b/28690294)
    • [RESOLVED] devserver issue: (b/28704856)
    • [INFO] buildbot slave shutdowns on 5/9 for emergency maintenance are causing some fallout (paladins)
    • [FLAKE] ninja-release - [bvt-inline]: FAIL 62794807-chromeos-test/chromeos4-row3-rack9-host6/provision_AutoUpdate provision
      • Unhandled AutoservSSHTimeout: ('ssh timed out', * Command:
      • flake?  Host is fine now.
    • [RESOLVED] veyron_speedy-paladin, daisy_skate-release - [bvt-cq]: Exception waiting for results, JSONRPCException: Error decoding JSON response (606071)
    • [RESOLVED] *-cheets-android-pfq [buildPackages]: autotest-cheets-* import error: No module named cros.graphics.drm (b/28694363)
    • [RESOLVED] Lars builds down for hardware swap ()

    2016-05-06 to 09
    Sheriff: mruthven, rspangler, kcwu
    Gardener: jennyz

    More detailed notes on our shift are here.

    Stuff that broke and was fixed:
    • Lots of other release builders failing with "timed out", "didn't start", or on Sync-Chrome on Friday.  dgarrett@ said the release builders are being reorganized and will be highly unreliable.  Cleared up over the weekend.
    • CQ failed on multiple paladins HWTest with two types of failure, but both seem to have the same underlying cause in the logs.  Filed 610000 and throttled tree.
      • [bvt-inline] - logging_CrashSender: retry_count: 2, FAIL: Simple minidump send failed
      • [bvt-cq] - logging_UserCrash: FAIL: Did not find version 8288.0.0-rc2 in log output
      • Cause: CL 342574 (fixed)
    • [bvt-cq] - graphics_Gbm: FAIL: Gbm test failed().  Bad CL has been identified and fixed.
    Stuff that's still broken:
    • veyron_rialto-release fails: BuildPackages: Cannot find prebuilts for chromeos-base/chromeos-chrome.  (590784)
    • stout-paladin builder (build126-m2) is offline (609682)
    • daisy_skate-release - AUTest misconfigured (610088)
    • CQ failed with CommitQueueSync errors on multiple paladins (server hung up unexpectedly), but passed on the next run.  Seems to happen in the afternoon.

    2016-05-05
    Sheriff: groeck, furquan
    Gardener: jennyz
    • 609610: MobLab ToT not showing network bridge
    2016-05-03
    Sheriff: johnylin
    • 609054: M52: Failed to update the status for master-release
      • Error message: "fatal: could not read Username for 'https://chrome-internal.googlesource.com': No such device or address"
      • Many CQ/PFQ build failures are related to this as well
        • CQ: failed CommitQueueSync
        • PFQ: failed MasterSlaveLKGMSync
    • 608838: Some video/media tests are temporarily waived on veyron
      • The workaround needs to be reverted once this is fixed

    2016-05-03
    Sheriff: johnylin
    • Powerwash flakes on Canaries 605325:
      • https://uberchromegw.corp.google.com/i/chromeos/builders/beltino-b-release-group
      • https://uberchromegw.corp.google.com/i/chromeos/builders/jecht-release-group  => almost never passed
      • https://uberchromegw.corp.google.com/i/chromeos/builders/rambi-d-release-group/builds/1760
      • https://uberchromegw.corp.google.com/i/chromeos/builders/sandybridge-release-group  => almost never passed
    • Paygen flakes on Canaries 516795:
      • https://uberchromegw.corp.google.com/i/chromeos/builders/enguarde-release/builds/124
    • Build failures on lakitu-release:
    • HWTest flakes on Canaries:
      • https://uberchromegw.corp.google.com/i/chromeos/builders/rambi-c-release-group/builds/2225
      • https://uberchromegw.corp.google.com/i/chromeos/builders/rambi-d-release-group/builds/1760
      • https://uberchromegw.corp.google.com/i/chromeos/builders/rambi-e-release-group/builds/1063
      • https://uberchromegw.corp.google.com/i/chromeos/builders/slippy-release-group
    • Some autoupdate rollback failures in terra / wizpig / reks / celes / ultima. Lab network issue? 596262
      • https://uberchromegw.corp.google.com/i/chromeos/builders/strago-b-release-group
      • https://uberchromegw.corp.google.com/i/chromeos/builders/strago-release-group/builds/1135
    • Not enough disk space on veyron-b-release-group 605601
    • CQ:
      • veyron_rialto is failing with "ERROR: Cannot find prebuilts for chromeos-base/chromeos-chrome on veyron_rialto"
        • Has been failing for a long time. Tracked in 590784.


    2016-04-22
    Sheriff: drinkcat
    • Lab issue? 605464
      • I think it got better
    • CQ:
      • One instance of a "nyan-full-compile-paladin did not start": seems like random flake
    • Canaries
      • Paygen on link almost never passes 605849
    • Chrome-PFQ:
      • BuildImage: ERROR: test_elf_deps: Failed dependency check (chromium-pfq on arm/x86 platforms) 605851
        • Chumped a revert, but the bug the original CL was fixing is also P0: 601854, please coordinate with ihf & gardener.
    • Android-PFQ:
      • chrome gs handler issue: some files do not have a md5 sum 605861

    2016-04-21
    Sheriff: drinkcat, denniskempin, dbasehore
    • Lab issue? 605464
      • wolf-paladin fail: wolf-tot-paladin/builds/6443 wolf-paladin/builds/10777
      • A number of HWTest timeouts:
        • https://uberchromegw.corp.google.com/i/chromeos/builders/auron-b-release-group/builds/1473
        • https://uberchromegw.corp.google.com/i/chromeos/builders/beltino-a-release-group/builds/2087
        • https://uberchromegw.corp.google.com/i/chromeos/builders/beltino-b-release-group/builds/2100
        • https://uberchromegw.corp.google.com/i/chromeos/builders/enguarde-release/builds/89
        • https://uberchromegw.corp.google.com/i/chromeos/builders/rambi-d-release-group/builds/1725
        • https://uberchromegw.corp.google.com/i/chromeos/builders/slippy-release-group/builds/3631
        • https://uberchromegw.corp.google.com/i/chromeos/builders/strago-b-release-group/builds/549
        • https://uberchromegw.corp.google.com/i/chromeos/builders/veyron-b-release-group/builds/1473
        • https://uberchromegw.corp.google.com/i/chromeos/builders/kunimitsu-release-group/builds/796 => no, different stuff
        • https://uberchromegw.corp.google.com/i/chromeos/builders/slippy-release-group/builds/3632
      • Paygen failure:
        • https://uberchromegw.corp.google.com/i/chromeos/builders/beltino-a-release-group/builds/2087
        • https://uberchromegw.corp.google.com/i/chromeos/builders/jecht-release-group/builds/1404
        • https://uberchromegw.corp.google.com/i/chromeos/builders/jecht-release-group/builds/1405
        • https://uberchromegw.corp.google.com/i/chromeos/builders/nyan-release-group/builds/2750
        • https://uberchromegw.corp.google.com/i/chromeos/builders/rambi-b-release-group/builds/2297
        • https://uberchromegw.corp.google.com/i/chromeos/builders/rambi-c-release-group/builds/2190
        • https://uberchromegw.corp.google.com/i/chromeos/builders/rambi-d-release-group/builds/1725
        • https://uberchromegw.corp.google.com/i/chromeos/builders/rambi-e-release-group/builds/1029
        • https://uberchromegw.corp.google.com/i/chromeos/builders/strago-c-release-group/builds/415
        • https://uberchromegw.corp.google.com/i/chromeos/builders/smaug-release/builds/1037
        • https://uberchromegw.corp.google.com/i/chromeos/builders/glados-release-group/builds/910
        • https://uberchromegw.corp.google.com/i/chromeos/builders/auron-release-group/builds/1822
        • https://uberchromegw.corp.google.com/i/chromeos/builders/ivybridge-release-group/builds/1917
    • CQ
      • *-cheets-paladin fail in HWTest: 605309
    • Canaries
      • rambi-release BuildPackages timeout: 605402 . Likely a flake.
      • Minor guado_moblab-release BuildPackages issue: 605408
      • guado_moblab-release HWTest: 605409
        • https://uberchromegw.corp.google.com/i/chromeos/builders/guado_moblab-release/builds/883
      • x86-alex/x86-zgb/x86-alex_he: chromeos-kernel-3_8: undefined reference to `watchdog_dev_unregister': 605458
      • cros_make_image_bootable is failing 605587
      • veyron_rialto still failing due to lack of chrome prebuilt: 597966
      • More powerwash failures 605325
        • https://uberchromegw.corp.google.com/i/chromeos/builders/rambi-d-release-group/builds/1726
        • https://uberchromegw.corp.google.com/i/chromeos/builders/sandybridge-release-group/builds/2363
        • https://uberchromegw.corp.google.com/i/chromeos/builders/beltino-b-release-group/builds/2101
      • More autoupdate failures 605181
        • https://uberchromegw.corp.google.com/i/chromeos/builders/daisy-release-group/builds/4887
        • https://uberchromegw.corp.google.com/i/chromeos/builders/beltino-a-release-group/builds/2088
      • More /dev/loop0 issues 605176
        • https://uberchromegw.corp.google.com/i/chromeos/builders/pineview-release-group/builds/2300
      • security_test_image failing on amd64-generic-goofy-release 605595
        • https://uberchromegw.corp.google.com/i/chromeos/builders/amd64-generic-goofy-release/builds/231
      • Gru is failing to build chromeos-initramfs 605597
        • https://uberchromegw.corp.google.com/i/chromeos/builders/gru-release-group/builds/60
      • Not enough disk space on veyron-b-release-group 605601
        • https://uberchromegw.corp.google.com/i/chromeos/builders/veyron-b-release-group/builds/1474
      • Enguarde builds packages for >7 hours, gets killed. 605608
        • https://uberchromegw.corp.google.com/i/chromeos/builders/enguarde-release/builds/90
      • gale-release failing to build chromeos-bootimage 605638
        • https://uberchromegw.corp.google.com/i/chromeos/builders/gale-release/builds/57
        • https://uberchromegw.corp.google.com/i/chromeos/builders/gale-release/builds/58
    • PFQ
      • Chrome fails to build on all PFQs 605592

    2016-04-20
    Sheriff: jcliang, denniskempin, dbasehore
    • CQ
      • veyron_rialto has been failing for ages due to lack of chrome prebuilt: 597966
    • Canaries
      • stumpy pool health bug: 596647
      • Powerwash is still failing on multiple boards: 589030
      • Intermittent au test failures on multiple boards. Looks like infra flakes.
      • auron-release-group and ivybridge-release-group keep failing paygen 605159
      • auron-b-release-group fails to build image 605155
      • daisy-release-group failing hwtest 535795
      • Hosts not returning after powerwash 589089
      • rambi-e-release-group is having issues with /dev/loop0 605176
      • Tons of failures of autoupdate_EndToEndTest.paygen 605181
      • Veyron-b builders still out of space: bug
      • AutoservRunError on guado_moblab-paladin 605241
    • PFQ
      • nyan-chrome-pfq fails to build packages 605202
    2016-04-19
    Sheriff: jcliang, puneetster, charliemooney
    • Powerwash is still failing on multiple boards: 589030
    • panther pool health bug: 597744
    • Veyron-b builders running out of space: bug
    • It looks like the master builder crashed and took out several slaves, but then recovered gracefully.
    • Chrome PFQ did not update over the weekend.  Working with dimu@ and ketaki@ to figure out why
    • Lulu Cheets failing to sign bug
    • "Timeout deadline set by master" error in PayGen for Auron bug
    • Alex's missing in the pool bug

    2016-04-18
    Sheriff: puneetster, charliemooney
    • Powerwash continues to fail bug
    • Not enough builders in the pool, killing some canaries bug
    • Generic SSH (255) errors continue bug

    2016-04-14
    Sheriff: bleung, briannorris, cywang
    • CQ
      • Pool Health Bug (almost all boards are affected, peppy-paladin:601988 wolf-paladin:603450 veyron-speedy-paladin:603455, daisy-skate-paladin:603456, ...)
        • Machines in pool:{cq, bvt} are all marked as 'Repair Failed', so no bvt-cq or bvt-inline suites can be executed.
        • Clicked the 'Repair' button on a failed DUT, but in vain.
    • Canaries
    • PFQ
      • issue 603169 : extensions_to_load has been moved to browser options, which caused a hiccup on Chrome PFQ during the transition; fixed by achuith
    2016-04-13
    Sheriff: bleung, briannorris, cywang
    • CQ
    • Canaries
    • PFQ
    • Other
      • issue 603248 : gizmo-paladin and gizmo-release builders were removed yesterday, but they still appear on the waterfall, failing again and again. Waterfall may need to be restarted.

    2016-04-12
    Sheriff: cychiang, bleung, adlr
    2016-04-11
    Sheriff: cychiang


    2016-04-07
    Sheriff: shchen, briannorris

    Redirects to log files are now working again.  No more hand-modifying urls :).

    • CQ
      • There is an ongoing provisioning error (598517) that's hitting the CQ with the error: FAIL: Failed to install device image using payload at http://100.107.160.2:8082/update/peach_pit-paladin/R51-8162.0.0-rc2 on chromeos4-row7-rack13-host11. SSH reports a generic error (255) which is probably a lab network failure.  It is still under investigation.  I think that I saw it at least 5 times during my sheriffing shift.
      • veyron_speedy had failed three times, twice with a provisioning error: "Update failed. Returned update_engine error code: ERROR_CODE=49, ERROR_MESSAGE=ErrorCode::kNonCriticalUpdateInOOBE. Reported error: AutoservRunError". This is a known issue: 600737.
      • veyron_minnie-cheets failed with a timeout error.  I checked the individual tests in the suite and they seemed to all pass (nothing aborted).  I contacted the deputy and he added more DUTs to the pool for minnie to hopefully rectify this situation.
    • PFQ
      • HWTest and VMTest failures on daisy_skate and lumpy possibly caused by dev tools regression: 601533
      • veyron_minnie-cheets failed with the same timeout error described above.  Hopefully the additional DUTs will resolve this situation.
      • cyan-cheets failed 6/8 runs due to timeouts.  There were many jobs aborted so it seems that there was a significant shortage of machines for this platform.  Informed the deputy and he increased the allocation of DUTs from 6 to 11.
    • Canaries
      • The canaries look sad.  About half are failing for various reasons below:
        • Timeouts: ivybridge (during paygen), rambi-c (during paygen), rambi (during buildpackages), strago-b (this is due to cyan-cheets, which just upped its allocation), veyron (during paygen)
        • Powerwash (host did not return from reboot): jecht, beltino-a
        • Powerwash (Powerwash count didn't increase after powerwash cycle): beltino-b
        • autoupdate_Rollback (host did not return from reboot): kunimitsu
        • build_image: pineview
        • tar: chromiumos_base_image.bin: file changed as we read it: rambi-b
        • slippy and strago failing with "TestLabException: Number of available DUTs for board falco_li pool bvt is 0, which is less than the minimum value 1."  Bugs 590398 and 590522 were automatically filed, but do not seem to have been triaged.  Pinged deputy.
    2016-04-06
    Sheriff: shchen, adlr

    Notes:
    Log files are still broken.  Workaround described in b/27653354.  The solution is to take https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/ and append the test name to it.
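
The workaround is plain string concatenation, so it can be sketched in a few lines of Python. This is only a sketch: `results_url` is a hypothetical helper name, the "test name" is assumed to be the job tag shown next to the job (for example 59112285-chromeos-test, used in the cyan-cheets notes below), and the optional `<hostname>/debug/` suffix assumes the storage browser follows the folder layout described in those notes.

```python
# Base URL copied from the note above.
RESULTS_BASE = (
    "https://pantheon.corp.google.com/storage/browser/"
    "chromeos-autotest-results/"
)


def results_url(job_tag, hostname=""):
    """Build the results URL for a job; hypothetical helper, not lab tooling.

    job_tag  -- the test name shown in parentheses next to the job name
    hostname -- optional DUT hostname; if given, point at <hostname>/debug/
                (assumed to mirror the folder layout in the storage browser)
    """
    url = RESULTS_BASE + job_tag.strip("/")
    if hostname:
        url += "/" + hostname.strip("/") + "/debug/"
    return url


print(results_url("59112285-chromeos-test"))
```

Passing a hostname as the second argument is optional; without it the URL points at the job folder, where the per-host folder can be picked by hand.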
    • CQ
      • 601224:  buildpackages error on glados, strago, cyan-cheets.  Error in iwl7000 wireless driver.  The merge has been reverted.
        • So apparently merges do not show up in the change list on the builder pages.  I had an instance where a merge occurred (without me knowing) and I could not figure out what was causing the error from the waterfall pages.  It was in the kernel code, but there was only 1 kernel CL that was unrelated.  What I ended up having to do was find the hash used for the build.  It looks like:

          <project name="chromiumos/third_party/kernel" path="src/third_party/kernel/v3.18" revision="b850f41a01164fe1eb4cf76b5178194d53394130"/>

          and matching that up with a commit in 
          https://chromium.googlesource.com/chromiumos/third_party/kernel/+/chromeos-3.18.
      • whirlwind has failed three times in a row with jetstream test failures.  Deputy is trying to track down the error at 593404.  This problem seems to have been fixed.
      • cyan-cheets is failing due to a timeout: "ERROR: Timeout occurred- waited 13461 seconds, failing. Timeout reason: Slave reached the timeout deadline set by master."
        • Dug into the log, and it seems the provisioning stage never connected to the machine because it was down.  Checked on the state of the machine in the lab and it seems to be up and running again.  Will keep an eye on the test to make sure that it doesn't happen again today/tomorrow.
          • To find the error, go to the cyan-cheets builder and click on the last build, in this case #37.  Scroll down to the test (HWTest) and click "link to suite", which will take you to the Autotest results.  There, find the failed job and click on that test; from there you can find the logs to search through.
          • Currently, the log redirect links are failing, so you need to get to them with the instructions in the Notes above.  Take the test name (found in parentheses next to the job name) and append it to the link above.  So, you'll end up going to: https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/59112285-chromeos-test.  You'll see a folder for the hostname of the machine.  The logs are in <hostname>/debug/.
          • To check status of host, click on the hostname (to the left of the currently broken log links).  This will take you to the host's page and you can check the status of it there.
    • PFQ
      • veyron_rialto is failing with "ERROR: Cannot find prebuilts for chromeos-base/chromeos-chrome on veyron_rialto"
        • This has been failing forever.  I checked back 200 builds (as far as I could) and they're all failing with the same error.
        • 597966 has already been filed for it, but it remains untriaged.  Pinged for update.
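
The manifest lookup described under CQ above (finding the kernel hash used for a build when merges don't show up in the change list) could be scripted roughly as follows. This is a sketch under assumptions: the manifest is taken to be available as XML text, the `<manifest>` wrapper element and the helper name `project_revision` are made up for illustration, and only the `<project .../>` line itself comes from the notes.

```python
import xml.etree.ElementTree as ET


def project_revision(manifest_xml, project_name):
    """Return the pinned revision for project_name, or None if not found."""
    root = ET.fromstring(manifest_xml)
    for project in root.iter("project"):
        if project.get("name") == project_name:
            return project.get("revision")
    return None


# The <project> line is the one quoted in the notes; the surrounding
# <manifest> element is an assumption so the sample parses on its own.
MANIFEST = """<manifest>
  <project name="chromiumos/third_party/kernel" path="src/third_party/kernel/v3.18" revision="b850f41a01164fe1eb4cf76b5178194d53394130"/>
</manifest>"""

print(project_revision(MANIFEST, "chromiumos/third_party/kernel"))
```

The printed hash is what then gets matched against the commits on the chromeos-3.18 branch linked above.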

    2016-04-05
    Sheriff: aaboagye, abhishekbh

    To next sheriff(s):
    !! There's an issue where trying to get the logs from a test suite returns "Not Found". See b/27653354 for more details. !!
    I expect the lakitu incremental builders to both go green once CL:337302 lands.
    Canaries will probably fail due to timeouts (it's a known issue), but check the slave builder for any non-timeout related failure like Paygen or AUTest.
    PFQs should go green since the DUTs were repaired or replaced. Watch 598517 for updates regarding the generic SSH errors.
    amd64-generic-asan will continue to fail until this CL lands.
    • CQ
      • The link paladin failed during the provision step with an error that says "provision: FAIL: Failed to install device image using payload". It appears to be an update_engine error with an error code of "kNonCriticalUpdateInOOBE".
        • crbug.com/600737 was filed to track it.
        • It doesn't seem to be related to any one board.
    • PFQ
      • Yesterday the PFQs were green, but today there seem to be some issues present.
      • There are at least two different issues here that both occur during the provision step:
        • "SSH reports a generic error (255) which is probably a network failure" -> crbug.com/598517
        • "Update failed. The priority of the inactive kernel partition is less than that of the active kernel partition." -> crbug.com/599893
    • Canaries
      • Previous runs of the canary builders were still timing out. There was also one run where they just failed to start, but the timeout issues are still prevalent.
      • Towards the end of the day,  powerwash issues surfaced for beltino-a, jecht, and rambi canaries. -> crbug.com/600892
    • Incremental
      • Since build #7794, the lakitu-incremental builder has been failing VMTest for the test logging_UserCrash.
        • Since it was an autotest failure, searching for the string "END FAIL" in the stdio leads to the error message pretty quickly.
        • Filed crbug.com/600774.
        • This seemed to be caused by an inadvertent change due to rebasing and patch shuffling. Unfortunately, it wasn't caught by the CQ since the CQ doesn't run VMTests.
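
The "END FAIL" search mentioned above is easy to do by hand, but for a long stdio dump a small script saves scrolling. A minimal sketch, assuming the stdio has been saved as text; `find_failures` is a hypothetical helper and the sample lines are invented for illustration, not copied from a real build.

```python
def find_failures(stdio_text, marker="END FAIL"):
    """Return (line_number, stripped_line) for each line containing marker."""
    return [
        (number, line.strip())
        for number, line in enumerate(stdio_text.splitlines(), start=1)
        if marker in line
    ]


# Invented sample; a real stdio log will look different.
SAMPLE = "\n".join([
    "START logging_UserCrash",
    "  ...test output...",
    "END FAIL logging_UserCrash",
])

for number, line in find_failures(SAMPLE):
    print(number, line)
```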

    2016-04-04
    Sheriff: aaboagye, abhishekbh, vapier

    This morning there were a bunch of paladin failures, with the CQ master failing 26(!) consecutive times. Because of this, I throttled the tree: there's no use in trying new changes until we get the CQ actually finishing correctly.
    • Guado moblab is one of the main offenders. To get to the debug logs, I clicked on one of the failed builds, scrolled to the HWTest section, clicked on "Link to suite".
      • Once there, I clicked on the failed test, provision. (Shows up in a purple box) Then on the new page that opened, clicked on "view all logs".
      • From there, navigated to the debug directory and took a look at "autoserv.DEBUG".
        • Searched until I found the string "Autotest caught exception when running test". Just above that line, it shows the command that was attempted. In this case it was "/tmp/stateful_update: Permission denied".
      • Filed crbug.com/600403.
        • The infrastructure team notes that it's helpful to include the hostnames of both the DUT and the buildslave in the bug as well.
    • daisy-full failed SimpleChromeWorkflow. The buildslave also appears to have been offline since last Friday.
      • To find the logs, I clicked on the failed build, and clicked the "stdio" link under SimpleChromeWorkflow. Scrolled down to find the traceback && STEP_FAILURE.
      • Looks like a couple different errors: a read-only filesystem, and an ImportError for no module named apport.fileutils.
      • Filed crbug.com/600413.
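
The autoserv.DEBUG search from the guado_moblab notes above (find "Autotest caught exception when running test", then read the lines just above it) can be sketched like this. `context_before` is a hypothetical helper, the log is assumed to be already downloaded as text, and the sample below is reduced to the two strings quoted in the notes.

```python
MARKER = "Autotest caught exception when running test"


def context_before(log_text, marker=MARKER, lines_before=5):
    """For each marker hit, return the preceding lines plus the hit itself."""
    lines = log_text.splitlines()
    hits = []
    for index, line in enumerate(lines):
        if marker in line:
            start = max(0, index - lines_before)
            hits.append(lines[start:index + 1])
    return hits


# Reduced sample built from the strings quoted in the notes.
SAMPLE_LOG = "\n".join([
    "unrelated earlier line",
    "bash: /tmp/stateful_update: Permission denied",
    "Autotest caught exception when running test",
])

for block in context_before(SAMPLE_LOG, lines_before=1):
    print("\n".join(block))
```

Raising `lines_before` shows more of the command that was attempted before the exception.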
    amd64-generic-asan has been failing for a _very_ long time. I found crbug.com/589885 where some progress is being made. The primary CL is still pending review.

    For triaging canary failures, I first take a look at each release group. Most of the failures seem to be due to the suite timing out. However, there are a few other issues. You can check these by viewing the "stdio" link under the step that's yellow or red.
    In the afternoon, the internal waterfall seemed dead. Infra deputy filed crbug.com/600526 to a trooper.


    2016-04-01
    Sheriff: moch, zachr
    • 599674: glados-release-group chell and cave fail buildpackages
    • 599982: daisy-paladin did not start

    2016-03-31
    Sheriff: kitching, moch, zachr
    • 597866: Cannot find prebuilts for chromeos-base/chromeos-chrome on veyron_rialto
      • Apparently nothing to worry about since important=False
    • Almost all CQ builders are timing out, but seems like current builds are succeeding
    • 596630: "Failed to install device image using payload" during provision errors (x86-zgb-paladin)
    • 579119: unittest timeout (peach-pit-paladin)

    2016-03-30
    Sheriff: kitching
    • 589885: failure in desktopui_ScreenLocker still showing up in amd64-generic
    • 598967: LeakSanitizer: detected memory leaks in update_engine-0.0.3-r1895 UnitTest (deymo@ investigating)
    • [from akeshet@, fixed] 598960: storm and whirlwind paladins failing consistently in vox unittest
    • 598980, 51703: rialto-services use of ReadFileToString needs updating
    • 598517: "SSH reports a generic error" failures during provision
    • strago board issues:
      • 583014: Strago boards don't have bvt results for the last week
      • 51482: Braswell systems are repeatedly failing to install in the autotest lab with eMMC failures
    2016-03-25
    Chromeos Gardener: jennyz
    Sheriff: bsimonnet, gwendal, wuchengli
    • amd64-generic
      • 589885:  failure in desktopui_ScreenLocker
      • 556785:  Reduce parallelism during unittests - unit test fail.
    • autoupdate failure:
      • 557106:  File system corruption on samus DUTs.
    • 598224:  Several CQ/paladin builders offline.
    • 596150:  pineview-release-group fails InitSDK
    • 593565:  Paygen failure (FAIL: Unhandled TypeError: expected string or buffer)
    2016-03-23
    Chromeos Gardener: jennyz
    Sheriff: dgreid, josephsih
    • 597213: [bvt-cq] platform_Perf Failure on tricky-chrome-pfq/R51-8100.0.0-rc3: Could not find build id for a DSO. 
    • 597183: provision_AutoUpdate.double_SERVER_JOB Failure on tricky-chrome-pfq/R51-8100.0.0-rc2: assigned to dfang.
    • 597111:  SitePerProcessBrowserTest.PagePopupMenuTest flaky on Linux_Chromeos_Test bot: kenrb fixed it today.
    2016-03-22
    Chromeos Gardener: jennyz
    Sheriff: tbroch, scollyer
      • 594336: network_DefaultProfileCreation Failure on tricky-tot-chrome-pfq-informational/R51-8053.0.0-b61: assigned to zqiu@.
      • 597111:  SitePerProcessBrowserTest.PagePopupMenuTest flaky on Linux_Chromeos_Test bot.
      • 536061 : (non-closer, builds fixed on retry) debugd: missing dependency. Fixed by olofj
      2016-03-21
      Sheriff: tbroch, scollyer
      • 595274 : (tree-closer) webRTC HW Decode/Encode crashes tab
      • 595988 : (flaky/non-closer) network_DefaultProfileCreation Failure on tricky-tot-chrome-pfq-informational
      • 596150 : (non-closer) pineview-release-group fails InitSDK
      2016-03-14
      Sheriff: djkurtz, marcheu, shawnn
      • 51123: oak-release-group: elm: SignerTest fails: security_test_image failed == "CHROMEOS_RELEASE_BOARD: Value 'elm' was not recognized"
      • 594556: x86-generic-paladin: VMTest: desktopui_ScreenLocker fails => Screen is locked
      • 594565: mttools: BuildPackages fails on first attempt
      • 594571: veyron_rialto-paladin: BuildPackages fails: Cannot find prebuilts for chromeos-base/chromeos-chrome on veyron_rialto
      • 594622: veyron_minnie-cheets paladin consistently failing
      • 594592: lakitu-incremental builder failing gcetest
      • 594699: samus vmtest failures

      2016-03-11
      Deputy: shuqianz, Sheriff: marcheu, shawnn
      • 594176: daisy_skate-chrome-pfq provision failing
      • 593926: Lars devices in lab going down
      • 592766: chromeos-bootimage build failures
      • 594233: paladin builders offline

      2016-03-04
      Sheriff: wiley, drinkcat (honorary), aaboagye
      • PreCQ
        • 592143: PreCQ: Failing InitSDK (Fixed due to chumping some python changes.)
      • PFQ failures
        • 591401: BuildImage step failing on PFQ "No space left on device" (Fixed with a revert.)
        • 554222: AutoservSSHTimeout PFQ failures
        • 582477: video_ChromeHWDecodeUsed is flaky on CQ
      • Canary
        • 591965: guado_moblab-paladin: HWTest fails "bash: /tmp/stateful_update: Permission denied"
          • A subsequent run also failed, but for what looks like a different reason.
        • 591957: smaug-paladin: BuildPackages failure "sys-fs/udev[gudev(-)] ("sys-fs/udev[gudev(-)]" is blocking dev-libs/libgudev-230)"
      • CQ
        • 592148: chromeos-test-testauthkeys-mobbase failed to build due to collisions.
        • 592182: guado_moblab moblab_RunSuite failure in CQ run.

      2016-03-03
      Sheriff: cywang, aaboagye, wiley
      • Chrome PFQ failures
        • 590762: Broken CrOS build of telemetry autotests - still happening
        • 591731: chromeos-chrome: build failure 'ppapi_example_video_decode': No such file
          • See 591782 and 59140 for the background for this bug.
          • Basically, trying to add earlier failures for file operations in the chromeos-chrome.ebuild.
          • The 1st change was submitted, but led to the ppapi_example_video_decode error. Change was then reverted.
            • At a later time the cleanup will land.
          • This may cause the telemetry failures to pop up again.
      • CQ
        • 591639: graphics_GLBench(graphics_utils) failed in HWTest - fix submitted
        • 591837: prebuilts failing to upload on certain paladins. GS flake? (lakitu, guado)
      • Canary
        • 591656: security_AccountBaseline failed on lulu - fix submitted
        • 591658: security_StatefulPermissions Failure on lulu - fix submitted
        • 583014: strago release groups red since December 2015 (~2% pass rate)
      • Misc
        • 591853: public waterfall is missing the status boxes
      2016-03-02
      Sheriff: bfreed, charliemooney, cywang
      • PFQ failures
        • 591308: ChromeSDK failed in Chromium PFQ
        • 590762: Broken CrOS build of telemetry autotests - force another chromium PFQ build
        • 591401: Builders failing in BuildImage step because they run out of storage
        • 376372: About 8 canaries hit a HWTest "Suite timed out" error.
        • 590372: A few builders died trying to sync the source (error: Exited sync due to gc errors)
      2016-03-01
      Sheriff: bfreed, charliemooney
      • 591097: shill and dhcpd flake causing HWTest infrastructure failures and 10 straight CQMaster failures.
      • 591231: samus canary timeout in paygen stage while trying to copy a gsutil file.
      • 589135: rambi-c-group canary failed in Archive: "tar: chromiumos_test_image.bin: file changed as we read it"
      • 591256: peach group canary failed in Paygen with LockNotAcquired error
      • 583364: Veyron Paygen downloading failures
      2016-02-26
      Sheriff: drinkcat
      • 590113: x86-generic incremental VMTest security_ASLR fails (once in VMTest, a bit strange)
      • Closed the tree for 1 minute, false alarm: CQ-master page gave me the impression that the build failed because of rialto
      • PFQ failures
        • 590133: amd64-generic chromium PFQ: fatal error: ui/accessibility/ax_enums.h: No such file or directory
        • 590114: [bvt-cq] provision Failure on daisy_skate-chrome-pfq/R50-7966.0.0-rc2 (autofiled)
      • 584542: toybox build is flaky, but never caused an actual build failure. Local fix on gerrit, started upstream discussion about fix
      2016-02-25
      Sheriff: jrbarnette, quiche
      • 590065: toybox build is flaky
      • 589879: Build failures on "Lumpy (Chrome)" and "Alex (Chrome)"
      • 589905: Lumpy timing out in afdodatagenerate
      • 589885: desktopui_ScreenLocker failure on chromiumos.chromium
      • 589844: CQ failure due to HWTest failure on veyron_minnie-cheets-paladin
      2016-02-25
      Sheriff: drinkcat
      • 589690: CQ fails at CommitQueueSync, other builders in Sync (Cannot fetch chromiumos/third_party/arm-trusted-firmware)
        • Chumped manifest change to pin 
        • 589713: third_party/arm-trusted-firmware: Figure out which branch to track (Follow up on underlying issue)
      • p/50460: oak-full build failure
      • 589777: lakitu: security_AccountsBaselineLakitu Baseline mismatch
      • 2 Sync issues:
      2016-02-24
      Sheriff: jrbarnette, quiche
      • 588834: audio_CrasSanity fails: "CRAS stream count is not matching with number of streams"
        • This can cause failures in the CQ.  All boards seem to be affected.
        • Reverted three CLs; it's not yet known whether that will stop the problems.
      • 589641 graphics_Sanity failing on veyron boards 
        • This has caused some failures in the CQ.  So far, only veyron shows the problem.
      • 589623 Pre-CQ cannot uprev and rejects new CLs
        • A bad CL was chumped in without review.
        • Chumped in a fix to go with it.
      2016-02-22/23
      Sheriff: ejcaruso, waihong
      • 588739: Timed out going through login screen. Cryptohome not mounted.
      • 588834: audio_CrasSanity fails: "CRAS stream count is not matching with number of streams"
      • 588921: Some builders suffered a virtual drive failure.

      2016-02-17/18
      Sheriff: wnhuang
      • 587411: Multiple CQ build failures due to infrastructure issue
      2016-02-16
      Gardeners: jennyz
      Sheriff: wfrichar, davidriley, kcwu
      • 558983: daisy-skate PFQ occasionally failed for this issue. The pending CL to fix this has not landed yet. guidou@ is working on it.
      • 585973: daisy-skate PFQ occasionally failed for this issue.
      2016-02-10/11
      Gardeners: stevenjb
      Sheriff: dtor, avakulenko
      • 586180: Pre-CQ and CQ masters failed due to git outage during source sync
      • 586179: Canaries fail due to provision timeout (SuitePrep: ABORT due to timeout)
      2016-02-09
      Gardeners: stevenjb / afakhry
      Sheriff: scollyer, furquan

      2016-02-05
      Gardener: stevenjb
      • 584722 chromeos-chrome build failure: "No package 'gtk+-2.0' found" while running pkg-config with media.gyp

      2016-02-04
      Sheriff: dhendrix
      • 584542: sys-apps/toybox failing to compile on amd64-generic
      • 473899: paygen "Not all images for this build are uploaded", smaug has been seeing this for months.
      • 569358: pool: bvt, board: x86-mario in a critical state. (assigned now)
      • 584447: pool: bvt, board: veyron_mickey in a critical state. (assigned)
      • 571757: [sanity] provision Failure on expresso-release/R49-7760.0.0. Note: This manifested itself as a swarming failure when I updated the bug (#68).
      2016-02-03
      Sheriff: johnylin, grundler, dbasehore
      • 561036: FIXED: paygen timing out: dshi appears to have fixed this
      • 574915: VMTest failures in desktopui_ScreenLocker - jdufault investigating
      • 578771: GPT Header Issue
      • 579119: Unittest timeout
      • 581639: IGNORE: lakitu_mobbuild fails cloud_SpinyConfig: turning down this build (sosa)
      • 582144: FIXED: security_ASLR: reverting the change fixed the problem (https://chromium-review.googlesource.com/324950)
      • 582325: veyron-b: rialto-services emerge fail
      • 582521: FIXED? error in gsutil: samus canary builds succeeded on Feb 02 19:15. Also seen on daisy.
      • 583081: FIXED: autotest-chrome build failures (https://chrome-internal-review.googlesource.com/#/c/247126/)
      • 583535: FIXED: login_* test failures: reverted https://codereview.chromium.org/1646223002 (alchuith, dup:583382)
      • 583684: FIXED: CommitQueueSync repo sync: manifest referred to a tag instead of branch
      2016-02-02
      Sheriff: grundler, dbasehore
      • 561036: paygen timing out on release builders
      • 574915: VMTest failures in desktopui_ScreenLocker (later forked into three bugs)
      • 581639 - lakitu_mobbuild fails cloud_SpinyConfig (known issue)
      • 582521 - samus canary failed because of error in gsutil
      • 583375: provision thrashing causing canary/beta build timeouts (kevcheng)
      • 583382: login_* tests failing (may be dup of 574915 or others)

      2016-02-01
      Sheriff: bleung, puthik
      • 582531 - flaky HWTest for Pineview/ strago-b / sandybridge
      • 583375 - canary and beta builds can cause provision thrashing which can cause hwtests to time out

      2016-01-29
      Sheriff: bleung, puthik
      • 582521 - samus canary failed because of error in gsutil
      • 581639 - lakitu_mobbuild fails cloud_SpinyConfig
      • 576879 - pool: bvt, board : candy in a critical state.
      • 582325 - veyron-b: rialto-services emerge fail

      2016-01-28
      Sheriff: bhthompson, shchenhychao
      • 582144: security_ASLR test failing on glados, strago, strago-b with Unhandled TypeError

      2016-01-27
      Sheriff: bhthompson, shchenjchuang
      • 581598: archive stage failure at BuildAndArchiveFactoryImages 
      • 581624: gd-2.0.35 build failed on guado_moblab
      • 581630: docker build failed on lakitu_next
      • 543649: smaug paygen failing with "Not all images for this build are uploaded, don't process it yet" (does not cause canary failure, low priority)
      • 581631: cheets_SettingsBridge: Timed out waiting for condition: Android font size set to smallest
      • 581639: GCETest fail at 01-cloud_SpinyConfig on lakitu_mobbuild

      2016-01-26
      Sheriff: robotboy, semenzato, jchuang
      • 580184: PFQ failed to build related to chromeos/ime/input_methods.h missing
      • 561036: paygen timing out on release builders
      • 581382: perf_dashboard_shadow_config.json syntax error led to parse job failure (causing several timeouts)

      2016-01-25
      Sheriff: littlecvr
      • 486098: Builder failure HWTest Code 3 - not enough detail to debug
      • 561036: paygen timing out on release builders
      • 547055: Jecht Group Failed Archive Step

      2016-01-22
      Sheriff: littlecvr
      • 547055: Jecht Group Failed Archive Step
      • 578771: Paygen error: GPT_ERROR_INVALID_HEADERS
      • 558266: [au] autoupdate_Rollback Failure on ultima-release/R49-7655.0.0
      • 580184: Master: PFQ failed to build related to chromeos/ime/input_methods.h missing
      • 580261: Update/provisioning timeouts during tests due to slow network
      • 579811: lakitu-release build continuously failed at GCETest

      2016-01-21
      Sheriff: deymo, zqiu, hungte
      Chromeos Gardener: jennyz
      • 580184: Master: PFQ failed to build, related to missing chromeos/ime/input_method.h

      2016-01-20
      Sheriff: stevefung, dlaurie, hungte
      Chromeos Gardener: jennyz
      • 579565: M49: PFQ Failing chromite unit testing on lumpy.

      2016-01-14
      Sheriff: stevefung, dlaurie
      • 322443: M49 PFQ failing unit tests
      2016-01-14
      Sheriff: vapier, zeuthen
      • 577549: lakitu_mobbuild_paladin fails at mariadb
      • 577542: build_packages fails at chromeos-mrc on strago canary and paladin build
      • 577836: lakitu_mobbuild_paladin fails at serf
      2016-01-13
      Sheriff: cychiang
      • 576905: pool: bvt, board: veyron_mighty in a critical state.
      • 576992: util-linux-2.25.1-r1 build failure on cyan canary build
      • 577025: TestFailure(paygen_au_dev,autoupdate_EndToEndTest.paygen_au_dev_full,Failed to perform stateful update on chromeos2-row2-rack10-host9)
      • 571747: TestFailure(sanity,provision,Failed to perform stateful update on chromeos4-row2-rack3-host1)
      • 505744: TestFailure(sanity,provision,Unhandled AutoservSSHTimeout: ('ssh timed out', * Command: )
      • 571884: [bvt-inline] security_ASLR Failure: No such file or directory: '/proc/32189 32187/maps'. (on PFQ)
      • 577549: lakitu_mobbuild_paladin fails at mariadb
      • 577542: build_packages fails at chromeos-mrc on strago canary and paladin build
      2016-01-12
      Sheriff: cychiang
      • 576525: chromeos-bootimage build failure on nyan_blaze: Unknown blob type 'boot' required in flash map
      • 576526: cheets_PerfBootServer failure at wait_for_adb_ready
      • 529612: lakitu_mobbuild: cloud_CloudInit fails in VMTest
      • 576549: lakitu_mobbuild canary build fails at GCE test because of quota exceeded
      • 576545: rambi-a-release group clapper build_packages fails at net-misc/strongswan
      • 571749: TestFailure(sanity,provision,Failed to perform stateful update on chromeos4-row5-rack8-host11)
      • 571747: TestFailure(sanity,provision,Failed to perform stateful update on chromeos4-row2-rack3-host1)
      • 505744: TestFailure(sanity,provision,Unhandled AutoservSSHTimeout: ('ssh timed out', * Command: )
      • 576608: security_AccountsBaselineLakitu fails with Baseline mismatch
      2016-01-06
      Sheriff: moch, zachr
      • 572745: [bvt-cq] graphics_GpuReset Failure on falco-chrome-pfq
      • 574870: [sanity] dummy_PassServer.sanity_SERVER_JOB Failure on veyron-b-group canary
      • 574915: VMTest failures in desktopui_ScreenLocker, security_ASLR, login_LoginSuccess
      • 574303: provision Failure on cyan-release

        2016-01-05
        Sheriff: moch, zachr
        • 574501: amd64-generic ASAN vmtests failing (desktopui_ScreenLocker, buffet_InvalidCredentials, buffet_IntermittentConnectivity)

        2016-01-04
        • 574197 Peach group Canary failing since 12/29
        Gardener: stevenjb@/jdufault@
        • 574104 : LKGM builder needs to be updated to git
        • 573961 : Peach pit failures
          • Forcing a rebuild, looks like it might be infra flake: 'Failed to install device image using payload at...'
        • 574198 : PFQ flake, security_SandboxStatus
        OLDER ENTRIES MOVED TO THE ARCHIVE so this page doesn't take forever to load.  See Sheriff Log: Chromium OS (ARCHIVE!)