THIS IS AN ARCHIVE. See Sheriff Log: Chromium OS for newer entries.
8/22 - 8/28 Sheriffs: bhthompson, nya, walker Gardeners: jennyz, lpique - Top issues affecting CQ/PFQ now
- Pending issues
- Resolved, but bug entries are not closed yet
- Resolved
8/15 - 8/21 Sheriffs: benzh, sureshraj, yoshiki Gardeners: jamescook, domlaskowski - crbug.com/637868 security_StatefulPermissions failures on canaries:
- crbug.com/593423 provision_AutoUpdate.double failures on chrome pfq informational:
- crbug.com/637962 SyncChrome failures due to "Repository does not yet have revision" on chrome informational pfq -> infra, ongoing flake
- crbug.com/637960 Chrome telemetry failures due to missing system salt file -> reverted
- crbug.com/637900 cyan chrome pfq informational builder cros-beefy191-c2 is out of disk space building chrome -> infra
- crbug.com/637472 pool: bvt, board: falco in a critical state -> infra
- crbug.com/637931 Chrome4CROS Packages builder failing in bot_update "fatal: reference is not a tree" -> infra
- crbug.com/637938 VMTest failing on telemetry bots due to telemetry_UnitTests_perf -> bug in test script?, disabled
- crbug.com/638348 cros amd64-generic Trusty builder failing to start goma in gclient runhooks step -> networking flake?
- crbug.com/631640 login_CryptohomeIncognito -> flaky, but real failure
- crbug.com/638656 cheets_NotificationTest failure on Cyan PFQ -> real failure in chrome (crash in shelf)
- crbug.com/638980 falco-full-compile-paladin has failed to start with exception setup_properties
- crbug.com/638968 x86-generic-tot-asan-informational failures in tpm_manager (odr-violation) and attestation (leaks) -> new target added to cros build that had failures, reverted
- crbug.com/639102 Kernel panics on Cyan PFQ -> ???
- crbug.com/639107 link-paladin BuildPackages failure with SSLError The read operation timed out
- crbug.com/639314 AUTest failed on most canaries due to no test configurations
8/8 - 8/14 Sheriffs: davidriley, vprupis, takaoka, smbarber (Mon afternoon only) - Continued UnitTest failures on canaries and release branches: crbug.com/627881
- lakitu failures: crbug.com/635562
- edgar missing duts: crbug.com/596262
- kevin firmware prebuilt: crbug.com/635598
- x86_alex and veyron_rialto pool health: crbug.com/634471 and crbug.com/592002
- Chumped change broke everything (eg pre-CQ, CQ, canaries) until revert was chumped in
- infrastructure flake
- celes-release/289, setzer-release/292 (build interrupted) -> crbug.com/602565
- nyan-release/293, wolf-release/1294 (sudo access) -> crbug.com/616206
- pre-cq (gerrit quota limits) -> crbug.com/624460
- Friday: lab downtime affected builds for much of the day
8/1 - 8/8 Gardeners: stevenjb@, khmel@
7/29 Notes for the next sheriffs from aaboagye, kirtika: - Major issues we are seeing, format is <Impact: Issue: Links>::
- Tree closure, fixed now: "No space left on device" for cheets builds: aaboagye@'s post-mortem here. crbug.com/630426.
- CQ failures: We've been seeing intermittent failures due to hitting git fetch limits with gerrit (commit queue sync step doesn't work). The current CQ run failed due to this, would not be surprised if the next one does too. crbug.com/632065.
- Several canaries failing: Unit-test times out, possibly due to overloaded machines: crbug.com/627881
- Android-PFQ failures: adb is not ready in 60 seconds: crbug.com/632891.
- Minor issues, work-in-progress:
- Android-PFQ: mmap_min_addr not right on samus/x86: crbug.com/632526.
- Paygen/signing issues.
- Autoupdate-rollback (likely network SSH issue): example crbug.com/596262.
2016-07-25 thru 2016-07-29 Sheriff: aaboagye, kirtika, hidehiko (non-PST)
7/29 - PST
- Canaries
- kevin-release was broken, but a fix is on the way. (wfrichar@ knows)
- CQ
- Non-PST:
- Canaries
- CQ: Green at #11894
- Chrome PFQ
- Android PFQ:
7/28 - PST
- Canaries
- Still seeing the error in the unittest phase. See crbug.com/627881.
- Paygen issue still affecting some canaries (x86_alex-he - crbug.com/629094).
- Saw a failure with auron_yuna canary with an error parsing a JSON response. See crbug.com/632433.
- samus failed with platform_OSLimits Found incorrect values: mmap_min_addr. Filed crbug.com/632526.
- CQ
- Closed the tree because the CQ would just reject people's changes because of the no-disk-space error. crbug.com/630426.
- Chrome PFQ
- Still seeing some failures in the login_CryptohomeIncognito test. See crbug.com/631640.
- Non-PST
- CQ:
- RED.
- samus-paladin is failing due to no-disk-space error. crbug.com/630426
- cheets tests failed twice with a real error (https://chrome-internal-review.googlesource.com/#/c/270781/). Being fixed.
- Chrome PFQ:
- Android PFQ:
7/27 - PST
- Canaries
- Seems like nearly all the canaries failed during HWTest stage apparently due to Infra issues.
- CQ
- On one run, some of the paladins failed during the CommitQueueSync step due to git rate limiting.
- Android PFQ
- An overloaded devserver is causing provisioning to fail for cyan-cheets-android-pfq and veyron_minnie-android-pfq (wolf-tot-paladin too).
- (Non-PST)
- CQ:
- Master paladin looks flaky due to various reasons.
- CQ limit hitting
- HWtest time out
- kOmahaErrorInHTTPResponse: crbug.com/621148 looks like a tracking issue.
- These are not always reproducible, and some runs pass successfully.
- Chrome PFQ:
- Android PFQ:
- Failing in the latest several runs, though for a variety of reasons. Looks just too flaky.
7/26 (18:20 PST) - Canary Failure Classification: Lots of canary failures (~50%) this afternoon, so listing unique causes here to track down tomorrow:
- x86-zgb: Pool-health issue, infra (kevcheng@) looking into it, may be back up next canary run?
- x86-mario: Not sure if the manifestversionedsync is a real issue or not, filed crbug.com/631867 anyway.
- Paygen failures: falco, falco_li, gru, jecht, kip, lumpy, ninja, parrot, peppy, samus, smaug, x86_alex-he, stumpy. TBD: Update more details here.
7/26 - (PST)
- Canaries
- Still some errors on nyan_blaze and nyan_kitty caused by the vboot_firmware CL. crbug.com/631192
- Fixes posted to gerrit and making their way through the CQ.
- Still some unittest failures. There's a CL that just landed to reduce the parallelism. Will be following to see if the situation improves. crbug.com/627881.
- That CL did not seem to resolve the issues.
- Saw a few canaries yesterday (celes this morning) that had issues when uploading debug symbols. dgarret@ is working on a fix. crbug.com/212437.
- security_StatefulPermissions is pretty flaky, veyron_minnie canary failing on it. wmatrix is all red: https://wmatrix.googleplex.com/retry_teststats/?days_back=30&tests=security_StatefulPermissions. Investigating crbug.com/604606
- There was canary failure on lars-release which reported all the DUTs in the pool as dead, but they seem to be up now. crbug.com/631530.
- x86-zgb pool health is poor - most devices down. kevcheng@ taking a look. crbug.com/590653.
- Towards the end of the day, a larger number of canaries were failing at the paygen step. I think what may be happening is network flakiness, but I wonder why we don't just retry again?
- CQ
- panther_embedded-minimal-paladin has been down for quite some time now. Pinged the bug to see if there are any updates. crbug.com/630494.
- A restart of the master has been scheduled. Need to check back later today if that fixes things.
- No elm devices in pool:cq making elm-paladin fail. kevcheng@ taking a look. No bug yet.
- Android PFQ
- harmony_java_math CTS test is causing android-pfq failures with "cts test does not exist". Filed b/30413761. Ping ihf@ if it doesn't get better.
- Chrome PFQ
- (Non-PST)
- Canaries
- CQ
- Looks flaky: Sometimes failing ErrorCode=37 (OmahaErrorInHTTPResponse).
- Chrome PFQ:
- Looks flaky. Sometimes failing due to a login error, but the failing boards vary.
7/25 - Canaries
- Several of the canaries were failing in the platform_FilePerms HwTest.
- This was seen on cyan, elm, lulu, oak, samus, and veyron_minnie.
- Appears to be missing expectations for ARC containers.
- Filed crbug.com/631080.
- The unittest stage seems to be timing out fairly often now.
- nyan-big is failing on a vboot_firmware CL not building. Filed crbug.com/631192. Fix is in CQ now.
- CQ
- Generally okay today. There was one issue regarding a failure in VMTest, but that was caught.
2016-07-18 thru 2016-07-24 Sheriff: wuchengli 7/19
7/18 - 628990: DebugSymbolsUploadException: Failed to upload all symbol
- 593461: Chrome failed to reach login screen within 120 seconds
- 628494: chromeos-bootimage build failures in canary builds
- 609931: 'chromite.lib.parallel.ProcessSilentTimeout'>: No output from <_BackgroundTask(_BackgroundTask-5:6:7:3, started)> for 8610 seconds
- 629094: cannot find source stateful.tgz
2016-06-27 thru 2016-07-01 Sheriff: mojahsu 6/30 - 624744: Canary Build: Exception on build packages.
6/29 - Try to fix error by rebooting chromeos4-row6-rack9-host14.cros.
- 624328: Canary Build: cros_sdk:enter_chroot: Not mounting chrome source: could not find CHROME_ROOT/src dir.
6/28 - 598779(lumpy), 623803(stumpy): NotEnoughDutsError: (DUTs are expected to be back online by noon Tuesday, 6/28)
- Canary build parrot_ivb: NotEnoughDutsError: skip_duts_check option is on and total number DUTs in the pool is less than the required minimum avaialble DUTs.
- 623873: Canary: ERROR: ** HWTest did not complete due to infrastructure issues (code 3) **
- 623880: Canary: No output from <_BackgroundTask(_BackgroundTask-5:7:4, started)> for 8640 seconds
6/27 - 623448: unknown target 'khronos_glcts_test' (daisy_skate, nyan, peach_pit, veyron_minnie, veyron_pinky,veyron_rialto,x86-alex)
- 622789: StageControlFileFailure: Failed to stage
- 623502: Unable to create project hosting api client: Cannot get apiclient library.
2016-06-20 thru 2016-06-26 Sheriff: zhihongyu, reinauer
6/24 - 623116 OOM-induced kernel panic when running hardware_RamFi
- FIXED 623115 chromeos-base/libchrome fails to emerge
6/23 - 622789 StageControlFileFailure: Failed to stage tricky-chrome-pfq/R53-8490.0.0-rc2
- 617666 CQ could not update Gerrit: many CLs are now in limbo?
6/22 - 622365 Missing permission_status.mojom.h on Chrome OS (flakey?)
6/21 - 621971 PFQ: HWTest: Failure summary not always reported in autofiled issues
- 517995 ./build_packages fails for amd64-generic board target when kernel-3_18 is selected in board definition
6/20 - 621396: gclient runhooks => TypeError: unhashable type: 'list'
- 622293: sdk bot failing after perl upgrade
621676 stop drones from using devserver for static content
2016-06-13 thru 2016-06-17 Sheriff: tbroch, puneetster
6/14 - 620015: sync_chrome: Unhandled exception: OSError: [Errno 2] No such file or directory: '/b/cbuild/internal_master/.cache/distfiles/target/chrome-src-internal'
- 609886: beaglebone-release: GCE / repo-cache issue? Failed to import ts_mon, monitoring is disabled: cannot import name gce
- 619615: tricky-*: network_DefaultProfileCreation fails
- 619980: veyron_gus-release: No such configuraton target: "veyron_gus-release"
- 619754: buildPackages failing on internal builders : chromeos-chrome: png->io_ptr
6/13 2016-06-10 Sheriff: tfiga - 618916: panther_embedded-minimal-release builders failing with "No such configuraton target"
- 618919: veyron_gus-release builds failing with "No such configuraton target"
- 618923: Canaries fail AUtest/HWtest due to infrastructure issues (not only canaries actually)
2016-06-07 thru 2016-06-08 Sheriff: shawnn, vpalatin - 617979: samus failing signing stage
- 618020: Provision failure on braswell devices
- 618131: BuildPackages failure due to socket timeouts / FileNotFound
- 618159: pre-cq-launcher failures
- 618523: cryptohome unit test failures
2016-06-06 Sheriff: smbarber, abhishekbh - 617704 - EC change needed to be reverted since DUTs were no longer reporting AC power
- 617979: samus failing signing stage
2016-06-03 Sheriff: smbarber, abhishekbh - Chrome PFQ had manual uprev, accidentally broke some canaries. Canaries manually restarted.
2016-06-02 Sheriff: moch, scollyer - 54007 - cyan-cheets-paladin failed (due to some necessary chumped changes). Tree closed but later throttled as Android container fix is being landed.
2016-06-01 Sheriff: moch, scollyer, djkurtz - 605181 - veyron_speedy-paladin/peppy-paladin: The HWTest [bvt-cq] stage failed: ** HWTest did not complete due to infrastructure issues (code 3) ** - Flaky provisioning issues
2016-5-31 Sheriff: djkurtz, ejcaruso, zachr - (ongoing) 615730: Rialto build break in libpayload: multiple definition of `video_console_init'
- 615997 - 18:05 canary runs failing bvt-inline on many systems
- 615993 - video_VideoSanity / video_ChromeHWDecodeUsed / video_ChromeRTCHWDecodeUsed on arm chrome PFQs (daisy_skate, peach_pit, veyron_minnie-cheets)
- 616015 - veyron_jerry-release - buildpackages fails - chromeos-base/chromeos-ec-0.0.1-r3046 - No room left in the flash
- 616236 - CL with anonymous owner crashes buildbot
- 616238 - Normal buildbot failures are sometimes reported to gerrit as timeouts
2016-5-30 Sheriff: wnhuang - [RESOLVED] veyron_speedy-paladin: The HWTest [bvt-cq] stage failed: ** HWTest did not complete due to infrastructure issues (code 3) **
- Filesystem became read-only due to an error. Fixed by rebooting.
- cyan-cheets-paladin: The HWTest [arc-bvt-cq] stage failed: ** Suite timed out before completion **
- chromeos4-row6-rack9-host1 repair failed, scheduled another repair.
2016-05-26-27 Sheriff: jrbarnette, waihong, wnhuang - 615474: x86-alex-paladin HwTest timeout abort
- 615151: guado_moblab: failing provision because moblab-scheduler-init isn't running
2016-05-25 Sheriff: djkurtz Gardeners: slavamn, puthik - 614579: [bvt-inline] security_ASLR Failure on daisy_skate-chrome-pfq/R53-8368.0.0-rc2
- 614606: nyan-release consistently failing signing
- 615029: minnie failing to sign
2016-05-24 Sheriff: littlecvr Gardeners: stevenjb, levarum - 613868: build141-m2 had been swapped, but a restart is needed. The restart has been scheduled at the EOD (PDT time).
- 614261: build141-m2 had been replaced by build257-m2, but build257-m2 died again.
2016-05-23 Sheriff: littlecvr Gardeners: stevenjb, levarum - 613868: build141-m2 is offline and there is no backup.
- 612688: KioskTests are flaky on ChromiumOS bots.
- 611405 ASan builders failed when building update_engine
- 614040: cyan-cheets continues to fail with PoolHealthBug
2016-05-18 Sheriff: martinroth, wfrichar Deputy: akeshet - p/53507 VMTests have been failing for several days in the canary builds due to crashing DisplayLinkManager.
2016-05-16 Sheriff: robotoboy, dtor Deputy: - 611405 ASan builders failed when building update_engine - deymo@: A CL on AOSP landed to fix that last week, there's an uprev blocked on some CQ issues that I'll get to today.
2016-05-12 Sheriff: ravisadineni, zqiu Deputy: shuqianz - [ONGOING] mccloud-release, stumpy-release [Issue 609926] : FAIL: Powerwash count didn't increase after powerwash cycle
- [FLAKE] paygen issue [Issue 605181, Issue 606071] :
- paygen_au_dev,autoupdate_EndToEndTest.paygen_au_dev_full,Failed to receive a download finished notification (download_finished) within 600 seconds. This could be a problem with the updater or a connectivity issue. For more details, check the update_engine log (in sysinfo or on the DUT, also included in the test log.
2016-05-11 Sheriff: reveman, sonnyrao, tbroch Deputy: shuqianz - [RESOLVED] everything : manifestversionedsync: GoB quota issue (611084, b/28721585, b/28720367)
- veyron_pinky-release
- samus-paladin : Tried fetch locally and it worked.
- RunCommandError: return code: 128; command: git fetch -f https://chrome-internal-review.googlesource.com/chromeos/ap-daemons refs/changes/27/258727/1
- fatal: remote error: Git repository not found
- [FLAKE] zako-release : paygen : (605181)
- paygen_au_dev,autoupdate_EndToEndTest.paygen_au_dev_full,Failed to receive a download finished notification (download_finished) within 600 seconds. This could be a problem with the updater or a connectivity issue. For more details, check the update_engine log (in sysinfo or on the DUT, also included in the test log
- [EXPECTED] gru-release : chromeos-initramfs emerge fails (605597)
- [RESOLVED] master-paladin : daisy_skate-paladin: The HWTest [bvt-inline] stage failed: ** HWTest did not complete due to infrastructure issues (code 3) **
- provision_AutoUpdate.double [ FAILED ]
- provision_AutoUpdate.double ABORT: None
- [FLAKE] test running successfully but suite aborted at ~30min. Says it should run for 90min however.
2016-05-10 Sheriff: reveman, sonnyrao, tbroch Deputy: shuqianz - [ONGOING] guado_moblab, - [provision]: FAIL: Moblab has 0 Ready DUTs, completed successfully (610727, repair: b/28690294)
- [RESOLVED] devserver issue: (b/28704856)
- [INFO] buildbot slave shutdowns on 5/9 for emergency maintenance having some fallout (paladins)
- [FLAKE] ninja-release - [bvt-inline]: FAIL 62794807-chromeos-test/chromeos4-row3-rack9-host6/provision_AutoUpdate provision
- Unhandled AutoservSSHTimeout: ('ssh timed out', * Command:
- flake? Host is fine now.
- [RESOLVED] veyron_speedy-paladin, daisy_skate-release - [bvt-cq]: Exception waiting for results, JSONRPCException: Error decoding JSON response (606071)
- [RESOLVED] *-cheets-android-pfq [buildPackages]: autotest-cheets-* import error: No module named cros.graphics.drm (b/28694363)
- [RESOLVED] Lars builds down for hardware swap ()
2016-05-06 to 09 Sheriff: mruthven, rspangler, kcwu Gardener: jennyz
More detailed notes on our shift are here.
Stuff that broke and was fixed: - Lots of other release builders failing with "timed out", "didn't start", or on Sync-Chrome on Friday. dgarrett@ said the release builders are being reorganized and will be highly unreliable. Cleared up over the weekend.
- CQ failed on multiple paladins HWTest with two types of failure, but both seem to have the same underlying cause in the logs. Filed 610000 and throttled tree.
- [bvt-inline] - logging_CrashSender: retry_count: 2, FAIL: Simple minidump send failed
- [bvt-cq] - logging_UserCrash: FAIL: Did not find version 8288.0.0-rc2 in log output
- Cause: CL 342574 (fixed)
- [bvt-cq] - graphics_Gbm: FAIL: Gbm test failed(). Bad CL has been identified and fixed.
Stuff that's still broken: - veyron_rialto-release fails: BuildPackages: Cannot find prebuilts for chromeos-base/chromeos-chrome. (590784)
- stout-paladin builder (build126-m2) is offline (609682)
- daisy_skate-release - AUTest misconfigured (610088)
- CQ failed with CommitQueueSync errors on multiple paladins (server hung up unexpectedly), but passed on the next run. Seems to happen in the afternoon.
2016-05-05 Sheriff: groeck, furquan Gardener: jennyz - 609610: MobLab ToT not showing network bridge
2016-05-03 Sheriff: johnylin - 609054: M52: Failed to update the status for master-release
- Error message: "fatal: could not read Username for 'https://chrome-internal.googlesource.com': No such device or address"
- Many CQ/PFQ build failures are related to this as well
- CQ: failed CommitQueueSync
- PFQ: failed MasterSlaveLKGMSync
- 608838: Some video/media tests are temporarily waived on veyron
- The workaround needs to be reverted once this is fixed
2016-05-03 Sheriff: johnylin - Powerwash flakes on Canaries 605325:
- https://uberchromegw.corp.google.com/i/chromeos/builders/beltino-b-release-group
- https://uberchromegw.corp.google.com/i/chromeos/builders/jecht-release-group => almost never passed
- https://uberchromegw.corp.google.com/i/chromeos/builders/rambi-d-release-group/builds/1760
- https://uberchromegw.corp.google.com/i/chromeos/builders/sandybridge-release-group => almost never passed
- Paygen flakes on Canaries 516795:
- https://uberchromegw.corp.google.com/i/chromeos/builders/enguarde-release/builds/124
- Build failures on lakitu-release:
- HWTest flakes on Canaries:
- https://uberchromegw.corp.google.com/i/chromeos/builders/rambi-c-release-group/builds/2225
- https://uberchromegw.corp.google.com/i/chromeos/builders/rambi-d-release-group/builds/1760
- https://uberchromegw.corp.google.com/i/chromeos/builders/rambi-e-release-group/builds/1063
- https://uberchromegw.corp.google.com/i/chromeos/builders/slippy-release-group
- Some autoupdate rollback failures in terra / wizpig / reks / celes / ultima. Lab network issue? 596262
- https://uberchromegw.corp.google.com/i/chromeos/builders/strago-b-release-group
- https://uberchromegw.corp.google.com/i/chromeos/builders/strago-release-group/builds/1135
- Not enough disk space on veyron-b-release-group 605601
- CQ:
- veyron_rialto is failing with "ERROR: Cannot find prebuilts for chromeos-base/chromeos-chrome on veyron_rialto"
- Has been failing for a long time. Tracked in 590784
2016-04-22 Sheriff: drinkcat - Lab issue? 605464
- CQ:
- One instance of a "nyan-full-compile-paladin did not start": seems like random flake
- Canaries
- Paygen on link almost never passes 605849
- Chrome-PFQ:
- BuildImage: ERROR: test_elf_deps: Failed dependency check (chromium-pfq on arm/x86 platforms) 605851
- Chumped a revert, but the bug the original CL was fixing is also P0: 601854, please coordinate with ihf & gardener.
- Android-PFQ:
- chrome gs handler issue: some files do not have a md5 sum 605861
2016-04-21 Sheriff: drinkcat, denniskempin, dbasehore - Lab issue? 605464
- wolf-paladin fail: wolf-tot-paladin/builds/6443 wolf-paladin/builds/10777
- A number of HWTest timeouts:
- https://uberchromegw.corp.google.com/i/chromeos/builders/auron-b-release-group/builds/1473
- https://uberchromegw.corp.google.com/i/chromeos/builders/beltino-a-release-group/builds/2087
- https://uberchromegw.corp.google.com/i/chromeos/builders/beltino-b-release-group/builds/2100
- https://uberchromegw.corp.google.com/i/chromeos/builders/enguarde-release/builds/89
- https://uberchromegw.corp.google.com/i/chromeos/builders/rambi-d-release-group/builds/1725
- https://uberchromegw.corp.google.com/i/chromeos/builders/slippy-release-group/builds/3631
- https://uberchromegw.corp.google.com/i/chromeos/builders/strago-b-release-group/builds/549
- https://uberchromegw.corp.google.com/i/chromeos/builders/veyron-b-release-group/builds/1473
- https://uberchromegw.corp.google.com/i/chromeos/builders/kunimitsu-release-group/builds/796 => no, different stuff
- https://uberchromegw.corp.google.com/i/chromeos/builders/slippy-release-group/builds/3632
- Paygen failure:
- https://uberchromegw.corp.google.com/i/chromeos/builders/beltino-a-release-group/builds/2087
- https://uberchromegw.corp.google.com/i/chromeos/builders/jecht-release-group/builds/1404
- https://uberchromegw.corp.google.com/i/chromeos/builders/jecht-release-group/builds/1405
- https://uberchromegw.corp.google.com/i/chromeos/builders/nyan-release-group/builds/2750
- https://uberchromegw.corp.google.com/i/chromeos/builders/rambi-b-release-group/builds/2297
- https://uberchromegw.corp.google.com/i/chromeos/builders/rambi-c-release-group/builds/2190
- https://uberchromegw.corp.google.com/i/chromeos/builders/rambi-d-release-group/builds/1725
- https://uberchromegw.corp.google.com/i/chromeos/builders/rambi-e-release-group/builds/1029
- https://uberchromegw.corp.google.com/i/chromeos/builders/strago-c-release-group/builds/415
- https://uberchromegw.corp.google.com/i/chromeos/builders/smaug-release/builds/1037
- https://uberchromegw.corp.google.com/i/chromeos/builders/glados-release-group/builds/910
- https://uberchromegw.corp.google.com/i/chromeos/builders/auron-release-group/builds/1822
- https://uberchromegw.corp.google.com/i/chromeos/builders/ivybridge-release-group/builds/1917
- CQ
- *-cheets-paladin fail in HWTest: 605309
- Canaries
- rambi-release BuildPackages timeout: 605402 . Likely a flake.
- Minor guado_moblab-release BuildPackages issue: 605408
- guado_moblab-release HWTest: 605409
- https://uberchromegw.corp.google.com/i/chromeos/builders/guado_moblab-release/builds/883
- x86-alex/x86-zgb/x86-alex_he: chromeos-kernel-3_8: undefined reference to `watchdog_dev_unregister': 605458
- cros_make_image_bootable is failing 605587
- veyron_rialto still failing due to lack of chrome prebuilt: 597966
- More powerwash failures 605325
- https://uberchromegw.corp.google.com/i/chromeos/builders/rambi-d-release-group/builds/1726
- https://uberchromegw.corp.google.com/i/chromeos/builders/sandybridge-release-group/builds/2363
- https://uberchromegw.corp.google.com/i/chromeos/builders/beltino-b-release-group/builds/2101
- More autoupdate failures 605181
- https://uberchromegw.corp.google.com/i/chromeos/builders/daisy-release-group/builds/4887
- https://uberchromegw.corp.google.com/i/chromeos/builders/beltino-a-release-group/builds/2088
- More /dev/loop0 issues 605176
- https://uberchromegw.corp.google.com/i/chromeos/builders/pineview-release-group/builds/2300
- security_test_image failing on amd64-generic-goofy-release 605595
- https://uberchromegw.corp.google.com/i/chromeos/builders/amd64-generic-goofy-release/builds/231
- Gru is failing to build chromeos-initramfs 605597
- https://uberchromegw.corp.google.com/i/chromeos/builders/gru-release-group/builds/60
- Not enough disk space on veyron-b-release-group 605601
- https://uberchromegw.corp.google.com/i/chromeos/builders/veyron-b-release-group/builds/1474
- Enguarde builds packages for >7 hours, gets killed. 605608
- https://uberchromegw.corp.google.com/i/chromeos/builders/enguarde-release/builds/90
- gale-release failing to build chromeos-bootimage 605638
- https://uberchromegw.corp.google.com/i/chromeos/builders/gale-release/builds/57
- https://uberchromegw.corp.google.com/i/chromeos/builders/gale-release/builds/58
- PFQ
- Chrome fails to build on all PFQs 605592
2016-04-20 Sheriff: jcliang, denniskempin, dbasehore - CQ
- veyron_rialto has been failing for ages due to lack of chrome prebuilt: 597966
- Canaries
- stumpy pool health bug: 596647
- Powerwash is still failing on multiple boards: 589030
- Intermittent au test failures on multiple boards. Looks like infra flakes.
- auron-release-group and ivybridge-release-group keep failing paygen 605159
- auron-b-release-group fails to build image 605155
- daisy-release-group failing hwtest 535795
- Hosts not returning after powerwash 589089
- rambi-e-release-group is having issues with /dev/loop0 605176
- Tons of failures of autoupdate_EndtoEndTest.paygen 605181
- Veyron-b builders still out of space: bug
- AutoservRunError on guado_moblab-paladin 605241
- PFQ
- nyan-chrome-pfq fails to build packages 605202
2016-04-19 Sheriff: jcliang, puneetster, charliemooney - Powerwash is still failing on multiple boards: 589030
- panther pool health bug: 597744
- Veyron-b builders running out of space: bug
- It looks like the master builder crashed and took out several slaves, but then recovered gracefully.
- Chrome PFQ did not update over the weekend. Working with dimu@ and ketaki@ to figure out why
- Lulu Cheets failing to sign bug
- "Timeout deadline set by master" error in PayGen for Auron bug
- Alex's missing in the pool bug
2016-04-18 Sheriff: puneetster, charliemooney - Powerwash continues to fail bug
- Not enough builders in the pool, killing some canaries bug
- Generic SSH (255) errors continue bug
2016-04-14 Sheriff: bleung, briannorris, cywang - CQ
- Pool Health Bug (almost all boards are affected, peppy-paladin:601988 wolf-paladin:603450 veyron-speedy-paladin:603455, daisy-skate-paladin:603456, ...)
- machines in pool:{cq, bvt} are all marked as 'Repair Failed', no bvt-cq bvt-inline suites can be executed.
- Clicked the 'Repair' button on a failed DUT, but in vain.
- Canaries
- PFQ
- issue 603169 : extensions_to_load has been moved to browser options, a hiccup during transition on Chrome PFQ, fixed by achuith
2016-04-13 Sheriff: bleung, briannorris, cywang - CQ
- Canaries
- PFQ
- Other
- issue 603248 : gizmo-paladin and gizmo-release builders were removed yesterday, but they still appear on the waterfall, failing again and again. Waterfall may need to be restarted.
2016-04-12 Sheriff: cychiang, bleung, adlr 2016-04-11 Sheriff: cychiang
2016-04-07 Sheriff: shchen, briannorris
Redirects to log files are now working again. No more hand-modifying urls :).
- CQ
- There is an ongoing provisioning error (598517) that's hitting the CQ with the error: FAIL: Failed to install device image using payload at http://100.107.160.2:8082/update/peach_pit-paladin/R51-8162.0.0-rc2 on chromeos4-row7-rack13-host11. SSH reports a generic error (255) which is probably a lab network failure. It is still under investigation. I think that I saw it at least 5 times during my sheriffing shift.
- veyron_speedy had failed three times, twice with a provisioning error: "Update failed. Returned update_engine error code: ERROR_CODE=49, ERROR_MESSAGE=ErrorCode::kNonCriticalUpdateInOOBE. Reported error: AutoservRunError". This is a known issue: 600737.
- veyron_minnie-cheets failed with a timeout error. I checked the individual tests in the suite and they seemed to all pass (nothing aborted). I contacted the deputy and he added more DUTs to the pool for minnie to hopefully rectify this situation.
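When triaging many provisioning failures like the ones above, it can help to pull the update_engine error code and name out of the failure text so that runs can be grouped by cause. The following is a minimal sketch that assumes the message always follows the `ERROR_CODE=<n>, ERROR_MESSAGE=ErrorCode::<name>` shape quoted above; the function name and regular expression are illustrative and are not part of any lab tooling.

```python
import re

# Assumed message shape, taken from the failure text quoted in this log:
#   "... ERROR_CODE=49, ERROR_MESSAGE=ErrorCode::kNonCriticalUpdateInOOBE. ..."
_UPDATE_ERROR = re.compile(r"ERROR_CODE=(\d+), ERROR_MESSAGE=ErrorCode::(\w+)")


def parse_update_engine_error(text):
    """Return (code, name) from a provisioning failure message, or None.

    Hypothetical helper: it only recognises the message shape shown above.
    """
    match = _UPDATE_ERROR.search(text)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


sample = (
    "Update failed. Returned update_engine error code: ERROR_CODE=49, "
    "ERROR_MESSAGE=ErrorCode::kNonCriticalUpdateInOOBE. "
    "Reported error: AutoservRunError"
)
print(parse_update_engine_error(sample))
```

Messages that do not contain an update_engine error code, such as the generic SSH (255) failures, return `None` and would need to be bucketed separately.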
- PFQ
- HWTest and VMTest failures on daisy_skate and lumpy possibly caused by dev tools regression: 601533
- veyron_minnie-cheets failed with the same timeout error described above. Hopefully the additional DUTs will resolve this situation.
- cyan-cheets failed 6/8 runs due to timeouts. There were many jobs aborted so it seems that there was a significant shortage of machines for this platform. Informed the deputy and he increased the allocation of DUTs from 6 to 11.
- Canaries
- The canaries look sad. About half are failing for various reasons below:
- Timeouts: ivybridge (during paygen), rambi-c (during paygen), rambi (during buildpackages), strago-b (this is due to cyan-cheets, which just upped its allocation), veyron (during paygen)
- Powerwash (host did not return from reboot): jecht, beltino-a
- Powerwash (Powerwash count didn't increase after powerwash cycle): beltino-b
- autoupdate_Rollback (host did not return from reboot): kunimitsu
- build_image: pineview
- tar: chromiumos_base_image.bin: file changed as we read it: rambi-b
- slippy and strago failing with "TestLabException: Number of available DUTs for board falco_li pool bvt is 0, which is less than the minimum value 1." Bugs 590398, 590522 were automatically filed, but do not seem to have been triaged. Pinged deputy.
2016-04-06 Sheriff: shchen, adlr
Notes: - CQ
- 601224: buildpackages error on glados, strago, cyan-cheets. Error in iwl7000 wireless driver. The merge has been reverted.
- So apparently merges do not show up in the change list on the builder pages. I had an instance where a merge occurred (without me knowing) and I could not figure out what was causing the error from the waterfall pages. It was in the kernel code, but there was only 1 kernel CL that was unrelated. What I ended up having to do was find the hash used for the build. It looks like:
<project name="chromiumos/third_party/kernel" path="src/third_party/kernel/v3.18" revision="b850f41a01164fe1eb4cf76b5178194d53394130"/>
and matching that up with a commit in https://chromium.googlesource.com/chromiumos/third_party/kernel/+/chromeos-3.18.
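The lookup described above (find the pinned kernel revision in the build's manifest, then match it to a commit) can be sketched in a few lines. This is only an illustration: the manifest wrapper is a made-up minimal example around the single `<project>` line quoted above, the function names are hypothetical, and the commit URL assumes the usual Gitiles `/+/<revision>` form.

```python
import xml.etree.ElementTree as ET

# Minimal stand-in manifest. Only the <project> line comes from the note above;
# the surrounding <manifest> element is an assumption for the example.
MANIFEST = """<manifest>
  <project name="chromiumos/third_party/kernel" path="src/third_party/kernel/v3.18" revision="b850f41a01164fe1eb4cf76b5178194d53394130"/>
</manifest>"""


def pinned_project(manifest_xml, path):
    """Return (name, revision) for the project checked out at `path`, or None."""
    root = ET.fromstring(manifest_xml)
    for project in root.iter("project"):
        if project.get("path") == path:
            return project.get("name"), project.get("revision")
    return None


def commit_url(name, revision):
    """Build a Gitiles commit URL (assumed URL layout) for the pinned revision."""
    return "https://chromium.googlesource.com/%s/+/%s" % (name, revision)


name, revision = pinned_project(MANIFEST, "src/third_party/kernel/v3.18")
print(commit_url(name, revision))
```

Opening the resulting URL shows the exact commit the builder used, which can then be compared against the branch history to spot a merge that is missing from the builder's change list.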
- whirlwind has failed three times in a row with jetstream test failures. Deputy is trying to track down the error at 593404. This problem seems to have been fixed.
- cyan-cheets is failing due to a timeout: "ERROR: Timeout occurred- waited 13461 seconds, failing. Timeout reason: Slave reached the timeout deadline set by master."
- Dug into log and seems like the provisioning stage never connected to the machine because it was down. Checked on state of machine in the lab and it seems to be up and running again. Will keep an eye on the test to make sure that it doesn't happen again today/tomorrow.
- To find the error, go to the cyan-cheets builder and click on the last build, in this case #37. Scroll down to the test (HWTest) and click "link to suite", which will take you to the Autotest results. There, find the failed job and click on that test; from there you can find the logs to search through.
- Currently, the log redirect links are failing, so you need to get to them with the instructions in the Notes above. Take the test name (found in parentheses next to the job name) and append it to the link above. So, you'll end up going to: https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/59112285-chromeos-test. You'll see a folder for the hostname of the machine. The logs are in <hostname>/debug/.
- To check status of host, click on the hostname (to the left of the currently broken log links). This will take you to the host's page and you can check the status of it there.
- PFQ
- veyron_rialto is failing with "ERROR: Cannot find prebuilts for chromeos-base/chromeos-chrome on veyron_rialto"
- This has been failing forever. I checked back 200 builds (as far as I could) and they're all failing with the same error.
- 597966 has already been filed for it, but it remains untriaged. Pinged for update.
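The manifest lookup described in the CQ notes above (find the kernel revision the build used, then match it to a commit) can be sketched in Python. This is a minimal sketch, assuming the build manifest is a repo-style XML file whose `<project>` elements sit under a `<manifest>` root; `find_revision` and `gitiles_url` are hypothetical helper names, not part of any Chromium OS tool.

```python
import xml.etree.ElementTree as ET

# Sample manifest wrapping the <project> line quoted in the notes above.
# The enclosing <manifest> root is an assumption about the file layout.
MANIFEST = """<manifest>
  <project name="chromiumos/third_party/kernel"
           path="src/third_party/kernel/v3.18"
           revision="b850f41a01164fe1eb4cf76b5178194d53394130"/>
</manifest>"""


def find_revision(manifest_xml, path):
    """Return the pinned revision for the project checked out at `path`, or None."""
    root = ET.fromstring(manifest_xml)
    for project in root.iter("project"):
        if project.get("path") == path:
            return project.get("revision")
    return None


def gitiles_url(project_name, revision):
    """Build a commit link on chromium.googlesource.com (hypothetical helper)."""
    return "https://chromium.googlesource.com/%s/+/%s" % (project_name, revision)


revision = find_revision(MANIFEST, "src/third_party/kernel/v3.18")
print(gitiles_url("chromiumos/third_party/kernel", revision))
```

Opening the printed link shows the commit (a merge, in the case described above) that the build actually picked up, even when it is missing from the builder's change list.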
2016-04-05 Sheriff: aaboagye, abhishekbh
To next sheriff(s): !! There's an issue where trying to get the logs from a test suite returns "Not Found". See b/27653354 for more details. !! I expect the lakitu incremental builders to both go green once CL:337302 lands. Canaries will probably fail due to timeouts (it's a known issue), but check the slave builder for any non-timeout related failure like Paygen or AUTest. PFQs should go green since the DUTs were repaired or replaced. Watch 598517 for updates regarding the generic SSH errors. amd64-generic-asan will continue to fail until this CL lands. - CQ
- The link paladin failed during the provision step with an error that says "provision: FAIL: Failed to install device image using payload". It appears to be an update_engine error with an error code of "kNonCriticalUpdateInOOBE".
- crbug.com/600737 was filed to track it.
- It doesn't seem to be related to any one board.
- PFQ
- Yesterday the PFQs were green, but today there seem to be some issues present.
- There are at least two different issues here that both occur during the provision step:
- "SSH reports a generic error (255) which is probably a network failure" -> crbug.com/598517
- "Update failed. The priority of the inactive kernel partition is less than that of the active kernel partition." -> crbug.com/599893
- Canaries
- Previous runs of the canary builders were still timing out. There was also one run where they just failed to start, but the timeout issues are still prevalent.
- Towards the end of the day, powerwash issues surfaced for beltino-a, jecht, and rambi canaries. -> crbug.com/600892
- Incremental
- Since build #7794, the lakitu-incremental builder has been failing VMTest for the test logging_UserCrash.
- Since it was an autotest failure, searching for the string "END FAIL" in the stdio leads to the error message pretty quickly.
- Filed crbug.com/600774.
- This seemed to be caused by an inadvertent change due to rebasing and patch shuffling. Unfortunately, it wasn't caught by the CQ since the CQ doesn't run VMTests.
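The "END FAIL" search mentioned in the Incremental notes above can also be done with a few lines of Python once the stdio log has been saved locally. This is a minimal sketch: `find_end_fail` is a hypothetical helper, and the sample log lines are made up for illustration; real stdio output will look different.

```python
def find_end_fail(lines, marker="END FAIL"):
    """Return (line_number, text) pairs for every line containing the marker."""
    return [(number, line.rstrip("\n"))
            for number, line in enumerate(lines, start=1)
            if marker in line]


# Hypothetical stdio excerpt; the format of real log lines is an assumption.
SAMPLE = [
    "START logging_UserCrash\n",
    "  some test output\n",
    "END FAIL logging_UserCrash\n",
    "END GOOD another_Test\n",
]

for number, text in find_end_fail(SAMPLE):
    print("%d: %s" % (number, text))
```

For a saved log, pass the open file object instead of `SAMPLE`; the line numbers make it easy to jump to the surrounding error message.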
2016-04-04 Sheriff: aaboagye, abhishekbh, vapier
This morning there were a bunch of paladin failures, with the CQ master failing 26(!) consecutive times. I throttled the tree, since there's no use in trying new changes until the CQ actually finishes correctly. - Guado moblab is one of the main offenders. To get to the debug logs, I clicked on one of the failed builds, scrolled to the HWTest section, clicked on "Link to suite".
- Once there, I clicked on the failed test, provision. (Shows up in a purple box) Then on the new page that opened, clicked on "view all logs".
- From there, navigated to the debug directory and took a look at "autoserv.DEBUG".
- Searched until I found the string "Autotest caught exception when running test". Just above that line, it shows the command that was attempted. In this case it was "/tmp/stateful_update: Permission denied".
- Filed crbug.com/600403.
- The infrastructure team notes that it's helpful to include the hostname in the bug as well: the hostname of both the DUT and the buildslave.
- daisy-full failed SimpleChromeWorkflow. The buildslave also appears to have been offline since last Friday.
- To find the logs, I clicked on the failed build, and clicked the "stdio" link under SimpleChromeWorkflow. Scrolled down to find the traceback && STEP_FAILURE.
- Looks like a couple different errors: a read-only filesystem, and an ImportError for no module named apport.fileutils.
- Filed crbug.com/600413.
amd64-generic-asan has been failing for a _very_ long time. I found crbug.com/589885 where some progress is being made. The primary CL is still pending review.
For triaging canary failures, I first take a look at each release group. Most of the failures seem to be due to the suite timing out. However, there are a few other issues. You can check these by viewing the "stdio" link under the step that's yellow or red. In the afternoon, the internal waterfall seemed dead. Infra deputy filed crbug.com/600526 to a trooper.
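The autoserv.DEBUG search described for the guado_moblab failure above (find "Autotest caught exception when running test", then read the lines just above it) can be sketched as follows. This is a minimal sketch: `lines_before_marker` is a hypothetical helper, and the sample excerpt is invented apart from the two strings quoted in the notes.

```python
MARKER = "Autotest caught exception when running test"


def lines_before_marker(lines, marker=MARKER, context=3):
    """For each line containing the marker, return up to `context` preceding lines."""
    hits = []
    for index, line in enumerate(lines):
        if marker in line:
            start = max(0, index - context)
            hits.append([prev.rstrip("\n") for prev in lines[start:index]])
    return hits


# Hypothetical excerpt of autoserv.DEBUG; real entries carry timestamps and more detail.
SAMPLE = [
    "Running update on host\n",
    "bash: /tmp/stateful_update: Permission denied\n",
    "Autotest caught exception when running test:\n",
]

for block in lines_before_marker(SAMPLE, context=1):
    print("\n".join(block))
```

Raising `context` shows more of the command that was attempted before the exception was logged.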
2016-04-01 Sheriff: moch, zachr - 599674: glados-release-group chell and cave fail buildpackages
- 599982: daisy-paladin did not start
2016-03-31 Sheriff: kitching, moch, zachr - 597866: Cannot find prebuilts for chromeos-base/chromeos-chrome on veyron_rialto
- Apparently nothing to worry about since important=False
- Almost all CQ builders are timing out, but seems like current builds are succeeding
- 596630: "Failed to install device image using payload" during provision errors (x86-zgb-paladin)
- 579119: unittest timeout (peach-pit-paladin)
2016-03-30 Sheriff: kitching - 589885: failure in desktopui_ScreenLocker still showing up in amd64-generic
- 598967: LeakSanitizer: detected memory leaks in update_engine-0.0.3-r1895 UnitTest (deymo@ investigating)
- [from akeshet@, fixed] 598960: storm and whirlwind paladins failing consistently in vox unittest
- 598980, 51703: rialto-services use of ReadFileToString needs updating
- 598517: "SSH reports a generic error" failures during provision
- strago board issues:
- 583014: Strago boards don't have bvt results for the last week
- 51482: Braswell systems are repeatedly failing to install in the autotest lab with eMMC failures
2016-03-25 Chromeos Gardener: jennyz Sheriff: bsimonnet, gwendal, wuchengli - amd64-generic
- 589885: failure in desktopui_ScreenLocker
- 556785: Reduce parallelism during unittests - unit test fail.
- autoupdate failure:
- 557106: File system corruption on samus DUTs.
- 598224: Several CQ/paladin builders offline.
- 596150: pineview-release-group fails InitSDK
- 593565: Paygen failure (FAIL: Unhandled TypeError: expected string or buffer)
2016-03-23 Chromeos Gardener: jennyz Sheriff: dgreid, josephsih - 597213: [bvt-cq] platform_Perf Failure on tricky-chrome-pfq/R51-8100.0.0-rc3: Could not find build id for a DSO.
- 597183: provision_AutoUpdate.double_SERVER_JOB Failure on tricky-chrome-pfq/R51-8100.0.0-rc2: assigned to dfang.
- 597111: SitePerProcessBrowserTest.PagePopupMenuTest flaky on Linux_Chromeos_Test bot: kenrb fixed it today.
2016-03-22 Chromeos Gardener: jennyz Sheriff: tbroch, scollyer - 594336: network_DefaultProfileCreation Failure on tricky-tot-chrome-pfq-informational/R51-8053.0.0-b61: assigned to zqiu@.
- 597111: SitePerProcessBrowserTest.PagePopupMenuTest flaky on Linux_Chromeos_Test bot.
- 536061 : (non-closer, builds fixed on retry) debugd:missing dependency. fixed by olofj
2016-03-21 Sheriff: tbroch, scollyer - 595274 : (tree-closer) webRTC HW Decode/Encode crashes tab
- 595988 : (flaky/non-closer) network_DefaultProfileCreation Failure on tricky-tot-chrome-pfq-informational
- 596150 : (non-closer) pineview-release-group fails InitSDK
2016-03-14 Sheriff: djkurtz, marcheu, shawnn - 51123: oak-release-group: elm: SignerTest fails: security_test_image failed == "CHROMEOS_RELEASE_BOARD: Value 'elm' was not recognized"
- 594556: x86-generic-paladin: VMTest: desktopui_ScreenLocker fails => Screen is locked
- 594565: mttools: BuildPackages fails on first attempt
- 594571 veyron_rialto-paladin: BuildPackages fails: Cannot find prebuilts for chromeos-base/chromeos-chrome on veyron_rialto
- 594622 veyron_minnie-cheets paladin consistently failing
- 594592 lakitu-incremental builder failing gcetest
- 594699 samus vmtest failures
2016-03-11 Deputy: shuqianz, Sheriff: marcheu, shawnn - 594176: daisy_skate-chrome-pfq provision failing
- 593926: Lars devices in lab going down
- 592766: chromeos-bootimage build failures
- 594233: paladin builders offline
2016-03-04 wiley, drinkcat (honorary), aaboagye - PreCQ
- 592143: PreCQ: Failing InitSDK (Fixed due to chumping some python changes.)
- PFQ failures
- 591401: BuildImage step failing on PFQ "No space left on device" (Fixed with a revert.)
- 554222: AutoservSSHTimeout PFQ failures
- 582477: video_ChromeHWDecodeUsed is flake on CQ
- Canary
- 591965: guado_moblab-paladin: HWTest fails "bash: /tmp/stateful_update: Permission denied"
- A following run also failed, but for what looks to be a different reason.
- 591957: smaug-paladin: BuildPackages failure "sys-fs/udev[gudev(-)] ("sys-fs/udev[gudev(-)]" is blocking dev-libs/libgudev-230)"
- CQ
- 592148: chromeos-test-testauthkeys-mobbase failed to build due to collisions.
- 592182: guado_moblab moblab_RunSuite failure in CQ run.
2016-03-03 Sheriff: cywang, aaboagye, wiley - Chrome PFQ failures
- 590762: Broken CrOS build of telemetry autotests - still happening
- 591731: chromeos-chrome: build failure 'ppapi_example_video_decode': No such file
- See 591782 and 59140 for the background for this bug.
- Basically, trying to add earlier failures for file operations in the chromeos-chrome.ebuild.
- The 1st change was submitted, but led to the ppapi_example_video_decode error. Change was then reverted.
- At a later time the cleanup will land.
- This may cause the telemetry failures to pop up again.
- CQ
- 591639: graphics_GLBench(graphics_utils) failed in HWTest - fix submitted
- 591837: prebuilts failing to upload on certain paladins. GS flake? (lakitu, guado)
- Canary
- 591656: security_AccountBaseline failed on lulu - fix submitted
- 591658: security_StatefulPermissions Failure on lulu - fix submitted
- 583014: strago release groups red since December 2015 (~2% pass rate)
- Misc
- 591853: public waterfall is missing the status boxes
2016-03-02 Sheriff: bfreed, charliemooney, cywang - PFQ failures
- 591308: ChromeSDK failed in Chromium PFQ
- 590762: Broken CrOS build of telemetry autotests - force another chromium PFQ build
- 591401: Builders failing in BuildImage step because they run out of storage
- 376372: about 8 canaries hit a HWTest "Suite timed out" error.
- 590372: A few builders died trying to sync the source (error: Exited sync due to gc errors)
2016-03-01 Sheriff: bfreed, charliemooney - 591097: shill and dhcpd flake causing HWTest infrastructure failures and 10 straight CQMaster failures.
- 591231: samus canary timeout in paygen stage while trying to copy a gsutil file.
- 589135: rambi-c-group canary failed in Archive: "tar: chromiumos_test_image.bin: file changed as we read it"
- 591256: peach group canary failed in Paygen with LockNotAcquired error
- 583364: Veyron Paygen downloading failures
2016-02-26 Sheriff: drinkcat - 590113: x86-generic incremental VMTest security_ASLR fails (once in VMTest, a bit strange)
- Closed the tree for 1 minute, false alarm: CQ-master page gave me the impression that the build failed because of rialto
- PFQ failures
- 590133: amd64-generic chromium PFQ: fatal error: ui/accessibility/ax_enums.h: No such file or directory
- 590114: [bvt-cq] provision Failure on daisy_skate-chrome-pfq/R50-7966.0.0-rc2 (autofiled)
- 584542: toybox build is flaky, but never caused an actual build failure. Local fix on gerrit, started upstream discussion about fix
2016-02-25 Sheriff: jrbarnette, quiche - 590065: toybox build is flaky
- 589879: Build failures on "Lumpy (Chrome)" and "Alex (Chrome)"
- 589905: Lumpy timing out in afdodatagenerate
- 589885: desktopui_ScreenLocker failure on chromiumos.chromium
- 589844: CQ failure due to HWTest failure on veyron_minnie-cheets-paladin
2016-02-25 Sheriff: drinkcat - 589690: CQ fails at CommitQueueSync, other builders in Sync (Cannot fetch chromiumos/third_party/arm-trusted-firmware)
- Chumped manifest change to pin
- 589713: third_party/arm-trusted-firmware: Figure out which branch to track (Follow up on underlying issue)
- p/50460: oak-full build failure
- 589777: lakitu: security_AccountsBaselineLakitu Baseline mismatch
- 2 Sync issues:
2016-02-24 Sheriff: jrbarnette, quiche - 588834: audio_CrasSanity fails: "CRAS stream count is not matching with number of streams"
- This can cause failures in the CQ. All boards seem to be affected.
- Reverted three CLs; it's not yet known whether that will stop the problems.
- 589641 graphics_Sanity failing on veyron boards
- This has caused some failures in the CQ. So far, only veyron shows the problem.
- 589623 Pre-CQ cannot uprev and rejects new CLs
- A bad CL was chumped in without review.
- Chumped in a fix to go with it.
2016-02-22/23 Sheriff: ejcaruso, waihong - 588739: Timed out going through login screen. Cryptohome not mounted.
- 588834: audio_CrasSanity fails: "CRAS stream count is not matching with number of streams"
- 588921: Some builders suffer a virtual drive failure.
2016-02-17/18 Sheriff: wnhuang - 587411: Multiple CQ build failure due to infrastructure issue
2016-02-16 Gardeners: jennyz Sheriff: wfrichar, davidriley, kcwu - 558983: daisy-skate PFQ occasionally failed for this issue. The pending CL to fix this has not landed yet. guidou@ is working on it.
- 585973: daisy-skate PFQ occasionally failed for this issue.
2016-02-10/11 Gardeners: stevenjb Sheriff: dtor, avakulenko - 586180: Pre-CQ and CQ masters failed due to git outage during source sync
- 586179: Canaries fail due to provision timeout (SuitePrep: ABORT due to timeout)
2016-02-09 Gardeners: stevenjb / afakhry Sheriff: scollyer, furquan - 571980: provision Failure on peach_pit-chrome-pfq/R49-7763.0.0-rc1
- 585494: x86-alex failing vmtest with Unhandled DevToolsClientConnectionError and Unhandled TimeoutException
- 585552: peach_pit: emmc issues in test lab
- 585554: peach_pit: provision failure
- 585572: moblab_quick: guado: FAIL: Unhandled IndexError: list index out of range
2016-02-05 Gardener: stevenjb - 584722: chromeos-chrome build failure: "No package 'gtk+-2.0' found" while running pkg-config with media.gyp
2016-02-04 Sheriff: dhendrix - 584542: sys-apps/toybox failing to compile on amd64-generic
- 473899: paygen "Not all images for this build are uploaded", smaug has been seeing this for months.
- 569358: pool: bvt, board: x86-mario in a critical state. (assigned now)
- 584447: pool: bvt, board: veyron_mickey in a critical state. (assigned)
- 571757: [sanity] provision Failure on expresso-release/R49-7760.0.0. Note: This manifested itself as a swarming failure when I updated the bug (#68).
2016-02-03 Sheriff: johnylin, grundler, dbasehore - 561036: FIXED: paygen timing out: dshi appears to have fixed this
- 574915: VMTest failures in desktopui_ScreenLocker - jdufault investigating
- 578771: GPT Header Issue
- 579119: Unittest timeout
- 581639: IGNORE: lakitu_mobbuild fails cloud_SpinyConfig: turning down this build (sosa)
- 582144: FIXED: security_ASLR: reverting the change fixed the problem (https://chromium-review.googlesource.com/324950)
- 582325: veyron-b: rialto-services emerge fail
- 582521: FIXED? error in gsutil: samus canary builds succeeded on Feb 02 19:15. Also seen on daisy.
- 583081: FIXED: autotest-chrome build failures (https://chrome-internal-review.googlesource.com/#/c/247126/)
- 583535: FIXED: login_* test failures: reverted https://codereview.chromium.org/1646223002 (alchuith, dup:583382)
- 583684: FIXED: CommitQueueSync repo sync: manifest referred to a tag instead of branch
2016-02-02 Sheriff: grundler, dbasehore - 561036: paygen timing out on release builders
- 574915: VMTest failures in desktopui_ScreenLocker (later forked into three bugs)
- 581639 - lakitu_mobbuild fails cloud_SpinyConfig (known issue)
- 582521 - samus canary failed because of error in gsutil
- 583375: provision thrashing causing canary/beta build timeouts (kevcheng)
- 583382: login_* tests failing (may be dup of 574915 or others)
2016-02-01 Sheriff: bleung, puthik - 582531 - flaky HWTest for Pineview/ strago-b / sandybridge
- 583375 - canary and beta builds can cause provision thrashing which can cause hwtests to time out
2016-01-29 Sheriff: bleung, puthik - 582521 - samus canary failed because of error in gsutil
- 581639 - lakitu_mobbuild fails cloud_SpinyConfig
- 576879 - pool: bvt, board : candy in a critical state.
- 582325 - veyron-b: rialto-services emerge fail
2016-01-28 Sheriff: bhthompson, shchen, hychao - 582144: security_ASLR test failing on glados, strago, strago-b with Unhandled TypeError
2016-01-27 Sheriff: bhthompson, shchen, jchuang - 581598: archive stage failure at BuildAndArchiveFactoryImages
- 581624: gd-2.0.35 build failed on guado_moblab
- 581630: docker build failed on lakitu_next
- 543649: smaug paygen failing with "Not all images for this build are uploaded, don't process it yet" (does not cause canary failure, low priority)
- 581631: cheets_SettingsBridge: Timed out waiting for condition: Android font size set to smallest
- 581639: GCETest fail at 01-cloud_SpinyConfig on lakitu_mobbuild
2016-01-26 Sheriff: robotboy, semenzato, jchuang - 580184: PFQ failed to build related to chromeos/ime/input_methods.h missing
- 561036: paygen timing out on release builders
- 581382: perf_dashboard_shadow_config.json syntax error led to parse job failure (causing several timeouts)
2016-01-25 Sheriff: littlecvr - 486098: Builder failure HWTest Code 3 - not enough detail to debug
- 561036: paygen timing out on release builders
- 547055: Jecht Group Failed Archive Step
2016-01-22 Sheriff: littlecvr - 547055: Jecht Group Failed Archive Step
- 578771: Paygen error: GPT_ERROR_INVALID_HEADERS
- 558266: [au] autoupdate_Rollback Failure on ultima-release/R49-7655.0.0
- 580184: Master: PFQ failed to build related to chromeos/ime/input_methods.h missing
- 580261: Update/provisioning timeouts during tests due to slow network
- 579811: lakitu-release build continuously failed at GCETest
2016-01-21 Sheriff: deymo, zqiu, hungte Chromeos Gardener: jennyz - 580184: Master: PFQ failed to build, related to missing chromeos/ime/input_method.h
2016-01-20 Sheriff: stevefung, dlaurie, hungte Chromeos Gardener: jennyz - 579565: M49: PFQ Failing chromite unit testing on lumpy.
2016-01-14 Sheriff: stevefung, dlaurie - 322443: M49 PFQ failing unit tests
2016-01-14 Sheriff: vapier, zeuthen - 577549: lakitu_mobbuild_paladin fails at mariadb
- 577542: build_packages fails at chromeos-mrc on strago canary and paladin build
- 577836: lakitu_mobbuild_paladin fails at serf
2016-01-13 Sheriff: cychiang - 576905: pool: bvt, board: veyron_mighty in a critical state.
- 576992: util-linux-2.25.1-r1 build failure on cyan canary build
- 577025: TestFailure(paygen_au_dev,autoupdate_EndToEndTest.paygen_au_dev_full,Failed to perform stateful update on chromeos2-row2-rack10-host9)
- 571747: TestFailure(sanity,provision,Failed to perform stateful update on chromeos4-row2-rack3-host1)
- 505744: TestFailure(sanity,provision,Unhandled AutoservSSHTimeout: ('ssh timed out', * Command: )
- 571884: [bvt-inline] security_ASLR Failure: No such file or directory: '/proc/32189 32187/maps'. (on PFQ)
- 577549: lakitu_mobbuild_paladin fails at mariadb
- 577542: build_packages fails at chromeos-mrc on strago canary and paladin build
2016-01-12 Sheriff: cychiang - 576525: chromeos-bootimage build failure on nyan_blaze: Unknown blob type 'boot' required in flash map
- 576526: cheets_PerfBootServer failure at wait_for_adb_ready
- 529612: lakitu_mobbuild: cloud_CloudInit fails in VMTest
- 576549: lakitu_mobbuild canary build fails at GCE test because of quota exceeded
- 576545: rambi-a-release group clapper build_packages fails at net-misc/strongswan
- 571749: TestFailure(sanity,provision,Failed to perform stateful update on chromeos4-row5-rack8-host11)
- 571747: TestFailure(sanity,provision,Failed to perform stateful update on chromeos4-row2-rack3-host1)
- 505744: TestFailure(sanity,provision,Unhandled AutoservSSHTimeout: ('ssh timed out', * Command: )
- 576608: security_AccountsBaselineLakitu fails with Baseline mismatch
2016-01-06 Sheriff: moch, zachr - 572745: [bvt-cq] graphics_GpuReset Failure on falco-chrome-pfq
- 574870: [sanity] dummy_PassServer.sanity_SERVER_JOB Failure on veyron-b-group canary
- 574915: VMTest failures in desktopui_ScreenLocker, security_ASLR, login_LoginSuccess
- 574303: provision Failure on cyan-release
2016-01-05 Sheriff: moch, zachr - 574501: amd64-generic ASAN vmtests failing (desktopui_ScreenLocker, buffet_InvalidCredentials, buffet_IntermittentConnectivity)
2016-01-04 - 574197 Peach group Canary failing since 12/29
Gardener: stevenjb@/jdufault@ - 574104 : LKGM builder needs to be updated to git
- 573961 : Peach pit failures
- Forcing a rebuild, looks like it might be infra flake: 'Failed to install device image using payload at...'
- 574198 : PFQ flake, security_SandboxStatus
2015-12-28 Sheriff: itspeter Investigating across all the build status over the weekend of 12/25-12/27. Below are outstanding / repeating failures: - pineview-release-group HWTest [x86-XXX] [bvt-inline]
- from build #1602- #1604
- 569357: Indicates the following need to be recovered.
- chromeos4-row5-rack13-host3,chromeos4-row5-rack13-host7,chromeos4-row5-rack13-host9
- chromeos4-row5-rack13-host3,chromeos4-row5-rack13-host5,chromeos4-row5-rack13-host7
- strago-release-group HWTest [ultima]
- [bvt-inline] from build #1597 - #1601, #1603, #1604
- [sanity] from build #1602
- 505744: TestFailure(sanity,provision,Unhandled AutoservSSHTimeout: ('ssh timed out', * Command: )
- strago-release-group HWTest [cyan]
- [sanity]: from build #1597, #1600, #1601, #1602, #1604
- 547536: provision flake: Failed to install device image using payload #1597, 1600, 1602, 1604
- 568708: DownloaderException: Could not find *_full_* in Google Storage #1601
- [bvt-inline]: from build #1598, #1599, #1603
- Still not able to root cause the [bvt-inline] across different boards.
2015-12-25 Sheriff: itspeter - 547536: provision flake: Failed to install device image using payload
- Repeatedly occurred on strago
- 571884: [bvt-inline] security_ASLR Failure: No such file or directory: '/proc/32189 32187/maps'.
- Repeatedly occurred on jecht, strago
- 571730: Flaky VMTest security_ASLR: Command <pidof debugd> failed
- Repeatedly occurred on jecht (572093 merged), glados, auron-b
- 568708: DownloaderException: Could not find *_full_* in Google Storage
- Repeatedly occurred on rambi-d, ivybridge
- 505744: TestFailure(sanity,provision,Unhandled AutoservSSHTimeout: ('ssh timed out', * Command: )
2015-12-23/24 Sheriff: wuchengli - 485197: Provision failure downloading stateful.tgz
- 568708: DownloaderException: Could not find *_full_* in Google Storage
- 546630: Peppy Paladin Provision Error
- 571874: BuildPackages failed on gobi-firmware
- 571730: Flaky VMTest security_ASLR: Command <pidof debugd> failed
- 547536: provision flake: Failed to install device image using payload
- 551003: cyan-cheets: [Errno 28] No space left on device
- 548114: Autotest client terminated unexpectedly: We probably lost connectivity during the test..
- 571884: [bvt-inline] security_ASLR Failure: No such file or directory: '/proc/32189 32187/maps'.
- 569357: pool: bvt, board: x86-mario in a critical state.
2015-12-21/22 Sheriff: tfiga, tbroch, martinroth - 571599: daisy: Missing Manifest files in overlay-daisy
- 571505: hwlab down due to DB capacity ( PFQ fail hwtest [ sanity ] )
- 48735: missing Manifest in overlay-guado-private
- 571221: Builders failing at "running steps via annotated script" stage
2015-12-17/18 Sheriff: josephsih, tbroch, martinroth - 569620: bvt-inline and paygen time out in canaries.
2015-12-14/15 Sheriff: gedis, benzh - 569620: bvt-inline and paygen time out in canaries
- 569487: security_ASLR failures in lakitu canary
- 569726: cautotest is down
- 569979: Paygen fails on all canary builders
- according to @deymo, should cycle green. If not, ping @deymo, @gedis, @fdeng
- 569983: Unittest fail test_count_jobs: When n jobs are inserted, n jobs should be counted within a day range
2015-12-14 Sheriff: kitching Gardener: - 439136: Existing issue with google-breakpad on auron-b group canary
- 569487: security_ASLR failures in lakitu canary
2015-12-11 Sheriff: kitching, aaboagye Gardener: achuith - 569163: Many CQ paladins failed at CommitQueueSync step.
- See also b/26161444
- Subsequent CQ run was unencumbered.
- 568473: CL:317573 chumped right before 18:00 PST, current canary builds should finish EOD
- Still seeing canary failures even though CL:317573 landed. Recommend that they revert for now as it's blocking lakitu release.
- It's actually a different error, but similar error string. Fix is in CL:317780.
- Chumped before the 1800 PST launch of canaries. Hopefully that will be the last of the paygen issues.
- CQ master failed due to CL:317573 being chumped (gob_util.py got Conflict: change is closed), should finish EOD
2015-12-10 Sheriff: aaboagye Gardener: achuith - 568473: Paygen error (Payload integrity check failed: Unsupported minor version: 3)
- Fix should be going in, in CL:317573
- 568496: tricky PFQ graphics_GpuReset Failure
- Was just a one-off due to the DUT rebooting.
- 550826: amd64-generic ASAN failed on buffet_Registration / buffer_BasicDBusAPI
- One hiccup with the CQ master PublishUprevChanges stage.
- Bug filed here - 568780: CQ PublishUprevChanges stage uses repo list as it existed before applying changes
2015-12-08 Sheriff: drinkcat Gardener: - 567936: lakitu-incremental failed with GCETest errors (for some reason it did not run for a week...) => Fixed+Verified
- 550826: amd64-generic ASAN failed on buffet_Registration / buffer_BasicDBusAPI
- 567989: SyncChrome issue is killing all the canaries, fortunately does not affect paladins (yet?) => Fixed+Verified
- 568095: daisy_skate Bluetooth issue that I believe is causing some test flakiness (spotted it in Chrome PFQ daisy_skate) => bluetooth update reverted across all kernels
- 568473: Paygen error (Payload integrity check failed: Unsupported minor version: 3)
2015-12-08 Sheriff: dgreid, scollyer Gardener: achuith, afakhry - Signer timing out
- 567797: HW and VM tests failures due to adb_wrapper.AdbWrapper.KillServer error on chromeos
2015-12-07 Sheriff: jcliang - 529905: bobcat failed to setup_board
- 47849: update-signal-relay build failed on daisy_winter canary
- 543649: smaug canary failing paygen stage
2015-12-04 Gardener: stevenjb - 566057: AdbClientSocketTest.TestFlushWithoutSize and AdbClientSocketTest.TestFlushWithData flaky
- Continuing to investigate:
- 566152: VMTest failures in login_RemoteOwnership, login_LoginSuccess, login_OwnershipApi, login_GuestAndActualSession
- 566503: VMTest failure: security_NetworkListeners
2015-12-03 Sheriff: olofj, wiley, posciak Gardener: stevenjb - 565228: Multi canaries failing after 5 failed attempts to start VMs
- 565349: dev server fails to start in mario-incremental
- 566152: VMTest failures in login_RemoteOwnership, login_LoginSuccess, login_OwnershipApi, login_GuestAndActualSession
2015-12-02 Sheriff: olofj, wiley, posciak - 564870 : ERROR: <class 'chromite.lib.parallel.ProcessSilentTimeout'>
- 514802: Provision fails with "start: Job is already running: autoreboot"
- 564336: buildbot internal failure is not supposed to cause tree throttling
2015-11-26 Sheriff: cywang Gardener: jennyz - 561939: image signer stage is slow
- 561990: Rikku: missing Manifest of chromeos-factory-board package
- 563877: CQ failing to create valid manifest
- 563878: crbug.com/new shortcut broken
2015-11-25 Sheriff: sonnyrao, avakulenko, cywang Gardener: ihf - 561208: {Rambi-a, jecht} group machines not available in test lab for HWTests.
- 561214: HWTestDumpJson ERROR: No JSON object could be decoded
- 561244: bvt test got aborted but the real test job completed successfully
- 554043: UnitTest timeouts in the CQ
- 560915: disable Bluez flaky unit test
2015-11-24 Sheriff: sonnyrao, avakulenko, yoshiki Gardener: ihf - 556785: builders fail due to timeouts. build/unit test stages take over 1 hour and process is killed due to timeout.
- Test failure in lakitu_mobbuild canary, reverted CL: 239368
- 561036 paygen timing out on release builders
2015-11-20 Sheriff: ejcaruso, briannorris, wnhuang Gardener: afakhry - 554222: provision failure on falco and daisy_skate PFQs. AutoservSSHTimeout.
2015-11-18 and 2015-11-19 Sheriff: wnhuang, davidriley Gardener: - 558366: storm group canary build_package failed at wireless-regdb
- 47849: update-signal-relay build failed on daisy_winter canary
- 207003: peach_pit build_packages failed at exynos-pre-boot
- 557578: veyron_minnie-cheets fails at chromeos-bsp-minnie-private
- 452759: unit test timeouts on auron, rambi-a, glados
- 516795: builds failing for exceeding 8 hour time out (auron, rambi, veyron, slippy, ivybridge)
- 558457: ChromePFQ is all red.
- 557449: cros_trunk is red.
2015-11-17 Sheriff: waihong, bfreed, dhendrix, jchuang Gardener: - 207003: butterfly & leon build_packages failed in chromeos-touch-firmware-samus
- 557245 (was 549044): Canary failure: The Paygen stage failed: Image signing timed out
- 557214: build310-m2 failing to repo sync, causes CommitQueueCompletion to fail: tricky-paladin did not start
- 557106 and 557107: Samus canary failures (HW issues)
- 557238: Veyron_minnie recovery image signing issue ("veyron_minnie has broken appid setting")
- 557314: Tree closer: Pre-CQ Sync stage fails to pick up CLs
- I do not know how to find pre-cq problems in general, but these provide clues:
- https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/pre-cq
- https://uberchromegw.corp.google.com/i/chromeos/builders/Pre-CQ%20Launcher
- 557364: Need to recover Rialto BVT machines to get TPM into a good state.
- 207003: peach_pit build_packages failed in chromeos-touch-firmware-pit.
- 552648, 536689, 535928: Multiple network_VPNConnect.l2tpipsec_xauth failures
2015-11-16 Sheriff: kcwu, waihong, bfreed Gardener: - 556529: Samus build_packages failed in chromeos-touch-firmware-samus
- 25691600: Network/Hardware Issue with chromeos4-devserver2, possible cause of 540587: provision Failure (Failed to perform stateful update)
- 556671: veyron canary: timeout_util_unittest failed
2015-11-13 Sheriff: kcwu Gardener: - 540587: provision Failure (Failed to perform stateful update)
2015-11-10 and 2015-11-11 Sheriff: jrbarnette, dianders, jchuang Gardener: - 551279 x86-zgb paladin timeout in p2p, modem-manager-next unittest (3 fails in a row again)
- 553424: login problems, including "Malformatted response" in login_OwnershipTaken and "Timed out going through login screen" in others.
- 554043: unittest timeouts
2015-11-06 and 2015-11-09 Sheriff: waihong, rspangler, hychao, dhendrix Gardener: dzhioev - 552452 glados group canary: Failed to create HWID v3 bundle
- 543958 veyron-b-release-group: The priority of the inactive kernel partition is less than that of the active kernel partition.
- Attempt to resolve some spammy, mass-autofiled bugs (there seem to be a lot that have gone unnoticed for several weeks):
- 553442: Remote power management failing for many builders
- 553579: video_VideoDecodeAccelerator failure seen on many builders
- 553424: login_OwnershipTaken failing on multiple builders
- 553575: p11_replay/chaps causing network_VPNConnect.l2tpipsec_cert test to fail
- 553548: video_VEAPerf fails on many BVT machines
- 549910: touch_TouchscreenScroll failures on Samus and Sumo
- 553226: cyan, celes, veyron_rialto missing from KernelVersionByBoard "expected" file in autotest
- For next sheriff rotation: If you get bored, please look at other autofiled issues and attempt to triage and find owners for them: https://code.google.com/p/chromium/issues/list?q=label:autofiled
2015-11-04 and 2015-11-05
Sheriff: dlaurie, cychiang
Gardener: tdanderson
- 551451 Failing BrokerFilePermission.* sandbox death tests are preventing a PFQ uprev
- 547057 Paygen timeout
- 547434 Paygen autotest client terminated unexpectedly
- 548037 Paygen command execution failure
- 551586 Paygen failed to create cache file
- 545065 login_GuestAndActualSession_SERVER_JOB failure
- 500094 Builder load in unittest causes some unittests to timeout
- 542558 Mario HWtest pool health
- 550768 bluez timeouts causing builders to fail randomly (NOT FIXED YET)
2015-11-03
Sheriff: bleung, deymo, cychiang
- 550768 strago-paladin and tricky-paladin timeout while building bluez-5-r40
- 550826 amd64-generic ASAN failed on buffet_Registration / buffet_BasicDBusAPI
- 550840 [samus] bvt-inline test failed to remove /var/tmp/messages.autotest_start
- 549472 [bvt-inline] security_SandboxStatus Failure on lumpy-chrome-pfq/R48-7595.0.0-rc1
- 548535 [bvt-cq] video_ChromeRTCHWDecodeUsed Failure on tricky-chrome-pfq/R48-7589.0.0-rc1
- 549044 The Paygen stage failed: Image signing timed out. Failure on samus-canary/7608.0.0
- 546457 veyron_mighty internal server error on HWTest Failure on veyron_group_canary/R48-7608.0.0
- 542558: pool: bvt, board: x86-mario in a critical state
- 544654 [paygen_au_dev] autoupdate_EndToEndTest.paygen_au_dev_full Failure on candy-release/R48-7608.0.0
- 551279 x86-zgb paladin timeout in unittest
2015-11-02
Sheriff: bleung, deymo, reveman
Gardener:
- 548755 BranchUtil failure on canary master -> external and internal manifest out of sync
- 546871 panther: git package seems corrupt while building
2015-10-29
Sheriff: chihchung, semenzato, shchen
Gardener: abodenha
- 549044 Paygen image signing timed out
- 547541 CQ Failing in PublishUprev repeatedly
2015-10-28
Gardener: abodenha
- 548693 video_ChromeRTCHWDecodeUsed test has been flaking since Oct 9
- 548544 Compile failure on 8010 Builder #48.0.2548.0
2015-10-27 and 2015-10-28
Sheriff: abrestic, dbasehore
- 548257 paygen failures which moved to ASAN only failures
- 547057 paygen timeouts
- 548723 autotest timeouts on ivybridge and slippy devices
- 548755 BranchUtil failure on canary master
- 548804 manifest broken by duplicate remote
2015-10-27
Gardener: stevenjb
- 548358: PDFExtensionTest.Load failing on cros_trunk
2015-10-23
Sheriff: johnylin, alecaberg, shawnn
Gardener: jonross
- 431486: Multiple PFQ failure: shill uprev build failure
- 546865: Chrome PFQ master failed while running annotated script
- 518591: auron-release-group: HWTest flaky for test provision
- 546871: panther: git package seems corrupt while building
- 546921: lakitu-incremental: build error for chaps token_manager
- 546630: PFQ failure: Peppy Paladin provisioning failures.
- 415617: PFQ failure: moblab_RunSuite test failing in lab when trying to determine test platform (guado_moblab paladin)
- 545779 PFQ not upreving builds. I believe it is related to the linked bug.
- 547055 Canary: Jecht group failed archive step
- 547057 Canary: Rambi Paygen failure, timeout after tests pass.
- 547116 CrOS trunk, Linux ChromiumOS, v8 roll broke a test
2015-10-21 and 2015-10-22
Sheriff: johnylin, alecaberg, shawnn
Gardener: jonross
- 546023: veyron_rialto canary failing to build rialto-services
- 529612: lakitu_mobbuild: cloud_CloudInit fails in VMTest
- 542558: pool: bvt, board: x86-mario in a critical state
- 518591: [sanity] provision Failure: DownloaderException: Could not find autotest.tar in Google Storage
- 546457: veyron_mighty internal server error on HWTest
- 544230 Bot configuration error leading to CrOS trunk browser_test failures. Have been happening since Oct 2.
- 546567 GearMenu tests failing on CrOS trunk since introduction.
- 546581 PFQ failure: Daisy Skate Paladin provisioning failures.
- 546600 CrOS trunk has continuous interactive_ui_test failures for focus. Reverting suspected change.
- 546630 PFQ failure: Peppy Paladin provisioning failures.
- 546708 CrOS trunk compilation failure. Not sure how it made it past trybots and commit queue.
- 545779 Ebuilds not upreving leads to incremental builder failure
- 544751 Glados kernel build failure
- 546639 External builders not running for days
2015-10-20
Sheriff: moch, zqiu, littecvr
- 537475: security_OpenFDs Failure
- 545530: lakitu: security_test_image test failure
- 542558: pool: bvt, board: x86-mario in a critical state.
- 545588: Lakitu Canary: Signer Test failure
- 543874: [sanity] provision Failure on veyron_rialto
2015-10-19
Sheriff: moch, zqiu, littecvr
- 544921: MTV-2081 offline due to water main burst. Lab shutdown temporarily.
- 545171: pool: bvt, board: peach_pit in a critical state.
- 545172: pool: cq, board: peach_pit in a critical state.
- 518591: provisioning errors
- 543646: security_OpenFDs failing on veyron boards
- 543649: smaug canary failing paygen stage
- 543593: update_engine: buffer overflow in unittests (detected on asan bots)
2015-10-14
Sheriff: vapier, grundler, bhthompson
- 543593: update_engine: buffer overflow in unittests (detected on asan bots)
- caused HW test failures on canary broadwell/braswell also.
- 543596: buffet_RestartWhenRegistered autotest is too flaky
- Tree was throttled most of the time both days, initially due to infrastructure fixes that phobbs hadn't completely pushed.
- Still seeing what might be 465862 with amd64-generic-full (and others) build 15558
2015-10-14
Sheriff: vapier, stevefung
- 541474: platform_Firewall autotest failing for various boards
- 543186: security_mprotect autotest failing for lakitu
- 518591: provisioning errors
- 529612: cloud_CloudInit autotest failing for lakitu
- 534437: SimpleTestVerify failing vmtest
- 543248: Linux ChromeOS Buildspec builder repeatedly failing
- 543593: update_engine failing unittests (buffer overflow) on asan bots
- 543596: buffet_RestartWhenRegistered autotest failing
- 543646: security_OpenFDs failing on veyron boards
- 543649: smaug canary failing paygen stage
2015-10-07
CrOS gardener: stevenjb
- All quiet on the chrome on chrome os front.
Sheriff: tfiga, gedis
- 539720: Waterfall can't be accessed through Uberproxy, need to use https://chromegw.corp.google.com/i/chromeos/waterfall
- 538744: swarming internal error on PFQ
- x86-generic full VMTest login_Cryptohome flake
2015-10-06 and 2015-10-05
CrOS gardener: stevenjb
- 539748: depot_tools update caused chromiumos.chromium builds to fail (reverted)
Sheriff: bsimonnet, furquan
- 539594: depthcharge build failure on veyron rialto [Fixed]
- 539748: chromeos-chrome fails to build: rmdir/mkdir access violation in bootstrap.py (all canaries failing) [Fixed]
- 539720: Waterfall can't be accessed through Uberproxy, need to use https://chromegw.corp.google.com/i/chromeos/waterfall
- 539739: Flaky VMTest Failures
2015-10-06
CrOS gardener:
Sheriff: tfiga
- 539720: Waterfall can't be accessed through Uberproxy, need to use https://chromegw.corp.google.com/i/chromeos/waterfall
- 539748: chromeos-chrome fails to build: rmdir/mkdir access violation in bootstrap.py (all canaries failing)
- 539594: depthcharge build failure on veyron rialto still not fixed
- 538908: still flaky
2015-10-05
CrOS gardener: ihf
Sheriff: bowgotsai
- 418539: Change to fix security_NetworkListeners Failure is out for review.
- 485108: Moved desktopui_FlashSanityCheck from bvt-cq to bvt-perbuild.
- 538908: Lab should have recovered enough daisy_skate to continue testing.
2015-10-02
CrOS gardener: ihf
Sheriff: bowgotsai
- 537655: 'Platform' object has no attribute 'SetHTTPServerDirectories'
- 535374: pre-cq failure: GOBError: Forbidden
- 538017: paygen failure (autoupdate_EndToEndTest.paygen_au_dev_delta Failure)
- 518591: provision Failure (DownloaderException: Could not find autotest.tar in Google Storage)
- 505744: provision Failure (sanity,provision,Unhandled AutoservSSHTimeout)
- 526453: autoupdate_Rollback Failure, ANCHOR TestFailure(au,autoupdate_Rollback,update-engine failed on chromeos4-row1-rack4-host3)
- 538480: autoupdate_EndToEndTest.paygen_au_canary_full
- 538476: autoupdate_EndToEndTest.paygen_au_canary_delta
2015-10-01
CrOS gardener:
Sheriff: josephsih
- 537886: paygen failure (Failed to finish download from devserver)
- 530498: HWTest failure (stage_artifacts timed out)
- 485881: HWTest failure (login_OwnershipNotRetaken)
- 529466: AUTest failure (Suite job failed or provisioning failed)
- 534437: amd64-generic full failed on results-40-buffet_RestartWhenRegistered
- 538098: Sync buildbot slave files failed: update_scripts failed
- 538057: ERROR: Failed to stage payload: stage_artifacts timed out
- 538140: CQ HWTests failing in audio_CrasSanity: Unhandled AttributeError: 'Platform' object has no attribute 'SetHTTPServerDirectories'
2015-09-30
CrOS gardener:
Sheriff: josephsih
- 537419: autoupdate_EndToEndTest.paygen_au_dev__full: Unhandled KeyError: 'source_payload_uri'
- 488291: amd64-generic ASAN failed on login_LoginSuccess
- 534437: amd64-generic full failed on results-40-buffet_RestartWhenRegistered
- 537799: PDFTestFiles/PDFExtensionTest.Load/2 and 8 are currently flaky on cros_trunk.
2015-09-29
CrOS gardener:
Sheriff: kitching
- 537075: falco-full-compile paladin fails due to cidb.py missing sqlalchemy import
- CL 301593: libgcrypt update causing problems with mesa-img
- 537087: lakitu_next-incremental always failing VMTest security_Minijail0 and security_SuidBinaries
- 409019: graphics_GpuReset failure (GPU hang) on falco-chrome-pfq
- 536780: added more timeouts (stage_artifacts)
- 536259: HW Lab Infrastructure: too few guado bvt boards (only 2, require 4)
- 536794: HW Lab Infrastructure: too few stumpy bvt boards (only 3, require 4)
- 537128: 8+ canary build trees failing on Autotest timeouts
2015-09-28
CrOS gardener:
Sheriff: kitching
- 536670: autoupdate_EndToEndTest.paygen_au_* test failure: Canary still has tons of argument mismatch errors even after garnold reverted his CL
- 536690: amd64-generic incremental build failure: updates to buffet package
- CL 302593: fio update fails HWTest
- 536515: HW Lab Infrastructure: too few auron_paine bvt boards (only 3, require 4)
- 536618: HW Lab Infrastructure: too few tidus bvt boards (only 3, require 4)
- 536775: Autotest: many builds failing due to security_ and login_ tests
- 536780: HWLab: builds failing due to timeouts
2015-09-24
CrOS gardener:
Sheriff: deanliao
Builder x86-generic full #17275
- 472858 VMTest fail: DBusException: org.freedesktop.DBus.Error.NoReply
Several autoupdate_EndToEndTest.paygen_au_* test failures:
HWTest lost MySQL connection: 535795
2015-09-23
CrOS gardener:
Sheriff: zachr, marcheu, itspeter
Mostly the same as 09-22. Things different from yesterday:
fdeng@ had Verify-1 for those 2 CLs:
- https://chromium-review.googlesource.com/#/c/294553/
- https://chromium-review.googlesource.com/#/c/296051/
Chump-in CLs (Doesn't break anything, just FYI)
- https://chromium-review.googlesource.com/#/c/300843/
- https://chromium-review.googlesource.com/#/c/301561/
2015-09-22
CrOS gardener: bruthig
Sheriff: briannorris, mylesgw, itspeter
- Several infra failures point to 534361; fdeng is looking at it
Builder Canary master Build #1307
- Builder jecht group canary Build #777
- 491290 autoupdate_Rollback failures on rikku
- Builder pineview group canary Build #1682
- autoupdate_EndToEndTest dashboard shows alex is 80%. Give it a retry.
- Builder auron-b group canary Build #847, Failure since 2015-09-19
- 533881 [paygen_au_canary] provision Failure on gandof-release/R47-7472.0.0
- 533879 [paygen_au_dev] provision Failure on gandof-release/R47-7472.0.0
Builder x86-generic full #15418
- 532658 VMTest fail: DBusException: org.freedesktop.DBus.Error.NoReply
2015-09-21
CrOS gardener: bruthig
PFQ:
- 534437 vm_disk and vm_memory failed_SimpleTestVerify_1_autotest_tests on amd64-generic
- 496555 [bvt-cq] provision Failure on falco-chrome-pfq/R45-7139.0.0-rc2
- 534451 PFQ builds timing out when calling /b/cbuild/internal_master/chromite/third_party/swarming.client/swarming.py
- failed on tricky and falco
- 530498 [sanity] provision Failure on tricky-chrome-pfq/R47-7446.0.0-rc3
- 496292 [bvt-cq] video_VideoSanity Failure on daisy_skate-chrome-pfq/R45-7137.0.0-rc2
- 533979 [bvt-cq] audio_CrasSanity Failure on lumpy-chrome-pfq/R47-7474.0.0-rc1
- 478533 [bvt-cq] desktopui_FlashSanityCheck Failure on lumpy-chrome-pfq/R44-6989.0.0-rc1
- 418539 [bvt-inline] security_NetworkListeners Failure on daisy_skate-release/R39-6315.0.0
- 534544 [bvt-cq] network_DefaultProfileServices Failure on tricky-chrome-pfq/R47-7478.0.0-rc2
cros_trunk:
- 534399 IncidentReportingServiceTests are crashing on the cros_trunk build
cros_stable:
- 510291 DevToolsPixelOutputTests.TestScreenshotRecording fails on cros_trunk official bot
2015-09-17
Sheriffs: posciak, cernekee, smbarber
- 530203 Samus with on-going HWTest Aborts.
- 491290 autoupdate_Rollback failures on beltino
- 483749 more provision failures (daisy_skate)
- 409019 graphics_GpuReset failures on lumpy
- 533006, 530693 multi-GB log files from Link FSI (?) took down the lab network
- 488291 x86-generic, amd64-generic ASAN vmtests are failing
- omg/1030 cros_sdk.py unable to fetch binaries from GS
2015-09-11
CrOS gardener: jonross
- 530646 WebViewTest.Dialog_TestConfirmDialogDefaultGCCancel Failure from V8 Roll (Also 530593)
Canary:
- 530612 Canary master: BranchUtil: No such file or directory: '/tmp/cbuildbot-tmpd0s19p/tmprZ63D7/src/aosp/system/attestation'
- 518591 Veyron-group provision failure.
- 528748 Mario/nyan pool-health bug
- 530203 Samus with on-going HWTest Aborts.
PFQ:
- Actually passed last night! First pass since August 27th :D
- 530498 Provision failure on Tricky
- 471531 Peach pit failure on provision_AutoUpdate.
- 530605 Falco PFQ Aborting during HWTest
- 473976 Daisy_skate failure in HWTest
- 530661 Lumpy audio_CrasSanity Failure
2015-09-10
CrOS gardener: jonross
Sheriffs: drinkcat, snanda, tbroch
Misc:
- 530265 chromiumos sdk :: *-client pkgs need dep to chromeos-dbus-bindings.
- 530194 Alex and Lumpy bots failing update_scripts on missing gclient config.
Canary:
- 529216 paygen failures now also appearing on Nyan, Rambi, and Veyron group
- 528748 x86-mario HWTest pool issue.
- 530203 Samus HWTest reported an infra issue.
PFQ:
- 471531 Daisy AutoUpdate Failure. Looks to be a flake, hopefully clears up.
- Other failures (Tricky, Falco) appear to be possible HWTest [bvt-inline] flakes, as the jobs timed out. No detailed logs or auto-filed bugs from failures.
- 530286 Falco Failure, desktopui_ExitOnSupervisedUserCrash Failure: no process matching chrome
- 489106 Lumpy Autotest failure during graphics_GpuReset. Unexpected termination
- 478416 Peach timeout during desktopui_FlashSanityCheck
2015-09-09
CrOS gardener:
Sheriffs: drinkcat, snanda, tbroch
canary failures:
- 529905: bobcat: setup_board fails
- 523313: nyan_blaze: autoupdate failed. (#2084 / #2085)
- Happened twice in a row now. Might be worth investigating further.....
- 529044: platform_Powerwash: leon does not reboot?
- 529216: x86-mario Paygen: Failed with Killing tasks: [<_BackgroundTask
- 529113: stumpy-moblab (moblab_SmokeSuite)
- 529674: canary master fails in BranchUtil
- 529428: veyron_pinky: 'str' object has no attribute 'write'
- 529565: The Paygen [stout] stage failed: <class 'chromite.lib.paygen.gslock.LockNotAcquired'>
- 529466: x86-mario Unhandled AutoservRunError: command execution error
- 529324: butterfly/sumo provision failure
- Race between UploadTestArtifacts and HWTest phases
- 529612: lakitu_mobbuild: cloud_CloudInit fails in VMTest (twice)
PFQ:
- 529480: Several PFQs failing on security_SandboxLinuxUnittests
- Fix landed in chromium. Ensure that the subsequent runs pass.
- This should be fixed from 47.0.2506.0, but PFQ is not picking it up (I tried to abort the current build to force it... that did not work...). Looks like we need to wait for the release tag 47.0.2506.0 to appear on https://chromium.googlesource.com/chromium/src/ (should be Wed 8pm PST if I understand correctly).
CQ:
CrOS gardener:
Sheriffs: drinkcat, aaboagye, benzh
- 529480: Several PFQs failing on security_SandboxLinuxUnittests
- Fix landed in chromium. Ensure that the subsequent runs pass.
- 525128: CustomLauncherPageBrowserTest.EventsActivateSwitchToCustomPage browsertest failing on Builder: Linux ChromeOS Buildspec Tests
- This started failing on the 6th. Need to find a fix ASAP.
- 529388: ninja failed in build_packages: sys-kernel/chromeos-kernel-3_10
- The subsequent build seemed to work fine.
- 528567: daisy pool critical state (board does not recover from update to 7426.0.0, 7427.0.0 or 7428.0.0; 7429.0.0 seem ok)
- 529443: daisy provision_AutoUpdate failures
- 528748: x86-mario pool critical state => Similar to above
- There were some shill issues and DUTs going down. See b/23896777.
- 524814: daisy/x86-mario Paygen => Long standing issue. Possibly causing 2 issues above?
- 518591: zako/clapper: Could not find *_full_* in Google Storage (possibly same root cause)
- 529216: veyron_mighty/nyan_blaze: Failed with Killing tasks: [<_BackgroundTask (possibly same root cause)
- 529113: stumpy-moblab (moblab_SmokeSuite) => Long standing issue
- 487955: daisy full failure in ChromeSDK => Old issue resurfacing (missing dependency?), check if happens again on next run
- 529160: daisy_spring failed to return from powerwash (log) => Check on next run if it happens again
2015-09-07
CrOS gardener:
Sheriffs: jcliang
- 516795: Many group canaries are failing with "too close to timing out, or exceeds the timeout"
- root cause: Test Infrastructure issues (takes a long time in HWTest(bvt))
- 524814: Infrastructure Issues (code 3 and code 1)
2015-09-04
CrOS gardener:
Sheriffs: jcliang
- 516795: Many group canaries are failing with "too close to timing out, or exceeds the timeout"
- root cause: Test Infrastructure issues (takes a long time in HWTest(bvt))
- 524814: Infrastructure Issues (code 3 and code 1)
- 528176: vmtest fails logging_UserCrash on x86-generic incremental, x86-generic ASAN, and amd64-generic ASAN
2015-09-01
CrOS gardener:
Sheriffs: cywang
- 516795 (merged 526629): Many group canaries are failing with "too close to timing out, or exceeds the timeout"
- root cause: Test Infrastructure issues (takes a long time in HWTest(bvt))
- 509779 Infrastructure Issues (code 3 and code 1)
- 526641(fixed): after vapier clobbered ccache, sys-apps/util-linux-2.25.1-r1 built successfully
- 527909: veyron_jaq canaries failing to sign/paygen due to broken bsp
- 528017: veyron_jerry canaries failing to sign/paygen due to broken bsp
2015-08-31
CrOS gardener:
Sheriffs: cywang
- 526716 WebRTC is failing to build on mipsel. Preventing up rev of Chrome on CrOS
- 526629: HWTest [clapper] [bvt-inline] test timeout failure?
- test seems finished after the test was stopped by 'timeout'?
- 526641: pineview group: failed to build sys-apps/util-linux-2.25.1-r1
2015-08-24
CrOS gardener:
Sheriffs: dhendrix
- 524814: Canaries are falling over on autoupdate_EndToEndTest.paygen_au_canary_test_full
- Provision failures that seemed to fix themselves.
2015-08-21
CrOS gardener: tbrazic
Sheriffs: ejcaruso, bfreed, hychao
- 517876: DUTs lost RPC connections
- 523189: login_OwnershipTaken failures on multiple boards
2015-08-20
CrOS gardener:
Sheriffs: ejcaruso, bfreed, itspeter
- 522851: [Pri-0] Google cloud storage exception causing archive steps to fail across major buildbots
- Closing tree because this covers up all the underlying errors.
- crosreview.com/294781 reverts crosreview.com/286913 which changed how we build and install rsa_id files.
- Note: We cannot login to bots to confirm the missing file, but the log entry "CommandException: No URLs matched: /b/cbuild/external_master/buildbot_archive/daisy-incremental/R46-7381.0.0-b24819/id_rsa" gives the best clue.
- 522785: buildpackages [x86-alex] [afdo_use] is flaky in pineview group canary
- 518591: samus-release: provision failure, infra flake, succeeded in next build.
- 503526: ivybridge-freon-release-group: DUTs pool is too small
- 496036: sandybridge-release-group: DUTs pool is too small
- 523170: gizmo canary fails during BuildPackages
- 523173: nyan group canary timed out during paygen
- 523174: auron group canary has issues with rpm unit tests
- 523139: sandybridge-freon group canary, x86-alex_freon paladin not found in master and won't build
2015-08-19
CrOS gardener: girard
Sheriffs: dlaurie, wiley, itspeter
- 522533: lab problems creating various failures
- 522540: amd64-generic and x86-generic chromium PFQ failures in security_OpenFDs test
- 522528: HWTest run_suite failures
- 522410: ap-daemons fails to build on storm, blocking many pre-cq runs
2015-08-18
CrOS gardener: girard
Sheriffs: dlaurie, wiley, owenlin
- Spontaneous network failure! Tests won't succeed without network.
- 522130: LKGM timeouts on Chrome PFQ (fixed)
- 522139: Paygen timeouts
- 522141: sandybridge-freon-release-group builder needs to be removed
- 522147: mario-incremental failing (fixed)
2015-08-17
CrOS gardener: girard
Sheriffs: davidriley, waihong, owenlin
- 521642: daisy-skate build failed on database error
2015-08-14
Sheriffs: davidriley, waihong, kcwu
- 520931: provision Failure on samus-release, looks like a hardware flake.
- 521046: VMTest kernel_CryptoAPI failed on lakitu canary
- 521018: Several canary groups timed out on the step "steps" (no attribute 'PrintBuildbotStepFailure')
2015-08-13
Sheriffs: jrbarnette, armansito
- Multiple canary failures in the AM, especially 520311.
- Pinned Chrome; waiting for the overnight canaries to prove whether that fixed it.
2015-08-07
CrOS gardener: michaelpg
- 516978: No space left on device -> stateful partition size increased
- 518015: Bots haven't signed the new CLA; LKGM candidates not uploading -> CLA enforcement rolled back
2015-08-06
Sheriffs: cychiang, deymo, adlr
- 517308: security_test_image failed with wrong fs type to mount recovery_image.bin. --> The image seems good, and security_test_image passes when run locally.
- 517388: mipsel-o32-generic full failed at ChromeSDK due to --hash-style defaulting to gnu
- 517348: [paygen_au_dev] autoupdate_EndToEndTest.paygen_au_dev_test_full Failure on peach_pi-release/R46-7335.0.0
- 517351: [sanity] provision Failure on lumpy-release/R46-7335.0.0 stage_artifacts timed out
- 517460: build_package failed at chromeos-chrome
Chrome Gardener: stevenjb
- 517238: RESOLVED: ExtensionTestMessageListener::WaitUntilSatisfied() causing flake across a large number of tests
This turned out to actually be 515914 - browser_tests step fails even though all tests pass
- 516978: P0 STILL IN PROGRESS: piglit file collision causes build_image failure. --> No space left on device
This is causing PFQ failures
- 517593: Frequent browser_tests time out with (TIMED OUT) in log
This isn't causing any detected failures because the tests get retried, but it does slow down the tests a little and generates confusion when other issues (e.g. 515914) show up.
2015-08-05
Sheriffs: cychiang, dianders, denniskempin
- 516978: piglit file collision causes build_image failure. --> No space left on device
- 493752: Lab DHCP failures lead to "Host did not return from reboot". It causes PFQ build failure.
- 517027: jecht and veyron_pinky not available in the lab.
- 516795: veyron group canary is too close to its timeout. ---> seeing other boards fail for the same reason.
- 514700: Samus failures due to stateful partition of Samus DUTs being bad. ext4_lookup errors, "cannot remove" errors, etc.
- 515880: No more samus devices in lab that are good (probably because of above bug).
2015-08-04
Sheriffs: djkurtz, dianders, denniskempin
- 488291: Looks like vm_disk: failed_SimpleTestVerify is flaky and throttled the tree on amd64-generic ASAN.
- chromeos-hwid broke in paladin several times. Probably <https://chromium-review.googlesource.com/#/c/283583/>.
- 516750: Samus failures due to stateful partition of Samus DUTs being bad. ext4_lookup errors, "cannot remove" errors, etc.
- FYI: binutils roll happening
- 515528: AU tests failing. I don't think they always ping this bug, though...
- 516793: cbuildbot timeouts are hard for sheriffs to decipher.
- 516795: veyron group canary is too close to its timeout.
- 530203 Samus with on-going HWTest Aborts.
2015-08-03
Sheriffs: djkurtz, moch, vbendeb
- 516283: chromite: Unittest failure on veyron_pinky in mobmonitor/checkfile/manager_unittest: URLError: <urlopen error [Errno 111] Connection refused>
- 516286: chromite: Unittest failure on "auron group canary" in sync_stages_unittest timeout?
- 516295: zako BuildPackages fails in media-libs/fontconfig
- 515528: [paygen_au_canary] autoupdate_EndToEndTest.paygen_au_canary_test_delta Failure on falco_li-release/R46-7315.0.0
- The autoupdate_EndToEndTest.* tests seem to be failing across many boards over the past ~3 days:
- https://code.google.com/p/chromium/issues/list?can=2&q=autoupdate_EndToEndTest+modified-after%3Atoday-4&sort=-modified&colspec=ID+Pri+M+Stars+ReleaseBlock+Cr+Status+Owner+Summary+OS+Modified&x=m&y=releaseblock&cells=tiles
- 516371: Signer test failing due to unexpected kernel parameter (CL reverted)
- 516430: x86-alex paladin: bvt-cq timed out
- 516457: HWTestStage fails with run_suite.py
- 488291: amd64-generic ASAN failing on desktopui_ScreenLocker
2015-07-31
Sheriffs: vbendeb, moch, pprabhu
- 515560: CrOS Tree Closure: ChromeOS killing DUTs in the lab
- 515905: SyncChrome fails on all Chrome PFQ (non-issue - caused by Chrome pinning because of the above issue, PFQ expected to fail)
- 515937: Temporary workarounds in lab to get DUTs off of a bad build
2015-07-29 and 2015-07-30
CrOS Gardener: alemate (29), afakhry (30-31)
Sheriffs: pprabhu, grundler, zuethen
- 501178: 13 groups timeout... HWTest of many boards are not running. Closed the tree.
- 515479: kvm is missing on a lot of precq bots breaking vmtest (Fixed by davidjames)
- 515201: Chrome crashes (AddInputDevice) affecting other services (Fixed in Chrome; see also 515154 43375)
- 514700: Samus: file system corrupted after Kingston FW update (reverted Kingston FW update)
- 434755: daisydog constantly restarting (Fix in CQ - x86_64 v3.8 kernel missing /dev/watchdog since last fall)
- 515576: shill crashes took a bunch of DUTs "out of service": alex/zgb/peppy/lumpy/... (chumped Fix in shill)
- HW TestLab Network failure: outbound traffic was losing 60% or more packets (subsided; still monitoring)
- 515905: SyncChrome fails on all Chrome PFQ (Chrome pinned to old version?)
- 515302: SitePerProcessBrowserTest.DiscoverNamedFrameFromAncestorOfOpener is failing on the official cros-trunk.
- 515567: Compile failures on cros-trunk due to #include the generated header "ui/gfx/vector_icons2.h".
- 516052: More Flaky tests in AutofillInteractiveTest on the official cros-trunk.
2015-07-27 and 2015-07-28
CrOS Gardener: bruthig, alemate
Sheriffs: avakulenko, abrestic, vapier/henrysu
- 514257: Lost a bunch of DUTs due to AFE going down last night
- 514364: coreboot branch update caused some builders to go red.
- 512996 BrowserEncodingTest.TestEncodingAutoDetect timing out on cros_trunk
- 513593: [bvt-inline] security_SandboxStatus Failure on lumpy-chrome-pfq/R46-7296.0.0-rc1
- 514401 Multiple builds are failing SyncChrome with error: "reference is not a tree: 8a0429e414914781450ca007f20b0e511b3acff7"
- 514499 provision_AutoUpdate failures on Rambi
- 460860 more login_Cryptohome failures
- 514504 desktopui_ScreenLocker failure on Auron (flake?)
2015-07-22
CrOS Gardener: bruthig
Sheriffs:
- 513593: [bvt-inline] security_SandboxStatus Failure on lumpy-chrome-pfq/R46-7296.0.0-rc1
2015-07-21
CrOS Gardener: jonross
Sheriffs: quiche, shchen
- 512417 New FecSendPolicy test is crashing on CrOS Trunk
- 512427 Chrome PFQ failing on autotest-tests-ownershipapi
- 512435 Canary board timeouts, missing boards for HWTest
- 465862 Flake in desktopui_ScreenLocker, with ASAN
- 509274 Canary timeout on candy
- 512577 CrOS Trunk failing on ClickModifierTest
- 513618 mipsel-o32-generic-full failed building Chrome
2015-07-20
CrOS Gardener: jonross
Sheriffs: bleung, hungte, shawnn
- 512010 Chrome OS Canary failure in BranchUtilTest failures: no such option --nobuildbottags
- 512024 Master release failure
- 511680 Samus HWTest failure
- 491361 Strago VMTest failure
- 508637 Jecht-family failure
- 511542 winky DUT shortage
- 512174 CrOS Commit Queue HWTest failures: /var/log/storage_info.txt does not exist
2015-07-17
Sheriffs: bleung, shawnn
- Known-issues causing multiple canary failures:
- 484726: autoupdate_Rollback failing, possibly due to DNS issues in lab
- 508637: rikku-family stuck at login screen
- 510909: Paygen kernel hash issue. Should be resolved.
- 505744: AutoservSshTimeout
- 511317: login_OwnershipTaken timing out (silently?)
- 511502: libstrongswan missing symbols
2015-07-15
Sheriff: dbasehore, alecaberg
- 510759: Paygen lock not acquired
- 481092: builder must call SetVersionInfo first
- 510909: paygen failure, kernel hash doesn't match
- 509837: amd64 ASAN flake
2015-07-10
Sheriff: zqiu, marcheu, wuchengli
- 510074 amd64-generic-llvm builder unittest failures
- 509779 Flaky HWTest failures
2015-07-10
Sheriff: furquan, charliemooney
- 465862 Flaky screen lock test
- 508637 Rikku: Login screen problem in HWTest
- 491290 Flaky SSH Failure
2015-07-06
Sheriff: puthik, rspangler, seanpaul
- 507279 Lumpy/Falco pfq hwtest failing on timeout
- 501966 pool: bvt, board: lulu in a critical state
- 507372 Drone refresh/execute took over 50s
- 470701 Flaky BVT security_Firewall failure, "Mismatched firewall rules"
2015-07-01
Sheriff: posciak
- 505918 CheckFileModificationTest timeouts
- 506030 Payload generation failures
- 506037 autotest-chrome build failures on missing dependencies
2015-06-26
Sheriff: cernekee, reinauer, kpschoedel
- 505108: wolf-paladin and wolf-canary are failing, lab is closed. Somebody will fix this Monday.
- 505051: "Mismatched firewall rules" test failure on x86-generic build
- 504947: HWTest failures on ivybridge, daisy
- 504861: ASAN buildbot failures
- 504860: HWTest did not complete due to infrastructure issues
2015-06-25
Sheriff: cernekee, reinauer, wiley
- 504602: builders cannot get veyron_pinky chrome prebuilts
- 504476: pre-cq builders are not running
- 504400: Crash in SpokenFeedbackEventRewriterDelegate
- 472895: |AFDO generate| should only run if Chrome has changed
2015-06-24
Sheriff: dtor, wiley, gedis
- 465862: amd64-generic ASAN failure (desktopui_ScreenLocker fails - hitting regularly)
- 430836: autoupdate_Rollback failure
- 491598: platform_Powerwash flake
- 412795: Refresh Packages is down
- 488580: image_to_vm failing
- 472895: Canaries failed while syncing Chrome
2015-06-23
Sheriff: bowgotsai
- 488291: flaky on login_LoginSuccess test
- 465862: amd64-generic ASAN failure
- 503188: HWTest failure
- 488580: parrot canary: image_to_vm failing
2015-06-22
Sheriff: bowgotsai
2015-06-16/17
Sheriff: josephsih
- 500640: Multiple HWTest failures
- 500423: Payload integrity check failed: install_operations[490](MOVE): MOVE operation cannot have extent with start block 0
- 481092: ManifestVersionedSync: RuntimeError: builder must call SetVersionInfo first
- 483661: x86-generic full: vmtest failed in SimpleTestUpdateAndVerify
- 501178: CanaryCompletion: 20 groups timed out
- 444876: Clear and Clone chromite: remote: User Is Over Quota
2015-06-15
Sheriff: filbranden
- 500640: Multiple HWTest failures
- Hardware is borked and we had a deputy outage, so we had to work around it by disabling the hw_tests that were failing.
- CL 277672 disabling the tests.
- ChromeOS Infra team to revert that CL once the hardware is working again.
- 500394: build_package fails on several PFQ buildbots since 6/12
- 500423: Payload integrity check failed: install_operations[490](MOVE): MOVE operation cannot have extent with start block 0
- Still open, if it keeps the tree closed we need to prioritize a revert (even possible?) or push a rushed fix.
2015-06-15
Sheriff: itspeter
- 500394: build_package fails on several PFQ buildbots since 6/12
- 500423: Payload integrity check failed: install_operations[490](MOVE): MOVE operation cannot have extent with start block 0
2015-06-12
Sheriff: itspeter
- amd64-generic ASAN VMTest failure
- 465862: Build hasn't succeeded since 2015-06-10; desktopui_ScreenLocker keeps failing
- x86-generic ASAN VMTest failure
- 488291: Seems to be flaky on login_LoginSuccess test
2015-06-05
Sheriff: tyanh
Gardener: girard
- Infra failures on Canary bots
- 491290: autoupdate_Rollback Failure on rambi-release/R45-7142.0.0
- 497035: autoupdate_EndToEndTest.paygen_au_canary_test_full Failure on link-release/R45-7142.0.0
- 497059: HWTest suite prep aborted on Stumpy_moblab Canary
- 497092: Chrome PFQ failing at BuildPackages across all builders - reverted patch - next build should be okay
2015-06-04
Sheriff: tyanh
- Infra failures on Canary bots
- 496552: [au] autoupdate_Rollback; Unhandled AutoservRunError: command execution error
- 496526: [sanity] provision; Unhandled HTTPError: HTTP Error 500: Internal Server Error
- 476324: HWTest provision failure
- 460925: HWTest/sanity provision; Unhandled TimeoutException: Call is timed out
- Host did not return from reboot
- 496523 mipsel fails continuously on PFQ
2015-06-03
Gardener: jonross
- 460925 Chrome PFQ seeing Infra failures in HWTest, assigned to Infra
- Affecting: daisy_skate, falco, lumpy, peach_pit,
- 465162 469495 Infra failures blocking the PFQ
- A series of Infra failures for Chrome OS Canary bots were auto-filed. Not enough bots in the pool
- CL 274236 removed two bots from the Waterfall (x86-generic-tot-chrome-pfq-informational, amd64-generic-tot-chrome-pfq-informational)
- 496273 mipsel-032-generic having gyp error preventing a build.
- 496293 Chrome OS PFQ is trying 45.0.2421.0 which does not contain the fix to 494912 which would unblock the PFQ
- 496325 Flaky OAuth test on Linux Chromium OS
2015-06-02
Sheriff: robertshield, aaboagye, ssl
Gardener: jonross
- Chrome PFQ failures still holding up upreving.
- 494041, 494912: daisy_skate && lumpy PFQ failing on video_VideoSanity.
- Test was landed in this change; assigned to the developer who landed the tests
- Seems like an actual regression in video playback for ARM.
- Regression is from Chrome side between 45.0.2416.0 and 45.0.2417.0. Reverting the video changes in this diff does not seem to fix the issue. Trying to revert the WebRTC roll, but it's failing locally. I've reached out to the WebRTC sheriff.
- Suspects 80f289fe303323361d07c5b58b23f8499903a154, c794eda78e9ba3c46b550b433e9fe5a248d40104 as the bug is not present with hardware acceleration off. Apparently my local build has an issue, as I still have the failure with hardware acceleration off.
- 413961: falco failed on stateful_update. It looks like the download got interrupted.
- Canary Failures
- I believe these were network related... some are reporting high flakiness
- 476368, 495463, 428345 have been filed against the Canary failures.
- 493219: nyan group failed with Connection reset by peer.
2015-05-30/31, 2015-06-01
Sheriff: ssl, aaboagye
- 482284: ivybridge-freon-release-group: The BuildImage [stout_freon] [afdo_use] failed - cannot open ‘/dev/loop0p1’ for reading: Permission denied
- p/40797: oak fails to cross-compile img-ddk properly
- 477928: (quawks, x86-zgb) autoupdate-Rollback - ssh: Could not resolve hostname chromeos2-row2-rack7-host6: Name or service not known (network went away briefly?? DNS issue?)
- 493533: asan bots failed quipper unittests. x86-generic ASAN has been broken since 5/28.
- Mainly due to the quipper unittest failing, but also due to login_LoginSuccess failure. See 488291.
- Waiting on this CL.
- Chrome PFQ failures
- 494041, 494912: daisy_skate && lumpy PFQ failing on video_VideoSanity.
- 494909: lumpy PFQ failing on desktopui_FlashSanityCheck.
- 495281: One off wolf-tot-paladin failed during HWTest [sanity] - provision: ABORT: reboot command failed
- 493718: Chrome PFQ Failing to uprev Chrome commit - on tricky, which is experimental is a mistake. Disregard these emails.
- 493219: Some canaries failed due to FAILED RPC CALL. (beltino-a/b group, sandybridge-freon, ivybridge-freon) Manifested as timed out and Connection reset by peer.
- CL:274726: Fix lakitu build_image error when modifying kernel command line.
2015-05-29
Sheriff: dianders
- 493718: Chrome PFQ Failing to uprev Chrome commit - on tricky, which is experimental?
- 493301: master-paladin has encountered infra failures - turned out to be hang in bh_submit_read again.
- 493730: master-paladin has encountered infra failures - shouldn't have been blamed on infra when there's a kcrash.
- 493752 (AKA 493752): slippy-release-group: The HWTest [leon] [sanity] stage failed - HWTest did not complete due to infrastructure issues (code 3) - DHCP problems
- 493144 x3: sandybridge-freon-release-group: The HWTest [stumpy_freon] - HWTest did not complete due to infrastructure issues - pool: bvt, board: stumpy_freon in a critical state
- 428345 (AKA 493752): beltino-a-release-group: The AUTest [mccloud] [au] stage failed - HWTest failed (code 1) - platform_Powerwash - ABORT: Host did not return from reboot - DHCP problems
- 430836 (AKA 493752): beltino-a-release-group: The AUTest [mccloud] [au] stage failed - HWTest failed (code 1) - autoupdate_Rollback - ABORT: Host did not return from reboot - DHCP problems
- 493796: RlzLibTests failures on the official cros_trunk.
- 493811: nyan-release-group: The HWTest [nyan_blaze] [bvt-inline] stage failed - HWTest failed (code 1) - Best guess is that a failed prev test put host in bad state (??)
- 460925 (aka 493219) x4: Several (winky, quawks_freon, tidus, lumpy) - HWTest did not complete due to infrastructure issues (code 3) - Autotest FAILED RPC CALL
- 491968 (aka 493219): auron_yuna - HWTest did not complete due to infrastructure issues (code 3) - Autotest FAILED RPC CALL
- 482284: ivybridge-freon-release-group: The BuildImage [stout_freon] [afdo_use] failed - cannot open ‘/dev/loop0p1’ for reading: Permission denied
- p/40797: oak fails to cross-compile img-ddk properly
2015-05-28
Sheriff: dianders
- 493176: pre-cq failing due to chromeos-base/ap-daemons
- 493207: (warning) chromeos-base/ap-daemons failing on 1st try (old bug)
- Lots of failures at all the same time with RPC failures talking to autotest
- 493192: The Paygen [auron] stage failed - Error sending request to Autotest RPC proxy: ... EOF occurred in violation of protocol
- 493192: The Paygen [butterfly] stage failed - Error sending request to Autotest RPC proxy: ... Connection reset by peer
- 493192: slippy-release-group: The Paygen [peppy] stage failed and The Paygen [leon] stage failed - Error sending request to Autotest RPC proxy: ... Connection reset by peer
- 493192: rambi-a-release-group: The HWTest [clapper] [sanity] stage failed - Error sending request to Autotest RPC proxy ... Connection reset by peer
- 493192: sandybridge-freon-release-group: The Paygen [lumpy_freon] stage failed - Error sending request to Autotest RPC proxy: ... Connection reset by peer
- 463145: jecht-release-group: The Paygen [rikku] stage failed - Host did not return from reboot
- 493251: master-paladin infra failures: more autotest flakiness; letting build deputy handle this
- 493251: rambi-c-release-group: The HWTest [candy] [bvt-inline] stage failed - Suite timed out before completion - Job says ABORT: Timed out, did not run.
- 493273: The Paygen [link] stage failed - paygen_au_canary_test_full [ FAILED ] - DUT just seemed to die. Baffling.
- 493301: The Paygen [stout_freon] stage failed - paygen_au_canary_test_delta - task update_engine:1209 is hung for 60 seconds during shutdown.
- 482284: rambi-b-release-group: The BuildImage [glimmer] [afdo_use] stage failed - cannot open ‘/dev/loop2p1’ for reading: Permission denied
- 493326: The UnitTest stage failed: upload_symbols_unittest
- 493330: The UnitTest stage failed: simple_builders_unittest
- Lots of failures with more RPC failures:
- 477928 (aka 490460): quawks-release (and daisy-release-group): The AUTest [au] stage failed - Could not resolve hostname chromeos4-row10-rack8-host17: Name or service not known
- 471518 (aka 490460): daisy-release-group: The AUTest [daisy_skate] - Could not resolve hostname chromeos2-row5-rack4-host2: Name or service not known
- 493129 - jecht-release-group: The HWTest [rikku] [bvt-inline] stage failed - Too many dead rikku boards?
- 466919 (aka 467066) - beltino-b-release-group: The Paygen [monroe] stage failed - symlink has no referent (in rsync)
- 493533: asan bots failed quipper unittests
- CL:273724: daisy public builders all failing due to mali-drivers-bin manifest mismatch
2015-5-26/27
Sheriff: waihong, dhendrix, rongchang
- 428345: beltino-a group canary and slippy group canary: platform_Powerwash failed to reboot, which has been flaky recently
- 492161: daisy-group canary: autoupdate_Rollback: DUT pingable but not sshable
- 491479: sandybridge-freon group canary: Not enough parrot_freon available
- veyron group canary: only jerry build_packages failed, on a symbol redefined in llseek.c of the util-linux package
- auron group canary: cbuildbot updating slave build timed out on several recent builds
- 465862: desktopui_ScreenLocker failure happened again on amd64-generic ASAN
- 483661: vmtest failure in SimpleTestUpdateAndVerify happened again on x86-generic full
- 492281: "MySQL server has gone away" happened in CQ and canaries.
2015-5-25/26
2015-5-20
Sheriff: owenlin, gwendal
- 434201: security_SandboxLinuxUnittests_SERVER_JOB Failure
- CL:*216896: whirlwind canary failing signer tests due to new kernel option
- 490057: hwlab burned down
- Chrome PFQ failing on x86 bots with use_sysroot error -> already fixed in Chrome; waiting for new LKGM on Chrome side to include fix
- 490546: ImageTest fails on mips bots due to unresolved symbols in mesa; CL:272365 reverted in the meantime
- 491012: chrome pfq build failure in gbm_surface_factory.cc
- 491103: samus paladin crashed while talking to cidb
- CL:217315: fixed uprev/egencache/chromeos-oak errors
2015-5-19
Sheriff: tbroch, gwendal, jchuang
- 483661: x86-generic full: vmtest failed in SimpleTestUpdateAndVerify
- 489733: veyron canary failed to build openssh. Missing symbols from netdb ... part of glibc
- 472858: x86-generic ASAN, login_OwnershipApi: fails when dbus "Did not receive a reply"
- 477928: jecht-release-group: tidus: autoupdate_Rollback: unable to find host ... network flake?
- 428345: auron-b-release-group: lulu: platform_Powerwash,Host did not return from reboot
2015-5-18
Sheriff: tbroch, gwendal, jchuang
- 489302: peach-pits failing in lab. Loose physical connection for entire rack in lab led to devices discharging.
- 488644: slippy-release-group: The AUTest [falco_li] [au] stage failed
- 489215: PFQ Master: binhosttest fail: AssertionError - cannot find Chrome prebuilts
2015-5-15
Sheriff: itspeter, pstew, ejcaruso
- 488539: oak canary was moved out of experimental and failed due to a large non-bisectable audio patch series going into 3.18
- 488481: devserver fell over during a test, but it seems like a transient failure
- 488580: image_to_vm failing; jrbarnette@ trying to find an owner
- 488487, 488635: chrome crashes, achuith@ trying to find out how to roll back, pstew@ pinned chrome to 44.0.2401.3_rc-r1
- 488644: ivybridge-freon failed during paygen step
- 488739: bad X11 header include is breaking Chrome PFQ for freon images
- 488959: buffet unittests failing on amd64 asan
2015-5-14
Sheriff: djkurtz, pstew, ejcaruso
- 487955: veyron_pinky ChromeSDK fail - touch_handle_drawable_aura.cc - ui/resources/grit/ui_resources.h: No such file or directory
- 487959: nyan_blaze BuildPackages fail - chromeos-base/chromeos-factory
- 487990: All chromiumos.chromium builders broken by video_encode_accelerator_unittest
- 487997: chrome PFQ - BuildPackage fails - chromeos-chrome-44.0.2401.0_rc-r1 - keycode_text_conversion_ozone.cc - ui/events/keycodes/dom3/dom_code.h: No such file or directory
- 461087: Chrome PFQ fails with login_OwnershipNotRetaken HWTest failure on Tricky -- sheriffs get email on this with strong verbiage that Chrome OS won't be picking up a new Chrome for this reason, but it turns out that the PFQ master ignores this build failure because Tricky is experimental.
- b/21165021: Canaries fail if they tried to connect to databases/servers between 2:45pm and 3:00pm PST due to power brownout
- 488291: x86-generic ASAN failure in login_LoginSuccess
- 488308: various paladin builders failing due to infrastructure issues
2015-5-12/13
Sheriff: wiley, puneetster
- 460860: login_Cryptohome Failure on tricky_freon-release
- 487120: squawks boards in short supply in BVT pools, hwtests won't finish in time
- 487303: 503 killed suite job during Leon canary
- 433204: autoupdate_EndToEndTest.paygen_au_canary_test_full fails "... no attribute 'Error'"
- 487772: amd64-generic ASAN builder has been red since April 6?!?
2015-5-10/11
Sheriff: cychiang
- 486497: paygen failing: "Payload integrity check failed: install_operations[615](REPLACE_BZ): src_length is zero." Revert CL
- 465711: Host did not return from reboot. This also breaks PFQ
- 487055: stout canary: ERROR: ** HWTest did not complete due to infrastructure issues (code 3) **
- 487123: Canary builders got killed by signal 9 around May 11 22:58:31 2015 buildbot time.
2015-05-04/05
Sheriffs: garnold, jwerner
Gardener: stevenjb
- 476649: Sandybridge canary failure due to lumpy DUT connectivity issue.
- b/20859395: Stumpy moblab canary failed on hwtest.
- b/20859821: Daisy canary failed on hwtest.
- Reverted: https://codereview.chromium.org/1121293002 (suspected causing flaky interactive_ui_tests failure)
- 482446: Sporadic failures
- 477889: Sporadic failures
- 484307: Still in-progress. Much time spent trying to track this down.
- 484726: Multiple canary failures due to lab DNS flake.
- CL:268999: Stout vmtest failing due to bad CL.
- 484243: Multiple canary failures due to lab test fail on ACL error.
- 486497: most canaries dying in paygen with "install_operations[1219](REPLACE_BZ): src_length is zero."
2015-05-01
Sheriffs: posciak
- 476550 ivb failures on TestBlacklistedFileTypes, TestValidInterpreter, etc.
- 482905 slippy ABORT: reboot command failed
- 477739 samus suite prep abort
2015-04-29
Sheriffs: snanda, benzh, hungte
- 483174: setuid bit missing on ping, breaking ping in chroot
- 476649: Some lumpy duts unreachable
- 482956: PFQ Master: AssertionError - cannot find Chrome prebuilts
- 482905: provision: ABORT: Host did not return from reboot
- 433389: [Samus] Kernel crash meta files shown after rebooting device
- 463861: [bvt-inline] provision Failure on lumpy-release/R42-6812.15.0
- 482454: Delta to itself failed with NewRootfsVerificationError in veyron_speedy-release/R44-7019.0.
- CL:*215145 storm premp rejected by signer due to security checks forced off
2015-04-28
Gardener: afakhry
- 482121 TabindexSaveFileDialog/FileManagerBrowserTest.Test/2 has been failing on the official cros trunk.
- 409019 graphics_GpuReset Failure on falco PFQ.
- 482171 graphics_Sanity_SERVER_JOB Failure on falco PFQ.
- 482164 login_LoginSuccess Failure on lumpy PFQ.
- 481732 Flaky provision failure on falco PFQ.
- 482454 veyron_speedy failing AU test
- CL:*215145 storm premp rejected by signer due to security checks forced off
2015-04-27
Gardener: afakhry
- 481544 Chrome PFQ is failing on x86-generic, alex, lumpy, peppy and tricky due to desktopui_ScreenLocker VMTest failures.
- 481820 bluez unittests breaking on amd64-generic
- 481864 asan bots failing unittests: libchromeos-streams-323904.so: error adding symbols: DSO missing from command line
- 481872 amd64 asan bot failing unittest due to leaks in libchromeos
2015-04-24
Gardener: derat
Sheriffs: posciak
- 466719 lumpy PFQ failures "Unhandled AutoservSSHTimeout"
- 465178 tricky PFQ failed 3 times due to a login timeout in telemetry_LoginTest, but then passed again
- 476584 (possibly) samus-canary failure, issues in TestBlacklistedFileTypes
- More ASAN failures, looks like 478605 not fixed after all?
- 480638 MenuControllerMnemonicTestMnemonicMatch.MnemonicMatch failing on cros trunk
- 480667 SSLUITest.BadCertFollowedByGoodCert failing on cros trunk
- 470130 Chromium OS (x86) Asan builder still always failing with MySQL error (one month and counting!)
2015-04-23
Gardener: derat
Sheriffs: moch, wuchengli
- 463805 PFQ lumpy run failed with "AutoservSSHTimeout: ('ssh timed out', * Command: )"
- 480491 TabindexOpenDialog_FileManagerBrowserTest browser_tests failing on cros trunk
- 470130 Chromium OS (x86) Asan builder failing with "Can't connect to MySQL server on '173.194.81.53' (110)"
- 478605 Pretty much all tests failing on Chromium OS (amd64) Asan builder (at least, I hope this is the cause)
- 480514 lxc container failed to get IP address from bridge
- 480638 MenuControllerMnemonicTestMnemonicMatch.MnemonicMatch failing on cros trunk
- 480667 SSLUITest.BadCertFollowedByGoodCert failing on cros trunk
- 465862 desktopui_ScreenLocker failing on Chromium OS (amd64) Asan
- 449361 "HWTest did not complete due to infrastructure issues (code 3)" on enguarde
2015-04-22
Sheriffs: moch, tyanh
2015-04-20, 2015-04-21
Sheriffs: dbasehore, alecaberg, tyanh
- 476550 paygen issue for signing images
- 477739 bvt suite abort
- 463805 Provision failure on SSH timeouts
- 478713 Autotest execution errors
- 478605 CrOS asan failures
- 478762 mod-image-for-recovery failure on non-x86
2015-04-17
Sheriffs: semenzato, quiche, mcchou (shadow)
- 477747 ImageTest failure on canaries due to /usr/include/tiffio.hxx, /usr/libexec/perf-core/tests/attr.py, and /lib64/libthread_db-1.0.so
- 477883 kayle-paladin failed due to chromeos-initramfs
- 477888 CQ failure due to VM test timeout in buffet_InvalidCredentials
- 477889 CQ failure due to VM test timeout in buffet_Registration
- 477941 link pre-cq failure (with no CLs): build_image failed due to no space on loop device
- 473970 canary failure due to autoupdate_EndToEndTest
- 474831 moblab_runsuite failure: /root/.boto does not exist
- 478605 asan bots both dead in vmtest w/new chrome
- CL:266444 non-x86 archive failures in mod_image_for_recovery
2015-04-16
Sheriffs: semenzato, quiche, mcchou (shadow)
- 477703 make_factory_toolkit.sh broke Archive step
- 477712 enabling frame pointers broke optimized webrtc code in chrome
- 477739 samus-release failed in suite prep
2015-04-15
Sheriffs: avakulenko, jrbarnette, fjhenigman
- 476434 Network flakiness causes a dozen canaries to go red.
- As of this evening, the problem is still causing intermittent failures across the board.
- 477352 Chrome fails to start on jerry and mighty
- Early morning: Chrome has been pinned to 44.0.2368.0 until the problem is fixed.
- Mid-day: Testing showed the problem is on the Chrome OS side; reverted this CL.
- Afternoon: Chrome is unpinned.
- Evening: Waiting for the CL to make it through a canary, so that the veyron builders will show green.
2015-04-14
Gardener: jonross
Sheriffs: avakulenko, jrbarnette, kpschoedel
- 438729 OutOfProcessPPAPITest.MediaStreamAudioTrack flaky on CrOS. Taking out the X server, causing other tests to fail.
- 469119 TouchExplorationTest.RewritesEventsWhenOn is flaky
- 476934 Chrome PFQ bots failing with config failures, blocking uprev.
- 475923 TouchExplorationTest.SplitTapExplore flaking 50% of the time.
- 476550 Paygen signing failure on multiple boards (including veyron)
2015-04-10 and 2015-04-13
Gardener: jonross
Sheriffs: kpschoedel, denniskempin, bleung
- 476607 Peach_pit Chrome PFQ failing HWTest step due to Infra issue.
- 476577 KioskUpdateTests are flaky on Linux Chromium OS.
- 469119 Flaky TouchExplorationTest.RewritesEventsWhenOn is causing failures on CrOS trunk
- 476338 PFQ Failure on lumpy. Provision failure tracking SSH timeout.
- 476434 Tree is throttled; devservers are causing a P0 issue, making AU testing flake heavily for the morning PST
2015-04-08
Gardener: zork
- 475170: Autotest security_OpenFDs failing.
2015-04-06 and 2015-04-07
Gardener: xiyuan
Sheriffs: deymo, shawnn, vapier, deanliao
- 474227: Lack of speedy DUTs in lab
- 465862: desktopui_ScreenLocker failing on asan bots
- br:764: buffet autotests flaking on amd64-generic-full
- 474497: Canaries dead in paygen
2015-04-02 and 2015-04-03
Gardener:
Sheriffs: wfrichar, adlr, itspeter (on holiday), vapier
- cbuildbot gsutil errors, solved by fixing permissions manually (by system services team) and https://chromium-review.googlesource.com/#/c/263633/1
- Unittest break in germ, quickly fixed by Jorge
- repo upload failing on kernel repo: b/20062832, b/19932429
- CL:263970 gnutls pulled due to new pyshark code (CL:262457)
- 473738: sumo board failing to build adhd
- 473742: rush_ryu recovery kernel failing
- br:344: kernel git checkouts failing on gizmo/project-sdk bots
- CL:*211751: nyan_freon failing signer tests
- 473721: paygen ran out of space on swanky
- 473899: paygen not finding images
- 473900: mips qemu fc-cache call crashing
2015-03-31 and 2015-04-01
Gardener:
Sheriffs: bfreed, bhthompson, josephsih
- Throttled tree in the morning, most canaries red. 471656: Paygen: failed to set up loop device
- 472378: x86 asan builder failing unittests due to leak san not being supported
- 472658: Paygen fails: Permission denied: '/dev/loop27p4'
- The reverts above get the canaries past "failed to set up loop device", but many now fail with "Permission denied: '/dev/loop27p4'".
- 449738: "Internal server error" should be treated as infrastructure failure so it doesn't reject CLs
- "CQ encountered a critical failure" email.
- Pointer to https://uberchromegw.corp.google.com/i/chromeos/builders/CQ%20master/builds/5150 shows most failures due to "FAILED RPC CALL: get_jobs" in HWTest.
- 347423: Gerrit failed to submit change.
- Sounds like a transient failure. Should just try it again.
- 13:57:51: WARNING: Change 263241 was submitted to gerrit without errors, but gerrit is reporting it with status "NEW" (expected "MERGED").
- 13:57:51: ERROR: Most likely gerrit was unable to merge change 263241.
- 472858: VmTest fails when dbus "Did not receive a reply"
2015-03-30 and 2015-03-31
Gardener:
Sheriffs: josephsih
- 471656: Paygen: failed to set up loop device: No such file or directory. This is a new issue causing the failure of 20 canary builders.
- 466777: HW_Test error: this issue still exists.
- 469259: image_to_vm.sh fails: this issue still exists.
2015-03-26 and 2015-03-27
Gardener: skuhne
Sheriffs: puthik, furquan
- PFQ failed to uprev. Caused by 991533002, reverted (Chrome and ChromeOS PFQ).
- 469566: pool: bvt, board: veyron_jerry in a critical state
- 466777: pool: bvt, board: veyron_speedy in a critical state
- 469259: image_to_vm.sh fails with "partx: /dev/loop0: error deleting partition ..."
- 471032: PFQ still fails - now: The binary size of the image exceeds the limits. (e.g. Daisy-freon, Daisy-skate)
- 449766: PFQ still fails - now: [bvt-inline] login_LogoutProcessCleanup Failure on Falco
2015-03-24 and 2015-03-25
Gardener: jamescook
Sheriffs: charliemooney, kcwu
- 463530, 461087 Chrome uprev failure on tricky, cryptohome not mounted
- 466719 lumpy ssh timeout resulting in provisioning failures
- 470172 MasterUploadPrebuilts failure in Update PFQ config dump
- 470118 Amd64-generic failing vmtest "buffet_Registration" with urlopen "Connection Refused"
- 470237: WallpaperManagerBrowserTest.DevLaunchApp failing on cros_trunk official builders
- 470130: Chromium OS (x86) Asan always failing with MySQL connection error
- 470381: [bvt-cq] graphics_SanAngeles Failure on tricky-chrome-pfq/R43-6910.0.0-rc5 - wflinfo / waffle problem, test disabled
- 1035763003: Revert of Test Accelerators In Interactive UI Tests - cros_trunk official builder was failing
- 448247: [bvt-inline] provision Failure on candy-release/R42-6683.0.0 - update_engine failed
- 470701: Flaky BVT security_Firewall failure, "Mismatched firewall rules"
2015-03-18 and 2015-03-19
Gardener: abodenha
Sheriffs: stevefung
- 468340, 468770: PFQ and canaries are a mess due to infra issues
- 465862: amd64 ASAN builds are failing
- 468394: autoupdate_EndToEndTest.paygen_au_stable_test_delta failures
- 466777: not enough duts
- 467975: Image Signing Timeouts
- 463805: autoserv timeouts
2015-03-17
Sheriffs: littlecvr,stevefung
- 467975: Image Signing Timeouts
- 260602: Fix auron_paine build break
2015-03-16
Sheriffs: littlecvr,dlaurie,zqiu
- 466972: insufficient DUTs for butterfly
- 466777: insufficient DUTs for veyron_speedy
- 465230: wolf: login_OwnershipTaken_SERVER_JOB Failure
- 433970: wolf: login_LogoutProcessCleanup_SERVER_JOB Failure
- 419772: gnawty: security_ProfilePermissions_SERVER_JOB Failure
- 411608: gnawty: security_NetworkListeners_SERVER_JOB Failure
- 434148: gnawty: login_MultiUserPolicy_SERVER_JOB Failure
2015-03-13
Sheriffs: tyanh,dlaurie,zqiu
- 426164: [au] autoupdate_EndToEndTest.npo_test_delta Failure on nyan_blaze-release/R38-6158.71.0
- 466919: [paygen_au_canary] autoupdate_EndToEndTest.paygen_au_canary_test_full Failure on quawks-release/R43-6872.0.0
- 8 issues on [bvt-inline] login_* Failure on wolf-release/R43-6872.0.0
- 434202: login_RetrieveActiveSessions_SERVER_JOB Failure
- 434182: login_SameSessionTwice_SERVER_JOB Failure
- 434178: login_OwnershipNotRetaken_SERVER_JOB Failure
- 434185: login_Cryptohome_SERVER_JOB Failure
- 419772: security_ProfilePermissions_SERVER_JOB Failure
- 434148: login_MultiUserPolicy_SERVER_JOB Failure
- 403701: login_MultipleSessions_SERVER_JOB Failure
- 434195: login_GuestAndActualSession_SERVER_JOB Failure
- b/19729024: incorrect DHCP config change was pushed breaking DHCP in the lab
- br/590: amd64 asan failing unittests due to leaks related to protobuf
2015-03-12
Sheriffs: tyanh
- 462734: [sanity] provision Failure on lumpy-release/R43-6869.0.0
2015-03-10
Sheriffs: chirantan, thieule
Gardener: oshima
- 465752: [bvt-inline] login_RetrieveActiveSessions Failure on tricky
- 465963: daisy canary: The AUTest [daisy_spring] [au] stage failed: ** HWTest did not complete due to infrastructure issues (code 3)
- 464171: Multiple canaries are failing due to kernel size limits
- 465877: [bvt-inline] login_OwnershipTaken Failure on x86-mario-release/R43-6865.0.0
- 464751: [au] provision Failure
2015-03-09
Sheriffs: chirantan, thieule
Gardener: achuith
- 464938: Test lab performance issue
- 465596: update_engine is failing on all canary builders
2015-03-06
Sheriffs: benchan, tbroch
Gardener: tengs
2015-03-05
Sheriffs: benchan, tbroch
Gardener: tengs
- 464407 Your "Oauth 2.0 User Account" credentials are invalid .... Failure: Invalid response 302..
2015-03-04
Sheriffs: dhendrix, waihong
Gardener: derat
2015-03-03
Sheriffs: dhendrix, waihong
Gardener: derat
- 462842 Chromium OS (amd64) Asan bot failing unittests due to webserver leaks
- 463493 Flaky failure in breakpad's linux_client_unittest on X86 (chromium)
- 463532 browser_tests failing on cros trunk in WebViewTest.FileSystemAPIRequestFromWorkerDeny
- 464053 Beaglebone kernel too big, causing build failures
- 463411 pool: bvt, board: leon in a critical state. (Some DUTs had their USB ethernet dongles swapped and were down for a bit)
2015-03-02
Sheriffs: dgreid, tbroch
Gardener: derat
- 461406 Chromium OS (x86) Asan bot still failing
- 458775 peach_pit nightly chrome PFQ failed with too few available DUTs
- 463213 X86 (chromium), Daisy (chromium), and AMD64 (chromium) failing on chromeos-chrome with "ValueError: invalid literal for int() with base 10"
2015-02-27
Sheriffs: dgreid, tbroch
2015-02-25
Sheriff: dianders, pstew, jchuang
- 462240 [storm-release canary] storm-release: The BuildPackages stage failed: Packages failed in ./build_packages
- 460174 [canary] peach-release-group: The HWTest [peach_pit] [sanity] stage failed: ** HWTest did not complete due to infrastructure issues (code 3) **
- 461841 rambi-c-release-group: timed out
2015-02-25
Sheriff: dianders, pstew, tyanh
- 461184 [canary] HWTest did not complete due to infrastructure issues again in canary 670; also "peach-release-group: timed out"
- 461841 [canary] sandybridge-release-group: The Paygen [butterfly] stage failed: <class 'chromite.lib.timeout_util.TimeoutError'>: Timeout occurred- waited 13800 seconds
- 461893 [canary] rambi-a-release-group: The HWTest [expresso] [bvt-inline] stage failed: ** Suite timed out before completion **
- 460174 [canary] peach-release-group: The HWTest [peach_pit] [sanity] stage failed: ** HWTest did not complete due to infrastructure issues (code 3) **
2015-02-24
Sheriff: amstan, gwendal, tyanh
- 461184 [canary, chrome pfq] HWTest did not complete due to infrastructure issues (HWTest Lab closed?)
- 461188 [pineview canary] Operation timed out at the end of a build; transient, subsequent build succeeded.
- 438908 Image signing timed out across many platforms on canary; transient, subsequent builds succeeded.
- 461378 chrome compilation error prevents some builds from completing in CQ.
- 461415 critical fix in chrome was only present on chrome ToT.
2015-02-23
Sheriff: amstan, gwendal Gardener: flackr
- 460815 Cros SDK broken in .bashrc_profile
- 460693 Chrome PFQ failing with exception: global name 'AccessDeniedException' is not defined in File "src/build/download_sdk_extras.py", line 71
- 460951 ChromeMetricsServiceAccessorTest.MetricsReportingEnabled, ExternalCacheTest.Basic, ExternalCacheTest.PreserveInstalled, DeviceLocalAccountExternalPolicyLoaderTest.ForceInstallListSet failing on cros_trunk
- 458122 Reopened FileSystemProviderApiTest.BigFile failing on cros_trunk as test is failing 98% of runs on cros_trunk
- 461021 MetricsServicesManagerTest.GetRecordingLevelCoarse failing on cros_trunk.
- 461046 MetricsServicesManagerTest.GetRecordingLevelFine failing on cros_trunk.
2015-02-18
Sheriff: posciak, namnguyen, snada
- Note to PST sheriffs: lots of infrastructure failures (I think) over the last few days, which I don't really understand. I think we need help from Infra team before we can reopen the tree for good.
- 419904 IndexError: list index out of range on moblab_RunSuite
- 458613 Pre-CQ Launcher failures
- 224 Moblab failures
- 459679: Moblab blocks canaries
- b/19426205: Missing commit info
2015-02-13
Gardener: jonross
- Reverted a change that was failing a compile on Chrome OS.
- 458567 Disable flaky InProcessAccessibilityBrowserTest.VerifyAccessibilityPass
- 458549 Disable flaky TextInput_TextInputStateChangedTest.SwitchingAllTextInputTest
- 458526 Chromium OS Waterfall bots failing on LKGMSync
- 458918 amd64 asan failing shill unittests due to leaks
2015-02-12
Gardener: jonross
- 458341 Disable flaky LoginPromptBrowserTest.LoginInterstitialShouldReplaceExistingInterstitial
- 458333 Disable flaky AutofillDialogControllerSecurityTest.DoesntWorkOnBrokenHttps
- Reverted patch that broke Chrome OS LoginUITests
- 458154 Chrome PFQ peach_pit. HWTest failure, RPC Connection Timeout.
- 458122 cros_trunk FileSystemProviderApiTest.BigFile failure. Disabling the test, passing bug to owners.
- 457993 The pool is in a critical condition and cannot complete build verification tests in a timely manner.
2015-02-10
Sheriffs: jwerner, victoryang, hungte
Gardener: girard
- reverted https://codereview.chromium.org/899973006 - suspect it caused an x86 ASAN failure
- 456993 Chrome PFQ failure
2015-02-03
Sheriffs: dbasehore, katierh
Gardener: ihf
- 453090 Pre-CQ failure
- 454657 - Canary master Build #601 failed HWTest on rambi-[a,b,c]-release-group
- SSL connection flake for build
- 455728: ASAN unittest failure in permission broker
- 456501: canaries dying during ChromeSDK due to missing gbm.h header
- 456491: chrome pfq dying during BuildPackages due to dpkg-architecture errors
- 456829: arm-generic_freon chrome pfq failing with conflicting minigbm/mesa depends
2015-02-02
Sheriffs: djkurtz
Gardener:
- 448208 - pool: bvt, board: daisy_spring in a critical state
- 454561 - pool: bvt, board: expresso in a critical state
- 454657 - Canary master Build #601 failed HWTest on rambi-[a,b,c]-release-group
2015-01-30
Sheriffs: vbendeb, armansito, wuchengli
Gardener: jamescook
- 401341: update_engine UnitTest failures in P2PManagerTest.ShareFile, out of disk space on /tmp
2015-01-29
Sheriffs: wuchengli
Gardener: jamescook
- 452349: Canary Chrome failures because of mixed Freon / non-Freon
- 36103: storm-release: BuildPackages failed in chromeos-base/ap-daemons
- 453201: [bvt-inline] provision Failure on zako-release/R42-6735.0.0
- 428058: [bvt-inline] security_NetworkListeners Failure on daisy_spring-chrome-pfq/R40-6412.0.0-rc2
- 446221: PDFBrowserTest.Basic & PDFBrowserTest.Scroll failures -> disabled
2015-01-28
Sheriffs: sonnyrao, arakhov, vapier
Gardener: jamescook
- 452911: Chrome PFQ failing due to ozone/evdev/input_controller_evdev.cc warnings -> reverted, asked chromeos-tpms to bump PFQ
- 450335: [bvt-cq] video_VideoSanity Failure on daisy_skate-chrome-pfq -> flaky test -> disabled
- 446221: cros_trunk: PDFBrowserTest.Basic & PDFBrowserTest.Scroll failures on official builders
- 452623: cros_trunk: WebRtcSimulcastBrowserTest.TestVgaReturnsTwoSimulcastStreams browser_tests failures -> disabled
- 453090: pre-cq failing with commit KeyError
- 453208: cidb connection failed with buildStageTable key error
2015-01-27
Sheriffs: zeuthen, shawnn, vapier
Gardener: jamescook
- 452497: canaries all dying in chrome with /home/chrome-bot/depot_tools/external_bin errors
- 452534: pre-cq bots timing out due to most slaves offline
- 450278: Chromium OS Asan bots failing in logging_AsanCrash, telemetry exception problem
- 451603: Chromium OS (amd64) Asan: security_SandboxLinuxUnittests failing
- 449103: cros_trunk: WebInputEventAuraTest.TestMakeWebKeyboardEventWindowsKeyCode fails under ThreadSanitizer
- 371290: cros_trunk: ICOImageDecoderTest.Decoding content_unittest fails on 8010 Mac, Linux32, Linux64 bots
- 452647: cros_trunk builder failures: base_unittests: runtest.py test.exe no such option --parallel
- 452706: syncing bluez repo broke with upstream ref errors
2015-01-23 - 2015-01-26
Sheriffs: zeuthen, shawnn, reveman
Gardener: tbarzic
- 452073: Beltino-B builder unable to build chrome from source.
- 452070: Missing prebuilts for nyan_freon.
- 452329: Chrome PFQ uprev failure.
2015-01-20 - 2015-01-21
Sheriffs: garnold, avakulenko, itspeter
Gardener: xiyuan, zork
- 445705: peach-pit ethernet issues cause update signals to not be received, failing autoupdate_EndToEnd.
- 450244: paygen timing out waiting on rambi-c-canary, waiting for DUTs.
- 450407: A CL in cryptohome seems to cause a unit test to fail. Reverted.
- 450771: Chrome PFQ is broken on MIPS platform. Related to this CL.
2015-01-14 - 2015-01-15
Sheriffs: wfrichar, adlr, kpschoedel
Gardener: skuhne
- Network issues: https://b2.corp.google.com/u/0/issues/19028546
- veyron-pinky-nightly-chrome-pfq is red, looking at log seems a flake, rebuild
- Reverted https://codereview.chromium.org/857613002/ since it broke many builders and updated the PFQ build to get the PFQ to uprev.
2015-01-12 - 2015-01-13
Sheriffs: bfreed, bsimonnet, rongchang
Gardener: achuith
- crbug.com/447821: Scheduled Lab shutdown on Jan 9 is complete. Let's see if the tree comes back up.
- crbug.com/448079: Chrome failed in the PFQ: git error. Should not close our tree on PFQ failure.
- crbug.com/446889: Tree throttled due to video_ChromeHWDecodeUsed Failure.
- crbug.com/448244: beltino-freon full release failed to build binutils and chrome. First build, so might be just plain broken.
- crbug.com/448414: Canary timeouts in report stage, but jrbarnette did some additional cleanup as well.
2015-01-08 - 2015-01-09
Sheriffs: quiche, bhthompson, rongchang
Gardener: achuith
- crbug.com/447324: rojen and tkensinger replaced old winky duts with new ones
2015-01-06 - 2015-01-07
Sheriffs: bhthompson, quiche, sheckylin, rongchang
Gardener:
2015-01-06 - 2015-01-07
Sheriffs: grundler, jrbarnette, owenlin
Gardener: oshima
- 445705: AU and Paygen failures on peach_pit
- CanaryCompletion timeouts caused by master restart (yjhong/cmasone)
- winky DUTs in lab *locked* by rojen - caused winky paygen test failures
- The DUTs were locked in order to replace them with MP hardware.
- 322072: peach-canary, nyan-canary and winky timed out in paygen test
- 446177: intermittent login test failures on x86, especially VM tests.
- 446463: AU test failure on peach_pit.
- 446885: security_OpenFDs failing in vmtests on asan bots
- CL:239300: sync errors due to glibc upstream/ refs changing from a file to a dir
2015-01-02 - 2015-01-05
Sheriffs: benchan, namnguyen, dhendrix Gardener:
- 445068: logging_CrashServices found to be bricking DUTs, temporarily disabled
- 286343: git push failures: missing permissions
2014-12-30 - 2014-12-31
Gardener: derat (again)
- Most issues from 26th-29th still unresolved.
2014-12-26 - 2014-12-29
Gardener: derat
- crbug.com/445200: Chrome builds failing due to unqualified base::CommandLine references in third_party/libjingle/overrides
- crbug.com/444876: PFQ SyncChrome step failing with "user is over quota" git errors
- crbug.com/445382: PFQ SyncChrome step failing with "The following mask changes are necessary to proceed" for chromeos-chrome-41.0.2262.0_rc-r1 due to earlier pinning for still-present crbug.com/434587
- crbug.com/445442: All tests in content_browsertests are failing on cros trunk
- crbug.com/445452: "Lumpy (chrome)" builder fails in Report step with KeyError on "chromeos.chrome"
- crbug.com/445485: PDFBrowserTest.Basic fails on cros trunk
- crbug.com/445489: clang flags passed to g++ on Daisy (chromium) builder
- 445477: security_SandboxLinuxUnittests failures on amd64 asan vmtest
- 445478: security_OpenFDs failures on amd64 asan vmtest
2014-12-24 - 2014-12-25
Sheriffs: kcwu Gardener:
2014-12-22 - 2014-12-23
Sheriffs: josephsih, thieule, stevefung, littlecvr Gardener:
- crbug.com/431836: HWTest did not complete due to infrastructure issue (code 3)
- crbug.com/430836: HWTest failed (code 1): autoupdate_Rollback ABORT: Host did not return from reboot
- depthcharge firmware (17942) -> flaky repo sync. Cycled green.
- crbug.com/444346: pool: bvt, board: peach_pit in a critical state
- crbug.com/367469: FAIL AUTest [peach_pit] [au] (0:52:21) with TestFailure autoupdate_Rollback ABORT: update-engine failed
- crbug.com/310783: autoupdate_EndToEndTest.npo_test_delta Error. This resulted in the failure of 15 release groups in Canary master i-467 and x86-mario full.
2014-12-19 - 2014-12-22
Sheriffs: thieule, stevefung, littlecvr Gardener:
- 443828: [au] autoupdate_Rollback Failure: Could not find a job_repo_url for the given host.
- 444042: [bvt-inline] login_MultiUserPolicy Failure on lumpy-chrome-pfq/R41-6600.0.0-rc1
- 444036: [bvt-inline] provision_AutoUpdate.double Failure on lumpy-chrome-pfq/R41-6600.0.0-rc1
2014-12-17 - 2014-12-18
Sheriffs: littlecvr, chirantan Gardener:
2014-12-15 - 2014-12-16
Sheriffs: wuchengli Gardener:
- 442635: VMTest failed on release-R40-6457.B builds
- 443131: DevInstallerPrebuilts stage failed
- 441288: SyncChrome fails with 'Too many requests from this client. Try again later'
2014-12-12 - 2014-12-15
Sheriffs: cychiang Gardener: girard
- Build broken by r187161 - reverted
- Build broken by r308105 - reverted
- 419343: mario incremental: Can't connect to MySQL server.
- 439801: beltino canary: mccloud in a critical state.
- 441450: rambi-a canary: Number of available DUTs for board expresso pool bvt is 3, which is less than the minimum value 4.
- 442276: "libchrome:293518" for /build/veyron_pinky/ have been masked.
- 442297: asan unittest failure in modemmanager
- 442617: not enough available DUTs for board daisy_skate.
2014-12-10 - 2014-12-11
Sheriffs: namnguyen, pstew, dhendrix, hungte Gardener: flackr
- 439745: moblab failure seen in the afternoon; the change should be merged, so no further action is required
- 440817: cras failed to build with missing symbol rate_estimator_reset_rate
- 433482: Popen issue
- 440810: content_browsertests flaky on cros_trunk official builds
- 440869: mysql failure on Chrome PFQ master
- 440654: opcode failure continues on mipsel-o32-generic pfq
- 441142: Coreboot caused Chrome PFQ to fail.
- 439801 pool: bvt, board: mccloud in a critical state
- 431815: Paygen unable to get lock for signing payload hash
- 441168: kernel_ProtocolCheck_SERVER_JOB Failure
- 441258: flakiness on Chrome4CROS builder seemingly due to clang roll
- 441278: chromiumos.chromium bots failing report stage due to missing dashboard url
- 441288: chromiumos.chromium builder failed SyncChrome with 'Too many requests from this client'
2014-12-08 - 2014-12-09
Sheriffs: cywang
- canary master keeps failing in
- 433482 slippy and beaglebone flakes throttled tree - wondering whether we hit a race in subprocess.Popen concurrency (in Python 2.7.3+, fixed in Python 3)
- 439745 stumpy_moblab ToT unable to start Apache
- 439801 pool: bvt, board: mccloud in a critical state
- 440428 vmtest failures in pre-cq (e.g. rambi) due to incomplete OS install
- 333398 incremental bot raced with CQ uprev & died once
2014-12-05 - 2014-12-08
Sheriffs: gwendal, tbroch Gardener: jonross
- 433482 slippy and beaglebone flakes throttled tree.
- 439533 broken lumpy build due to invalid cros_sdk path.
- 427692 Flakes on CQ, HWTest [bvt-cq] leading to timeouts.
2014-12-03 - 2014-12-04
Sheriffs: marcheu, jchuang, Gardener: jonross
- 433482 slippy and beaglebone flakes throttled tree.
- No PFQ roll, as Chrome was pinned to an older version due to other failures.
- Chromium OS (x86) Perf is not rebuilding libbrowser dependencies. Linker failure for libbrowser.a for a missing symbol that is exported in libui_base.a
- Chrome4CROS Packages are flaky since Nov 26
- 433498: samus security_StatefulPermissions Failure
- 434995: PFQ failure while building Chrome for mipsel-o32-generic (however, this board is marked not important for the time being)
- 438908: peach-pit image signing timeout
- ChromeOS SDK fails: caused by recent changes in depot tools, error in download_from_google_storage.py
2014-12-01 - 2014-12-02
Sheriffs: sbasi, wiley
- 438292: stumpy_moblab testing continues to fail due to a second bad autotest change which took down all moblab duts.
- 430976: stumpy_moblab continues to fail due to a bad autotest change.
- 437169: link_freon canary continues to fail in the provision job (timed out waiting for Chrome)
- 437859: Rambi canary groups take too long for canary master's timeout
- 437598: Failed to uprev Chrome because of failure in video_ChromeHWDecodeUsed on daisy_skate
- 434995: PFQ failure while building Chrome for mipsel-o32-generic (however, this board is marked not important for the time being)
- 437983: PFQ failure on Falco (seemingly lab infrastructure related)
- 438158: Failure calling Create() in login_Cryptohome on peach_pi-release/R41-6533.0.0
- 438466: Failure to build Chrome on arm-generic full (treating as flake, unsure how to dig further)
2014-11-26 - 2014-11-27
Sheriffs: jchuang
- No prebuilt Chrome binary for mipsel-o32-generic; ignoring it for now. Mike and benchang say to wait for the PFQ to roll (to generate the Chrome prebuilt)
- Flakiness on "Canary master"-i-383, 384, 385, 386
- CL:225332 causes rush_ryu kernel build error, CL:232015 fixes it.
- 436602 - ASAN security_OpenFDs
- 437145 - sdhci-tegra.c compile error on arm-generic
2014-11-25
Sheriffs: dgreid, tyanh
- Chrome won't build on ARM (linker error)
- security_OpenFDs failing on x86 and x86-64 ASAN bots.
2014-11-24
Sheriffs: tyanh, dianders, bleung
- 435967 chromite.lib.paygen.gslock.LockNotAcquired on ivbridge canary
- 333398 Incremental builders racing with CQ caused amd64 and daisy incrementals to fail 3 times in a row
2014-11-21
Sheriffs: dianders, bleung, chihchung, Gardener: skuhne
- 435564 ASAN security_Firewall failures
- Falco failed PFQ 2 times for different reasons, GPU reset failure, network listener [assuming flaky since benchmarks run - restarted]
- 435615 Rush_ryu PFQ build broken
- 435640 ChromiumOS TryServer not executing non-pre-cq jobs
- 418539 [bvt-inline] security_NetworkListeners Failure on daisy_skate-release/R39-6315.0.0 [only this time on falco]
2014-11-20
Sheriffs: semenzato, arakhov, chihchung. Gardener: skuhne
- 434498 CROS trunk builder fails since ui_unittests binary is renamed to ui_base_unittests
- 435322 Precise64 Trunk builder is failing in net_unittests: HTTPSOCSPTest.RevokedStapled
- 435362 elliptic curve error while uploading prebuilts
- 435360 Chromium OS (x86) Perf failed the past 1000 builds
- 434939 chromiumos sdk build kernel failure
- CL:231230 chromite lint unittest
2014-11-18 - 2014-11-19
Sheriffs: semenzato, arakhov. Gardener: mukai
- 434498 CROS trunk builder fails since ui_unittests binary is renamed to ui_base_unittests
- browser_tests get flaky on CROS trunk, probably due to the file access failures to /tmp. Asked chrome-troopers to reboot the builder.
- 434738, 434871, 434875 extremely flaky hwtest on x86-zgb-paladin
- 434958: bad file descriptor when unpacking ebuilds
- 434939: 3.14 kernel fails to build on arm
2014-11-14 - 2014-11-17
Sheriffs: ???
- CL:230011 new gcc triggered signed warnings in buffet
- 433628 link_freon paygen failed
- CL:230001 sdk refactoring triggered existing bug in boards that did not declare arm-none-eabi dependency
- CL:*184305 new gcc triggered warning about unused func in ap-daemons
- missing mipsel/aarch64 toolchain prebuilts -> waited for SDK to finish and generate them
2014-11-12 - 2014-11-13
Sheriffs: vapier, victoryang, avakulenko
- 432666 V4L2 change broke 32 bit 3.14 kernels (x86-generic/arm-generic/etc...)
- 432705 Lumpy Chrome PFQ lagging behind in hwtest
- 432793 x86-generic-incremental died looking for libjpeg-turbo license
- 432929 pineview canary red due to CL 227543
2014-11-10 - 2014-11-11
Sheriffs: djkurtz
- crbug.com/431622 canary builder failures / timeouts throttling tree
- crbug.com/426672 GS paygen lockfile error (repeat)
- crbug.com/427187 pool: bvt, board: daisy_spring in a critical state.
- crbug.com/420132 flaky chromite unittests timeout when machine is loaded
- crbug.com/427469 pool: bvt, board: monroe in a critical state.
- crbug.com/432020 many canaries => DebugSymbols upload -> gs flake => "WARNING: could not upload: *.sym: HTTP 403: Forbidden" => ProcessExitTimeout => hung for 600 seconds
2014-11-07 - 2014-11-10
Sheriffs: pprabhu, shawnn, posciak
2014-11-04 - 2014-11-05
Sheriffs: dbasehore, denniskempin, fjhenigman, Gardener: jamescook, Lab: dshi
- crbug.com/430182 daisy BuildPackage flake due to desktopui_CameraApp autotest
- http://b2/18249316 git 502 errors to https://chromium.googlesource.com/, backend load issue
- crbug.com/418918 Infrastructure issues (HWTest failed due to #DUTs not sufficient)
2014-10-30 - 2014-10-31
Sheriffs: wfrichar, waihong
2014-10-28 - 2014-10-29
Sheriffs: zeuthen, adlr, seanpaul
2014-10-24 - 2014-10-27
Sheriffs: deymo, garnold, rongchang
2014-10-22 - 2014-10-23
Sheriffs: bfreed, bsimonnet
2014-10-20 - 2014-10-21
Sheriffs: jrbarnette, quiche
2014-10-13 - 2014-10-14
Sheriffs: cychiang
- 422188: rambi series has failed to boot since 10/10 http://chromegw/i/chromeos/builders/rambi-a%20canary/builds/617
- Check on clapper: 6348 can boot, 6353 can not boot. blamelist: http://chromeos-images/diff/report?from=6348.0.0&to=6353.0.0
- Check on squawks: 6349 can boot, 6351 can not boot. blamelist: http://chromeos-images/diff/report?from=6349.0.0&to=6351.0.0
- Found CL https://chromium-review.googlesource.com/#/c/222290/.
- 358933: nyan canary https://uberchromegw.corp.google.com/i/chromeos/builders/nyan%20canary/builds/1063
- 422702, 422703, 422704, 422705, 422760, 422801: telemetry failed to do Oobe.loginForTesting on lumpy. Passed in this build http://chromegw/i/chromeos/builders/sandybridge%20canary/builds/672 .
- 409332: provision_AutoUpdate sometimes fails on enguarde and gnawty.
- 424900: sdk bot fails w/missing telemetry imports
Gardener: stevenjb
- 421453 continues to cause flakiness on http://build.chromium.org/p/chromium.chromiumos/builders/Linux%20ChromiumOS%20Tests%20%281%29
- 423032 filed: HistoryWebUIIDNTest may be flakey
- 422703: Various telemetry tests flaking with "Cannot set property 'disabled' of null" - causing some flakiness, being investigated.
2014-10-10 - 2014-10-13
Sheriffs: puthik, dlaurie
- Duck paladin broken, CL reverted.
- 422406: bvt does not have enough nyan_blaze machines
- 420344: HW Test flake - desktopui_Screenlocker
- 422700: HW Test flake - security_SandboxedServices
- 422703: HW Test flake - telemetry
2014-10-09 - 2014-10-10
Gardener: michaelpg
- 422102: Chrome crashes on start-up, fixed 10/10 by: https://codereview.chromium.org/642153003
- Some PFQs failing autotests using 40.0.2184.0, suspect fixed by https://codereview.chromium.org/641693008. Verify tomorrow.
2014-10-08 - 2014-10-09
Sheriffs: grundler, stevefung, spang; Gardener: michaelpg
- b/17880749: GoB outage happened again. Tree closed most of 10-08. http://go/omg/121. davidjames deployed rtc bomb. :)
- git and gerrit service restored by late afternoon. CQ still closed as of 6pm or so.
- crbug.com/421943: AUTest bug is fixed
- PFQ autotests were broken, fixed by: https://chromium-review.googlesource.com/#/c/222713/
2014-10-07 - 2014-10-08
Sheriffs: wuchengli, chirantan, furquan, spang
- b/17880749: GoB outage
- 421245: DB connection error (too many connections)
- 358933: umount leaves mountpoint marked as 'busy'
2014-10-03 - 2014-10-06
Sheriffs: wuchengli
- 358933: umount leaves mountpoint marked as 'busy'
- 418358: Paygen failed with TypeError: __str__ returned non-string (type list)
- 419581: emerge: there are no ebuilds to satisfy chromeos-test-root for beaglebone_servo
- 418918: HWTest failed due to #DUTs not sufficient
- 420344: desktopui_ScreenLocker Failure due to Bad password bubble did not show.
2014-10-02 - 2014-10-03
Sheriffs: namnguyen, charliemooney, rongchang, Gardener: flackr
- 419659: CheckCriticalProcesses failed on ASAN bots. Tests were moved out of "smoke" suite.
- 419752: chromeos-base/platform2 fails configure first time looking for ModemManager
- 401258: Seeing daisy fail on building glbench before opengles-headers.
- 419964 / 419965: pre-cq p2p & modem-manager unittest flakes
- 420080: nyan_kitty failing signing
- 420066: Chrome PFQ failure
2014-10-01 - 2014-10-02
Gardener: flackr
- 419393: net-misc/modemmanager-next-1.5.0-r244 tests flaky on Chromium OS (amd64) Asan
2014-09-29 - 2014-09-30
Sheriffs: vpalatin, alecaberg, josephsih, Gardener: derat
- 418464: 23 VM Tests failed on multiple platforms due to VM-only Chrome startup failures caused by removal of --disable-vaapi-accelerated-video-encode.
- 418850: VMTests failed: failed to start/stop VM
- 418650: image_to_vm.sh flakes
- 418928: HWTest failed due to infrastructure issues: fail to untar test_suites.tar.bz2
- 418918: HWTest failed due to number of available DUTs is less than the minimum value 4
- 418921: SignerTest failed: security_test_image failed
- 418994: Daisy (chromium) build warnings about missing GLES2 headers when compiling autotest-deps-glbench
- 418998: Nightly PFQ bots failed to build libxkbcommon due to GitHub throwing 500s
- 419390: google-breakpad unittests hung on x86-generic-asan
2014-09-23 - 2014-09-24
Sheriffs: sbasi, thieule, sheckylin
2014-09-22 - 2014-09-23
Sheriffs: dhendrix, sheckylin
- 416755: paygen: PayloadTestError: cannot find source payload
- 413682: Residual paygen issues on nyan_big and daisy builders
- 416204: PFQ failed to uprev due to missing omxtypes.h causing Chrome to fail to build
- 417050: old platform2 pulled in by various packages, but newer version needed by shill (shouldn't be a problem, clobber builder if it fails due to this)
- 417094: openvpn fails to build on mipsel-o32-generic (should be fixed when crosreview.com/219635 lands)
2014-09-18
Sheriffs: dgreid, puneetster, Gardening: skuhne
Fixed:
- 415281: Falco failing "[bvt-inline] login_OwnershipNotRetaken Failure on falco-chrome-pfq/R39-6276.0.0-rc3". Found culprit after bisect.
- CL:215220 reverted via CL:218925 to fix arm-generic/chromiumos-sdk kernel builds
2014-09-16 - 2014-09-17, Gardening: skuhne
Sheriffs: zqui, gwendal
Pending issues:
- 415281: Need to rollback to a stable chromeos-chrome: cl/217936 in progress.
- 414345: transient authentication issue prevents the Chrome PFQ from running.
- 415281: Falco failing "[bvt-inline] login_OwnershipNotRetaken Failure on falco-chrome-pfq/R39-6276.0.0-rc3". Created a falco with the latest image and a TOT Chrome on top of it - but the problem was not showing up when running locally.
Fixed issues:
- 414322 has been identified, tree is reopened.
- A trailing slash caused the build to fail with a style warning. (!?!? Why do our checks differ between systems?) CL 552103003 takes care of that.
For the experimental builds:
- 408263: gizmo paladin is broken, gizmo canary is fine.
- 389617: rush needs a new tool chain in place
2014-09-12 - 2014-09-15
Sheriffs: dkrahn, tbroch
Decision was made to leave the tree throttled until 414322 is fixed.
- 414128: kernel build failure
- 411693: image_to_vm failure
- 414322: paygen/autest bug ... device fails to run new image delivered (split out of 370302)
- 370302: paygen/autest bug caused by devserver availability issue still happening -- apparently not fixed
2014-09-09 - 2014-09-10
Sheriffs: semenzato, snanda, amstan, etc. (add your names) djkurtz for one day
Situation is relatively calm as of 5pm. Tree has been open a few hours. Yesterday and this morning the tree was closing every hour or so.
- 370302: paygen/autest bug happened a lot, but is now believed to be fixed by removing an errant devserver
- 413014 and 355843 also apparently stopped hitting
2014-09-08 - 2014-09-09
Sheriffs: davidriley, achaulk, djkurtz
- 370302: devserver/paygen failures manifesting as different autoupdate_EndToEndTest failures
- 410482, 412563: b-lab network saturation causing issues
- 412564: cascading pre-CQ failures trying to get tryjobs
2014-09-03 - 2014-09-04
Sheriffs: cywang
- 410716: master canary is waiting for results of all other release builds which are not started yet
2014-09-02 - 2014-09-03
Sheriffs: pstew, marcheu
2014-09-01 - 2014-09-02
Sheriffs: tyanh
- 409774: Canary master keeps getting timed out from daisy-release-group
- Waiting for this CL to land to fix the canary master spitting TypeError
2014-04-24 - 2014-04-25
Sheriffs: jwerner, pstew Gardener: bshe, Build Deputy: yjhong, Lab Sheriff: beeps
- 366158 video_VideoEncodeAccelerator failure now breaking BVT as well
- 366537 login_MultipleSessions fails in BVT
- 365973, 365976, 366283, 366292, 366460, 366465, 366552, 366588 video_ChromeHWDecodeUsed causing a spate of failures
- 346868 login_LoginSuccess makes a re-appearance in the PFQ uprev failures, but this is likely due to BrowserConnectionGoneException
- 366577, 366581 graphics_SanAngeles failures popping back up
- 366592 login_LogoutProcessCleanup also failing, so there's clearly something systematic (in telemetry?) causing BrowserConnectionGoneException failures.
- 366593 (login_OwnershipTaken),
- 356020 - Reset failing in BVT sporadically since Mar 24. Odd that it hasn't been assigned to anyone yet for triage.
- 366823 - daisy_spring DUTs are all non-functioning.
- 366988 parallel_emerge failed with IOError: [Errno 9] Bad file descriptor
- CL:196992 to fix sdk bot failure in strongswan and ipsec user/group
- 367086: [bvt] login_OwnershipNotRetaken Failure
- 367174: Tests aborting on HW tests
2014-04-23
Sheriffs: vapier Gardener: jamescook
- 365507 Login screen failures due to multiple blink regressions
- 365982 security_Firewall failed
- 366067 graphics_GpuReset hung the kernel
- 366142 Assertion about window opacity/visibility
- 366141 Suite aborts from a timeout without any other failures (causes PFQ failures, due to lab overload)
- 366158 video_VideoEncodeAccelerator failure breaking Chrome PFQ
- CL:196473 and CL:196481 asan bots failed in unittest due to bug in bootstat unittest
2014-04-22
Sheriffs: vapier Gardener: jamescook
2014-04-21
Sheriffs: wuchengli
- [365195] stumpy_moblab canary unittest failed
- [332665] [bvt] autoupdate_CatchBadSignatures Failure
2014-04-18
Sheriffs: wuchengli, semenzato, benchan
- [359223] [monroe] graphics_SanAngeles suspected to reboot/hang machine
- [365035] ChromeSDK failed on all full builders (because of LD=gold by default and probable gcc incompatibility)
- [364875] beaglebone canary build fails running out of space (FIXED)
- [364887] moblab unit tests failures
- [364818] x86-generic ASAN unittest failing due to compile warnings in metrics_daemon_test.cc
2014-04-17
Sheriffs: milleral, thieule
- [364617] Beaglebone servo image is too big
- [356020] [bvt] reset Failure on falco-chrome-pfq/R35-5684.0.0-rc4
- [355843] TreeCloser: build failure in DebugSymbols 600sec timeout
- [364669] Daisy skate build is failing on serial-tty
- [359223] [monroe] graphics_SanAngeles suspected to reboot/hang machine
- [358737] [bvt] graphics_GpuReset Failure on stout-release/R36-5718.0.0
2014-04-16
Sheriffs: milleral, thieule
2014-04-15
Gardener: derat
- [363884] LKGMSync step failing repeatedly due to bad SHA1 when syncing coreboot
- started new x86-generic nightly chromium PFQ build since last night's run died on some slaves
2014-04-14
Sheriffs: jchuang, reinauer, Gardener: derat
- [363339] stumpy moblab failure
- [363294] sandybridge-canary failed.
- [362999] Failed cbuildbot failed hwtest [bvt] [parrot_ivb]. Transient.
- [358737, 356020] Transient HWtest fail on falco and wolf (both issues have been auto filed many times)
- [363015] Failed cbuildbot failed debugsymbols [x86-mario]
- [363167] HWTest step timed out on daisy_spring and falco PFQ
2014-04-11
Sheriffs: olofj, adlr, josephshi, ihf
- [362621] Removed pyauto dependencies that broke PFQ.
2014-04-05
Sheriffs: keybuk, bfreed
- [339291] Reverted a set of CLs that caused platform_Powerwash failure.
- [360797] The chumps from 339291 broke incremental builders and required deputy assistance.
- [360898] video_DecodeAccelerator is increasingly unreliable. Maybe we should make it experimental?
2014-04-04
Sheriffs: keybuk, bfreed
- [360084] rambi-b canary build failed in the Archive stage on loopback mount failure. Believed transient.
- [360082] Chrome PFQ fails with unknown linker flag (--reduce-memory-overheads), likely because of https://codereview.chromium.org/225093005 last night.
2014-04-02 & 2014-04-03
Sheriffs: pprabhu, dgreid
- [359143] generate_payload failed to unmount a file system, and we tried to rm it later. pprabhu@ forced canaries to restart, since ongoing runs were all going to hit this issue. But it took a while to take this decision.
This hit us again later in the day, keeping canaries red almost all day. The reason was that a script had to be manually upreved to pull in the revert. See CL.
- [359227] VMTest hung. Root cause unknown.
- [359422] VMTest failed because the VM ran out of space when running the tests. We reverted the CL in the morning. Unfortunately, although all the canaries are back online, we can't get the CQ to pass yet, due to flakes + lots of CLs trying to get in. So, we need to uprev the manifest by allowing a noop CL through a throttled tree.
- [348199] and [353590] daisy_spring-pfq failed because of known GOLO and update engine flakes.
- [359760] beaglebone_servo canary is currently broken. [TODO: If the canary is still red at EOD, revert the CL mentioned in the bug].
2014-03-31
Sheriffs: katierh, dparker
- [358180] Daisy canary builder failure caused by error in a git repo. Existing error turned into a warning due to crbug.com/352692
- [358075] daisy_skate and daisy_spring canary failing due to clustered Chrome builds on one builder. Variants not using pre-built of Chrome from daisy.
2014-03-28
Sheriffs: dbasehore, armansito, sheckylin
- [357364] Tree doesn't close anymore when builds fail. Looks like it's fixed now.
- repeated failures in daisy canary during build packages.
2014-03-27
Sheriffs: dbasehore, armansito, sheckylin
- [353906] Builder out of space
- [357093] x86 generic ASAN fail due to Chrome
- [357202] Pre-CQ timeout.
2014-03-25
Sheriffs: dianders, vbendeb
- [356187] widespread provision failures; waiting for lab sheriff for the most part
- [356198] and [356199] video_VideoDecodeAccelerator - probably a duplicate of [353898]. There's a possible fix for that but it hasn't gone back to R34 yet.
- ... lab issue is hopefully fixed now ...
- ... David James and crew restarted CQ ...
- ... various things handled by David James ...
- [355843] beltino canary - DebugSymbols hung for 600 seconds
- [348188] slippy canary and daisy canary - Flood of "Too many open files"
- Chrome uprev has failed a few times; Chrome sheriff handling?
2014-03-21 and 2014-03-24
Sheriffs: jrbarnette, tbroch
- [355843] beltino canary: DebugSymbols failed during upload with timeout
- daisy incremental: CQ bug missed bump of chromeos-init for CL:190619 initially (race?) then got it fixed making manual override here unnecessary.
- [353018] sandybridge canary: OSError(16, 'Device or resource busy') ... believed to be not enough loopback devices.
- [354573] A bug in Chrome caused the x86-generic paladin to fail multiple times in VM testing.
2014-03-19 and 2014-03-20
Sheriffs: snanda
- [344506]: peppy canary failed to reboot due to ASIX USB issue.
- [352276]: falco canary platform2 build failure. http://crosreview.com/190820 is the fix but still waiting to be blessed by CQ.
- [354496]: monroe paladin misbehaved for a while.
- [354262]: sandybridge build failed. DUT was down?
- [311350]: platform_Powerwash Failure on daisy_spring-release. USB dongle flakiness?
2014-03-17 and 2014-03-18
Sheriffs: yjlou, wfrichar, victoryang, hungte (TPE)
- [352994] cros_generate_test_payloads failed to find image folder (race condition)
- [353429] chrome/chromium pfq bots died in build_image due to missing libmojo_system.so
- [353461] failure in uploading DebugSymbols
2014-03-13 and 2014-03-14
Sheriffs: cywang(TPE), dgarrett, bleung, gwendal
- [348855] amd64-generic-asan: logging_UserCrash timed out (flaky)
- [352093] daisy_spring: HWtest job timeout, but tests are still running
- [350677] x86-generic-full : cryptohome fails to link.
- [352276] platform2-0.0.1-r366 fails on arm-generic full
- [352297] Pre-CQ Failure- Gerrit Code Review requires Java 7
- [348855] amd64-generic-asan: logging_UserCrash timed out
- [352428] x86-generic asan : logging_AsanCrashTelemetry : Unhandled TabCrashException: Handshake Status 500
- [72633] x86-generic incremental: login_OwnershipNotRetaken
- [352520] atom canary: x86-mario: build_image failed (can't read superblock)
2014-03-11 and 2014-03-12
Sheriffs: dlaurie, grundler
2014-03-05 and 2014-03-06
Sheriffs: miletus, shawnn
- [337490] daisy incremental unmount completed, but returned an error
- [348855] amd64-generic-asan, logging_UserCrash timed out
- [348758] x86-generic-asan failure. not sure how to interpret the failure message.
- [349559] Signer failure on all canaries
- [343442] Wolf Paladin builder wedged
- [349597] chrome-internal-fetch netrc credentials revoked
2014-03-03 and 2014-03-04
Sheriffs: quiche, vpalatin
- [348607] Chrome PFQ failure. later Chrome builds cycled green.
- [345501] platform_FilePerms: jrbarnette@ has CL checked in, but it wasn't picked up by the lab server. lab team will update its server.
- [347932] security_AccountsBaseline (multiple times). cmasone@ investigating.
- [348059] chromiumos sdk builder failing. pinged bug.
- [348758] x86-generic-asan failure. not sure how to interpret the failure message.
- [348799] stumpy-paladin, reboot failure in autoupdate_CatchBadSignatures
- [348805] x86-generic, e2fsprogs failed to emerge
- [330670] breakpad unittest failure on amd64-generic
- [348855] amd64-generic-asan, logging_UserCrash timed out (2x)
- [348889] duck canary failure
- [345491] x86-mario canary: GSResponseError 403
- [349073] parrot canary: platform_PowerWash failed
- [337490] daisy incremental unmount completed, but returned an error
- [349187] x86-mario canary failed in ChromeSDK: out of disk?
- [349292] duck canary, GS_ERROR: Attempt to get key
2014-02-19 and 2014-02-20
Sheriffs: olofj, pstew
- mario-canary fails in UReadAheadServer. No logs.
- Chrome uprev failed on thermal, dianders@ to revbump package, but failed again
- [344914] CQ failing due to failure to build hostapd, deemed to be a corrupted tarball in the builder's cache.
- [345098] New print_repo_status.py factory install script broke the archive process.
- [345210] Rash of signer test failures (alex, slippy, parrot, leon)
- [345479] VMTests fail with 'NoneType' object has no attribute 'Cleanup'
- [345491] GS AccessDenied error while uploading prebuilts for slippy_canary. Invoking troopers.
- [345476] login_CryptohomeIncognitoTelemetry and ScreenLockerTelemetry [344849] causing chrome uprev issues
2014-02-17 and 2014-02-18
waihong (TPE), bhthompson, marcheu:
- peach_pit canary hwtest flake -> crbug.com/344427
- amd64-generic paladin machine went offline for a while. Contacted the Trooper to fix.
- stumpy canary hwtest flake - happened again, crbug.com/344173
2014-02-13 and 2014-02-14
reinauer, garnold
- beaglebone canary failed on DebugSymbols stage; appears to be a flake (crbug.com/344059).
2014-02-12
If you are seeing double-free/heap corruption errors when running gn during ChromeSDK runs (e.g.:
https://uberchromegw.corp.google.com/i/chromeos/builders/x86-alex%20canary/builds/4655/steps/ChromeSDK%20%5Bx86-alex%5D/logs/stdio)
it's probably crbug.com/335587. Please see my explanation in the bug.
- posciak
2014-02-11
benchan, sosa, owenlin:
2014-02-10
dkrahn, adlr, kcwu (TPE):
- Due to DiRT, MTV was offline and internal waterfall was affected ~14:00 - 17:00. Some buildbots were affected as well.
- Another chrome pfq vmtest failure on falco. crbug.com/212879 - these don't actually block uprev, see crbug.com/342425
- Google Storage issues (fiber cut) cause canary failures - https://a.corp.google.com/#102649
- Filed crbug.com/342497 for link canary build error.
- linux_chromeos dbg 2 bot has very long cycle time (~4 hours) so failures may show up late, filed crbug.com/342588
2014-02-07
dkrahn, adlr, kcwu
- mario canary hwtest bvt failure - emailed troopers to escalate. This has been going on for a while now, it seems
- chrome pfq vmtest failure - next build cycled green, so letting this one go
- falco chrome pfq vmtest failure - crbug.com/212879
- mario canary hwtest bvt failure ended up being a combination of crbug.com/339702 and no logs reported - crbug.com/341494
- investigated repeated failures on x86-generic asan builder - filed crbug.com/341922
- daisy_spring starvation in the lab due to crbug.com/339636
- wolf_canary lab flake due to crbug.com/340839
2014-02-06
skuhne, jwerner, thieule:
- Tryserver unavailable ~13:00 - 16:00
- HW lab down 14:39 - 15:46
- crbug.com/341658: dev_install failed due to connection timeout
- crbug.com/337490: amd64-generic-incremental build failure during build image due to unmountable partition. Device busy.
- crbug.com/212879: system sometimes does not come up after reboot in VMTest
- SimpleTestAndVerify fails on x86generic ASAN. Saw that yesterday already, but there were many more problems.. (-> crbug.com/337848)
- PFQ for daisy_spring is still failing in HWTests because apparently no machines have been available for 6 days (-> crbug.com/339636)
2014-02-05
mtennant, skuhne, jwerner, thieule (MTV):
- PFQ for falco, lumpy, .. had failures. Might be fluke upon "Failed to run /home/chrome-bot/depot-tools/gclient runhooks". Retriggered / clobbered / but no success (-> crbug.com/341179) PFQ seems still to be broken @5:45pm, but it will cycle tomorrow morning green since Chrome was red as well.
- PFQ x86: lab failure or restarting VM's (-> crbug.com/212879)
- Several x86-generic and amd64-generic failures in SimpleTestUpdateAndVerify (-> crbug.com/212879; VM hung on reboot, rolling back 3.10 kernel switch on generic to fix)
- daisy canary - random tests failed due to crashdumps from (non-fatal) Xorg crashes that took too long to symbolize (client test is marked GOOD but server job times out)
- daisy_spring canary - all tests after a certain point in the suite failed with ABORT (suite hit 2h timeout since not enough lab devices available soon enough to finish on time)
- dev_install test is failing on canaries - crbug.com/341266 (offending change was rolled back with some difficulties; problem was identified and will be fixed on reupload)
2014-02-04
posciak (TOK)
dianders, rspangler (MTV):
2014-02-03
dianders, rspangler
2014-01-31
tbroch, jrbarnette
- crbug.com/339934 race/flake for tar during DebugSymbols ... tar: debug/bin: file changed as we read it
- crbug.com/335587 double-free/corruption errors when running gn during ChromeSDK stage
- crbug.com/339743 [bvt] network_VPNConnect.l2tpipsec_cert Failure
- crbug.com/337490 mount failure during build_image (error status 32)
posciak (TOK)
- See crbug.com/335587 for a probable reason behind occasional double-free/corruption errors when running gn during ChromeSDK stage
- build failures as CQ missed one of the CQ-DEPEND CLs, because it was uploaded as a draft
- login timeouts on link canary in a few bvt tests; suspecting https://codereview.chromium.org/148843002 to have made login last longer/stop working... may need to be followed up on if it persists
2014-01-30
tbroch, jrbarnette
- crbug.com/339573 failed to uprev chrome 34.0.1813 due to proto change for LocalExtensionCache::CacheItemInfo::CacheItemInfo
- contacted chrome gardener (harrym) to resolve
- crbug.com/310783 stumpy canary.
- crbug.com/339135 leon/samus/link/panther canary failures for vm_test fix here.
- crbug.com/338085 radvd ebuild failure ... fixed with clobber build. Email triage by (jamescook, avi, xiyuan, achuith)
posciak (TOK)
2014-01-24/2014-01-27
vapier
2014-01-24/2014-01-27
katierh, armansito
- Jan 27 was full of clobbering...
- Needed a few reverts due to bad eclass CLs landing - and then lots of clobbering, removing prebuilts, etc to get the tree in a sane shape - crbug.com/338085
- Jan 24 had a number of failures due to Gaia corp errors...
2014-01-23
derat, wiley, dparker
- crbug.com/337490 failure in build image: mount(8) failed: Device or resource busy
- Timeouts in login_CryptohomeTelemetry. Fixed by this revert.
- PFQ failures with "TEST_NA: Unsatisfiable DEPENDENCIES" caused by a server dying in the lab (per scottz@).
- CQ fails in unittests due to timeout in chromite: crbug.com/337602
2014-01-22
derat, wiley, dparker
- crbug.com/336742: all PFQs failed due to factory-test-init and chromeos-test-init conflict (see 2014-01-20). forced rebuilds
- crbug.com/334958: two CertificateManagerBrowserTest tests failing on "Linux ChromiumOS Tests (dbg)(2)" builder
- BVT failure on pit caused by kernel crashes: crbug.com/336839
- BVT failure on ZGB caused by flake in network_DhcpStaticIP: crbug.com/336767
2014-01-20
reveman, pprabhu, dgreid
- crbug.com/335978: security_ptraceRestrictions failed due to test_image update. Fixed by this revert.
- factorytest-init and chromeos-test-init package conflict. Fixed by this revert.
- crbug.com/336296: Arm canaries and pfq were broken. It was mostly a chrome issue, fix had already made its way to ToT pfq builders. TODO(sheriff): Make sure that the nightly-pfq picks up this change. Essentially, make sure that nightly-pfq has a green run.
- crbug.com/336634: We didn't have enough daisy_spring DUTs in the lab, so HWTest timed out on ChromePFQ a couple times.
2014-01-16 and 2014-01-17
bfreed, snanda, ellyjones
2014-01-10 and 2014-01-13
bleung, dbasehore, spang
- crbug.com/333310: Node.js issue with downloading Chrome. Caused all canary builders and full builders to fail.
- crbug.com/333398: Delay between ebuild commit and uprev commit
- crbug.com/332645: beltino canary failed in archive, might be low memory issue.
2014-01-06 and 2014-01-07
jsalz (TPE), dlaurie, grundler
- crbug.com/332104: Recurring issue in ManifestVersionedSync step on several builders (zgb, falco, peppy, stout) and experimental builders. This is the top item for tree closure on 1/7.
- crbug.com/332145: Chrome PFQ nightly failing to compile. Fixed in chrome already, should be good for tomorrow's build.
- crbug.com/327651: autoupdate_EndToEndTest failure
- crbug.com/329248: "Update failed" in VMTest
- daisy_incremental out of space
- crbug.com/327388: experimental_platform_RebootAfterUpdate suspected of putting machines in Repair Failed state.
- crbug.com/331176: login_CryptohomeIncognitoTelemetry suspected of putting machines in Repair Failed state.
- crbug.com/328360: audiovideo_VDA failed
- crbug.com/331318: "Suite prep" failure - probable bvt timeout [update: fixed on parrot_ivb]
- crbug.com/331754: platform_FilePerms: "/dev/pts" is missing options "set(['mode=620', 'gid=5'])"
- crbug.com/324907: GerritHelperTest unit test failure
- crbug.com/331756: UploadPrebuilts fails with CommandException: Invalid canned ACL
2014-01-02 and 2014-01-03
djkurtz (TPE)
- crbug.com/329777 - autoupdate_CatchBadSignatures hash failure
- crbug.com/317309 - "daisy canary" - autoupdate_Rollback failed: failed to find a job_repo_url for the given host
- crbug.com/331318 - "parrot canary" - time out during bvt
*** 2014! Happy New Year!! 2014! ***
2013-12-25 and 2013-12-26
hungte (TPE)
2013-12-23 and 2013-12-24
jcliang (TPE)
2013-12-19 and 2013-12-20
cywang (TPE), gabeblack, zork
2013-12-17 and 2013-12-18
waihong (TPE), shawnn, charliemooney
2013-12-09 and 2013-12-10
sabercrombie, milleral, miletus, mtennant (Chrome OS build deputy), rginda (Chrome gardener)
- crbug.com/327005 - chromeos-base/telemetry failed on chrome_pfq, multiple platforms
- crbug.com/327007 - Parrot canary failed because of chromeos-chrome build failed
2013-12-5 and 2013-12-6
dbasehore, pstew, seanpaul, tengs (chrome)
2013-12-3 and 2013-12-4
reinauer, benchan, josephsih, jamescook (chrome)
- crbug.com/324872 - repeated hwtest timeout on daisy_spring
- crbug.com/325056 - TestFailure on HWTest [bvt]: network_DefaultProfileCreation: Missing setting CheckPortalList=ethernet,wifi,cellular
- crbug.com/212879 - lumpy chrome pfq failing (SimpleTestUpdateAndVerify fails, system doesn't come up after reboot)
- crbug.com/325610 - devinstall_test failed with KeyboardInterrupt: SIGINT received in VMTest x86-alex canary
- crbug.com/325617 - chromite unittest failure on samus canary
- crbug.com/325629 - chromite unittest gerrit_unittest failure with KeyError: 'http' on amd64-generic full and x86-mario. szager had submitted a patch to fix this problem.
- crbug.com/325632 - ALL bvt tests failed. I believed they were caused by the same reason. Merged all the other auto-filed issues to this one to track this bug.
2013-12-2 (and 11-29 - holiday)
skuhne, dkrahn, rspangler, kinaba
- See several timeout problems (auto update, VMTest) and investigating. re-run builder with clobber
- crbug.com/212879 - lumpy chrome pfq failing
- crbug.com/319997 - daisy_spring canary hwtests failing b/c dut not coming back after reboot
- crbug.com/324872 - filed this bug to track repeated hwtest timeout instances on daisy_spring
- crbug.com/324907 - filed this bug to track repeated chromite unittest failure on x86-mario canary
- crbug.com/324916 - filed this bug to track repeated perf (benchmark tests) failures on lumpy, parrot, daisy, ..
- crbug.com/317903 - closed the tree because lumpy paladin will not pass until this is fixed -- update: temporary fix here and tree reopened
2013-11-27
thieule, garnold, kcwu
- crbug.com/324116
- crbug.com/323001
- x86-zgb canary failure during BuildImage; cgpt error, likely due to a corrupt image remnant from a previously interrupted run; re-ran builder w/ clobber.
- Tree/builders closed for maintenance between 9am-12:35pm PST.
- crosreview.com/178034
2013-11-25 and 2013-11-26
sosa, tbroch, yjlou
- 11/26: crbug.com/321997 : stumpy canary: "update-engine failed"
- 11/26: butterfly canary: hwtest failed ... out of machines in lab
- 11/26 x86-mario canary: cryptohome bug ( crbug.com/322161? )
- 11/26: crbug.com/323593 : chrome PFQ failing on amd64-generic due to disk full
- 11/26: crbug.com/323569 : previous build interrupted during setup_board left unclean state. davidjames cleaned up manually
- 11/26: crbug.com/322826 : cbuildbot issue w/ versioning
- 11/25: mtv no closures 9-5pm PST
2013-11-21 and 2013-11-22
fjhenigman, dianders, jwerner
- daisy_spring canary: new auto-filed bug 321997 - "update-engine failed" but can't tell why
- x86-generic ASAN#14728 - Failed in VMTest in cryptohome stuff (apparently a TPM failure?). http://crbug.com/322161
- All canaries are failing - closing tree as per lab folks while they investigate.
- Friday morning: vapier opened the tree with the message "network_3GSmokeTest.pseudomodem.3GPP failure -> crbug.com/322263"
- Friday morning: vapier opened the tree with the message "daisy_spring canary looks like flake; peppy hwtest -> crbug.com/322263". Note that 322263 has since been marked as fixed.
- Butterfly canary died again in network_3GSmokeTest.pseudomodem.3GPP, but with a different message. This time: "Unhandled gaierror". Opening up http://crbug.com/322606 to track.
2013-11-19 and 2013-11-20
dgreid, wiley, owenlin
- [bvt] network_DhcpStaticIP failed on daisy paladin:
- [bvt] p2p_ConsumeFiles failures:
- login failure running test on alex canary.
2013-11-15 and 2013-11-18
puneetster, adlr
- crbug.com/319952 chrome crash on boot as of 4966.0.0/33.0.1710.0 . Chrome pinned. CL to unpin: https://code.google.com/p/chromium/issues/detail?id=319975
- crbug.com/319796: Network issues w/ archiving ; still happening. on-call groups have been paged. Throttling that was impacting us is resolved.
2013-11-13 and 2013-11-14
rminnich, katierh
- crbug.com/319227 - updated chrome build had a telemetry bug - fix landed in chrome and then sent to the canaries - the offending test was marked as experimental until it landed
- crbug.com/318814 - flake on ASAN vmtest falls into an error path looking for pyautolib which doesn't exist
- crbug.com/318681 - stout32 flake (autoupdate_Rollback_SERVER_JOB,Provisioning) -> duped to 317052
2013-11-11
bfreed, vbendeb
- ARM systems broken: filesystem corruption causing chrome/autoupdate failure: crbug.com/317693
- Toolchain reverted from 4.8 to 4.7!
- lumpy paladin rootfs is low -> crbug.com/317903
- VMTest SimpleTestVerify failed in login_OwnershipApi -> duplicated to crbug.com/314293
2013-11-5
jrbarnette, wuchengli
- build failure during TPE shift, fixed with revert in crosreview.com/175915
- stumpy canary went red: AU test failed because of crbug.com/277839
- bayleybay canary isn't important, filed crbug.com/315194 to have it removed.
- mario canary failed unit tests - akeshet investigated.
- amd64-generic ASAN builder failed, timed out uploading to GS.
- bayleybay canary was red at the opening bell, filed crbug.com/315189
2013-11-1
2013-10-28 and 2013-10-29
sbasi, dparker
2013-10-24 and 2013-10-25
jchuang, quiche, dlaurie
- bayleybay-canary UnitTest still fails. (not tree closer)
- Tree closed 2 times on Thursday for Archive timeout in canary (1st time: 22 slaves failed. 2nd time: 2 slaves failed): crbug.com/311215
- Tree closed 1 time for Mario incremental compile error in platform2 and p2p: crbug.com/311455
- Tree closed for VMTest fail on x86/amd64 ASAN: crbug.com/311478
2013-10-18 and 2013-10-21
snanda, spang, dhendrix
- bayleybay-canary disk filled up, lab team fixed it. However it's still having other issues and hasn't built successfully for several days.
- Had issues with autotest_lib.client.common_lib.barrier_unittest: http://crbug.com/309832
2013-10-16 and 2013-10-17
bhthompson, jeremyt
- Tree closed 10 times due to crbug.com/305263 (buildbot failure in ChromiumOS on {amd64,x86}-generic ASAN)
- Tree closed 1 time, at the end of Thursday on slippy canary. Root cause is not known.
- Preflight queue failed 3 times, crbug.com/212879
2013-10-13 and 2013-10-14
bleung, vpalatin
- Tree closed much of Monday due to crbug.com/307021. Lots of BVT failures due to hosts not returning from an update to R32-4820.0. Upon further investigation, it looks like there is a lab network issue (crbug.com/307199)
- Build failed due to GoB flake. Failed in Clear and Clone phase : crbug.com/307524
- connection timed out due to banner exchange flake in VMTest happened on parrot64 canary : crbug.com/254166 comment #100
- Segfault in AddressSanitizer on amd64-generic-ASAN and x86-generic-ASAN : crbug.com/237690
2013-10-10 and 2013-10-11
reinauer, garnold
- tree closure due to login_CryptohomeIncognitoUnmounted during hwtests on peach_pit canary, crbug.com/278379
- chromeos-chrome failed, crbug.com/268397
- x86 generic ASAN, crbug.com/305263
- security_OpenSSLBlacklist failed, crbug.com/255349
- tree manually closed and reopened for parrot 32->64 transition
- another closure (amd64 generic ASAN) due to chromeos-chrome errors during parallel stripping, crbug.com/305263
- amd64 generic full failed during SyncChrome with what seems like a bad DEPS file or stale mirror, crbug.com/306692
2013-10-08 and 2013-10-09
rharrison, sheu
- mario incremental VMTest failed on banner exchange, crbug.com/254166
- CQ was broken overnight, fixed by sop, crbug.com/305464
- daisy canary failure, all known/flake issues, crbug.com/294909
- peppy canary failure, all known/flake issues, crbug.com/263527, crbug.com/233864, crbug.com/294909, crbug.com/268397
- x86 generic ASAN build packages failure, unable to strip files that don't exist, different runs of exact same manifest having different results, crbug.com/305262, crbug.com/305263, crbug.com/268397
2013-10-04 and 2013-10-07
djkurtz
- mario incremental timeout in SimpleTestVerify crbug.com/303972
- "x86 generic incremental" & "amd64 generic full" crbug.com/254166 "Connection timed out due to banner exchange"
- parrot32 vmtest failed due to KVM dying. Trooper rebooted server (build100-m2). crbug.com/304257
- "amd64 generic ASAN" & "x86 generic ASAN" - chromeos-chrome build fails
- "link canary" - power_Resume, "Sanity check failed: missed RTC wakeup" - crbug.com/253355
- "daisy_spring canary" - power_Resume, "Spurious wake from s5m-rtc" - crbug.com/304557
- "falco canary" - Archive BackgroundFailure: "code 600" - filed new crbug.com/304757
2013-10-02 and 2013-10-03
kamrik, dkrahn, zork
- Archive stage sometimes flakes out while uploading symbols crbug.com/303111
- Chaps unittest failure: crbug.com/216572
- power_Resume hardware test failure: crbug.com/233864
- VMTest failure "Connection timed out due to banner exchange": crbug.com/254166
- October 3 ...
- Login tests failed in BVT on all platforms - matrix. One is a bug the other two seem to be real. Downloaded image - can't log in, get stuck on the "Updating screen". Guest login works ok.
- Login was actually broken but passed the chrome pfq. Tracking bug is crbug.com/303764. Bug has been fixed but chrome uprev pending. In the meantime, chrome has been pinned to 32.0.1658.2. A CL to unpin is at https://chromium-review.googlesource.com/171761.
- Chrome unpinned - 10/04 10:35am
2013-09-24 and 2013-09-25
sabercrombie, dbasehore
- x86-mario canary autoupdate_EndToEndTest failed (crbug.com/218342 -- intermittent platform_Shutdown failure).
- daisy canary platform_CryptohomeTestAuth failed (crbug.com/262546).
- some machines failing BVT due to crbug.com/294221
- Recurrence of "Connection timed out due to banner exchange" http://crbug.com/254166 on Mario incremental (security_Minijail_seccomp) and x86 generic ASAN (platform_CrosDisksDBus).
- Filed crbug.com/298216 for intermittent Link power_Resume "Could not find start_resume_time entry" BVT failure.
- Filed crbug.com/298376 which is causing the "Clear and Clone chromite" stage to fail.
- crbug.com/254166 again on mario_incremental, this time with security_SymlinkRestrictions.
2013-09-23
benchan, gabeblack, jcliang
2013-09-20
benchan, gabeblack, jcliang
2013-09-19
bfreed, pstew
2013-09-18
bfreed, pstew
- All release builds fail in VMTests. https://crbug.com/294144. This seems to be an issue with the python eclass, which mishandles EROOT vs EPREFIX when looking for python wrappers during the gmerge tests. The issue only affects EAPI=4 due to logic in that eclass, so the problem was triggered only when hdctools was upgraded to use it.
- Chrome fails to uprev due to includes moving around, and PDF failing to compile against it. This issue was fixed in Chrome but the version we were trying to uprev to did not have it. Spoke with sky@ to have a new version cherry-picked to today's Chrome release branch, and have a build kicked off for it.
- 4:23: daisy canary fails with crbug.com/289821.
2013-09-17
dianders, jrbarnette
- start of day: tree closed due to daisy_spring canary (login_CryptohomeUnmounted failed on daisy_spring-release). Filed http://crbug.com/293495 (the bug, which looks like infrastructure) and http://crbug.com/293491 (why did autofiler use wrong bug).
- start of day: butterfly canary. Autofiler chose http://crbug.com/255866 and that seems reasonable.
- start of day: crosbot wasn't updating IRC. ellyjones booted it to fix it.
- 9:30: Random PFQ failure filed as http://crbug.com/293518. "WARNING: Cannot rev sys-boot/chromeos-coreboot-fox"
- 9:30: Ben points out that mario canary has been dead for days. http://crbug.com/293515.
- 10:10: parrot canary: "double free or corruption" in VMTest login_CryptohomeMounted. Reusing old http://crbug.com/237646.
- 10:20: daisy_spring canary failed on "login_CryptohomeUnmounted_SERVER_JOB". Tracking with autofiled bug http://crbug.com/293524, although really all we can track is why proper debug info wasn't gathered.
- various: Chrome PFQ Failing to uprev Chrome commit - jrbarnette thinks that this will get better with a commit to stop including Chrome Driver; hopefully Chrome is handling?
- after 5:00: an x86-alex canary failure will happen soon. Looks like http://crbug.com/288795 is hitting again.
2013-09-16
dianders, jrbarnette
- start of day: tree is green and things look reasonable.
2013-09-12 and 2013-09-13
jwerner, hungte (TPE)
- mario-incremental: VMTest network failure ("Could not initiate first contact with remote host", "Connection timed out during banner exchange"), http://crbug.com/254166
- Another one on x86-generic-incremental, same underlying cause but this time with a huge spew of ssh debug output due to the connection problems
- daisy_spring-canary: weird autoupdate_EndToEndTest.npo_test_delta timeout problem, no idea, autofiled http://crbug.com/271115
- mario-incremental: crbug.com/290142
2013-09-09
2013-09-06
- Tree started green in the morning. cmp@ reverted a change which caused prebuild uploads to fail with a message about gerrit rejecting a push: https://code.google.com/p/chromium/issues/detail?id=286343
- At around 4:30 the tree suffered two flakes nearly simultaneously: crbug.com/281733 and crbug.com/273728
2013-09-04 and 2013-09-05
waihong (TPE), adlr, mtennant
2013-09-3
katierh
- butterfly-canary
- crbug.com/284384 - autoupdate_EndToEndTest.npo_test_delta failed (same as 283706, 269706, 281733)
- daisy-canary
- crbug.com/273728 - login_CryptohomeMounted,Could not get info about cryptohome vault through /home/user
- stout/spring
- stout
- spring
- lumpy
2013-08-29 and 2013-08-30
puneetster, vbendeb, cywang
- x86-zgb canary HwTest failures
- x86 generic ASAN VMTest failure
- chromium-sdk buildbot job timed out because building packages took longer than usual
- all canaries failed in
2013-08-27 and 2013-08-28
dgreid, tbroch, yoshiki (davidjames, cmp, vapier, others probably)
- Generally tree was closed much of the two days due to gerrit service migration.
- Some paladin bots failed compiling chromeos-coreboot-falco-0.0.1-r177: firmware/lib/region-fw.c.
2013-08-23 and 2013-08-26
thieule, shawnn, dgozman
- crbug.com/254166: SSH connectivity drops temporarily in VMTests (This is holding up the CQ, bumped to P0)
- crbug.com/278334: CQ VMTest fails with "Devserver did not start"
- crbug.com/278379: [bvt] login_CryptohomeIncognitoUnmounted failed on daisy-release/R31-4583.0.0
- crbug.com/251309: devserver hang (vmtest failure: WARNING: Killing tasks...) (sosa@ has fix pending)
2013-08-21 and 2013-08-22
dgarrett and rminnich and josephsih
- Canary flake (4-5 times?) from known: crbug.com/251309.
- VM flake from known: crbug.com/254166
- Canary failed because of HW Lab issues being worked. Reason obscured by crbug.com/276507
- x86-zgb canary failed: crbug.com/276507
- crbug.com/251309 has continued to be popular.
2013-08-19 and 2013-08-20
quiche and sbasi
2013-08-13 and 2013-08-14
charliemooney and dhendrix and hychao
- Failures with google storage are causing CQ and canary failures: crbug.com/273254
- The CQ had managed to complete one run as of 18:15 PDT, but there's no confirmed resolution.
- A bit of a commit queue dependency problem, that got sorted out neatly: crbug.com/272220
- There was a mysql server that apparently died and caused some failures: crbug.com/272412 (This was an issue both days)
- login_CryptohomeIncognitoUnmounted failed twice in a row on x86-generic incremental builder. No root cause yet...
- PFQ nightly builders were failing due to aforementioned dependency problem.
- daisy canary failed hwtest platform_CryptohomeTestAuth - 262546 autofiled again.
- crbug.com/271971: Error execute cmd 'tar xjf /usr/local/autotest/packages/dep-pyauto_dep.tar.bz2 ...'
- Filed crbug.com/272666
- Another occurrence of CryptoHomeTelemetry failing: crbug.com/26565
- One PFQ failed super early and never even tried to build: crbug.com/269171
2013-08-09 and 2013-08-12
dkrahn, chinyue
- crbug.com/271287: VMTest login_CryptohomeIncognitoUnmounted failed
- crbug.com/257880: platform_RebootAfterUpdate failures
- crbug.com/270854: flaky chromeos-ec unittest
- crbug.com/270942: daisy_spring canary failure: All hosts with HostSpec ['board:daisy_spring', 'pool:bvt'] are dead!
- crbug.com/270952: peppy canary failure: chromite unittest
- crbug.com/254166: multiple failures: VMTest timeout error
- Widespread canary failures over the weekend due to dependency problem: reverted https://gerrit.chromium.org/gerrit/65297
2013-08-07 and 2013-08-08
tyanh
- Chrome PFQ failing to uprev
- crbug.com/252451: reimage timed out causing all the tests to fail with "Failed to reimage machine with appropriate labels"
- crbug.com/269831: platform_OSLimits_SERVER_JOB
- crbug.com/264522: extension_QuickofficeOpenFile
- crbug.com/267565: login_CryptohomeIncognitoTelemetry
- Several occurrences of ssh flakiness in login_Cryptohome*. crbug.com/254166
- x86 generic ASAN: #13324
- amd64 generic ASAN: #6689, #6693
- x86 generic full: #10603, #10622
- x86 generic incremental: #13201
- amd64 generic full: #9006
- hwtest bvt failures on daisy_spring_canary
2013-08-05 and 2013-08-06
cychiang, dlaurie
- lots of login_Cryptohome{Mounted,Incognito} on daisy_spring canary started from 8/1 build #350 ~ 8/5 build #358, passed in #360, but failed in #361,362,363 again. crbug.com/268225
- platform_RebootAfterUpdate : crbug.com/268198, similar to crbug.com/263425
- parrot32 vmtest hangs and gets killed by builder, similar failure in #68,69,71,79,86 crbug.com/251309
- daisy canary login_CryptohomeMounted: crbug.com/267794 crbug.com/267974 crbug.com/268052 crbug.com/268223
- lumpy nightly chrome PFQ: vmtest SimpleTestUpdateAndVerify: Networking sometimes does not come up after reboot in VMTest crbug.com/212879, fail from #3004 to #3007
- parrot canary failure in hwtest suite prep: reimages stomping on each other crbug.com/265463
- stout32 canary fail in login_LoginSuccess in #497. crbug.com/268262
- butterfly, link, daisy BVT failures in platform_RebootAfterUpdate due to "shutdown took too long" crbug.com/259956
- x86-generic-asan intermittent build failure in chromeos-chrome postinstall step: crbug.com/268397
- SSH connection issues in VM tests: crbug.com/254166
- parrot32 vmtest failure stopping VM: crbug.com/251309
- lumpy #3012 and daisy #2937 , amd64-generic #3025 nightly chrome PFQ was broken by chromium CL r215785 and was reverted by chromium sheriff.
- x86-mario crbug.com/268809 flake as in crbug.com/267041
2013-08-01 and 2013-08-02
wfrichar, vpalatin, vapier
2013-07-31
reinauer, fjhenigman
- stuff had cycled green - opened the tree
- daisy_spring canary failed hwtest login_BadAuthentication - 266340 autofiled - looks unlike previously autofiled login_BadAuthentication autofiles
- daisy canary failed hwtest platform_CryptohomeTestAuth - 262546 autofiled
- parrot canary failed because the kvm instance wouldn't start up.
- parrot32 canary failed with crbug.com/251309
- butterfly canary failed with crbug.com/254255
- stout canary failed with crbug.com/265725
- stout32 canary is full of stars:
- crbug.com/254166 - SSH connectivity drops temporarily in VMTests
2013-07-30
reinauer
2013-07-24 and 2013-07-25
olofj, garnold
- crbug.com/265019 --- daisy canary build fails to emerge kernel
- crbug.com/253034 --- AUTest failure on multiple builds, failing to spawn a local devserver via ssh; hopefully a transient (albeit scary) lab hiccup
- crbug.com/264802 --- VMTest failure (login_CryptohomeMounted); appears to be a timeout sshing to the kvm
- crbug.com/254255
- crbug.com/257810
2013-07-24 and 2013-07-25
reinauer, bhthompson, bleung, sheckylin
2013-07-22 and 2013-07-23
petermayo, rspangler, zork
2013-07-16 and 2013-07-17
benchan, rcui, spang
Continuing issues from previous shift:
2013-07-12 and 2013-07-15
djkurtz, mtennant, jrbarnette
- The following unresolved bugs are causing persistent canary failures, and will need tracking by the next sheriffs:
- Stout canary failure, build image stage, cros_generate_test_payloads error: http://crbug.com/260432
- ALL canaries fail AUTest autoupdate_EndToEndTest.nmo_test_delta & npo_test_delta: http://crbug.com/260185
- Stout, link, butterfly canary security_HciconfigDefaultSettings BVT failure: http://crbug.com/253706
- Peach-pit & Butterfly canaries Archive step failed to upload_symbols: http://crbug.com/259652
- daisy canary failure; shutdown takes too long: http://crbug.com/259956
- Stumpy canary is red (and will remain red) waiting for the fix to http://crbug.com/259901. Many hwtests failing is the symptom.
- stout canary hwtest failures, experimental_video_VideoSanity: http://crbug.com/259976 and security_HciconfigDefaultSettings: http://crbug.com/253706
- amd64 generic full, vmtest failure, Chrome sig 6 during security_ProfilePermissions.VWSI test: http://crbug.com/260027
- x86 alex canary, vmtest failure, devinstall_test cannot connect to VM: http://crbug.com/260036
- link canary fails platform_RebootAfterUpdate: http://crbug.com/260177 although likely a dup of http://crbug.com/257880 which is hitting the release builders
2013-07-10 and 2013-07-11
sabercrombie, olofj
- Stout canary security_HciconfigDefaultSettings BVT failure: http://crbug.com/253706.
- Multiple failures of video_VideoSanity: http://crbug.com/248552.
- Multiple failures of hardware_VideoDecodeCapable. Apparently this isn't something we care about: http://crbug.com/253501. Update: Seems this is a longstanding issue with an unknown resolution date: http://crbug.com/223291.
- Daisy canary failures: http://crbug.com/257148 and http://crbug.com/253821.
- Filed http://crbug.com/259100 for peach_pit_canary modules_install failure.
- desktopui_ScreenLocker failure on butterfly. Appears to be another case of crbug.com/253920.
- autoupdate_EndToEndTest timeout failures: http://crbug.com/253821.
- Failure of Mario to come out of suspend: http://crbug.com/255447.
- Another instance of http://crbug.com/233864: daisy_spring power_Resume: Autotest client terminated unexpectedly: DUT rebooted during the test run.
2013-07-08 and 2013-07-09
reveman, wdg
- hang in chromite tests because gerrit was hanging - crbug.com/258091
- video_VideoSanity keeps failing - crbug.com/248552
- hardware_VideoDecodeCapable and other flaky hwtest failures reported by previous sheriffs are still present
- failed to connect to virtual machine failure on stout32 canary (cycled green): crbug.com/215784
2013-07-04 and 2013-07-05
cwolfe
- autest failures seem to be time-related -- is someone rebooting something? Glancing through the bots, all the autest failures seem to be on the 10:30pm runs.
- some dbus-related stack traces being reported from Chrome crashes in vmtests (mario-incremental and amd64-full). Waiting to see if it recurs...
- chronic hwtest flake on hardware_VideoDecodeCapable
- chronic hwtest flake on experimental_video_VideoSanity (ignore this one)
- chronic hwtest flake scattered across power_Resume, login_CryptohomeMounted, etc
- fox fails to compile adhd occasionally; filed crbug.com/257634 and uploaded a quick fix
- amd64-generic full failing VmTest on an assertion in CrosDBusServiceImpl::OnOwnership; probably crbug.com/234382
- wow, does the autofiled bugs "feature" get spammy... the tool you want on crbug.com is under Actions in the upper left of the bug list, Bulk Edit.
2013-07-02 Tue
dparker, bfreed, ellyjones
fjhenigman, dgreid, rminnich
- tree was closed by lumpy canary but cycled green - maybe infrastructure glitch
- stumpy canary red - 254678, dgreid
- alex canary looks like it will cycle green - and it did
- stout canary AUtest failure may be bug 237122 or 235608 though it was closed a couple days ago...
- stout canary HWtest looks same as lumpy above
- stout canary cycled green
- slippy canary restarted, cycled green
- x86 generic full was red, cycled green
- stumpy canary cycled green
- stout32 canary timeout error in VMTest, suspect 209719
2013-06-27 Fri
dgreid
Tree closed around 0800 PDT, issue 233864
2013-06-26 Thur
dianders, clchiou, olege
2013-06-26 Wed
dianders, clchiou, olege
2013-06-23 Mon & 2013-06-24 Tues
rharrison, wiley, katierh
Tuesday
- crbug.com/253706: security_HciconfigDefaultSettings failure
- crbug.com/254208: symbol upload failures on alex and zgb
- crbug.com/234382: Same Chrome crash as yesterday on amd64 generic full
- crbug.com/254096: Failed to get a good response line from lab servers during reimaging
- daisy_spring failed HWTest
- Tree nicely busted first thing in the morning, investigating ...
- Holding the tree closed until bots start cycling green
- crbug.com/212879: Networking sometimes does not come up after reboot in VMTest
- crbug.com/253824: [bvt] network_Ping failed on daisy-release/R29-4318.0.0
- crbug.com/253822: [bvt] security_ProfilePermissions.login failed on daisy-release/R29-4318.0.0
- crbug.com/253823: [bvt] security_ProfilePermissions.BWSI failed on daisy-release/R29-4318.0.0
- crbug.com/253821: [au] autoupdate_EndToEndTest.npo_test_delta failed on butterfly-release/R29-4318.0.0
- x86-mario canary: AUTest failure
- Looks like there was a network issue, since a lot of the bots failed due to RPCs failing, etc.
Monday
- crbug.com/253302: AUtest failure
- crbug.com/245026 strikes again.
- Looks like a DUT not coming out of reboot
- Is caused by shill not being able to get a DHCP lease on an ethernet port
- cause mostly unknown
- rharrison originally filed crbug.com/253527 about this
- wmatrix.googleplex.com was acting up, talked to kamrik about addressing this
- crbug.com/253571: Failed to connect to gerrit to download patches? (Required reverting update to gerrit)
- crbug.com/253554: falco VMTest failing - looks like bad reimage
- crbug.com/234382: Fatal chrome error "Failed to own: org.chromium.LibCrosService" during test automation
- stout and stumpy canary went down at the same time:
- crbug.com/253501: hardware_VideoDecodeCapable control.v4l2 running on stumpy_canary (non-closer)
- crbug.com/253506: hardware_VideoDecodeCapable control.v4l2 running on stout_canary (non-closer)
- crbug.com/237530: security_HciconfigDefaultSettings failed HWTest (stout_canary, butterfly)
- crbug.com/253521: experimental_logging_UdevCrash failing (non-closer)
- crbug.com/248552: Video Sanity test flaking often on Chrome OS HWTest (non-closer)
- crbug.com/253485: [bvt] power_Resume failed on stumpy-release/R29-4315.0.0 (stumpy_closer)
- crbug.com/253527: Seeing some failures on try_new_image, hosts not returning from reboot (non-closer)
- crbug.com/234382: amd64 generic full VMTest failing occasionally due to Chrome crash (dup of 234383)
- Came in to a broken tree, most redness looks like flaky network
2013-06-20 Thurs & 2013-06-21 Fri
davidjames, zork
2013-06-18 Tues & 2013-06-19 Wed
vbendeb, shawnn, serya
2013-06-14 Fri
pstew, ihf, hungte
ongoing issues: daisy boot issues
- crbug.com/250816: Mali changes apparently have left the system in an un-bootable state which was first detected by canary HWTest
2013-06-14 Fri
quiche, ihf, hungte
ongoing issues: chrome automation timeouts, devserver problems, power_Resume flake
2013-06-13 Thurs
thieule, tbroch, sjg
2013-06-12 Wed
tbroch, sjg
- Build failure 244055 - fix is in, has not happened since
- (hwtest) power_Resume EarlyWakeupError on daisy_spring canary: 19227, 247458.
2013-06-11 Tue
- glmark2 failing on egl dependency; cwolfe landed 58195 to fix in glmark-0.0.1-r2329
- R28 release builders failing in ManifestVersionedSync 248559
2013-06-06 Thu & 2013-06-07 Fri
rspangler, ferringb
- (vmtest) Slippy-canary background task hung (logs).
- (hwtest) power_Resume EarlyWakeupError on daisy_spring canary: 247458.
- (hwtest) power_Resume SuspendFailure on link canary: 247460.
- (vmtest) login_LoginSuccess CommandAutomationTimeout on alex canary: 240031.
- (vmtest) network_3GSmokeTest timed out waiting for shill device disable: 247540.
- peach_pit paladin couldn't run build_packages starting at build 882. vbendeb investigating.
2013-06-04 Tue & 2013-06-05 Wed
wdg, dhendrix
- (hwtest) login_BadAuthentication: 246754
- U-boot was rebased and entailed a manifest change. There was a fair bit of fallout that caused chromeos-bootimage to fail to build and was fixed over the course of a few hours. In a nutshell, the issues were:
- peach_pit device tree files that were being installed on daisy platforms (snow, spring) and causing problems
- Missing device tree files for x86 platforms.
- A bug in the firmware bundling logic that was causing an invalid dependency on CrOS EC for platforms which do not use CrOS EC (parrot, butterfly, stout).
- Network flakiness causing many failures when cloning.
2013-05-31 Fri & 2013-06-03 Mon
- Ongoing - (vmtest) Cryptohome failures: 241789. Tree's been red all week. One fix went in over the weekend but it may still be flaky. Got davidjames to increase retries for login_CryptohomeIncognitoMounted to see if that helps some changes get through the CQ.
- Ongoing -- (hwtest) Frequent failures in security_ProfilePermissions, platform_Pkcs11ChangeAuthData, video_VideoSanity (experimental tests)
- Link to tree closer issues is incorrect after the merge with the Chromium issue tracker. Should be this; updated wiki.
- Monday AM -- poked troopers about 500 errors in HwTest and AuTest stages on canaries. [cwolfe drive-by]
2013-05-27 Mon (Holiday) & 2013-05-28 Wed
- Ongoing -- (hwtest) Frequent failures in security_ProfilePermissions, platform_Pkcs11ChangeAuthData, video_VideoSanity
- Ongoing -- (hwtest/autotest) Flaky test automation causing frequent failures
2013-05-21 Tue & 2013-05-22 Wed
semenzato, dgarrett, cywang
- Ongoing -- (hwtest) Flaky power_Resume test on canary builders: 242788, 220014
- Ongoing -- (buildbot) autotest-telemetry build failed on PFQ, ASAN builders: 242770
- Ongoing -- (autest) Flaky autoupdate_EndToEndTest: 235608
- Ongoing -- (vmtest) Hung then killed on Falco, Peppy canaries: 242470
2013-05-13 Mon & 2013-05-14 Tues
charliemooney, sheu
- Ongoing -- Lots of problems with the AU rebooting canary builders: 235608
- Fixed -- The PFQs are mad about their dependencies when building expected_deps: 240601
- Fixed -- Some PFQs were crashing due to a typo: 239754
- daisy_spring canary closed tree with media-libs/secomx build failure: crbug.com/239474. Possibly due to new clang syntax checking for cros_workon-able packages.
grundler,olofj,milleral
- stout canary closed tree with AUtest failure: crbug.com/234725
- mario incremental failed: reopened tree since it feels like flake
2013-05-07 Tue
grundler,olofj,milleral
- dennisjeffrey's CL killed the Commit Queue. Because it moved an autotest from one package to another, it affected successive tests as well. The fix was to add a "!" (remove) dependency so that the package the files originally came from is removed or updated before the new package is installed. Kudos to davidjames for clobbering everything and explaining how to fix it.
- dgreid's CL 49812 and CL 49921 enabled functionality that is broken in the two-day-old Chrome version that ChromeOS is currently using. ToT Chrome is fixed, but ChromeOS didn't pick up ToT last night due to other Chrome nightly build failures. dgreid will resubmit once ChromeOS has a newer Chrome.
- CL adding apiclient to test image broke on canaries with a dev_install failure on VMTest. See crbug.com/238653, and CLs 49815 and 50308.
2013-05-06 Mon
josephsih, piman
- mario incremental: BuildPackages failed due to a platform2 ebuild (https://gerrit.chromium.org/gerrit/#/c/37366/). Reverted the patch, and the builder cycled green.
- link canary: autest [au] failed report. crbug.com/237122
- network_LTEActivate flakiness: crbug.com/238404
2013-05-02 Thu
josephsih
- link canary, parrot canary failed at vmtest: "Unhandled JSONInterfaceError : Unable to get browser_pid over automation channel on first attempt."
- Root cause: "crossystem hwid" failed. cat: /sys/devices/platform/chromeos_acpi/HWID: No such file or directory.
- Filed a bug crbug.com/237719 which was merged to crbug.com/223728
- x86-alex canary: vmtest failed "Unhandled AutomationCommandTimeout: Chrome automation timed out after 45 seconds for {"skip_image_selection": true, "command": "SkipToLogin"}"
2013-05-02 Thu
posciak, garnold, seanpaul
- x86-alex canary failed hwtest step with "Unhandled PackageInstallError: Installation of pyauto_dep(type:dep) failed"
- Couldn't root cause it, so filed a bug at crbug.com/237508 and reopened
- security_HciconfigDefaultSettings autotest failures due to https://code.google.com/p/chrome-os-partner/issues/detail?id=15059
- Session manager did not restart after logout error on CryptohomeIncognitoUnmounted, filed crbug.com/237601
- Filed crbug.com/237690 for address sanitizer segfault on amd64-generic during vmtest
2013-05-01 Wed
posciak, garnold, seanpaul
- x86-mario canary failed au step with “FAIL: Unhandled timeout: timed out”
- stumpy, stout & daisy also failed on autest step
- suspect there was an AU outage/problem last night which caused this
- Filed http://crbug.com/237122 to track
- 03:04 lumpy nightly chrome pfq failed in VMTest
- this crash (https://storage.cloud.google.com/chromeos-image-archive/lumpy-chrome-pfq/R28-4071.0.0-rc2/chrome.20130501.035940.437.dmp.txt) is being tracked in http://crbug.com/233241
- 05:34 stout32 hwtest failed with “ERROR: All hosts with HostSpec ['board:stout32', 'pool:bvt'] are dead!”
- All stout32 hosts in cautotest are marked “Repair Failed”
- Filed http://crbug.com/237127
- 05:34 parrot canary failed in unittest
- seanpaul not sure what the problem is, so filed http://crbug.com/237143
- I think it's caused by https://gerrit.chromium.org/gerrit/49643, reverted with https://gerrit.chromium.org/gerrit/#/c/49721/ and reopened
2013-04-29 Mon
waihong,
- Parrot canary failure was reported 2 days ago. The 2 most recent parrot builds went green and other builds also look good. Reopened the tree.
- Daisy canary failed, autoupdate_EndToEndTest could not verify that update was successful, crbug.com/23626
2013-04-26 Fri
rcui, taysom, spang
- Link failed again in power_Resume
- Stout BVT: power_Resume: Sanity check failed: did not try to suspend - crbug.com/235847
- Lumpy canary failed on repeat of crbug.com/231095
- Lumpy paladin failure in desktopui_ScreenLocker test crbug.com/235949
- Parrot flaky test login_CryptohomeUnmounted crbug.com/223728
- Lumpy chrome crash crbug.com/231095 - this may be a new problem but we have only seen it on lumpy
- Asan builder failing BuildPackages on the chromium.memory waterfall - crbug.com/235988
2013-04-25 Thur
rcui, taysom, spang
2013-04-18 Thur
mtennant, jrbarnette, dshi (hwlab), mukai (Chrome on ChromeOS)
- devinstall_test failure on all canaries - crbug.com/233217
- chrome crashes in CrosLanguageOptionsHandler::GetLanguageListInternal on a few builders - crbug.com/233241
2013-04-17 Wed
mtennant, jrbarnette, dshi (hwlab), mukai (Chrome on ChromeOS)
2013-04-15 Mon
vapier, jwerner
- crosbug.com/p/17615 (power_Resume failure "Could not find start_resume_time entry" due to SSD hardware flake)
- crbug.com/22168 (unexpected reboot during login_LoginSuccess... can probably happen during all UITests)
- VMTest testUpdateKeepStateful error (cannot connect to KVM instance)... suspected flake
- crbug.com/232085: python 2.7 upgrade breaking hwtests
- coreboot repo shuffling; any coreboot related errors -> reinauer
2013-04-12 Fri
fjhenigman, yusukes, dbasehore, rbyers (Chrome on ChromeOS)
- crbug.com/230529
- Couple cases of lab flake
2013-04-11 Thu
fjhenigman, yusukes, dbasehore sjg now, rbyers (Chrome on ChromeOS)
2013-04-10 Wed
katierh, clchiou, haruki, gedis(shadow)
- Unhandled AutomationCommandTimeout for {"skip_image_selection": true, "command": "SkipToLogin"} - already noted at crbug.com/223728
2013-04-09 Tue
katierh, clchiou, gedis(shadow)
- Daisy power_resume failure - already noted at crbug.com/189108
- ConnectionHealthChecker failures across the board - reverted https://gerrit.chromium.org/gerrit/#/c/47248 - crbug.com/229752
- butterfly autoupdate_EndToEndTest.npo_test_delta flake - bug filed crbug.com/229749
2013-04-08 Mon
petkov, quiche, pstew
2013-04-05 Fri
petkov, quiche, pstew
2013-04-03 Wed
gabeblack, dgreid, sheckylin
- 01:19 autoupdate_EndToEndTest.parrot_nmo_test_delta flakiness.
- 8:30 everything broken, EndToEndTest, Autoupdate, desktop_VideoSanity, all failing on different boards.
- 10:45 try to re-open after disabling VideoSanity, AUTest and power_Resume flakes.
2013-04-02 Tue
rminnich, sonnyrao: west coast
2013-04-01 Mon
rminnich, sonnyrao: west coast
- 10am Link Canary Failed due to Archive step Time Out
- 10am Daisy Canary has been red all weekend -- found out about crbug.com/224871
- 1pm stout canary failed with Archive time out - opened crbug.com/225505
- 2pm x86-zgb canary failed with Archive time out - crbug.com/225505
- 8pm build packages started failing due to a gtest uprev and an associated python bug - https://gerrit.chromium.org/gerrit/#/c/46420/
- Chrome ebuild also failed to uprev due to above issue
2013-03-29 Fri
bfreed, vbendeb: west coast
2013-03-28 Thu
bfreed, vbendeb: west coast
- 17:55pm - again connectivity issue, on x86 generic ASAN
- 17:37pm tree reopened
- 17:26 pm - another connectivity failure, davidjames took "amd64 generic ASAN" builder down as it seems more prone to experiencing this problem
- 16:29 pm Tree reopened, crbug.com/224811 filed
- 16:16pm "Could not resolve host: commondatastorage.googleapis.com"
- 16:15pm - tree reopened
- 15:39pm - "Unable to look up nv-tegra.nvidia.com (port 9418) (Name or service not known)" crbug.com/224819 filed to deal with external dependency
- 2:45pm: "no space left on device" on incremental builder, fixed by davidjames.
- 2pm: Same pool:bvt issue as below, this time with x86-zgb.
- 1pm: As with now-closed crbug.com/220032, "All hosts with HostSpec ['board:parrot', 'pool:bvt'] are dead". Suspect lab issue.
- Can view the list by going to http://cautotest/afe/#tab_id=hosts, then selecting Platform "parrot", then selecting Label "pool:bvt".
- 3am: vmtest failure closed tree on "amd64 generic ASAN". Subsequent builds worked, so maybe denniskempin fixed it.
2013-03-27 Wed
wdg,dparker: west coast
- 4pm: crbug.com/223728 Closed tree on "butterfly canary" Command "crossystem hwid" failed
- 2pm: crbug.com/223728 Closed tree on "mario incremental" Command "crossystem hwid" failed
- 1pm: Shill build failure closed tree on "x86 generic ASAN" and "amd64 generic ASAN". Reverted shill change https://gerrit.chromium.org/gerrit/#/c/46667/
2013-03-26 Tue
wdg,dparker: west coast
- 3pm: crbug.com/224403 Closed tree on "x86-zgb canary" autotest_rpc_client.py -- writing off as test flake but starting to think we blame brand new chrome version...
- 3pm: crbug.com/224077 Closed tree on "daisy canary" Device rebooted during power_Resume.
- 3pm: crbug.com/161406 Closed tree on "x86-mario canary" Unhandled AutomationCommandTimeout
- 2pm: crbug.com/223956 Closed tree on "x86 generic full". login_CryptohomeUnmounted failed but may be an underlying test framework issue.
- 8am: crbug.com/223956 (not a tree-closer, but...) Build 1187, Parrot Canary: Failed cbuildbot failed vmtest failed report
2013-03-25 Mon
adlr,dhendrix: west coast
- crbug.com/223661 (python free()'ing invalid pointers) strikes multiple times.
2013-03-22 Fri
adlr,dhendrix: west coast
- 1pm: crbug.com/217288 timeout during archive
2013-03-20 Thu
quiche,wiley: west coast
- 8am: XXX chromium.chromiumos VMTest failure
- 2am: crbug.com/222603 update engine failure on parrot-canary
- 1am: crbug.com/222021 desktopui_VideoDecodeAcceleration failure on x86-zgb
- 12am: crbug.com/222021 desktopui_VideoDecodeAcceleration failure on x86-mario
2013-03-20 Wed
quiche,wiley: west coast
- 11pm: crbug.com/222021 desktopui_VideoDecodeAcceleration failure on x86-alex
- 7pm: crbug.com/222021 desktopui_VideoDecodeAcceleration failure on x86-alex, x86-mario, x86-zgb
- 7pm: crbug.com/222660 AUTest failure on x86-mario
- 5pm: crbug.com/222021 desktopui_VideoDecodeAcceleration failures on x86-alex, x86-mario, x86-zgb
- 5pm: buildbot failures on amd64-generic-incremental, due to disk filling up
- 1pm: crbug.com/222021 desktopui_VideoDecodeAcceleration failures on x86-alex, x86-mario, x86-zgb
- 8am: crbug.com/222041 build_RootFilesystemSize failure on link
- 8am: crbug.com/222021 desktopui_VideoDecodeAcceleration failures on x86-mario, x86-alex
- 8am: daisy incremental failure: kernel gerrit mirror out-of-sync
- 4am: chrome PFQ failure on amd64-generic: kernel gerrit mirror out-of-sync
- 1am: crbug.com/222041 build_RootFilesystemSize failures on link, stout
- 1am: crbug.com/222021 desktopui_VideoDecodeAcceleration failures on x86-alex, x86-mario, daisy, x86-zgb, stumpy
2013-03-19 Tue
tbroch,thieule: west coast
- 1pm: crbug.com/221258 kernel warning in power_Resume on daisy
- 8am: crbug.com/222041 build_RootFilesystemSize fails as rootfs <100MB across most x86 systems
- 8am: crbug.com/187993 experimental_desktopui_VideoSanity
- 8am: network problem leading to vmtest fail
2013-03-18 Mon
tbroch,thieule: west coast
- 3pm: Transient network problem while emerging chrome
- 9am: crbug.com/217288 UploadArtifact task timeout (1800secs)
- 8am: crbug.com/215358 intermittent (hopefully) 'Exception: Missing uploads.'
- 8am: crbug.com/187993 experimental_desktopui_VideoSanity
2013-03-14 Thursday
dlaurie, sbasi
- 8am: ARM build broken overnight due to build flags change, reverted here: https://gerrit.chromium.org/gerrit/#/c/45430/
- 8am: GDB issues causing problems for Chrome PFQ, this "fixed itself" on retry
- 1pm: Commit queue stuck, mario-paladin waiting for alex-paladin
2013-03-07 2013-03-08 Tues-Wed
ferringb, charliemooney
2013-03-08, Fri
sjg, sabercrombie
2013-03-08, Fri
rspangler, sabercrombie
- Flake on amd64 generic ASAN uploading results to google storage
- Canaries failing due to https://gerrit.chromium.org/gerrit/#/c/44890/; can't download libva-1.1.0.tar.bz2. Uploaded what we hope is the right file. It turns out that was not the right thing to do. The problem stemmed from two versions of libva carrying the 1.1.0 designation, which led to an old cached version messing up the download process on the canary buildbots. Mike Frysinger removed these old files.
- Stout canary failed with "NoHostsException: All hosts with HostSpec ['board:stout', 'pool:bvt'] are dead!" - http://crosbug.com/39746. johndhong and jrbarnette investigated; lots of systems are in Repair Failed state, probably due to a DHCP problem this morning. They kicked off a verify on all hosts, and the hosts started coming back on their own.
- Paladins failed with "ERROR: Project name mismatch for /mnt/host/source/src/platform/depthcharge (found chromiumos/platform/depthcharge, expected chromeos/platform/depthcharge)". Probably caused by rev 1 of https://gerrit-int.chromium.org/#/c/33529/.
- mario paladin was stuck waiting for stout paladin, but stout was idle. Aborted mario paladin build; all paladins seem to be building normally now.
2013-03-04 - 2013-03-05, Mon-Tue
dianders, dkrahn
- dianders: ASAN failures (use after free). Appears to be intermittent, but a real bug. http://crbug.com/179796
- dianders: ASAN failure "No such file or directory: '/home/.shadow'". Digging into logs showed cryptohome not starting. Digging more showed "cryptohome: symbol lookup error: /usr/lib64/libchaps.so: undefined symbol: __asan_handle_no_return". Liam identified as https://gerrit.chromium.org/gerrit/#/c/44508/. Reverted and chumped. Re-opened http://crosbug.com/32017 to track. Re-opened tree.
- Parrot canary failed with http://crosbug.com/32539.
- dianders: Some strange transitory failures across many builders with "update_scripts Sync buildbot slave files failed ( 9 secs )". Didn't seem serious and went away on its own, but David James tracked it down as http://crbug.com/180099.
- dianders: Failure with SDK builder on vboot_reference (it couldn't find <tss/tcs.h>). Filed http://crosbug.com/39531. Chumped in a CL that ought to fix this.
- dianders: Tree was closed overnight with x86 generic full failure. A timeout building chromite? Didn't reproduce...
- dianders: Hit the x86 generic full failure again. Filed http://crosbug.com/39565.
- More 'update_scripts' failures: tracking in crbug.com/180099.
- dianders: Got a BVT failure in experimental_desktopui_VideoSanity on x86-alex canary. Filed http://crosbug.com/39586.
2013-02-28 - 2013-03-01, Thur-Fri
sheu, dgarrett
- All (almost) quiet on the western front
- git infrastructure issue takes down a bunch of builders: crbug.com/179141
- rename of gerrit-int repos without updating manifest takes down more builders: crosbug.com/39448
2013-02-26 - 2013-02-27, Tue-Wed
grundler, benchan
- daisy powerResume failing on chromeos1-host5-rack4 crosbug/39260
- documented daisy repro case. (crosbug.com/39153)
- link canary failure (crosbug/p/17893) (found dups of this bug too)
- stout canary failure (crosbug.com/39272)
- alex/stumpy failed power_Resume due to new warning in Kernel - was reverted (crosbug.com/p/17609)
- parrot-canary failed due to "Session manager did not restart" (after following a chain of "merged into" --> http://crbug.com/167671)
2013-02-22 - 2013-02-25, Fri-Mon
?
2013-02-20 - 2013-02-21, Wed-Thu
taysom, garnold, zork
2013-02-19 - 2013-02-20 Mon, Tues
ellyjones, reinauer, sque
Feb 14, 15 Thu, Fri
snanda, posciak
Feb 12, 13 Tue, Wed
dparker, semenzato
Feb 8, 11 Fri, Mon
jaysri, milleral, olege
- Filed crosbug.com/p/17781 for power_Resume gen6_gt_check_fifodbg issue
- Someone put a test that belongs in autotest-chrome into autotest-tests again, so BuildTarget is having to repeat emerging of autotest-tests again.
Feb 4, 5 Mon, Tue
rharrison
- Failure on security_RestartJob for x86-mario canary, looks like flake, filed crosbug.com/38628. Reopened tree
- Failure on power_Resume for link canary. looks like crosbug.com/37596. Reopened the tree
- Came into a red tree on Monday (failure on stumpy), all the builders were green. Assuming it was flake, possibly from the fun with the HW lab over the weekend
Jan 31, Feb 1 Thu, Fri
vpalatin, bleung, hungte, benrg
Jan 29, 30 Tue, Wed
bfreed, bhthompson, katierh, petermayo
- tree still red Tuesday morning due to crosbug.com/38334 - revert of nss/nspr upgrade is resulting in segfault in local shlibsign. These are security packages that might also cause the sandbox failure of crosbug.com/38309.
- tree throttled Wednesday morning due to crosbug.com/33611 - timing out on VMTest update steps. Failed to get through normal channels to find a flaw, rebooting the mario paladin build slave was sufficient.
Jan 25, 28 Fri, Mon
mtennant, gabeblack
- crosbug.com/38334 - revert of nss/nspr upgrade is resulting in segfault in local shlibsign. Found after hours by vapier. P0 TreeCloser unresolved.
- crosbug.com/38309 - Chrome crash on startup in renderer thread. Causing major problems and possible overnight red tree. P0 TreeCloser unresolved.
- crosbug.com/38324 - vmtest testInterruptedUpdate failure in canary builds.
- crosbug.com/38303 - git clone command in Chrome/Chromium PFQ builders suddenly asking for password. Resolved.
- crosbug.com/38279 - shill unittests segfault, intermittent, fix at: https://gerrit.chromium.org/gerrit/#/c/42113/. Tree throttled as fix worked its way through commit queue then all canaries. Bug got through commit queue originally because it is intermittent. Resolved.
- crosbug.com/38238 - stout canary - vmtest - testInterruptedUpdate - cannot allocate memory
Jan 23,24 Wed,Thur
rcui, sjg, dbasehore
Jan 22, Tu
jamescook (chrome-on-cros)
- crosbug.com/38117 - PyAutoFunctionalTests.FULL flakily reporting sig 6 from an intentional Chrome crash
Jan 17 Thurs, Fri
mkrebs, rminnich, sheckylin
- crosbug.com/37343: Xorg signal 6
- crosbug.com/33611: on stout, seems not to be fixed, missing pxe rom for virtio.
- crosbug.com/33611 again: "amd64 generic full" closed the tree this time.
- If you're going to be helpful and post error messages, the best way to be sure you don't say anything you should not is to mention the error, the software, but not the file name.
- We ought to just fix this vm error due to a missing pxe_virtio.bin. I will see what I can do.
- crosbug.com/37682: Repeat failure in HWTest: "x86-mario-release/R26-3571.0.0/bvt/platform_CryptohomeMount ABORT:".
- crosbug.com/38054 (created): login_CryptohomeMounted failed with "Login timed out". Couldn't find a similar bug that was open (crosbug.com/33613 seemed to be the closest closed issue).
- [mkrebs] Saw a bunch of "Chrome PFQ Failing to uprev Chrome" emails on the 17th. Was allegedly a failure to build a certain package, but the fix was taking a while to land. They seem to be fine now, but my best guess is that ellyjones@ actually got them working early on the 18th (on IRC he mentioned something about restarting the mario paladin around that time).
Jan 15 Tue, Wed
chinyue, sonnyrao, yusukes
- crosbug.com/37889: x86-alex-release/R26-3560.0.0/bvt/experimental_kernel_fs_Inplace_SERVER_JOB FAIL: HTTP Error 500: Internal Server Error
- crosbug.com/37899: "desktopui_ScreenLocker failure in bvt on parrot-r26" hits on stumpy canary, stout canary, x86-alex, parrot bvt, possibly x86-generic as well
- chromium for chromium-os builder started failing VMTests due to automation timeouts around 1pm on Wednesday, might affect Chrome on ChromeOS starting Thursday
Jan 11, 14 Fri, Mon
dgreid, pstew
- crosbug.com/37716: HWTest [bvt] failed at login_CryptohomeMounted: Cryptohome created a vault but did not mount (and Host did not return from reboot) - parrot canary
- crosbug.com/p/11474: power_Resume test failing with "gen6_gt_check_fifodbg.isra.6+0x36/0x48()"
- crosbug.com/37861: power_Resume test failing with "EarlyWakeupError(1)"
Jan 9, 10 Wed, Thu
djkurtz, jrbarnette, olofj
Jan 7, 8, Mon, Tue
clchiou, jwerner, josephsih
All canaries have been failing randomly in login_Cryptohome* tests due to crbug.com/168540. Chrome team has pushed a fix that should get synched during the night between Jan 8th/9th. If the same issue still shows up after that, please let them know!
- crbug.com/168540: parrot-canary: login_CryptohomeMounted
- crosbug.com/37684: Updater failed and many *_SERVER_JOB failed on daisy canary
- crosbug.com/37682: HWTest [bvt] failed on platform_CryptohomeMount on x86-mario canary
- crosbug.com/32181: try_new_image: Host did not return from reboot. Connection timed out.
- crosbug.com/37676: stumpy-canary and lumpy-canary died from an experimental test because the crash server timed out on symbolizing the crash dumps
- crbug.com/168540: parrot-canary and kiev-canary: login_CryptohomeUnmounted. This can probably happen on all the login_Cryptohome* tests.
- crosbug.com/p/17115: power_Resume fails on stout... flaky NIC sometimes fails to resume
- crosbug.com/37596: power_Resume abort bvt
- crbug.com/168540: x86-alex canary: login_CryptohomeMounted : Session manager did not restart after logout
2013
Jan 3, Jan 4, Thu, Fri
dbasehore, wfrichar, hychao
dkrahn, sque, miletus
- crosbug.com/37337: vmtest login_CryptohomeMounted: browser hang during shutdown (multiple occurrences)
- crosbug.com/37461: vmtest Unable to connect to X server causing 2400 second timeout (multiple occurrences)
- crosbug.com/37504: desktopui_VideoSanity fails to load video (not a tree closer)
- crosbug.com/36949: stout BVT. platform_Pkcs11Events (not a tree closer, multiple occurrences)
- crosbug.com/32539: python2 sig 6 during login_BadAuthentication test
- crosbug.com/33613: login_CryptohomeIncognitoUnmounted of VMTest has failed in login timed out for >5 times
- crosbug.com/37522: Login_BadAuthentication failed during HWTest (BVT) on Alex
Dec 20, Dec 21, Thu, Fri
rspangler, dhendrix, dgozman
- crosbug.com/37337: vmtest login_CryptohomeMounted due to chrome crash (multiple failures)
- crosbug.com/37372: vmtest login_CryptohomeUnmounted due to chrome or X crash
- crosbug.com/35458: vmtest login_CryptohomeUnmounted times out waiting for UI to restart at the end of the test
- crosbug.com/33611: vmtest unable to connect to remote host (ssh: connect to host 127.0.0.1 port 9222: Connection refused)
- crosbug.com/32382: vmtest desktopui_ScreenLocker failing
- crosbug.com/p/11474: stumpy-canary is failing power_Resume test with warning in i915_drv.c.
- crosbug.com/36986: daisy incremental build failure, believe git mirror was out-of-sync ("git-2_branch: changing the branch failed")
- kiev, daisy, stout paladins failed a build, and mario paladin was stuck waiting for them. Killed mario and forced a rebuild. (In retrospect, just killing mario paladin was probably sufficient)
- crosbug.com/37461: vmtest Unable to connect to X server causing 2400 second timeout
- crbug.com/167342: trying to get some Chrome devs to look into Chrome shutdown crash (which in turn caused session manager timeouts and VMTest failures)
- crosbug.com/37368: vmtest login_CryptohomeMounted timeout waiting for login prompt
Dec 18, Dec 19, Tue, Wed
kochi (non-PST), dlaurie, puneetster
- started open with status "hwtest failure = dependencies_info not being generated properly -> crosbug.com/37326".
- crbug.com/140385: login_CryptohomeMounted timed out; happened 3 times on x86 generic incremental.
- crosbug.com/37332: desktopui_ScreenLocker fail with timeout on mario incremental. happened only once.
- crosbug.com/37333: empty dependency_info causing hw_tests failure: LOTS
- autotest-tests failing the first build and succeeding on retry, suspect desktopui_VideoSanity, email sent to developer
- butterfly-canary failed with "Could not parse devserver log" possibly crosbug.com/34768, was successful on next build
- amd64-generic-full failed vmtest login_CryptohomeMounted due to chrome crash, filed crosbug.com/37337
- tlsdate issue determining its release number and causing failures in uprev step, fixed with https://gerrit.chromium.org/gerrit/39915
- login_CryptohomeUnmounted causing Chrome/X to crash, filed crosbug.com/37372
- 12/19 11AM: Still seeing lots of vmtest failures due to issue 37337
- x86-zgb canary failed BuildTarget step for zgb_he phase because build_packages was killed, filed crosbug.com/37388
Dec 14, Dec 17, Fri, Mon
sabercrombie, thieule, zoro, rongchang
Dec 10, Dec 11, Mon, Tues
charliemooney, tbroch, milleral(10th), yjlou(11th)
- llvm.org went down, taking out chromiumos sdk during buildtarget
- crosbug.com/37120: A BuildTarget reported back with a warning from a python crash while building chrome.
- crosbug.com/37129: buildbot threw an exception during VMTest due to a failed assertion.
- crosbug.com/37086: Daisy TPM related activities need >= 2min to complete not current 45sec. Fix in and propagating.
Dec 4, Dec 5, Tue, Wed
quiche, anush, spang
- crosbug.com/35908: hit this on an x86-generic-full build and a daisy-canary build
- crosbug.com/36986: daisy incremental build failure, believe git mirror was out-of-sync
- crosbug.com/36969: link canary BVT failure, tree cycled green
- x86-mario canary failure, google storage flake
- crosbug.com/36949 - stout BVT. platform_Pkcs11Events (not a tree closer, multiple occurrences)
- crosbug.com/36661 - stout BVT. platform_Pkcs11ChangeAuthData (not a tree closer)
- crbug.com/157246 - caused a snow BVT failure (not a tree closer)
- false alarm email for buildbot failure stout-canary. sbasi checked the BVT results, and says the tests passed.
suspects network flake causing buildbot to believe the BVT failed.
Nov 30, Dec 3, Fri, Mon
dparker, piman, fjhenigman (Mon. only)
- google storage flake during archive step on x86-alex
- crosbug.com/36886 - kiev BVT. Power_resume fail on reading RTC fail after 10 retries.
- crosbug.com/36554 - daisy BVT. platform_CryptohomeChangePassword fails to migrate password
- crosbug.com/35458 - mario-r23 BVT. login_CryptohomeUnmounted times out waiting for the UI to restart at the end of the test.
- stout-canary. HWtest failure due to infrastructure problems in the hwtest lab.
- x86-mario canary. ABORT on security_ptraceRestrictions. Believed to be a test flake or lingering fallout from test lab going down (?)
- crosbug.com/p/11474. stumpy-canary x 2. Power_resume error with warning in i915_drv.c.
- crosbug.com/36004. Power_resume failure reading RTC on kiev & lumpy canaries.
Analysis of the BuildTarget warnings
WARNING: The following packages failed the first time,
but succeeded upon retry. This might indicate incorrect
chromeos-base/autotest-tests-0.0.1-r3342
autotest-tests-0.0.1-r3342: ERROR:root:Dependency pyauto_dep does not exist
so the problem could be a change introduced between those two times. milleral on irc suggested a
"test was likely added to autotest-tests.___.ebuild that needs to be in autotest-chrome.____.ebuild" but I don't see a change there at the right time.
Nov 20-21, Tue, Wed
reinauer, sleffler, fjhenigman
- crbug/36566 CQ build failures in update_engine with "unrecognized command line option "-Wno-c++11-extensions""; fixed by kliegs
- crbug/29895 filed by Prashanth for x86-alex-r23 bvt failure in power_Resume
- crbug/35908 desktopui_UrlFetch.not-live FAIL hit three times overnight in the chrome pfq
- All quiet on Tue the 20th
Nov 16, Nov 19, Fri, Mon
waihong (tpe), keybuk, garnold
Nov 14 - Nov 15, Wed, Thu
jamescook (cros gardener)
- crbug.com/161329 BVT chrome sig 11 on shutdown, crash in ash GetDisplayManager() due to metrics logging, official builds only
- lumpy (perf) failing HWTest, "All hosts are dead" in [try_new_image] results status.log, infrastructure problem, fixed
- crbug.com/161073 ChromeOS Crash in WindowOpenPanelTest.ClosePanelsOnExtensionCrash
- crosbug.com/36370 Snow: BVT login_LoginSuccess failure due to cryptohome / TPM issue (only affects chromeos1-rack5-host3, maybe preMP hardware issue?)
Nov 9, Fri
puneetster, sheu, kinaba
Nov 6, Tue
gpike, grundler, reveman
- crosbug.com/35907 parrot canary, Crash in HWTest - enterprise_DevicePolicy
- crosbug.com/36058 Chrome PFQ, Chrome/Init getting a lot of SIGBUS errors preventing Chrome from revving during VMTests
- crosbug.com/36097 parrot canary, desktopui_NaClSanity: Failed to installed SecureShell extension
- crosbug.com/35648 x86-alex canary, experimental desktopui_DocViewing failure closed tree
Nov 5, Mon
josephsih
- crosbug.com/32028 x86-alex canary, Archive bug, command timed out: 9000 seconds without output (davidjames fixed it.)
- crosbug.com/36032 chromiumos sdk failed SDKTest. make: *** [build/shims/shill-pppd-plugin.so] Error 1
- crosbug.com/35908 amd64 generic full: Timeout in UrlFetch.not-live
Oct 31 - Nov 1, Wed, Thu
taysom, petermayo, wdg
- crosbug.com/35865 Kiev paladin hwclock bug, same on link, timeout in URLFetch
- crosbug.com/35908 Daisy flake; said there were no changes but widevine was failing to link properly
- crosbug.com/35648 daisy, parrot problems
- reverted change I02955c8e
- Google died but it got better.
- crosbug.com/35958 daisy incremental ran out of space, clobbered chroot
- CQ got stuck
Oct 29 - Oct 30, Mon, Tue
wfrichar, pstew, cwolfe
Oct 23 - Oct 24, Tue, Wed
katierh, olege, mkrebs
Oct 19 - Oct 22, Fri, Mon
jrbarnette, mtennant, hungte
Oct 17 - Oct 18, Wed, Thu
dgreid, dbasehore
- Day starts with tree closed due to Link now being over-size. crosbug.com/p/35412
- crosbug.com/34788 lumpy canary: HWTest failed likely due to lab networking issue
- crosbug.com/35469 link canary: warning on build due to missing coreboot dependency
- Single VMTest failures on all canaries (passed afterwards)
Oct 15 - Oct 16, Mon, Tue
bfreed, vbendeb
- crosbug.com/35199 hit the mario and zgb canaries.
- A few hours later, canaries now fail HWTest with "TimeoutError: Timeout occurred- waited 8400 seconds." cmasone is investigating network outage.
- crosbug.com/35347 link canary: desktopui_DocViewing fails in doc_viewing.DocViewingTest.testOpenOfficeFiles with "Extension could not be installed".
- crosbug.com/35354 link canary: desktopui_NaClSanity fails in secure_shell.SecureShellTest.testLaunch with "Extension could not be installed".
- crosbug.com/35357 link canary: desktopui_DocViewing fails in doc_viewing.DocViewingTest.testOpenOfficeFiles with "Chrome automation timed out after 45 seconds"
- Throttling the tree. I see consistent failures on various tests and on "try-new-image-*".
- Not sure if this is server overload or chrome causing the failures. Nothing points to chrome-os, best I can tell.
- A set of 3 CLs broke shill in a lumpy PFQ. https://gerrit.chromium.org/gerrit/#/c/35702/ fixed it.
- crosbug.com/35388 x86 alex canary: HWTest during SuitePrep: Connection timed out
Oct 11 - Oct 12, Thu, Fri
rcui, sjg
- Link failed on BVT HWTest again
- crosbug.com/35222: HWTest fails power_Resume with 'Autotest client terminated unexpectedly'
- Noticed that failing test has a status log which shows success. According to sosa this is a network flake. Ignoring.
- crosbug.com/33613: login_CryptohomeIncognitoUnmounted timeout.
Oct 9 - Oct 10, Tue, Wed
rharrison, bleung, sonnyrao
- crosbug.com/35147: Daisy full failing due to issue with binutils (Appears to be a repeat of crosbug.com/34667)
- crosbug.com/35148: amd64 generic incremental timed out after 8 hours on BuildTarget (Pinged troopers@, since this bot appears to be sick)
- crosbug.com/34567, crosbug.com/35151, crosbug.com/35150: Link failed on BVT HWTest
- crosbug.com/33613, crosbug.com/35151, crosbug.com/35150: x86-zgb failed on BVT HWTest
- crosbug.com/35162: qemu-kvm failed to link with glib-2.32.4-r1
- crosbug.com/35173: Came into very red tree due to bad WebKit roll and failure of the PFQs to prevent Chrome on ChromeOS from updating. This issue arose because we were patching WebKit in ChromeOS; there is a thread discussing that we shouldn't do this again. Many late-arriving bots failed after the fix was in and the tree had to be reopened.
- crosbug.com/35201: some canary builders (parrot, stumpy, kiev) failed in svn update. Connection reset by peer
Oct 3 - Oct 4, Wed, Thu
gpike, sjg, kamrik
- crosbug.com/34990: power_Resume.py failed trying to treat IP address as a float
- crbug.com/150568: butterfly R24 Chrome crash in ExtensionAppProvider (same bug has hit R23 recently) (twice)
- crosbug.com/34825: svn flakiness downloading / unpacking chromeos_chrome (again)
Oct 1 - Oct 2, Mon, Tue
piman, rspangler, ellyjones
Sept 27 - Sept 28, Thu, Fri
rspangler, keybuk, rongchang
- crosbug.com/34825: svn flakiness downloading / unpacking chromeos_chrome (twice).
- ManifestVersionedSync failed on all canaries. rcui, ferringb determined gerrit replication was failing and fixed it.
- chromium:150568: canaries failed with "FAIL: Unhandled JSONInterfaceError: Chrome automation failed" (multiple times)
- VMTest timeout: x86_generic_incremental.
Sept 25 - Sept 26, 2012, Tue, Wed
dianders, davidjames, yoshiki
- crosbug.com/34571 crbug.com/150604: Numerous test failures in BVT and VMtest with Unhandled JSONInterfaceError: Chrome automation failed prior to timing out ...
- crosbug.com/34126: Chrome PFQ vmtest failure - alex and lumpy - Failed to installed SecureShell extension - Fixed, but see 34796 below
- crbug.com/152189: Daisy chrome PFQ: create_nmf.py: Not a valid NaCL executable - Fixed
- crosbug.com/34785: desktopui_DocViewing failed on lumpy canary - Any repeats?
- crbug.com/151855: hitting canaries (like butterfly build 367); originally this was thought to be crbug.com/150604 but that's because I didn't dig deep enough (I just saw the "Chrome automation failed..."). You need to dig into the artifacts and look for the "dmp.txt" file to see the real chrome crash. - Hitting all the time
- crosbug.com/34796: Secure Shell did not get correct exit message
- Saw some strange try_new_image failures in https://uberchromegw.corp.google.com/i/chromeos/builders/stumpy%20canary/builds/1934. milleral thought they were just warnings so no bug filed, but he's going to look at them. Failures are due to crosbug.com/34788.
- crosbug.com/34576: 'desktopui_LoadBigFile: ERROR: The big file did not load' during x86-mario hw
Sept 21 - Sept 24, 2012, Fri, Mon
olofj, dparker, chinyue
Sept 19 - Sept 20, 2012, Wed - Thu
marcheu, thieule, falken, sbasi, armansito
- Many canaries have been failing. Several are due to crbug.com/150568 (ExtensionAppProvider::RefreshAppList) or crbug.com/150604 (chrome!ui::Layer::SendDamagedRects). View artifacts to see the cause.
- crosbug.com/34523: ASAN bot failures
- crbug.com/149984: lumpy nightly pfq preventing Chrome uprev
- crosbug.com/33403: VMTest timeout on x86 generic incremental
- crosbug.com/34567: SecureShell failure on x86 generic incremental
- crosbug.com/34113: loginRemoteLogin Fail: Login Timed out on alex R23-2914
- crosbug.com/34576: 'desktopui_LoadBigFile: ERROR: The big file did not load' during x86-mario hw test
- crosbug.com/33611: Can't reach the VM on local host.
- crbug.com/150826: Chrome Sig 11 during ChromeOS BVT
- Tree has been throttled due to chrome crashes. crosbug.com/34571 ("JSONInterfaceError"), duped to crbug.com/150604 ("SendDamagedRects"). A revert is in Chrome for 150604 (crrev.com/157567). Now waiting for the PFQ to pick up the good Chrome.
- lumpy PFQ failed but it was crbug.com/149984 again. Decided to reopen tree.
- Many paladin bots are timing out during gclient sync. Pinged troopers.
- crosbug.com/34614: logging_CrashSender: ERROR: Timeout waiting for crash_sender to emit done. This is happening on paladins and canary bots.
- crosbug.com/34597: VMTest: logging_UserCrash: Timed out waiting for unnamed condition
- crosbug.com/22019: Commands run by test harness become unresponsive and won't respond to signals
Sept 13 - Sept 14, 2012, Thu - Fri
jaysri, gabeblack, sheckylin
Sept 12, 2012, Wed
semenzato, pstew
Sept 11, 2012, Tues
wdg, semenzato, pstew
Sept 7 - Sept 10, 2012, Fri - Mon
rcui, tbroch, josephsih
tlambert, vbendeb, kochi
tlambert, vbendeb, kochi (9/5-6 JST)
Sept 4, 2012, Tues
mtennant, sonnyrao, vapier, kochi (9/5-6 JST)
- Tree started the day closed, due to crosbug.com/34102, a vmtest flake due to chrome timeout. See run for mario incremental.
- Two internal Chrome PFQ builders are also failing, since at least last Thursday, which has effectively caused the version of Chrome to be pinned.
- http://chromegw/i/chromeos/builders/lumpy%20nightly%20chrome%20PFQ (crosbug.com/34129)
- http://chromegw/i/chromeos/builders/alex%20nightly%20chrome%20PFQ/ (crosbug.com/34126 created and assigned to UI). Efforts to enlist Chrome sheriffs and ChromeOS chrome gardener did not get anywhere.
- Another instance of crosbug.com/34102. The current owner is out of office today, krisr re-assigned to craigdh.
- This time crosbug.com/34102 hit the "x86 generic full" builder. The bug is getting attention from test team now.
- Another instance of crosbug.com/34102 on Mario Incremental -- added logs to the bug
- x86-alex failed HWTest, sosa commented on IRC "looks like a false negative as i was rebooting/restarting the devservers when this happend so the update payloads weren't avialable on the devserver" -- re-opened and watching other canaries still running HWTest
- meanwhile, hit another instance of 34102 on Mario Incremental
- then another instance of 34102 on x86-mario Canary -- HWTest didn't seem to run (was orange)
Sept 3, 2012, Mon
mtennant, sonnyrao, vapier
- Labor Day holiday in United States
Aug 31, 2012, Fri
adlr, ferringb
- Sameer checked in a kernel change that caused all(?) machines to oops, reboot after ~10 seconds. Reverted the change. crosbug.com/34081
- crosbug.com/34102
Aug 29, 2012, Wed
miletus, garnold, mkrebs
- Tree closed due to "Kernel image is larger than 8 MB" (crosbug.com/34039). Reverted changes that added parted to initramfs.
- Note: Reverts finally got merged in at about 8pm, so builds started before that could still fail (depending on their kernel size).
- tree closure following x86 generic full VM test failure due to python crash; filed http://code.google.com/p/chromium-os/issues/detail?id=34025, tree re-opened.
- Autotest failure: "Not logged in" error in platform_Pkcs11Persistence (possibly crosbug.com/32166).
- Autotest failures: several more "supplied_Compositor sig 11" failures (crosbug.com/33906). Also a "supplied_nacl_helper_boo sig 11" failure, which I added to that issue since it's also Chrome.
Aug 28, 2012, Tue
miletus, garnold, mkrebs
- x86-alex and x86-mario canaries failed in hwtest (login_CryptohomeIncognitoUnmounted and login_CryptohomeUnmounted, respectively); investigation reveals network issues related to http / mysql server, tree re-opened.
- lumpy, x86-mario and x86-zgb canaries failed in hwtest; latter two due to login issues, former on desktopui_{KillRestart,AccurateTime}. variety of failing bots suggests a transient flakiness. lab sheriff (jrbarnette) informed, tree re-opened.
- Autotest failures: Bunch of failures with "Login timed out" and "chrome_200_percent.pak". Turns out the chrome_200_percent errors are a red herring (they don't cause failures: crbug.com/143850). These are really login issues (crosbug.com/33841).
- Autotest failures: "supplied_Compositor sig 11" in desktopui_DocViewing (crosbug.com/33906).
Aug 24 Fri
djkurtz (TPE), dgreid, katierh
- lumpy canary failed enterprise_DevicePolicy http://crosbug.com/33435
- alex canary failed, enterprise_DevicePolicy, power_Resume (one login failure and an instance of crosbug.com/33435)
- zgb canary failed imaging chromeos-rack6-host7 - multiple network failures on this board
Aug 22 - Aug 23 Wed/Thu
taysom, dhendrix, dgozma
- x86-alex canary and x86-zgb canary failed in HwTest during login
- x86 generic incremental failed in flaky FMTtest
- For login problems (crosbug.com/33841)
Aug 20 - Aug 21 Mon/Tue
cywang (TPE)
Aug 16 - Aug 17 Thu/Fri
waihong (TPE), posciak (MTV), bfreed (MTV)
- x86-mario canary failed with a Chrome crash: crbug.com/143495
- chromium.chromiumos amd64 failing most of the day, crosbug.com/33613
- flaky chromiumos-sdk: gtk-doc failing in configure, but intermittently
- Flaky tegra2 full: archive stage has been failing intermittently due to crosbug.com/30031; will be getting rid of tegra2 bots Fri or Mon
- Several packages failed with "select error: (4, 'Interrupted system call')", suspect something killed a build: crosbug.com/33617
- mario and alex canaries failed due to HWTest losing connections; should resolve itself.
- Failed to connect to virtual machine: crosbug.com/33611
- security_ptraceRestrictions failing: http://code.google.com/p/chromium-os/issues/detail?id=33531
- security_ASLR failing: http://code.google.com/p/chromium-os/issues/detail?id=33590
- filed issue http://crosbug.com/33613 for more than 5 recent builds that failed with login timeouts.
Aug 10 - Aug 13 Fri/Mon
sleffler (SFO), quiche (MTV)
Aug 8 - Aug 9 Wed/Thu
sheu (MTV), bhthompson (MTV) for chromeos-factory
- Intermittent flakes from security_SeccompSyscallFilters tracked in crosbug/33403. Revert of promotion to bvt chumped in.
- parrot canary failure due to 27c54ab in third_party/coreboot; fix chumped in.
Aug 4 - Aug 5 Sat/Sun
- I'm not actually sheriff today, but this is a note to sheriffs over the weekend and early Monday: there's a possible unit test failure in shill that made its way into the tree which could fail in build and cause a failure. If this happens, feel free to submit https://gerrit.chromium.org/gerrit/29242/ in order to fix it. It's waiting for normal review, but if it does end up causing trouble, chumping it is the right thing to do. (pstew)
Aug 2 - Aug 3 Thu/Fri
fjhenigman (WAT), benrg, snanda
Jul 31 - Aug 1 Tue/Wed
?
Jul 27 - Jul 30 Fri/Mon
?
Jul 25 - Jul 26 Wed/Thu
dennisjeffrey (MTV), sosa (MTV), hungte (TPE)
- bot hung after successfully completing archive stage but before the report stage; forcefully killed by buildbot after 9000 seconds. Seems to be a rare flake. Filed http://crosbug.com/32944.
- lots of errors connecting to Google Storage (curl failures). Google Storage team was contacted and they fixed the problem on their end. Followed-up by filing http://crosbug.com/32986 to track the task of updating the version of gsutil used on the chromeOS builders (a recommendation by the Google Storage team).
- another "python2 sig 6" error. Updated existing bug http://crosbug.com/32539, which is currently under investigation.
Jul 23 - Jul 24 Mon-Tue
dkrahn(MTV), dtu(MTV)
Jul 19 - Jul 20 Thu-Fri
puneet(MTV), rminnich(MTV), seanpaul(MTV)
Lots of failures due to network issues, the biggest symptom being curl failures.
July 17 - July 18 Tue/Wed
msb(MTV), kamrik(WAT)
- crosbug.com/32539: pyautolib sig6 crash - test passes but leaves a crash file behind. Saw this thrice.
- Bunch of tegra flakiness issues. Told to ignore.
Jul 13 - Jul 16 Fri/Mon
grundler(MTV), sabercrombie(MTV)
canaries were mostly fine on Friday. More failures on Monday:
- crosbug.com/32439: "zgb failed on update-engine". Saw similar AU timeouts on lumpy, x86-mario, and zgb.
UPDATE: "Issue was devserver overloading and deploying apache and fixing crashes that happened every test run has resolved this issue."
- crosbug.com/32385: "mod_image_for_recovery failed on arm-daisy canary". Saw this once.
- crosbug.com/32539: pyautolib sig6 crash - test passes but leaves a crash file behind. Saw this once.
Jul 11 - Jul 12 Wed-Thu
nirnimesh(MTV), piman(MTV)
Canaries repeatedly kept breaking due to update_engine problems.
- butterfly canary failed VMTest with 'No space left on device' on image (not host). Updated on existing bug crosbug.com/32454
- x86-zgb canary failed HWTest with "Host did not return from reboot." Updated on existing bug crosbug.com/32181
- tegra2_kaen canary failed HWTest with "update-engine failed". Updated on existing bug crosbug.com/32129
Jul 5 - Jul 6 Thu-Fri
chinyue(TPE), dhendrix (MTV), ferringb (MTV)
- Thu Jul 05, 06:30 UTC: amd64 generic full failed: update_engine unittest takes too long to finish. (http://crosbug.com/32096)
- Fri Jul 06 - ?: update_engine unittest fails on multiple internal builders during the FilesystemCopierAction test (http://crosbug.com/29841#c42)
- Thu Jul 05, 07:33 UTC: stout canary failed: ManifestVersionedSync took too long (6+ hours) and thus BuildTarget didn't have enough time to finish. Seems a glitch, re-opened tree.
Jul 3 - Jul 4 Tue-Wed
nirnimesh(MTV), rharrison(WAT)
- chromium.chromiumos bots were dying in the VMTest, Chrome sheriffs fixed that.
- Potentially saw this filter through to x86 alex canary. Filed crosbug.com/32382
- mario canary failed a couple of times due to HWTest losing connections over night, resolved itself.
- amd64 generic full failed due to unit tests taking too long. Filed crosbug.com/32380. This occurred again on x86 alex canary.
- FilesystemCopierActionTest.RunAsRootSimpleTest in update_engine failed for no apparent reason. Filed crosbug.com/32366
- stumpy canary failed in HWTest with "StageBuildFailure" and "500 Internal Server Error". Filed crosbug.com/32361
- Saw instance of prebuilts getting a 500 on upload
29 Jun-2 Jul Fri-Mon
rspangler(MTV), mtennant(MTV), waihong (TPE)
27-28 Jun 2012 Wed-Thu
benchan (MTV), dparker (MTV), josephsih (TPE)
- x86-alex canary and x86-zgb canary failed => crosbug.com/32181.
- Failed at HWTest [bvt]: try_new_image FAIL: Host did not return from reboot.
- This might be related to crosbug.com/31748: system failed to respond on the network, causing a reboot timeout. Alex and zgb seem particularly hard hit.
- amd64 generic full failed => crosbug.com/30518
- Failed at cros_run_vm_update in VMTest. Networking sometimes failed to come up maybe due to a bug in VM network driver.
- lumpy canary failed => crosbug.com/32195 . Unhandled AssertionError: Could not create /home/chronos/Consent To Send Stats. during VMTest. No obvious cause. Reopened the tree and kicked the builder to see if problem reoccurs. Other canaries are passing.
- lumpy/stumpy/tegra2_kaen canary failed => crosbug.com/32228.
- Failed at HWTest [bvt]. Seemed to be network problem.
25-26 Jun 2012 Mon-Tues
dianders (MTV), bfreed (MTV), clchiou (TPE)
- ~8am MTV: amd64-generic-inc is failing, but looks like a builder issue (as found by kliegs / ellyjones). Tree still open. Looking for a trooper; fixed by pschmidt. resolv.conf was empty on the builder
- Kaen canary has been failing since last Friday. 2086 - 2090 were various HWTest failures. Now it doesn't even do the update. http://crosbug.com/32129 for the update problem. Not a closer, so assuming bug filed is enough.
- Autotest failure in bvt on x86-mario-r22 R22-2490.0.0. Flake? Don't see info about the failure.
- chromium.chromiumos failure: http://crosbug.com/32139
- All canaries died. Theory by davidjames is <https://gerrit.chromium.org/gerrit/19401>. Revert is here: <https://gerrit.chromium.org/gerrit/#change,26077>
- x86-mario canary died. Reported http://crosbug.com/32166.
- tegra2_kaen canary died the same way it was dying Friday night. That is an improvement over the weekend failures. http://crosbug.com/32012.
- tegra2_kaen and x86-mario canaries died. tegra2_kaen canary => crosbug.com/32012; x86-mario => crosbug.com/32166
- Think x86-mario may be a flake and just a longer timeout needed? Need owner
- Not sure about tegra2_kaen
- parrot canary failure http://crosbug.com/32173
- Retry didn't help. Trying a clobber retry.
- kliegs reverted lumpy hwtest connection to the bots: http://chromegw/i/chromeos/changes/2521
- Uprev failing; kliegs manually modified .repo/manifests on mario paladin and kicked bots. This looks to have fixed uprev failures and vmtests also passing. Still hobbled by lumpy hwtest failures (timeouts take 30mins).
- All canaries failing with HWTest [bvt] Suite prep 502 Proxy Error (crosbug/31921). tammo: Tree throttled, as I have no idea what to do about this.
- Tree throttled for vmtest failures; MTV sheriffs left for the day w/o resolution (PSA posted to chromium-os-dev@)
- Paladins stuck, so force-stopped alex+stumpy paladins and clobber+force built mario.
- Lumpy paladin hw tests are timing out backing up the CQ by ~15mins. Attached to existing crosbug/31916.
- Autotest failure in bvt on stumpy-r22 (R22-2471.0.0): after the test passed, Chrome crashed, and there was no stacktrace due to http://crosbug.com/31151 ; ddrew created http://crosbug.com/32038
- Looks like a network issue caused gsutil to hang (link canary); created crosbug/32028.
19-20 Jun 2012 Tue-Wed
taysom, wfrichar, kliegs, vapier
- Tree closure due to RPC failure by build server http://crosbug.com/31981
- Tree closure due to failure to upload prebuilts to Google Storage (gsutil flake; at http://crosbug.com/31580)
- Tree closure due to race condition in cleaning up. Appeared to be the same as http://crosbug.com/30031
- Chromiumos-tegra2 failed due to disk full - the build people with access to that server were in Las Vegas
15 & 18 Jun 2012 Fri & Mon
sosa, quiche, djkurtz
- Fri Jun 15, 06:15 UTC: "parrot canary" closed: crosbug.com/31883
- Sat Jun 16, 07:30 UTC: network flake during BVT on lumpy canary
- Sun Jun 17, 07:32 UTC: 502 Proxy Error during suite prep on x86-alex canary
jrbarnette, rcui, kinaba
11-12 Jun 2012 Mon-Tue
bleung, petkov, thieule
7-8 Jun 2012 Thu-Fri
craigdh
4 Jun Mon
fjhenigman, dtu, tlambert
- 8:37am PDT - amd64 generic incremental closed tree when vm16-m2 disk filled up - could not find a trooper but Peter Mayo helped - thanks Peter
- 2:05pm PDT - x86 zgb canary failure first thought to be upload_symbols flake, but investigation indicates those errors are not fatal - looking for real cause
- 2:40pm PDT - paladins blew up real good, vapier identified and fixed it as a permissions issue - thanks vapier
- 3:46pm PDT - mario incremental http://crosbug.com/30880
- 7:14pm PDT - lumpy canary http://crosbug.com/18587
1 Jun Fri
fjhenigman, dtu, tlambert
31 May Thu
sque, pstew, josephsih
30 May Wed
sque, pstew, josephsih
- Unable to generate file identifier for ec.obj. hungte reverted it. (http://crosbug.com/31386)
- Dependency failure for autotest-deps-piglit-0.0.1-r1450 on amd64-generic (looks like a flake, but http://crosbug.com/31389)
- update_engine_unittests flake (http://crosbug.com/29841) -- this issue continues to close the tree.
- Network problem?
29 May Tue
miletus, semenzato, sonnyrao
- amd64-generic full failed on Archive, opened new bug crosbug.com/31332
- VMTest flake on mario-incremental - crosbug.com/31067
- unpinned Chrome from 21.0.1150.3
23-24 May 2012 Thu-Fri
sabercrombie(MTV), vbendeb(MTV), kochi(TOK)
- closure by VMTest flake (http://crosbug.com/31067)
- tree broken by libssl update. ellyjones fixed it. Ongoing problems caused by failure to rebuild binpkgs dependent on openssl.
- Chrome build broke various UI tests: http://code.google.com/p/chromium-os/issues/detail?id=31291. Pinned Chrome to 21.0.1150.3.
- mario-incremental VMTest failure with two apparent variants of http://crosbug.com/31067
- link paladin u-boot build failure -- change reverted
- cros_mark_as_stable broken. fix chumped in.
- gcc change broke builds. reverted.
22 May 2012 Tue
cwolfe, marcheu, micahc
- the experimental "unified lumpy paladin" is down on disk full; it is being moved to another machine so does not need a cleanup
- mario-incremental failed with "login_CryptohomeIncognitoMounted ... Chrome did not reopen the testing channel after login as guest" http://crosbug.com/20286
- unclutter was causing retries in various builds; fixed by cwolfe https://gerrit.chromium.org/gerrit/23227
- link canary failure on chromeos-u-boot; fixed by sjg
- everything else failed on svn server problems; fixed by maruel and nsylvain
18-21 May 2012 Fri-Mon
dkrahn, puneetster, hashimoto
- VMTest failure: crosbug.com/31067.
- Tree broken by manifest change: olofj fixed.
- Archive failure on lumpy canary: crosbug.com/30854.
- Update engine failure on tegra2_kaen canary continues: crosbug.com/31019.
- Multiple occurrences of storage error: 'transfer failed with bytes remaining': davidjames filed crosbug.com/31103.
- gpsd timeout on x86-generic full: crosbug.com/31096.
- Another google storage failure on tegra2-full: 'No valid URLs found' exception: davidjames in contact with storage team.
- Autotest timeout on amd64-generic for test: SimpleTestUpdateAndVerify. Subsequent VMTest stage passed.
17 May 2012 Thu
dlaurie, grundler, yusukes(TOK)
16 May 2012 Wed
dlaurie, grundler, yusukes(TOK)
- Came in to red tree, all canaries failed in vmtest. http://crosbug.com/30952 Eventually we pinned Chrome to 21.0.1137.5, but that was not usable for ARM so it is being moved forward again.
- Chromium OS also has vmtest failure attributed to http://src.chromium.org/viewvc/chrome?view=rev&revision=137395
- 2pm: amd64-generic-incremental failure: collect2: ld terminated with signal 7 [Bus error]. This was my fault (dlaurie) for not applying the binhost change to other targets when I pinned chrome.
- 2:45pm: amd64-generic-incremental is out of disk space, escalated to troopers
- arm-daisy canary build failed (closed tree) due to missing dependency in chromeos-bootimage (was built in parallel and "usually" built)
- 5pm: network timeout trying to retrieve cros_sdk
14 May 2012 Mon
mkrebs, bhthompson, falken(TOK)
[TOK]
- mkrebs: filed crosbug.com/30880 for "tegra2_kaen canary": JSONInterfaceError => GetNextEvent => "received empty response"
[TOK]
- x86-mario and stumpy canaries failed, maybe same as http://crosbug.com/30854
- ferringb@ on IRC: some duplicated output issues, "if you see anything screwed up, for example, if vmtest has parts of unittest logs in it, please open bugs for it w/ links to the specific failures"
- also: builders that hang without output for a long time are probably out of disk space. CLs are coming.
4 May 2012 Fri
snanda, ellyjones, katierh
- crosbug.com/30575 - arm generic full builder had a timeout on gpsd though it builds locally and did not break other builders. Will watch the next build (already in progress)
3 May 2012 Thu
pstew, gpike
- BVT test run last night still has failures in graphics_WindowManagerGraphicsCapture, but now they are just failures and not segmentation faults. Test disabled, so it should not feature in the BVT on 4 May. crosbug.com/27587.
- Monitoring login_CryptohomeIncognitoUnmounted failure which seems to have failed VMTest on a couple of platforms last night, but is cycling green.
- Filed crbug.com/126133 for crashing on Link paladin in chrome!BaseTab::AdvanceLoadingAnimation. Chrome gardener is flackr@, not bshe@ as the waterfall shows, due to a swap. This issue is claimed fixed and verified in early builds on May 4.
- Persistent "Timed out waiting to revert DNS." messages on Link paladin builds. crosbug.com/30472 This appears to be a side effect of the bug above causing tests to end prematurely. Submitted a CL (making its way through the queue) which landed before the Chrome change, so we were able to confirm that this fixes this secondary issue.
- Transient failure in VMTest on x86-generic. Filed bug crosbug.com/30518.
- A couple of glitchy builds due to some dependency swaps for the parted package. Should have cycled through all builds, but contact benchan@ if parted features in any build failures tomorrow.
2 May 2012 Wed
pstew, gpike
- BVT failure in graphics_WindowManagerGraphicsCapture (segmentation fault). Assigned crosbug.com/30402 to ihf@ who wants to "wait and see" how it does in BVT tonight.
- Canaries are red: crosbug.com/30376. Fixed by scottz@ who reverted the offending change.
- Creation of "swap.conf" in chromeos-init conflicts with platform-specific swap.conf: crosbug.com/30397. Reverted this, and micahc@ will land a more comprehensive change.
- UploadPrebuilts phase for multiple architectures failing with "GSResponseError:: status=502, code=None, reason=Bad Gateway." Appears to have been a temporary server failure -- monitoring.
1 May 2012 Tue
jrbarnette, thutt, inter alia
- Tree started the west coast day green
- Occasional update_engine unit test failures due to crosbug.com/29841
- There are ongoing changes underway trying to get to the root cause.
- One canary failure due to an ill-timed change to the dev server.
30 Apr 2012 Mon
jrbarnette, inter alia
- Apparently, nothing has happened for the past week and a half.
- The tree started the west coast day (and week) green.
- Minor failures during the day; known bugs (to be documented later).
- At the time of West coast sign-off, there is an ongoing outage due to multiple canary failures
- HWTest updates got 404 errors downloading stateful.tgz; root cause unknown
- jrbarnette is declaring it "transient" - time will tell whether this is right.
19 Apr 2012 Thu
gmorain, piman, kamrik
- Found the tree red with HW test stage failed for alex, zgb, stumpy and lumpy canaries. In all cases the HW test stage failed at an early stage, before even running any test, with the error message "FAIL: Update failed. Timed out waiting for system to mark new kernel as successful."
- While trying to figure what it was, most of the builders cycled green. The two HW test failures in the new builds seem to be browser crashes, opening the tree.
- 19:43 UTC - Tree went red again on HW test failure on ZGB and Alex with error message that looked like a browser crash. After some investigation it appears to be http://crosbug.com/29701 which was reported several hours earlier. Also reported in http://crosbug.com/29725
13 Apr 2012 Fri
dgreid, tlambert
- Had an instance of 28631, reopened.
- platform_CryptohomeAuthTest was failing for most of the day and not closing the tree. Found the offending commit and reverted it.
12 Apr 2012 Thu
dgreid, tlambert
- Error with HwTest reimaging systems cleared up.
- One instance of 26646
11 Apr 2012 Wed
dianders, thieule, chinyue (TPE)
- scottz: Unfortunate user error failure on HWTest: http://code.google.com/p/chromium-os/issues/detail?id=29322
- crosbug.com/26646 again on stumpy canary
- chinyue 06:32 UTC: VMTest failed, crostestutils.lib.dev_server_wrapper.DevServerException: Timeout waiting for the devserver to startup. (reopen http://crosbug.com/20251)
- chinyue 07:01 UTC: http://crosbug.com/26646 again on mario incremental
- chinyue 07:16 UTC: http://crosbug.com/26646 again on x86-mario canary
- chinyue 09:26 UTC: http://crosbug.com/29278, still investigating...
- dianders 12:27 MTV: x86-mario canary in HWTest.
- Not much was logged in the link pointed to by the waterfall. It sounds like that's because this wasn't a failure of the test but perhaps a failure in running the test (?).
- scottz says he knows the problem and is working on it. Filing a bug for himself. TBD: bug #?
- thieule 12:40 MTV: alex-he failure is 26646
- thieule 3:04p MTV: Temporary bots failure due to chromeos-chrome needing libjpeg, vapier says they should cycle green once they pull in chromeos-chrome 20.0.1098.1_rc-r2.
- thieule/dianders 5:30 MTV: Lots of canaries died due to failure to build the private version of ixchariot. dianders reproduced locally. Found that ixchariot used the cros-binary eclass, which had changed today. Revert of the eclass fixed the ixchariot build, so chumped it in.
10 Apr 2012 Tues
dianders, thieule, chinyue (TPE)
- x86-zgb canary builder failed (http://crosbug.com/29192)
- gclient sync failed when building chromeos-chrome (http://crosbug.com/29193)
- dianders 10:30a MTV: tree was left closed at start of shift with message: http://crosbug.com/29193. Note that at around 10am that bug had been marked as fixed. David James said that several syncs had passed, so not keeping tree closed for this.
- dianders 10:30a MTV: David James pointed out that http://crosbug.com/29138 was still causing VM test bots to fail (old temp files still left over). He is fixing.
- dianders 11:03a MTV: Noticed that Lumpy paladin builder failed with something similar to yesterday's http://crosbug.com/29161. Ben confirmed that this was the same as http://crosbug.com/26646.
- dianders 12:30p MTV: chromium-os-sdk is broken (and has been for a little bit--didn't notice with all of the other redness). Proposed fix is here: https://gerrit.chromium.org/gerrit/19907.
- dianders 12:45p MTV: Checking to see if latest mario incremental failure is another http://crosbug.com/26646. Asking Ben (who is AFK) and digging myself.
- Ben says it's a dupe. Updated the bug. Note that according to Ben there doesn't appear to be any good way to tell between this bug and any other hang of chrome at bootup. ...but if we got sig 6 or sig 11, we'd know it was a crash and different.
- dianders 1:30p MTV: Kaen paladin is dead, which blocks all internal paladins. Escalating to troopers (both via email and IRC).
- Latest update on the machine: It found errors during a disk check and is now trying to fix the errors. Continuing to escalate.
- Going to move to another machine. http://crosbug.com/29224 is the bug to track that.
- Fixed now.
- dianders 2:45p MTV: Since CQ was so flaky for internal stuff, I ended up chumping people's changes in if they passed enough stuff (as suggested by davidjames). Ignored instances of 29224 and the fact that they hadn't gone through Kaen.
- thieule 4:47p MTV: arm generic full builder runs out of disk space. davidjames mentioned that the builder only has 60GB of disk so it can only hold about 3 builds. Opened http://crosbug.com/29246.
- thieule 5:04p MTV: arm-ironhide canary fails to emerge kernel, http://crosbug.com/29247.
6-9 Apr 2012 Fri-Mon
tbroch(fri), sjg(mon), mtennant, tammo
4-5 Apr 2012 Wed-Thur
tbroch, dparker, kinaba
2-3 Apr 2012 Mon-Tue
sheckylin, olofj, dkrahn
- autotest repo failed to replicate automatically, davidjames replicated manually and logged crbug.com/121806.
- Noticed alex-canary has failed the last two builds due to lab environment issues. Talked to johndhong, opened crosbug.com/28847.
- Reverted a CL that broke the commit queues: https://gerrit.chromium.org/gerrit/19493.
- Persistent bug crosbug.com/26646 (‘Timed out waiting for login’ in VMTest)
- New bug crosbug.com/28789 ('Timeout waiting for the devserver to startup' in VMTest)
29-30 Mar 2012 Thu-Fri
grundler, dlaurie, seanpaul
- crosbug.com/27992 on "lumpy canary"
- crosbug.com/26646 Lots of VMTest failures "Timed out waiting for login"
- crosbug.com/28581 stumpy-canary has been broken for a week, was isolated and hopefully fixed Friday
- upstream merge of modemmanager-next caused breakage due to interface change. shill was updated to compensate.
- A few builders ran out of space. Commit was landed to auto-clean ccache directory.
- R19 x86-alex pre-flight stuck after vmtest failure, discovered and escalated to troopers late Friday...
- Detailed notes at goto.google.com/adcgi
27-28 Mar 2012 Tue-Wed
reinauer, taysom
- New problem: crosbug.com/28631 Failed cbuildbot failed archive failed report. Assigned to ferringb
- crosbug.com/26646 flake: VMTest fails in various login tests. This happened several times
- crosbug.com/28374 flake: Timed out waiting for system to mark new kernel as successful. Multiple times but not as many as 26646
- Problem with chrome - reinauer will need to describe.
23-26 Mar 2012 Fri-Mon
puneetster, snanda (PST), waihong (non-US)
adlr, bfreed (PST), kochi (non-US)
- ARM build failure (-nopie error): only internal builds are affected; toolchain was rebuilt in chroot and this is fixed.
- crosbug.com/28226 cgroup unhandled crash
- crosbug.com/26646 flake: VMTest fails with Timed out waiting for login prompt
- Transient gclient sync failure on Chrome.
- VMTest failed with Error parsing data because invalid syntax, but the Report log says Exception __main__._ShutDownException: _ShutDownException('Received signal 15; shutting down',)
- BuildTarget failed with Unavailable repository 'gentoo' referenced by masters entry due to https://gerrit.chromium.org/gerrit/#change,18809.
20 Mar 2012 Tue
sonnyrao/dennisjeffrey (PST), yjlou (non-US)
- Mon 22:51 UTC Tree closed. "amd64 generic full" bot running out of memory. Filed http://crbug.com/119009 and re-opened tree.
- Build team switched amd64-generic over to a Builder with more memory and we closed 119009
- Tue 06:30 UTC Tree closed. Transient download error. Tree re-opened.
- Tue 20:17 UTC Tree closed. tegra2_seaboard failed BuildBoard due to Arm hardening options being enabled in GCC.
- Saw another http://crosbug.com/26646 flake.
15 Mar 2012 Thu
keybuk (PST), katierh (PST)
13 Mar 2012 Tue
waihong (TPE)
12 Mar 2012 Mon
vbendeb (PST), bhthompson (PST)
- 11:05 PST - kernel rebase to 3.2 had to be reverted:
https://gerrit.chromium.org/gerrit/#change,17846
https://gerrit-int.chromium.org/#change,13546
but another problem crept in (http://crosbug.com/27657), bhthompson working to resolve
- 11:25 PST. The failing cashew unittest has been temporarily disabled (https://gerrit.chromium.org/gerrit/#change,17847). Tree status changed to opened.
- 12:40 PST The tree is closed again with the same unittest failure. The ebuild uprev has not happened.
- The uprev in fact happened at 12:23 (after the failed build started), reopening the tree
- With the kernel revert in place and the cashew unit test disabled, the tree cycles green and stays fairly clean in the afternoon apart from a couple of flakes.
- The kernel image size problem still needs to be dealt with, as does the cashew unit test, but they should not be impactful to tree status in the interim.
09 Mar 2012 Fri
vbendeb (PST), bhthompson (PST)
- Had to revert sshfs-fuse update to 2.4 as it was breaking on ARM due to instability marker in the new ebuild https://gerrit.chromium.org/gerrit/#change,17773
- Kernel 3.2 update was pushed on Friday but the impacts were not felt until the evening, leaving the tree red for the weekend.
08 Mar 2012 Thu
quiche (PST), micahc (PST), hungte (non-US)
- 17:48 PST - tegra2_kaen canary failure, sjg pushed https://gerrit.chromium.org/gerrit/#change,17630 to fix.
- 16:11 PST - x86-zgb canary, crosbug.com/27521
- 15:36 PST - lumpy canary failure, during PublishUprev
- 15:00 PST - assist with resolving chrome-PFQ failure (in chrometest)
- 12:43 PST - tegra2 seaboard full failure, reverted gerrit.chromium.org/gerrit/17523
** update: not reverted. quiche prepared the revert CL, but didn't push it. (confused by UI)
- 04:36 PST - alex_he canary ManifestVersionedSync failure (crosbug.com/27521)
07 Mar 2012 Wed
quiche (PST), micahc (PST), hungte (non-US)
- 16:36 PST - assist with chromium.chromiumos failure (gerrit.chromium.org/gerrit/17551)
- 12:41 PST - mario incremental failure due to TreeCloser crosbug.com/26646.
- 09:59 PST - CleanUp failed on x86 generic full. crosbug.com/5409, CL in review.
- 07:20 PST - CleanUp failed on amd64 generic full. petermayo rebooted the bot.
- 02:24 PST - x86-mario canary failure due to HWTest. crosbug.com/27287
06 Mar 2012 Tue
- 11 PST - amd64-generic full hit crosbug.com/25618
- 6 PST - tegra2-full complaining of disk full
- Periodic crosbug.com/26646
05 Mar 2012 Mon
- 9:43 PST - chromiumos-sdk hitting link errors in SDKTest; marcheu and zbehan got it fixed.
- Periodic crosbug.com/26646 all weekend.
01 Mar 2012 Thurs
- 04:00; llvm change landed tightening const strictness, breaking stumpy/lumpy canaries. https://gerrit.chromium.org/gerrit/#17133. Revert chumped in, canaries restarted manually.
- 10:14 PST - ScreenLocker smoke test failing. keybuk says the test was relying on a bug. http://crosbug.com/27146. Reopened for now.
- More chrome flakiness on internal builders - crosbug.com/26646
- 4:30 PST - x86 chrome PFQ failing to emerge chromeos-base/chromeos-0.0.1-r153, trying clobber build.
28 Feb 2012 Tue
kliegs (EST), davidjames (PST), nsanders (PST)
- Paladin bots aren't closing the tree - crosbug.com/27060
- 8:00 PST - chrome buildspec fixed (was missing new chromite dependency). chrome PFQ's running now to roll ebuild. Finished @9:45 AM and canaries were kicked off manually. Change was picked up by canaries and they went green.
- 7:39 PST - Canaries still red, Chrome 1055 buildspec was unsuccessful so no new chromeos-chrome ebuild. This means vapier's revert was not picked up, so unused variables are still creating errors on the canaries. Leaving tree throttled and have pinged Chrome sheriffs and PMs to help resolve.
27 Feb 2012 Mon
26 Feb 2012 Sun
- 6:40pm, ferringb reopens for http://crosbug.com/26646 flake on mario-incremental.
- 5:43pm, vapier reopens, files http://crosbug.com/26886
25 Feb 2012 Sat
- 11:08: All internal canaries are taken out by http://crosbug.com/26886. Tree remains closed till sunday.
24 Feb 2012 Fri
dtu (MTV), mkrebs (MTV)
- 10:48 PST - All full and incremental builders, along with chromiumos on the Chrome waterfall, fail due to path conflict in DEPS. rcui reverts.
- 12:26 PST - x86-alex_he canary closes on chromium-os:26646. mkrebs reopens.
23 Feb 2012 Thu
- 14:06 PST - Builder could not find "configure" executable on x86 generic full, arm generic full, tegra2 full, and tegra2 seaboard full. Fixed by zbehan.
- 18:44 PST - Bug 26646 hit x86-mario incremental.
22 Feb 2012 Wed
- 12:22 PST - ExtensionTerminalPrivateApiTest.TerminalTest failed in Linux ChromeOS Tester. Possibly fixed, so no bug filed.
- 14:38 PST - Chromium-OS:26646
18 Feb 2012 Sat
17 Feb 2012 Fri
16 Feb 2012 Thur
15 Feb 2012 Wed
14 Feb 2012 Tue
dianders (MTV), ferringb (MTV), clchiou (TPE), tammo (TPE)
- overnight - bvt failure in zgb (couldn't access www.google.com). Interim failure?
- overnight - x86-mario canary failure => http://crosbug.com/26347 according to tree status history; clchiou has throttled because of this
- 8:35am - tree was open/green when dianders got in (ellyjones opened)
- 11:00am - x86 pineview full: ferringb IDed as an instance of http://code.google.com/p/chromium-os/issues/detail?id=21559
- dianders: nope. Actually http://crosbug.com/24935; I put a pling in that bug. Still agree that it shouldn't close tree, since it's not a new issue.
- 12:47p - lumpy canary: Another instance of http://crosbug.com/26347. Inserted pling and bumped to P1. dianders: kicked lumpy canary build (1:15p) when I noticed that it wouldn't retry for a while.
- 2:00p - dianders noticed that chromium.chromiumos build was broken (though no email).
- Filed: http://crosbug.com/26377
- Didn't close tree, since this doesn't appear to be a treecloser.
- 2:16p - gcc 4.6.2 (CL 15461) landed at 1:58p without a required dependency keyworded breaking amd64 i7 full, reverted via CL 15845.
- 3:49p - x86 generic incremental: vmtest, Actually http://crosbug.com/24935; time to escalate?
- 4:31p - x86 generic incremental: vmtest, http://crosbug.com/21559 (supplied_chrome crash). Added note to the bug, including a little bit of debugging. Not sure there's much we can do here.
- 6:22p - x86 zgb-he canary, update delta again: http://crosbug.com/24280.
- 6:31p - lumpy, cros_au_test_harness.py timeouts: http://crosbug.com/24280, http://crosbug.com/26347.
13 Feb 2012 Mon
msb, mkrebs (MTV), kamrik (EST)
10 Feb 2012 Fri
msb, mkrebs (MTV), kamrik (EST)
- x86 canary bots are all red as of 2230 PDT last night - filed as http://crosbug.com/26168
- full buildbots have been running erratically for a week - e.g., the last run of x86-generic-full was on 2012-02-08 and the last run of arm-generic-full was on 2012-02-08 (with the previous run on 2012-02-01!)
- Tree closed at 2012-02-09 2244 PDT, reopened by ellyjones at 0628 PDT, reclosed by ellyjones at 0730 PDT. Still closed as of 0741 PDT.
- After much wailing and gnashing of teeth, ellyjones, kliegs and zbehan track the problem down to the binutils-2.21-r3 image produced by the chromiumos-sdk bot; if the same package is compiled locally (from the same git commit-id), the failure disappears. Tree still closed as of 0828 PDT.
- vapier points out that the act of rebuilding switches you back to bfd instead of gold, thus hiding the problem from earlier tests; back to square one
- CLs 15340 and 15176 were reverted and the SDK builder fired to rebuild the SDK. The new SDK works (verified around 12:40 PST)
- The problem was due to the -frecord-gcc-switches flag and the way gold gets linked with glibc, which is linked with GNU ld.
- supplied_chrome sig 11 in security_ProfilePermissions.login again
- crosbug.com/25742
- http://chromegw/i/chromeos/builders/lumpy64%20PFQ/builds/534/steps/VMTest%20%5Blumpy64%5D/logs/stdio
9 Feb 2012 Thu
sque, dkrahn (MTV)
- Nightly chrome PFQ for amd64-corei7 failed causing ToT canary builders to continue to use chrome 19.0.1034.0 and fail again at 4:30am. Chrome pinned to 19.0.1033.0_rc-r1 to allow ToT canary builders at 10:30am to cycle green. Issue crbug.com/113475 has been logged and is assigned. When it is fixed and all nightly chrome PFQs cycle green chrome needs to be unpinned.
- Alex PFQ: supplied_chrome sig 11 in suite_Smoke/login_LoginSuccess.default
- x86-generic PFQ: "Unhandled IOError: close() called during concurrent operation on the same file object."
9 Feb 2012 Thu
djkurtz (TPE)
- tegra2 kaen PFQ fails to build uboot (reverted by ferringb; original issue investigated by clchiou)
- supplied_chrome sig 11 in suite_Smoke/login_CryptohomeUnmounted
- CHUMP caught a CL pushed past CQ that broke the tree, reverted by vapier
8 Feb 2012 Wed
sque, dkrahn (MTV)
- repo sync issue, should have been reverted by ferringb.
- Chrome nightly PFQ failed, should check 4:30am build, should have new build spec by then.
- lumpy64 canary failed in chrome build. There was a change to a language .xtb file that broke Chrome and was reverted, but the revert was not pulled into Chrome OS.
- amd corei7 nightly chrome PFQ: Chrome build failure, likely same cause as above.
6 Feb - 7 Feb 2012 Mon/Tue
dparker, wfrichar (MTV)
31 Jan - 1 Feb 2012 Tue/Wed
jrbarnette, dgarrett (MTV), falken (Tokyo)
27-30 Jan 2012 Fri/Sat/Sun/Mon
ellyjones, dennisjeffrey, vbendeb
25-26 Jan 2012 Wed/Thu
- Lumpy64 and Pineview archive failures found to be a problem with dumpsym choking on files of the wrong architecture, filed as http://crosbug.com/25496. Mkrebs working on a fix as a p0 item, so tree reopened.
- Lumpy PFQ failure due to chrome aura crash: http://crosbug.com/25454 (Build log: http://chromegw/i/chromeos/builders/lumpy%20PFQ/builds/595). ChromeOS tree reopened since this is a chrome aura issue.
- Same aura crash also believed to be behind stumpy PFQ failure: http://chromegw/i/chromeos/builders/stumpy%20PFQ/builds/603
- Filed bug 25467 for Lumpy canary failure with cmt_drv.so and gobi.so
- Filed bug 25468 for dump_syms problem (ERROR : Unable to dump symbols for /build/x86-pineview/usr/lib/debug/boot/vmlinux:
dump_syms: src/common/linux/dump_symbols.cc:169: const Elf32_Shdr* {anonymous}::FindSectionByName(const char*, const Elf32_Shdr*, const Elf32_Shdr*, int): Assertion `nsection > 0' failed.)
23-24 Jan 2012 Mon/Tue
- Chrome PFQ was still red at the beginning of the shift. It was red since last Thursday, meaning chrome would not roll and pick up critical fixes, holding up dev channel builds
- Aura has started to flake - filed crosbug.com/25454
19-20 Jan 2012, Thu/Fri
anush, puneetster, miletus
17 Jan 2012, Tuesday
nirnimesh, sosa, josephsih
- Out of disk space on a bot - issue 110480 filed for long term fix.
13 Jan 2012, Friday
jennyz, achuith, zvorygin
12 Jan 2012, A Rainy Thursday
jamescook, keybuk, vapier (east coast)
11 Jan 2012, A Wednesday
jamescook, keybuk, vapier (east coast)
- Hit kvm ssh timeout again crosbug.com/24280
- build_image failure with "mount: you must specify the filesystem type" crosbug.com/24975
- x86-alex & tegra2_seaboard toolchain master bots dead due to sync error -> trooper reset them
- x86-alex 0.11.241.B factory & pre-flight bots dead for a while crosbug.com/24983
- x86-zgb release factory-980.B bot has been down for a while crosbug.com/24971
- google-breakpad failed its unit tests crosbug.com/24982
- new dev-libs/glib pkg failed in toolchain fortify smoketest; dev was informed of CQ usage and multiple CL's landed to resolve
10 Jan 2012, Tuesday
davidjames, tbarzic
- Saw a couple shutdown crashes in Chrome:
1/9/2012, Monday
davidjames, tbarzic, jglasgow (east coast)
- Tree still throttled; Looking for jhorwich to organize a sheriff summit.
- Found ZGB PFQ reporting errors (http://chromegw/i/chromeos/builders/zgb%20PFQ/builds/193).
"Found nothing new to build, trying again later.
If this is a PFQ, then you should have forced the master, which runs cbuildbot_master
Found no work to do."
Tried to force a build via the web page, but not sure if that is what the error message means.
- Found TOT PFQ has not run since Friday, despite a long (28) queue of changes. http://chromegw/i/chromeos/builders/TOT%20Pre-Flight%20Queue. Filing bug. Also observed that the ToT CQ has 749 pending requests. No troopers on IRC or responding to email.
1/7/2012, Saturday
jhorwich, tlambert, jglasgow (east coast)
- Tree was throttled; it's possible to push things past the PFQ with "Publish and Submit" after verify.
- You may need to remove gerrit as a reviewer to do this, since it will -2 you on the PFQ.
- If future sheriffs use this as a workaround for the VMTest problem, keep the following in mind:
- You need to watch the tree for non-VMTest failures.
- Consider "Publish and Submit" for CLs that were rejected by PFQ failures involving VMTest IFF shutdown related; this allows people to make forward progress towards deadlines despite the VMTest issue.
- Watching is no more onerous than reopening the tree every 27-35 minutes because of VMTest barfing (this was my Thu night and Fri day).
- jhorwich has the ability to get core dumps now; I'm talking to Randall Monday about making this a crossystem option. I honestly believe that he is a victim of VMTest in this, like the rest of us, and that we need to examine the (non)role of unit tests in diagnosing the test framework vis a vis tree closures:
- In case that's not clear, let me bluntly say that a chrome failure from a passed test should not be a tree closer.
- If we want to test Chrome fragility on shutdown, that should be a separate test; to my mind it would be of dubious value:
- Chrome crashes -> restart Chrome -> gaia login
- Chrome doesn't crash -> restart Chrome -> gaia login
- We need a sheriff's summit; if no one else calls one next week, I will.
1/6/2012, Friday
jhorwich, tlambert, jglasgow (east coast)
- jglasgow: Found stumpy PFQ failing, filed crosbug.com/24790, decided to fix rather than revert since VMTest was holding tree closed anyway
- VM test failures due to Chrome crashes are a huge problem, but better debugged by jhorwich and those with Chrome experience.
- Sorry to disagree here, but the VMTest failures are a meta-failure in the test infrastructure, and do not affect the validity of the test results. They are bugs, but they are bugs that should not result in tree closure.
- jglasgow: Filed crosbug.com/24795 for PFQ uprev failures
1/5/2012, Thursday
jhorwich, tlambert, jglasgow (east coast)
- Filed a tree closer crosbug.com/24733 because the platform_ToolchainOptions test was failing because of bluez. Thanks zbehan for helping to look at this. It is not clear what change caused this to start failing.
- Filed a tree closer crosbug.com/24760 because uboot was failing to build on tegra2. Thanks David James who pointed this out, and vpalatin who quickly grabbed the bug. Lots of red on the tree due to chrome sig11 certainly affected the sheriff's ability to notice this -- but we should have been more vigilant in making sure we understood all the red builders.
- Chrome sig11 bug quite prevalent today. jhorwich noted 9 instances during MTV shift. Got a good stack trace on x86-alex canary build 1478, added to crosbug.com/19204
- Only other closure during shift was a straightforward build breakage (gerrit 13738) which was reverted
- jhorwich reproduced a chrome sig11 on local VM, is going to attempt to debug root cause Friday
- tlambert reopened over the sig11; mostly jhorwich was faster
- Added entry to the Sheriffs FAQ
- we need to update the builder/closer list
- temporary link to "all" for when you can't find the builder
12/28/2011, Wednesday
sonnyrao, mtennant
- Noticed that sheriffs cannot push through gerrit with red tree. Filed crosbug.com/24630.
- Starting around 2pm started to see Chrome sig 11 crashes on internal zgb PFQ and TOT PFQ. Filed crosbug.com/24646.
- Most Chrome sig 11 core files were unusable, but identified one as crosbug.com/19204. Re-opened bug.
12/27/2011, Tuesday
derat, dlaurie, nkostylev
12/26/2011, Monday
derat, dlaurie, nkostylev
- transient x86-alex canary vmtest "Timed out waiting for login prompt" (crosbug.com/23199)
12/22/2011, Thursday
cwolfe
- transient alex PFQ vmtest "Timed out waiting for login prompt" (crosbug.com/23199)
- transient link PFQ vmtest sig 11 on login_BadAuthentication (didn't find a bug; should be one already)
- transient x86-pineview-pull svn error with webrtc (didn't find a bug; should be one already)
- Test failures on Chromium.ChromiumOS Linux ChromeOS Aura (crbug.com/108434 and crbug.com/108436)
12/21/2011, Wednesday
marcheu, quiche
- Another occurrence of crosbug.com/23413 (on x86 pineview full)
- Build failure in chromiumos sdk. Due to race condition in groff ebuild. (crosbug.com/24481).
- workaround by marcheu
- fixed by vapier
- Build failure in x86 generic commit queue. Due to missing sandbox exception for fontconfig. (crosbug.com/24488, fixed by vapier)
- Build failure in Chromium.ChromiumOS (aura). Fixed in ToT chrome.
- Build failure on x86 generic PFQ (due to https://gerrit.chromium.org/gerrit/13273). ferringb reverted.
- Build failure on tegra2_kaen-aura canary (due to https://gerrit.chromium.org/gerrit/13216). marcheu reverted.
- Another occurrence of crosbug.com/23413 (on zgb PFQ)
12/20/2011, Tuesday
rharrison, kamrik shadowing
- Came onto a red tree, due to Stumpy PFQ being forced directly instead of TOT PFQ
- Filing a bug about the error message not being descriptive enough, crosbug.com/24421
- Created a CL https://gerrit.chromium.org/gerrit/13235 to make error message more descriptive
- Kicked the TOT PFQ and reopened the tree
- VMTest Failure on amd full generic, created crosbug.com/24422
- Looked into crosbug.com/22577
- Pinged nkostylev, altimofeev, and flackr to make sure it was being looked at
- Approved CL for altimofeev to change test run order to try to get more information
- Another occurrence of crosbug.com/23199 (on link PFQ)
12/15/2011, Thursday
ihf, gpike
- Hit: chrome crash in suite_Smoke/desktopui_ScreenLocker. crosbug.com/22577
- One case of: chrome crash in suite_Smoke/security_ProfilePermissions.login. crosbug.com/23258
- The 3AM build of x86-zgb_he full release-R17-1412.B Build #21 failed: FAIL Archive (1:35:59) with BackgroundException.
- The previous one (#20) failed in VMTest stage while unzipping the image.
- ... and before that, #19 failed in VMTest stage during ../platform/crostestutils/generate_test_payloads/cros_generate_test_payloads.py
- The 3AM build of x86-mario full release-R17-1412.B Build #21 also failed during ../platform/crostestutils/generate_test_payloads/cros_generate_test_payloads.py. The first problem may have been: "mount: you must specify the filesystem type"
12/14/2011, Wednesday
ihf, gpike
- Hit issues of: chrome crash in suite_Smoke/desktopui_ScreenLocker. crosbug.com/22577
- Hit another issue of: VMTest ERROR: Test that updates to itself. crosbug.com/20427
- Problems on alex_he canary with recovery and vmlinuz images. Filed crosbug.com/24242.
- chromiumos sdk broken: xmlrpc-c-1.18.02: curlmulti.c: curl/types.h missing. Filed crosbug.com/24235.
12/12/2011, Monday
glotov
- stumpy-canary link error in power_manager, cannot reproduce locally. A clobber does not help either. Filed crosbug.com/24091.
- Lumpy-binary fails on building chromeos-u-boot-0.0.1-r336: boot_kernel.c:206:26: error: 'CHROMEOS_BOOTARGS' undeclared. Filed crosbug.com/24136.
12/10/2011, Saturday
cwolfe (drive-by, times unknown)
- ARM release bots attempting to run vm_tests. Same as (crosbug.com/21536). Probably from gerrit/12702, e-mailed rcui
- Widespread build errors on pepper-flash "HTTP Error 403: User Rate Limit Exceeded" (crosbug.com/23511)
- stumpy canary link error in power_manager; can not reproduce, probably just needs a clobber after the 403 clears up
- Still some VMTest problems
12/9/2011, Friday
rspangler, chocobo, jglasgow, ellyjones
- 1130 PST: VMTest chrome-static crashes timed out (crosbug.com/21559)
- 1255 PST: VMTest timed out (crosbug.com/23413)
- 1300 PST: VMTest login timeout (crosbug.com/23199)
- 1405 PST: VMTest failure (crosbug.com/20427)
- 1530 PST: VMTest flakiness (crosbug.com/23778)
- 1600 PST: And more VMTest problems (crosbug.com/22577)
12/8/2011, Thursday
rspangler, chocobo, jglasgow, ellyjones
12/7/2011, Wednesday
thutt, thieule, yusukes
- 1033 PST: Canary builders broke (crosbug.com/23882), this was fixed and canary builder subsequently passed. Some builders used dash instead of bash.
- 1041 PST: Aura Chrome PFQ incorrectly configured, petermayo is working on a fix
- 1127 PST: Chrome crashed during VMTest (crosbug.com/23884)
- 1404 PST: Autotest client terminated unexpectedly (crosbug.com/20427), this could be related to crosbug.com/22333?
- 1717 PST: Chrome crashed during VMTest (crosbug.com/23884)
- 1729 PST: Chrome crashed during VMTest (crosbug.com/22577)
12/6/2011, Tuesday
thutt, thieule, yusukes
- 1059 PST: AU VM Test failure (crosbug.com/22333)
- 1140 PST: Timed out waiting for login prompt (crosbug.com/23199)
- 1415 PST: Chrome SEGV (crosbug.com/23675)
- 1722 PST: Timed out waiting for login prompt (crosbug.com/23199)
- 1740 PST: Commit queue hung and was restarted (crosbug.com/23864)
11/28/2011, Monday
ers, sleffler, stevenjb
- BVT failures for zgb are chrome sig 11's that appear unrelated but the dump logs are zero length
- BVT failures for mario sig 11 in synTPenh
- 1146 EST: looks like http://crosbug.com/23199 occurrences are still closing the tree frequently
- All three chrome PFQ builds had failed with "Clear and Clone chromite" errors (couldn't find branch named 'release'). A forced build on the arm generic chrome PFQ resulted in success, so I reopened the tree and forced builds on the other chrome PFQ bots.
11/10/2011, Thursday
- 0800 PST: 500 internal server error uploading prebuilt. Bug filed: http://crosbug.com/22804
- 0900 PST: x86 PFQ failure in autotest due to pygtk rev. pygobject was updated and PFQ clobbered.
- 1100 PST: TOT PFQ failure in autotest due to pygtk/pygobject rev. Next build was successful.
- 1500 PST: Adobe pulls all Linux Flash 10 binaries. Bastards. http://crosbug.com/22837 I updated the adobe-flash ebuild to use flash11.
- 1700 PST: VMTest failures due to broken flash, my ebuild did not install into correct directories...
NOTE: autotest/pygtk/pygobject failures were related to python ebuild from a couple days ago
11/3/2011, Thursday
- 0700 PST: Chrome build was broken early in the morning. We kept the ChromeOS tree open. Resolved around 11:30.
- Red canaries were expected to run overnight, but they are still red Friday morning.
11/2/2011, Wednesday
- 1356 PST: tegra2_seaboard-tangent-binary failed with a too large u-boot image. reinauer fixed this.
- 1415 PST: transient VMTest failures from 2:15 to around 3:00.
Still open:
- Want to understand how to make sure we get chrome stack crawls. WIP at: <http://crosbug.com/21559> and <http://crosbug.com/22047>, which I think are different bugs.
- amd64-generic-full still failing (less important).
- BVT tests getting Synaptics sig 11s (http://crosbug.com/13377) and chrome sig 11s (not too surprising given the ones we see below).
10/28/2011, Friday
- 1625 PST: chromium.chromiumos failure. VMTest stage timed out after 9000 seconds. ericroman reverted a webkit roll.
- 1620 PST: reopened tree
- 1540 PST: restarted most internal builders. (restarted any builder that had a build fail due to the network issue; did not restart builders that were idle at the time of network failure)
- 1530 PST: network issues resolved
10/27/2011, Thursday
- 2150 PST: network issues causing failure on internal builders (crosbug.com/22216)
- 1718 PST: chromium.chromiumos closes due to gclient sync failure on chromeos-chrome. ericroman reopens.
10/26/2011, Wednesday
- 0922 EST: VMTest Failed due to not being able to access update server
- Bug filed by petermayo as http://code.google.com/p/chromium-os/issues/detail?id=22111
- Reopened, since it only occurred for one bot
- 1443 EST: Tree closes because of failure to fetch webkit from svn.webkit.org. Not supposed to happen. Is crosbug.com/17959
- 1529 EST: Tree closes because of a build failure in chromium's chromium. Not supposed to happen; sosa and petermayo are fixing this.
- 1643 EST Another instance of crosbug.com/17959 from svn.webkit.org.
10/25/2011, Tuesday
- 1022 EST: Failures due to issues with cros_run_vm_test from http://gerrit.chromium.org/gerrit/#change,10599 . Reverted as http://gerrit.chromium.org/gerrit/#change,10647
- Multiple re-closures due to slow builds hitting this issue after the revert
10/24/2011, Monday
- 5:34p: Mosys ebuild failure. Reverted here: http://gerrit.chromium.org/gerrit/10605
- 5:12p: Another flaky sig11 in ChromiumOS (x86) (chromium.chromiumos). Haven't investigated, but it went away.
- 4:51p: Another flaky sig11 in alex-binary. http://crosbug.com/21559
- 3:15p (dianders): ChromiumOS (x86) (chromium.chromiumos) build failed. 2 issues:
- http://crosbug.com/22025
- First sig11 didn't give a stack crawl. Seems to be a different problem than http://crosbug.com/21559 (??)
- 2:31p (dianders): Stumpy canary 493 fails. Different than 492, but probably also flaky. http://crosbug.com/22019 filed.
- Early afternoon (dianders): Digging into overnight BVT failures. 2 of them thought to be another instance of http://crosbug.com/13377
- Morning (dianders): Digging into stumpy 492. Filed http://crosbug.com/22005 w/ info. Going to see what happens w/ 493.
- 9:45a PT: Started (West coast) day with:
- Tree opened with http://crosbug.com/21624 caveat (though already fixed). Kicked binary builders to try bugfix.
- Linux ChromeOS build failing (http://build.chromium.org/p/chromium.chromiumos/waterfall?builder=Linux%20ChromeOS). Looks like a flaky build. http://crbug.com/100538 seemed to be talking about this test, so added a comment.
- Chromium OS SDK looks like it's still probably broken. http://crosbug.com/21973
- Several emails about BVT failures
- Last stumpy canary (492) was a failed one.
- security_ProfilePermissions.login ERROR: Unhandled JSONInterfaceError: Automation call {'username': 'performancetestaccount@gmail. ...
10/21/2011, Friday
See also this doc: https://docs.google.com/a/google.com/document/d/17eHo0cN9gOEcdH43AQujNIKosRFjaHHeVdJyZ6jYJYY/edit
- 4:00pm PT: vpalatin points out that the (less important) amd64-generic-full is failing. http://crosbug.com/21970
- 3:16pm PT: Failure w/ shflags and testUpdateKeepStateful (http://crosbug.com/21966).
- 3:16pm PT: fix to chromium.chromiumos waterfall <http://gerrit.chromium.org/gerrit/10526>
- 2:14pm PT: fix to crosbug.com/21945 is pushed.
- 2:14pm PT: another instance of 21945
- 2:14pm PT: another kernel build failure (same problem--revert hasn't made it everywhere).
- 1:30pm PT: ...kernel build failure again (another case fixed by revert below)
- 1:00pm PT: kernel build failure; fix by reverting -> http://gerrit.chromium.org/gerrit/10509
- 10:38am PT: stumpy canary failure attributed to Bigstore; reportedly a power event in the data center.
- 10:23am PT: ellyjones reports kernel failure; fix: -> http://gerrit.chromium.org/gerrit/10497
- 9:00am PT: Started the (West coast) day with:
- http://crosbug.com/21945
- Chrome SEGV failures lumped under http://crosbug.com/21559
- Chrome PFQs all down
- chromium.chromiumos broken (and has been for several days).
10/20/2011, Thursday; 10/19/2011, Wednesday
10/17/2011, Monday
dgozman
- 9:40am. Tegra build fails. http://crosbug.com/21751.
10/12/2011, Wednesday
olege, semenzato, gpike
- 9pm: getting lots of chrome sig 11 during vmtests. Cause unknown.
- 5:32pm: webkit.2011101101.patch needs update. Updated http://crosbug.com/21624, got petermayo to work on a fix; restarted 3 bots after fix was done
- 5:15pm: cmasone kindly fixed a crash reporter bug introduced this morning.
- 4:43pm: hit 19204 again
- 4:16pm: webkit.2011101101.patch needs update. Opened http://crosbug.com/21624
- 2:21pm: autoupdate vmtest failed. Under psychological pressure, Chris Sosa admitted seeing this before. Opened http://crosbug.com/21610
- 9am: disabled desktopui_UrlFetch.not-live, thereby sweeping http://crosbug.com/21566 under the rug.
- sheriffs could not submit a change bypassing the commit queue. Chris Sosa fixed this.
- afternoon: svn checkout for chromeos-chrome failed again. Opened http://crosbug.com/21598.
- 8am. "arm generic full" failed on BuildTarget. svn checkout failed during building chromeos-chrome. Built fine on the next try.
- 0am - 8am. Multiple occurrences of http://crosbug.com/21566.
10/11/2011, Tuesday
olege, semenzato, gpike
times in PST unless otherwise marked
- 2:30pm: 21517 is fixed (xorg.conf missing in arm builds). This was making the arm canaries red.
- 12:20pm. Arm build broken by change 55311 at 11:45, fixed by change 55319 at 12:40.
- 11am. Another occurrence of http://crosbug.com/21402, assertion failure in google breakpad.
- 10:30am. http://crosbug.com/19204 happened twice. Raised priority and reopened.
- 8am. Oleg reverted a change apparently responsible for vmtest failure on stumpy. (http://gerrit.chromium.org/gerrit/#change,9841)
10/6/2011, Thursday
Sheriffs: derat, stevenjb
10/5/2011, Wednesday
Sheriffs: derat, stevenjb
Tree started closed with two issues:
|