Sheriff Log: Chromium

2013-12-09, Mon, vadimt@, scottmg@, grunell@
    • Doesn't seem like the failures-only waterfall is working properly. Filed infra bug (crbug.com/326934).
    2013-12-04, Wed, robliao@, michaeln@, dubroy@
    2013-12-03, Tue, robliao@, michaeln@, dubroy@
    • Winx64 Release telemetry_unittest continues to be flaky
    • WinAura likes to fail randomly after succeeding for a few runs. Cause unknown. (crbug/326009)
    • MacASAN will fail runhooks while checking for some windows platform directories. Cycling the bot and running again seems to get it going again.
    • GN Fun: brettw@ has been working on some build system improvements that have been causing issues with the bots off and on. They appear to be under control now, but if they do come up again, talk to him.
    2013-10-27, Wed, ygorshenin@
    • Disabled DevToolsBeforeUnloadTest.TestDevToolsOnDevTools (crbug.com/323847)
    • Seen a lot of failures of telemetry_unittests: testCloseReferencedTab, testGotTimeline, testScrollAction

    2013-10-01, Tue, dcheng@, acolwell@, phoglund@
    • Official Chrome Linux still running out of memory.
    • The Vista (1) runs of interactive_ui_tests are unreliable. I got 130 timed out tests out of nowhere, and it seems that was a flake.
    2013-09-30, Mon, dcheng@, acolwell@, phoglund@
    • Official Chrome Linux still running out of memory.
    • Seems there have been small size increases in chrome.dll on Win. All of them look legit as far as I can tell, so I'll just bump the expectations. The two big bumps are Aura and CLD2, respectively (CLD2 seems to be a language detection feature going into Chrome). Graph here.
    • Disabled a bunch of flaky tests.
    • The mac dbg builder has encountered some kind of auth failure: "error: git-credential-osxkeychain died of signal 11; fatal: remote error: Invalid user name or password." Troopers notified.

    2013-09-26, Thu, petewil@, gene@, calamity@
    • The official Google Chrome Linux machine is running out of memory during the link, so it will be red for a while. crbug.com/298195 has details.

    2013-06-27, Fri, petewil@, asvitkine@, eroman@
    • The Chrome Frame tests for IE are especially flaky, so the bot will go red often, but these are not tree closers. It is safe to ignore these failures. They are tracked by http://crbug.com/247062. It will likely be a while before these are resolved, so we will have to bear with the failures until then.

    2013-04-15, Mon, cpu@, mek@, akuegel@
    • Compile on Linux and Android is flaky, caused by some unknown bug in Blink. According to haraken@, the original failure was introduced in r148249. He reverted it in r148259 and rolled r148266, but it still doesn't work.

    2013-04-13, Sat, michaeln@, dmazzoni@, xusydoc@
    2013-03-27, Wed, ygorshenin@, jabdelmalek@, gene@

    2013-03-26, Thu, ygorshenin@, jabdelmalek@, gene@

    2013-03-21, Thu, khorimoto@, xusydoc@ (PST), tapted@ (AEDST)
    • Master tryserver got wedged around 5pm PST and was restarted
    • tryjobs all retried and exploded GOMA
    • git.chromium.org SSL certificate expired at 22:24:11 and linux bots all refused to update from it, waterfall lit on fire
      • error: server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none while accessing https://git.chromium.org/chromiumos/chromite.git/info/refs?service=git-upload-pack
    • tryserver never caught up, so no patches landed via CQ
    • bugs: http://crbug.com/222995 (certificates) http://crbug.com/223057 (workaround)

    12/19/2012, Wednesday, wjia@, mnissler@, jeremya@

    11/13/2012, Tuesday, ygorshenin@, wez@

    11/12/2012, Monday, ygorshenin@, wez@
    11/9/2012, Friday, nkostylev@, groby@, raymes@
    11/8/2012, Thursday, nkostylev@, groby@, raymes@

    11/5/2012, Monday, phoglund@, phajdan@
    • sergeyu landed http://src.chromium.org/viewvc/chrome?view=rev&revision=166003 which broke compile; then reverted using drover in http://src.chromium.org/viewvc/chrome?view=rev&revision=166013 ; the revert was incomplete, and phajdan.jr did the right thing with git: http://src.chromium.org/viewvc/chrome?view=rev&revision=166021
    • later on sergeyu relanded without full trybot cycle and broke the tree again http://src.chromium.org/viewvc/chrome?view=rev&revision=166068

    • Found consistent flaking in two tests in nacl_integration_test: filed http://code.google.com/p/chromium/issues/detail?id=159395.
    • Started rolling back https://codereview.chromium.org/11227020/ since it seems to be causing flakes in ChromeOS tests. See more comments on the bug. The author has had the patch rolled back like 2-3 times already but it still seems fairly obvious to me that it is causing the flakes, hence the rollback. The rollback itself got stuck in the commit queue though; the trybots were failing on content_browsertest flakes and had a bunch of other problems in general. Update: The revert _did_ get in using the drover tool eventually.
    • Looked into flaky PluginTest.PluginThreadAsyncCall failures. No obvious culprit CL, so I'm not sure what to do about it (perhaps mark the test as flaky?)
    • Reverted https://codereview.chromium.org/11362080; was breaking aura unit_tests.
    Lessons from that day:

    • drover may fail to revert changes properly; it's strongly recommended to use git
    11/2/2012, Friday, phoglund@, phajdan@

    • CQ landed mukai's https://chromiumcodereview.appspot.com/11369042 which broke win aura compile; CQ didn't run win_aura trybot even though win_aura is a tree closer

    • danakj landed https://codereview.chromium.org/11364054 which broke ui_unittests on Windows http://build.chromium.org/p/chromium.win/builders/XP%20Tests%20%28dbg%29%281%29/builds/28620 but the tree was not closed automatically; this is inconsistent and surprising

    • beng landed https://codereview.chromium.org/11368010 which broke compile on ChromeOS; he reverted very quickly, which helped to keep the tree green; he also said linux_cros trybots are slow

      out/Release/../../third_party/gold/gold64: out/Release/obj.target/chrome/libbrowser_ui.a(out/Release/obj.target/chrome/../browser_ui/chrome/browser/ui/aura/chrome_browser_main_extra_parts_aura.o): in function ChromeBrowserMainExtraPartsAura::PreProfileInit():chrome_browser_main_extra_parts_aura.cc(.text._ZN31ChromeBrowserMainExtraPartsAura14PreProfileInitEv+0x31): error: undefined reference to 'aura::CreateDesktopScreen()'
      out/Release/../../third_party/gold/gold64: out/Release/obj.target/chrome/libbrowser_ui.a(out/Release/obj.target/chrome/../browser_ui/chrome/browser/ui/aura/chrome_browser_main_extra_parts_aura.o): in function ChromeBrowserMainExtraPartsAura::PreProfileInit():chrome_browser_main_extra_parts_aura.cc(.text._ZN31ChromeBrowserMainExtraPartsAura14PreProfileInitEv+0x75): error: undefined reference to 'aura::DesktopStackingClient::DesktopStackingClient()'
      

    • vandebo disabled a test in https://chromiumcodereview.appspot.com/11293067, which broke compile:
      chrome/browser/page_cycler/page_cycler_browsertest.cc:316:1: error: unterminated #else
      
      This was sloppily reviewed by phajdan.jr, i.e. the review didn't catch it.
    • CQ landed https://chromiumcodereview.appspot.com/10836347 which broke Linux Clang (dbg) compile. vandebo (sheriff) quickly reverted the change. The error was:
      ../../chrome/test/reliability/page_load_test.cc:339:69: error: no matching member function for call to 'Append'
              actual_crash_dumps_dir_path_ = actual_crash_dumps_dir_path_.Append(
                                             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
      ../../base/file_path.h:266:12: note: candidate function not viable: no known conversion from 'basic_string<char16, base::string16_char_traits>' to 'const basic_string<char, (default) char_traits<_CharT>>' for 1st argument
        FilePath Append(const StringType& component) const WARN_UNUSED_RESULT;
                 ^
      ../../base/file_path.h:267:12: note: candidate function not viable: no known conversion from 'string16' (aka 'basic_string<char16, base::string16_char_traits>') to 'const FilePath' for 1st argument
        FilePath Append(const FilePath& component) const WARN_UNUSED_RESULT;
                 ^
      1 error generated.

    • Some trybots were broken due to the libpci-related changes in https://chromiumcodereview.appspot.com/11343015, but they should be fixed now (on some of them build/install-build-deps.sh has not been run). There could be other problematic bots with the -b1 suffix (meaning they run on compute engine).

    • Disabled one test in the ash_unittests: https://codereview.chromium.org/11365063/, broken because of https://chromiumcodereview.appspot.com/11369017. Didn't roll back since the bot isn't a closer. The whole thing was actually reverted later by phajdan.jr: https://codereview.chromium.org/11364052

    • There are still broken ash_unittests and views_unittests, most probably because of https://chromiumcodereview.appspot.com/11367041/. I didn't roll that back either (see above) but the author has been notified. And then Scott landed a fix (https://codereview.chromium.org/11362061).

    • content_browsertests started breaking here (http://build.chromium.org/p/chromium.linux/builders/Linux%20Tests%20x64/builds/28090). Does not repro when I build locally, but seems to be consistent on the bot (?). Can't find any obvious culprit CL here.
    Lessons from that day:

    • We need to find a way to consistently apply build/install-build-deps.sh updates to all bots, including trybots and webkit bots.

    • Need to check why CQ landed a patch that broke compile on Linux clang. It seems that was not a mid-air collision with the function signature actually changing in the collision window.

    • Changes disabling tests should go through CQ (and trybots) unless really urgent, and they really need a second pair of eyes.

    • linux_cros trybots must run faster (4 hours is too long) - filed http://code.google.com/p/chromium/issues/detail?id=159048

    • some build steps close the tree on failure, some don't; this should be made more consistent, and in fact lean more towards _not_ closing the tree except for serious errors

    • CQ must run the win_aura trybot, or the win aura bot must be removed from tree closers

    3/1/2012, Thursday, jsbell@
    2/21/2012, Tuesday, scottbyer@
    • There are a few tests that seemed to start going flaky after a particular change having to do with removing a singleton. Bugs filed, tests disabled for now.
    • There was a nacl roll, which seemed to be OK, but the nacl_integration tests on the Mac became really aggressively flaky by the end of the day. Maybe worth a revert. If you click through on the same bot for 3-4 runs, you can see that the tests that fail change.
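    The "click through 3-4 runs" check above can be sketched in a few lines. This is only an illustration: the test names and run data are hypothetical, and collecting the failing tests for each run from the bot is assumed to happen elsewhere. Tests that fail in every run look like a real breakage, while tests whose failures come and go look like flake.

```python
from collections import Counter


def classify_failures(runs):
    """Split failing tests into consistent and intermittent ones.

    `runs` is a list of sets; each set holds the names of the tests that
    failed in one run of the same bot.
    """
    # Count in how many runs each test failed.
    counts = Counter(name for failed in runs for name in failed)
    # A test that failed in every run is treated as consistently broken.
    consistent = {name for name, n in counts.items() if n == len(runs)}
    # Everything else failed in some runs only.
    intermittent = set(counts) - consistent
    return consistent, intermittent


# Hypothetical data: failing tests from four consecutive runs of one bot.
runs = [
    {"test_a", "test_b"},
    {"test_b", "test_c"},
    {"test_b"},
    {"test_a", "test_b", "test_d"},
]
consistent, intermittent = classify_failures(runs)
print(sorted(consistent))    # tests that failed in every run
print(sorted(intermittent))  # tests whose failures change between runs
```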
    2/20/2012, Monday, scottbyer@
    • Slow day due to the holiday, which really made the cross-test flakiness stand out. ProcessProxyTest.SigInt on linux cros is really bad; SSLUITest.TestHTTPSExpiredCertAndGoBackViaMenu showed up on a couple of bots, as did VerifyRetryOnConnectionReset in net_unittests.
    11/22/2011, Tuesday, yosin@
    The following failures are known and filed.

    10/13/2011, Thursday

    The reliability bot is sick (see crbug 98703 and internal mail dated Oct 7th, subject: Reliability Bot Errors).

    At 19:00 US Pacific Time, ui_tests NoStartupWindowTest.NoStartupWindowBasicTest fails on Mac. kbr is looking at it, assuming it's caused by http://src.chromium.org/viewvc/chrome?view=rev&revision=105399

    10/9/2011, Sunday

    nacl_integration has been intermittently flaking throughout the weekend on OS X, closing the tree - http://crbug.com/99642 . This differs from http://crbug.com/98293 in that 99642 appears to be trouble downloading/syncing NaCl runtimes, while 98293 is an intermittent failure to start some process.

    10/7/2011, Friday

    NPAPIVisiblePluginTester.DeletePluginInDeallocate started failing on Windows after the WebKit roll. Since everything else was passing fine, I disabled this test on Windows (it was already disabled on Mac) and added a note to the existing bug.
    3 tests from sync_integration_tests (AllChanged, Sanity, EncryptedAndChanged) started failing from build 10936. A revert of r104459 got these tests passing again.
    Ran into a perf regression with the 'sizes' step in Linux 64 builder. This was because a V8 roll caused a 400k binary size increase and added 1 new static initializer. A bug was opened on the v8 project and we increased the limits in http://codereview.chromium.org/8198001/ . This process needs to be documented somewhere better.
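    For anyone new to the 'sizes' step, the idea is a comparison of measured values against checked-in limits. The sketch below is an assumption about the general shape of such a check, not the actual bot code; the metric names and numbers are hypothetical.

```python
def find_regressions(measured, limits):
    """Return {metric: (measured, limit)} for every metric over its limit."""
    return {
        name: (value, limits[name])
        for name, value in measured.items()
        if name in limits and value > limits[name]
    }


# Hypothetical limits and measurements; the real numbers are not in this log.
limits = {"binary_size_bytes": 80_000_000, "static_initializers": 10}
measured = {"binary_size_bytes": 80_400_000, "static_initializers": 11}

regressions = find_regressions(measured, limits)
for name, (value, limit) in sorted(regressions.items()):
    print(f"{name}: {value} exceeds limit {limit}")

# Once the increase has been judged legitimate, raising the limits to the
# measured values makes the check pass again.
new_limits = {**limits, **measured}
print(find_regressions(measured, new_limits))
```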

    10/4/2011, Tuesday

    ProductTest.ProductInstallBasic seems to be very slightly flaky on the Windows trybots. I've seen it show up in a handful of Windows try runs over the last day. (But only showed up once on Tuesday).

    Had a real failure on the ASAN bot which was masked because those two other tests are flaky enough that the bot stays red much of the time. I think I just have to mark those tests flaky.

    nacl_integration failed on someone's Win try job, and then on the main waterfall, but ended up being flake. Log looked like a failed attempt to start a test server.

    10/3/2011, Monday

    OptionsWebUITest.testOpenAllOptionsPages has been mostly failing since Friday on Mac. Marking disabled.
    Memory waterfall has been showing BrowserActionApiTest.CloseBackgroundPage and ExtensionManagementTest.AutoUpdate as flaky; they occasionally do not complete. It would be nice to have the flakiness dashboard track the ASAN bot as well.

    Had redness in the afternoon from a CL that failed on the trybots (103795). The trybot failures looked a bit like "something really went wrong with the trybots", so were ignored. The clue was that the Linux trybots failed in the same way. Well, and then the bots on the waterfall, too.