Sheriff FAQ: Chromium OS ASAN bots

This doc describes ChromiumOS ASAN bots. Currently they are: 

Triggered on chromium-os commits:
1. x86 generic ASAN (config name x86-generic-asan) incremental bot on chromiumos waterfall,
2. amd64 generic ASAN (config name amd64-generic-asan) incremental bot on chromiumos waterfall.

Triggered on chromium commits:
3. Chromium OS (x86) ASAN (config name x86-generic-tot-asan-informational) pfq bot on chromium.memory waterfall,
4. Chromium OS (amd64) ASAN (config name amd64-generic-tot-asan-informational) pfq bot on chromium.memory waterfall.

Note that these bots are not to be confused with "Linux Chromium OS ASAN Builder/Testers". Those are actually Chromium bots built with GYP_DEFINES="chromeos=1".

Sheriff brief guideline

The main idea of these bots is to build with Clang as the compiler and to use its AddressSanitizer instrumentation.

AddressSanitizer (ASAN) is a fast memory error detector based on compiler instrumentation (LLVM). It is fully usable for Chrome on Linux and Mac. 

If the bot goes red on the VMTest step or the Archive step, look at the build artifacts. They should contain top-level files named asan_log.1234.txt, where 1234 is the PID of the failing process. The logs contain detailed information about the failure. Use them to file a bug; see this one as an example.

Note that the Archive step goes red if at least one asan_log file is found. This is intentional: I noticed a rare case where ASAN caught a bug (and created a log), but no test failed (and the VMTest step remained green).
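The Archive rule described above can be sketched in a few lines of Python. This is only an illustration, not the actual cbuildbot code: the function names find_asan_logs and archive_should_fail are hypothetical, and the file-name pattern is assumed from the asan_log.1234.txt example.

```python
import os
import re

# Assumed file-name shape, based on the asan_log.1234.txt example.
_ASAN_LOG_RE = re.compile(r'^asan_log\.(\d+)\.txt$')


def find_asan_logs(directory):
    """Return a sorted list of (pid, path) for each asan_log file in directory."""
    found = []
    for name in os.listdir(directory):
        match = _ASAN_LOG_RE.match(name)
        if match:
            found.append((int(match.group(1)), os.path.join(directory, name)))
    return sorted(found)


def archive_should_fail(directory):
    """Hypothetical check: fail as soon as a single asan_log file exists."""
    return bool(find_asan_logs(directory))
```

The PID recovered from the file name is handy when filing the bug, since it identifies the process that hit the error.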

How to suppress a bug

Once you have filed a bug, if you don't want to revert the change for some reason, you may suppress the offending function (the top of your stack trace) like this:

// Mask it out for ASAN before the bug is fixed, see crbug.com/...
#if defined(ADDRESS_SANITIZER)
__attribute__((no_address_safety_analysis))
#endif  // defined(ADDRESS_SANITIZER)
void BuggyFunction() { ... }

Note that, as a result, the bug may turn into a SEGV and the bot will still be red. If this happens and you want to suppress it, you may add some filtering to the buildbot code that makes the bot go red on asan_log files: see GenerateStackTraces() in chromeos_public/chromite/buildbot/cbuildbot_commands.py.

Details

All of the ChromiumOS ASAN bots mentioned above do the same thing, with slight differences:

1. x86 generic ASAN bot is triggered by ChromiumOS commits and tests the x86-generic board,
2. amd64 generic ASAN bot is triggered by ChromiumOS commits and tests the amd64-generic board,
3. Chromium OS (x86) ASAN is triggered by Chromium commits and tests the x86-generic board,
4. Chromium OS (amd64) ASAN is triggered by Chromium commits and tests the amd64-generic board.

Note that the chromiumos waterfall bots (1-2) build the last known good Chrome, so if an ASAN bug creeps into Chrome code, bots 3-4 will go red immediately, while bots 1-2 may stay green for a day or so.

The following are the differences between the ChromiumOS ASAN bots and the similar incremental / pfq bots (like x86-generic-incremental / x86-generic-tot-chrome-pfq-informational).

By default, ChromiumOS ASAN reports ASAN errors to var/log/asan.PID. To change this, set the environment variable ASAN_OPTIONS=log_path=PATH; every process will then write its error reports to PATH.PID.
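As an illustration of the log_path setting, here is a minimal Python sketch that builds an environment with ASAN_OPTIONS=log_path=PATH while keeping any other ASAN options that are already set. The helper name asan_env is hypothetical; the only assumption about ASAN itself is that options in ASAN_OPTIONS are separated by colons.

```python
import os


def asan_env(log_path, base_env=None):
    """Return a copy of the environment with log_path set in ASAN_OPTIONS.

    Other ASAN_OPTIONS entries are kept; an existing log_path is replaced.
    """
    env = dict(os.environ if base_env is None else base_env)
    options = [opt for opt in env.get('ASAN_OPTIONS', '').split(':')
               if opt and not opt.startswith('log_path=')]
    options.append('log_path=%s' % log_path)
    env['ASAN_OPTIONS'] = ':'.join(options)
    return env
```

The result can be passed as the env argument of subprocess.call() when starting an ASAN-instrumented binary by hand; each process started that way should then write its reports to PATH.PID.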

BuildTarget step differences

The bots use the asan profile (chromeos/src/overlays/overlay-x86-generic/profiles/asan/) to override some settings.
Mostly it adds USE="clang asan" and CC=CXX=clang, so the Chrome package (currently the only one; more in future) is built with Clang/ASAN.

If you ever need to try the asan profile locally, run
setup_board --board=x86-generic --profile=asan --force
and then run build_packages as usual. Note that only the x86-generic and amd64-generic boards have an asan profile. For other boards, you can manually apply the settings that the asan profile contains.

VMTest step differences

Currently the VMTest step runs the same suite_Smoke autotests as the other ChromiumOS bots do, with one small exception: the logging_AsanCrash test. It makes the main Chrome process simulate a memory bug and checks that ASAN catches it. If ASAN does not, the test goes red.

We are working to add more tests to increase code coverage.

Archive step differences

As mentioned above, the Archive step goes red if there is at least one asan_log file in the /var/log/chrome directory. Additionally, for every such file, the log is symbolized and put into the top-level artifacts. Note that the logging_AsanCrash test removes its own asan_logs so that they don't break things.

Run time differences

session_manager_setup.sh adds some environment variables to the environment of the corresponding processes. See "if use_flag_is_set asan" and below.

The most important change is that it stops Chrome from using the sandbox (it adds the --no-sandbox flag).

Updating Clang revision periodically

We keep Chromium OS clang and the llvm packages it depends on updated. As a reference, I use the Gentoo sys-devel repo. When updating the local clang, I try to keep it as close as possible to Gentoo.

As a signal to update Chromium OS Clang, I watch Chrome's tools/clang/scripts/update.sh (I added myself to the WATCHLIST for 'clang_update'). The Chrome Clang team does a good job of testing a new version before committing it.

To try the change, I start a remote buildbot run for it: cbuildbot x86-generic-asan amd64-generic-asan --remote -g <gerrit-change-id>.

Currently the ChromiumOS ASAN bots are the only users of Clang in ChromiumOS. However, the llvm package is also used by mesa.

From the bots' point of view, we want the most recent stable version of Clang so that the newest ASAN features work. From the mesa point of view, a cutting-edge llvm may not work. So we use chromiumos-overlay/profiles/targets/chromeos/package.mask to keep the latest llvm from being pulled into target chroots. Remember to update it when the llvm version changes.

SVN <-> Git mapping

The clang/llvm sources are pulled from git.  Since the main llvm repo is maintained in subversion, the git repos are mirrors of the svn repo, and the ebuilds are versioned based on the svn repo, you have to map between the two.

Every git commit has metadata in it that tells you the svn rev it maps to.  As an example:

Clean up whitespace and indentation a bit

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@173960 91177308-0d34-0410-b5e6-96231b3b80d8

This git commit f578c89dc6ca3e79667c2aa9d0ac4fe409da7773 maps to svn rev 173960.

With this information in hand, you can update the git commit sha1 in the ebuilds.

Note that since upstream llvm has a single svn repo for all of its projects, the exact svn rev you want might not exist in a given git tree.  So you have to find the commit in that tree that is closest to (but does not exceed) the svn rev you want.
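The mapping described above can be sketched in Python. This is a rough illustration under assumptions: the function names svn_rev_from_message and closest_commit are hypothetical, and the commits are assumed to be available as (sha1, commit message) pairs, for example collected from git log output.

```python
import re

# Matches the metadata line, e.g.
# git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@173960 <repo-uuid>
_GIT_SVN_ID_RE = re.compile(r'^git-svn-id: \S+@(\d+) \S+$', re.MULTILINE)


def svn_rev_from_message(message):
    """Return the svn revision recorded in a git commit message, or None."""
    match = _GIT_SVN_ID_RE.search(message)
    return int(match.group(1)) if match else None


def closest_commit(commits, wanted_rev):
    """Pick the commit with the highest svn rev that does not exceed wanted_rev.

    commits is an iterable of (sha1, message) pairs.
    Returns a (sha1, svn_rev) tuple, or None if no commit qualifies.
    """
    best = None
    for sha1, message in commits:
        rev = svn_rev_from_message(message)
        if rev is None or rev > wanted_rev:
            continue
        if best is None or rev > best[1]:
            best = (sha1, rev)
    return best
```

The sha1 returned by closest_commit is the value that would go into the ebuild for the svn rev you are targeting.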
