
The Chromium Projects

Sandboxing ChromeOS system services

In ChromeOS, OS-level functionality (such as configuring network interfaces) is implemented by a collection of system services and provided to Chrome over D-Bus. These system services have greater system and hardware access than the Chrome browser.

Separating functionality like this aims to prevent malicious websites from gaining access to OS-level functionality. If Chrome were able to directly control network interfaces, then a compromise in Chrome would give an attacker almost full control over the system. For example, by having a separate network manager, we can reduce the functionality exposed to an attacker to just querying interfaces and performing pre-determined actions on them.

ChromeOS uses a few different mechanisms to isolate system services from Chrome and from each other. We use a helper program called Minijail (executable minijail0). In most cases, Minijail is used in the service's init script. In other cases, the Minijail library is used if a service wants to apply restrictions to the programs that it launches, or to itself.

These different sandboxing mechanisms are described in the ChromeOS sandboxing talk (internal only).

The “forbidden intersection”

The forbidden intersection is:

You must avoid the forbidden intersection by having at least one of, preferably more than one of, and ideally all of:

You don't normally need both Seccomp and SELinux, but very security-sensitive workloads can require both.

Best practices for writing secure system services

Just remember that code has bugs, and these bugs can be used to take control of the code. An attacker can then do anything the original code was allowed to do. Therefore, code should only be given the absolute minimum level of privilege needed to perform its function.

Aim to keep your code lean, and your privileges low. Don't run your service as root. If you need to use third-party code that you didn't write, you should definitely not run it as root.

Use the libraries provided by the system/SDK. In ChromeOS, libchrome and libbrillo (née libchromeos) offer a lot of functionality to avoid reinventing the wheel, poorly. Don't reinvent IPC; use D-Bus or Mojo. Don't open listening sockets; connect to the required service.

Don't (ab)use shell scripts. Shell script logic is harder to reason about and shell command-injection bugs are easy to miss. If you need functionality separated from your main service, use programs written in a primary programming language like C++ or Rust, not shell scripts. Moreover, when you execute them, consider further restricting their privileges.

Just tell me what I need to do

User IDs

The first sandboxing mechanism is user IDs (UIDs). We try to run each service as its own UID, different from the root user, which allows us to restrict what files and directories the service can access, and also removes a big chunk of system functionality that's only available to root. For example, see the permission_broker service's /etc/init/permission_broker.conf file:

start on starting system-services
stop on stopping system-services

# Run as 'devbroker' user.
exec minijail0 -u devbroker -c 'cap_chown,cap_fowner+eip' -- \

Minijail's -u argument forces the target program (in this case permission_broker) to be executed as the devbroker user instead of root. This is equivalent to running it with sudo -u devbroker.
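To confirm on a test device that a service really dropped root, you can look at the Uid: line of its /proc/<pid>/status file. The following is a minimal Python sketch, not a ChromeOS tool: the helper names are hypothetical, and it assumes the standard Linux /proc layout, where that line lists the real, effective, saved and filesystem UIDs in that order.

```python
import pwd


def read_uids(status_text: str) -> tuple[int, int, int, int]:
    """Parse the Uid: line of a /proc/<pid>/status dump.

    Returns (real, effective, saved, filesystem) UIDs.
    """
    for line in status_text.splitlines():
        if line.startswith("Uid:"):
            real, effective, saved, fs = (int(v) for v in line.split()[1:5])
            return real, effective, saved, fs
    raise ValueError("no Uid: line found")


def service_user(pid: int) -> str:
    """Return the user name matching the effective UID of process `pid`."""
    with open(f"/proc/{pid}/status") as f:
        _, euid, _, _ = read_uids(f.read())
    return pwd.getpwuid(euid).pw_name
```

Given the PID of the running permission_broker, `service_user(pid)` would be expected to return `devbroker` rather than `root`.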

The user (devbroker in this case) needs to first be added to the build system database (example for a different user).

Next, the user needs to be installed on the system (example, again for a different user).

See the ChromeOS user accounts README for more details.

There's a test in the CQ that tracks which users on the system request additional access (e.g. a group that lists more than one user). If your user does that, the test baseline has to be updated in another CL that lands at the same time as the one adding the accounts (example). If you're unsure whether you need this, the test will fail and the CQ will reject your CL, so if the tests pass, you should be good to go!

You can use Cq-Depend to land the CLs together (see How do I specify the dependencies of a change?).

chronos-access membership requires SELinux

The forbidden intersection notwithstanding, if a service accesses user data by having its UID in the chronos-access group, it must run with an enforcing SELinux domain. These services access data owned by the chronos user, and that data could be malicious: a compromised Chrome browser could modify or corrupt chronos-owned data and attempt to escalate privileges by exploiting or confusing a service that accesses it.

SELinux is useful because it provides finer-grained control over what the service is allowed to do. Exploitation in these cases doesn't happen via memory corruption. Instead, the attacker will set up user data to confuse the service and trick it into performing valid operations on the wrong filesystem objects. This is usually referred to as a confused deputy attack. For example, a service might be tricked into mounting what it believes is a USB drive. In reality, however, it ends up mounting a virtual image on top of an existing file or directory, bypassing our write-XOR-execute restrictions.

Seccomp is not a great option for preventing this type of attack because it's not granular enough. In the previous example, since the service is allowed to perform mounts, the mount(2) system call has to be allowed. Seccomp filters only see the raw values of system call arguments, and a path argument is just a pointer into the process's memory, so Seccomp cannot restrict which paths are passed to the mount call. SELinux, on the other hand, allows us to restrict a service to only perform operations (like mount calls) on specific paths in the filesystem.

Capabilities

Some programs, however, require some of the system access usually granted only to the root user. We use Linux capabilities for this. Capabilities allow us to grant a specific subset of root's privileges to an otherwise unprivileged process. The capabilities(7) man page has the full list of capabilities that can be granted to a process. Some of them, such as CAP_SYS_ADMIN, are effectively equivalent to root, so we avoid granting those. In general, most processes need capabilities to configure network interfaces, access raw sockets, or perform specific file operations. Capabilities are passed to Minijail using the -c switch. permission_broker, for example, needs capabilities to be able to chown(2) device nodes.

From permission_broker.conf:

start on starting system-services
stop on stopping system-services

# Run as <devbroker> user.
exec minijail0 -u devbroker -c 'cap_chown,cap_fowner+eip' -- \

Capabilities are expressed using the format that cap_from_text(3) accepts.
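After granting capabilities with -c, you can check what a running process actually holds by decoding the CapEff: bitmask in /proc/<pid>/status. This is a minimal Python sketch: the bit numbers come from linux/capability.h, but the table is deliberately partial and the function names are hypothetical.

```python
# Capability bit numbers from linux/capability.h. Partial on purpose;
# see capabilities(7) for the full list.
CAP_NAMES = {
    0: "cap_chown",
    1: "cap_dac_override",
    2: "cap_dac_read_search",
    3: "cap_fowner",
    4: "cap_fsetid",
    5: "cap_kill",
    6: "cap_setgid",
    7: "cap_setuid",
    12: "cap_net_admin",
    13: "cap_net_raw",
    21: "cap_sys_admin",
}


def decode_caps(hex_mask: str) -> list[str]:
    """Turn a hex capability mask (as printed after CapEff:) into names."""
    mask = int(hex_mask, 16)
    names = []
    bit = 0
    while mask >> bit:
        if (mask >> bit) & 1:
            # Bits missing from the partial table are reported by number.
            names.append(CAP_NAMES.get(bit, f"bit {bit}"))
        bit += 1
    return names


def effective_caps(pid: int) -> list[str]:
    """Decode the effective capability set of a running process."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("CapEff:"):
                return decode_caps(line.split()[1])
    raise ValueError("no CapEff: line found")
```

For permission_broker started with -c 'cap_chown,cap_fowner+eip', you would expect `effective_caps(pid)` to report only cap_chown and cap_fowner (mask 0x9).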

Namespaces

Linux can now isolate many resources so that a process has its own view of them. For example, a process can have its own list of mount points, and any changes it makes (unmounting, mounting more devices, etc.) are only visible to it. This helps keep a broken process from messing up the settings of other processes.

For more in-depth details, see the namespaces overview.

In ChromiumOS, we like to see every process/daemon run under as many unique namespaces as possible. Many are easy to enable and reason about: if you don't use a particular resource, then isolating it is straightforward. If you do rely on it, though, it can take more effort.

Here's a quick overview. Use the command-line option if the description below matches your service (or if you don't know what functionality it's talking about: most likely you aren't using it!).

When does a process need to run in the init mount or PID namespace?

Almost no processes need to run in the init namespace. Please contact chromeos-security@ for a consultation if you believe that your process needs to run in the init mount or PID namespace.
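If you want to confirm that a running service is not in the init mount or PID namespace, one approach is to compare the namespace links under /proc/<pid>/ns/ with those of PID 1: processes in the same namespace see the same link target. A rough Python sketch with hypothetical helper names; reading /proc/1/ns/* typically requires root on the test device.

```python
import os
import re

# Targets of /proc/<pid>/ns/* links look like "mnt:[4026531840]".
_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def ns_inode(link_target: str) -> int:
    """Extract the namespace inode number from a /proc/<pid>/ns/* link target."""
    m = _NS_LINK.match(link_target)
    if m is None:
        raise ValueError(f"unexpected namespace link: {link_target!r}")
    return int(m.group(2))


def in_init_namespace(pid: int, ns: str = "mnt") -> bool:
    """True if `pid` shares its `ns` namespace ("mnt", "pid", ...) with PID 1."""
    mine = ns_inode(os.readlink(f"/proc/{pid}/ns/{ns}"))
    init = ns_inode(os.readlink(f"/proc/1/ns/{ns}"))
    return mine == init
```

For a service started with minijail0 -v -p, you would expect both `in_init_namespace(pid, "mnt")` and `in_init_namespace(pid, "pid")` to return False.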

Passing common resources

When using many namespaces to isolate a service, there are some resources that the service still reasonably should be able to access.

Note: If bind-mounting on top of /run, you need to mount a tmpfs on /run:

-k 'none,/run,tmpfs,MS_NODEV|MS_NOEXEC|MS_NOSUID,mode=755,size=10M'

If bind-mounting on top of /sys, you need to mount a tmpfs on /sys:

-k 'none,/sys,tmpfs,MS_NODEV|MS_NOEXEC|MS_NOSUID,mode=755,size=10M'

When does a process need to run in the Chrome mount namespace?

Note: Before using this namespace, please consult the chromeos-security@ team to make sure it's used correctly.

ChromeOS Guest sessions run in an isolated mount namespace bound to the path /run/namespaces/mnt_chrome. This means Chrome runs in the non-init mount namespace at that path, and Cryptohome mounts user profile directories and daemon stores in this namespace. Regular sessions, on the other hand, don't have this session isolation yet.

Any process that needs to access user data during a Guest session must run in the Chrome mount namespace. For regular sessions, however, namespace isolation is not active: although the namespace exists for all sessions, regular user sessions set up the user home directories in the root mount namespace to handle data propagation between ARCVM, the Linux VM and other parts of the system. Therefore, processes must not enter the Chrome mount namespace during a regular user session.

Most processes that access user data use daemon stores, which are already mounted in the Chrome mount namespace. However, if a new process needs to access user data in the user's cryptohome by explicitly entering the Chrome mount namespace with setns(2) or nsenter(1), it should first query the state of session isolation from the browser process and only enter the namespace while isolation is active. Here is an example CL for this approach.
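As a sketch of that decision, the Python below only joins the namespace when session isolation is reported as active. It assumes Python 3.12+ (for os.setns), a single-threaded process holding CAP_SYS_ADMIN and CAP_SYS_CHROOT, and a boolean your code has already obtained from the browser process; the query itself is not shown, and the function name is hypothetical.

```python
import os

# Bind-mounted namespace path from the text above.
CHROME_MNT_NS = "/run/namespaces/mnt_chrome"


def maybe_enter_chrome_mount_ns(session_isolation_active: bool) -> bool:
    """Join the Chrome mount namespace only if session isolation is active.

    `session_isolation_active` is assumed to come from querying the browser
    process; that query is not shown here. Returns True after joining.
    """
    if not session_isolation_active:
        # Regular session: user home directories live in the root mount
        # namespace, so the process must stay where it is.
        return False
    fd = os.open(CHROME_MNT_NS, os.O_RDONLY)
    try:
        # Requires Python 3.12+, CAP_SYS_ADMIN and CAP_SYS_CHROOT, and a
        # single-threaded process; raises OSError otherwise.
        os.setns(fd, os.CLONE_NEWNS)
    finally:
        os.close(fd)
    return True
```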

Mount propagation type guidance

When creating a new mount or entering a new mount namespace, an important consideration is the mount propagation mode of the mount. Some background on mount types can be found in Linux kernel mount documentation, but a brief summary is:

To change the mount propagation mode when entering a new mount namespace, see the section for -K[mode] in minijail0(1).

When does a mount need to be shared?

Mounts need to be shared if and only if mount/unmount events need to flow between namespaces in both directions. If events only need to flow in one direction, from the parent namespace into the child, then the mounts must be shared in the parent namespace but can be mounts-flow-in (slave) in the child namespace. Making mounts shared increases the opportunities for interaction between processes, which could undermine the security of the system and user namespace separation.
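To check which propagation type a mount actually has inside a namespace, you can read /proc/<pid>/mountinfo: the optional fields shared:N and master:N mark shared and slave (mounts-flow-in) mounts, and a mount with neither is private. A minimal Python sketch with hypothetical helper names; it ignores propagate_from: and does not decode octal escapes in paths.

```python
def propagation_of(mountinfo_line: str) -> tuple[str, str]:
    """Return (mount_point, propagation) for one line of mountinfo."""
    fields = mountinfo_line.split()
    mount_point = fields[4]
    # Optional fields sit between the mount options (index 5) and the "-".
    sep = fields.index("-", 6)
    kinds = []
    for tag in fields[6:sep]:
        if tag.startswith("shared:"):
            kinds.append("shared")
        elif tag.startswith("master:"):
            kinds.append("slave")
        elif tag == "unbindable":
            kinds.append("unbindable")
    return mount_point, "+".join(kinds) or "private"


def propagation_table(pid: str = "self") -> dict[str, str]:
    """Map each mount point to its propagation type (topmost mount wins)."""
    with open(f"/proc/{pid}/mountinfo") as f:
        return dict(propagation_of(line) for line in f)
```

Inside a daemon set up as described in the daemon-store section, `propagation_table()` would be expected to list /run/daemon-store/<daemon_name> as `slave`.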

Seccomp filters

Removing access to the filesystem and to root-only functionality is not enough to completely isolate a system service. A service running as its own UID and with no capabilities still has access to a big chunk of the kernel API. The kernel therefore exposes a huge attack surface to non-root processes, and we would like to restrict what kernel functionality is available to sandboxed processes.

The mechanism we use is called Seccomp-BPF. Minijail can take a policy file that describes which syscalls are allowed, which are denied, and which are allowed only with specific arguments. The full description of the policy file language can be found in the syscall_filter.c source.

Abridged policy for mtpd on amd64 platforms:

# Copyright 2012 The ChromiumOS Authors
# Use of this source code is governed by a BSD-style license that can be
# found in the LICENSE file.
read: 1
ioctl: 1
write: 1
timerfd_settime: 1
open: 1
poll: 1
close: 1
# Don't allow mmap with both PROT_WRITE and PROT_EXEC.
mmap: arg2 in ~PROT_EXEC || arg2 in ~PROT_WRITE
mremap: 1
munmap: 1
# Don't allow mprotect with PROT_EXEC.
mprotect: arg2 in ~PROT_EXEC
lseek: 1
# Allow socket(domain==PF_LOCAL) or socket(domain==PF_NETLINK)
socket: arg0 == 0x1 || arg0 == 0x10
# Allow PR_SET_NAME from libchrome's base::PlatformThread::SetName()
prctl: arg0 == 0xf

Calling any syscall not explicitly listed in the policy results in the process being killed. The policy file can also tell the kernel to fail the system call (returning -1 and setting errno) without killing the process:

# execve: return EPERM
execve: return 1

NOTE: mmap and mprotect both have argument filters to prevent writable executable memory, since that makes certain classes of attacks much easier. In most cases mprotect does not need PROT_EXEC, but you might have to use arg2 in ~PROT_EXEC || arg2 in ~PROT_WRITE, just like mmap, in cases where child processes are executed and need to dynamically link shared libraries, or where the code implements a JIT compiler.
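To make the two argument filters concrete, here is a small Python model of them. It assumes Minijail's `in` operator means that every bit set in the argument must also be set in the mask (that is, arg & ~mask == 0) and uses the Linux PROT_* values; it is a reasoning aid, not how Minijail compiles the BPF program.

```python
# Linux memory protection flags (sys/mman.h).
PROT_READ, PROT_WRITE, PROT_EXEC = 0x1, 0x2, 0x4
MASK64 = (1 << 64) - 1  # policy constants such as ~PROT_EXEC are 64-bit


def arg_in(arg: int, mask: int) -> bool:
    """Model of `arg in MASK`: every bit set in arg is also set in MASK."""
    return (arg & ~mask) == 0


def mmap_allowed(prot: int) -> bool:
    # mmap: arg2 in ~PROT_EXEC || arg2 in ~PROT_WRITE
    return arg_in(prot, ~PROT_EXEC & MASK64) or arg_in(prot, ~PROT_WRITE & MASK64)


def mprotect_allowed(prot: int) -> bool:
    # mprotect: arg2 in ~PROT_EXEC
    return arg_in(prot, ~PROT_EXEC & MASK64)
```

Under this model, a read-write or a read-execute mapping passes the mmap rule, but asking for write and execute together fails it, and the mprotect rule rejects any request that includes PROT_EXEC.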

Generating Seccomp policies using audit (on 4.14+ kernels)

On kernels 4.14 and above we can use the new SECCOMP_RET_LOG return value to make policy generation easier. On these kernels, the -L Minijail option will use SECCOMP_RET_LOG as the return value for blocked syscalls: those not listed in the policy or whose arguments don't match the policy. Instead of killing the process on a blocked syscall, the kernel will log the otherwise blocked syscall but will effectively allow it.

The advantage of this mechanism versus what we have available in pre-4.14 kernels is that instead of having to add syscalls to the policy one by one, you can run the process with -L, get a list of all the syscalls not included in the policy, review them, and automatically generate or augment a policy, all in one step.

NOTE: Minijail's -L flag requires Minijail to be built with USE=cros-debug. Generally, this means that it will not work out of the box in prebuilt (e.g. Goldeneye, CPFE, etc.) OS images.

Our recommended way of using this functionality is to start with an empty policy, which will cause all syscalls to be logged-but-allowed. The resulting audit logs (at /var/log/audit/audit.log*) can then be parsed with the script to automatically generate a policy. There's a bit of extra setup required and some associated caveats. Please see the detailed instructions in the script's README section on using Linux audit logs to generate policy.

This mechanism can also be combined with the strace-based mechanism below: run the process to be sandboxed under strace, generate a base policy using the policy generation script, and then refine it using -L.
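For a quick look at what a -L run logged before handing the audit log to the policy generation script, you can count the SECCOMP audit records (type=1326 when forwarded to syslog). Each one carries arch=, syscall= and code= fields, and logged-but-allowed calls use the SECCOMP_RET_LOG action value 0x7ffc0000. The Python below is a sketch with a hypothetical function name; it does not replace the script's parsing.

```python
import re
from collections import Counter

SECCOMP_RET_LOG = 0x7FFC0000  # action value for logged-but-allowed syscalls
_FIELD = re.compile(r'\b(arch|syscall|code|comm)=("[^"]*"|\S+)')


def logged_syscalls(lines, comm=None):
    """Count (arch, syscall number) pairs from SECCOMP_RET_LOG audit records."""
    counts = Counter()
    for line in lines:
        # type=SECCOMP in audit.log, type=1326 when forwarded to syslog.
        if "type=SECCOMP" not in line and "type=1326" not in line:
            continue
        fields = {key: value.strip('"') for key, value in _FIELD.findall(line)}
        if "syscall" not in fields or "arch" not in fields:
            continue
        if int(fields.get("code", "0"), 16) != SECCOMP_RET_LOG:
            continue  # e.g. a kill action from an actual policy violation
        if comm is not None and fields.get("comm") != comm:
            continue
        counts[(fields["arch"], int(fields["syscall"]))] += 1
    return counts
```

The arch= value tells you which syscall table to use when looking up the numbers (c000003e is x86_64).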

Generating a seccomp policy using strace

This is the old and familiar way of generating policies by inspecting syscalls using strace. It does not have any kernel version dependencies and also does not require a minijail build with USE=cros-debug. Similar to audit logs above, the script can accept strace logs from an unsandboxed process to generate a policy.

NOTE: When creating strace logs for arm64, make sure you're running in an arm64 userland, as most devices that support arm64 kernels run a 32-bit arm userland by default. The image running on the device should be built for 64-bit arm userland. For example, a kevin device can run an image built for 32-bit arm userland with the --board=kevin flag, or an image built for 64-bit arm userland with the --board=kevin64 flag. You can run file -L /bin/sh to check which environment you're running on.

Generate and pre-process the strace log

strace -f -o strace.log <program>

When sandboxing a dynamically-linked executable, Minijail will default to using LD_PRELOAD to install the seccomp filter. This will install the filter after glibc initialization, so remove the syscalls related to glibc initialization to obtain a smaller filter (and a tighter sandbox). Those are normally everything up to and including the following:

rt_sigaction(SIGRTMIN, {<sa_handler>, [], SA_RESTORER|SA_SIGINFO, <sa_restorer>}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {<sa_handler>, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, <sa_restorer>}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
brk(NULL)                               = <addr>
brk(<addr>)                             = <addr>
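If you want to see which syscalls remain after trimming the glibc start-up calls, a rough Python sketch follows. It assumes the strace -f -o output format (an optional PID prefix followed by the call), keys on the rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], ...) line from the sequence above plus the getrlimit and brk calls right after it, and is not a substitute for the policy generation script. The function name is hypothetical.

```python
import re

# `strace -f -o` lines: optional "<pid>  " prefix, then the syscall name.
_CALL = re.compile(r"^(?:\d+\s+)?([a-z0-9_]+)\(")
# Marker from the glibc start-up sequence listed above.
_INIT_MARKER = "rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1]"
# Calls that directly follow the marker in that sequence.
_INIT_TAIL = {"getrlimit", "brk"}


def syscalls_after_init(lines):
    """Sorted, de-duplicated syscall names seen after glibc start-up."""
    calls = []
    for line in lines:
        match = _CALL.match(line)
        if match:  # skips "+++ exited", "--- SIG...", "<... resumed>" lines
            calls.append((match.group(1), line))
    start = 0  # keep everything if the marker is missing
    for i, (_, line) in enumerate(calls):
        if _INIT_MARKER in line:
            start = i + 1
            # Skip the getrlimit/brk calls that finish the start-up sequence.
            while start < len(calls) and calls[start][0] in _INIT_TAIL:
                start += 1
            break
    return sorted({name for name, _ in calls[start:]})
```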

If you want to collect strace logs for an existing service and you already have a test device set up, the following steps can help, especially if you are extending an existing policy:

  1. Mount the root-fs read/write. mount -o remount,rw /
  2. Edit the init config file in /etc/init
    • Temporarily disable seccomp if it is present.
    • Add strace -f -o /tmp/strace.log before Minijail is invoked. Note that this will include all the minijail0 syscalls as well, but you can exclude them later.
  3. Reboot, or reload the configuration with initctl reload-configuration and restart the service.
  4. Run tests or perform actions that exercise the features of the service.
  5. Collect the resulting file /tmp/strace.log.

Generate policy using strace.log

~/chromiumos/src/platform/minijail/tools/ strace.log > $PROGRAM_NAME.policy

Testing and troubleshooting

WARNING minijail0[32315]: libminijail[32315]: trailing garbage after constant: 'LOOP_GET_STATUS64'
WARNING minijail0[32315]: libminijail[32315]: compile_atom: /usr/share/policy/e2fsck-seccomp.policy(13): invalid constant 'LOOP_GET_STATUS64'
WARNING minijail0[32315]: libminijail[32315]: could not allocate filter block
WARNING minijail0[32315]: libminijail[32315]: compile_filter: compile_file() failed
ERR minijail0[32315]: libminijail[32315]: failed to compile seccomp filter BPF program in '/usr/share/policy/e2fsck-seccomp.policy'

Finding failing syscalls

If a process violates its seccomp policy, it'll be terminated with SIGSYS (bad system call) and you'll see a message like this in /var/log/messages:

WARNING minijail0[1415]: libminijail[1415]: child process 1417 had a policy violation (/usr/share/policy/foo.policy)

There are a couple of ways to find out what syscall caused the violation:

Once you have the syscall number, find out the name of the syscall by looking it up in our online syscalls table or in minijail0 -H (run on the same architecture as the program you're debugging). You can then add the syscall to your policy.

If you do not want to allow an entire syscall, you can allow it only with certain arguments, e.g. ioctl: arg1 == FDGETPRM. You can find the values of these arguments from a core or minidump using a debugger, as described above.

Installing and applying the generated policy

The policy file needs to be installed in the system, so we need to add it to the ebuild file. For example:


# Install seccomp policy file.
insinto /usr/share/policy
use seccomp && newins "mtpd-seccomp-${ARCH}.policy" mtpd-seccomp.policy

And finally, the policy file has to be passed to Minijail, using the -S option. Again, using mtpd as an example:


# use minijail (drop root, set no_new_privs, set seccomp filter).
# Mount /proc, /sys, /dev, /run/udev so that USB devices can be
# discovered.  Also mount /run/dbus to communicate with D-Bus.
exec minijail0 -i -I -p -l -r -v -t -u mtp -g mtp -G \
  -P /mnt/empty -b / -b /proc -b /sys -b /dev \
  -k tmpfs,/run,tmpfs,0xe -b /run/dbus -b /run/udev \
  -n -S /usr/share/policy/mtpd-seccomp.policy -- \
  /usr/sbin/mtpd -minloglevel="${MTPD_MINLOGLEVEL}"

Securely mounting cryptohome daemon store folders

Some daemons store user data on the user's cryptohome under /home/.shadow/<user_hash>/mount/root/<daemon_name> or, equivalently, /home/root/<user_hash>/<daemon_name>. For instance, Session Manager stores user policy under /home/root/<user_hash>/session_manager/policy. This is useful when the data should be protected from other users: the user's cryptohome is only mounted (and therefore decrypted) while the user is logged in. When the user is not logged in, it stays encrypted with the user's password.

However, if a daemon is already running inside a mount namespace (minijail0 -v ...) when the user's cryptohome is mounted, it does not see the mount since mount events do not propagate into mount namespaces by default. This propagation can be achieved, though, by making the parent mount a shared mount and the corresponding mount inside the namespace a shared or MS_SLAVE mount. See shared subtrees.

To set up a cryptohome daemon store folder that propagates into your daemon's mount namespace, add this code to the src_install section of your daemon's ebuild:

local daemon_store="/etc/daemon-store/<daemon_name>"
dodir "${daemon_store}"
fperms 0700 "${daemon_store}"
fowners <daemon_user>:<daemon_group> "${daemon_store}"

This directory is never used directly. It merely serves as a secure template for the chromeos_startup script, which picks it up and creates /run/daemon-store/<daemon_name> as a shared mount.

Next, move the user/group setup to pkg_setup() since pkg_preinst(), where this is usually done, runs after src_install():

pkg_setup() {
	# Has to be done in pkg_setup() instead of pkg_preinst() since
	# src_install() needs <daemon_user> and <daemon_group>.
	enewuser <daemon_user>
	enewgroup <daemon_group>
}

In your daemon's init script, mount the daemon store folder as MS_SLAVE in your mount namespace. Don't bind-mount all of /run; mount a tmpfs on /run and bind-mount only the daemon store folder on top of it. Make sure to mount with the MS_REC flag to propagate any already-mounted cryptohome bind mounts into the mount namespace.

minijail0 -v -Kslave \
          -k 'tmpfs,/run,tmpfs,MS_NOSUID|MS_NODEV|MS_NOEXEC' \
          -k '/run/daemon-store/<daemon_name>,/run/daemon-store/<daemon_name>,none,MS_BIND|MS_REC' \

During sign-in, when the user's cryptohome is mounted, Cryptohome creates /home/.shadow/<user_hash>/mount/root/<daemon_name>, bind-mounts it to /run/daemon-store/<daemon_name>/<user_hash> and copies ownership and mode from /etc/daemon-store/<daemon_name> to the bind target. Since /run/daemon-store/<daemon_name> is a shared mount outside of the mount namespace and a MS_SLAVE mount inside, the mount event propagates into the daemon.

Your daemon can now use /run/daemon-store/<daemon_name>/<user_hash> to store user data once the user's cryptohome is mounted. Note that even though /run/daemon-store is on a tmpfs, your data is actually stored on disk and not lost on reboot.

Be sure not to write to the folder before the cryptohome is mounted. Consider listening to Session Manager's SessionStateChanged signal or similar to detect mount events. Note that /run/daemon-store/<daemon_name>/<user_hash> might exist even though cryptohome is not mounted, so testing existence is not enough (it only works the first time).

The <user_hash> can be retrieved with cryptohome's GetSanitizedUsername D-Bus method.
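Since checking that the folder exists is not enough, one way to check whether the store is really mounted is to look for the path as a mount point in the daemon's own /proc/self/mountinfo, which reflects its mount namespace. A minimal Python sketch with hypothetical names; treat it as a point-in-time check and still rely on SessionStateChanged (or similar) for timing.

```python
import os
import re


def _unescape(field: str) -> str:
    # mountinfo encodes space, tab, newline and backslash as octal escapes.
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def is_mount_point(path: str, mountinfo_text: str) -> bool:
    """True if `path` appears as a mount point (5th field) in mountinfo text."""
    target = os.path.normpath(path)
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) > 4 and _unescape(fields[4]) == target:
            return True
    return False


def daemon_store_mounted(daemon_name: str, user_hash: str) -> bool:
    """Check this process's own mount namespace for the user's daemon store."""
    path = f"/run/daemon-store/{daemon_name}/{user_hash}"
    with open("/proc/self/mountinfo") as f:
        return is_mount_point(path, f.read())
```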

The following diagram illustrates the mount event propagation:

Mount propagation diagram

Landlock unprivileged filesystem access control

Landlock is a Linux Security Module that helps manage filesystem access, notably even for unprivileged processes. Minijail supports options to help manage a Landlock policy.

Policies consist of an allowlist of paths, and the specific permissions for a given path. Minijail includes the following flags to help set up a policy:

Creating useful policies

The objective of Landlock is reducing process interactions via the filesystem. As such, creating an overly broad policy that includes RW access to all of /run and /var would substantially diminish the security benefits of Landlock.

Instead, allow a minimal set of paths that you need. For example, if you need to access D-Bus, consider allowing /run/dbus.

Inode-based operation

Policies are based on the state of the filesystem when a policy is applied, rather than a string comparison against path names. Internally, Landlock looks at the inodes that exist when the sandbox is entered, so if you need to create new files or directories you’ll want to specify a Landlock policy that includes RW access one directory level above.

Limitations of Landlock

Landlock cannot be used if the sandboxed process needs to modify its filesystem topology, specifically via mount(2) or pivot_root(2). For additional background, see the official Landlock documentation.

Example Landlock config

Below is an example Landlock config file, for a process that needs access to D-Bus and needs to write to a file in /var/lib/example_daemon. If the config file is named example_daemon.conf, you can pass it to Minijail using --config=example_daemon.conf.

% minijail-config-file v0

# Filesystem access rules.
fs-path-rw = /run/dbus
fs-path-rw = /var/lib/example_daemon

# Other Minijail options....

Minijail wrappers (deprecated)

The Minijail wrappers are currently deprecated. They were designed to allow mocking of individual Minijail configuration settings but we concluded that this was the wrong level to mock Minijail. The mocks were fragile and wordy. A better way to mock Minijail is to just abstract away the entire sandboxed process execution. An example of this can be found in the SandboxedProcess class in debugd.
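A Python sketch of that idea follows. It is not debugd's SandboxedProcess API (which is C++); the class names, the example user, and the helper path are hypothetical. Service code depends on a small runner interface, production code passes a runner that invokes minijail0, and tests pass a fake that records the requested sandbox arguments.

```python
import subprocess
from dataclasses import dataclass, field
from typing import List, Protocol, Sequence, Tuple


class SandboxedRunner(Protocol):
    """Everything the service needs to know about running a sandboxed child."""

    def run(self, sandbox_args: Sequence[str], argv: Sequence[str]) -> int: ...


class MinijailRunner:
    """Production runner: executes argv under minijail0 with the given flags."""

    def run(self, sandbox_args: Sequence[str], argv: Sequence[str]) -> int:
        cmd = ["minijail0", *sandbox_args, "--", *argv]
        return subprocess.run(cmd, check=False).returncode


@dataclass
class FakeRunner:
    """Test double: records each request instead of starting a process."""

    exit_code: int = 0
    calls: List[Tuple[List[str], List[str]]] = field(default_factory=list)

    def run(self, sandbox_args: Sequence[str], argv: Sequence[str]) -> int:
        self.calls.append((list(sandbox_args), list(argv)))
        return self.exit_code


def probe_device(runner: SandboxedRunner, device: str) -> bool:
    """Hypothetical service logic that only depends on the runner interface."""
    status = runner.run(["-u", "example-user", "-n"],
                        ["/usr/bin/example_probe", device])
    return status == 0
```

A test then asserts on the complete sandbox request once, at the process boundary, instead of mocking each individual Minijail setting.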