Overview

This document describes the implementation details of thread-local storage (TLS) in Native Client at each level of abstraction: syscalls and machine code, the integrated runtime (IRT), and libc (newlib, glibc).

The reader is expected to understand how TLS is implemented on Linux. A good explanation of this topic can be found in Ulrich Drepper, "ELF Handling For Thread-Local Storage".

The implementation differs from Linux for the following reasons:

* A statically linked binary does not have access to the ELF headers and cannot know the alignment of .tdata and .tbss. (16-byte alignment is enforced to address this issue; see below.)
* x86: Due to Windows/Linux differences, it is not possible to use segment registers (gs/fs) to obtain the thread pointer. (__nacl_read_tp is used for that; its implementation uses the NaCl syscalls described below.)
* The IRT has its own TLS, which means there are two TLS blocks for every untrusted thread.

Implementation

System calls

The Native Client service runtime provides the following system calls to support TLS:

* NACL_sys_tls_init, NACL_sys_tls_get - set/get the NaCl module thread pointer
* NACL_sys_second_tls_set, NACL_sys_second_tls_get - set/get the IRT thread pointer

This layer is hidden from the NaCl module, which never invokes system calls directly and uses the IRT public interface instead. This is required to support a stable ABI for NaCl modules, because the implementation of the service runtime and the list of syscalls are subject to change. It is not even guaranteed that the list of NaCl syscalls is the same on different architectures or operating systems.

Examples of usage of these syscalls can be found here:

Machine code

The following methods are used to retrieve the thread pointer on different architectures:

x86-32: %gs:0x0 is the primary method to access $tp.
    2022f: 65 a1 00 00 00 00       mov %gs:0x0,%eax
    20235: 8b 80 64 fb ff ff       mov -0x49c(%eax),%eax

The -mtls-use-call option enforces virtualized access to the thread pointer (required for the IRT, see below):

    2025b: e8 e0 b9 01 00          call 3bc40 <__nacl_read_tp>
    20260: 8b 98 60 fb ff ff       mov -0x4a0(%eax),%ebx

x86-64:

    2027b: e8 60 f4 01 00          callq 3f6e0 <__nacl_read_tp>
    20280: 89 c0                   mov %eax,%eax
    20282: 41 8b 84 07 64 fb ff    mov -0x49c(%r15,%rax,1),%eax

ARM:

    ; load the address of the offset
    20104: e3000fa4   movw r0, #4004 ; 0xfa4
    20108: e3410003   movt r0, #4099 ; 0x1003
    2010c: e320f000   nop {0}
    ; read sandboxing
    20110: e3c00103   bic r0, r0, #-1073741824 ; 0xc0000000
    ; load the offset into r1
    20114: e5901000   ldr r1, [r0]
    ; load $tp into r0
    20118: e1a00009   mov r0, r9
    ; get the actual address
    2011c: e0800001   add r0, r0, r1
    ; read sandboxing
    20120: e3c00103   bic r0, r0, #-1073741824 ; 0xc0000000
    ; load the value of the TLS variable into r0
    20124: e5900000   ldr r0, [r0]

-mtls-use-call enforces virtualized access to $tp (see the IRT case):

    20104: e3000fa4   movw r0, #4004 ; 0xfa4
    20108: e3410003   movt r0, #4099 ; 0x1003
    2010c: e320f000   nop {0}
    20110: e3c00103   bic r0, r0, #-1073741824 ; 0xc0000000
    20114: e5901000   ldr r1, [r0]
    20118: e320f000   nop {0}
    2011c: eb006813   bl 3a170 <__aeabi_read_tp>
    20120: e0800001   add r0, r0, r1
    20124: e3c00103   bic r0, r0, #-1073741824 ; 0xc0000000
    20128: e5900000   ldr r0, [r0]

__aeabi_read_tp is part of the ARM ABI for TLS. It calls __nacl_read_tp but preserves all registers except r0.

The implementation of __aeabi_read_tp, __nacl_read_tp and other relevant code can be found here:

* IRT:
* NaCl module:

Integrated Runtime (IRT)

The IRT provides a stable, backward-compatible interface to NaCl modules. Once compiled, a NaCl module is expected to keep working even when a newer version of the NaCl runtime is released.
The following interface is defined to support TLS:

    #define NACL_IRT_TLS_v0_1 "nacl-irt-tls-0.1"
    struct nacl_irt_tls {
      int (*tls_init)(void *thread_ptr);
      void *(*tls_get)(void);
    };

It can be obtained via the standard TYPE_nacl_irt_query / nacl_interface_query mechanism. The relevant code is:

Newlib and Glibc

The C libraries hide these details from the application programmer. They implement the pthread library using __nacl_read_tp and the IRT interfaces.

Defining a TLS variable in C code is no different from the usual GCC-compatible code:

    /* initialized thread-local variable; goes into the .tdata section */
    __thread int tdata1 = 1;

    /* uninitialized thread-local variable; goes into the .tbss section */
    __thread int tbss1;

    /* this variable is aligned to 16 bytes. This is the maximum valid
       alignment for a statically linked NaCl module, see below */
    __thread int tdata2 __attribute__((aligned(0x10))) = 2;

Initialization of TLS for statically linked module

An important quirk of the Native Client runtime is that ELF headers are not accessible from statically linked untrusted code (a NaCl module or the IRT, which is technically just a special statically linked NaCl module). This means that the alignment of the .tdata and .tbss sections cannot be learned at runtime. At the moment, 16-byte alignment of .tdata and .tbss is required.
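The declarations and the 16-byte requirement described above can be checked from inside a program. The following is only a minimal sketch, assuming a GCC-compatible compiler that supports `__thread` and the `aligned` attribute; `is_aligned_16` and `check_tls_variables` are hypothetical helper names introduced for illustration and are not part of Native Client, newlib, or glibc.

```c
#include <assert.h>
#include <stdint.h>

/* Initialized thread-local variable; goes into .tdata. */
__thread int tdata1 = 1;

/* Uninitialized thread-local variable; goes into .tbss (zero-filled). */
__thread int tbss1;

/* Thread-local variable requesting 16-byte alignment. */
__thread int tdata2 __attribute__((aligned(0x10))) = 2;

/* Hypothetical helper: nonzero if ptr lies on a 16-byte boundary. */
int is_aligned_16(const void *ptr) {
  return ((uintptr_t)ptr & 0xf) == 0;
}

/* Hypothetical self-check. It inspects the calling thread's copies and
   expects the initial values, so call it before the thread writes to them. */
void check_tls_variables(void) {
  assert(tdata1 == 1);
  assert(tbss1 == 0);
  assert(tdata2 == 2);
  assert(is_aligned_16(&tdata2));
}
```

Calling `check_tls_variables()` at the start of each thread would confirm that the thread received a fresh copy of the initial values and that the address of `tdata2` honors the requested alignment.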
See the implementation:

To examine the actual alignment of TLS sections, readelf -S can be used:

    krasin@krasin7:~/nacl/native_client$ readelf -S lala.nexe
    There are 15 section headers, starting at offset 0x505b0:

    Section Headers:
      [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
      [ 0]                   NULL            00000000 000000 000000 00      0   0  0
      [ 1] .text             PROGBITS        00020000 010000 01a190 00  AX  0   0 16
      [ 2] .rodata           PROGBITS        10020000 030000 000748 00   A  0   0  8
      [ 3] .eh_frame         PROGBITS        10020748 030748 000388 00   A  0   0  4
      [ 4] .tdata            PROGBITS        10030ae0 040ae0 000480 00 WAT  0   0 16
      [ 5] .tbss             NOBITS          10030f60 040f60 000020 00 WAT  0   0 16
      [ 6] .init_array       INIT_ARRAY      10030f60 040f60 000004 00  WA  0   0  4
      [ 7] .fini_array       FINI_ARRAY      10030f64 040f64 000004 00  WA  0   0  4
      [ 8] .data.rel.ro      PROGBITS        10030f68 040f68 0000ac 00  WA  0   0  4
      [ 9] .data             PROGBITS        10040000 050000 000500 00  WA  0   0 16
      [10] .bss              NOBITS          10040500 050500 001428 00  WA  0   0 16
      [11] .ARM.attributes   ARM_ATTRIBUTES  00000000 050500 00002d 00      0   0  1
      [12] .shstrtab         STRTAB          00000000 05052d 000080 00      0   0  1
      [13] .symtab           SYMTAB          00000000 050808 001b00 10     14 394  4
      [14] .strtab           STRTAB          00000000 052308 001b4e 00      0   0  1
    Key to Flags:
      W (write), A (alloc), X (execute), M (merge), S (strings)
      I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
      O (extra OS processing required) o (OS specific), p (processor specific)

This alignment is enforced by linker scripts. See, for example, this CL: http://codereview.chromium.org/8243015/
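Because the alignment is fixed at 16 bytes, code that sets up a thread's TLS can do its size and address arithmetic with a constant instead of reading it from the ELF headers. The following is only a rough sketch of that arithmetic, not the actual Native Client code: `TLS_ALIGNMENT`, `tls_align_up`, `tls_block_size` and `tls_aligned_start` are hypothetical names, and placing .tbss directly after the padded .tdata is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the fixed alignment that the document says is required. */
#define TLS_ALIGNMENT ((size_t)16)

static_assert((TLS_ALIGNMENT & (TLS_ALIGNMENT - 1)) == 0,
              "alignment must be a power of two");

/* Round value up to the next multiple of TLS_ALIGNMENT. */
size_t tls_align_up(size_t value) {
  return (value + TLS_ALIGNMENT - 1) & ~(TLS_ALIGNMENT - 1);
}

/* Hypothetical helper: bytes needed for one thread's copy of .tdata
   followed by .tbss, padded to a multiple of TLS_ALIGNMENT. */
size_t tls_block_size(size_t tdata_size, size_t tbss_size) {
  return tls_align_up(tls_align_up(tdata_size) + tbss_size);
}

/* Hypothetical helper: first suitably aligned address at or after raw. */
void *tls_aligned_start(void *raw) {
  uintptr_t addr = (uintptr_t)raw;
  uintptr_t mask = (uintptr_t)TLS_ALIGNMENT - 1;
  return (void *)((addr + mask) & ~mask);
}
```

With the section sizes from the readelf output above (.tdata is 0x480 bytes, .tbss is 0x20 bytes), `tls_block_size(0x480, 0x20)` evaluates to 0x4a0. A caller allocating a raw buffer for the block would need to reserve up to 15 extra bytes so that `tls_aligned_start` has room to move forward.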
