The goal is to provide a mechanism for symbolic debugging of the untrusted aspects of Native Client applications, while minimizing impact on the Service Run-time development and security. The trusted service run-time should be invisible to the user much the way that OS system calls are invisible during casual application debugging.
For both speed of development and portability, we have decided to use the GDB serial debug protocol, also known as RSP (Remote Serial Protocol). This protocol runs between the target (the application being debugged) and the host (the debugger instance, usually GDB). Using RSP allows us to maintain compatibility with GDB-based debuggers while focusing on the larger Windows audience by enabling Windows-based debugging tools. The stub and link libraries discussed here are one portion of the overall solution.
The entire debugging solution includes connection management, packetization of the messages exchanged, control of the target, synchronization with the host, loading, processing, and interpreting debugging (DWARF) information, and interaction with the user through a Visual Studio Plug-In. This document focuses on the communication and target control aspects, which together form the dynamically loadable debug stub.
The debug stub is activated by checking for a debugging environment variable. If the environment variable is set, the service run-time will attempt to load and use the debugging module. Once loaded, the debugging module works in co-operation with the parent application (most likely Chrome) to log debugging information and events. It is advisable that the parent application make the logging obvious, for example by opening a console-style window, so that users do not unknowingly weaken the security of the application by enabling debugging.
Communication with the stub happens through a single library function in the service run-time. This function is in the form of:
int NaClDebugStubCommand(uint32 CMD, void *data, uint32 size);
Underneath, this function checks for the presence of the debug stub module. If found, it automatically prepends version information before calling the dispatch function in the DLL. The function will return zero on success, or an error code on failure. It is not expected that the service run-time will need to handle error cases, since the debug stub is not required for correct execution of the Service Run-time. In addition, any events will be logged, giving the user a means to determine whether the debugging system is working. However, a non-zero error code allows us to indicate that data which would normally be passed back within the data block may not be valid. The actual function exported by the library is:
int NaClDebugStubDispatch(uint32 version, uint32 CMD, void *data, uint32 size);
This technique allows us to centralize checking for the presence of the debug stub as well as validating and/or dealing with version compatibility. Within the debug stub, the dispatch function is responsible for validating the parameters before dispatching to the individual handlers.
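The wrapper and dispatch split described above can be sketched roughly as follows. This is a minimal illustration, not the actual service run-time code: the version constant, the error codes, the `g_dispatch` pointer, `NaClDebugStubInstall()` and `ExampleDispatch()` are all hypothetical names introduced here, and `uint32_t` from `<stdint.h>` stands in for the `uint32` type used in the prototypes. The choice to return zero when no stub is loaded is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values: the document does not give the real version number
 * or error codes. */
#define NACL_DEBUG_STUB_VERSION 1u
#define NACL_DEBUG_STUB_OK 0
#define NACL_DEBUG_STUB_ERR_VERSION 1
#define NACL_DEBUG_STUB_ERR_PARAM 2

typedef int (*NaClDebugStubDispatchFn)(uint32_t version, uint32_t cmd,
                                       void *data, uint32_t size);

/* NULL until the debug module has been loaded (hypothetical bookkeeping). */
static NaClDebugStubDispatchFn g_dispatch = NULL;

/* Called once after the module's dispatch export has been resolved. */
void NaClDebugStubInstall(NaClDebugStubDispatchFn fn) {
  assert(fn != NULL);
  g_dispatch = fn;
}

/* Service run-time side: one central place that checks for the stub and
 * prepends the version before forwarding the call. */
int NaClDebugStubCommand(uint32_t cmd, void *data, uint32_t size) {
  if (g_dispatch == NULL) {
    /* Assumption: with no stub present the call is a successful no-op. */
    return NACL_DEBUG_STUB_OK;
  }
  return g_dispatch(NACL_DEBUG_STUB_VERSION, cmd, data, size);
}

/* Stand-in for the stub-side dispatch: validate first, then hand off to a
 * per-command handler (omitted here). */
int ExampleDispatch(uint32_t version, uint32_t cmd, void *data,
                    uint32_t size) {
  (void) cmd;
  if (version != NACL_DEBUG_STUB_VERSION) return NACL_DEBUG_STUB_ERR_VERSION;
  if (data == NULL && size != 0) return NACL_DEBUG_STUB_ERR_PARAM;
  return NACL_DEBUG_STUB_OK;
}
```

Keeping the version check inside the stub means the service run-time never needs to know which stub versions exist; it only ever passes its own version along.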
The debugging stub is interested in several events including load of the NEXE, load of additional modules, creation of threads, and destruction of threads. These events are raised from the TCB (trusted codebase) by calling NaClDebugStubCommand(). Of course, this function is a no-op if debugging is not enabled.
When an exception takes place, the inserted exception handler will catch the exception or signal and update its state (such as the stored registers) in the debug thread object which tracks it. The thread will then block on an OS synchronization object. The debugging thread created at initialization of the debug stub will eventually detect the exception by scanning through the list of active thread objects, at which point it will suspend the other threads and signal to the debugger that the NEXE is in an exception state. When the debugger signals that it is time to continue, the debugging thread will mark the debug thread structure with a flag indicating whether the thread may continue, should step, or should be killed. It will then signal the thread so that it can clean itself up or destroy itself. If the thread continues, it will pull the new register state (in case it was modified during debugging) and update the exception context before returning.
Unfortunately, under Windows the thread context of the excepted thread is stored within the untrusted stack of the thread which took the exception. This opens a security hole where another malicious thread could modify either the exception context itself or the return pointers within the stack, making any function return unsafe. In the absence of a Windows sigaltstack() implementation, it seems difficult if not impossible to reliably plug this hole.
One possible choice is to allow the user to choose between safe and unsafe debugging. In the safe case, all excepted threads would be terminated, while in the unsafe case the debugger provides the stepping and continue features of a normal debugger.
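The bookkeeping side of this handshake might look like the sketch below. It deliberately leaves out the OS synchronization object and the register save and restore, and only shows the scan performed by the debugging thread and the resume flag it sets. All type and function names (`TrackedThread`, `FindExceptedThread`, `MarkResume`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { THREAD_RUNNING, THREAD_EXCEPTED } ThreadState;
typedef enum {
  RESUME_NONE,
  RESUME_CONTINUE,
  RESUME_STEP,
  RESUME_KILL
} ResumeAction;

/* Hypothetical debug thread object; register storage is omitted. */
typedef struct TrackedThread {
  uint32_t id;
  ThreadState state;    /* set to THREAD_EXCEPTED by the exception handler */
  ResumeAction resume;  /* set by the debugging thread before waking it */
  struct TrackedThread *next;
} TrackedThread;

/* Debugging-thread side: scan the active list for a blocked, excepted
 * thread. Returns NULL when every thread is still running. */
TrackedThread *FindExceptedThread(TrackedThread *head) {
  TrackedThread *t;
  for (t = head; t != NULL; t = t->next) {
    if (t->state == THREAD_EXCEPTED) return t;
  }
  return NULL;
}

/* Records the debugger's decision. A real stub would then signal the OS
 * synchronization object the excepted thread is blocked on. */
void MarkResume(TrackedThread *t, ResumeAction action) {
  assert(t != NULL && t->state == THREAD_EXCEPTED);
  assert(action != RESUME_NONE);
  t->resume = action;
}
```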
While exceptions should never happen within a system call, it is possible that while one thread takes an exception, another is in a system call. It is also possible that the debugger has signaled a force break and the thread of interest is within a system call. When the debug stub is asked for context info for a thread, it checks first to see if the app thread was in a system call. If so, it uses the thread context preserved on entry into the system call.
When a thread is stopped in user code, we allow alterations to its current context. This is not true for threads stopped in syscalls. We do not copy the context information from the debug thread object into the app thread before waking it up, since this would lead to unexpected behavior. Instead the stub will signal the debugger with an error if an attempt is made to modify the registers of a thread within a system call.
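The two rules above (read the context saved on system call entry, and refuse register writes for threads inside a system call) can be expressed as a pair of small accessors. This is a sketch under assumed names: `RegContext`, `DebugThread` and both functions are hypothetical, and the four-register context is only a placeholder for the real register set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder register block; the real layout is architecture specific. */
typedef struct {
  uint64_t regs[4];
} RegContext;

/* Hypothetical debug thread object. */
typedef struct {
  bool in_syscall;               /* true while the app thread is in the TCB */
  RegContext syscall_entry_ctx;  /* preserved on entry into the system call */
  RegContext exception_ctx;      /* captured by the exception handler */
} DebugThread;

/* Context reported to the debugger for this thread. */
const RegContext *DebugThreadGetContext(const DebugThread *t) {
  assert(t != NULL);
  return t->in_syscall ? &t->syscall_entry_ctx : &t->exception_ctx;
}

/* Returns 0 on success, or -1 if the thread is in a system call, in which
 * case the stub would report an error to the debugger and change nothing. */
int DebugThreadSetContext(DebugThread *t, const RegContext *new_ctx) {
  assert(t != NULL && new_ctx != NULL);
  if (t->in_syscall) return -1;
  t->exception_ctx = *new_ctx;
  return 0;
}
```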
RSP takes the form of transactions. Each transaction consists of a command issued by the host, which is then acked by the target. The target then responds, and its response is in turn acked by the host. Generally, transactions originate only from the Host to the Target, and the protocol can be processed in lockstep. This is common for debugging stubs, since it simplifies the implementation. However, the protocol supports an optional sequence field which allows responses to be matched to requests. The exceptions to this rule are the “Step” and “Continue” commands and “Stop” packets. Step and Continue commands do not expect an immediate response. Instead, they expect the target to resume running and not reply until execution is halted again.

Encoding

Data travels between the Host and Target in the form of packets. A packet begins with the “Ready” signal, which is ‘$’ (dollar). It can then be followed by an optional two-hex-digit sequence number and ‘:’ (colon). Next come the command or response code and the optional data payload. Finally, the packet is terminated with a number sign ‘#’ followed by a two-hex-digit checksum.
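As an illustration of the packet layout, the sketch below frames a payload and verifies a received packet. In standard RSP the checksum is the modulo-256 sum of the bytes between ‘$’ and ‘#’. The function names are hypothetical, the optional sequence field is left out, and escaping of special characters inside the payload is not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Modulo-256 sum of the payload bytes (everything between '$' and '#'). */
unsigned char RspChecksum(const char *payload, size_t len) {
  unsigned int sum = 0;
  size_t i;
  for (i = 0; i < len; i++) sum += (unsigned char) payload[i];
  return (unsigned char) (sum & 0xff);
}

/* Writes "$<payload>#<cc>" into out. Returns the packet length, or -1 if
 * out is too small. The payload must not contain '$' or '#'. */
int RspFramePacket(const char *payload, char *out, size_t out_size) {
  size_t len;
  assert(payload != NULL && out != NULL);
  len = strlen(payload);
  if (out_size < len + 5) return -1;  /* '$' + payload + '#' + 2 hex + NUL */
  return snprintf(out, out_size, "$%s#%02x", payload,
                  (unsigned int) RspChecksum(payload, len));
}

/* Returns 1 if pkt is "$<payload>#<cc>" with a matching checksum. */
int RspPacketIsValid(const char *pkt) {
  const char *hash;
  unsigned int want = 0;
  assert(pkt != NULL);
  hash = strchr(pkt, '#');
  if (pkt[0] != '$' || hash == NULL || strlen(hash) != 3) return 0;
  if (sscanf(hash + 1, "%2x", &want) != 1) return 0;
  return want == RspChecksum(pkt + 1, (size_t) (hash - pkt - 1));
}
```

For example, framing the payload `OK` yields `$OK#9a`, since 0x4f + 0x4b = 0x9a. A transport layer would reply with an ACK when `RspPacketIsValid()` succeeds and a NAK otherwise.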
This protocol assumes that:
The communication library divides the protocol into three layers: link, transport, and application. The link layer is provided by a virtualized GDBLink object, which could be either a socket or a system pipe. The transport layer is managed by the GDBTransport, which computes the checksum and provides the appropriate ACK/NAK. Finally, the application layer sends and receives the payload in the form of GDBMessage objects, which provide a mechanism for streaming various data types in and out of a packet.
Stop packets are packets generated by the target when it enters a stopped state. For example, when the target hits a signal or exception, it will emit a Stop packet which the debugger uses as a signal to know the target is stopped and is ready to receive debugging commands.
Minimal debugging support can be accomplished through reading and writing both registers and memory, as well as signalling when the target should continue or when it has stopped. While it is possible to support Windows debugging with a minimal set, modern versions of GDB expect certain other commands to be available as well, and the lack of those features may cause GDB to incorrectly determine that the target is 32 bit instead of 64.
Retrieve registers from ‘Active’ thread context.
Set registers of ‘Active’ thread context.
Select the thread for subsequent operations (‘m’, ‘M’, ‘g’, ‘G’, et al.). c depends on the operation to be performed: it should be ‘c’ for step and continue operations, ‘g’ for other operations. The thread designator t... may be -1, meaning all the threads, a thread number, or zero, which means pick any thread.
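A parser for this thread-select request could look like the sketch below. It assumes the standard RSP form of the packet, `H` followed by the operation letter and a hexadecimal thread designator; the function name is hypothetical. In standard RSP the reply is `OK` on success or an error code otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Parses "H<op><thread>", where op is 'c' or 'g' and thread is -1 (all
 * threads), 0 (any thread) or a hex thread number. Returns 0 on success. */
int ParseSetThread(const char *payload, char *op, long *thread_id) {
  char *end = NULL;
  long id;
  assert(payload != NULL && op != NULL && thread_id != NULL);
  if (payload[0] != 'H') return -1;
  if (payload[1] != 'c' && payload[1] != 'g') return -1;
  id = strtol(payload + 2, &end, 16);
  /* Reject an empty designator, trailing junk, and anything below -1. */
  if (end == payload + 2 || *end != '\0' || id < -1) return -1;
  *op = payload[1];
  *thread_id = id;
  return 0;
}
```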
Retrieve memory range from the target process.
Set memory range in the target process.
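To make the memory commands concrete, the sketch below parses a read request and hex-encodes the bytes for the reply. It assumes the standard RSP syntax, in which a read request is `m` followed by a hex address, a comma, and a hex length, and the reply is the memory contents as two hex digits per byte. The function names are hypothetical, and the bounds check against the untrusted address space that a real stub needs is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Parses "m<addr>,<len>" (both hexadecimal). Returns 0 on success. */
int ParseReadMemory(const char *payload, uint64_t *addr, uint32_t *len) {
  unsigned long long a = 0;
  unsigned int l = 0;
  int consumed = 0;
  assert(payload != NULL && addr != NULL && len != NULL);
  if (sscanf(payload, "m%llx,%x%n", &a, &l, &consumed) != 2) return -1;
  if (payload[consumed] != '\0') return -1;  /* trailing junk */
  *addr = (uint64_t) a;
  *len = (uint32_t) l;
  return 0;
}

/* Encodes n bytes as 2*n lowercase hex digits plus a terminating NUL.
 * Returns 0 on success, or -1 if out is too small. */
int HexEncode(const uint8_t *src, size_t n, char *out, size_t out_size) {
  static const char digits[] = "0123456789abcdef";
  size_t i;
  assert(src != NULL && out != NULL);
  if (out_size < 2 * n + 1) return -1;
  for (i = 0; i < n; i++) {
    out[2 * i] = digits[src[i] >> 4];
    out[2 * i + 1] = digits[src[i] & 0x0f];
  }
  out[2 * n] = '\0';
  return 0;
}
```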
Determine if a particular thread ID is alive.
It is possible (and expected in the GDB case) that the target is already stopped when the host connects.
Query packets are an extension of the basic debugging commands. The query mechanism is extensible, and while custom queries are allowed, they would not be understood by a vanilla GDB. The following is the minimum set of standard queries found to be required by a 64-bit GDB.

qSupported - Supported Queries

Requests a list of supported queries. While in theory it is safe to send an empty response to a query to signal that it is not supported, in practice 64-bit GDB appears to require the qXfer:features:read query to determine whether the target is 32 or 64 bit.
Obtain a comma-separated list of active thread IDs from the target. Since there may be too many active threads to fit into one reply packet, this query works iteratively. The qfThreadInfo query requests the list beginning at the first ID. The qsThreadInfo query requests a list of subsequent threads. The list is terminated with ‘l’ (lower case “L”).
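The iterative reply can be sketched with a cursor that persists between queries. This assumes the standard RSP reply format, where each chunk is `m` followed by comma-separated hex thread IDs and the end of the list is a lone `l`. The function name and the cursor convention are hypothetical: qfThreadInfo would reset the cursor to zero, and each qsThreadInfo would reuse it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Builds the next reply into out, advancing *cursor past the IDs that
 * fit. Returns 0 on success, or -1 if out cannot hold even one ID. */
int BuildThreadInfoReply(const uint32_t *ids, size_t count, size_t *cursor,
                         char *out, size_t out_size) {
  size_t used = 0;
  size_t start;
  assert(cursor != NULL && out != NULL && (ids != NULL || count == 0));
  if (out_size < 2) return -1;
  if (*cursor >= count) {  /* nothing left: end-of-list marker */
    out[0] = 'l';
    out[1] = '\0';
    return 0;
  }
  start = *cursor;
  out[used++] = 'm';
  while (*cursor < count) {
    char item[16];  /* "," + up to 8 hex digits + NUL */
    int n = snprintf(item, sizeof item, "%s%x", *cursor > start ? "," : "",
                     (unsigned int) ids[*cursor]);
    if (used + (size_t) n + 1 > out_size) break;  /* keep room for NUL */
    memcpy(out + used, item, (size_t) n);
    used += (size_t) n;
    (*cursor)++;
  }
  out[used] = '\0';
  return *cursor > start ? 0 : -1;
}
```

With thread IDs 1, 2 and 0x1f and a five-byte buffer, three successive calls produce `m1,2`, then `m1f`, then `l`.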
‘Ctrl-C’ Break, ‘s’ Step, or ‘c’ Continue (Stop Requests)

Both step and continue will cause execution in the target to start. A response will not be sent until execution of the target is halted again. Break is a request to stop a currently running system. Output “responses” may be sent while execution is taking place. These responses are informational, and the host should wait for an actual stop response.
The following packets are expected while the target is either running or stopping. These responses are also returned when
While RSP was originally developed for serial (UART) communication, it is often run over TCP. Since TCP is a lossless transport, the ACKs are superfluous. In addition, since GDB supports the concept of a sequence number per packet, it becomes possible for the Host and the Target to communicate asynchronously instead of in lockstep. The target is required to respond to a particular request with its sequence number, and since Target-side packets do not expect a response (see Stop Packets), it is possible to support a multi-threaded system so long as each packet itself is sent atomically.