
Design Plans for Out-of-Process iframes

This page provides high-level information on our plans to support out-of-process iframes (OOPIF) in Chromium, as part of the Site Isolation project.  The work is tracked in issue https://crbug.com/99379.

Many parts of Chromium's architecture will need to change to support rendering and interacting with a frame in a different process from its parent page.  Our intended changes include keeping track of frames in multiple processes, routing certain script calls between processes, supporting cross-process navigations in frames, rendering and compositing frames in multiple processes, and sending input events to the correct processes.


Frame Representation

Much of Chromium's tab-level logic is moving to the frame level, since each frame may be rendered by different processes over its lifetime.

Background

In general, documents that have script references to each other need to either live in the same process (if they are same-site) or have a way to route messages to each other (if they are cross-site).  Chromium manages these relationships by putting all documents that can reference each other in the same BrowsingInstance, and by grouping all of a BrowsingInstance's documents into SiteInstances based on site (as described on the Process Models page).  BrowsingInstances often contain only a single tab, but they can include multiple tabs when a page uses window.open or targeted links.  Note that BrowsingInstances are not related to the number of tabs in a window, since tabs created manually are in their own BrowsingInstance.
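The grouping described above can be illustrated with a minimal sketch. The type and member names below (SketchBrowsingInstance, addDocument, sameSiteInstance) are hypothetical and are not Chromium's real classes; sites are passed in as precomputed strings, and document ids are plain integers.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model, not Chromium's real classes: a BrowsingInstance groups
// documents that can reference each other and buckets them by site.
struct SketchBrowsingInstance {
    // site (e.g. "https://a.com") -> ids of the documents in that SiteInstance
    std::map<std::string, std::vector<int>> site_instances;

    void addDocument(int document_id, const std::string& site) {
        assert(!site.empty());
        site_instances[site].push_back(document_id);
    }

    // True only when both documents were added under the same site.
    bool sameSiteInstance(int a, int b) const {
        for (const auto& entry : site_instances) {
            bool has_a = false, has_b = false;
            for (int id : entry.second) {
                has_a = has_a || id == a;
                has_b = has_b || id == b;
            }
            if (has_a || has_b) return has_a && has_b;
        }
        return false;
    }
};
```

In this toy model, documents in the same bucket are candidates for sharing a process, while documents in different buckets need a message-routing path between them.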

To support cross-process interactions like postMessage, Chromium must keep a proxy version of the document's DOMWindow in each of the other processes of its BrowsingInstance, as a placeholder.  As shown in the diagram at right, this allows a document from site A to find a proxy DOMWindow in its own process for a tab that is currently active on site B.  The proxy DOMWindow can then forward the postMessage call to the browser and then to the correct document in the process for site B.
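As a rough illustration of this forwarding path, the sketch below models a proxy window that holds no document and only relays postMessage calls through a stand-in for the browser process. All names (SketchBrowser, SketchProxyWindow, routeMessage) are hypothetical, and the real IPC layer is replaced by direct function calls.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the browser process: it knows which process
// currently hosts each frame and delivers messages there.
struct SketchBrowser {
    std::map<int, int> frame_to_process;                    // frame id -> process id
    std::map<int, std::vector<std::string>> process_inbox;  // process id -> messages

    void routeMessage(int target_frame, const std::string& message) {
        auto it = frame_to_process.find(target_frame);
        assert(it != frame_to_process.end());
        process_inbox[it->second].push_back(message);
    }
};

// Placeholder for a window that lives in another process. It holds no
// document; it only knows which frame it stands for.
struct SketchProxyWindow {
    int target_frame;
    SketchBrowser* browser;

    void postMessage(const std::string& message) {
        browser->routeMessage(target_frame, message);
    }
};
```

The point of the design is that script in site A's process only ever touches the lightweight proxy, while the browser decides where the message actually lands.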

With out-of-process iframes, Chromium needs to keep track of proxy DOMWindows for subframes as well as main frames.

Browser Process

In the browser process, Chromium must keep track of the full frame tree for each tab.  WebContents will host a tree of FrameTreeNode objects, mirroring the frame tree of the current page.  Each FrameTreeNode will contain frame-specific information (e.g., the frame's name).  It will be responsible for cross-process navigations in the frame, and it will support routing messages from other processes to the active frame.
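A simplified picture of such a tree is sketched below. SketchFrameTreeNode is a hypothetical stand-in that keeps only an id, a name, and the process currently rendering the frame; the real FrameTreeNode is expected to carry much more state.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified node; not the real content::FrameTreeNode.
struct SketchFrameTreeNode {
    int id;
    std::string name;
    int process_id;  // process currently rendering this frame
    std::vector<std::unique_ptr<SketchFrameTreeNode>> children;

    SketchFrameTreeNode(int id, std::string name, int process_id)
        : id(id), name(std::move(name)), process_id(process_id) {}

    SketchFrameTreeNode* addChild(int child_id, std::string child_name,
                                  int child_process_id) {
        // Ids must be unique within this subtree.
        assert(find(child_id) == nullptr);
        children.push_back(std::make_unique<SketchFrameTreeNode>(
            child_id, std::move(child_name), child_process_id));
        return children.back().get();
    }

    // Depth-first lookup, e.g. to find where an incoming message should go.
    SketchFrameTreeNode* find(int wanted_id) {
        if (id == wanted_id) return this;
        for (auto& child : children) {
            if (SketchFrameTreeNode* hit = child->find(wanted_id)) return hit;
        }
        return nullptr;
    }
};
```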

Renderer Process

In each renderer process, Chromium must keep track of proxy DOMWindows for each frame in the BrowsingInstance, allowing JavaScript code to find frames and send messages to them.  We aim to minimize the overhead for each of the proxy DOMWindows by not having a document or widget for them.

We plan to pull the frame-specific logic out of the content module's RenderView and RenderViewHost classes, into new RenderFrame and RenderFrameHost classes.  These new classes will allocate their own routing ids, so IPC messages can be targeted to specific frames. We will have a full RenderFrame for every active frame, regardless of process, and we will have a corresponding, slimmed down RenderFrameProxy as a placeholder in the other processes of the BrowsingInstance.  These proxies are shown with dashed lines in the diagram below, which shows one BrowsingInstance with two tabs, containing two subframes each.
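To show why per-frame routing ids matter, here is a minimal sketch of a per-process router that hands out ids and dispatches messages to the frame (or proxy) registered under each id. SketchRouter and its methods are hypothetical names for illustration; they are not the content module's IPC classes.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical per-process message router keyed by routing id.
class SketchRouter {
public:
    using Handler = std::function<void(const std::string&)>;

    // Allocates a fresh routing id for a frame (or proxy) and registers it.
    int addRoute(Handler handler) {
        assert(handler);  // a route needs a callable handler
        int id = next_routing_id_++;
        routes_[id] = std::move(handler);
        return id;
    }

    // Returns false if nothing is registered under that id.
    bool dispatch(int routing_id, const std::string& payload) {
        auto it = routes_.find(routing_id);
        if (it == routes_.end()) return false;
        it->second(payload);
        return true;
    }

private:
    int next_routing_id_ = 1;
    std::map<int, Handler> routes_;
};
```

Because every frame owns its id, a message addressed to a subframe never has to pass through its tab-level owner first.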



Inside Blink

There are now local frames, where all the state is in process, and remote frames, which are proxies for frames rendered in other processes. The original Frame / DOMWindow classes have been renamed to LocalFrame / LocalDOMWindow. The corresponding classes for remote frames are named RemoteFrame / RemoteDOMWindow.
It is important to note that remote frames and remote DOM windows have very little state: in general, the only state maintained in the remote proxy objects is data needed to service synchronous operations. For example, a remote DOM window does not have a Document object.

Both LocalFrame and RemoteFrame inherit from a new interface, Frame, which represents operations that are valid on both local and remote frames. It is possible to downcast from Frame to LocalFrame, but having to do this usually indicates that the code structure is suboptimal for OOPIF. In the Blink API layer (Source/web), there are also corresponding WebLocalFrame / WebRemoteFrame / WebFrame classes. content::RenderFrame is responsible for managing a WebLocalFrame, while content::RenderFrameProxy is responsible for managing a WebRemoteFrame.

In addition, Blink has the ability to swap any frame in the frame tree between the local and remote versions. Generally, the process of swapping frames should not be observable, but it's worth mentioning, because it will allow the current content::RenderView swap out logic to be removed. The current swap out implementation is tricky to understand, and prone to bugs: code incorrectly assumed that a content::RenderViewHost would never change, it required IPC filtering to make sure a swapped out page didn't try to do something funny, etc.

Finally, the <webview> implementation is being migrated to work on top of this new infrastructure. For the most part, Blink code will be able to treat a <webview> similarly to an <iframe>. However, there is one important difference: the parent frame of the root frame in an <iframe> is the document that contains the <iframe> element, whereas the root frame of a <webview> has no parent. This implies that frame->page()->mainFrame() == frame->tree().top() is not always true!

Note: This last point is not finalized yet, and may change. It feels a bit too subtle, and it's unknown what sorts of issues it may cause. The rationale behind this is that content inside a <webview> tag should not be able to break out: it needs to believe, for all intents and purposes, that it is the top-level frame. The easiest way to do that right now is to not parent it to the document containing the <webview> tag.

These changes have several major implications for Blink:
  • A Page's main frame may be local or remote. There are a number of places that assume that the main frame will always be local: for example, the current drag and drop implementation always uses the main frame's event handler to perform hit-testing. This will need to change.
  • As a generalization of the previous point, any given frame in the frame tree may be remote. Thus, code like the web page serializer, which currently depends on iterating through all the frames in one process to generate the saved web page, will need to be rewritten.
  • Layout / rendering was formerly coordinated by the main frame. This (and other code like this) needs to change to use a new concept: the local frame root. The local frame root for a given LocalFrame A is the highest level LocalFrame that is a part of a contiguous LocalFrame subtree that includes frame A. In code form:
        LocalFrame* localRootFor(LocalFrame* f) {
            LocalFrame* r = f;
            // Walk up while the parent exists and is also a LocalFrame.
            while (r && r->parent() && r->parent()->isLocalFrame())
                r = toLocalFrame(r->parent());
            return r;
        }
    These contiguous LocalFrame subtrees are important in Blink because they synchronously get layout, paint, event routing, etc.
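The local frame root definition can be exercised on a toy frame tree. The sketch below is self-contained and uses a hypothetical SketchFrame struct with an is_local flag in place of Blink's separate LocalFrame and RemoteFrame classes, so it illustrates only the walk itself.

```cpp
#include <cassert>

// Hypothetical stand-in for Blink's Frame hierarchy: one struct with a flag
// instead of separate LocalFrame / RemoteFrame classes.
struct SketchFrame {
    bool is_local;
    SketchFrame* parent;  // nullptr for the main frame
};

// Walks up while the parent exists and is local; stops just below the first
// remote ancestor, or at the top of the tree.
SketchFrame* sketchLocalRootFor(SketchFrame* frame) {
    assert(frame && frame->is_local);
    SketchFrame* root = frame;
    while (root->parent && root->parent->is_local)
        root = root->parent;
    return root;
}
```

For a tree where a local main frame has a remote child, and that remote child has a local descendant chain, the descendants resolve to the topmost local frame under the remote child rather than to the main frame.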
Some further information on the refactoring goals can be found in the FrameHandle design doc, although that document is largely obsolete.

Note: We are attempting to minimize the memory requirements of these swapped out RemoteFrames and RemoteDOMWindows, because there will be many more than in Chromium today.  Today, the space required is O(tabs * processes) within a BrowsingInstance, and most BrowsingInstances only contain 1 or 2 tabs.  This new model would require O(frames * processes) space.  This could be much higher, because the number of frames can be much larger than the number of tabs, and because the number of processes will increase based on cross-site frames.  Fortunately, RemoteFrames require far less memory than LocalFrames, and not all cross-site iframes will require separate processes.

Navigation

Chromium will add support for cross-process navigations within subframes. Rather than letting the renderer process intercept the navigation and decide if the browser process should handle it, all navigations will be intercepted in the browser process's network stack. If the navigation crosses a site boundary that requires isolation (according to our Site Isolation policy), the browser process will be able to swap the frame's renderer process. This can be done because the browser process knows the full frame tree, as described above. Implementing this requires adapting TransferNavigationResourceThrottle into CrossSiteResourceHandler and making sufficient information available on the IO thread to make process swap decisions.  This work was tracked in issue https://crbug.com/238331.
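The browser-side decision can be sketched as a small pure function. Everything here is a hypothetical simplification: the sites are assumed to be precomputed strings, and the isolation policy is reduced to a single boolean, whereas the real Site Isolation policy is more involved.

```cpp
#include <cassert>
#include <string>

enum class SketchNavigationAction { kStayInProcess, kSwapProcess };

// `current_site` and `destination_site` are assumed to be precomputed site
// strings; deriving them from URLs is policy-specific and omitted here.
SketchNavigationAction sketchDecide(const std::string& current_site,
                                    const std::string& destination_site,
                                    bool destination_requires_isolation) {
    assert(!current_site.empty() && !destination_site.empty());
    if (current_site != destination_site && destination_requires_isolation)
        return SketchNavigationAction::kSwapProcess;
    return SketchNavigationAction::kStayInProcess;
}
```

Keeping the decision in the browser process means a renderer cannot skip the check, which is the motivation for intercepting navigations there.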

A tab's session history also becomes more complicated when subframes may be rendered by different processes. Currently, Blink takes care of tracking the frame tree in each HistoryItem in the renderer process, and the browser process just tracks each back/forward entry using NavigationEntry. We will remove the frame tracking logic from Blink's HistoryController and keep track of each frame's navigations in the browser process directly.

We will also change the representation of a tab's session history to more closely match the HTML5 spec. Rather than cloning the frame tree for each HistoryItem, we will keep track of each frame's session history separately in the browser process, and we will use a separate "joint session history" list for back and forward navigations. Each entry in this list will have a tree of pointers to each frame's corresponding session history item. We expect this to require changes to the session restore logic as well.
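One way to picture the split between per-frame histories and the joint list is the sketch below. It is a simplification under stated assumptions: each joint entry is a flat map from frame id to an index rather than a tree of pointers, forward entries are never pruned, and all names (SketchSessionHistory, navigate, urlAt) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct SketchSessionHistory {
    // Per-frame history: frame id -> URLs visited by that frame, in order.
    std::map<int, std::vector<std::string>> per_frame;
    // Joint history: each entry maps frame id -> index into per_frame[frame id].
    std::vector<std::map<int, std::size_t>> joint;

    // Records a navigation in one frame and appends a joint entry that
    // reuses the other frames' positions from the previous entry.
    void navigate(int frame_id, const std::string& url) {
        per_frame[frame_id].push_back(url);
        std::map<int, std::size_t> entry;
        if (!joint.empty()) entry = joint.back();
        entry[frame_id] = per_frame[frame_id].size() - 1;
        joint.push_back(entry);
    }

    // URL shown by `frame_id` at joint entry `joint_index`.
    const std::string& urlAt(std::size_t joint_index, int frame_id) const {
        assert(joint_index < joint.size());
        const auto& entry = joint[joint_index];
        return per_frame.at(frame_id)[entry.at(frame_id)];
    }
};
```

A subframe navigation adds one item to that frame's own history and one joint entry, while the main frame's item is shared between joint entries instead of being cloned.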

The details of the navigation refactoring are described in a separate design document.


Rendering

To render an iframe in a different process than its parent frame, the browser process will pass information back and forth between the renderer processes and help the GPU process composite the images together in the correct sizes and locations.  We expect to use the Surfaces implementation to maintain a set of textures from multiple renderer processes, compositing them into a single output image.

The design for rendering is encapsulated in a separate document.

Input Events

We are continuing to investigate the changes required to support hit testing and delivering input events to the correct iframe process without having to ask multiple renderer processes.  As with rendering, we expect to use the Surfaces implementation to do hit testing in the browser process to deliver input events directly to the intended frame's renderer process.  We also need to manage focus in the browser process to send keyboard events directly to the renderer process of the focused frame.
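The idea of resolving the target process in the browser, without asking any renderer, can be sketched with plain rectangles. This is not the Surfaces implementation: SketchFrameRect and sketchTargetProcess are hypothetical, frames are axis-aligned boxes in tab coordinates, and transforms, clipping, and transparency are ignored.

```cpp
#include <cassert>
#include <vector>

struct SketchFrameRect {
    int process_id;
    int x, y, width, height;  // in tab coordinates

    // Right and bottom edges are exclusive.
    bool contains(int px, int py) const {
        assert(width >= 0 && height >= 0);
        return px >= x && px < x + width && py >= y && py < y + height;
    }
};

// `rects` is ordered back to front, so the last match is the topmost frame.
// Returns -1 when the point hits no frame.
int sketchTargetProcess(const std::vector<SketchFrameRect>& rects, int px, int py) {
    for (auto it = rects.rbegin(); it != rects.rend(); ++it) {
        if (it->contains(px, py)) return it->process_id;
    }
    return -1;
}
```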



Discussions/Questions

The mailing list for technical discussions on Site Isolation and Out-of-Process iframes is site-isolation-dev@chromium.org.