Chromium String usage

Types of Strings

In the Chromium code base, we use std::string and std::u16string. Blink uses blink::String instead, which is patterned on std::string but is a slightly different class (see the docs for their guidelines; we’ll only talk about Chromium here). We also have a StringPiece[16] class, which is essentially a pointer into a string that is owned elsewhere, together with a length that says how many characters of that string form the “token”. Finally, there is also blink::WebString, which is used by the Blink glue layer.
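As a rough illustration of the pointer-plus-length idea, the sketch below uses the standard std::string_view as a stand-in for StringPiece, so that it compiles outside the Chromium tree. FirstToken is a hypothetical helper written for this example and is not part of Chromium.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: returns the first space-delimited token of `input`.
// std::string_view stands in for StringPiece here (an assumption made so the
// example builds without Chromium's base library).
// The result is a non-owning view: it points into the caller's storage and
// copies no characters, so it must not outlive the string it refers to.
std::string_view FirstToken(std::string_view input) {
  const std::size_t end = input.find(' ');
  // substr() on a view only adjusts the pointer and length. When no space is
  // found, `end` is npos and the whole input is returned.
  return input.substr(0, end);
}
```

Given std::string owner = "hello world", FirstToken(owner) yields a view of "hello" whose data() pointer equals owner.data(). The token is a window onto the owner's characters, not a copy of them.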

String Encodings

We use a variety of encodings in the code base. UTF-8 is most common, but we also use ASCII and UTF-16.

When to use which encoding

The most important rule here is the meta-rule: code in the style of the surrounding code. In the frontend, we use std::string/char for UTF-8 and std::u16string/char16_t for UTF-16 on all platforms. Even though std::string is encoding-agnostic, we only put UTF-8 into it. std::wstring/wchar_t is rarely used in cross-platform code (in part because wchar_t is differently sized on different platforms), but it is common in Windows-specific code that interfaces with native APIs (which often take wchar_t* or similar). Most UI strings are UTF-16. URLs are generally UTF-8. Strings in the webkit glue layer are typically UTF-16, with several exceptions. Chromium code does not use UTF-32.
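To make the difference between the two encodings concrete, here is a small standalone sketch that stores the same four-character word in both forms. The function names are hypothetical and exist only for this example. CountUtf8CodePoints assumes its input is already valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// "café" as UTF-8 in a std::string. The "é" (U+00E9) takes two bytes,
// 0xC3 0xA9, so the string holds 5 chars for 4 characters.
std::string Utf8Sample() {
  return "caf\xC3\xA9";
}

// The same word as UTF-16 in a std::u16string: one char16_t per character
// here, so the string holds 4 code units.
std::u16string Utf16Sample() {
  return u"caf\u00E9";
}

// Counts code points in a UTF-8 string by skipping continuation bytes,
// which always have the bit pattern 10xxxxxx.
// Assumption: `utf8` is valid UTF-8; no validation is performed.
std::size_t CountUtf8CodePoints(const std::string& utf8) {
  std::size_t count = 0;
  for (unsigned char byte : utf8) {
    if ((byte & 0xC0) != 0x80) {
      ++count;
    }
  }
  return count;
}
```

The point to take away is that size() reports code units (bytes for std::string, 16-bit units for std::u16string), not user-visible characters. In Chromium itself, converting between the two encodings would normally go through helpers such as base::UTF8ToUTF16() rather than hand-written loops like the one above.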

The GURL class and strings

One common data type that uses strings is the GURL class. The constructor takes a std::string in UTF-8 for the URL itself. If you have a GURL, you can use the spec() method to get the std::string for the entire URL, or you can use the component methods to get the parsed parts, such as scheme(), host(), port(), path(), query(), and ref(), all of which return a std::string. All the parts of the GURL except the ref string will be pure ASCII. The ref string may contain UTF-8 characters that are not ASCII.
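The distinction between pure ASCII and general UTF-8 can be checked byte by byte, since every ASCII character is a single byte below 0x80 and every byte of a non-ASCII UTF-8 character is 0x80 or above. The sketch below does not use GURL, so it builds without Chromium. IsPureAscii is a hypothetical helper, and the strings passed to it merely stand in for what component methods such as host() or ref() might return.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper, not part of GURL: returns true if every byte of
// `text` is in the ASCII range (0x00 to 0x7F).
bool IsPureAscii(std::string_view text) {
  for (unsigned char byte : text) {
    if (byte >= 0x80) {
      return false;  // Lead or continuation byte of a multi-byte character.
    }
  }
  return true;
}
```

For example, IsPureAscii("www.example.com") is true, while a ref-like value such as "caf\xC3\xA9" (the word "café" in UTF-8) makes it return false.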

Guidelines for string use in our codebase