Developers tend to hard-code messages in the code, in their native language or, more often, in English. We need to provide a simple enough message replacement system to discourage that behavior.
There are two approaches we can use: the widely known Firefox ENTITY replacement, and the one Google Gadgets use.
Google Gadgets are based on an XML spec, which carries some metadata and the HTML/JS of a gadget.
Each spec lists supported locales and links to locale files. Any item can be replaced within the gadget spec (URLs, messages...). Substitution is done in container code (iGoogle, Orkut...), where we control the whole process (fallback order, for example).
Message catalogs are in XML format (to better support our translation pipeline).
The public API, a sample gadget spec and message bundles can be found at http://code.google.com/intl/sr-RS/apis/gadgets/docs/i18n.html.
Firefox uses XUL files (an XML format) for its extensions. The XML parser automatically replaces DTD ENTITYs within an XML document, given the DTD file(s). For details see how to localize firefox extensions.
The problem with this approach is that we don't actually have XML/XHTML files tied to an extension. Also, we may want to implement a more flexible fallback algorithm.
We use HTML/JS to develop extensions and to keep metadata about them (the manifest), whereas Firefox uses XUL files.
We should use a modified Google Gadgets approach, since gadgets are also HTML/JS entities.
See details below.
Only some locales will have all of the messages translated, or resources generated. Some locales may be completely missing. In both cases Chrome should gracefully fall back to what's available.
To do that we need to order locales in a tree-like structure based on locale identifiers.
We support a larger set of locales than the Chrome UI. The current list (as of 35300) is:
am, ar, bg, bn, ca, cs, da, de, el, en, en_GB, en_US, es, es_419, et, fi, fil, fr, gu, he, hi, hr, hu, id, it, ja, kn, ko, lt,
lv, ml, mr, nb, nl, or, pl, pt, pt_BR, pt_PT, ro, ru, sk, sl, sr, sv, sw, ta, te, th, tr, uk, vi, zh, zh_CN, zh_TW
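One way to picture the tree-like ordering is to derive each locale's parent from its identifier. The sketch below is only an illustration: the helper names parentOf and buildLocaleTree are hypothetical, it assumes that "_" separates the parts of an identifier (as in the list above), and it uses a subset of that list.

```javascript
// Hypothetical helper: "en_US" -> "en", "en" -> null (top-level language).
// Assumes "_" separates the parts of a locale identifier.
function parentOf(locale) {
  const i = locale.lastIndexOf("_");
  return i === -1 ? null : locale.slice(0, i);
}

// Hypothetical helper: groups locales under their parent identifier.
// Top-level languages hang directly off the (implicit) root and get no entry.
function buildLocaleTree(locales) {
  const tree = new Map(); // parent identifier -> child identifiers
  for (const locale of locales) {
    const parent = parentOf(locale);
    if (parent === null) continue;
    if (!tree.has(parent)) tree.set(parent, []);
    tree.get(parent).push(locale);
  }
  return tree;
}

// A subset of the supported locales listed above.
const SAMPLE_LOCALES = ["en", "en_GB", "en_US", "es", "es_419", "pt", "pt_BR", "pt_PT", "sr"];
const tree = buildLocaleTree(SAMPLE_LOCALES);
// tree.get("pt") is ["pt_BR", "pt_PT"]; "sr" has no children, so it has no entry.
```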
To avoid hard-coding strings, developers should use message placeholders in the code and static files.
Message concatenation is usually a bad thing, and should be avoided, but it's possible with __MSG_msg_1__ + __MSG_msg_2__.
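As a rough illustration of the replacement step, the sketch below swaps every __MSG_name__ placeholder in a piece of text for the matching message. The function name replaceMessagePlaceholders, the flat name-to-text map, and the choice to leave unknown placeholders untouched are all assumptions made for the example, not decisions of this design.

```javascript
// Matches __MSG_<name>__ where <name> consists of letters, digits and underscores.
const MSG_PATTERN = /__MSG_(\w+?)__/g;

// Hypothetical helper: `messages` is assumed to be a flat map of name -> text.
// Unknown placeholders are left as-is (an assumption; the design does not say).
function replaceMessagePlaceholders(text, messages) {
  return text.replace(MSG_PATTERN, (match, name) =>
    Object.prototype.hasOwnProperty.call(messages, name) ? messages[name] : match
  );
}

const html = "<title>__MSG_app_title__</title><p>__MSG_missing__</p>";
const localized = replaceMessagePlaceholders(html, { app_title: "Moja aplikacija" });
// localized is "<title>Moja aplikacija</title><p>__MSG_missing__</p>"
```

A replacer function is used instead of a replacement string so that a "$" inside a message is inserted literally.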
Message placeholders and message bodies have a simple key-value structure, which can be implemented as:
Proposed JSON format. Fields of a message entry:
  "message": "message text - short sentence or even a paragraph with optional placeholder(s)",
  "description": "Description of a message that should give context to a translator",
Fields of a placeholder entry:
  "content": "Actual string that's placed within a message.",
  "example": "Example shown to a translator."
Example messages:
  "message": "Hello $YOUR_NAME$",
  "description": "Peer greeting",

  "message": "Bye from $CHROME$ to $YOUR_NAME$",
  "description": "Going away greeting",
There are a couple of possible forms a message can take:
The same message ID should exist only once per catalog. If duplicates are detected when packing the extension, we should ask the developer to remove them.
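The duplicate check could look roughly like the sketch below. Note that JSON.parse silently keeps the last value for a repeated key, so the check cannot run on the parsed object. This example therefore assumes the packer has already extracted the message IDs in the order they appear in the file. How that extraction happens is out of scope here, and findDuplicateIds is a hypothetical name.

```javascript
// Hypothetical helper: returns each message ID that appears more than once,
// in order of first repetition. `ids` is assumed to be the list of IDs as
// they appear in the raw catalog file.
function findDuplicateIds(ids) {
  const seen = new Set();
  const duplicates = new Set();
  for (const id of ids) {
    if (seen.has(id)) duplicates.add(id);
    seen.add(id);
  }
  return [...duplicates];
}

const duplicateIds = findDuplicateIds(["hello", "bye", "hello", "title", "bye"]);
// duplicateIds is ["hello", "bye"]
```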
Dealing with plural forms is hard. Each language has different rules and special cases. To avoid that complexity we are going to use a plural-neutral form.
Instead of saying "11 files were moved" we could say "Files moved: 11".
This is a valid solution in most cases.
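To make the placeholder mechanism concrete, here is a minimal sketch that expands $NAME$ placeholders inside a message, using the plural-neutral message above as one of the entries. The nesting of the catalog (a "placeholders" map keyed by lower-case placeholder name), the sample values, and the function name expandMessage are assumptions for illustration only.

```javascript
// Illustrative catalog. The "placeholders" key, its nesting and the sample
// values are assumptions; only "message", "description", "content" and
// "example" come from the proposed format.
const catalog = {
  files_moved: {
    message: "Files moved: $COUNT$",
    description: "Plural-neutral status line",
    placeholders: { count: { content: "11", example: "11" } },
  },
  bye: {
    message: "Bye from $CHROME$ to $YOUR_NAME$",
    description: "Going away greeting",
    placeholders: {
      chrome: { content: "Chrome", example: "Chrome" },
      your_name: { content: "Alice", example: "Alice" },
    },
  },
};

// Hypothetical helper: replaces each $NAME$ with the placeholder's content.
// Names are matched case-insensitively; unknown placeholders stay untouched.
function expandMessage(entry) {
  const placeholders = entry.placeholders || {};
  return entry.message.replace(/\$([A-Za-z0-9_]+)\$/g, (match, name) => {
    const key = name.toLowerCase();
    return Object.prototype.hasOwnProperty.call(placeholders, key)
      ? placeholders[key].content
      : match;
  });
}

const movedText = expandMessage(catalog.files_moved); // "Files moved: 11"
const byeText = expandMessage(catalog.bye); // "Bye from Chrome to Alice"
```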
Chrome will automatically replace all message placeholders when loading static files (html, js, manifest...) given the current browser UI language.
Scripts may want to use messages from different locales, or to fetch resources and replace message placeholders in them dynamically.
For that we may need:
There would be a _locales subdirectory under the main extension directory.
It would contain N subdirectories named after locale identifiers (sr, en_US, en, en_GB, ...).
Each locale subdirectory can contain only one messages.json file.
The extension manifest has an optional "default_locale": "language_country" field that points to the default language. Some edge cases:
The default locale is used as the final fallback option if a message can't be found for the current locale.
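The lookup order can be sketched as follows: try the current locale, then its parents, and finally the default locale. This is a sketch under assumptions, not the actual algorithm. The lookupMessage name, the in-memory catalogs map (standing in for the parsed messages.json files) and the sample messages are hypothetical.

```javascript
// Hypothetical helper. `catalogs` is assumed to map a locale identifier to the
// parsed content of its messages.json (message name -> { message, ... }).
function lookupMessage(name, locale, defaultLocale, catalogs) {
  // Build the candidate list: "sr_RS" -> ["sr_RS", "sr"], then the default locale.
  const candidates = [];
  const parts = locale.split("_");
  while (parts.length > 0) {
    candidates.push(parts.join("_"));
    parts.pop();
  }
  if (defaultLocale && !candidates.includes(defaultLocale)) {
    candidates.push(defaultLocale);
  }
  // Return the first translation found along the chain.
  for (const candidate of candidates) {
    const messages = catalogs[candidate];
    if (messages && Object.prototype.hasOwnProperty.call(messages, name)) {
      return messages[name].message;
    }
  }
  return undefined; // Not translated anywhere, not even in the default locale.
}

const sampleCatalogs = {
  en: { hello: { message: "Hello" }, bye: { message: "Bye" } },
  sr: { hello: { message: "Zdravo" } },
};
const greeting = lookupMessage("hello", "sr_RS", "en", sampleCatalogs); // "Zdravo"
const farewell = lookupMessage("bye", "sr_RS", "en", sampleCatalogs); // "Bye"
```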
The manifest file contains metadata about the extension in JSON format.
When loading the manifest file, Chrome should replace all __MSG_msg_name__ identifiers with messages from the catalog and then process the final object.
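A possible shape for the manifest step is sketched below. It deviates from the order stated above in one respect, and that is an assumption of the example: it parses the JSON first and then replaces placeholders inside string values, so a message containing a double quote cannot break the JSON syntax. The name localizeManifest and the flat messages map are hypothetical.

```javascript
// Hypothetical helper: parses the manifest, then replaces __MSG_name__ in every
// string value (recursively). `messages` is assumed to map name -> text.
function localizeManifest(manifestText, messages) {
  const replaceIn = (value) => {
    if (typeof value === "string") {
      return value.replace(/__MSG_(\w+?)__/g, (match, name) =>
        Object.prototype.hasOwnProperty.call(messages, name) ? messages[name] : match
      );
    }
    if (Array.isArray(value)) return value.map(replaceIn);
    if (value !== null && typeof value === "object") {
      return Object.fromEntries(
        Object.entries(value).map(([key, inner]) => [key, replaceIn(inner)])
      );
    }
    return value; // numbers, booleans and null pass through unchanged
  };
  return replaceIn(JSON.parse(manifestText));
}

const manifestText = '{"name": "__MSG_ext_name__", "version": "1.0", "permissions": ["tabs"]}';
const manifest = localizeManifest(manifestText, { ext_name: 'My "quoted" extension' });
// manifest.name is 'My "quoted" extension'; other fields are unchanged.
```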
New tab page and possibly some other static content. We currently use the google2 template system (?), which is somewhat overkill for a couple of pages.
We could deliver message catalogs for each locale as part of the installation package, and use message placeholders in the new tab source.
All absolute URLs (href, src...) should point to __MSG_some_url__, and each locale could provide a separate implementation (image, script...).
When loading extension files (html, js), Chrome would replace every __MSG_some_url__ with the actual, locale-specific URL.
Local resources, like <img src="foo/bar.png">, should be automatically resolved to _locales/current_locale/foo/bar.png or, if that resource is missing, to a fallback location.
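The resolution of local resources might work along these lines. The sketch assumes that "fallback location" means the same relative path under the next locale in the fallback order, and finally the unlocalized path itself. The function name resolveLocalResource and the exists callback (standing in for a file-system check) are hypothetical.

```javascript
// Hypothetical helper. `locales` is the fallback order (most specific first) and
// `exists` is a callback that reports whether a path is present in the extension.
function resolveLocalResource(relativePath, locales, exists) {
  for (const locale of locales) {
    const candidate = `_locales/${locale}/${relativePath}`;
    if (exists(candidate)) return candidate;
  }
  // Assumption: with no localized copy, use the path as written in the page.
  return relativePath;
}

// Illustrative set of files shipped with the extension.
const shippedFiles = new Set(["_locales/en/foo/bar.png", "foo/logo.png"]);
const fileExists = (path) => shippedFiles.has(path);

const localizedImage = resolveLocalResource("foo/bar.png", ["en_US", "en"], fileExists);
// localizedImage is "_locales/en/foo/bar.png"
const plainImage = resolveLocalResource("foo/logo.png", ["en_US", "en"], fileExists);
// plainImage is "foo/logo.png"
```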