User Dictionary

Proposal Date
Jan 31, 2012.

Who is the primary contact for this API?
Jun Mukai (mukai@)

Who will be responsible for this API? (Team please, not an individual)

Mozc team


Overview
The purpose of this API is to allow third parties to develop extensions that manage the user dictionary of the built-in CJ (Chinese and Japanese) input methods.

Use cases
This API is used to build additional user interfaces for the user dictionary.
The user dictionary offers an additional word set to CJK input methods, such as uncommon family names, imported foreign words, and even shorthand inputs (よ → よろしくお願いします).  There are several possible user interfaces for this: a dialog to register a word (http://crosbug.com/18958), a table-based editor (http://crosbug.com/18959), or an importer for an external resource such as a Google spreadsheet.
The main target of this API is the built-in (Chinese and Japanese) input methods, because additional input methods (added through the Input Method Editor API) can provide such UI themselves.

Do you know anyone else, internal or external, that is also interested in this API?
Zach Kuznia: the maintainer of “IME” extension API
Japanese/Chinese IME vendors: to allow users to customize their IMEs with a standard API.

Could this API be part of the web platform?
No. This is part of the IME extension API rather than HTML5, because it is used entirely on the client side and works independently of any site.

Do you expect this API to be fairly stable?  How might it be extended or changed in the future?
One of our core tenets is that we try to maintain backwards-compatibility, so that we don't break any extensions when updating Chrome. Therefore, we prefer to expose APIs to more mature parts of the system that are less likely to change.

List every UI surface belonging to or potentially affected by your API:
This API does not offer a direct user interface itself, but it changes the behavior of input methods, i.e. it puts additional words in the lookup table.

How could this API be abused?
  • Fingerprinting #1: badapp stuffs unique words into the dictionary in order to track the user across other properties.
    • The extension must request the ‘experimental’ permission in its ‘manifest.json’; in the future this will become a more specific permission name.  Users will therefore see that an extension may edit the user dictionary before they install it.
    • For tracking, localStorage is much handier, so nobody would use this API for that instead.
  • Fingerprinting #2: badapp looks for already registered unique words (address, name) to track a user across other properties.
    • Same as Fingerprinting #1.
  • Denial of functionality #1: badsite/badapp carpet bombs the user dictionary with common words rewritten to something else (random stuff, spam...)
    • This could happen.  We may want to cap the number of entries and the storage size.  I am not sure what the exact numbers should be, and I don’t think we should explicitly document the cap.
    • The risk is potentially at the same level as the bookmark and history APIs, which are guarded only by a permission.  So we should guard this API with a permission too, so that users immediately notice the risk.
  • Denial of functionality #2: badsite/badapp deletes every word in the user dictionary.
    • The risk is potentially at the same level as the bookmark and history APIs, which are guarded only by a permission.  So we should guard this API with a permission too, so that users immediately notice the risk.
  • Denial of functionality #3: badsite/badapp adds as many words as possible, which slows down Chrome itself, especially at startup.
    • We will cap both the number of words and the total size of registered words, and further registration will fail once the cap is exceeded.  This prevents the issue.
  • Privacy leak: badsite/badapp retrieves user dictionaries and publicizes them later (e.g. address shortcuts, phone numbers, and maybe credit card numbers(!?) get leaked)
    • Same as ‘Denial of functionality #2’
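The cap described for ‘Denial of functionality #3’ could work roughly like the sketch below.  It is only an illustration: the limits, the error messages, and the helper names (tryRegister, entrySize) are made up here, and the proposal intentionally leaves the real numbers unspecified.

```javascript
// Hypothetical limits; the proposal does not state the real values.
const MAX_ENTRIES = 3;
const MAX_TOTAL_CHARS = 40;

// Assumption: the size of an entry is the length of its input plus its word.
function entrySize(entry) {
  return entry.input.length + entry.word.length;
}

// Adds an entry unless doing so would exceed either cap.
// The returned lastError strings are illustrative, not part of the proposal.
function tryRegister(entries, entry) {
  const total = entries.reduce((sum, e) => sum + entrySize(e), 0);
  if (entries.length + 1 > MAX_ENTRIES) {
    return { success: false, lastError: "too many entries" };
  }
  if (total + entrySize(entry) > MAX_TOTAL_CHARS) {
    return { success: false, lastError: "dictionary too large" };
  }
  entries.push(entry);
  return { success: true, lastError: undefined };
}

const store = [];
const results = [];
for (let i = 0; i < 4; i++) {
  results.push(tryRegister(store, { input: "a", word: "apple" }));
}
const oversized = tryRegister([], { input: "x", word: "x".repeat(50) });
```

Here the fourth registration is rejected by the entry-count cap, and the oversized entry is rejected by the size cap even though the store is empty.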
Imagine you’re Dr. Evil Extension Writer, list the three worst evil deeds you could commit with your API (if you’ve got good ones, feel free to add more):
  1. Assume that the user registers private information as a word (for example, “me” mapped to their name). My extension can scan all the registered words and, if it finds such an entry, send the data to evilsite.com or somewhere else. In this case I need permission to access evilsite.com, so the extension should look suspicious.
  2. My extension automatically registers offensive or problematic words for common readings. That would surprise users.
  3. no idea...

Alright Doctor, one last challenge:  
Could a consumer of your API cause any permanent change to the user’s system using your API that would not be reversed when that consumer is removed from the system?
The user dictionary would be stored as a part of user profile.  That data should be removed when the user profile is removed.

How would you implement your desired features if this API didn't exist?
There is no other way to handle the user dictionary for the built-in input methods in ChromeOS, and there is no standard for handling user dictionaries across input methods.  Because there are several user interface designs for the user dictionary, we would have to implement each UI one by one ourselves.

Draft API spec
chrome.input.dictionary.registerWords
  • Descriptions
    • Register new word-reading pairs.  The pairs are stored on disk, so you don’t need to call this method every time.
    • If there is already a pair with the same reading but a different word, that’s ok.  Both are stored on disk independently, i.e. getRegisteredWords may return several entries for the same reading.
    • Entries can be distinguished by the entry_id generated at registration.  An entry_id is unique across the system and does not change over time, but there is no guarantee about whether an entry_id is reused after its entry is deleted.
    • If the same pair already exists, only its ‘tag’ is updated.  The tag becomes the union of the specified tag and the existing one.
  • Parameters
    • words (array of Object):
      • input (string) - the user input to get the word
        • The format of the “input” field (the reading) is up to the engine.  Most Japanese engines will use Hiragana characters, and some pinyin engines will use Latin characters (pinyin representations).
      • word (string) - the word which users want to get
      • engine (string) - the name of input method engine
      • tag (optional Object) - additional data used by the engine and/or extensions
        • e.g. the value for ‘pos’ is used by the Japanese input method as the part of speech, so that the word is not suggested in an improper context.
        • The semantics of the tag are up to the engine and the extension.
    • callback (optional Function(Boolean success, string entry_id) {}) - Called when the operation completes.
      • success indicates whether the operation succeeded.  On failure, chrome.extension.lastError is set.
      • On success, a unique id is created for the entry and passed as the “entry_id” argument.
      • “entry_id” is undefined on failure.
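The registration rules above can be sketched with a small in-memory model.  This is not the real implementation: createDictionary and registerWord are hypothetical helpers, the engine name "mozc" is a placeholder, and the sketch handles one entry per call because the draft callback reports a single entry_id.  It also assumes that "the same pair" includes the engine, that re-registering returns the existing entry_id, and that the newly specified value wins when both tags define the same key; the draft does not settle any of these.

```javascript
// Hypothetical in-memory model of the proposed registerWords semantics.
// Nothing here is a real Chrome API; all names are illustrative.
function createDictionary() {
  return { entries: new Map(), nextId: 1 };
}

// Registers one word-reading pair and returns its entry_id.
function registerWord(dict, { input, word, engine, tag = {} }) {
  for (const [entryId, entry] of dict.entries) {
    const samePair =
      entry.input === input && entry.word === word && entry.engine === engine;
    if (samePair) {
      // Same pair: only the tag changes, as a union of old and new.
      // Assumption: on a key conflict the newly specified value wins.
      entry.tag = { ...entry.tag, ...tag };
      return entryId;
    }
  }
  // New pair (possibly the same reading with a different word): new entry.
  const entryId = `entry-${dict.nextId++}`;
  dict.entries.set(entryId, { input, word, engine, tag: { ...tag } });
  return entryId;
}

const dict = createDictionary();
const a = registerWord(dict, { input: "よ", word: "よろしくお願いします", engine: "mozc" });
const b = registerWord(dict, { input: "よ", word: "よろしく", engine: "mozc" });
const c = registerWord(dict, {
  input: "よ", word: "よろしく", engine: "mozc", tag: { pos: "noun" },
});
```

In this sketch the first two calls create separate entries for the same reading, while the third call only merges a tag into the second entry.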
chrome.input.dictionary.removeWords
  • Descriptions
    • Remove registered word-reading pairs.
    • An extension can remove word-reading pairs that it added itself.
  • Parameters
    • entry_ids (array of string) - the IDs for the entries to be removed.
    • callback (optional Function(Boolean success) {}) - Called when the operation completes.  success indicates whether the operation succeeded.  On failure, chrome.extension.lastError is set.
      • It will fail if a specified entry does not exist or has already been removed.  The chrome.extension.lastError message should then say “does not exist”.
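A minimal sketch of the removal behavior follows, using a plain Map from entry_id to entry as a stand-in for the dictionary.  The removeWords helper below is hypothetical and returns its result directly instead of using a callback.  The draft does not say what happens when only some of the IDs exist; this sketch assumes the call is all-or-nothing.

```javascript
// Hypothetical model: `entries` is a Map from entry_id to entry.
function removeWords(entries, entryIds) {
  // Assumption: all-or-nothing. The draft leaves partial failure undefined.
  const missing = entryIds.filter((id) => !entries.has(id));
  if (missing.length > 0) {
    return { success: false, lastError: `does not exist: ${missing.join(", ")}` };
  }
  for (const id of entryIds) {
    entries.delete(id);
  }
  return { success: true, lastError: undefined };
}

const entries = new Map([
  ["entry-1", { input: "よ", word: "よろしく" }],
  ["entry-2", { input: "me", word: "Jun" }],
]);
const first = removeWords(entries, ["entry-1"]);
const second = removeWords(entries, ["entry-1", "entry-2"]);
```

The second call fails because "entry-1" is already gone, and under the all-or-nothing assumption "entry-2" stays in place.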
chrome.input.dictionary.updateWord
  • Descriptions
    • Update an existing word-reading pair.
    • The entry_id stays the same.  There is no way to edit the id.
  • Parameters
    • entry_id (string) - the ID for the entry to be updated.
    • update (Object) - the key-value pairs that describe the update.  All pairs are optional.  If the caller does not specify a new value for a key, this method does not update that field.
      • input (string) - the new user input
      • word (string) - the new word
      • engine (string) - the new engine for the pair
      • tag (Object) - the new tag.  This completely replaces the existing tag: all existing keys are wiped out and only the specified ones are kept.
    • callback (optional Function(Boolean success) {}) - Called when the operation completes.  success indicates whether the operation succeeded.  On failure, chrome.extension.lastError is set and the entry is not changed.
      • It will fail if there is no entry for the entry_id or the entry has already been removed.  The chrome.extension.lastError message should then say “does not exist”.
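The difference between omitted fields (left unchanged) and the tag (replaced wholesale) can be sketched as follows.  As before, this is an in-memory illustration with a hypothetical updateWord helper that returns its result instead of calling back; the tag values are made up.

```javascript
// Hypothetical model: `entries` is a Map from entry_id to entry.
function updateWord(entries, entryId, update) {
  const entry = entries.get(entryId);
  if (entry === undefined) {
    return { success: false, lastError: "does not exist" };
  }
  // Fields that the caller did not specify are left untouched.
  for (const key of ["input", "word", "engine"]) {
    if (update[key] !== undefined) {
      entry[key] = update[key];
    }
  }
  // Unlike registration, a tag in an update replaces the old tag entirely.
  if (update.tag !== undefined) {
    entry.tag = { ...update.tag };
  }
  return { success: true, lastError: undefined };
}

const entries = new Map([
  ["entry-1", {
    input: "よ", word: "よろしく", engine: "mozc",
    tag: { pos: "noun", source: "import" },
  }],
]);
const ok = updateWord(entries, "entry-1", {
  word: "よろしくお願いします",
  tag: { pos: "phrase" },
});
const missing = updateWord(entries, "entry-9", { word: "x" });
```

After the first call the entry keeps its input and engine, gets the new word, and its tag contains only ‘pos’.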
chrome.input.dictionary.getRegisteredWords
  • Descriptions
    • Get all the registered words for an engine.  You can use the ‘filter’ argument to filter the words.
    • Caution: the registerWords method allows duplicated entries.  If extensions have added duplicated entries, this will return all of them as individual items.  Deduping and/or merging items is the responsibility of the calling app.
  • Parameters
    • engine (string) - the name of input method engine
    • filter (optional Object) - key-value pairs to filter the result words.
      • Pairs are combined with AND if more than one is specified, i.e. {‘k1’: ‘v1’, ‘k2’: ‘v2’} means entries which have ‘v1’ for ‘k1’ and ‘v2’ for ‘k2’.
      • No filtering is applied if filter is not specified.
    • callback (optional Function(Array words) {}) - Called when the operation completes.  The argument is the array of the words.  The order of words is undefined.
      • word (Object):
        • input (string)
        • word (string)
        • entry_id (string)
        • tag (Object) - key-value pairs
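The AND semantics of the filter could look like the sketch below.  It assumes that the filter keys are matched against each entry’s tag, which the draft implies but does not state outright.  The getRegisteredWords function here is a local, hypothetical stand-in that returns the array directly, and the engine names and tag values are placeholders.

```javascript
// Hypothetical model: `entries` is a Map from entry_id to entry.
// Assumption: filter keys are compared against each entry's tag.
function getRegisteredWords(entries, engine, filter = {}) {
  const wanted = Object.entries(filter);
  const words = [];
  for (const [entryId, entry] of entries) {
    if (entry.engine !== engine) continue;
    const tag = entry.tag || {};
    // every() gives AND semantics; an empty filter matches everything.
    if (wanted.every(([key, value]) => tag[key] === value)) {
      words.push({ input: entry.input, word: entry.word, entry_id: entryId, tag });
    }
  }
  return words;
}

const entries = new Map([
  ["entry-1", { input: "よ", word: "よろしく", engine: "mozc", tag: { pos: "noun", source: "import" } }],
  ["entry-2", { input: "や", word: "山田", engine: "mozc", tag: { pos: "noun" } }],
  ["entry-3", { input: "ni", word: "你", engine: "pinyin", tag: { pos: "noun", source: "import" } }],
]);
const all = getRegisteredWords(entries, "mozc");
const imported = getRegisteredWords(entries, "mozc", { pos: "noun", source: "import" });
```

Without a filter both "mozc" entries are returned; with the two-key filter only the entry whose tag matches both pairs remains.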
chrome.input.ime.list_active_engines
  • Description
    • Returns all of the input engines that are currently activated.
  • Parameters
    • callback (Function(Array engines) {}) - Called when the operation completes.  The argument is the array of the engines.  On failure, an empty array is passed to the callback and chrome.extension.lastError is set.
      • engine (Object):
        • name (string) - the name to be used in the dictionary API
        • the Object may have other fields
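An extension would probably use this call to discover which engine names it can pass to the dictionary methods.  The sketch below shows that step against a stub.  stubApi, listActiveEngines, and activeEngineNames are hypothetical names, the engine list is made up, and the stub invokes its callback synchronously, which the real API would likely not do.

```javascript
// Hypothetical stub standing in for the proposed API; engine data is made up.
const stubApi = {
  listActiveEngines(callback) {
    callback([{ name: "mozc", language: "ja" }, { name: "pinyin" }]);
  },
};

// Collects the engine names that can be passed to the dictionary API.
function activeEngineNames(api, callback) {
  api.listActiveEngines((engines) => {
    // Only `name` is relied on; other fields may or may not be present.
    callback(engines.map((engine) => engine.name));
  });
}

let names = [];
activeEngineNames(stubApi, (result) => {
  names = result;
});
```

Relying only on the name field keeps the caller working even if engines expose different extra fields.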

Open questions
--
Comments