For Developers > Design Documents > Sync >

Protection against data override by old Sync clients

Overview

Problem

This document outlines the necessary steps to prevent data loss scenarios with Sync's Model API for multi-client Sync users. A typical problematic scenario is as follows:

New proto field F is introduced in data specifics (e.g. PasswordSpecifics).
Client N (a newer client) submits a proto containing the introduced field F.
Client O (an older client) receives the proto, but doesn’t know the field F, discarding it before storing in the local model.
Client O submits a change to the same proto, which results in discarding field’s F data from client N.

To prevent the described data loss scenario, it is necessary for the old client (client O above) to keep a copy of a server-provided proto, including unknown fields (i.e. fields not even defined in the .proto file at the time the binary was built) and partially-supported fields (e.g. functionality guarded behind a feature toggle). The logic for caching these protos was implemented in DataTypeLocalChangeProcessor.

To have this protection for a specific datatype, its DataTypeSyncBridge needs to be updated to include the cached data during commits to the server (more details in the Implementation).

Checklist

To implement this solution, a Sync datatype owner should follow these steps:

Override TrimAllSupportedFieldsFromRemoteSpecifics function (see this section).
[Optional] Add DCHECK to local updates flow (see this section).
Include unsupported fields in local changes (see this section).
Redownload the data on browser upgrade (see this section).
[Optional] Add a version field to proto in the same milestone.
[Optional] Add sync integration test (see this section).

The result of these steps is that:

Local updates will carry over unsupported fields received previously from the Server.
Initial sync will be triggered if upgrading the client to a more modern version causes an unsupported field to be newly supported.
Modern clients may use the version field to discard some fields which were populated by the client which carried over them as unsupported fields (for cases where the carry-over-value behavior is not desired and the feature should rather remain unset or populated differently).

Implementation

Trimming

Storing a full copy of a proto may have performance impact (memory, disk). The Sync infrastructure allows and encourages to trim proto fields that do not need an additional copy (if the field is already well supported by the client).

Trimming is a functionality that allows each data type to specify which proto fields are supported in the current browser version. Any field that is not supported will be cached by the DataTypeLocalChangeProcessor and can be used during commits to the server to prevent the data loss.

Fields that should not be marked as supported:

Unknown fields in the current browser version
Known fields that are just defined and not actively used, e.g.:
- Partially-implemented functionality
- Functionality guarded behind a feature toggle

TrimAllSupportedFieldsFromRemoteSpecifics is a function of DataTypeSyncBridge that:

Takes a sync_pb::EntitySpecifics object as an argument.
Returns sync_pb::EntitySpecifics object that will be cached by the DataTypeLocalChangeProcessor.
By default trims all proto fields.

To add a data-specific unsupported fields caching, override the trimming function in the data-specific DataTypeSyncBridge to clear all its supported fields (i.e. fields that are actively used by the implementation and fully launched):

sync_pb::EntitySpecifics DataSpecificBridge::TrimAllSupportedFieldsFromRemoteSpecifics(
   const sync_pb::EntitySpecifics& entity_specifics) const {
 sync_pb::EntitySpecifics trimmed_entity_specifics = entity_specifics;
 {...}
 trimmed_entity_specifics.clear_username();
 trimmed_entity_specifics.clear_password();
 sync_pb::SpecificsSubmessage* submessage = trimmed_entity_specifics.mutable_submessage();
 mutable_submessage->clear_field();
 if (mutable_submessage->ByteSizeLong() == 0u) {
   trimmed_entity_specifics.clear_submessage();
 }
 {...}
 return trimmed_entity_specifics;
}

Safety check

Forgetting to trim fields that are supported might result in:

I/O, memory overhead (caching unnecessary data)
Unnecessary sync data redownloads on browser startup (more details below)

To prevent this scenario, add a check that:

Takes a local representation of the proto (containing supported fields only)
Makes sure that trimming it would return an empty proto

This should be done before every commit to the Sync server:

DCHECK_EQ(TrimAllSupportedFieldsFromRemoteSpecifics(datatype_specifics.ByteSizeLong()), 0u);

Local update flow

To use the cached unsupported fields data during commits to the server, add the code that does the following steps:

Query cached sync_pb::EntitySpecifics from the DataTypeLocalChangeProcessor (Passwords example).
Use the cached proto as a base for a commit and fill it with the supported fields from the local proto representation (Passwords example).
- Make it sure that all known fields are explicitly cleared or set to avoid preserving known fields in some corner cases.
Commit the merged proto to the server (Passwords example).

Browser upgrade flow

To handle the scenario when unsupported fields become supported due to a browser upgrade, add the following code to your data-specific DataTypeSyncBridge:

On startup, check whether the unsupported fields cache contains any field that is supported in the current browser version. This can be done by using the trimming function on cached protos and checking if it trims any fields (Passwords example).
If the cache contains any fields that are already supported, simply force the initial sync flow to deal with any inconsistencies between local and server states (Passwords example).

It’s important to implement the trimming function correctly, otherwise the client can run into unnecessary sync data redownloads if a supported field gets cached unnecessarily.

If the trimming function relies on having data-specific field present in the sync_pb::EntitySpecifics proto (example), make sure to skip entries without these fields present in the startup check (as e.g. cache can be empty for entities that were created before this solution landed). This can be tested with the following Sync integration test.

Integration test

Add a Sync integration test for the caching / trimming flow.

Limitations

Sync support horizon

The proposed solution is intended to be a long-term one, but it will take some time until it can be used reliably. This is due to the facts that:

Browser clients need to actually have the version with the mentioned implementation merged.
Sync supported version horizon is pretty long (multiple years).

Deprecating a field

Deprecated fields should still be treated as supported to prevent their unnecessary caching.

Migrating a field

This requires client-side handling as the newer clients will have both fields present and the legacy clients will have access to the deprecated field only. Newer clients should:

Keep filling the deprecated field for legacy clients to use
Add a logic to pick a correct value from a deprecated and new field to account for updates from legacy clients

The Chromium Projects