RAPPOR reports consist of randomly generated data that is biased based on data collected from the user. Data from many users can be aggregated to learn information about the population, but little or nothing can be concluded about individual users from their reports. Descriptions of individual metrics can be found in tools/metrics/rappor/rappor.xml. For full technical details of the algorithm, see the RAPPOR paper. For information on how to use RAPPOR, its strengths and weaknesses, and common caveats, see the RAPPOR 101 document.

Algorithm

The first time the RapporService is started, a client will generate and save a random 128-byte secret key, which won't change and is never transmitted to the server. It will also assign itself to a random cohort. For each metric we collect, we store a Bloom filter, represented as an array of m bits. Each cohort uses a different set of hash functions for the Bloom filter. When the RapporService is passed a sample for recording, it sets bits in the Bloom filter for that metric. For example, with the "Settings.HomePage2" metric, which is collected only for users who opt in to UMA, our Bloom filter will be an array of 128 bits, and one or two of those bits will be set based on the eTLD+1 of the user's homepage. Once we have collected samples, and are ready to generate a report, we take the array of bits we've gathered for the metric and introduce two levels of noise by taking the following steps:

1. Permanent randomized response: for each bit, with probability f we replace the true value with a uniformly random fake bit; otherwise we keep the true value. The fake bits are derived deterministically from the secret key, so reporting the same true value repeatedly always produces the same noised bits.
2. Instantaneous randomized response: for each report, each bit is sent as 1 with probability q if the corresponding bit from step 1 is 1, and with probability p if it is 0, using fresh randomness every time.
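The two noise levels (the permanent and instantaneous randomized responses from the RAPPOR paper) can be sketched as follows. This is an illustrative sketch only: the parameter values f, p, and q, the function names, and the use of a seeded PRNG in place of the key-derived randomness are all assumptions, not Chromium's actual implementation.

```cpp
#include <bitset>
#include <cstddef>
#include <random>

// Illustrative RAPPOR parameters (assumed values, not Chromium's).
constexpr double kProbF = 0.5;   // Chance a bit is replaced by a coin flip.
constexpr double kProbP = 0.5;   // P(report 1) when the permanent bit is 0.
constexpr double kProbQ = 0.75;  // P(report 1) when the permanent bit is 1.
constexpr std::size_t kNumBits = 128;

using Bits = std::bitset<kNumBits>;

// Step 1: permanent randomized response. With probability f/2 a bit is
// forced to 1, with probability f/2 forced to 0, otherwise kept. In the
// real client this value would be derived from the secret key, so the
// same true value always yields the same fake bits.
Bits PermanentRandomizedResponse(const Bits& true_bits, std::mt19937& rng) {
  std::uniform_real_distribution<double> u(0.0, 1.0);
  Bits fake;
  for (std::size_t i = 0; i < kNumBits; ++i) {
    double r = u(rng);
    if (r < kProbF / 2)
      fake[i] = true;
    else if (r < kProbF)
      fake[i] = false;
    else
      fake[i] = true_bits[i];
  }
  return fake;
}

// Step 2: instantaneous randomized response. Fresh randomness is drawn
// for every report, so two reports of the same value still differ.
Bits InstantaneousRandomizedResponse(const Bits& permanent, std::mt19937& rng) {
  std::uniform_real_distribution<double> u(0.0, 1.0);
  Bits report;
  for (std::size_t i = 0; i < kNumBits; ++i)
    report[i] = u(rng) < (permanent[i] ? kProbQ : kProbP);
  return report;
}
```

Because step 1 is deterministic per true value, an observer who collects many reports of the same value from one client learns little more than from a single report; only the step 2 noise averages out.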
The cohort that the client belongs to and the results from the above process are sent to the server. The large amount of randomness means that we can't draw meaningful conclusions from a small number of reports. Even if we aggregate many reports from the same user, they include the same pseudo-random noise in all of their reports of the same value, so we are effectively limited to one report for each distinct value. Indeed, even with infinite amounts of data on a RAPPOR statistic, there are strict bounds on how much information can be learned, as outlined in more detail at http://arxiv.org/abs/1407.6981. In particular, the data collected from any given user or client contains such significant uncertainty, and guarantees such strong deniability, as to prevent observers from drawing conclusions with any certainty.

Adding metrics

To add a new RAPPOR metric, you need to add a bit of code to collect your sample and add your metric to tools/metrics/rappor/rappor.xml. Samples should be recorded on the UI thread of the browser process. For most use cases, you will want to use one of the helper methods in components/rappor/rappor_utils.h (formerly chrome/browser/metrics/rappor/sampling.h), e.g.
    rappor::SampleDomainAndRegistryFromGURL(
        g_browser_process->rappor_service(),
        "Settings.HomePage2", GURL(homepage_url));

If you need to do something more specific, you may need to call RapporService::RecordSample directly.
If you collect multiple samples for the same metric in one reporting interval (currently 30 minutes), a single sample will be randomly selected for generating the randomized report. Remember to add documentation for your metric to tools/metrics/rappor/rappor.xml.
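Selecting a single sample uniformly at random from those recorded in an interval can be sketched with reservoir sampling of size one. The class and method names below are hypothetical, not from the Chromium codebase; the real RapporService may select samples differently.

```cpp
#include <cstdint>
#include <optional>
#include <random>
#include <string>

// Keeps one uniformly random sample from a stream of recorded samples
// (reservoir sampling with reservoir size 1). Hypothetical sketch of
// per-interval sample selection; not Chromium's actual implementation.
class SingleSampleReservoir {
 public:
  explicit SingleSampleReservoir(uint32_t seed) : rng_(seed) {}

  void Record(const std::string& sample) {
    ++count_;
    // The i-th sample replaces the kept one with probability 1/i, which
    // leaves every sample seen so far equally likely to be kept.
    std::uniform_int_distribution<uint64_t> dist(1, count_);
    if (dist(rng_) == 1)
      kept_ = sample;
  }

  const std::optional<std::string>& kept() const { return kept_; }

 private:
  std::mt19937 rng_;
  uint64_t count_ = 0;
  std::optional<std::string> kept_;
};
```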
Code overview

CL/49753002 introduces the RapporService.

Sample Collection

In order to collect samples, we call RapporService::RecordSample. Currently, samples may only be collected in the browser process.

Report generation and uploading

The other function of the RapporService is to generate reports, which are encoded into a proto.
The proto is passed to the LogUploader. It stores all of the logs it is passed in a queue and sends them to the server. When uploads fail, it retries with exponential backoff. For now, if Chrome exits before the logs are uploaded, they are lost. We may implement caching unsent logs in prefs, similar to UMA, in the future.
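The exponential backoff described above can be sketched as a delay schedule that doubles after each consecutive failure up to a cap. The constants and function name here are illustrative assumptions, not the actual LogUploader values.

```cpp
#include <algorithm>
#include <cstdint>

// Illustrative backoff constants (assumed, not Chromium's values).
constexpr int64_t kInitialBackoffSeconds = 60;
constexpr int64_t kMaxBackoffSeconds = 60 * 60 * 24;  // One day.

// Returns the delay before the next retry: the initial interval doubled
// once per additional consecutive failure, capped at the maximum.
int64_t BackoffDelaySeconds(int consecutive_failures) {
  int64_t delay = kInitialBackoffSeconds;
  for (int i = 1; i < consecutive_failures && delay < kMaxBackoffSeconds; ++i)
    delay *= 2;
  return std::min(delay, kMaxBackoffSeconds);
}
```

The cap keeps a long-offline client from computing an absurdly long wait, while doubling keeps retry traffic low when the server is unreachable.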