└── README.md


/README.md:
--------------------------------------------------------------------------------
  1 | # Aggregated Reporting API
  2 | 
  3 | This is a proposal for a new Web Platform API that collapses information
  4 | from multiple sites into a single, privacy-preserving report. This is
  5 | made possible by a write-only, per-origin data store that flushes data to
  6 | a reporting endpoint only after aggregation thresholds are reached
  7 | across many clients. That is, data is reported only if it is
  8 | sufficiently aggregated across browser users, using a server-side
  9 | aggregation service.
 10 | 
 11 | Note: We are still working on ideas for how to make aggregate
 12 | measurement work. This is just a strawman interface proposal. Most
 13 | likely an API like this would be backed by something like the
 14 | [Aggregation Service](https://github.com/WICG/conversion-measurement-api/blob/master/SERVICE.md)
 15 | proposal.
 16 | 
 17 | Motivation
 18 | ==========
 19 | 
 20 | This API, like the [Conversion Measurement
 21 | API](https://github.com/csharrison/conversion-measurement-api),
 22 | intends to provide a well-lit path for ads measurement without needing
 23 | to use consistent cross-site identifiers like third-party cookies. This
 24 | allows ad measurement in a much more privacy-preserving manner.
 25 | 
 26 | In this case, the API provides critical functionality to measure the
 27 | *reach* of a particular ad campaign (how many distinct users saw the
 28 | ad). This functionality is useful for other types of third party widgets
 29 | as well.
 30 | 
 31 | Simple Sample Strawman Usage
 32 | ============================
 33 | 
 34 | The API is built around a JavaScript layer similar to `localStorage`,
 35 | except that entries can be written but never read back. It explicitly
 36 | allows third-party iframes to use this store as third-party storage.
 37 | 
 38 | Example: Reporting total ad views for a campaign
 39 | ------------------------------------------------
 40 | 
 41 | **On every impression related to campaign-123:**
 42 | 
 43 | ```
 44 | // Get the entry with id 'campaign-123', creating it in storage if it
 45 | // doesn't already exist.
 46 | 
 47 | const entryHandle = window.writeOnlyReport.get('campaign-123');
 48 | 
 49 | // Each entry supports simple string key-value pairs. If the attribute is
 50 | // already present, it will be overwritten. This can be used e.g.
 51 | // for demographic slices.
 52 | entryHandle.set('country', 'usa');
 53 | 
 54 | // Entry attributes support appends. If the attribute does not exist,
 55 | // append is equivalent to |set|. This allows us to count in unary:
 56 | // multiple visits will look like '11111...'.
 57 | entryHandle.append('visits', '1');
 58 | 
 59 | // Entries can be configured to report after a given time. After the
 60 | // time has passed, entries are queued for reporting, become immutable,
 61 | // and are removed from the entry table. The UA will add additional
 62 | // randomized delay on reporting for privacy reasons (e.g. up to a day).
 63 | // Once an entry is configured for reporting, its report time cannot be
 64 | // altered by subsequent calls to reportAfter.
 65 | entryHandle.reportAfter(2 * kMsecPerDay);
 66 | 
 67 | // Entries can optionally be set to expire without reporting if
 68 | // reportAfter is not called. All entries have a default expiry
 69 | // of seven days, with a max expiry of a month.
 70 | entryHandle.expireAfter(7 * kMsecPerDay);
 71 | ```
 72 | 
 73 | This snippet will end up sending the following report, assuming the
 74 | aggregation service has seen more than a threshold T of identical reports:
 75 | ```
 76 | {
 77 |  'entryName': 'campaign-123',
 78 |  'country': 'usa',
 79 |  'visits': '1'
 80 | }
 81 | ```
 82 | Using this data on the server side, ad tech can find distributions of ad
 83 | views across all their users, for a given reporting window.
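
The entry semantics used in this example can be sketched with a small in-memory model. This is only an illustration under stated assumptions: the class names `EntryHandle` and `WriteOnlyReportModel`, the `toReport` method, and the `nowMs` parameter are hypothetical and not part of the proposal, and a real implementation would never let the page read entries back.

```javascript
// Hypothetical in-memory model of the proposed entry store. Only the method
// names get/set/append/reportAfter come from the proposal; the rest is assumed.
class EntryHandle {
  constructor(name) {
    this.name = name;
    this.attributes = new Map();
    this.reportAtMs = null; // set once by the first reportAfter call
  }

  // Overwrites the attribute if it already exists.
  set(key, value) {
    this.attributes.set(key, String(value));
  }

  // Concatenates onto the existing value; behaves like set() if absent.
  append(key, value) {
    const previous = this.attributes.get(key) ?? '';
    this.attributes.set(key, previous + String(value));
  }

  // Only the first call takes effect; later calls are ignored.
  reportAfter(delayMs, nowMs = Date.now()) {
    if (this.reportAtMs === null) {
      this.reportAtMs = nowMs + delayMs;
    }
  }

  // Serializes the entry in the shape of the example report (model only).
  toReport() {
    return { entryName: this.name, ...Object.fromEntries(this.attributes) };
  }
}

class WriteOnlyReportModel {
  constructor() {
    this.entries = new Map();
  }

  // Returns the entry with this id, creating it if it does not exist.
  get(name) {
    if (!this.entries.has(name)) {
      this.entries.set(name, new EntryHandle(name));
    }
    return this.entries.get(name);
  }
}

// Three impressions for the same campaign on one client.
const store = new WriteOnlyReportModel();
for (let i = 0; i < 3; i++) {
  const entry = store.get('campaign-123');
  entry.set('country', 'usa');
  entry.append('visits', '1');
  entry.reportAfter(2 * 24 * 60 * 60 * 1000, /* nowMs= */ i * 1000);
}
const report = store.get('campaign-123').toReport();
console.log(report);
```

In this model, repeated `set` calls leave `country` unchanged while each `append` lengthens the unary `visits` counter, so a server could read the number of impressions as the length of that string.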
 84 | 
 85 | Example: Reach measurement for an ad campaign
 86 | ---------------------------------------------
 87 | 
 88 | The reach of a campaign is the number of unique clients that saw
 89 | impressions for it. This API can measure reach by keying aggregated
 90 | reports on a campaign id, so each client contributes one report per window.
 91 | 
 92 | **On every impression related to campaign-123:**
 93 | ```
 94 | const entryHandle = window.writeOnlyReport.get('campaign-123');
 95 | 
 96 | // Add any demographic slices you want or know in the current
 97 | // context.
 98 | entryHandle.set('country', 'usa');
 99 | 
100 | // Add a date field, so there’s no confusion with regard to
101 | // reporting delays.
102 | entryHandle.set('date', new Date().toDateString());
103 | 
104 | // Every night, queue a report per user that saw the campaign at
105 | // least once, with their demographic information, as long as
106 | // there are enough identical reports.
107 | entryHandle.reportAfter(msecUntilMidnight());
108 | ```
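
The examples rely on time helpers, such as `kMsecPerDay` and an until-midnight function, that the proposal leaves undefined. One possible way to define them is sketched below. The definitions are assumptions made for illustration, and "midnight" is taken to mean the next midnight in the client's local time zone.

```javascript
// Hypothetical helpers; the proposal uses these names without defining them.
const kMsecPerMinute = 60 * 1000;
const kMsecPerDay = 24 * 60 * kMsecPerMinute;

// Milliseconds from `now` until the next local midnight. The Date
// constructor normalizes an out-of-range day, so month ends work too.
function msecUntilMidnight(now = new Date()) {
  const nextMidnight = new Date(
    now.getFullYear(), now.getMonth(), now.getDate() + 1);
  return nextMidnight.getTime() - now.getTime();
}

// Delay for a report queued at 23:00 local time on 1 January 2020.
console.log(msecUntilMidnight(new Date(2020, 0, 1, 23, 0, 0)));
```

Using local rather than UTC midnight is a design choice: it keeps the `date` attribute and the reporting boundary aligned from the user's point of view, at the cost of reports from different time zones covering different absolute windows.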
109 | 
110 | Example: Number of different domains a 3p widget is encountered on, per user
111 | ----------------------------------------------------------------------------
112 | 
113 | ### If you can recognize the same user over time on the same domain:
114 | 
115 | **On every impression related to widget-123:**
116 | ```
117 | const entryHandle = window.writeOnlyReport.get('widget-123');
118 | 
119 | // Filter out repeat views on this domain using first party state.
120 | if (!haveSeenWidgetOnThisDomainSinceLastReport('widget-123')) {
121 |   entryHandle.append('distinct-domains', '1');
122 | }
123 | 
124 | entryHandle.reportAfter(2 * kMsecPerDay);
125 | ```
126 | 
127 | ### If you cannot recognize the same user over time on the same domain:
128 | 
129 | Some browsers may limit read/write access to storage inside cross-domain
130 | iframes, making the filtering from the previous solution unavailable.
131 | Here is another technique.
132 | ```
133 | const entryHandle = window.writeOnlyReport.get('widget-123-domains');
134 | 
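135 | // Record this domain. Setting the same attribute again is a no-op, so
136 | // repeat views on one domain do not change the entry.
137 | entryHandle.set('viewed-on-' + document.location.ancestorOrigins[0],
138 |                 '1');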
139 | 
140 | entryHandle.reportAfter(2 * kMsecPerDay);
141 | ```
142 | 
143 | Because of the thresholding requirement, this will only generate reports
144 | about sufficiently common sets of domains. If your widget is embedded in
145 | too many different domains for that to be useful, consider replacing
146 | each domain with e.g. `hash(domain) % 100`.
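
The bucketing idea can be sketched as follows. The proposal does not name a hash function, so the use of 32-bit FNV-1a here, and the helper names `fnv1a32` and `domainBucket`, are assumptions. Any hash that is stable across clients would serve the same purpose.

```javascript
// FNV-1a (32-bit) over the string's UTF-16 code units. For ASCII domains
// this matches the byte-wise definition. The choice of hash is an assumption.
function fnv1a32(text) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Maps a domain to one of `bucketCount` buckets.
function domainBucket(domain, bucketCount = 100) {
  return fnv1a32(domain) % bucketCount;
}

// Hypothetical embedding origin, used only for illustration.
const bucket = domainBucket('https://news.example');
console.log('viewed-on-bucket-' + bucket);
```

Replacing the domain with its bucket shrinks the space of possible attribute names, so more clients produce identical reports and more reports clear the aggregation threshold. The trade-off is that distinct domains in the same bucket can no longer be told apart.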
147 | 
148 | Advanced Example: Calibrating a frequency capping model
149 | -------------------------------------------------------
150 | 
151 | “Frequency capping” an ad campaign is a feature in which a given ad
152 | campaign can be shown to a particular person only a certain limited
153 | number of times in some time period. (Both advertisers and people who
154 | see ads are happier if they don't see the same ad too often as they
155 | browse the web.)
156 | 
157 | Typically this is done via third party cookies, to keep track of a
158 | views-per-ad count across publisher sites. However, when third-party
159 | state is unavailable or otherwise undesirable, this feature could be
160 | approximated by using per-publisher frequency caps, along with some
161 | global, aggregated data for calibration.
162 | 
163 | Suppose you wanted to model frequency caps as:
164 | 
165 | ```
166 | fcap_domainX = fcap_global *
167 |   (typical #ads seen per user on domain X / 
168 |    typical #ads seen overall, for users who visited domain X)
169 | ```
170 | 
171 | Then this API could provide the distribution you need for the
172 | denominator of that fraction, for every domain where (enough) users see
173 | the ad campaigns:
174 | 
175 | ```
176 | const thisDomain = document.location.ancestorOrigins[0];
177 | 
178 | const thisDomainMod = someHash(thisDomain) % 100;
179 | 
180 | for (let i = 0; i < 100; i++) {
181 |   const entryHandle = window.writeOnlyReport.get(
182 |     'report-for-domain-mod-' + i);
183 | 
184 |   entryHandle.append('ads-seen-on-all-domains', '1');
185 | 
186 |   entryHandle.expireAfter(msecUntilMidnight() + 5 * kMsecPerMinute);
187 | 
188 |   if (i === thisDomainMod) {
189 |     entryHandle.set('this-domain-is-' + thisDomain, '1');
190 |     entryHandle.append('ads-seen-on-this-domain-mod', '1');
191 |     entryHandle.reportAfter(msecUntilMidnight());
192 |   }
193 | }
194 | ```
195 | This will result in daily reports showing a site someone visited plus
196 | the total number of ads they saw across all sites. There will be some
197 | reports from users who visited multiple sites with the same
198 | hash-mod-100, which can be recognized by having multiple
199 | "this-domain-is-..." lines in the report; the number 100 can be changed
200 | if these collisions cause too many problems.
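
As a worked instance of the frequency cap model above, the following sketch computes a per-domain cap from a global cap and two summary numbers. The function name `domainFrequencyCap` is hypothetical, and "typical" is taken here to be a single summary statistic (for example a mean or median derived from the daily reports), which is an assumption rather than something the proposal fixes.

```javascript
// Hypothetical helper for the model above:
//   fcap_domainX = fcap_global * (typical ads on X / typical ads overall)
function domainFrequencyCap(globalCap, typicalAdsOnDomain, typicalAdsOverall) {
  if (typicalAdsOverall <= 0) {
    throw new RangeError('typicalAdsOverall must be positive');
  }
  return globalCap * (typicalAdsOnDomain / typicalAdsOverall);
}

// The unary counters in a report decode to counts via their length.
function unaryCount(value) {
  return (value ?? '').length;
}

// A global cap of 10, where visitors of domain X typically see 2 ads on X
// and 8 ads overall, gives a per-domain cap of 2.5.
console.log(domainFrequencyCap(10, unaryCount('11'), unaryCount('11111111')));
```

How to round the result, and which statistic best represents "typical", are left open here. A deployment would likely pick these based on how skewed the reported distributions turn out to be.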
201 | 
202 | Reporting
203 | =========
204 | 
205 | At specific time intervals, the browser will queue entries in storage
206 | that are due for reporting. The API must ensure that the data it reveals
207 | preserves user privacy; how to do so is still open. One option is to
208 | integrate with the [Aggregation Service](https://github.com/WICG/conversion-measurement-api/blob/master/SERVICE.md)
209 | proposal, for example by forming aggregation keys from the serialized
210 | reports.
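
To make the idea of forming aggregation keys from serialized reports concrete, here is a minimal sketch of plain thresholding. It is not the Aggregation Service design: the function names, the canonical serialization, and the assumption that one trusted party sees raw reports are all illustrative simplifications.

```javascript
// Builds a canonical aggregation key: attributes sorted by name so that
// identical reports always serialize to the same string.
function aggregationKey(report) {
  const sorted = Object.keys(report).sort().map((k) => [k, report[k]]);
  return JSON.stringify(sorted);
}

// Counts identical reports and releases only those seen more than
// `threshold` times.
function releaseAggregates(reports, threshold) {
  const counts = new Map();
  for (const report of reports) {
    const key = aggregationKey(report);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  const released = [];
  for (const [key, count] of counts) {
    if (count > threshold) {
      released.push({ report: Object.fromEntries(JSON.parse(key)), count });
    }
  }
  return released;
}

// Three identical reports (attribute order varies) and one outlier.
const reports = [
  { entryName: 'campaign-123', country: 'usa', visits: '1' },
  { country: 'usa', visits: '1', entryName: 'campaign-123' },
  { entryName: 'campaign-123', country: 'usa', visits: '1' },
  { entryName: 'campaign-123', country: 'usa', visits: '111' },
];
const released = releaseAggregates(reports, 2);
console.log(released);
```

With a threshold of 2, only the report seen three times is released along with its count. The single report with three visits is suppressed, which illustrates why rare attribute combinations never reach the reporting endpoint.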
211 | 
212 | Restrictions for Performance and Privacy
213 | ========================================
214 | 
215 | Limit on number of pending reports
216 | ----------------------------------
217 | 
218 | Pending reports take up storage on the client’s device, so there should
219 | be some limits on the total storage this API can use per origin.
220 | 
221 | Limit on number of reports per time period
222 | ------------------------------------------
223 | 
224 | Additionally, some restriction on the number of reports per time period
225 | seems reasonable, to limit the rate at which data about any one user
226 | can be learned.
227 | 
228 | Note that for a use case like reach measurement, the demand for this API
229 | is at least the number of ads seen per time period per origin, if every
230 | campaign wants to implement this kind of measurement.
231 | 


--------------------------------------------------------------------------------