# Aggregated Reporting API

This is a proposal for a new Web Platform API that allows for collapsing
information across multiple sites into a single, privacy-preserving
report. This is made possible by a write-only per-origin data store that
flushes data to a reporting endpoint after reaching aggregation
thresholds across many clients. That is, data is only reported if it is
sufficiently aggregated across browser users, using a server-side
aggregation service.

Note: We are still working on ideas for how to make aggregate
measurement work. This is just a strawman interface proposal. Most
likely an API like this would be backed by something like the
[Aggregation Service](https://github.com/WICG/conversion-measurement-api/blob/master/SERVICE.md)
proposal.

Motivation
==========

This API, like the [Conversion Measurement
API](https://github.com/csharrison/conversion-measurement-api),
intends to provide a well-lit path for ads measurement without needing
to use consistent cross-site identifiers like third-party cookies. This
allows ad measurement in a much more privacy-preserving manner.

In this case, the API provides critical functionality to measure the
*reach* of a particular ad campaign (how many distinct users saw the
ad). This functionality is useful for other types of third-party widgets
as well.

Simple Sample Strawman Usage
============================

The API is built around a JavaScript layer similar to `localStorage`. It
explicitly allows third-party iframes to use this for third-party
storage.


Example: Reporting total ad views for a campaign
------------------------------------------------

**On every impression related to campaign-123:**

```
// Milliseconds in one day, used by the time-based calls below.
const kMsecPerDay = 24 * 60 * 60 * 1000;

// Add an entry to the storage with id 'campaign-123' if it doesn't
// already exist.

const entryHandle = window.writeOnlyReport.get('campaign-123');

// Each entry supports simple string kv pairs. If the attribute is
// already present, it will be overridden. This can be used e.g.
// for demographic slices.
entryHandle.set('country', 'usa');

// Entry attributes support appends. If the attribute does not exist,
// this is equivalent to |set|. This allows us to count in unary.
// Multiple visits will look like "11111..."
entryHandle.append('visits', '1');

// Entries can be configured to report after a given time. After the
// time has passed, entries are queued for reporting, become immutable,
// and are removed from the entry table. The UA will add additional
// randomized delay on reporting for privacy reasons (e.g. up to a day).
// After a report is configured for reporting it cannot be altered by
// subsequent calls to reportAfter.
entryHandle.reportAfter(2 * kMsecPerDay);

// Entries can optionally be set to expire without reporting if
// reportAfter is not called. All entries have a default expiry
// of seven days, with a max expiry of a month.
entryHandle.expireAfter(7 * kMsecPerDay);
```

This snippet will end up sending the following report, assuming the
aggregation service has seen more than T identical reports, where T is
the aggregation threshold:
```
{
  'entryName': 'campaign-123',
  'country': 'usa',
  'visits': '1'
}
```
Using this data on the server side, ad tech can find distributions of ad
views across all their users, for a given reporting window.

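
To make the entry semantics above concrete, here is a minimal in-memory sketch of how `set`, `append`, and `reportAfter` might behave. The `MockEntry` class, its `serialize` method, and the `nowMs` parameter are hypothetical names used only for illustration; they are not part of the proposal. The sketch also assumes that only the first `reportAfter` call takes effect, which is one reading of the comment in the example, and it does not model expiry or the randomized reporting delay.

```javascript
// Hypothetical in-memory model of a single write-only entry.
// Illustrative only; a real implementation would live in the browser.
class MockEntry {
  constructor(name) {
    this.name = name;
    this.attributes = new Map();
    this.reportAtMs = null; // Set once by reportAfter; later calls are ignored.
  }

  // Overwrites any existing value for the key.
  set(key, value) {
    this.attributes.set(key, String(value));
  }

  // Concatenates onto the existing value; behaves like set if absent.
  append(key, value) {
    const previous = this.attributes.get(key) ?? '';
    this.attributes.set(key, previous + String(value));
  }

  // Assumption: only the first call fixes the report time.
  reportAfter(delayMs, nowMs = Date.now()) {
    if (this.reportAtMs === null) {
      this.reportAtMs = nowMs + delayMs;
    }
  }

  // Produces a payload shaped like the report shown above.
  serialize() {
    return { entryName: this.name, ...Object.fromEntries(this.attributes) };
  }
}

const entry = new MockEntry('campaign-123');
entry.set('country', 'usa');
entry.append('visits', '1');
entry.append('visits', '1');
console.log(entry.serialize());
```

With two impressions recorded, the `visits` attribute in this sketch becomes `'11'`, which is the unary count described in the example.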

Example: Reach measurement for an ad campaign
---------------------------------------------

The reach of a campaign is the number of unique clients that saw
impressions for it. This can easily be done with this API, via keying
aggregated reports off of a campaign id.

**On every impression related to campaign-123:**
```
const entryHandle = window.writeOnlyReport.get('campaign-123');

// Add any demographic slices you want or know in the current
// context.
entryHandle.set('country', 'usa');

// Add a date field, so there's no confusion with regard to
// reporting delays.
entryHandle.set('date', new Date().toDateString());

// Every night, queue a report per user that saw the campaign at
// least once, with their demographic information, as long as
// there are enough identical reports.
entryHandle.reportAfter(msecFromNowUntilMidnight());
```

Example: Number of different domains a 3p widget is encountered on, per user
----------------------------------------------------------------------------

### If you can recognize the same user over time on the same domain:

**On every impression related to widget-123:**
```
const entryHandle = window.writeOnlyReport.get('widget-123');

// Filter out repeat views on this domain using first party state.
if (!haveSeenWidgetOnThisDomainSinceLastReport('widget-123')) {
  entryHandle.append('distinct-domains', '1');
}

entryHandle.reportAfter(2 * kMsecPerDay);
```

### If you cannot recognize the same user over time on the same domain:

Some browsers may limit read/write access to storage inside cross-domain
iframes, making the filtering from the previous solution unavailable.
Here is another technique.
```
const entryHandle = window.writeOnlyReport.get('widget-123-domains');

// Record this domain. |set| is used so that repeat views on the same
// domain leave the value unchanged.
entryHandle.set('viewed-on-' + document.location.ancestorOrigins[0],
                '1');

entryHandle.reportAfter(2 * kMsecPerDay);
```

Because of the thresholding requirement, this will only generate reports
about sufficiently common sets of domains. If your widget is embedded in
too many different domains for that to be useful, consider replacing
each domain with e.g. `hash(domain) % 100`.

Advanced Example: Calibrating a frequency capping model
-------------------------------------------------------

“Frequency capping” an ad campaign is a feature in which a given ad
campaign can be shown to a particular person only a certain limited
number of times in some time period. (Both advertisers and people who
see ads are happier if they don't see the same ad too often as they
browse the web.)

Typically this is done via third-party cookies, to keep track of a
views-per-ad count across publisher sites. However, when third-party
state is unavailable or otherwise undesirable, this feature could be
approximated by using per-publisher frequency caps, along with some
global, aggregated data for calibration.


Suppose you wanted to model frequency caps as:

```
fcap_domainX = fcap_global *
    (typical #ads seen per user on domain X /
     typical #ads seen overall, for users who visited domain X)
```

Then this API could provide the distribution you need for the
denominator of that fraction, for every domain where (enough) users see
the ad campaigns:

```
const thisDomain = document.location.ancestorOrigins[0];

const thisDomainMod = someHash(thisDomain) % 100;

for (let i = 0; i < 100; i++) {
  const entryHandle = window.writeOnlyReport.get(
      'report-for-domain-mod-' + i);

  entryHandle.append('ads-seen-on-all-domains', '1');

  entryHandle.expireAfter(msecUntilMidnight() + 5 * kMsecPerMinute);

  if (i === thisDomainMod) {
    entryHandle.set('this-domain-is-' + thisDomain, '1');
    entryHandle.append('ads-seen-on-this-domain-mod', '1');
    entryHandle.reportAfter(msecUntilMidnight());
  }
}
```
This will result in daily reports showing a site someone visited plus
the total number of ads they saw across all sites. There will be some
reports from users who visited multiple sites with the same
hash-mod-100, which can be recognized by having multiple
"this-domain-is-..." fields in the report; the number 100 can be changed
if these collisions cause too many problems.

Reporting
=========

At specific time intervals, the browser will queue all entries in
storage for reporting. The API itself must ensure that whatever data is
revealed preserves user privacy. One way to achieve this is to integrate
with the [Aggregation Service](https://github.com/WICG/conversion-measurement-api/blob/master/SERVICE.md)
proposal, forming aggregation keys from the serialized reports.

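
As a rough illustration of forming aggregation keys from serialized reports and applying a threshold, here is a minimal sketch. It is not the Aggregation Service's actual design: the `aggregationKey` and `releaseAggregates` functions, the canonical-JSON key format, and the plain in-memory counting are all assumptions made for this example.

```javascript
// Hypothetical server-side thresholding, for illustration only.

// Builds a canonical key so that reports with the same fields in a
// different order are treated as identical.
function aggregationKey(report) {
  const sortedEntries = Object.keys(report).sort().map((k) => [k, report[k]]);
  return JSON.stringify(sortedEntries);
}

// Returns { report, count } for every distinct report seen more than
// `threshold` times; everything else is dropped.
function releaseAggregates(reports, threshold) {
  const counts = new Map();
  for (const report of reports) {
    const key = aggregationKey(report);
    const bucket = counts.get(key) ?? { report, count: 0 };
    bucket.count += 1;
    counts.set(key, bucket);
  }
  return [...counts.values()].filter((bucket) => bucket.count > threshold);
}

const incoming = [
  { entryName: 'campaign-123', country: 'usa', visits: '1' },
  { country: 'usa', entryName: 'campaign-123', visits: '1' },
  { entryName: 'campaign-123', country: 'usa', visits: '1' },
  { entryName: 'campaign-123', country: 'nz', visits: '111' },
];
console.log(releaseAggregates(incoming, 2));
```

With a threshold of 2, only the report seen three times is released in this sketch; the single `nz` report is withheld because too few clients sent an identical one.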

Restrictions for Performance and Privacy
========================================

Limit on number of pending reports
----------------------------------

Pending reports take up storage on the client’s device, so there should
be some limits on the total storage this API can use per origin.

Limit on number of reports per time period
------------------------------------------

Additionally, some restriction on the number of reports per time period
seems reasonable, to put a limit on the rate of data about any user that
can be learned.

Note that for a use case like reach measurement, the demand for this API
is at least the number of ads seen per time period per origin, if every
campaign wants to implement this kind of measurement.
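
One way a limit on reports per time period could work is a per-origin sliding window, sketched below. The `ReportRateLimiter` class, its `tryQueue` method, and the limit of two reports per day are placeholders invented for this example; the proposal does not specify a mechanism or concrete values.

```javascript
// Hypothetical per-origin limiter; the limits used below are
// placeholders, not values from the proposal.
class ReportRateLimiter {
  constructor(maxReports, windowMs) {
    this.maxReports = maxReports;
    this.windowMs = windowMs;
    this.sentAtByOrigin = new Map(); // origin -> timestamps of queued reports
  }

  // Returns true and records the report if the origin is under its limit
  // for the sliding window ending at nowMs; otherwise returns false.
  tryQueue(origin, nowMs) {
    const recent = (this.sentAtByOrigin.get(origin) ?? []).filter(
      (sentAt) => nowMs - sentAt < this.windowMs
    );
    const allowed = recent.length < this.maxReports;
    if (allowed) recent.push(nowMs);
    this.sentAtByOrigin.set(origin, recent);
    return allowed;
  }
}

const limiter = new ReportRateLimiter(2, 24 * 60 * 60 * 1000);
console.log(limiter.tryQueue('https://adtech.example', 0));
console.log(limiter.tryQueue('https://adtech.example', 1000));
console.log(limiter.tryQueue('https://adtech.example', 2000));
```

In this sketch the first two calls are accepted and the third is rejected, since the origin has already used its two reports for the window. A real limit would also need to account for the demand noted above, where every campaign may want its own report.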