├── .prettierrc.json ├── .pr-preview.json ├── img ├── transparency.png ├── element_capture_mock1.png ├── element_capture_mock2.png ├── occluding_occluded_1.png └── occluding_occluded_2.png ├── w3c.json ├── element-capture.js ├── questionnaire.md ├── README.md └── index.html /.prettierrc.json: -------------------------------------------------------------------------------- 1 | { 2 | "printWidth": 100 3 | } 4 | -------------------------------------------------------------------------------- /.pr-preview.json: -------------------------------------------------------------------------------- 1 | { 2 | "src_file": "index.html", 3 | "type": "respec" 4 | } 5 | -------------------------------------------------------------------------------- /img/transparency.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/screen-share/element-capture/HEAD/img/transparency.png -------------------------------------------------------------------------------- /img/element_capture_mock1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/screen-share/element-capture/HEAD/img/element_capture_mock1.png -------------------------------------------------------------------------------- /img/element_capture_mock2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/screen-share/element-capture/HEAD/img/element_capture_mock2.png -------------------------------------------------------------------------------- /img/occluding_occluded_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/screen-share/element-capture/HEAD/img/occluding_occluded_1.png -------------------------------------------------------------------------------- /img/occluding_occluded_2.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/screen-share/element-capture/HEAD/img/occluding_occluded_2.png -------------------------------------------------------------------------------- /w3c.json: -------------------------------------------------------------------------------- 1 | { 2 | "group": "cg/sccg", 3 | "contacts": ["eladalon1983"], 4 | "shortName": "element-capture", 5 | "repo-type": "cg-report" 6 | } 7 | -------------------------------------------------------------------------------- /element-capture.js: -------------------------------------------------------------------------------- 1 | var respecConfig = { 2 | group: "cg/sccg", 3 | specStatus: "CG-DRAFT", 4 | latestVersion: "https://screen-share.github.io/element-capture/", 5 | github: { 6 | repoURL: "https://github.com/screen-share/element-capture/", 7 | branch: "main", 8 | }, 9 | editors: [ 10 | { 11 | name: "Elad Alon", 12 | email: "eladalon@google.com", 13 | company: "Google", 14 | w3cid: 118124, 15 | }, 16 | ], 17 | xref: [ 18 | "html", 19 | "infra", 20 | "permissions", 21 | "permissions-policy", 22 | "dom", 23 | "mediacapture-streams", 24 | "mediacapture-region", 25 | "webidl", 26 | "screen-capture", 27 | ], 28 | subjectPrefix: "[element-capture]", 29 | localBiblio: { 30 | css2ed: { 31 | title: "CSS 2 Editor\'s Draft", 32 | href: "https://drafts.csswg.org/css2/", 33 | editors: [ 34 | "Sam Sneddon", 35 | "Tantek Çelik" 36 | ], 37 | publisher: "W3C", 38 | }, 39 | }, 40 | }; 41 | 42 | -------------------------------------------------------------------------------- /questionnaire.md: -------------------------------------------------------------------------------- 1 | # Security and Privacy Questionnaire for Element Capture 2 | 3 | ## Questions to Consider 4 | 5 | ### 2.1. What information might this feature expose to Web sites or other parties, and for what purposes is that exposure necessary? 
6 | 7 | This feature does not, in itself, expose additional information to Web sites or third-parties. This feature allows the shaping of information already exposed to Web sites that self-capture through pre-existing means (such as getDisplayMedia). 8 | 9 | ### 2.2. Do features in your specification expose the minimum amount of information necessary to enable their intended uses? 10 | 11 | Yes. 12 | 13 | ### 2.3. How do the features in your specification deal with personal information, personally-identifiable information (PII), or information derived from them? 14 | 15 | Not applicable. 16 | 17 | ### 2.4. How do the features in your specification deal with sensitive information? 18 | 19 | Not applicable. 20 | 21 | ### 2.5. Do the features in your specification introduce new state for an origin that persists across browsing sessions? 22 | 23 | No. 24 | 25 | ### 2.6. Do the features in your specification expose information about the underlying platform to origins? 26 | 27 | No. 28 | 29 | ### 2.7. Does this specification allow an origin to send data to the underlying platform? 30 | 31 | No. 32 | 33 | ### 2.8. Do features in this specification enable access to device sensors? 34 | 35 | No. 36 | 37 | ### 2.9. Do features in this specification enable new script execution/loading mechanisms? 38 | 39 | No. 40 | 41 | ### 2.10. Do features in this specification allow an origin to access other devices? 42 | 43 | No. 44 | 45 | ### 2.11. Do features in this specification allow an origin some measure of control over a user agent’s native UI? 46 | 47 | No. (Other than that the user agent's native UI will inform the user that tab-capture is being used. This feature builds on top of tab-capture; the native UI will have been shown regardless.) 48 | 49 | ### 2.12 What temporary identifiers do the features in this specification create or expose to the web? 
50 | 51 | This feature allows a website to mint tokens called [`RestrictionTarget`](https://screen-share.github.io/element-capture/#dom-restrictiontarget)s. These are opaque interfaces which are only meaningful within their [browsing context](https://html.spec.whatwg.org/multipage/document-sequences.html#browsing-context). They do not outlive the browsing session. The party minting the tokens may transfer them to trusted third-parties within the [browsing context](https://html.spec.whatwg.org/multipage/document-sequences.html#browsing-context). The only use of these tokens is to transform video tracks through the [restriction transformation](https://screen-share.github.io/element-capture/#applying-the-restriction-transformation). 52 | 53 | ### 2.13. How does this specification distinguish between behavior in first-party and third-party contexts? 54 | 55 | This feature does not distinguish first-party and third-party contexts. 56 | 57 | ### 2.14. How do the features in this specification work in the context of a browser’s Private Browsing or Incognito mode? 58 | 59 | Not applicable. 60 | 61 | 62 | ### 2.15. Does this specification have both "Security Considerations" and "Privacy Considerations" sections? 63 | 64 | Yes. 65 | 66 | ### 2.16. Do features in your specification enable origins to downgrade default security protections? 67 | 68 | No. 69 | 70 | ### 2.17. How does your feature handle non-"fully active" documents? 71 | 72 | This feature only works for documents which use pre-existing mechanisms to self-capture. A non-"fully active" document will have this capture-session interrupted, thereby also terminating the use of this feature. 73 | 74 | ### 2.18. What should this questionnaire have asked? 75 | 76 | A Web application that's engaged in self-capture can bypass origin isolation and read pixels from a third-party iframe. This is pre-existing.
The main concern raised by the feature introduced by this specification is that it allows a Web application to observe pixels invisible to the user due to occlusions by other content. 77 | 78 | We contend that although this sounds scary at first, it does not actually diminish security and/or privacy, because no new attacks can be launched against the user. 79 | * First, a malicious Web application that managed to trick the user into self-capture, would already be able to obtain access to the same set of pixels before the user had a chance to stop it - load an iframe in the background, then bring it to the forefront; by the time the user mentally registers it, the pixels will have already been recorded by the attacker. 80 | * Second, if a malicious application wishes to read these pixels surreptitiously, this can be done using a combination of any number of techniques. These include: 81 | * Display the content briefly. 82 | * Display the content piecemeal. (As far as one pixel at a time.) 83 | * Display the content at a low opacity. 84 | 85 | Further, we contend that for non-malicious applications, this feature is a great boon to user privacy, as it allows responsible applications to pare down the set of pixels to which they gain access. This allows such applications to avoid the accidental recording, or transmission to remote users, of unintended pixels, which could be of a private nature. One example is a video-conferencing application into which an iframe is embedded with content to be shared with remote participants; by using our feature, applications can guarantee that private chat messages that overlap the iframe which is intended to be shared, would not be accidentally captured and transmitted to remote participants.
86 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Element Capture 2 | 3 | ## Introduction 4 | 5 | Pre-existing mechanisms such as [getDisplayMedia()](https://www.w3.org/TR/screen-capture/#dom-mediadevices-getdisplaymedia) allow Web applications to initiate screen-capture. If the user chooses to capture a tab, mechanisms such as [Region Capture](https://w3c.github.io/mediacapture-region/) mutate the resulting video track and perform an operation on all subsequent frames produced. (In the example of [Region Capture](https://w3c.github.io/mediacapture-region/), the operation consists of cropping frames to the frame's intersection with the bounding box of a target-element.) 6 | 7 | Element Capture introduces a new mutation mechanism which we name "restriction". When an application "restricts" a video track to a given target-element, frames produced on the restricted video track only consist of information from the target-element and its descendants - clipped to the tab's viewport. Phrased differently, the track becomes a capture of the DOM sub-tree rooted at the target-element (or more accurately, its intersection with the current tab's viewport). 8 | 9 | ## What is removed by "restriction"? 10 | 11 | When a track is "restricted", three things are removed: 12 | 13 | 1. Any content from outside of the target element's bounding box. 14 | 2. Any content which is occluding the target element. 15 | 3. Any content which is occluded by the target element. 16 | 17 | Note that pixels outside of the viewport were not listed. These pixels were not captured by either [getDisplayMedia()](https://www.w3.org/TR/screen-capture/#dom-mediadevices-getdisplaymedia) or [`getViewportMedia()`](https://w3c.github.io/mediacapture-viewport/) to begin with, and "restriction" does not add them back.
18 | 19 | ## Sample use cases 20 | 21 | Consider the following Web application: 22 | 23 |

24 | 25 |

26 | 27 | This Web app combines a productivity suite and a video conferencing tool. It uses [Region Capture](https://w3c.github.io/mediacapture-region/) to crop away remote participants' own videos before it transmits the main content area to everyone. 28 | 29 | But if other HTMLElements end up being drawn on top of the "main content area", they also get captured. This is not always desirable - see the following illustration. 30 | 31 |

32 | 33 |

34 | 35 | Element Capture allows this app to capture only the main content area, excluding any occluding content such as drop-down lists. 36 | 37 | A partial list of use-cases includes: 38 | 39 | - Removing sensitive content from video-captures, such as private messages. 40 | - Removing distracting content from video-captures, such as drop-down lists. 41 | - Client-side rendering. 42 | 43 | ## How do I use Element Capture? 44 | 45 | ### Sample code of a simple use case 46 | 47 | The `captureTarget` is an `Element` on your page which contains the content the user wishes to capture. You want the video conferencing web app to capture `captureTarget` and share it with remote participants. So you derive a `RestrictionTarget` from `captureTarget`. After restricting the video track using this `RestrictionTarget`, frames on that video track now consist only of the pixels that are part of `captureTarget` and its DOM descendants. 48 | 49 | If `captureTarget` changes size, shape or location, the video track follows along, without requiring any additional input from the web app. Occluding content that appears, disappears or moves around, similarly requires no special treatment. 50 | 51 | Start out by allowing the user to capture the current tab. 52 | 53 | ```js 54 | // Ask the user for permission to start capturing the current tab. 55 | // In the future, this should be done with getViewportMedia(). 56 | const stream = await navigator.mediaDevices.getDisplayMedia({ 57 | selfBrowserSurface: "include", 58 | }); 59 | const [track] = stream.getVideoTracks(); 60 | ``` 61 | 62 | Define a `RestrictionTarget` by calling `RestrictionTarget.fromElement()` with an element of your choice as input.
63 | 64 | ```js 65 | // Associate captureTarget with a new RestrictionTarget 66 | const captureTarget = document.querySelector("#captureTarget"); 67 | const restrictionTarget = await RestrictionTarget.fromElement(captureTarget); 68 | ``` 69 | 70 | Then call `restrictTo()` on the video track with the `RestrictionTarget` as the input. Once the last promise resolves, all subsequent frames will be restricted. 71 | 72 | ```js 73 | // Start restricting the self-capture video track using the RestrictionTarget. 74 | await track.restrictTo(restrictionTarget); 75 | ``` 76 | 77 | You can now do anything you'd like with this restricted track; for example, you could transmit it remotely. 78 | 79 | ### Deep dive 80 | 81 | #### Feature detection 82 | 83 | To check if `RestrictionTarget.fromElement()` is supported, use: 84 | 85 | ```js 86 | if ("RestrictionTarget" in self && "fromElement" in RestrictionTarget) { 87 | // Deriving a restriction target is supported. 88 | } 89 | ``` 90 | 91 | #### Derive a RestrictionTarget 92 | 93 | Focus on the `Element` called `captureTarget`. To derive a RestrictionTarget from it, call `RestrictionTarget.fromElement(captureTarget)`. The returned promise will be resolved with a new `RestrictionTarget` object if successful. 94 | 95 | ```js 96 | const captureTarget = document.querySelector("#captureTarget"); 97 | const restrictionTarget = await RestrictionTarget.fromElement(captureTarget); 98 | ``` 99 | 100 | Unlike an `Element`, a `RestrictionTarget` object is [serializable](https://developer.mozilla.org/en-US/docs/Glossary/Serializable_object). It can be passed to another document using [`Window.postMessage()`](https://developer.mozilla.org/en-US/docs/Web/API/Window/postMessage), for instance. 101 | 102 | #### Restricting 103 | 104 | When capturing a tab, the video track exposes `restrictTo()`. 
When capturing the current tab, it is valid to call `restrictTo()` with either `null` or any `RestrictionTarget` derived from an `Element` within the current tab. 105 | 106 | Calls to `restrictTo(restrictionTarget)` mutate the video track into a capture of `captureTarget`, as though it were drawn by itself, independently of the rest of the DOM. Any descendants of `captureTarget` are also captured; siblings of `captureTarget` are eliminated from the capture. The result is that any frames delivered on the track appear as though they were cropped to the contours of `captureTarget`, and any occluding and occluded content are removed. 107 | 108 | ```js 109 | // Start restricting the self-capture video track using the RestrictionTarget. 110 | await track.restrictTo(restrictionTarget); 111 | ``` 112 | 113 | Calls to `restrictTo(null)` revert the track to its original state. 114 | 115 | ```js 116 | // Stop restricting. 117 | await track.restrictTo(null); 118 | ``` 119 | 120 | If the call to `restrictTo()` is successful, the returned promise is resolved when it can be guaranteed that all subsequent video frames will be restricted to `captureTarget`. 121 | 122 | If unsuccessful, the promise is rejected. An unsuccessful call to `restrictTo()` will be for one of the following reasons: 123 | 124 | - If the `restrictionTarget` was minted in a tab other than the one being captured. 125 | - If the `restrictionTarget` was derived from an Element that no longer exists. 126 | - If the current track is not a self-capture video track. (Future extensions to restricting other tabs are considered but not planned.) 127 | - If the Element from which `restrictionTarget` was derived is not [eligible for restriction](). 128 | 129 | #### Eligible and ineligible capture targets 130 | 131 | It is always possible to start restricting a track to any valid capture-target. 
However, frames won't be produced under [certain conditions](https://screen-share.github.io/element-capture/#elements-eligible-for-restriction); for example, if the element or an ancestor is `display: none`. The general rationale is that restriction applies only to an element that comprises a single, cohesive, two-dimensional, rectangular area, whose pixels can be logically determined in isolation from any parent or sibling elements. 132 | 133 | One important consideration for ensuring the element is eligible for restriction, is that it must form its own [stacking context](https://developer.mozilla.org/en-US/docs/Glossary/Stacking_context). To ensure this, you could specify the [isolation](https://developer.mozilla.org/en-US/docs/Web/CSS/isolation) CSS property, setting it to `isolate`. 134 | 135 | ```html 136 |
137 | ``` 138 | 139 | Note that the target element can toggle between being eligible and ineligible for restriction at any arbitrary point, for example, if the app changes its CSS properties. It's up to the app to use reasonable capture targets and avoid changing their properties unexpectedly. If the target element becomes ineligible, new frames will simply not be emitted on the track until the target element again becomes eligible for restriction. 140 | 141 | ## Common questions 142 | 143 | ### Occluding content? Occluded content? 144 | 145 | Occluding content is content which is drawn on top of other content. In the following example, the red rectangle is occluding content. 146 | 147 |

148 | 149 |

150 | 151 | Occluded content is content which is partially obscured by other content. In the example above, the letters between A and Z (exclusive) are occluded by the red rectangle. 152 | 153 | To keep things interesting, consider partial transparency. In the illustration below, the occluding content is partially transparent. If Element Capture were used here to target the red rectangle, none of the content from the blue rectangle should be captured. 154 | 155 |

156 | 157 |

158 | 159 | ### What about the alpha channel? 160 | 161 | At the moment, most user agents do not support tab-capture with an alpha channel. That information is absent from the initial capture, prior to restriction. 162 | 163 | If an app sets a partially transparent capture-target, stripping the alpha channel has some possible consequences: 164 | 165 | - Colors might change. Partially transparent target-elements drawn over a light background might appear darker when the alpha channel is removed, and those drawn over a dark background might appear lighter. 166 | - Colors that were invisible or imperceptible to the user because the content was fully or almost fully transparent would appear once the alpha channel is removed. For example, this could lead to unexpected black regions in the captured frames, if the transparent sections had the RGBA code `rgba(0, 0, 0, 0)`. 167 | 168 |

169 | 170 |

171 | 172 | It is expected that Web applications will find Element Capture useful in contexts where that is not an issue, or that they would employ their own mitigations. 173 | 174 | ### What is the permission flow? 175 | 176 | This API builds on top of existing screen-sharing APIs, meaning that the permission flow remains entirely unchanged. An application would first call [`getDisplayMedia()`](https://www.w3.org/TR/screen-capture/), [`getViewportMedia()`](https://w3c.github.io/mediacapture-viewport/), or any other future screen-sharing API, and the user would first go through the usual selection and consent processes associated with that API. It's only after this process completes, and only if the user shares the (entire) current tab, that the Element Capture API can be invoked. 177 | 178 | ### What about audio? 179 | 180 | This API only deals with video tracks (as did [Region Capture](https://w3c.github.io/mediacapture-region/)). No extension of this work to deal with audio is expected; that would be a completely separate effort, and would likely employ a different approach. 181 | 182 | ### What about pixels outside the tab's viewport? 183 | 184 | Pixels outside the tab's viewport are not captured by either normal tab-capture, nor after "restriction" is applied. 185 | 186 | ## Alternatives considered 187 | 188 | ### Rejected alternative: Capture-specific-element API 189 | 190 | We have considered shaping the API along the lines of `element.capture()`. 191 | 192 | We preferred the restriction-model instead due to multiple reasons, among them: 193 | 194 | 1. It is desirable to hook into established patterns in obtaining the user's informed consent. 195 | 1. Prompting the user to share anything other than the entire current tab, might mislead the users into thinking that they were granting permission to capture only what they currently see; in reality, the target element might have sub-elements that can be navigated, like iframes. 196 | 1. 
It is useful to be able to switch between target-elements without having to prompt the user again. 197 | 1. We wanted a convenient way to anchor the API to up-and-coming APIs such as [`getViewportMedia()`](https://w3c.github.io/mediacapture-viewport/) and the security mechanisms they intend to provide. 198 | 1. We see it as idiomatic and ergonomic to shape the API along the lines of such established adjacent APIs as [Region Capture](https://w3c.github.io/mediacapture-region/). 199 | 1. The current API shape lends itself to a potential future extension, that might allow restricting a track obtained from capturing _another_ tab. 200 | 201 | ## Demos 202 | 203 | - [element-capture-demo.glitch.me](https://element-capture-demo.glitch.me/) 204 | - [sub-capture-demo.glitch.me](https://sub-capture-demo.glitch.me/) 205 | -------------------------------------------------------------------------------- /index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | Element Capture 6 | 7 | 8 | 9 | 10 |
11 |

Abstract

12 |

13 | Pre-existing mechanisms such as {{MediaDevices/getDisplayMedia()}} allow Web applications to 14 | initiate screen-capture. If the user chooses to capture a tab, mechanisms such as 15 | [[mediacapture-region|Region Capture]] mutate the resulting video track and perform an 16 | operation on all subsequent frames produced. (In the example of [[mediacapture-region|Region 17 | Capture]], the operation consists of cropping frames to the frame's intersection with the 18 | bounding box of a target-element.) 19 |

20 |

21 | Element Capture introduces a new mutation mechanism which we name "restriction". When an 22 | application "restricts" a video track to a given target-element, frames produced on the 23 | restricted video track only consist of information from the target-element and its 24 | descendants. Phrased differently, the track becomes a capture of the DOM sub-tree rooted at 25 | the target-element. 26 |

27 |
28 |
29 |
30 |
31 |

Use Cases

32 |
33 |

Generic Use-Case

34 |

35 | [[mediacapture-region|Region Capture]] allows applications to crop captures. Assume some 36 | element TARGET is the restriction-target. What if other elements, which are 37 | not DOM-descendants of TARGET, draw in front of TARGET? Using 38 | [[mediacapture-region|Region Capture]], these other elements would also get captured, 39 | which is not always desirable. A mechanism is sought that would allow cropping to 40 | TARGET's bounding box, while also excluding from capture any of the content 41 | that is not a DOM-descendant. 42 |

43 |
44 |
45 |

Practical Use-Case #1: Recording part of an app

46 |

47 | Consider an "editor" Web application (text-editor, image-editor, slides-editor or 48 | video-editor). Such applications often include a main content area, surrounded by various 49 | toolbars, drop-down menus and widgets which allow the local user to edit the content in 50 | the main content area. 51 |

52 |

53 | Sometimes a Web application wishes to record only the main content area, and then either 54 | transmit it "live" to remote participants, or record it to disk. Such an application would 55 | not necessarily wish to expend storage, bandwidth, or remote participants' screen 56 | real-estate on anything outside of the main content area. 57 |

58 |

59 | A mechanism such as [[mediacapture-region|Region Capture]] helps with cropping to the 60 | bounding box of the target-element, but what happens when drop-down lists temporarily draw 61 | over it? 62 |

63 |
64 |
65 |

Practical Use-Case #2: Collaborative tools during video-conferencing

66 |

67 | Video-conferencing applications often arrange themselves using "tiles" - each remote 68 | participant's video is presented in a tile. Assume that a collaborative Web application, 69 | like a text editor or an image-editing application, is loaded in another iframe, and that 70 | this iframe is also presented as a tile. 71 |

72 |

73 | Some remote participants would similarly load the same tool in a dedicated tile. But what 74 | if some users don't have the necessary permissions to load that tool? Or if they are 75 | joining from a platform that does not support the tool? 76 |

77 |

78 | The video conferencing solution may then choose to have one of the participants who have 79 | loaded the tool successfully screen-share that tool's tile to the users who cannot load 80 | the tool, allowing them to at least view it, although not interact with it. This can be 81 | done using self-capture through {{MediaDevices/getDisplayMedia()}} and 82 | [[mediacapture-region|Region Capture]]. 83 |

84 |

85 | But such a solution introduces some problems. What happens if other elements ever draw on 86 | top of the tool tile, either briefly or permanently? Examples include: 87 |

88 |
    89 |
  • Private messages sent inside of the video-conferencing application.
  • 90 |
  • Requests by new users to join.
  • 91 |
  • Other tiles during tile-layout changes.
  • 92 |
  • Drop-down lists from other elements of the video-conferencing application.
  • 93 |
94 |
95 |
96 |
97 |

Solution Overview

98 |

The Element Capture mechanism comprises two parts:

99 |
    100 |
  1. 101 | [=RestrictionTarget production=]: A mechanism for tagging an {{Element}} as a 102 | potential target for the [=restriction mechanism=]. 103 |
  2. 104 |
  3. 105 | [=Restriction mechanism=]: A mechanism for instructing the user agent to start restricting 106 | a video track to the bounding box of a previously [=tagging|tagged=] {{Element}}, or to 107 | stop such restriction and revert a track to its [=unrestricted=] state. 108 |
  4. 109 |
110 |

111 | We define two restriction-states. restricted and 112 | unrestricted. Video tracks are always in one state or the other. Tracks start out 113 | [=unrestricted=], and may turn to [=restricted=] when 114 | {{BrowserCaptureMediaStreamTrack/restrictTo()}} is successfully called. 115 |

116 |
117 |
118 |

RestrictionTarget Production

119 |
120 |

Motivation for defining RestrictionTarget

121 |

122 | The [=restriction mechanism=] presented in this document 123 | ({{BrowserCaptureMediaStreamTrack/restrictTo}}) relies on a {{RestrictionTarget}} token 124 | rather than on direct node references. This allows restriction by one document to a target 125 | element specified in another document. 126 |

127 |

128 | Because {{BrowserCaptureMediaStreamTrack/cropTo()}} and 129 | {{BrowserCaptureMediaStreamTrack/restrictTo()}} use different token types - {{CropTarget}} 130 | and {{RestrictionTarget}}, respectively - it is possible for documents to limit the 131 | capabilities they bestow on documents that capture them. 132 |

133 |
134 |
135 |

RestrictionTarget Definition

136 |

137 | RestrictionTarget is an intentionally empty, opaque identifier. Its purpose is to be 138 | handed to {{BrowserCaptureMediaStreamTrack/restrictTo}} as input. 139 |

140 |
141 |           [Exposed=(Window,Worker), Serializable]
142 |           interface RestrictionTarget {
143 |             [Exposed=Window, SecureContext] static Promise<RestrictionTarget> fromElement(Element element);
144 |           };
145 |         
146 |
147 |
148 | fromElement() 149 |
150 |
151 |

152 | Calling {{RestrictionTarget/fromElement}} with an {{Element}} of a supported type 153 | associates that {{Element}} with a {{RestrictionTarget}}. This {{RestrictionTarget}} 154 | may be used as input to {{BrowserCaptureMediaStreamTrack/restrictTo}}. We define a 155 | valid RestrictionTarget as one returned by a call to 156 | {{RestrictionTarget.fromElement()}} in a 157 | document that is still 158 | active. 159 |

160 |

161 | When {{RestrictionTarget/fromElement}} is called with a given |element|, the user 162 | agent [=create a RestrictionTarget|creates a RestrictionTarget=] with |element| as 163 | input. The user agent MUST return a {{Promise}} |p|. The user agent MUST resolve |p| 164 | only after it has finished all the necessary internal propagation of state associated 165 | with the new {{RestrictionTarget}}, at which point the user agent MUST be ready to 166 | receive the new {{RestrictionTarget}} as a valid parameter to 167 | {{BrowserCaptureMediaStreamTrack/restrictTo}}. 168 |

169 |

170 | When cloning an {{Element}} on which {{RestrictionTarget/fromElement}} was previously 171 | called, the clone is not associated with any {{RestrictionTarget}}. If 172 | {{RestrictionTarget/fromElement}} is later called on the clone, a new 173 | {{RestrictionTarget}} will be assigned to it. 174 |

175 |
176 |
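Non-normative illustration: because each {{Element}} (including a clone) gets its own {{RestrictionTarget}}, an application may want to remember which token it obtained for which element. The sketch below is one possible way to do so, not part of this specification. The names `targetCache`, `targetFor` and `mintTarget` are hypothetical, and the default `mintTarget` assumes that `RestrictionTarget.fromElement()` is available in the current context.

```js
// Hypothetical helper, not part of the Element Capture API.
// Remembers the pending RestrictionTarget for each Element, so that
// fromElement() is invoked at most once per element.
const targetCache = new WeakMap();

function targetFor(
  element,
  // Assumption: RestrictionTarget.fromElement() exists when this default is used.
  mintTarget = (el) => RestrictionTarget.fromElement(el)
) {
  let pending = targetCache.get(element);
  if (pending === undefined) {
    // Normalize to a promise so callers can always await the result.
    pending = Promise.resolve(mintTarget(element));
    targetCache.set(element, pending);
  }
  return pending;
}
```

A clone of an element is a different key in the `WeakMap`, so `targetFor(clone)` mints a separate token, mirroring the cloning behavior described above. Note that a rejected promise stays cached in this sketch; an application might prefer to evict it.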
177 |

178 | To create a RestrictionTarget with |element| as input, run the 179 | following steps: 180 |

181 |
    182 |
  1. 183 |

    Let |restrictionTarget| be a new object of type {{RestrictionTarget}}.

    184 |
  2. 185 |
  3. 186 |

    187 | Set |restrictionTarget|.[[\Element]] 188 | to |element|. 189 |

    190 |
  4. 191 |
192 |

193 | {{RestrictionTarget}} objects are serializable. The [=serialization steps=], given 194 | |value|, |serialized|, and a boolean |forStorage|, are: 195 |

196 |
    197 |
  1. 198 |

    199 | If |forStorage| is true, throw with new {{DOMException}} object whose 200 | {{DOMException/name}} attribute has the value {{"DataCloneError"}}. 201 |

    202 |
  2. 203 |
  3. 204 |

    205 | Set |serialized|.[[\RestrictionTargetElement]] to 206 | |value|.{{RestrictionTarget/[[Element]]}}. 207 |

    208 |
  4. 209 |
210 |

The [=deserialization steps=], given |serialized| and |value| are:

211 |
    212 |
  1. 213 |

    214 | Set |value|.{{RestrictionTarget/[[Element]]}} to 215 | |serialized|.[[\RestrictionTargetElement]]. 216 |

    217 |
  2. 218 |
219 |
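Non-normative illustration: since {{RestrictionTarget}} objects are serializable (but not for storage), one document can hand a token to another using `postMessage()`. The following is a minimal sketch under stated assumptions: the message shape (`type: "restriction-target"`) and the function names `shareRestrictionTarget` and `readRestrictionTarget` are hypothetical; only `postMessage()` and the `origin` and `data` fields of the message event are platform features.

```js
// Hypothetical message shape; not defined by this specification.
const MESSAGE_TYPE = "restriction-target";

// In the document that minted the token: send it to the capturing document.
function shareRestrictionTarget(capturerWindow, restrictionTarget, capturerOrigin) {
  capturerWindow.postMessage(
    { type: MESSAGE_TYPE, restrictionTarget },
    capturerOrigin
  );
}

// In the capturing document: extract the token from a "message" event,
// ignoring messages from unexpected origins or with an unexpected shape.
function readRestrictionTarget(event, expectedOrigin) {
  if (event.origin !== expectedOrigin) return null;
  const data = event.data;
  if (!data || data.type !== MESSAGE_TYPE) return null;
  return data.restrictionTarget ?? null;
}
```

The capturing document could then pass the returned token to `restrictTo()`. Attempting to persist the token instead (serialization with |forStorage| set to true) is expected to fail with a {{"DataCloneError"}}, per the serialization steps above.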
220 |
221 |
222 |

Restriction Mechanism

Definitions

Restrictable tracks

We say that a {{MediaStreamTrack}} |T| is a restrictable MediaStreamTrack if and only if it
fulfills all of the following conditions:

  • |T|.{{MediaStreamTrack/[[Restrictable]]}} is true.
  • |T| is associated with a browser display surface. (That is, if
    |T|.{{MediaStreamTrack/getSettings()}} were called, it would return a {{MediaTrackSettings}}
    dictionary containing the key {{MediaTrackSettings/displaySurface}} mapped to the value
    {{DisplayCaptureSurfaceType/"browser"}}.)
  • |T|.[[\Kind]] is "video".
  • |T|.[[\ReadyState]] is "live".
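The script-observable parts of the definition above can be approximated as follows. This is a sketch only: the function name `looksRestrictable` is hypothetical, and the [[\Restrictable]] internal slot cannot be read from script, so a track that passes this check is not guaranteed to be a [=restrictable MediaStreamTrack=].

```javascript
// Hypothetical helper: checks the conditions that script can observe.
// The [[Restrictable]] internal slot is not exposed, so this is only a
// necessary check, not a sufficient one.
function looksRestrictable(track) {
  if (track.kind !== "video") return false; // [[Kind]] must be "video".
  if (track.readyState !== "live") return false; // [[ReadyState]] must be "live".
  // The track must originate from a browser display surface (a captured tab).
  const settings = track.getSettings();
  return settings.displaySurface === "browser";
}
```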

Elements eligible for restriction

We say that an {{Element}} |E| is eligible for restriction if and only if it fulfills all of the
following conditions:

259 | 285 |
To ensure these conditions hold, developers may use CSS such as the following snippet:

    #target {
      isolation: isolate;     /* Forms a stacking context. */
      transform-style: flat;  /* Flattened. */
    }

Valid restriction targets

We say that an {{Element}} |E| is a valid restriction target for a {{MediaStreamTrack}} |T| if and
only if all of the following conditions hold:

  • |T| is a [=restrictable MediaStreamTrack=].
  • |E| is [=eligible for restriction=].
  • The [=top-level browsing context=] of the display surface that is the source of |T| is |E|'s
    [=shadow-including root=].

Informally, this means that |T| is an active video track associated with tab-capture, and |E| is an
Element [=connected=] to the DOM in the captured tab.

317 | Note that whether an Element |E| is a [=valid restriction target=] for a 318 | {{MediaStreamTrack}} |T| may change either before or after a capture starts, as well as 319 | before or after restriction starts. Examples include: 320 |

321 |
    322 |
  • |T| is stopped programmatically.
  • 323 |
  • |T| is stopped by the user.
  • 324 |
  • 325 | |T|.[[\Source]] changes due to 326 | user interaction with the user agent and/or operating system. 327 |
  • 328 |
  • 329 | |E|'s set of CSS attributes change such that |E| is no longer [=eligible for 330 | restriction=]. 331 |
  • 332 |
333 |

Invalidity will suppress additional frames until validity is restored.

334 |
335 |
336 |
337 |

BrowserCaptureMediaStreamTrack extension

[[mediacapture-region|Region Capture]] introduced the {{BrowserCaptureMediaStreamTrack}} interface.
We extend it with a new method, {{BrowserCaptureMediaStreamTrack/restrictTo}}.

    [Exposed = Window]
    partial interface BrowserCaptureMediaStreamTrack {
      Promise<undefined> restrictTo(RestrictionTarget? restrictionTarget);
    };

All tasks queued below use the rendering task source associated with the same global object as the
{{BrowserCaptureMediaStreamTrack}}.

restrictTo()

Calls to this method instruct the user agent to start or stop restricting a video track.

When invoked with |restrictionTarget| as the first parameter, the user agent MUST execute the
following algorithm:

  1. If [=this=] is not a [=restrictable MediaStreamTrack=], return a {{Promise}} [=rejected=] with
     a new {{NotSupportedError}}.
  2. Let |p| be a new {{Promise}}.
  3. Run the following steps in parallel:
    1. Let |E| be |restrictionTarget|.{{RestrictionTarget/[[Element]]}}.
    2. Update [=this=] video track's crop-state to uncropped.
    3. Update [=this=] video track's [=restriction-state=] according to |restrictionTarget|:
      1. If |restrictionTarget| is NOT {{undefined}}, the user agent MUST set [=this=] video
         track's [=restriction-state=] to [=restricted=] and start
         [=applying the restriction transformation=] to all frames delivered to [=this=] video
         track with |restrictionTarget| as the target.
      2. If |restrictionTarget| is set to {{undefined}}, the user agent MUST set [=this=] video
         track's [=restriction-state=] to [=unrestricted=] and stop
         [=applying the restriction transformation=] to frames delivered to [=this=] video track.
    4. Call the track's state before this method invocation |preState|, and after this method
       invocation |postState|. The user agent MUST queue a global task to resolve |p| when it is
       guaranteed that no more frames [=restricted=] (or [=unrestricted=]) according to |preState|
       will be delivered to the application, and that any additional frames delivered to the
       application will therefore be [=restricted=] (or [=unrestricted=]) according to either
       |postState| or a later state.
  4. Return |p|.
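From the application's point of view, the algorithm above can be exercised as sketched below. The helper name `setRestriction` is hypothetical. Passing `null` to remove the restriction is an assumption based on the nullable parameter in the IDL above, since Web IDL converts both `null` and `undefined` to the null value for nullable types.

```javascript
// Hypothetical helper: restricts `track` to `restrictionTarget`, or removes
// the restriction when `restrictionTarget` is null (assumption: null is
// treated like an absent target because the IDL parameter is nullable).
// Returns true on success, false if restriction is not supported.
async function setRestriction(track, restrictionTarget) {
  if (typeof track.restrictTo !== "function") {
    return false; // restrictTo() is not exposed on this track.
  }
  try {
    // Once this promise resolves, all subsequent frames reflect the new state.
    await track.restrictTo(restrictionTarget);
    return true;
  } catch (error) {
    // Step 1 of the algorithm rejects for tracks that are not restrictable.
    if (error && error.name === "NotSupportedError") return false;
    throw error; // Anything else is unexpected; let the caller decide.
  }
}
```

A caller would typically await `setRestriction(track, target)` before transmitting the track, so that no unrestricted frames are sent.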

Applying the restriction transformation

Whenever the user agent is about to produce a new |frame| for a video track |T| that is
[=restricted=] to a given target |restrictionTarget|, the user agent MUST execute the following
algorithm:

  1. Let |E| be |restrictionTarget|.{{RestrictionTarget/[[Element]]}}.
  2. If |E| is not a [=valid restriction target=] for |T|, abort without producing a new frame.
  3. Let |intersection| be the intersection of |E|'s bounding box and the captured surface's
     [=top-level browsing context=]'s viewport.
  4. If |intersection| is empty, abort without producing a new frame.
  5. A corollary of previous steps is that |E| forms a stacking context. Produce and deliver a
     frame consisting of an independent rendering of that stacking context, clipped to
     |intersection|.
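The intersection computed in steps 3 and 4 above can be sketched as an axis-aligned rectangle intersection. The function name `intersectRects` and the `{ x, y, width, height }` rectangle shape are illustrative assumptions; user agents compute this internally and are free to represent it differently.

```javascript
// Hypothetical helper: intersects two axis-aligned rectangles given as
// { x, y, width, height } in the same coordinate space. Returns null when
// the intersection is empty, which corresponds to producing no frame.
function intersectRects(a, b) {
  const left = Math.max(a.x, b.x);
  const top = Math.max(a.y, b.y);
  const right = Math.min(a.x + a.width, b.x + b.width);
  const bottom = Math.min(a.y + a.height, b.y + b.height);
  if (right <= left || bottom <= top) {
    return null; // No overlap, or overlap of zero area.
  }
  return { x: left, y: top, width: right - left, height: bottom - top };
}
```

For example, an element partially scrolled out of the left and bottom edges of the viewport yields only the portion that remains inside the viewport.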

The frame produced in the final step is constructed by rendering |E| and its descendants over an
infinite transparent canvas, positioned so that the edges of the decorated bounding box are flush
with the edges of the frame.

In some implementations, the underlying pixel format for the frame data will not be able to carry
alpha channel information. In this case, the implementation can blend the rendered frame with an
infinite canvas of black (`rgb(0,0,0)`).
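Blending with black, as described above, amounts to scaling each color channel by the pixel's alpha. The following sketch illustrates this on straight (non-premultiplied) RGBA data, the layout used by `ImageData`. The function name `blendOverBlack` is hypothetical, and actual implementations may operate on premultiplied data or on other pixel formats.

```javascript
// Hypothetical helper: composites straight-alpha RGBA pixels over opaque
// black. `rgba` is a Uint8ClampedArray laid out as [r, g, b, a, ...].
// Over a black backdrop, each output channel is simply channel * alpha.
function blendOverBlack(rgba) {
  const out = new Uint8ClampedArray(rgba.length);
  for (let i = 0; i < rgba.length; i += 4) {
    const alpha = rgba[i + 3] / 255;
    out[i] = rgba[i] * alpha; // red
    out[i + 1] = rgba[i + 1] * alpha; // green
    out[i + 2] = rgba[i + 2] * alpha; // blue
    out[i + 3] = 255; // The result is fully opaque.
  }
  return out;
}
```

Fully transparent pixels become black and fully opaque pixels keep their color.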

Implementations may either re-use existing bitmap data generated for |E| or regenerate the display
of the element to maximize quality at the frame's size (for example, if the implementation detects
that the referenced element is an SVG fragment). However, the frame must look identical to |E| as
rendered above, modulo rasterization quality.

Sample Code

Code in the capture-target:

    const mainContentArea = document.getElementById('mainContentArea');
    const restrictionTarget = await RestrictionTarget.fromElement(mainContentArea);
    sendRestrictionTarget(restrictionTarget);

    function sendRestrictionTarget(restrictionTarget) {
      // Either send the restriction-target using postMessage(),
      // or pass it on locally within the same document.
    }

Code in the capturing-document:

    async function startRestrictedCapture(restrictionTarget) {
      const stream = await navigator.mediaDevices.getDisplayMedia();
      const [track] = stream.getVideoTracks();
      if (!track.restrictTo) {
        // Element Capture is not supported for this track.
        handleError(stream);
        return;
      }
      await track.restrictTo(restrictionTarget);
      transmitVideoRemotely(track);
    }

Privacy and Security Considerations

Benefits of this API

For non-malicious applications, the APIs introduced by this specification should be purely
beneficial, as they allow responsible applications to pare down the information recorded.

For example, using pre-existing mechanisms, video-conferencing applications can:

  1. Embed content in an iframe.
  2. Prompt the user to capture the current tab. (Using {{MediaDevices/getDisplayMedia()}}.)
  3. Crop the resulting capture to just the iframe that's intended for capture. (Using
     {{BrowserCaptureMediaStreamTrack/cropTo()}}.)
  4. Transmit the resulting pixels to remote participants. (Using RTCPeerConnection.)

However, this is risky, because any content that happens to be drawn in front of the content
intended for capture will also be transmitted remotely. Even if this happens only briefly, remote
users might notice. Such content might be highly private - for example, chat notifications,
reminders, or speaker notes.

The mechanisms introduced in this specification allow a responsible application to structure itself
in a way that guarantees that such issues cannot occur. Such an application can more easily make
and keep privacy guarantees to its users.

Concerns about this API

Reading cross-origin pixels

The mechanisms introduced by this specification all rely on self-capture being provided by some
other means - typically {{MediaDevices/getDisplayMedia()}}. The main concern with these is that
they allow an application read-access to cross-origin content.

When a malicious application tricks the user into approving self-capture, it can then load
cross-origin content in an invisible iframe and then bring the content to the forefront, allowing
the attacker to read the content before the user can react. Such attacks are already possible
without any of the mechanisms introduced by this specification.

Aggravating old attack vectors

The main concern is that the mechanisms we introduce in this specification should not aggravate the
old attack vectors described above. One naturally worries that the mechanisms we introduce allow
the old attacks to be carried out surreptitiously. We contend that the mechanisms introduced here
do not increase an attacker's power to hide the attack; such attack-concealment was always possible
using any of the following techniques:

  • Displaying the content briefly. Attackers could always flash content to the screen for a
    timespan of a single frame. This is long enough to record it, but not long enough for users to
    understand it.
  • Displaying the content piecemeal. Attackers could always break the content up into multiple
    small pieces, even one pixel each, and display them in different locations and at different
    times. Users would not be able to observe this manipulation, but it is trivial for software to
    collect these pixels and reconstruct a picture from them.
  • Displaying the content at low opacity. Attackers could always display content at an opacity
    that is imperceptible for a user, but which machines can still read.

Any of these techniques is enough on its own, but through a combination of them, malicious
applications were always able to conceal their attacks effectively and still read content
efficiently.

Reading occluded content

One might worry that a malicious app would be able to remove occlusions in cross-origin iframes,
without opt-in from that content. The shape of the API prevents such attacks - the cross-origin
iframe would have to produce a {{RestrictionTarget}} and pass it to the would-be attacker. As
{{RestrictionTarget}} objects serve no purpose other than as part of the API introduced by this
specification, the minting and passing of a {{RestrictionTarget}} proves the cross-origin iframe's
permission for its occlusions to be removed.

Interaction with Region Capture

In designing the APIs introduced by this specification, a conscious decision was made to not reuse
{{CropTarget}}, and to define a dedicated token instead ({{RestrictionTarget}}). This ensures that
any existing Web applications that have previously been designed and implemented with
{{BrowserCaptureMediaStreamTrack/cropTo()}} in mind, but not with
{{BrowserCaptureMediaStreamTrack/restrictTo()}}, would not be effectively opting into allowing
occlusions to be removed, as described in the previous section.