13 | This document proposes a mechanism by which an application APP can opt in to
14 | exposing certain information to another application CAPTR, if
15 | CAPTR is screen-capturing the tab in which APP is running.
16 | It describes a mechanism for tab capture only.
17 |
18 |
19 |
20 |
21 | This document is not complete.
22 |
23 |
24 |
25 |
26 |
Problem Description
27 |
28 |
29 |
30 | Consider a web-application, running in one tab, which we’ll name "main_app."
31 | Assume main_app calls
32 | getDisplayMedia
33 | and the user chooses to share another tab, where an application is running which we’ll
34 | call "captured_app."
35 |
36 |
Note that:
37 |
38 |
main_app does not know what it is capturing.
39 |
40 | captured_app does not know that it is being captured; let alone by whom.
41 |
42 |
43 |
44 | Both of these traits are desirable in the general case, but there exist specific
45 | use cases where the user of main_app would benefit if main_app
46 | could send captured_app a limited set of standard instructions that captured_app
47 | has opted into receiving.
48 |
49 |
50 | We wish to enable the specific use cases while keeping the general case as it was
51 | before.
52 |
53 |
54 |
55 |
Use-case #1: Driving Presentations from Video Conferencing Apps
56 |
57 | Consider presentation software and video-conferencing software that collaborate. Assume the
58 | user is in a VC session. The user starts sharing a presentation. Both applications are
59 | interested in letting the VC app discover that it is capturing a slides session,
60 | so that the VC application will be able to expose
61 | controls to the user for flipping through slides. When the user clicks those controls, the
62 | VC app will be able to send messages to the presentation app, requesting actions such as
63 | flipping through slides.
64 |
65 |
66 |
67 |
68 |
The Capture-Handle Actions Mechanism
69 |
70 | The capture-handle actions mechanism consists of two parts - one on the captured side, one on
71 | the capturing side.
72 |
73 |
74 |
75 | Captured applications opt in by registering support for
76 | standard actions they handle by calling {{MediaDevices/setSupportedCaptureActions}}.
77 |
78 |
79 | Capturing applications may trigger these actions using
80 | {{MediaStreamTrack/sendCaptureAction}}.
81 |
82 |
83 |
84 | There is disagreement on whether actions should be specified here or in a separate document.
85 |
86 |
87 |
Captured Side for Actions
88 |
89 | Applications in top-level documents can declare the [=capture actions=]
90 | they support, if any. They would typically do so before even knowing whether
91 | they are being captured. Once registered, an application can expect to receive
92 | these actions from capturing applications that wish to control the progression of
93 | the captured session in response to user interaction.
94 | Supported actions are declared by calling {{MediaDevices/setSupportedCaptureActions}}
95 | with an array of the names of actions the application is prepared to respond to.
96 |
97 |
98 |
Registering and responding to capture actions
99 |
100 | {{MediaDevices}} is extended with a method - {{MediaDevices/setSupportedCaptureActions}} -
101 | which accepts an array of {{DOMString}}s. By calling this method, an application
102 | registers with the user agent a set of zero or more [=capture actions=] it wishes to
103 | respond to.
104 |
105 |
106 | Capture actions are values defined in {{CaptureAction}}.
107 | They are meant to be interpreted as instructions from the capturing application to
108 | advance the presentation of the captured session, in whatever way the
109 | captured application chooses to define. The intent is to support capturing
110 | applications that implement interactive controls for these actions. Sending an action
111 | requires [=transient activation=], and doing so will [=consume user activation=].
112 |
When this method is invoked, the user agent MUST run the following steps:
132 |
133 |
134 | If the [=relevant global object=]'s [=associated `Document`=] is
135 | either not [=Document/fully active=] or its [=browsing context=] is not a
136 | [=top-level browsing context=], then throw {{InvalidAccessError}}.
137 |
138 |
139 | Let |actions| be the method's first argument.
140 |
141 |
142 | If |actions| is non-empty, and this method was previously
143 | called with a non-empty array on [=this=] {{MediaDevices}} object,
144 | then throw {{InvalidStateError}}.
145 |
146 |
147 | Remove from |actions| any value not found in {{CaptureAction}}.
148 |
149 |
150 | Remove from |actions| any duplicates.
151 |
152 |
153 | Set [=this=]'s {{MediaDevices/[[RegisteredCaptureActions]]}} to |actions|.
154 |
155 |
156 | Return `undefined` and run the remaining step [=in parallel=].
157 |
158 |
159 | If this document is currently being captured as part of a
160 | browser
161 | display surface,
162 | then for each capturer of that surface, queue a task on that capturer's
163 | task-list to set all associated video {{MediaStreamTrack}}s'
164 | {{MediaDevices/[[AvailableCaptureActions]]}} to |actions|.
165 |
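The registration steps above can be modeled in plain JavaScript. Such a model may help when reasoning about their edge cases. This is a non-normative sketch with several assumptions:

* `CaptureActionRegistry` is a hypothetical stand-in for `MediaDevices`.
* The values `"next"`, `"previous"`, `"first"` and `"last"` are assumed placeholders for the {{CaptureAction}} enum, which this section does not list.
* The top-level-document check and the in-parallel update of capturers' tracks are omitted.

```js
// Assumed placeholder for the CaptureAction enum values.
const CAPTURE_ACTIONS = new Set(["next", "previous", "first", "last"]);

// Hypothetical model of the registration steps on MediaDevices.
class CaptureActionRegistry {
  constructor() {
    // Models the [[RegisteredCaptureActions]] internal slot.
    this.registeredCaptureActions = [];
    // Remembers whether a non-empty array was ever passed in.
    this.receivedNonEmpty = false;
  }

  setSupportedCaptureActions(actions) {
    // A second non-empty registration is an error; clearing with [] is not.
    if (actions.length > 0 && this.receivedNonEmpty) {
      throw new DOMException("Capture actions already registered.", "InvalidStateError");
    }
    if (actions.length > 0) {
      this.receivedNonEmpty = true;
    }
    // Drop values not in CaptureAction, then drop duplicates
    // (a Set keeps the first occurrence of each value, in order).
    const known = actions.filter((action) => CAPTURE_ACTIONS.has(action));
    this.registeredCaptureActions = [...new Set(known)];
    return undefined;
  }
}
```

An empty array is always accepted, so an application can withdraw its registrations even after a non-empty call. What it cannot do is replace one non-empty set with another.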
166 |
167 |
168 |
169 |
170 |
oncaptureaction of type {{EventHandler}}
171 |
172 |
The event type of this event handler is `"captureaction"`.
173 |
174 |
175 |
176 | When {{MediaDevices}} is created, give it a
177 | [[\RegisteredCaptureActions]] internal slot,
178 | initialized to an empty list.
179 |
180 |
181 |
182 |
Capture Action Event
183 |
184 |
CaptureActionEvent
185 |
186 | This event is fired on the captured application's {{MediaDevices}}
187 | object whenever an action it registered with
188 | {{MediaDevices/setSupportedCaptureActions}} has been triggered. This
189 | lets the application respond by executing its implementation of this
190 | action.
191 |
223 | The {{CaptureAction}} to initialize the event with.
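On the captured side, an application might register its actions at load time and dispatch incoming `"captureaction"` events to its own handlers, roughly as sketched below. The sketch rests on these assumptions:

* The event exposes the triggered action as `event.action`, mirroring {{CaptureActionEventInit/action}}.
* The action names are placeholders for the {{CaptureAction}} values.
* `makeDeck()` is a hypothetical slide-deck model.

`mediaDevices` is passed in as a parameter so the sketch can run outside a browser. In a page it would be `navigator.mediaDevices`.

```js
// Hypothetical deck model, used only to give the handlers something to do.
function makeDeck(slideCount) {
  return {
    index: 0,
    slideCount,
    goTo(i) {
      // Clamp so "previous" on the first slide stays put.
      this.index = Math.max(0, Math.min(this.slideCount - 1, i));
    },
  };
}

// Registers the actions this app implements and dispatches incoming
// "captureaction" events to them. Assumes the event exposes `action`.
function attachCaptureActionHandlers(mediaDevices, deck) {
  // Assumed action names; the authoritative list is the CaptureAction enum.
  const handlers = new Map([
    ["next", () => deck.goTo(deck.index + 1)],
    ["previous", () => deck.goTo(deck.index - 1)],
    ["first", () => deck.goTo(0)],
    ["last", () => deck.goTo(deck.slideCount - 1)],
  ]);
  mediaDevices.setSupportedCaptureActions([...handlers.keys()]);
  mediaDevices.addEventListener("captureaction", (event) => {
    const handler = handlers.get(event.action);
    if (handler) handler();
  });
}
```

In a top-level document this might be called once on load, before the app knows whether it is being captured, e.g. `attachCaptureActionHandlers(navigator.mediaDevices, makeDeck(20))`. A `Map` is used rather than a plain object so that inherited property names such as `"toString"` never match a handler.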
224 |
225 |
226 |
227 |
228 |
229 |
230 |
Capturing Side for Actions
231 |
232 | Capturing applications can enumerate available [=capture actions=] that
233 | are supported on the video track they have obtained, by using
234 | {{MediaStreamTrack/getSupportedCaptureActions}}, and can trigger those
235 | actions by using {{MediaStreamTrack/sendCaptureAction}}.
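A capturing application might turn the result of `getSupportedCaptureActions()` into UI controls along these lines. `createButton(label, onClick)` is a hypothetical helper standing in for whatever UI code the app uses. Sending requires transient activation, so `sendCaptureAction()` is called only from inside the click handler.

```js
// Renders one control per action the captured app currently supports.
// `createButton(label, onClick)` is a hypothetical UI helper.
function renderCaptureActionControls(track, createButton) {
  // Feature detection: the proposal may not be implemented.
  if (typeof track.getSupportedCaptureActions !== "function") {
    return [];
  }
  return track.getSupportedCaptureActions().map((action) =>
    // sendCaptureAction() requires transient activation, so it is only
    // invoked from the click handler, never eagerly.
    createButton(action, () => track.sendCaptureAction(action))
  );
}
```

The available actions can change, for example when the captured tab navigates, which resets them to `[]`. An app would therefore likely re-render these controls rather than build them once.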
236 |
237 |
238 |
Enumerating supported actions and triggering them
239 |
240 | When a {{MediaStreamTrack}} is a video track derived from screen-capture
241 | of a browser
242 | display surface,
243 | {{MediaStreamTrack/getSupportedCaptureActions}} returns the set of
244 | available [=capture actions=], if any, supported by the captured
245 | application associated with this video track.
246 |
When this method is invoked, the user agent MUST return [=this=]'s
259 | {{MediaDevices/[[AvailableCaptureActions]]}} if defined, or `[]` if not defined.
260 |
261 |
262 | sendCaptureAction
263 |
264 |
265 |
When this method is invoked, the user agent MUST run the following steps:
266 |
267 |
268 | If the [=relevant global object=] of [=this=] does not have
269 | [=transient activation=], return a promise [=rejected=] with
270 | {{InvalidStateError}}.
271 |
272 |
273 | [=Consume user activation=].
274 |
275 |
276 | Let |action| be the method's first argument.
277 |
278 |
279 | If |action| is not in [=this=]'s {{MediaDevices/[[AvailableCaptureActions]]}},
280 | return a promise [=rejected=] with {{NotFoundError}}.
281 |
282 |
283 | Let |p| be a new promise.
284 |
285 |
286 | Run the following steps [=in parallel=]:
287 |
288 |
289 |
290 | Queue a task on the task-list of the captured
291 | browser
292 | display surface's
293 | [=top-level browsing context=]'s [=active document=] to run the
294 | following steps:
295 |
296 |
297 |
298 | Let |target| be the [=relevant global object=]'s
299 | [=associated `Document`=]'s
300 | associated navigator's {{MediaDevices}} object.
301 |
302 | If |action| is not in |target|'s
303 | {{MediaDevices/[[RegisteredCaptureActions]]}}, abort these steps.
304 |
305 | [=Fire an event=] named `"captureaction"`, using a
306 | {{CaptureActionEvent}} with {{CaptureActionEventInit/action}}
307 | set to |action|, at |target|.
308 |
309 |
310 |
311 | Wait for the event to have been fired.
312 |
313 |
314 | Resolve |p|.
315 |
316 |
317 |
318 |
319 |
320 | Return |p|.
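Callers of `sendCaptureAction()` can distinguish the two rejection paths defined above by the exception's `name`. The sketch below is illustrative only. The outcome strings and both function names are hypothetical.

```js
// Maps the rejections defined by the sendCaptureAction() steps onto
// outcomes a UI can act on. Outcome names are hypothetical.
function classifyCaptureActionError(error) {
  if (error instanceof DOMException) {
    if (error.name === "InvalidStateError") return "needs-user-gesture";
    if (error.name === "NotFoundError") return "action-not-available";
  }
  return "unexpected";
}

// Intended to be called from a click handler, so transient activation is present.
async function trySendCaptureAction(track, action) {
  try {
    await track.sendCaptureAction(action);
    // Only means the algorithm's steps completed, not that the app acted.
    return "sent";
  } catch (error) {
    return classifyCaptureActionError(error);
  }
}
```

A resolved promise only means the steps above completed; it says nothing about how the captured application responded. The steps also consume user activation before checking |action|, so a request that ends in `NotFoundError` still uses up the user's click.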
321 |
322 |
323 |
324 |
325 |
326 | When a video {{MediaStreamTrack}} is created as part of the
327 | getDisplayMedia
328 | algorithm, whose source is a
329 | browser
330 | display surface,
331 | give it an
332 | [[\AvailableCaptureActions]] internal
333 | slot, initialized to the captured
334 | browser
335 | display surface's
336 | [=top-level browsing context=]'s [=Browsing context/active window=]'s
337 | associated navigator's {{MediaDevices}} object's
338 | {{MediaDevices/[[RegisteredCaptureActions]]}}.
339 |
340 |
341 | While capture of a
342 | browser
343 | display surface
344 | is occurring, whenever that surface's
345 | [=top-level browsing context=] is navigated, then for each capturer of
346 | that surface, queue a task on that capturer's task-list to set all
347 | associated video {{MediaStreamTrack}}s'
348 | {{MediaDevices/[[AvailableCaptureActions]]}} to `[]`.
349 |
44 | Switch back to the controlling tab to proceed with the demo.
45 |
46 |
47 |
48 |
49 |
50 |
51 |
Remote Control using Capture Handle Demo
52 |
Make sure you're running Chrome m92 or later.
53 |
54 | If you're seeing this despite using Chrome m92 or later, the Origin Trial on this demo might
55 | have expired. In that case, either enable [Experimental Web Platform features] or launch the
56 | browser with --enable-blink-features=CaptureHandle.
57 |
Open a new presentation by clicking Launch Presentation.
58 |
Capture the tab in which the presentation is running.
59 |
Switch back to this tab.
60 |
Remotely control the other presentation using the previous/next buttons.
61 |
62 |
Worth trying:
63 |
64 |
Try opening multiple instances. You'll only be controlling the one you capture.
65 |
Try refreshing the captured tab. You can still control it.
66 |
67 |
68 |
69 |
70 |
76 |
Self-capture detected. Video suppressed to avoid the hall-of-mirrors effect.
60 |
61 |
62 |
63 | This demo is more interesting when a tab is shared. :-)
64 |
65 |
66 |
67 |
Make sure you're running Chrome m92 or later.
68 |
69 | If you're seeing this despite using Chrome m92 or later, the Origin Trial on this demo might
70 | have expired. In that case, either enable [Experimental Web Platform features] or launch the
71 | browser with --enable-blink-features=CaptureHandle.
72 |
73 |
74 |
75 |
Demo operation:
76 |
77 |
Capture any tab.
78 |
If you're capturing a different tab, the video will be played back to you here.
79 |
80 | If you're capturing the current tab, the hall-of-mirrors effect will be suppressed by NOT
81 | playing the video back to you.
82 |
83 |
84 | If you capture anything other than a tab, this demo is a bit irrelevant, so we'll just ask
85 | you to try again.
86 |
87 |
88 |
89 |
90 |
91 |
92 |
93 |
203 |
204 |
205 |
--------------------------------------------------------------------------------
/identity/explainer.md:
--------------------------------------------------------------------------------
1 | # TL;DR - Demos
2 | Two quick demos are available:
3 | 1. Demo for remotely controlling a presentation [here](https://w3c.github.io/mediacapture-handle/identity/demos/remote_control/capturer.html).
4 | 2. Demo for detecting self-capture [here](https://w3c.github.io/mediacapture-handle/identity/demos/self_capture_detection/index.html).
5 |
6 | # Summary
7 |
8 | Capture Handle is a mechanism that allows a display-capturing web-application to ergonomically and confidently identify the web-application it is display-capturing (provided that the captured application has opted in). Such identification allows these two applications to collaborate in interesting ways.
9 |
10 | For example, if a VC application is capturing a presentation, then the VC application can expose user-controls for previous/next-slide directly in the VC application. This lets the user navigate presentations without having to jump between the VC and presentation tabs.
11 |
12 | # Problem Description
13 |
14 | ## Generic Problem Description
15 |
16 | Consider a web-application, running in one tab, which we’ll name “main_app.” Assume main_app calls [getDisplayMedia](https://developer.mozilla.org/en-US/docs/Web/API/MediaDevices/getDisplayMedia) and the user chooses to share another tab, where an application is running which we’ll call “captured_app.”
17 |
18 | Note that:
19 |
20 | 1. main_app does not know what it is capturing.
21 | 2. captured_app does not know that it is being captured; let alone by whom.
22 |
23 | Both these traits are desirable for the general case, but there exist legitimate use cases where the browser would want to allow applications to opt in to bridging that gap and enable a connection.
24 |
25 | We wish to enable the legitimate use cases while keeping the general case as it was before.
26 |
27 | ## Use-case #1: Driving Presentations from Video Conferencing Apps
28 |
29 | Consider presentation software and video-conferencing software that collaborate. Assume the user is in a VC session. The user starts sharing a presentation. Both applications are interested in letting the VC app discover that it is capturing a slides session, which application is presenting it, and even which session it is, so that the VC application will be able to expose controls to the user for flipping through slides. When the user clicks those controls, the VC app will be able to send messages to the presentation app, requesting actions such as
30 | flipping through slides or entering/leaving presentation mode.
31 |
32 | The means for transmitting these messages are outside the scope of this document. Some options are:
33 | * Shared cloud infrastructure.
34 | * Messaging via a worker. (Note: Storage Partitioning might disrupt this option.)
35 | * A rudimentary messaging API might be added expressly for this purpose.
36 |
37 | ## Use-case #2: Avoiding “Hall of Mirrors”
38 |
39 | The “Hall of Mirrors” effect occurs when users choose to share the tab in which the VC call takes place. When detecting self-capture, a VC application can avoid displaying the captured stream back to the user, thereby avoiding the dreaded effect. Other mitigation strategies are also possible based on Capture Handle.
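One possible way to detect self-capture works as follows:

* Each tab exposes a random, per-load handle.
* A capturer compares the handle it reads back against its own.

The sketch below shows the idea. `exposeSelfHandle` and `isSelfCapture` are hypothetical names. Restricting `permittedOrigins` to the app's own origin is a design choice: only a same-origin capturer can be capturing itself.

```js
// Captured side: expose a random per-load handle to same-origin capturers.
function exposeSelfHandle(mediaDevices, ownHandle, ownOrigin) {
  mediaDevices.setCaptureHandleConfig({
    handle: ownHandle,
    // Only a same-origin capturer can be this very tab.
    permittedOrigins: [ownOrigin],
  });
}

// Capturing side: compare the captured tab's handle against our own.
function isSelfCapture(track, ownHandle) {
  // getCaptureHandle() may be missing (no support) or return null (nothing exposed).
  const captureHandle = track.getCaptureHandle?.() ?? null;
  return captureHandle !== null && captureHandle.handle === ownHandle;
}
```

In a page, the app would expose its handle on load:

```js
const ownHandle = crypto.randomUUID();
exposeSelfHandle(navigator.mediaDevices, ownHandle, location.origin);
```

After `getDisplayMedia()` resolves, it would check `isSelfCapture(track, ownHandle)` before playing the captured video back.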
40 |
41 | ## Use-case #3: Detecting Unintended or Unapproved Captures
42 |
43 | Users sometimes choose to share the wrong tab. Sometimes they switch to sharing the wrong tab by clicking the share-this-tab-instead button by mistake. A benevolent application could try to protect the user by presenting an in-app dialog for re-confirmation, if it believes that the user may have made a mistake.
44 |
45 | ## Use-case #4: Analytics
46 |
47 | Capturing applications often wish to gather statistics over what applications their users tend to capture. For example, VC applications would like to know how often their users share presentation applications from specific providers, Wikipedia, CNN, etc. Gathering such information can be used to improve service for the users by introducing new collaborations, such as the one described above.
48 |
49 |
50 | # Our Solution
51 |
52 | ## Summary
53 |
54 | * Captured applications opt in to exposing information by setting CaptureHandleConfig.
55 | * Capturing applications read this information as CaptureHandle, which is available through two access points (discussed below).
56 |
57 | ## Captured Applications (e.g. presentation software)
58 |
59 | Applications can call a newly added method, `MediaDevices.setCaptureHandleConfig()`, and opt in to exposing information to capturing applications. Two pieces of information may be exposed:
60 | * `exposeOrigin`: The captured application's origin. (This is a boolean. Origin exposure is mediated by the browser, meaning the origin cannot be spoofed.)
61 | * `handle`: An arbitrary string carrying semantic meaning of the captured application's choosing.
62 |
63 | ## Capturing Applications (e.g. video-conferencing software)
64 |
65 | Capturing applications can read information exposed to them by captured applications. The current value is exposed via `MediaStreamTrack.getCaptureHandle()`. Changes are signaled through the `oncapturehandlechange` event handler (also on `MediaStreamTrack`).
66 |
67 | ## Controlled Exposure
68 |
69 | The captured application may control which capturing applications are allowed to see this exposed information using `permittedOrigins`. Only capturers allowlisted by `permittedOrigins` may read the captured application's information.
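For example, a captured application that only wants its partners to see its handle could list their origins explicitly instead of using `'*'`. In this sketch the partner URL used in the test is hypothetical. `buildRestrictedConfig` is an illustrative helper that uses the standard `URL` API to reduce full URLs to origins before placing them in `permittedOrigins`.

```js
// Builds a CaptureHandleConfig-shaped object that exposes the handle only
// to the listed partners. Full URLs are reduced to their origins first.
function buildRestrictedConfig(handle, partnerUrls) {
  return {
    exposeOrigin: true,
    handle,
    permittedOrigins: partnerUrls.map((url) => new URL(url).origin),
  };
}
```

The result would then be passed to `navigator.mediaDevices.setCaptureHandleConfig()`.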
70 |
71 | ## Sample Usage
72 |
73 | Consider the case of a conference call on **VC-MAX** where the local user chooses to present another tab with a slide deck by **Slides 3000**.
74 |
75 | The captured application, Slides 3000, is aware it could be captured:
76 | ```js
77 | function getSessionId() {
78 |   ... // Returns some ID which is meaningful using loonyAPI.
79 | }
80 |
81 | function onPageLoaded() {
82 |   ...
83 |   navigator.mediaDevices.setCaptureHandleConfig({
84 |     exposeOrigin: true,
85 |     handle: JSON.stringify({
86 |       description: "See slides-3000.com for our API. Collaborations welcome!",
87 |       protocol: "loonyAPI",
88 |       version: "1.983",
89 |       sessionId: getSessionId(),
90 |     }),
91 |     permittedOrigins: ['*']
92 |   });
93 |   ...
94 | }
95 | ```
96 |
97 | The capturing application, VC-MAX, reads the capture-handle of the captured display-surface the user chose:
98 | ```js
99 | async function startCapture() {
100 |   ...
101 |   const stream = await navigator.mediaDevices.getDisplayMedia();
102 |   const [track] = stream.getVideoTracks();
103 |   if (track.getCaptureHandle) { // Feature detection.
104 |     // Subscribe to notifications of the capture-handle changing.
105 |     track.oncapturehandlechange = (event) => {
106 |       OnNewCaptureHandle(event.target.getCaptureHandle());
107 |     };
108 |     // Read the current capture-handle.
109 |     OnNewCaptureHandle(track.getCaptureHandle());
110 |   }
111 |   ...
112 | }
113 |
114 | function OnNewCaptureHandle(captureHandle) {
115 |   if (captureHandle?.origin === 'https://slides-3000.com') {
116 |     const parsed = JSON.parse(captureHandle.handle);
117 |     OnNewSlides3000Session(parsed.protocol, parsed.version, parsed.sessionId);
118 |   }
119 | }
120 |
121 | function OnNewSlides3000Session(protocol, version, sessionId) {
122 |   if (protocol !== "loonyAPI" || version > "2.02") {
123 |     return;
124 |   }
125 |   // Exposes prev/next buttons to the user. When clicked, these send
126 |   // a message to some REST API, where |sessionId| indicates that the
127 |   // message has to be relayed to the Slides 3000 session in question.
128 |   ExposeSlides3000Controls(sessionId);
129 | }
130 | ```
131 |
132 | # Privacy + Security Considerations
133 |
134 | ## Opt-In
135 | By default, nothing new is revealed.
136 |
137 | ## User-driven
138 | The process is still user-driven, as [getDisplayMedia](https://developer.mozilla.org/en-US/docs/Web/API/MediaDevices/getDisplayMedia) itself is user-driven - the user chooses what to share, as well as whether to share.
139 |
140 | ## Theoretically No-op
141 | When users consent to a capture, they implicitly allow the capturing-app to receive any information that the captured-app wishes to broadcast. Such information could be broadcast by embedding “magic pixels” in the captured-app, embedding QR codes in the video stream, etc.
142 |
143 | The change described in this document is almost a no-op from the perspective of security and privacy. The word “almost” is necessary because previously the capturing-app would have to either take that information with a grain of salt, or validate it externally, e.g. by establishing contact with the collaborating app over some external medium. Now, at least one piece of information is mediated by the browser and therefore known to be non-spoofable - the origin.
144 |
145 | ## Captured-App Still Cannot Discover it is Being Captured
146 | This change does not allow a captured application to discover when it is being captured, unless the capturing application sends it an out-of-band message to that effect. This would have been possible regardless of the change described in this document, and is therefore not a concern.
147 |
148 | **Note:** The general concern with captured applications discovering they’re being captured is that they could use that fact in order to censor their own content, limiting the user’s control and aggravating them. This concern does not apply to capturers who choose to inform the captured application of the capture. After all, such capturers could simply stop informing the captured application. And the user is at the mercy of the capturing application to begin with, which could have chosen to never even start the capture.
149 |
150 | ## Controlled Exposure
151 | One concern is that a capturer could misbehave when capturing specific origins. For example, when a VC application detects the user is capturing a competitor’s productivity suite, it could display ads for its own productivity suite. This concern is mitigated through the CaptureHandleConfig’s permittedOrigins field, which allows applications to control which origins may observe the CaptureHandle they set.
152 |
153 | ## App-controlled Opaqueness
154 | Applications can use any CaptureHandle.handle they wish, and may also independently choose whether to expose their origin. The handle can either expose information according to some advertised format, or it can be completely opaque to anyone but a few privileged, collaborating apps.
155 |
156 | For example, HypotheticalSite could make it widely known that their format is “HypotheticalSite:”. This could then be used in tandem with some other API exposed by HypotheticalSite, such as an API for remotely controlling a playback (subject to access-control set by HypotheticalSite on their own API).
157 |
158 | Continuing with the example above, it would also be possible for HypotheticalSite to set their handle to a bare rand_guid, where rand_guid is a 32-character hexadecimal string, and set exposeOrigin=false. It would then be difficult for arbitrary capturers to find out if they’re capturing a HypotheticalSite tab, since many different applications could follow that handle pattern. However, HypotheticalSite could give select collaborating applications access to a HypotheticalSite-operated API that checks whether a given GUID is a valid HypotheticalSite ID.
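Generating such a handle and checking its shape are both simple. The sketch below shows one way to do it.

* `generateOpaqueHandle` and `looksLikeOpaqueHandle` are hypothetical helpers.
* The random source defaults to Web Crypto's `crypto.getRandomValues()` and can be injected for testing.

The shape check is all an arbitrary capturer can do on its own. Confirming that a GUID really belongs to HypotheticalSite would still require the HypotheticalSite-operated API described above.

```js
// Produces a 32-character lowercase hexadecimal handle from 16 random bytes.
function generateOpaqueHandle(getRandomValues = (array) => globalThis.crypto.getRandomValues(array)) {
  const bytes = getRandomValues(new Uint8Array(16)); // 16 bytes -> 32 hex chars
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

// Checks only the shape of a handle; it cannot tell who set it.
function looksLikeOpaqueHandle(handle) {
  return /^[0-9a-f]{32}$/.test(handle);
}
```

A captured page might then call `navigator.mediaDevices.setCaptureHandleConfig({ exposeOrigin: false, handle: generateOpaqueHandle(), permittedOrigins: ['*'] })`.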
159 |
160 | ## Non-spoofable Origin
161 | If the captured application opts into exposing its origin, the capturing application gains access to a field that is non-spoofable. Claims made in the capture-handle can be trusted by the capturer if they are known to originate from a given origin. If an application chooses to share a handle but not the origin, by contrast, the capturing application can either treat that information as suspect, or verify it in some external way before using it.
162 |
163 | ## Improvements over Steganography
164 | As previously mentioned, applications could have previously used QR codes or [steganographic means](https://en.wikipedia.org/wiki/Steganography) to advertise some capture-handle. However, that would have been susceptible to interference from embedded frames, either intentionally or not. The capture handle mechanism, in contrast, is only accessible to the top-level document, and is safe from interference.
165 |
166 | ## Capturer Can Detect Navigation
167 | * Assume EXP is a site exposing something - possibly the origin, possibly a handle, possibly both. When we want to denote two sites exposing different configs, we’ll name them EXP1 and EXP2.
168 | * Assume sites NOEXP, NOEXP1, NOEXP2, etc. never call setCaptureHandleConfig. (Recall that this is treated as implicitly calling setCaptureHandleConfig with the empty config.)
169 |
170 | We distinguish these types of navigation events:
171 | 1. NOEXP to EXP: Non-exposing site to exposing site.
172 | 2. EXP to NOEXP: Exposing site to non-exposing site.
173 | 3. EXP1 to EXP2: One exposing site to another, different exposing site.
174 | 4. EXP1 to EXP1*: One exposing site to another, but which sets the same configuration.
175 | 5. NOEXP1 to NOEXP2: One non-exposing site to another non-exposing site.
176 |
177 | #1 partially reveals navigation. Depending on additional parameters, the capturer might have either full or partial certainty over whether navigation occurred, or whether EXP set a capture handle relatively late.
178 |
179 | #2 and #3 make navigation unconcealable by definition.
180 |
181 | #4 is similar to #2 in its first stage. Recall that when navigating away from EXP1 to EXP1*, the browser does not know if/when EXP1* will call setCaptureHandleConfig, and must treat navigation away from EXP1 as an implicit call to setCaptureHandleConfig with the empty config.
182 |
183 | #5: No event is fired in this case. The result of this decision is that a capturer can detect navigation away from an exposing site, but not navigation away from a non-exposing site.
184 |
185 | ## Excessive Events
186 | It is possible for a captured application to “bombard” its capturer with events by repeatedly calling setCaptureHandleConfig. Note that this is true regardless of whether the event fires only on a new config, or whenever setCaptureHandleConfig is called, since the application can alternately set two different handles. This concern does not seem significant, as:
187 | 1. The captured application would be expending the same general amount of resources as it would be costing the capturing application. (Note that the case of multiple capturers is very rare in practice, and even then is limited to only a handful of capturers.)
188 | 2. The captured application would normally be carrying out the attack without knowing whether it is being captured, let alone knowing by whom. (The case where the capturer communicated its identity back to the captured application presumes a level of collaboration that makes such an attack unlikely, and at any rate - within the power of the capturing application to avoid.)
189 |
190 | ## Incognito Mode
191 | Calls to setCaptureHandleConfig from an incognito tab must not be blocked, so as to avoid exposing incognito-status to the application. However, we avoid propagating the actual CaptureHandle between the capturing app and the captured app.
192 |
193 | # Upcoming Extensions
194 | * The current Capture Handle only applies to tab-capture.
195 | * It is possible to extend the concept to browser windows, where the captured window's active tab determines the capture-handle.
196 | * Extensions to native applications are possible, but are trickier, likely requiring OS-level support.
197 |
198 | # Goals and Non-Goals
199 |
200 | Communication often presupposes that both sides can recognize each other. Previously, a capturing application had no way to ergonomically and reliably detect which application was captured. It is this gap that we seek to address - **identification**. How communication proceeds is left entirely in the hands of the capturing+captured applications.
201 |
202 | Two noteworthy possible ways for communication to proceed are [BroadcastChannel](https://developer.mozilla.org/en-US/docs/Web/API/BroadcastChannel/BroadcastChannel) or a shared cloud infrastructure. Virtually all indirect communication methods presuppose that some **session ID** is used. For example, when using a BroadcastChannel, it would likely be useful to address messages to the specific tab being captured, and not to all tabs of that origin (consider multiple tabs with different presentations, only one of which is being shared by the user).
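When both applications share an origin, a `BroadcastChannel` plus a session ID in every message is enough to address the one tab being captured. Receivers simply ignore messages addressed to other sessions. The message shape `{ sessionId, command }` and all function names below are hypothetical. `BroadcastChannel` only delivers messages between same-origin contexts, so cross-origin collaborators would need another medium, such as shared cloud infrastructure.

```js
// Hypothetical message shape: { sessionId, command }.
function makeSessionMessage(sessionId, command) {
  return { sessionId, command };
}

// True only for messages addressed to this tab's session.
function isForSession(message, mySessionId) {
  return typeof message === "object" && message !== null &&
    message.sessionId === mySessionId;
}

// Captured side: listen on a shared channel, act only on our own session.
function listenForCommands(channelName, mySessionId, onCommand) {
  const channel = new BroadcastChannel(channelName);
  channel.onmessage = (event) => {
    if (isForSession(event.data, mySessionId)) {
      onCommand(event.data.command);
    }
  };
  return channel; // The caller closes it with channel.close() when done.
}

// Capturing side: `sessionId` would be read from the capture handle.
function sendCommand(channel, sessionId, command) {
  channel.postMessage(makeSessionMessage(sessionId, command));
}
```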
203 |
204 | # Consideration of Alternative Approaches
205 |
206 | ## Rejected Alternative #1: MessageChannel on MediaStreamTrack
207 | ### Idea
208 | Add a [MessageChannel](https://developer.mozilla.org/en-US/docs/Web/API/MessageChannel) to the MediaStreamTrack and allow the two applications to communicate directly.
209 |
210 | ### Drawbacks
211 | * This proposal lacks a convenient way for the capturing application to identify the captured application, and determine the protocol needed for communication. (I.e. what messages may be sent, that would be understood.)
212 | * If this approach is extended to include controlled exposure of the captured application's origin, then the approach becomes a more complex variant of Capture Handle, that also requires at least one RTT between the apps before anything useful can be done by the capturing application. (This limits the usefulness of Capture Handle for the upcoming Conditional Focus feature - link pending.)
213 | * In either case, the capturing application is forced to alert the captured application to the presence of a display-capture session.
214 | * Difficulties arise when considering that capture sessions can stop, restart, and that multiple applications could be capturing the same tab - but that the browser does not wish to alert the captured application to any of these developments. (It's OK if the capturing application does that, though.)
215 |
216 | ## Rejected Alternative #2: On-Rails Approach
217 | ### Idea
218 | Define a closed set of messages that can be sent from the capturer to the captured, e.g. by extending `MediaStreamTrack` with a `jumpToSlide(num)` method.
219 |
220 | ### Drawbacks
221 | * This is a very partial solution, addressing only the subset of use-case #1 where the captured application has slides. Even then, it is unlikely that our imagination is going to be enough to think of all required actions and to express them all in the form of simple actions with predetermined parameters.
222 | * The lack of an identification mechanism, and therefore of an authentication mechanism, means that collaboration between the capturing and captured applications would be limited to a set of simple, unprivileged actions which the captured application would be willing to accept from an arbitrary capturing application.
223 |
224 | # API Changelog
225 |
226 | ## Chrome m92
227 | API introduced and exposed as an origin trial.
228 |
229 | ## Chrome m93
230 | * Capture handle was previously exposed as `track.getSettings().captureHandle`; it is now exposed as `track.getCaptureHandle()`.
231 | * Events previously carried the capture handle as `event.captureHandle`; it is now exposed as `event.captureHandle()`.
232 |
233 | ## Chrome m102
234 | * `CaptureHandleChangeEvent` has been replaced by a simple `Event`.
235 |
--------------------------------------------------------------------------------
/identity/index.html:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 | Capture Handle - Bootstrapping Collaboration when Screensharing
6 |
7 |
8 |
9 |
10 |
11 |
Abstract
12 |
13 | This document proposes a mechanism by which an application APP can opt in to
14 | exposing certain information to another application CAPTR, if
15 | CAPTR is screen-capturing the tab in which APP is running. It
16 | describes a mechanism for tab capture or
17 | window capture.
18 |
19 |
20 |
21 |
This document is not complete.
22 |
23 |
24 |
25 |
Problem Description
26 |
27 |
Generic Problem Description
28 |
29 | Consider a web-application, running in one tab, which we'll name "main_app."
30 | Assume main_app calls
31 | getDisplayMedia
32 | and the user chooses to either:
33 |
34 |
35 |
36 | Share another tab, where an application is running which we'll call
37 | "captured_app."
38 |
39 |
Share a user agent window hosting such a tab.
40 |
41 |
Note that:
42 |
43 |
main_app does not know what it is capturing.
44 |
45 | captured_app does not know that it is being captured; let alone by whom.
46 |
47 |
48 |
49 | Both of these traits are desirable in the general case, but there exist legitimate use cases
50 | where the browser would want to allow applications to opt in to bridging that gap and
51 | establishing a connection.
52 |
53 |
54 | We wish to enable the legitimate use cases while keeping the general case as it was
55 | before.
56 |
57 |
58 |
59 |
Use-case #1: Driving Presentations from Video Conferencing Apps
60 |
61 | Consider presentation software and video-conferencing (VC) software that collaborate. Assume the
62 | user is in a VC session. The user starts sharing a presentation. Both applications are
63 | interested in letting the VC app discover that it is capturing a slides session, which
64 | application is being captured, and even which session, so that the VC application can expose
65 | controls to the user for flipping through slides. When the user clicks those controls, the
66 | VC app can send messages to the presentation app, requesting that it do such
67 | things as flip through slides, enter/leave presentation mode, etc.
68 |
69 |
70 | The means for transmitting these messages are outside the scope of this document. Some
71 | options are:
72 |
73 |
74 |
Shared cloud infrastructure.
75 |
Messaging via a worker. (Note: Storage Partitioning might disrupt this option.)
76 |
A rudimentary messaging API might be added expressly for this purpose.
77 |
78 |
79 |
80 |
Use-case #2: Analytics
81 |
82 | Capturing applications often wish to gather statistics about which applications their users
83 | tend to capture. For example, VC applications would like to know how often their users
84 | share presentation applications from specific providers, Wikipedia, CNN, etc. Such
85 | information can be used to improve service for the users by introducing new
86 | collaborations, such as the one described above.
87 |
88 |
89 |
90 |
Use-case #3: Detecting Unintended Captures
91 |
92 | Users sometimes choose to share the wrong tab. Sometimes they switch to sharing the wrong
93 | tab by clicking the share-this-tab-instead button by mistake. A benevolent application
94 | could try to protect the user by presenting an in-app dialog for re-confirmation, if it
95 | believes that the user may have made a mistake.
96 |
97 |
98 |
99 |
Use-case #4: Avoiding "Hall of Mirrors"
100 |
101 | This use-case is a sub-case of #3, but deserves its own section due to its importance. The
102 | "Hall of Mirrors" effect occurs when users choose to share the tab in which the VC call
103 | takes place. When detecting self-capture, a VC application can avoid displaying the
104 | captured stream back to the user, thereby avoiding the dreaded effect.
105 |
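One way a VC application might detect self-capture is sketched below. It assumes the VC application publishes a random, per-document handle of its own (for example one derived from crypto.randomUUID()) with exposeOrigin set to true and its own origin permitted, and then compares that against the handle observed on the captured track. isSelfCapture is a hypothetical helper, not part of this proposal.

```javascript
// Hypothetical helper for avoiding the "Hall of Mirrors" effect.
// Assumption: the VC app previously called setCaptureHandleConfig() with
// handle: ownHandle (a random per-document string), exposeOrigin: true and
// permittedOrigins that include its own origin.
function isSelfCapture(captureHandle, ownHandle, ownOrigin) {
  // getCaptureHandle() returns null when nothing is observable.
  if (captureHandle == null) {
    return false;
  }
  // origin is present only if the captured app opted in to exposing it.
  if (captureHandle.origin !== undefined && captureHandle.origin !== ownOrigin) {
    return false;
  }
  return captureHandle.handle === ownHandle;
}
```

In a page this might be called as isSelfCapture(track.getCaptureHandle(), ownHandle, location.origin), re-evaluated whenever capturehandlechange fires, with the local preview of the captured stream suppressed while it returns true.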
106 |
107 |
108 |
109 |
The Capture-Handle Mechanism
110 |
111 | The capture-handle mechanism consists of two main parts - one on the captured side, one on
112 | the capturing side.
113 |
114 |
115 |
116 | Captured applications opt in to exposing information by
117 | calling {{MediaDevices/setCaptureHandleConfig}}.
118 |
128 | Applications are allowed to expose information to capturing applications. They would
129 | typically do so before knowing whether they are even being captured. The mechanism used is calling
130 | {{MediaDevices/setCaptureHandleConfig}} with an appropriate {{CaptureHandleConfig}}.
131 |
132 |
133 |
CaptureHandleConfig
134 |
135 | The CaptureHandleConfig dictionary is used to instruct the user agent what information the
136 | captured application intends to expose, and to which applications it is willing to expose
137 | said information.
138 |
152 | If true, the user agent MUST expose the captured application's origin
153 | through the {{CaptureHandle/origin}} field of {{CaptureHandle}}. If
154 | false, the user agent MUST NOT expose the captured application's origin.
155 |
156 |
157 |
158 | handle
159 |
160 |
161 |
The user agent MUST expose this value as {{CaptureHandle/handle}}.
162 |
163 | Note: Values of this field are limited to 1024 16-bit characters (UTF-16 code units). This limitation is
164 | specified further in {{MediaDevices/setCaptureHandleConfig}}.
165 |
166 |
167 |
168 | permittedOrigins
169 |
170 |
171 |
Valid values of this field include:
172 |
173 |
The empty list.
174 |
A list with the single item "*"
175 |
A list consisting of valid origins.
176 |
177 |
178 | If {{CaptureHandleConfig/permittedOrigins}} consists of the single item
179 | "*", then the {{CaptureHandle}} is observable by all
180 | capturers. Otherwise, {{CaptureHandle}} is [=observable=] only to capturers whose
181 | origin is lists in {{CaptureHandleConfig/permittedOrigins}}.
182 |
183 |
184 |
185 |
186 |
187 |
MediaDevices.setCaptureHandleConfig()
188 |
189 | {{MediaDevices}} is extended with a method - {{MediaDevices/setCaptureHandleConfig}} -
190 | which accepts a {{CaptureHandleConfig}} object. By calling this method, an application
191 | informs the user agent which information it permits capturing applications to observe.
192 |
193 |
194 |
195 | There is no consensus yet on how {{MediaDevices/setCaptureHandleConfig}} should behave
196 | if called more than once, due to concerns over it being misused as a cross-origin
197 | messaging channel itself. This is under discussion in
198 | issue #11.
199 |
The user agent MUST run the following validations:
212 |
213 |
214 | If {{CaptureHandleConfig/handle}} is set to an invalid value, the user agent MUST
215 | throw a {{TypeError}}.
216 |
217 |
218 | If {{CaptureHandleConfig/permittedOrigins}} is set to an invalid value, the user
219 | agent MUST throw a {{NotSupportedError}}.
220 |
221 |
222 | If the call to {{MediaDevices/setCaptureHandleConfig()}} is not made from a [=top-level
223 | browsing context=], the user agent MUST throw an {{InvalidStateError}}.
224 |
225 |
226 |
227 | If all validations passed, the user agent MUST accept the new config. The user agent
228 | MUST forget any previous call to {{MediaDevices/setCaptureHandleConfig}}; from now on,
229 | the application's {{CaptureHandleConfig}} is config.
230 |
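The validations above can be modeled as a plain function. This is a sketch, not user agent code, and two simplifications are assumptions: the top-level check is reduced to a boolean passed by the caller, and a "valid origin" is approximated as a string that the URL parser serializes back to itself as an origin. validateCaptureHandleConfig and its helpers are hypothetical names.

```javascript
// Sketch of the setCaptureHandleConfig() validations; not user agent code.
// Assumptions: the top-level check is a boolean supplied by the caller, and
// "valid origin" means the URL parser serializes the string back to itself.

function domError(name, message) {
  // DOMException exists in browsers and in recent Node versions; otherwise
  // fall back to a plain Error that carries the same name.
  if (typeof DOMException === "function") {
    return new DOMException(message, name);
  }
  const err = new Error(message);
  err.name = name;
  return err;
}

function isValidOrigin(value) {
  try {
    return new URL(value).origin === value;
  } catch {
    return false; // not parseable as a URL, e.g. "*"
  }
}

function validateCaptureHandleConfig(config, { isTopLevel }) {
  const handle = config.handle ?? "";
  const permittedOrigins = config.permittedOrigins ?? [];

  // handle: at most 1024 16-bit characters; JS string length counts UTF-16 code units.
  if (handle.length > 1024) {
    throw new TypeError("handle exceeds 1024 characters");
  }

  // permittedOrigins: [], ["*"], or a list of valid origins.
  const wildcardOnly = permittedOrigins.length === 1 && permittedOrigins[0] === "*";
  if (!wildcardOnly && !permittedOrigins.every(isValidOrigin)) {
    throw domError("NotSupportedError", "invalid permittedOrigins");
  }

  if (!isTopLevel) {
    throw domError("InvalidStateError", "not called from a top-level browsing context");
  }
}
```

Under this approximation, a list mixing "*" with origins is rejected, as is an entry with a path such as "https://vc.example/path", since neither is one of the three valid shapes.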
231 |
232 | The [=observable=] {{CaptureHandle}} is re-evaluated for all capturing applications.
233 |
234 |
235 |
236 | For every capturing application for which the new [=observable=] {{CaptureHandle}}
237 | differs from the one observable prior to the call to {{MediaDevices/setCaptureHandleConfig}}, the
238 | user agent MUST [=fire an event=] named {{MediaStreamTrack/capturehandlechange}}.
239 |
240 |
241 | The user agent MUST report the new [=observable=] {{CaptureHandle}} whenever
242 | {{MediaStreamTrack/getCaptureHandle}} is called.
243 |
244 |
245 |
246 |
247 |
248 |
249 |
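The re-evaluation and event-firing steps can be sketched as follows. Each capturer is modeled as a hypothetical { origin, lastObserved, onChange } record, where onChange stands in for firing capturehandlechange. This illustrates the rule only; it is not a description of any user agent's internals.

```javascript
// Sketch (not user agent code) of the steps that follow a successful
// setCaptureHandleConfig(): recompute the observable CaptureHandle for each
// capturer and notify only those whose observable handle changed.

function observableHandle(config, capturedOrigin, capturerOrigin) {
  const permitted = config.permittedOrigins ?? [];
  const allowed =
    (permitted.length === 1 && permitted[0] === "*") || permitted.includes(capturerOrigin);
  if (!allowed) {
    return null;
  }
  const handle = { handle: config.handle ?? "" };
  if (config.exposeOrigin) {
    handle.origin = capturedOrigin;
  }
  return handle;
}

function sameHandle(a, b) {
  if (a === null || b === null) {
    return a === b;
  }
  return a.handle === b.handle && a.origin === b.origin;
}

function applyNewConfig(config, capturedOrigin, capturers) {
  for (const capturer of capturers) {
    const next = observableHandle(config, capturedOrigin, capturer.origin);
    if (!sameHandle(capturer.lastObserved, next)) {
      capturer.lastObserved = next;
      capturer.onChange(next); // stands in for firing capturehandlechange
    }
  }
}
```

For example, adding a second origin to permittedOrigins leaves the first capturer's observable handle unchanged, so under this model only the newly permitted capturer is notified.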
250 |
Capturing Side
251 |
252 | Capturing applications which are permitted to [=observable|observe=] a track's
253 | {{CaptureHandle}} have two ways of reading it.
254 |
255 |
256 |
Reading the current value returned by {{MediaStreamTrack/getCaptureHandle}}.
257 |
Listening for {{MediaStreamTrack/capturehandlechange}} events, e.g. through the {{MediaStreamTrack/oncapturehandlechange}} event handler.
258 |
259 |
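Both reading paths can be combined in a small capturer-side helper, sketched here under the assumption that the track is a video track returned by getDisplayMedia(). Since Chrome m102 the event is a plain Event carrying no handle, so the listener re-reads getCaptureHandle(). watchCaptureHandle is a hypothetical name; the track is passed in so that any EventTarget exposing getCaptureHandle() can be used.

```javascript
// Hypothetical capturer-side helper: reports the current CaptureHandle, then
// reports it again every time capturehandlechange fires.
// Returns a function that stops watching.
function watchCaptureHandle(track, onHandle) {
  const report = () => onHandle(track.getCaptureHandle());
  report(); // 1. read the current value
  track.addEventListener("capturehandlechange", report); // 2. follow changes
  return () => track.removeEventListener("capturehandlechange", report);
}
```

A null value passed to onHandle means nothing is observable (for example, the captured application did not opt in, or the capturer is not permitted), so controls tied to a specific captured application would be hidden.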
260 |
CaptureHandle
261 |
262 | The user agent exposes information about the captured application to the capturing
263 | application through the {{CaptureHandle}} dictionary. Note that a {{CaptureHandle}} object
264 | MUST NOT be given to a capturing application that is not permitted to
265 | [=observable|observe=] it.
266 |
279 | If the captured application opted in to exposing its origin (by setting
280 | {{CaptureHandleConfig/exposeOrigin}} to true), then the user agent MUST set
281 | {{CaptureHandle/origin}} to the origin of the captured application. Otherwise,
282 | {{CaptureHandle/origin}} is not set.
283 |
284 |
285 |
286 | handle
287 |
288 |
289 |
290 | The user agent MUST set this field to the value which the captured application set in
291 | {{CaptureHandleConfig/handle}}.
292 |
293 |
294 |
295 |
296 |
297 |
MediaStreamTrack.getCaptureHandle()
298 |
299 | Extend {{MediaStreamTrack}} with a method called {{MediaStreamTrack/getCaptureHandle}}.
300 | When the {{MediaStreamTrack}} is a video track derived from screen capture,
301 | {{MediaStreamTrack/getCaptureHandle}} returns the latest [=observable=] {{CaptureHandle}}.
302 | Otherwise, it returns null.
303 |
304 |
305 |
306 | There is no consensus yet on whether {{MediaStreamTrack/getCaptureHandle}} belongs on
307 | {{MediaStreamTrack}} or on a dedicated controller object that is neither
308 | clonable nor
309 | transferable, to separate messaging affecting all tracks from consumption of a single track. This
312 | is under discussion in
313 | issue #12.
314 |
327 | If the track in question is not a video track, then the user agent MUST return
328 | null.
329 |
330 |
331 | If the track does not represent either a
332 | browser or a
333 | window
334 | display surface, then the user
335 | agent MUST return null.
336 |
337 |
338 | If the track is [=MediaStreamTrack/ended=], then the user agent MUST return
339 | null.
340 |
341 |
342 | If the captured application did not set a {{CaptureHandleConfig}}, or if it most recently
343 | set it to the empty {{CaptureHandleConfig}}, then the user agent MUST return
344 | null.
345 |
346 |
347 | The user agent MUST compare the origin of the capturing document to those which the
348 | captured application listed in {{CaptureHandleConfig/permittedOrigins}}. If the
349 | capturing origin is not permitted to [=observable|observe=] the {{CaptureHandle}},
350 | then the user agent MUST return null.
351 |
352 |
353 | If all previous validations passed, then the user agent MUST return a
354 | {{CaptureHandle}} dictionary with the values derived from the last
355 | {{CaptureHandleConfig}} set by the captured application.
356 |
357 |
358 |
359 |
360 |
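The steps above can be condensed into a pure function. This sketch models the track as a plain object with kind, readyState and displaySurface (in a browser, displaySurface is read from getSettings()), and the captured application as its origin plus its last CaptureHandleConfig. getCaptureHandleModel is a hypothetical name.

```javascript
// Sketch of the getCaptureHandle() steps as a pure function; not user agent code.
// `captured` is { origin, config }, where config is undefined if none was set.
function getCaptureHandleModel(track, captured, capturerOrigin) {
  if (track.kind !== "video") return null;
  if (track.displaySurface !== "browser" && track.displaySurface !== "window") return null;
  if (track.readyState === "ended") return null;

  const config = captured.config;
  if (config === undefined) return null; // no CaptureHandleConfig was ever set

  // The empty config has permittedOrigins === [], so it also yields null here.
  const permitted = config.permittedOrigins ?? [];
  const wildcard = permitted.length === 1 && permitted[0] === "*";
  if (!wildcard && !permitted.includes(capturerOrigin)) return null;

  const result = { handle: config.handle ?? "" };
  if (config.exposeOrigin) result.origin = captured.origin;
  return result;
}
```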
361 |
On-Change Event
362 |
363 |
364 | capturehandlechange
365 |
366 |
367 | Whenever the [=observable=] {{CaptureHandle}} for a given capturing application changes,
368 | the user agent fires an event named {{MediaStreamTrack/capturehandlechange}}. This can
369 | happen in the following cases:
370 |
371 |
372 |
373 | The captured application calls {{MediaDevices/setCaptureHandleConfig()}} with a new
374 | {{CaptureHandleConfig}}. (Note that the new {{CaptureHandleConfig}} may or may not
375 | cause the [=observable=] {{CaptureHandle}} to change for a given capturer, e.g. if only
376 | {{CaptureHandleConfig/permittedOrigins}} changes, and in a way that does not affect that capturer.)
377 |
378 |
379 | The captured application's [=top-level browsing context=] is navigated cross-document.
380 |
381 |
The user agent switches the track to follow a new application.
382 |
383 |
384 | Events are not fired when the track [=MediaStreamTrack/ended|ends=], nor after it
385 | [=MediaStreamTrack/ended|ends=].
386 |
387 |
388 |
389 |
oncapturehandlechange
390 |
391 | {{MediaStreamTrack}} is extended with an event handler attribute called
392 | {{MediaStreamTrack/oncapturehandlechange}}.
393 |
{{EventHandler}} for events named {{MediaStreamTrack/capturehandlechange}}.
405 |
406 |
407 |
408 |
409 |
410 |
411 |
412 |
--------------------------------------------------------------------------------
/identity/security-privacy-questionnaire.md:
--------------------------------------------------------------------------------
1 | ### 01. What information might this feature expose to Web sites or other parties, and for what purposes is that exposure necessary?
2 |
3 | Applications **opt-in** to exposing either a self-selected string, their origin (unspoofable), or both.
4 |
5 | ### 02. Do features in your specification expose the minimum amount of information necessary to enable their intended uses?
6 |
7 | Yes.
8 |
9 | ### 03. How do the features in your specification deal with personal information, personally-identifiable information (PII), or information derived from them?
10 |
11 | N/A. The application determines what is shared - PII or not. An application interested in leaking PII could do so in multiple other ways much more easily and reliably.
12 |
13 | ### 04. How do the features in your specification deal with sensitive information?
14 |
15 | By making its dissemination opt-in.
16 |
17 | ### 05. Do the features in your specification introduce new state for an origin that persists across browsing sessions?
18 |
19 | No.
20 |
21 | ### 06. Do the features in your specification expose information about the underlying platform to origins?
22 |
23 | No.
24 |
25 | ### 07. Does this specification allow an origin to send data to the underlying platform?
26 |
27 | Yes. That data is not stored long-term. It lives for as long as the application is loaded. It is not processed by the platform in any meaningful way.
28 |
29 | ### 08. Do features in this specification enable access to device sensors?
30 |
31 | No.
32 |
33 | ### 09. What data do the features in this specification expose to an origin? Please also document what data is identical to data exposed by other features, in the same or different contexts.
34 |
35 | Information that a screen-captured application opts into exposing is exposed to the capturers it permits.
36 |
37 | ### 10. Do features in this specification enable new script execution/loading mechanisms?
38 |
39 | No.
40 |
41 | ### 11. Do features in this specification allow an origin to access other devices?
42 |
43 | No.
44 |
45 | ### 12. Do features in this specification allow an origin some measure of control over a user agent's native UI?
46 |
47 | No.
48 |
49 | ### 13. What temporary identifiers do the features in this specification create or expose to the web?
50 |
51 | Whichever identifiers the application itself assigns and opts into exposing.
52 |
53 | ### 14. How does this specification distinguish between behavior in first-party and third-party contexts?
54 |
55 | It does not.
56 |
57 | ### 15. How do the features in this specification work in the context of a browser’s Private Browsing or Incognito mode?
58 |
59 | Capture Handle exposure is allowed when either the capturing or the captured application is in incognito mode, **unless** the capture is a self-capture.
60 |
61 | ### 16. Does this specification have both "Security Considerations" and "Privacy Considerations" sections?
62 |
63 | No. But they can be added.
64 |
65 | ### 17. Do features in your specification enable origins to downgrade default security protections?
66 |
67 | No.
68 |
69 | ### 18. What should this questionnaire have asked?
70 |
71 | I think we're good.
72 |
--------------------------------------------------------------------------------
/index.html:
--------------------------------------------------------------------------------
1 |
2 |
3 | Capture Handle - Landing Page
4 |
5 |
6 |
This repo is associated with two distinct documents: