├── .gitignore ├── .pr-preview.json ├── .prettierrc.json ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE.md ├── README.md ├── captured-surface-control.js ├── images └── explainer │ ├── onboarding_mock.png │ ├── onboarding_mock_full_context.png │ └── zoom_controls_mock.png ├── index.html ├── questionnaire.md ├── style.css └── w3c.json /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store 2 | -------------------------------------------------------------------------------- /.pr-preview.json: -------------------------------------------------------------------------------- 1 | { 2 | "src_file": "index.html", 3 | "type": "respec" 4 | } 5 | -------------------------------------------------------------------------------- /.prettierrc.json: -------------------------------------------------------------------------------- 1 | { 2 | "printWidth": 100 3 | } 4 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Code of Conduct 2 | 3 | All documentation, code and communication under this repository are covered by the [W3C Code of Conduct](https://www.w3.org/policies/code-of-conduct/). 4 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Web Real-Time Communications Working Group 2 | 3 | Contributions to this repository are intended to become part of Recommendation-track documents governed by the 4 | [W3C Patent Policy](https://www.w3.org/Consortium/Patent-Policy/) and 5 | [Software and Document License](https://www.w3.org/copyright/software-license/). To make substantive contributions to specifications, you must either participate 6 | in the relevant W3C Working Group or make a non-member patent licensing commitment. 
7 | 8 | If you are not the sole contributor to a contribution (pull request), please identify all 9 | contributors in the pull request comment. 10 | 11 | To add a contributor (other than yourself, that's automatic), mark them one per line as follows: 12 | 13 | ``` 14 | +@github_username 15 | ``` 16 | 17 | If you added a contributor by mistake, you can remove them in a comment with: 18 | 19 | ``` 20 | -@github_username 21 | ``` 22 | 23 | If you are making a pull request on behalf of someone else but you had no part in designing the 24 | feature, you can remove yourself with the above syntax. 25 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | All documents in this Repository are licensed by contributors 2 | under the 3 | [W3C Software and Document License](https://www.w3.org/copyright/software-license/). 4 | 5 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Captured Surface Control 2 | 3 | ## In a nutshell 4 | 5 | We introduce a new Web API that allows Web applications to: 6 | 7 | 1. Forward wheel events to a captured tab. 8 | 2. Read/write the zoom level of a captured tab. 9 | 10 | ## Motivation 11 | 12 | Nearly all video-conferencing Web applications offer their users the ability to share a browser tab, a native window, or screen. Many of these applications also show the local user a "preview tile" with a video of the captured [display surface](https://www.w3.org/TR/screen-capture/#dfn-display-surface). 13 | 14 | All these applications suffer from the same drawback - if the user wishes to interact with a captured tab or window, the user must first switch to that surface, taking them away from the video-conferencing application. 
This presents some challenges: 15 | 16 | - The user can't see the captured app and the videos of remote users at the same time unless they use [Picture-in-Picture](https://wicg.github.io/document-picture-in-picture/) or separate side-by-side windows for the video conference tab and the shared tab. On a smaller screen, this could be difficult. 17 | - The user is burdened by the need to jump between the video conferencing app and the captured surface. 18 | - The user loses access to the controls exposed by the video conferencing app while they are away from it; for example, an embedded chat app, emoji reactions, notifications about users asking to join the call, multimedia and layout controls, and other useful video conferencing features. 19 | 20 | The Captured Surface Control APIs address these problems. 21 | 22 | ## Terminology 23 | 24 | A **"capturing application"** is an application which has called [getDisplayMedia()](https://www.w3.org/TR/screen-capture/#dom-mediadevices-getdisplaymedia), leading the browser to present a "media-picker" dialog to the user, from which the user will have chosen to capture a tab, a window or a screen. The surface selected by the user is the **"captured surface"**. If that surface was a tab, we call the Web application currently loaded in that tab the **"captured application"**. If the user's chosen surface is a window, we designate the associated native application as the **"captured application"**. 25 | 26 | ## General API shape 27 | 28 | We extend [CaptureController](https://www.w3.org/TR/screen-capture/#capturecontroller) to support a limited set of low-risk actions. 
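Because these methods extend an existing interface, a capturing application may want to check that they are present before wiring up any UI. The following is a minimal sketch of such a check. The helper `supportsCapturedSurfaceControl` and the constant `CAPTURED_SURFACE_CONTROL_METHODS` are hypothetical names introduced for illustration and are not part of the API; only the four method names come from this document.

```js
// The write-access methods this document adds to CaptureController.
const CAPTURED_SURFACE_CONTROL_METHODS = [
  "forwardWheel",
  "increaseZoomLevel",
  "decreaseZoomLevel",
  "resetZoomLevel",
];

// Hypothetical helper (an assumption, not part of the API): returns true only if
// the given constructor's prototype exposes every method listed above.
function supportsCapturedSurfaceControl(ctor) {
  if (typeof ctor !== "function" || ctor.prototype == null) {
    return false;
  }
  return CAPTURED_SURFACE_CONTROL_METHODS.every(
    (name) => typeof ctor.prototype[name] === "function"
  );
}

// In a browser, pass the real constructor; it is undefined where unsupported.
const isSupported = supportsCapturedSurfaceControl(globalThis.CaptureController);
console.log(`Captured Surface Control available: ${isSupported}`);
```

An application could use the result to decide whether to render zoom buttons or an onboarding flow at all. This only detects that the methods exist; it says nothing about whether the permissions policy allows them.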
29 | 30 | In the code samples provided in this document, assume the following preliminaries: 31 | 32 | ```js 33 | const controller = new CaptureController(); 34 | const stream = await navigator.mediaDevices.getDisplayMedia({ controller }); 35 | const previewTile = document.querySelector("video"); 36 | previewTile.srcObject = stream; 37 | ``` 38 | 39 | ### Prelude: Permission Prompt 40 | 41 | We distinguish between write-access and read-access APIs. 42 | 43 | - The write-access APIs introduced here, `forwardWheel()`, `increaseZoomLevel()`, `decreaseZoomLevel()` and `resetZoomLevel()`, are gated by a new [Permissions Policy](https://www.w3.org/TR/permissions-policy-1/#permissionspolicy) called `"captured-surface-control"`. 44 | - Our read-access APIs are innocuous and are therefore left ungated. 45 | 46 | With most browsers' interpretation of [Permissions Policy](https://www.w3.org/TR/permissions-policy-1/#permissionspolicy), the first time an origin invokes any of `forwardWheel()`, `increaseZoomLevel()`, `decreaseZoomLevel()` or `resetZoomLevel()`, the browser shows a [permission prompt](https://w3c.github.io/permissions/#prompt-the-user-to-choose). How long this permission is persisted is up to the browser, with typical durations being "forever" or "for the current browsing session". 47 | 48 | Before displaying a permission prompt to the user, the app must solicit a user gesture. If the app wants to show zoom-in/out buttons ahead of time, then the user gesture is a given. But if the app wants to first inform the user about these new features, and provide clearer context about the ensuing permission prompt, then the app could include an onboarding experience that features a "start" button of some sort, after which it will invoke a write-access API in a way that will produce the prompt but will not cause a change of state, as perceived by the user. An example is: 49 | 50 |

51 | 52 |

53 | 54 | Code to support this could look as follows: 55 | 56 | ```js 57 | document.getElementById("startButton").onclick = async () => { 58 | try { 59 | const permissionStatus = await navigator.permissions.query({ 60 | name: "captured-surface-control", 61 | }); 62 | if (permissionStatus.state !== "granted") { 63 | await controller.forwardWheel(previewTile); 64 | } 65 | } catch (e) { 66 | console.log(`Error: ${e}`); 67 | } 68 | }; 69 | ``` 70 | 71 | ### Scroll forwarding 72 | 73 | #### forwardWheel() 74 | 75 | To facilitate scrolling of captured surfaces, we extend `CaptureController` as follows: 76 | 77 | ```webidl 78 | partial interface CaptureController { 79 | Promise<undefined> forwardWheel(HTMLElement? element); 80 | }; 81 | ``` 82 | 83 | Using `forwardWheel()`, a capturing application can forward subsequent [wheel events](https://developer.mozilla.org/en-US/docs/Web/API/Element/wheel_event) from a local element, such as the preview tile, to the captured surface's viewport. The browser determines the coordinates of the event relative to the origin of the target element, then produces a corresponding event on the captured surface at corresponding coordinates, after scaling. This forwarded event is indistinguishable to the captured application from direct user interaction. 84 | 85 | `forwardWheel()` is subject to a permissions policy, which might involve a permission prompt. The method returns a `Promise` that is resolved if the connection is successfully made, and rejected otherwise (for example, if the user rejects the permission prompt). 86 | 87 | Sample usage: 88 | 89 | ```js 90 | try { 91 | await controller.forwardWheel(previewTile); 92 | } catch (e) { 93 | console.log(`Error: ${e}`); 94 | } 95 | ``` 96 | 97 | It is possible to use `forwardWheel()` with any type of element. This allows applications to forward gestures from elements other than the `HTMLVideoElement` itself. 
Thanks to this useful property of the API, applications can draw text, annotations and emoji-reactions over the video preview tile, and the experience will still work as the user expects. 98 | 99 | To stop the forwarding of wheel events, applications can invoke `forwardWheel(null)`. 100 | 101 | Forwarding will also stop if the capture-session ends for whatever reason. 102 | 103 | ### Zoom controls 104 | 105 | To facilitate read-access and write-access to a captured surface's zoom, we extend `CaptureController` as follows: 106 | 107 | ```webidl 108 | partial interface CaptureController { 109 | sequence<long> getSupportedZoomLevels(); 110 | readonly attribute long? zoomLevel; 111 | Promise<undefined> increaseZoomLevel(); 112 | Promise<undefined> decreaseZoomLevel(); 113 | Promise<undefined> resetZoomLevel(); 114 | attribute EventHandler onzoomlevelchange; 115 | }; 116 | ``` 117 | 118 | #### getSupportedZoomLevels() 119 | 120 | Returns a list of valid zoom-levels for the captured display surface. Each zoom level is represented as a percentage of the "default zoom-level", which is defined as 100%. The list is guaranteed to be monotonically increasing, and is guaranteed to contain the value `100`. It is also guaranteed to contain the minimum and maximum values. 121 | 122 | Note that: 123 | - The list `[100]` is technically valid, as the minimum/maximum values are not required to be distinct from the default value or from each other. 124 | - User agents may trim the list to a reasonable length. If the need arises, this function may in the future be extended to receive an argument with the maximum number of entries the application is interested in receiving. 125 | - `getSupportedZoomLevels()` may only be called if `zoomLevel` is non-null; otherwise, the user agent does not have a concept of zoom level for that type of display surface, and if `getSupportedZoomLevels()` were called, an exception would be raised. 
126 | - `getSupportedZoomLevels()` may only be called while `controller` is associated with an active capture; otherwise, the method raises an exception. 127 | 128 | #### zoomLevel 129 | 130 | This read-only attribute contains the current zoom-level of the captured surface. (Or `null` if a zoom level is not defined for the display surface, which is currently the case for windows and screens.) 131 | 132 | Sample usage: 133 | 134 | ```js 135 | currentZoomLabel.textContent = `${controller.zoomLevel}%`; 136 | ``` 137 | 138 | This attribute is not gated by a permissions policy. 139 | 140 | After capture stops, reading `zoomLevel` yields the last value it held while the capture was active. Notably, `zoomLevel` will NOT be updated after the capture session ends even if the captured surface's zoom level changes again. 141 | 142 | #### increaseZoomLevel(), decreaseZoomLevel() and resetZoomLevel() 143 | 144 | These methods are used to increase, decrease or reset the zoom level of the captured surface. (Resetting sets the value to the default - 100.) 145 | 146 | These methods are subject to a permissions policy, which might involve a permission prompt. These methods return a promise. If the permission is in the `'granted'` state, or if it is in the `'prompt'` state and the user does grant it once prompted, the promise is resolved; otherwise, it is rejected. 147 | 148 | One way to use these methods is to present UX elements to the user: 149 | 150 |

151 | 152 |

153 | 154 | Code backing up these controls could look like: 155 | 156 | ```js 157 | zoomIncreaseButton.addEventListener("click", async (event) => { 158 | try { 159 | await controller.increaseZoomLevel(); 160 | } catch (e) { 161 | console.log(`Error: ${e}`); 162 | } 163 | }); 164 | ``` 165 | 166 | #### onzoomlevelchange 167 | 168 | Users can change the captured application's zoom-level by interacting with the user agent, the captured application, or possibly by additional means. If the capturing application is displaying any user-facing controls and UX elements, such as an indicator of the current zoom-level, or buttons to increase/decrease zoom, then the capturing application will want to listen to such externally-triggered zoom-changes, and reflect them in the capturing application's own UX. The `onzoomlevelchange` event handler helps with that. 169 | 170 | Sample usage: 171 | 172 | ```js 173 | controller.addEventListener("zoomlevelchange", (event) => { 174 | const zoomLevel = controller.zoomLevel; 175 | 176 | // Update label. 177 | zoomLevelLabel.textContent = `${zoomLevel}%`; 178 | 179 | // Update controls. 180 | const supportedZoomLevels = controller.getSupportedZoomLevels(); 181 | const currentZoomLevelIndex = supportedZoomLevels.indexOf(zoomLevel); 182 | zoomIncreaseButton.disabled = currentZoomLevelIndex >= supportedZoomLevels.length - 1; 183 | zoomDecreaseButton.disabled = currentZoomLevelIndex <= 0; 184 | }); 185 | ``` 186 | 187 | ## Security and Privacy Considerations 188 | 189 | ### Permission prompts 190 | 191 | Permission prompts are currently used as mitigations for Web Platform capabilities which are arguably even riskier than those presented in this document - clipboard access, geolocation, mic- and camera-access, and most notably, screen-capture itself. It follows that, if the prompt can be clear enough for the user, it should be a sufficient mitigation for the risks associated with the API surfaces we introduce. 
192 | 193 | ### Risks and mitigations 194 | 195 | #### User confusion 196 | 197 | To obtain initial permission to use the API, and to keep on using it, an application does not need to show the user a video representation of the surface under control via a `