├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE.md ├── Makefile ├── README.md ├── dash-events-explainer.md ├── emsg-processing-model-figure1.png ├── emsg-processing-model-figure2.png ├── emsg-processing-model-figure3.png ├── emsg-processing-model.md ├── explainer.md ├── inband-events-using-datacue.png ├── inband-events-using-vttcue.png ├── index.bs ├── index.html ├── requirements.md ├── text-track-cue-constructor.md └── w3c.json /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Code of Conduct 2 | 3 | All documentation, code and communication under this repository are covered by the [W3C Code of Ethics and Professional Conduct](https://www.w3.org/Consortium/cepc/). 4 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Web Platform Incubator Community Group 2 | 3 | This repository is being used for work in the W3C Web Platform Incubator Community Group, governed by the [W3C Community License 4 | Agreement (CLA)](http://www.w3.org/community/about/agreements/cla/). To make substantive contributions, 5 | you must join the CG. 6 | 7 | If you are not the sole contributor to a contribution (pull request), please identify all 8 | contributors in the pull request comment. 9 | 10 | To add a contributor (other than yourself, that's automatic), mark them one per line as follows: 11 | 12 | ``` 13 | +@github_username 14 | ``` 15 | 16 | If you added a contributor by mistake, you can remove them in a comment with: 17 | 18 | ``` 19 | -@github_username 20 | ``` 21 | 22 | If you are making a pull request on behalf of someone else but you had no part in designing the 23 | feature, you can remove yourself with the above syntax. 24 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | All Reports in this Repository are licensed by Contributors 2 | under the 3 | [W3C Software and Document License](http://www.w3.org/Consortium/Legal/2015/copyright-software-and-document). 4 | 5 | Contributions to Specifications are made under the 6 | [W3C CLA](https://www.w3.org/community/about/agreements/cla/). 7 | 8 | Contributions to Test Suites are made under the 9 | [W3C 3-clause BSD License](https://www.w3.org/Consortium/Legal/2008/03-bsd-license.html) 10 | 11 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | SOURCEFILE=index.bs 2 | OUTPUTFILE=index.html 3 | PREPROCESSOR=bikeshed 4 | REMOTE_PREPROCESSOR_URL=https://api.csswg.org/bikeshed/ 5 | 6 | all: $(OUTPUTFILE) 7 | 8 | $(OUTPUTFILE): $(SOURCEFILE) 9 | ifneq (,$(REMOTE)) 10 | curl $(REMOTE_PREPROCESSOR_URL) -F file=@$(SOURCEFILE) > "$@" 11 | else 12 | $(PREPROCESSOR) -f spec "$<" "$@" 13 | endif 14 | 15 | clean: 16 | rm -f index.html 17 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Web Incubator CG For DataCue 2 | 3 | This is the repository for the DataCue Web Incubator CG, a collaborative project hosted by the WICG. 4 | 5 | The repo is used for developing documentation and code resources identified by the group. 
 6 | 
 7 | ## Proposals
 8 | 
 9 | ### DataCue API
10 | 
11 | * [Explainer](explainer.md)
12 | 
13 | ### TextTrackCue enhancements for programmatic subtitle and caption presentation
14 | 
15 | * [Explainer](https://github.com/WebKit/explainers/blob/main/texttracks/README.md)
16 | 
17 | ### Expose TextTrackCue constructor
18 | 
19 | * [Explainer](text-track-cue-constructor.md)
20 | 
21 | ## References
22 | 
23 | * [Draft Spec](https://wicg.github.io/datacue/)
24 | * [DataCue Explainer](explainer.md)
25 | * [Requirements for Media Timed Events](https://w3c.github.io/me-media-timed-events/) W3C Media & Entertainment Interest Group Note
26 | * [WICG Discourse Thread](https://discourse.wicg.io/t/media-timed-events-api-for-mpeg-dash-mpd-and-emsg-events/3096)
27 | * [Video Metadata Cues TPAC 2018 breakout summary](https://github.com/w3c/strategy/issues/113#issuecomment-432971265)
28 | * [Video Search with Location OGC TC Leuven 2019 breakout summary](https://github.com/w3c/sdw/issues/1130#issuecomment-508531749)
29 | * [Video Metadata for Moving Objects & Sensors TPAC 2020 breakout summary](https://github.com/w3c/sdw/issues/1194#issuecomment-718702993)
30 | 
--------------------------------------------------------------------------------
/dash-events-explainer.md:
--------------------------------------------------------------------------------
 1 | # Browser Handling of DASH Event Messages
 2 | 
 3 | The aim of this proposal is to define browser-native handling of MPEG-DASH timed metadata events, which today is done at the web application layer. MPEG-DASH `emsg` events are included in MPEG CMAF, which has emerged as the common media delivery format in HLS and MPEG-DASH.
 4 | 
 5 | The current approach for handling in-band event information, implemented by libraries such as [dash.js](https://github.com/Dash-Industry-Forum/dash.js/wiki) and [hls.js](https://github.com/video-dev/hls.js), is to parse the media segments in JavaScript to extract the events and construct `VTTCue` objects.
 6 | 
 7 | On resource-constrained devices such as smart TVs and streaming sticks, this leads to a significant performance penalty, which can have an impact on UI rendering updates if this is done on the UI thread (although we note the [proposal](https://github.com/wicg/media-source/blob/mse-in-workers-using-handle/mse-in-workers-using-handle-explainer.md) to make Media Source Extensions available to Worker contexts). There can also be an impact on the battery life of mobile devices. Given that the media segments will be parsed anyway by the user agent, parsing in JavaScript is an expensive overhead that could be avoided.
 8 | 
 9 | Avoiding parsing in JavaScript is also important for low latency video streaming applications, where minimizing the time taken to pass media content through to the media element's playback buffer is essential.
10 | 
11 | Instead of using `VTTCue`, a separate proposal introduces `DataCue` as a more appropriate cue API for timed metadata. See the [DataCue explainer](explainer.md) for details.
12 | 
13 | ## Use cases
14 | 
15 | Many of the use cases are described in the [DataCue explainer](explainer.md).
16 | 
17 | ### Dynamic content insertion
18 | 
19 | A media content provider wants to allow insertion of content, such as personalised video, local news, or advertisements, into a video media stream that contains the main program content. To achieve this, timed metadata is used to describe the points on the media timeline, known as splice points, where switching playback to inserted content is possible.
20 | 21 | [SCTE 35](https://scte-cms-resource-storage.s3.amazonaws.com/ANSI_SCTE-35-2019a-1582645390859.pdf) defines a data cue format for describing such insertion points. Use of these cues in MPEG-DASH streams is described in [SCTE 214-1](https://scte-cms-resource-storage.s3.amazonaws.com/Standards/ANSI_SCTE%20214-1%202016.pdf), [SCTE 214-2](https://scte-cms-resource-storage.s3.amazonaws.com/Standards/ANSI_SCTE%20214-2%202016.pdf), and [SCTE 214-3](https://scte-cms-resource-storage.s3.amazonaws.com/Standards/ANSI_SCTE%20214-3%202015.pdf). Use in HLS streams is described in SCTE-35 section 12.2. 22 | 23 | ### Media player control messages 24 | 25 | MPEG-DASH defines several control messages for media streaming clients (e.g., libraries such as [dash.js](https://github.com/Dash-Industry-Forum/dash.js/wiki)). Control messages exist for several scenarios, such as: 26 | 27 | * The media player should refresh or update its copy of the manifest document (MPD) 28 | * The media player should make an HTTP request to a given URL for analytics purposes 29 | * The media presentation will end at a time earlier than expected 30 | 31 | These messages may be carried as in-band `emsg` events in the media container files. 32 | 33 | ## Proposed API 34 | 35 | The proposed API is based on the existing [text track support](https://html.spec.whatwg.org/multipage/media.html#timed-text-tracks) in HTML and the [proposed `DataCue` API](explainer.md). 36 | 37 | > TODO: Add API summary 38 | 39 | As new `emsg` event types can be introduced from time to time, we propose to expose the raw binary `emsg` data for applications to parse. This avoids the need for browsers to natively understand the structure of the event messages. 40 | 41 | We will need to specify how to extract in-band timed metadata from the media container, and the structure in which the data is exposed via the `DataCue` interface. There are a couple of options for how to do this: 42 | 43 | 1. We could update the existing [Sourcing In-band Media Resource Tracks from Media Containers into HTML](https://dev.w3.org/html5/html-sourcing-inband-tracks/) spec. 44 | 45 | 2. We could produce a new set of specifications, following a registry approach with one specification per media format that describes the timed metadata details for that format, similar to the [Media Source Extensions Byte Stream Format Registry](https://www.w3.org/TR/mse-byte-stream-format-registry/). This could be based on [Sourcing In-band Media Resource Tracks from Media Containers into HTML](https://dev.w3.org/html5/html-sourcing-inband-tracks/). 46 | 47 | ## Code examples 48 | 49 | > TODO: Needs updating: show how to subscribe to specific event streams, show how to set the dispatch mode. 50 | 51 | ### Subscribing to receive in-band timed metadata cues 52 | 53 | This example shows how to add a `cuechange` handler that can be used to receive media-timed data and event cues. 
54 | 55 | ```javascript 56 | const video = document.getElementById('video'); 57 | 58 | video.textTracks.addEventListener('addtrack', (event) => { 59 | const textTrack = event.track; 60 | 61 | if (textTrack.kind === 'metadata') { 62 | textTrack.mode = 'hidden'; 63 | 64 | // See cueChangeHandler examples below 65 | textTrack.addEventListener('cuechange', cueChangeHandler); 66 | } 67 | }); 68 | ``` 69 | 70 | ### MPEG-DASH callback event handler 71 | 72 | ```javascript 73 | const cueChangeHandler = (event) => { 74 | const metadataTrack = event.target; 75 | const activeCues = metadataTrack.activeCues; 76 | 77 | for (let i = 0; i < activeCues.length; i++) { 78 | const cue = activeCues[i]; 79 | 80 | // The UA delivers parsed message data for this message type 81 | if (cue.type === 'urn:mpeg:dash:event:callback:2015' && 82 | cue.value.emsgValue === '1') { 83 | const url = cue.value.data; 84 | fetch(url).then(() => { console.log('Callback completed'); }); 85 | } 86 | } 87 | }; 88 | ``` 89 | 90 | ### SCTE-35 dynamic content insertion cue handler 91 | 92 | This example shows how a web application can handle [SCTE 35](https://scte-cms-resource-storage.s3.amazonaws.com/Standards/ANSI_SCTE%20214-3%202015.pdf) cues, both in the case where the cues are parsed by the browser implementation, and where parsed by the web application. 93 | 94 | ```javascript 95 | const cueChangeHandler = (event) => { 96 | const metadataTrack = event.target; 97 | const activeCues = metadataTrack.activeCues; 98 | 99 | for (let i = 0; i < activeCues.length; i++) { 100 | const cue = activeCues[i]; 101 | 102 | if (cue.type === 'urn:scte:scte35:2013:bin') { 103 | // Parse the SCTE-35 message payload. 104 | // parseSCTE35Data() is similar to Comcast's scte35.js library, 105 | // adapted to take an ArrayBuffer as input. 106 | // https://github.com/Comcast/scte35-js/blob/master/src/scte35.ts 107 | const scte35Message = parseSCTE35Data(cue.value.data); 108 | 109 | console.log(cue.startTime, cue.endTime, scte35Message.tableId, scte35Message.spliceCommandType); 110 | } 111 | } 112 | }; 113 | ``` 114 | 115 | ### Cue enter/exit handlers 116 | 117 | This example shows how a web application can use the proposed new `addcue` event to attach `enter` and `exit` handlers to each cue on the metadata track. 
118 | 
119 | ```javascript
120 | // video.currentTime has reached the cue start time
121 | // through normal playback progression
122 | const cueEnterHandler = (event) => {
123 |   const cue = event.target;
124 |   console.log('cueEnter', cue.startTime, cue.endTime);
125 | };
126 | 
127 | // video.currentTime has reached the cue end time
128 | // through normal playback progression
129 | const cueExitHandler = (event) => {
130 |   const cue = event.target;
131 |   console.log('cueExit', cue.startTime, cue.endTime);
132 | };
133 | 
134 | // A cue has been parsed from the media container
135 | const addCueHandler = (event) => {
136 |   const cue = event.cue;
137 | 
138 |   // Attach enter/exit event handlers
139 |   cue.onenter = cueEnterHandler;
140 |   cue.onexit = cueExitHandler;
141 | };
142 | 
143 | const video = document.getElementById('video');
144 | 
145 | video.textTracks.addEventListener('addtrack', (event) => {
146 |   const textTrack = event.track;
147 | 
148 |   if (textTrack.kind === 'metadata') {
149 |     textTrack.mode = 'hidden';
150 | 
151 |     textTrack.addEventListener('addcue', addCueHandler);
152 |   }
153 | });
154 | ```
155 | 
156 | ## Considered alternatives
157 | 
158 | > TODO
159 | 
160 | ## References
161 | 
162 | This explainer is based on content from a [Note](https://w3c.github.io/me-media-timed-events/) written by the W3C Media and Entertainment Interest Group, and from a number of associated discussions, including the [TPAC breakout session on video metadata cues](https://github.com/w3c/strategy/issues/113#issuecomment-432971265). It is also closely related to the DASH-IF [DASH Player's Application Events and Timed Metadata Processing Models and APIs](https://dashif-documents.azurewebsites.net/Events/master/event.html) document.
163 | 
--------------------------------------------------------------------------------
/emsg-processing-model-figure1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WICG/datacue/dac8f4ece6f84aaf61f2dc6257d105319066a324/emsg-processing-model-figure1.png
--------------------------------------------------------------------------------
/emsg-processing-model-figure2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WICG/datacue/dac8f4ece6f84aaf61f2dc6257d105319066a324/emsg-processing-model-figure2.png
--------------------------------------------------------------------------------
/emsg-processing-model-figure3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WICG/datacue/dac8f4ece6f84aaf61f2dc6257d105319066a324/emsg-processing-model-figure3.png
--------------------------------------------------------------------------------
/emsg-processing-model.md:
--------------------------------------------------------------------------------
 1 | # DASH inband event processing using MSE data model
 2 | 
 3 | Iraj Sodagar, irajs@live.com
 4 | 
 5 | Tencent America
 6 | 
 7 | 2021-05-17
 8 | 
 9 | ## Introduction
10 | 
11 | This document provides an extended W3C [Media Source Extensions](https://w3c.github.io/media-source/) (MSE) model for the processing of DASH inband events.
12 | 
13 | Note: The current MSE specification does not support the processing of inband events and this document is just one possible illustrative design on how MSE can be extended to support such functionality.
14 | 15 | ## Dispatch modes 16 | 17 | See https://www.w3.org/TR/media-timed-events/#event-triggering 18 | 19 | For those events that the application has subscribed to receive, the API should: 20 | 21 | * Generate a DOM event when an in-band media timed event cue is parsed from the media container or media stream (DASH-IF **on-receive** mode). 22 | * Generate DOM events when the current media playback position reaches the start time and the end time of a media timed event cue during playback (DASH-IF **on-start** mode). This applies equally to cues generated by the user agent when parsed from the media container and cues added by the web application. 23 | 24 | In general, it is not possible for the UA to know which dispatch mode to use for any given event type. So we introduce the following API so that the web application can tell the UA: 25 | 26 | ```javascript 27 | enum DispatchMode { 28 | "onstart", 29 | "onreceive" 30 | }; 31 | 32 | interface InbandEventTrack extends TextTrack { 33 | undefined subscribe(DOMString eventType, DispatchMode dispatchMode); 34 | undefined unsubscribe(DOMString eventType); 35 | }; 36 | ``` 37 | 38 | > TODO: How to construct `eventType` from emsg `scheme_id_uri` and `value`? 39 | 40 | > TODO: How to construct an `InbandEventTrack`? Some UAs already process inband events using `TextTrack` and create the `TextTrack` automatically. 41 | 42 | ## Process@append rule 43 | 44 | The process@append rule defines how the inband events of a segment are processed, i.e. parsed, and dispatched or scheduled to dispatch at the time of appending the segment to the MSE [`SourceBuffer`](https://w3c.github.io/media-source/#sourcebuffer) using [`appendBuffer()`](https://www.w3.org/TR/media-source/#dom-sourcebuffer-appendbuffer). 45 | 46 | In the case of an inband event whose `eventType` has not been subscribed by the web application: 47 | 48 | 1. Discard the event 49 | 50 | In the case of an inband event whose `eventType` has been subscribed by the web application with the **onreceive** dispatch mode: 51 | 52 | 1. If the event end time is not smaller than the current playback position, and 53 | 2. If this event or an equivalent has not been dispatched before, 54 | 55 | Then the dispatcher dispatches the event immediately. 56 | 57 | In the case of an inband event whose `eventType` has been subscribed by the web application with the **onstart** dispatch mode: 58 | 59 | 1. If the current playback position is not smaller than the event start time, and 60 | 2. If the current playback position is not equal or larger than the event end time, and 61 | 3. If this event or an equivalent has not been dispatched before, 62 | 63 | Then the event is stored in a dispatch buffer for dispatching at the event start time. 64 | 65 | ## Dispatch buffer timing model 66 | 67 | Figure 1 demonstrates an inband event with the **onstart** dispatch mode relative to the MSE timing model. 68 | 69 |

70 | [Figure 1 image (emsg-processing-model-figure1.png): Media source and in-band event dispatch buffers]
71 | 
72 | 73 | Figure 1. Media source and in-band event dispatch buffers 74 | 75 | ## Implementation 76 | 77 | Figure 2 demonstrates an example of the overlapping events with **onstart** dispatch mode. 78 | 79 |

80 | [Figure 2 image (emsg-processing-model-figure2.png): Event buffer model example for onstart events]
81 | 
82 | 
83 | Figure 2. Event buffer model example for **onstart** events
84 | 
85 | As shown above, emsgs E0, E1, and E2 are mapped to the dispatch buffer. With the initial appending of the S1 media segment to the media buffer, the ranges between the event's start and the event's end are marked in the dispatch buffer for E0 and E1.
86 | 
87 | When S2 is appended to the media buffer, since E2 overlaps with E1 in the dispatch buffer, the corresponding range in the dispatch buffer is divided into 3 subranges, as shown in the figure.
88 | 
89 | Figure 3 demonstrates an example of an overwrite, in which the segment S2 is overwritten by a new segment S2' (that does not contain any emsgs) and has a duration that only covers a portion of S2 in the media buffer.
90 | 
91 | 

92 | [Figure 3 image (emsg-processing-model-figure3.png): Overwrite of a part of a segment with events having onstart dispatch mode]
93 | 
94 | 
95 | Figure 3. Overwrite of a part of a segment with events having **onstart** dispatch mode
96 | 
97 | As shown, since the event E2 has the **onstart** dispatch mode, its range in the dispatch buffer is unchanged.
98 | 
99 | ## Algorithms
100 | 
101 | ### Initialization
102 | 
103 | 1. Application inputs to the DASH client
104 |     1. Subscribe to SchemeIdURI/value
105 |     2. Provide dispatch mode
106 | 2. Event buffer initialization:
107 |     1. Event dispatch (range of event purge may go beyond media buffer)
108 | 3. Set Presentation Time Offset (PTO)
109 | 
110 | ### Append
111 | 
112 | Add the following steps to the MSE [Segment Parser Loop](https://www.w3.org/TR/media-source/#sourcebuffer-segment-parser-loop), which is called from the [Buffer Append Algorithm](https://www.w3.org/TR/media-source/#sourcebuffer-buffer-append). After step 6.2., which describes handling of coded frames, add:
113 | 
114 | 1. If the input buffer contains one or more __inband event messages__, then run the __inband event processing algorithm__.
115 | 
116 | ### Inband Event Processing
117 | 
118 | The __inband event processing algorithm__ is a new algorithm which we propose to add to MSE.
119 | 
120 | When __inband event messages__ have been parsed by the segment parser loop, then the following steps are run:
121 | 
122 | 1. For each inband event message in the media segment run the following steps:
123 |     1. Parse the emsg
124 |     2. Generate the `eventType` from the emsg.scheme_id_uri and emsg.value
125 |     3. Look up the `eventType` in the `InbandEventTrack`'s list of subscribed event types
126 |         1. If not present, discard the emsg and abort these steps
127 |     4. Calculate the `startTime` and `endTime` values for the `DataCue`:
128 |         1. For emsg v0 (emsg.version = 0): startTime = segment_start + emsg.presentation_time_delta / emsg.timescale
129 |         2. For emsg v1 (emsg.version = 1): startTime = emsg.presentation_time / emsg.timescale
130 |         3. If emsg.duration is 0xFFFFFFFF then endTime = +Infinity, else endTime = startTime + emsg.duration / emsg.timescale
131 |     5. If there is an equivalent event message in the `InbandEventTrack`'s [list of text track cues](https://html.spec.whatwg.org/multipage/media.html#text-track-list-of-cues), discard the event message and abort these steps. An event message is equivalent if its `id`, `scheme_id_uri`, and `value` values are the same as those of any existing cue
132 |     6. Construct a `DataCue` instance with the following attributes:
133 |         1. startTime (as calculated above)
134 |         2. endTime (as calculated above)
135 |         3. id = emsg.id
136 |         4. pauseOnExit = false
137 |         5. type = emsg.scheme_id_uri
138 |         6. value = { data: emsg.message_data, emsgValue: emsg.value }
139 |     7. If the subscription's dispatch mode is **onreceive**, queue a task to fire an event named `addcue` at the `InbandEventTrack` object with the `cue` attribute initialized to the new `DataCue` object
140 |     8. If the subscription's dispatch mode is **onstart**, run the HTML [`addCue()` steps](https://html.spec.whatwg.org/multipage/media.html#dom-texttrack-addcue) with the new `DataCue` object
141 | 
142 | ### Dispatch
143 | 
144 | > TODO: Firing of cue `enter` and `exit` for event messages in **onstart** dispatch mode is handled by the HTML [time marches on steps](https://html.spec.whatwg.org/multipage/media.html#time-marches-on)
145 | 
146 | 1. Find the events occurring in the dispatch buffer at the playback position.
147 | 2. For each event
148 |     1. If its emsg.id is not in the "already-dispatched" table,
149 |         1. Dispatch the event
150 |         2. 
Add its emsg.id to the "already-dispatched" table 151 | 3. Remove the event from the dispatch buffer 152 | 2. Otherwise, remove the event from the dispatch buffer 153 | 154 | ### Purge 155 | 156 | > TODO: Purging is controlled by the web application, by calling [SourceBuffer.remove(startTime, endTime)](https://www.w3.org/TR/media-source/#dom-sourcebuffer-remove). The MSE [Range Removal](https://www.w3.org/TR/media-source/#sourcebuffer-range-removal) algorithm applies. Should this algorithm also remove cues that lie in the removed time range? 157 | 158 | In a purge operation, either a range from the start or a range from the end of the media buffer is purged. This range is referred to as the "purged-range" in this subclause. 159 | 160 | 1. If any event in the dispatch buffer overlaps with the purged-range 161 | 1. Split the event into two events around the purge-range boundary 162 | 2. Remove the purged-range from the dispatch buffer 163 | -------------------------------------------------------------------------------- /explainer.md: -------------------------------------------------------------------------------- 1 | # DataCue 2 | 3 | DataCue is a proposed web API to allow support for timed metadata, i.e., metadata information that is synchronized to audio or video media. 4 | 5 | Timed metadata can be used to support use cases such as dynamic content replacement, ad insertion, or presentation of supplemental content alongside the audio or video, or more generally, making changes to a web page, or executing application code triggered from JavaScript events, at specific points on the media timeline of an audio or video media stream. 6 | 7 | ## Use cases 8 | 9 | Timed metadata can be used to support use cases such as dynamic content replacement, ad insertion, or presentation of supplemental content alongside the audio or video, or more generally, making changes to a web page, or executing application code triggered from JavaScript events, at specific points on the media timeline of an audio or video media stream. The following sections describe some specific use cases in more detail. 10 | 11 | ### Lecture recording with slideshow 12 | 13 | An HTML page contains title and information about the course or lecture, and two frames: a video of the lecturer in one and their slides in the other. Each timed metadata cue contains the URL of the slide to be presented, and the cue is active for the time range over which the slide should be visible. 14 | 15 | ### Media stream with video and synchronized graphics 16 | 17 | A website wants to provide synchronized graphical elements that may be rendered next to or on top of a video. 18 | 19 | For example, in a talk show this could be a banner, shown in the lower third of the video, that displays the name of the guest. In a sports event, the graphics could show the latest lap times or current score, or highlight the location of the current active player. It could even be a full-screen overlay, to blend from one part of the program to another. 20 | 21 | The graphical elements are described in a stream or file containing cues that describe the start and end time of each graphical element, similar to a subtitle stream or file. The web application takes this data as input and renders it on top of the video image according to the cues. 
22 | 
23 | The purpose of rendering the graphical elements on the client device, rather than rendering them directly into the video image, is to allow the graphics to be optimized for the device's display parameters, such as aspect ratio and orientation. Another use case is adapting to user preferences, for localization or to improve accessibility.
24 | 
25 | This use case requires frame accurate synchronization of the content being rendered over the video.
26 | 
27 | ### Synchronized map animations
28 | 
29 | A user records footage with metadata, including geolocation, on a mobile video device such as a drone or dashcam, to share on the web alongside a map, e.g., OpenStreetMap.
30 | 
31 | WebVMT is an open format for metadata cues, synchronized with audio or video media, that can be used to drive an online map rendered in a separate HTML element alongside the media element on the web page. The media playhead position controls presentation and animation of the map, e.g., pan and zoom, and allows annotations to be added and removed, e.g., markers, at specified times during media playback. Control can also be overridden by the user with the usual interactive features of the map at any time, e.g., zoom. Concrete examples are provided by the [tech demos at the WebVMT website](http://webvmt.org/demos).
32 | 
33 | ### Media metadata search results
34 | 
35 | A user searches for online media matching certain metadata conditions, for example within a given distance of a geographic location or an acceleration profile corresponding to a traffic accident. Results are returned from a remote server using a RESTful API as a list in JSON format.
36 | 
37 | It should be possible for search results to be represented as media in the user agent, with linked metadata presented as `DataCue` objects programmatically to provide a common interface within the client web browser. Further details are given in the video metadata search experiments, proposed in the [OGC](http://www.opengeospatial.org) Ideas GitHub, to return [frames](https://github.com/opengeospatial/ideas/issues/91) and [clips](https://github.com/opengeospatial/ideas/issues/92).
38 | 
39 | > NOTE: Whether this use case requires any changes to the user agent or not is unclear without further investigation. If no changes are required, this capability should be demonstrated and the use case listed as a non-goal.
40 | 
41 | ## Event delivery
42 | 
43 | HTTP Live Streaming (HLS) and MPEG Dynamic Adaptive Streaming over HTTP (MPEG-DASH) are the two main adaptive streaming formats in use on the web today. The media industry is converging on the use of [MPEG Common Media Application Format (CMAF)](https://www.iso.org/standard/71975.html) as the common media delivery format. HLS, MPEG-DASH, and MPEG CMAF all support delivery of timed metadata, i.e., metadata information that is synchronized to the audio or video media.
44 | 
45 | Both HLS and MPEG-DASH use a combination of encoded media files and manifest files that identify the available streams and their respective URLs.
46 | 
47 | Some user agents (notably Safari and HbbTV) include native support for adaptive streaming playback, rather than through use of [Media Source Extensions](https://www.w3.org/TR/media-source/). In these cases, we need the user agent to expose to web applications any timed metadata cues that are carried either in-band with the media (i.e., delivered within the audio or video media container or multiplexed with the media stream), or out-of-band via the manifest document.
48 | 
49 | ## Proposed API
50 | 
51 | The proposed API is based on the existing [text track support](https://html.spec.whatwg.org/multipage/media.html#timed-text-tracks) in HTML and WebKit's `DataCue`. This extends the [HTML5 `DataCue` API](https://www.w3.org/TR/2018/WD-html53-20181018/semantics-embedded-content.html#text-tracks-exposing-inband-metadata) with two attributes to support non-text metadata, `type` and `value`, that replace the existing `data` attribute. We also add a constructor that allows these fields to be initialized by web applications.
52 | 
53 | ```webidl
54 | interface DataCue : TextTrackCue {
55 |   constructor(double startTime, unrestricted double endTime, any value, optional DOMString type);
56 | 
57 |   // Propose to deprecate / remove this attribute.
58 |   attribute ArrayBuffer data;
59 | 
60 |   // Proposed extensions.
61 |   attribute any value;
62 |   readonly attribute DOMString type;
63 | };
64 | ```
65 | 
66 | `value`: Contains the message data, which may be in any arbitrary data structure.
67 | 
68 | `type`: A string that can be used to identify the structure and content of the cue's `value`.
69 | 
70 | ## User agent-generated DataCue instances
71 | 
72 | Some user agents may automatically generate `DataCue` timed metadata cues while playing media. For example, WebKit supports several kinds of timed metadata in HLS streams, using the following `type` values:
73 | 
74 | | Type                       | Purpose             |
75 | | -------------------------- | ------------------- |
76 | | `com.apple.quicktime.udta` | QuickTime User Data |
77 | | `com.apple.quicktime.mdta` | QuickTime Metadata  |
78 | | `com.apple.itunes`         | iTunes metadata     |
79 | | `org.mp4ra`                | MPEG-4 metadata     |
80 | | `org.id3`                  | ID3 metadata        |
81 | 
82 | Additional information about existing support in WebKit can be found in [the IDL](https://trac.webkit.org/browser/webkit/trunk/Source/WebCore/html/track/DataCue.idl) and in [this layout test](https://trac.webkit.org/browser/webkit/trunk/LayoutTests/http/tests/media/track-in-band-hls-metadata.html), which loads various types of ID3 metadata from an HLS stream.
83 | 
84 | This proposal does not seek to standardize UA-generated `DataCue` schemas, but the proposed API is intended to support this usage.
85 | 
86 | Other proposals may be developed for this purpose, e.g., for the above or MPEG-DASH timed metadata events.
87 | 
88 | ## Code examples
89 | 
90 | ### Create an unbounded DataCue with geolocation data
91 | 
92 | ```javascript
93 | const video = document.getElementById('video');
94 | const textTrack = video.addTextTrack('metadata');
95 | // Create a cue from 5 secs to end of media
96 | const data = { "moveto": { "lat": 51.504362, "lng": -0.076153 } };
97 | const cue = new DataCue(5.0, Infinity, data, 'org.webvmt');
98 | textTrack.addCue(cue);
99 | ```
100 | 
101 | ### Create a DataCue from an in-band DASH 'emsg' box
102 | 
103 | ```javascript
104 | // Parse the media segment to extract timed metadata cues
105 | // contained in DASH 'emsg' boxes
106 | function extractEmsgBoxes(mediaSegment) {
107 |   // etc.
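  // A typical implementation walks the ISO BMFF box structure of the segment
  // (for example with a DataView), locates each 'emsg' box, and reads its
  // scheme_id_uri, value, timescale, presentation time (or presentation_time_delta)
  // and event_duration fields, returning objects of the form
  // { schemeIdUri, startTime, endTime, payload } that createDataCues() below expects.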
108 | }
109 | 
110 | // video.currentTime has reached the cue start time
111 | // through normal playback progression
112 | const cueEnterHandler = (event) => {
113 |   const cue = event.target;
114 |   console.log('cueEnter', cue.startTime, cue.endTime);
115 | };
116 | 
117 | // video.currentTime has reached the cue end time
118 | // through normal playback progression
119 | const cueExitHandler = (event) => {
120 |   const cue = event.target;
121 |   console.log('cueExit', cue.startTime, cue.endTime);
122 | };
123 | 
124 | function createDataCues(events, textTrack) {
125 |   events.forEach(event => {
126 |     const cue = new DataCue(
127 |       event.startTime,
128 |       event.endTime,
129 |       event.payload,
130 |       event.schemeIdUri
131 |     );
132 | 
133 |     // Attach enter/exit event handlers
134 |     cue.onenter = cueEnterHandler;
135 |     cue.onexit = cueExitHandler;
136 | 
137 |     textTrack.addCue(cue);
138 |   });
139 | }
140 | 
141 | // Append the segment to the MSE SourceBuffer
142 | function appendSegment(segment) {
143 |   // etc.
144 |   sourceBuffer.appendBuffer(segment);
145 | }
146 | 
147 | const video = document.getElementById('video');
148 | const textTrack = video.addTextTrack('metadata');
149 | 
150 | // Fetch a media segment, parse and create DataCue instances,
151 | // and append the segment for playback using Media Source Extensions.
152 | fetch('/media-segments/12345.m4s')
153 |   .then(response => response.arrayBuffer())
154 |   .then(buffer => {
155 |     const events = extractEmsgBoxes(buffer);
156 |     createDataCues(events, textTrack);
157 | 
158 |     appendSegment(buffer);
159 |   });
160 | ```
161 | 
162 | ### Create a DataCue from a DASH MPD event
163 | 
164 | > TODO: Add example code showing how a web application can construct `DataCue` objects with start and end times, event type, and data payload from a DASH MPD event, where the MPD is parsed by the web application
165 | 
166 | ## Considered alternatives
167 | 
168 | ### WebVTT metadata cues
169 | 
170 | Web applications today can use WebVTT metadata cues (the [`VTTCue`](https://www.w3.org/TR/webvtt1/#vttcue) API) to schedule timed metadata events by serializing the data to a string format (JSON, for example) when creating the cue, and deserializing the data when the cue's `onenter` event is fired. Although this works in practice, `DataCue` avoids the need for the serialization/deserialization steps.
171 | 
172 | `DataCue` is also semantically consistent with timed metadata use cases, whereas `VTTCue` is designed for subtitles and captions. `VTTCue` contains a lot of API surface related to caption layout and presentation, which is not relevant to timed metadata cues.
173 | 
174 | ## Event synchronization
175 | 
176 | The Media Timed Events Task Force of the Media and Entertainment Interest Group has also [identified requirements for synchronization accuracy of event triggering](https://w3c.github.io/me-media-timed-events/#synchronization), which suggest changes to the [time marches on](https://html.spec.whatwg.org/multipage/media.html#time-marches-on) steps in HTML. These will be followed up separately from this `DataCue` proposal.
177 | 
178 | ## References
179 | 
180 | This explainer is based on content from a [Note](https://w3c.github.io/me-media-timed-events/) written by the W3C Media and Entertainment Interest Group, and from a number of associated discussions, including the [TPAC breakout session on video metadata cues](https://github.com/w3c/strategy/issues/113#issuecomment-432971265).
It is also closely related to the DASH-IF [DASH Player's Application Events and Timed Metadata Processing Models and APIs](https://dashif-documents.azurewebsites.net/Events/master/event.html) document. 181 | 182 | ## Acknowledgements 183 | 184 | Thanks to Eric Carlson, François Daoust, Charles Lo, Nigel Megitt, Jon Piesing, Rob Smith, and Mark Vickers for their contribution and input to this document. 185 | -------------------------------------------------------------------------------- /inband-events-using-datacue.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/WICG/datacue/dac8f4ece6f84aaf61f2dc6257d105319066a324/inband-events-using-datacue.png -------------------------------------------------------------------------------- /inband-events-using-vttcue.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/WICG/datacue/dac8f4ece6f84aaf61f2dc6257d105319066a324/inband-events-using-vttcue.png -------------------------------------------------------------------------------- /index.bs: -------------------------------------------------------------------------------- 1 |
 2 | Title: DataCue API
 3 | Shortname: datacue
 4 | Level: 1
 5 | Status: CG-DRAFT
 6 | ED: https://wicg.github.io/datacue/
 7 | Group: WICG
 8 | Repository: WICG/datacue
 9 | Editor: Chris Needham, BBC https://www.bbc.co.uk, chris.needham@bbc.co.uk
10 | Abstract: This document describes an API that allows web pages to associate
11 |   arbitrary timed data with audio or video media resources, and for exposing
12 |   timed data from media resources to web pages.
13 | !Participate: Git Repository.
14 | !Participate: File an issue.
15 | !Version History: https://github.com/WICG/datacue/commits
16 | 
17 | 18 | # Introduction # {#introduction} 19 | 20 | *This section is non-normative* 21 | 22 | Media resources often contain one or more media-resource-specific tracks 23 | containing data that browsers don't render, but want to expose to script to 24 | allow being dealt with. 25 | 26 | TODO: ... 27 | 28 | # Security and privacy considerations # {#security-and-privacy} 29 | 30 | *This section is non-normative.* 31 | 32 | TODO: ... 33 | 34 | # API # {#api} 35 | 36 | ## The DataCue interface ## {#datacue-interface} 37 | 38 | 39 | [Exposed=Window] 40 | interface DataCue : TextTrackCue { 41 | constructor(double startTime, unrestricted double endTime, 42 | any value, optional DOMString type); 43 | attribute any value; 44 | readonly attribute DOMString type; 45 | }; 46 | 47 | 48 | # In-band event mappings # {#in-band-event-mappings} 49 | 50 | The following sections describe how various in-band message formats are mapped to the {{DataCue}} API. 51 | 52 | ## MPEG-DASH emsg ## {#mpeg-dash-emsg} 53 | 54 | The emsg data structure is defined in section 5.10.3.3 of [[!MPEGDASH]]. Use of emsg within CMAF media is defined in section 7.4.5 of [[!MPEGCMAF]]. 55 | 56 | There are two versions in use, version 0 and 1: 57 | 58 |
59 | aligned(8) class DASHEventMessageBox extends FullBox ('emsg', version, flags = 0) {
60 |   if (version == 0) {
61 |     string scheme_id_uri;
62 |     string value;
63 |     unsigned int(32) timescale_v0;
64 |     unsigned int(32) presentation_time_delta;
65 |     unsigned int(32) event_duration;
66 |     unsigned int(32) id;
67 |   } else if (version == 1) {
68 |     unsigned int(32) timescale_v1;
69 |     unsigned int(64) presentation_time;
70 |     unsigned int(32) event_duration;
71 |     unsigned int(32) id;
72 |     string scheme_id_uri;
73 |     string value;
74 |   }
75 |   unsigned int(8) message_data[];
76 | }
77 | 
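The mapping from these fields to {{DataCue}} attributes is described in the emsg processing model document in this repository. As an informative sketch, assuming the box fields have already been parsed into a plain object (the `parsedEmsg` and `segmentStartTime` names below are illustrative only, not defined by this specification), the cue timing and payload could be derived as follows:

```javascript
// Informative sketch: derive DataCue fields from an already-parsed emsg box.
// Field names follow the DASHEventMessageBox definition above.
function emsgToDataCue(parsedEmsg, segmentStartTime) {
  // Version 0 carries a delta from the segment start; version 1 carries
  // an absolute presentation time.
  const startTime = parsedEmsg.version === 0
      ? segmentStartTime + parsedEmsg.presentation_time_delta / parsedEmsg.timescale
      : parsedEmsg.presentation_time / parsedEmsg.timescale;

  // An event_duration of 0xFFFFFFFF indicates an unknown duration.
  const endTime = parsedEmsg.event_duration === 0xFFFFFFFF
      ? Infinity
      : startTime + parsedEmsg.event_duration / parsedEmsg.timescale;

  return new DataCue(startTime, endTime,
      { data: parsedEmsg.message_data, emsgValue: parsedEmsg.value },
      parsedEmsg.scheme_id_uri);
}
```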
78 | 79 | # Examples # {#examples} 80 | 81 | ## Application-generated DataCues ## {#app-generated-datacue-example} 82 | 83 | TODO: ... 84 | 85 | ## In-band MPEG-DASH emsg events ## {#in-band-emsg-example} 86 | 87 | TODO: ... 88 | 89 | # Acknowledgements # {#acknowledgements} 90 | 91 | TODO: ... 92 | 93 | -------------------------------------------------------------------------------- /index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | DataCue API 5 | 6 | 7 | 8 | 9 | 10 | 72 | 181 | 210 | 251 | 261 | 313 | 376 | 566 | 567 |
568 |

569 |

DataCue API

570 |

Draft Community Group Report,

571 |
572 |
573 |
This version: 574 |
https://wicg.github.io/datacue/ 575 |
Issue Tracking: 576 |
GitHub 577 |
Editor: 578 |
(BBC) 579 |
Participate: 580 |
Git Repository. 581 |
File an issue. 582 |
Version History: 583 |
https://github.com/WICG/datacue/commits 584 |
585 |
586 |
587 | 589 |
590 |
591 |
592 |

Abstract

593 |

This document describes an API that allows web pages to associate 594 | 595 | arbitrary timed data with audio or video media resources, and for exposing 596 | timed data from media resources to web pages.

597 |
598 |
599 |

Status of this document

600 |
601 |

This specification was published by the Web Platform Incubator Community Group. 602 | It is not a W3C Standard nor is it on the W3C Standards Track. 603 | 604 | Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. 605 | 606 | Learn more about W3C Community and Business Groups.

607 |

608 |
609 |
610 | 652 |
653 |

1. Introduction

654 |

*This section is non-normative*

655 |

Media resources often contain one or more media-resource-specific tracks 656 | containing data that browsers don’t render, but want to expose to script to 657 | allow being dealt with.

658 |

TODO: ...

659 |

2. Security and privacy considerations

660 |

*This section is non-normative.*

661 |

TODO: ...

662 |

3. API

663 |

3.1. The DataCue interface

664 |
[Exposed=Window]
665 | interface DataCue : TextTrackCue {
666 |   constructor(double startTime, unrestricted double endTime,
667 |               any value, optional DOMString type);
668 |   attribute any value;
669 |   readonly attribute DOMString type;
670 | };
671 | 
672 |

4. In-band event mappings

673 |

The following sections describe how various in-band message formats are mapped to the DataCue API.

674 |

4.1. MPEG-DASH emsg

675 |

The emsg data structure is defined in section 5.10.3.3 of [MPEGDASH]. Use of emsg within CMAF media is defined in section 7.4.5 of [MPEGCMAF].

676 |

There are two versions in use, version 0 and 1:

677 |
aligned(8) class DASHEventMessageBox extends FullBox ('emsg', version, flags = 0) {
678 |   if (version == 0) {
679 |     string scheme_id_uri;
680 |     string value;
681 |     unsigned int(32) timescale_v0;
682 |     unsigned int(32) presentation_time_delta;
683 |     unsigned int(32) event_duration;
684 |     unsigned int(32) id;
685 |   } else if (version == 1) {
686 |     unsigned int(32) timescale_v1;
687 |     unsigned int(64) presentation_time;
688 |     unsigned int(32) event_duration;
689 |     unsigned int(32) id;
690 |     string scheme_id_uri;
691 |     string value;
692 |   }
693 |   unsigned int(8) message_data[];
694 | }
695 | 
696 |

5. Examples

697 |

5.1. Application-generated DataCues

698 |

TODO: ...

699 |

5.2. In-band MPEG-DASH emsg events

700 |

TODO: ...

701 |

6. Acknowledgements

702 |

TODO: ...

703 |
704 |
705 |

Conformance

706 |

Document conventions

707 |

Conformance requirements are expressed 708 | with a combination of descriptive assertions 709 | and RFC 2119 terminology. 710 | The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” 711 | in the normative parts of this document 712 | are to be interpreted as described in RFC 2119. 713 | However, for readability, 714 | these words do not appear in all uppercase letters in this specification.

715 |

All of the text of this specification is normative 716 | except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

717 |

Examples in this specification are introduced with the words “for example” 718 | or are set apart from the normative text 719 | with class="example", 720 | like this:

721 |
722 | 723 |

This is an example of an informative example.

724 |
725 |

Informative notes begin with the word “Note” 726 | and are set apart from the normative text 727 | with class="note", 728 | like this:

729 |

Note, this is an informative note.

730 |

Conformant Algorithms

731 |

Requirements phrased in the imperative as part of algorithms 732 | (such as "strip any leading space characters" 733 | or "return false and abort these steps") 734 | are to be interpreted with the meaning of the key word 735 | ("must", "should", "may", etc) 736 | used in introducing the algorithm.

737 |

Conformance requirements phrased as algorithms or specific steps 738 | can be implemented in any manner, 739 | so long as the end result is equivalent. 740 | In particular, the algorithms defined in this specification 741 | are intended to be easy to understand 742 | and are not intended to be performant. 743 | Implementers are encouraged to optimize.

744 |
745 | 746 |

Index

747 |

Terms defined by this specification

748 | 757 | 763 | 769 | 775 | 781 | 787 | 793 |

Terms defined by reference

794 | 810 |

References

811 |

Normative References

812 |
813 |
[HTML] 814 |
Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/ 815 |
[MPEGCMAF] 816 |
Information technology — Multimedia application format (MPEG-A) — Part 19: Common media application format (CMAF) for segmented media. March 2020. Published. URL: https://www.iso.org/standard/79106.html 817 |
[MPEGDASH] 818 |
Information technology — Dynamic adaptive streaming over HTTP (DASH) — Part 1: Media presentation description and segment formats. December 2019. Published. URL: https://www.iso.org/standard/79329.html 819 |
[RFC2119] 820 |
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119 821 |
[WebIDL] 822 |
Boris Zbarsky. Web IDL. 15 December 2016. ED. URL: https://heycam.github.io/webidl/ 823 |
824 |

IDL Index

825 |
[Exposed=Window]
826 | interface DataCue : TextTrackCue {
827 |   constructor(double startTime, unrestricted double endTime,
828 |               any value, optional DOMString type);
829 |   attribute any value;
830 |   readonly attribute DOMString type;
831 | };
832 | 
833 | 
834 | 840 | -------------------------------------------------------------------------------- /requirements.md: -------------------------------------------------------------------------------- 1 | # DataCue API Requirements 2 | 3 | ## Introduction 4 | 5 | There is a need in the media industry for an API to support arbitrary data associated with points in time or periods of time in a continuous media (audio or video) presentation. This data may include: 6 | 7 | * Metadata that describes the content in some way, such as program or chapter titles, geolocation information, often referred to as timed metadata, used to drive an interactive media experience 8 | * Control messages for the media player that are expected to take effect at specific times during media playback, such as ad insertion cues 9 | 10 | This document presents the use cases and technical requirements for an API that supports media-timed metadata and event cues. 11 | 12 | ## Use cases 13 | 14 | ### Dynamic content insertion 15 | 16 | A media content provider wants to allow insertion of content, such as personalised video, local news, or advertisements, into a video media stream that contains the main program content. To achieve this, timed metadata is used to describe the points on the media timeline, known as splice points, where switching playback to inserted content is possible. 17 | 18 | [SCTE 35](https://scte-cms-resource-storage.s3.amazonaws.com/ANSI_SCTE-35-2019a-1582645390859.pdf) defines a data cue format for describing such insertion points. Use of these cues in MPEG-DASH streams is described in [SCTE 214-1](https://scte-cms-resource-storage.s3.amazonaws.com/Standards/ANSI_SCTE%20214-1%202016.pdf), [SCTE 214-2](https://scte-cms-resource-storage.s3.amazonaws.com/Standards/ANSI_SCTE%20214-2%202016.pdf), and [SCTE 214-3](https://scte-cms-resource-storage.s3.amazonaws.com/Standards/ANSI_SCTE%20214-3%202015.pdf). Use in HLS streams is described in SCTE-35 section 12.2. 19 | 20 | ### Media player control messages 21 | 22 | MPEG-DASH defines several control messages for media streaming clients (e.g., libraries such as [dash.js](https://github.com/Dash-Industry-Forum/dash.js/wiki)). Control messages exist for several scenarios, such as: 23 | 24 | * The media player should refresh or update its copy of the manifest document (MPD) 25 | * The media player should make an HTTP request to a given URL for analytics purposes 26 | * The media presentation will end at a time earlier than expected 27 | 28 | These messages may be carried as in-band `emsg` events in the media container files. 29 | 30 | ### Media stream with video and synchronized graphics 31 | 32 | A content provider wants to provide synchronized graphical elements that may be rendered next to or on top of a video. 33 | 34 | For example, in a talk show this could be a banner, shown in the lower third of the video, that displays the name of the guest. In a sports event, the graphics could show the latest lap times or current score, or highlight the location of the current active player. It could even be a full-screen overlay, to blend from one part of the program to another. 35 | 36 | The graphical elements are described in a stream or file containing cues that describe the start and end time of each graphical element, similar to a subtitle stream or file. The web application takes this data as input and renders it on top of the video image according to the cues. 
37 | 38 | The purpose of rendering the graphical elements on the client device, rather than rendering them directly into the video image, is to allow the graphics to be optimized for the device's display parameters, such as aspect ratio and orientation. Another use case is adapting to user preferences, for localization or to improve accessibility. 39 | 40 | This use case requires frame accurate synchronization of the content being rendered over the video. 41 | 42 | ## Limitations of existing solutions 43 | 44 | Today, most media player libraries include support for timed metadata. Support varies between players, with some supporting only HLS timed metadata, e.g., [JWPlayer](https://www.jwplayer.com/html5-video-player/), others having support for DASH `emsg` boxes, such as [DASH.js](https://github.com/Dash-Industry-Forum/dash.js) and some that support both, e.g., [Shaka Player](https://github.com/google/shaka-player/). 45 | [Video.js](https://github.com/videojs/video.js) can be used with [mux.js](https://github.com/videojs/mux.js#metadata) to parse in-band timed metadata and captions. 46 | 47 | ### Processing efficiency 48 | 49 | On resource constrained devices such as smart TVs and streaming sticks, parsing media segments in JavaScript to extract timed metadata or event information leads to a significant performance penalty, which can have an impact on UI rendering updates if this is done on the UI thread. There can also be an impact on the battery life of mobile devices. Given that the media segments will be parsed anyway by the user agent, parsing in JavaScript is an expensive overhead that could be avoided. 50 | 51 | ### Low latency streaming 52 | 53 | Avoiding parsing in JavaScript is important for low latency video streaming applications, where it's important to minimize the time taken to pass media content through to the media element's playback buffer. 54 | 55 | If the proposed Media Source Extensions `appendStream` method (see [GitHub issue](https://github.com/w3c/media-source/issues/14)) is used to deliver media content directly from a Fetch API response to the playback buffer, application level parsing of the timed metadata or `emsg` boxes adds unnecessary delay. 56 | 57 | ## Requirements 58 | 59 | ### Subscribing to receive media timed event cues 60 | 61 | The API should allow web applications to subscribe to receive specific types of media timed event cue. For example, to support MPEG-DASH emsg and MPD events, the cue type is identified by a combination of the `scheme_id_uri` and (optional) `value`. The purpose of this is to make receiving cues of each type opt-in from the application's point of view. The user agent should deliver only those cues to a web application for which the application has subscribed. The API should also allow web applications to unsubscribe from specific cue types. 62 | 63 | ### Out-of-band events 64 | 65 | To be able to handle out-of-band media timed event cues, including MPEG-DASH MPD events, the API should allow web applications to create and add timed data cues to the media timeline, to be triggered by the user agent. The API should allow the web application to provide all necessary parameters to define the cue, including start and end times, cue type identifier, and data payload. The payload should be any data type. 
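As a rough sketch of this requirement, using the `DataCue` constructor proposed in the explainer (the `mpdEvent` object and its field names here are hypothetical, standing in for the output of the application's own MPD parser):

```javascript
// Sketch: add an application-parsed DASH MPD event to the media timeline.
// `mpdEvent` is a hypothetical object produced by the application's MPD parser.
const video = document.getElementById('video');
const textTrack = video.addTextTrack('metadata');

const startTime = mpdEvent.presentationTime / mpdEvent.timescale;
const endTime = mpdEvent.duration
    ? startTime + mpdEvent.duration / mpdEvent.timescale
    : Infinity; // unknown duration: cue remains active to the end of the stream

const cue = new DataCue(startTime, endTime, mpdEvent.messageData, mpdEvent.schemeIdUri);
cue.onenter = () => console.log('MPD event', cue.type, cue.value);
textTrack.addCue(cue);
```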
66 | 67 | ### Event triggering 68 | 69 | For those events that the application has subscribed to receive, the API should: 70 | 71 | * Generate a DOM event when an in-band media timed event cue is parsed from the media container or media stream (DASH-IF _on-receive_ mode). 72 | * Generate DOM events when the current media playback position reaches the start time and the end time of a media timed event cue during playback (DASH-IF _on-start_ mode). This applies equally to cues generated by the user agent when parsed from the media container and cues added by the web application. 73 | 74 | The API should provide guarantees that no media timed event cues can be missed during linear playback of the media. 75 | 76 | ### MPEG-DASH events 77 | 78 | Implementations should support MPEG-DASH `emsg` in-band events and MPD out-of-band events, as part of their support for the MPEG Common Media Application Format (CMAF). 79 | 80 | ### Cues with unbounded duration 81 | 82 | Implementations should support media timed event cues with unknown end time, where the cue is active from its start time to the end of the media stream. 83 | 84 | ### Updating media timed event cues 85 | 86 | The API should allow media timed event cue information to be updated, such as an event's position on the media timeline, and its data payload. Where the media timed event is updated by the user agent, such as for in-band events, we recommend that the API allows the web application to be notified of any changes. 87 | 88 | ### Synchronization 89 | 90 | In order to achieve synchronization accuracy between media playback and web content rendered by a web application, media timed event cue `enter` and `exit` events should be delivered to the web application within 20 milliseconds of their positions on the media timeline. 91 | 92 | Additionally, to allow such synchronization to happen at frame boundaries, we recommend introducing a mechanism that would allow a web application to accurately predict, using the user's wall clock, when the next frame will be rendered (e.g., as done in the Web Audio API). 93 | -------------------------------------------------------------------------------- /text-track-cue-constructor.md: -------------------------------------------------------------------------------- 1 | ## Explainer 2 | 3 | This page explains the motivation for the [proposal to expose TextTrackCue constructor in the web interface](https://github.com/WICG/datacue/issues/35). 4 | 5 | ### TextTrackCue History 6 | 7 | [VTTCue](https://www.w3.org/TR/webvtt1/#the-vttcue-interface) provides timed text support for video files on the web. This API is extended from [TextTrackCue](https://html.spec.whatwg.org/multipage/media.html#texttrackcue) which is widely supported in modern browsers. 8 | 9 | ![TextTrackCue_Support2502](https://github.com/user-attachments/assets/566d1a6c-50c9-4a1a-af55-6c3c9e0ce08e) 10 | [Screenshot from caniuse.com/textrackcue](https://caniuse.com/texttrackcue) 11 | 12 | [DataCue](https://wicg.github.io/datacue/#datacue-interface) was proposed to provide equivalent support for timed metadata and is also extended from TextTrackCue. DataCue was implemented and matured in Apple's WebKit, though that feature was subsequently dropped in accordance with W3C rules because only a single browser implemented this API. 13 | 14 | ### DataCue Design 15 | 16 | [DataCue](https://wicg.github.io/datacue/#datacue-interface) implements a simple interface with `type` and `value` attributes which represent the cue type and cue content respectively. 
Any form of timed metadata can be stored in `value` and this is identified using `type` so that relevant cue content can be accessed quickly and easily. 17 | 18 | ### TextTrackCue Proposal 19 | 20 | [TextTrackCue is an abstract base class](https://developer.mozilla.org/en-US/docs/Web/API/TextTrackCue) for all types of cue, which means that it is designed to be extended. Hence it does not directly specify any attributes related to cue content and expects these attributes to be defined by the programmer in the extended cue class. VTTCue and DataCue are both examples of extended cue classes which inherit the properties of TextTrackCue. 21 | 22 | However, a user-defined cue extension is not currently possible in Javascript. The extended cue's constructor is unable to call the TextTrackCue constructor because this is prohibited by the web interface. 23 | 24 | #### Extended Cue Example 25 | ```` 26 | // define extended cue class 27 | class MyExtendedCue extends TextTrackCue { 28 | myCueContent; // cue content 29 | 30 | // extend constructor 31 | constructor(startTime, endTime, cueContent) { 32 | super(startTime, endTime); // inherit properties from TextTrackCue 33 | console.log('Cue start at ' + this.startTime + ', end at ' + this.endTime); 34 | 35 | this.myCueContent = cueContent; // set cue content 36 | } 37 | } 38 | 39 | // create an extended cue 40 | const cue = new MyExtendedCue(0, 1, {hello: 'extended-cue'}); 41 | ```` 42 | Permitting this `super` call in the web interface would enable custom cue extensions to be written in Javascript and make this widely-implemented feature accessible to the web community. 43 | 44 | ### Comparison With DataCue 45 | 46 | Custom cue extensions are functionally equivalent to the DataCue API design: 47 | * The extended cue class name is equivalent to `DataCue.type`. 48 | * The cue content defined by the extended cue class is equivalent to `DataCue.value`. 49 | 50 | In addition, an extended cue can define class functions which are not explicitly included in the DataCue API design. 51 | 52 | The change required to enable this functionality is very simple and the potential benefit to the web community is significant. As a result, browser implementers are more likely to adopt the proposed change. 53 | 54 | ### Summary 55 | 56 | This proposal yields equivalent functionality to DataCue API and addresses the challenge that caused the previous DataCue feature to be dropped. 57 | 58 | ## Demos 59 | 60 | Example code has been created to test and demonstrate how custom cue extensions can be supported in web browsers if [this proposal](https://github.com/WICG/datacue/issues/35) is accepted. 61 | 62 | ### Custom Cues Demo 63 | 64 | In this demo: 65 | 66 | 1. Two user-defined custom cues are extended from TextTrackCue: 67 | * Countdown cue contains a number; 68 | * Colour cue contains an object with `foreground` and `background` attributes. 69 | 1. A mixture of `CountdownCue` and `ColourCue` cues are created. 70 | 1. Event listeners are added to `enter` and `exit` events for each cue. 71 | 1. A TextTrack of `kind='metadata'` is attached to the `