110 |
111 | Use one of the following categories:
112 |
113 |
114 | C
115 |
116 | (correction)
117 |
118 | A
119 | (addition of feature)
120 |
121 | B
122 | (editorial modification)
123 |
124 |
125 |
126 |
127 |
128 |
129 |
130 |
131 |
132 |
133 |
134 |
135 |
136 |
137 |
138 |
139 |
140 |
141 | Reason for change:
142 |
143 |
144 |
145 |
146 | Improve interoperability between cloud and server side streaming entities. In particular,
147 | between ABR live encoders, origin servers and content delivery networks.
148 |
149 |
150 |
151 |
152 |
153 |
154 |
155 | Summary of change:
156 |
157 |
158 |
159 |
160 | This document specifies protocol interfaces for live ingest/egress of media content.
161 | It can be used between live ABR encoders, streaming origins, packagers and content delivery networks.
162 | It features support for redundant workflows with failover support and timed metadata.
163 |
164 |
165 |
166 |
167 |
168 |
169 |
170 | Consequences if not approved:
171 |
172 |
173 |
174 |
175 | Inconsistent implementations, poor interoperability, and less rich metadata and ad insertion support.
176 |
177 |
178 |
179 |
180 |
181 |
182 |
183 | Sections affected:
184 |
185 |
186 |
187 |
188 | New Independent Document
189 |
190 |
191 |
192 |
193 |
194 |
195 |
196 | Other comments:
197 |
198 |
199 |
200 |
201 | Feedback during community review is welcome.
202 |
203 |
204 |
205 |
206 |
207 |
208 |
209 |
210 |
211 |
212 |
213 |
214 |
215 |
216 |
217 | Disclaimer:
218 |
219 |
220 |
221 |
222 | This document is not yet final. It is provided for public
223 | review until the deadline mentioned below. If you have
224 | comments on the document, please submit comments by one of
225 | the following means:
226 |
Please add a detailed description of the problem and the
235 | comment.
236 |
237 |
238 | Based on the received comments, a final document will be
239 | published by the expected publication date below at the latest,
240 | integrated in a new version of DASH-IF IOP, if the following
241 | additional criteria are fulfilled:
242 |
243 |
- All comments from community review are addressed
- The relevant aspects for the Conformance Software are provided
- Verified IOP test vectors are provided
247 |
248 |
249 |
250 |
251 |
252 |
253 |
254 |
255 | Commenting Deadline:
256 |
257 |
258 |
259 |
260 | July 31st, 2019
261 |
262 |
263 |
264 |
265 |
266 |
267 |
268 | Expected Publication:
269 |
270 |
271 |
272 |
273 | August 31st, 2019
274 |
275 |
276 |
277 |
278 |
279 |
280 |
281 |
282 |
283 |
--------------------------------------------------------------------------------
/DASH-IF-Ingest.bs.md:
--------------------------------------------------------------------------------
1 | # Specification: Live Media Ingest # {#ingestspec}
2 |
3 | ## Abstract ## {#abstract}
4 |
5 | Two closely related protocol interfaces are defined: CMAF Ingest (Interface-1)
6 | based on fragmented MP4 and DASH/HLS Ingest (Interface-2) based on DASH and HLS.
7 | Both interfaces use the HTTP POST (or PUT) method to transmit media objects from
8 | an ingest source to a receiving entity. Smart implementations can implement
9 | and support both at the same time. These interfaces support carriage of
10 | audiovisual media, timed metadata and timed text. Examples of workflows using
11 | these interfaces are provided. In addition, guidelines for synchronization of
12 | multiple ingest sources, redundancy and failover are presented.
13 |
14 | The current version of the protocol is 1.2.
15 |
16 | ## Copyright Notice and Disclaimer ## {#copyrights}
17 |
18 | Review these documents carefully as they describe your rights and restrictions
19 | with respect to this document. Code Components extracted from this document must
20 | include Simplified BSD License text as described in Section 4.e of the Trust
21 | Legal Provisions and are provided without warranty as described in the
22 | Simplified BSD License.
23 |
24 | This is a document made available by DASH-IF. The technology embodied in this
25 | document may involve the use of intellectual property rights, including patents
26 | and patent applications owned or controlled by any of the authors or developers
27 | of this document. No patent license, either implied or express, is granted to
28 | you by this document. DASH-IF has made no search or investigation for such
29 | rights and DASH-IF disclaims any duty to do so. The rights and obligations which
30 | apply to DASH-IF documents, as such rights and obligations are set forth and
31 | defined in the DASH-IF Bylaws and IPR Policy including, but not limited to,
32 | patent and other intellectual property license rights and obligations. A copy of
33 | the DASH-IF Bylaws and IPR Policy can be obtained at http://dashif.org/.
34 |
35 | The material contained herein is provided on an AS IS basis. The authors and
36 | developers of this material and DASH-IF hereby disclaim all other warranties and
37 | conditions, either express, implied or statutory, including, but not limited to,
38 | any (if any) implied warranties, duties or conditions of merchantability, of
39 | fitness for a particular purpose, of accuracy or completeness of responses, of
40 | workmanlike effort, and of lack of negligence. In addition, this document may
41 | include references to documents and/or technologies controlled by third parties.
42 | Those third party documents and technologies may be subject to third party rules
43 | and licensing terms. No intellectual property license, either implied or
44 | express, to any third party material is granted to you by this document or
45 | DASH-IF. DASH-IF makes no warranty whatsoever for such third party material.
46 |
47 | # Introduction # {#introduction}
48 |
49 | The main goal of this specification is to define the interoperability points
50 | between an [=ingest source=] and a [=receiving entity=] that typically reside in
51 | the cloud or network. This specification does not impose any new constraints or
52 | requirements on clients that consume media streams.
53 |
54 | Live media ingest happens between an [=ingest source=] such as a
55 | [=live encoder=] and a [=receiving entity=]. The [=receiving entity=] could be a
56 | media packager, streaming origin or a content delivery network (CDN) or another
57 | cloud media service. The
58 | combination of ingest sources and receiving entities is common in practical
59 | video streaming deployments, where media processing functionality is distributed
60 | between the ingest sources and receiving entities. Nevertheless, in such
61 | deployments, interoperability can sometimes be challenging.
62 | This challenge comes from the fact that
63 | there are multiple levels of interoperability to be considered and vendors may
64 | have a different view of what is expected/preferred as well as how various
65 | technical specifications apply. First, the choice of the data
66 | transmission protocol and the way connections are established and torn down are
67 | important. Handling premature/unexpected disconnects and recovering from
68 | failovers are also critical.
69 |
70 | A second level of interoperability lies with the media container and coded media
71 | formats. MPEG defined several media container formats such as [[!ISOBMFF]] and
72 | [[!MPEG2TS]], which are widely adopted and well supported. However, these are
73 | general purpose formats, targeting several different application areas. To do
74 | so, they provide many different profiles and options. Interoperability
75 | is often achieved through other application standards such as those for
76 | broadcast, storage or streaming. For interoperable live media ingest, this
77 | document provides guidance on how to use [[!ISOBMFF]] and [[!MPEGCMAF]] for
78 | formatting the media content.
79 |
80 | A third level of interoperability lies in the way metadata is inserted in
81 | streams. Live content often needs such metadata to signal opportunities for ad
82 | insertion, program information or other attributes like timed graphics or
83 | general information relating to the broadcast. Examples of such metadata formats
84 | include [[!SCTE35]] markers, which are often found in broadcast streams and
85 | other metadata such as ID3 tags [[!ID3v2]] containing information relating to
86 | the media presentation. In fact, many more types of metadata relating to the
87 | live event might be ingested and passed on to an over-the-top (OTT) streaming
88 | workflow.
89 |
90 | Fourth, for live media, handling the timeline of the presentation consistently
91 | is important. This includes sampling of the media, avoiding timeline
92 | discontinuities and synchronizing timestamps attached by different ingest
93 | sources such as audio and video. In addition, media timeline discontinuities
94 | must be avoided as much as possible during normal operation. Further, when using
95 | redundant ingest sources, the ingested streams must be synchronized in a sample
96 | accurate manner.
97 |
98 | Fifth, in practice multiple ingest sources and receiving entities are often
99 | used. This requires that multiple ingest sources and receiving entities work
100 | together in a redundant workflow to avoid interruptions when some of the
101 | components fail. Well defined failover behavior is important for
102 | interoperability.
103 |
104 | This document provides a specification for establishing these interoperability
105 | points. The approaches are based on known standardized technologies that have
106 | been tested and deployed in several large-scale streaming deployments.
107 |
108 | To address these interoperability points, two different interfaces and their protocol
109 | specifications have been developed. The first interface (CMAF Ingest) mainly
110 | functions as an ingest format to a packager or active media processor, while the
111 | second interface (DASH/HLS Ingest) works mainly to ingest media presentations to
112 | an origin server, cloud storage or CDN. Smart implementations can implement
113 | both interfaces at once. With CMAF being used increasingly by both DASH and HLS in
114 | practice, this would be a preferred implementation option.
115 |
116 | [[#workflows]] provides more background and motivation for the two interfaces.
117 | We further motivate the specification in this document supporting HTTP/1.1
118 | [[!rfc9112]] and [[!ISOBMFF]].
119 |
120 | The document is structured as follows: Section 3 presents the conventions and
121 | terminology used throughout this document. Section 4 presents the use cases and
122 | workflows related to media ingest and the two interfaces. Section 5 lists the
123 | common requirements for both interfaces. Sections 6 and 7 detail Interface-1 and
124 | Interface-2, respectively. Section 8 provides example workflows and Section 9
125 | shows example implementations.
126 |
127 | # Conventions and Terminology # {#conventions}
128 |
129 | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
130 | "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
131 | interpreted as described in BCP 14, RFC 2119 [[RFC2119]].
132 |
133 | The following terminology is used in the rest of this document:
134 |
135 | **ABR**: Adaptive bitrate.
136 |
137 | **CMAF chunk**: [=CMAF media object=] defined in
138 | [[!MPEGCMAF]] clause 7.3.2.3.
139 |
140 | **CMAF fragment**: [=CMAF media object=] defined in
141 | [[!MPEGCMAF]] clause 7.3.2.4.
142 |
143 | **CMAF header**: Defined in [[!MPEGCMAF]] clause 7.3.2.1.
144 |
145 | **CMAF Ingest**: Ingest interface defined in this
146 | specification for push-based [[!MPEGCMAF]].
147 |
148 | **CMAF media object**: Defined in [[!MPEGCMAF]]: a CMAF chunk,
149 | segment, fragment or track.
150 |
151 | **CMAF presentation**: Logical grouping of CMAF tracks
152 | corresponding to a media presentation as defined in [[!MPEGCMAF]] clause 6.
153 |
154 | **CMAFstream**: Byte-stream that follows the CMAF track
155 | structure format defined in [[!MPEGCMAF]] between the ingest source and
156 | receiving entity. Due to error control behavior such as retransmission of
157 | CMAF fragments and headers, a CMAFstream may not fully conform to a CMAF
158 | track file. The receiving entity can filter out retransmitted fragments and
159 | headers and restore a valid CMAF track from the CMAFstream.
160 |
161 | **CMAF track**: [=CMAF media object=] defined in
162 | [[!MPEGCMAF]] clause 7.3.2.2.
163 |
164 | **connection**: A connection set up between two hosts,
165 | typically the media [=ingest source=] and [=receiving entity=].
166 |
167 | **DASH Ingest**: Ingest interface defined in this
168 | specification for push-based DASH.
169 |
170 | **HLS Ingest**: Ingest interface defined in this specification
171 | for push-based HLS.
172 |
173 | **HTTP POST**: HTTP command for sending data from a source to
174 | a destination.
175 |
176 | **HTTP PUT**: HTTP command for sending data from a source to
177 | a destination.
178 |
179 | **ingest source**: A media source ingesting live media content
180 | to a receiving entity. It is typically a [=live encoder=] but not restricted
181 | to this, e.g., it could be a stored media resource.
182 |
183 | **ingest stream**: The stream of media pushed from the ingest
184 | source to the receiving entity.
185 |
186 | **live stream session**: The entire live stream for the ingest
187 | relating to a broadcast event.
188 |
189 | **live encoder**: Entity performing live encoding of a high
190 | quality ingest stream. This can serve as an [=ingest source=].
191 |
192 | **manifest objects**: Objects ingested that represent
193 | streaming manifest, e.g., .mpd in DASH and .m3u8 in HLS.
194 |
195 | **media objects**: Objects ingested that represent the media,
196 | timed text or other non-manifest objects. Typically, these are CMAF
197 | addressable media objects such as CMAF chunks, segments or tracks.
198 |
199 | **media fragment**: Media fragment, combination of
200 | MovieFragmentBox ("moof") and MediaDataBox ("mdat") in ISOBMFF structure.
201 | This could be a CMAF fragment or chunk. A media fragment may include
202 | top-level boxes defined in CMAF fragments such as "emsg", "prft" and "styp".
203 | Used for backward compatibility with fragmented MP4.
204 |
205 | **objects**: [=Manifest objects=] or [=media objects=].
206 |
207 | **OTT**: Over-the-top.
208 |
209 | **POST_URL**: Target URL of a POST command in the HTTP
210 | protocol for posting data from a source to a destination (e.g., /ingest1).
211 | The POST_URL is known by both the ingest source and receiving entity. The
212 | POST_URL is set up by the receiving entity. The ingest source may add extended
213 | paths to signal track names, fragment names or segment names.
214 |
215 | **publishing_point_URL**: Entry point used to receive an
216 | [=ingest stream=] (e.g., https://example.com/ingest1).
217 |
218 | **receiving entity**: Entity used to receive the media
219 | content, receives/consumes an [=ingest stream=].
220 |
221 | **RTP**: Real-time Transport Protocol as specified in
222 | [[!RFC3550]].
223 |
224 | **streaming presentation**: Set of [=objects=] composing a
225 | streaming presentation based on a streaming protocol such as DASH.
226 |
227 | **switching set**: Group of tracks corresponding to a
228 | switching set defined in [[!MPEGCMAF]] or an adaptation set defined in
229 | [[!MPEGDASH]].
230 |
231 | **switching set ID**: Identifier generated by a live ingest
232 | source to group CMAF tracks in a switching set. The switching set ID is
233 | unique for each switching set in a live stream session.
234 |
235 | **TCP**: Transmission Control Protocol (TCP) as specified in
236 | [[!RFC793]].
237 |
238 | **baseMediaDecodeTime**: Decode time of the first sample in a movie fragment as
239 | signaled in the "[=tfdt=]" box.
240 |
241 | **elng**: The ExtendedLanguageTag box ("elng") as defined in
242 | [[!ISOBMFF]] overrides the language information.
243 |
244 | **ftyp**: The FileTypeBox ("ftyp") as defined in
245 | [[!ISOBMFF]].
246 |
247 | **mdat**: The MediaDataBox ("mdat") defined in [[!ISOBMFF]].
248 |
249 | **mdhd**: The MediaHeaderBox ("mdhd") as defined in
250 | [[!ISOBMFF]] contains information about the media such
251 | as timescale, duration, language using ISO 639-2/T [[!iso-639-2]] codes.
252 |
253 | **mfra (deprecated)**: The MovieFragmentRandomAccessBox
254 | ("mfra") defined in [[!ISOBMFF]] signals random access samples
255 | (these are samples that require no prior or other samples for decoding).
256 |
257 | **moof**: The MovieFragmentBox ("moof") as defined in
258 | [[!ISOBMFF]].
259 |
260 | **nmhd**: The NullMediaHeaderBox ("nmhd") as defined in
261 | [[!ISOBMFF]] signals a track for which no specific media header is defined.
262 | This is used for metadata tracks.
263 |
264 | **prft**: The ProducerReferenceTime ("prft") as defined in
265 | [[!ISOBMFF]] supplies times corresponding to the production of associated
266 | movie fragments.
267 |
268 | **tfdt**: The TrackFragmentBaseMediaDecodeTimeBox ("tfdt")
269 | defined in [[!ISOBMFF]] signals the decode time of the first sample in the
270 | movie fragment.
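
The relationship between "tfdt" and baseMediaDecodeTime defined above can be illustrated with a minimal sketch. It assumes the caller has already located the "tfdt" box and passes only its payload (the bytes after the 8-byte size/type header); the function name and the 90 kHz timescale in the example are illustrative assumptions, not part of this specification. Per [[!ISOBMFF]], the decode time is a 32-bit field in version 0 of the box and a 64-bit field in version 1.

```python
import struct


def base_media_decode_time(tfdt_payload: bytes) -> int:
    """Return baseMediaDecodeTime from the payload of a "tfdt" box.

    The payload starts with the FullBox fields: 1 byte version, 3 bytes flags.
    """
    if len(tfdt_payload) < 8:
        raise ValueError("tfdt payload too short")
    version = tfdt_payload[0]
    # Version 1 carries a 64-bit decode time; version 0 carries a 32-bit one.
    fmt = ">Q" if version == 1 else ">I"
    (decode_time,) = struct.unpack_from(fmt, tfdt_payload, 4)
    return decode_time


# Hypothetical payload: version 1, flags 0, decode time of 900000 ticks.
payload = bytes([1, 0, 0, 0]) + (900_000).to_bytes(8, "big")
ticks = base_media_decode_time(payload)
# The timescale would normally be read from "mdhd"; 90000 is an assumed value.
seconds = ticks / 90_000
```

Dividing the tick count by the track timescale from "mdhd" gives the position of the fragment on the media timeline in seconds.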
271 |
272 | # Media Ingest Workflows and Interfaces (Informative) # {#workflows}
273 |
274 | Two workflows have been identified mapping to two protocol interfaces. The first
275 | workflow uses a [=live encoder=] as the [=ingest source=] and a separate
276 | packager as the [=receiving entity=]. In this case, Interface-1
277 | ([=CMAF Ingest=]) is used to ingest a live encoded stream to the packager, which
278 | can perform packaging, encryption or other active media processing. Interface-1
279 | is defined in a way that it will be possible to generate DASH or HLS
280 | presentations based on information in the ingested stream. Figure 1 shows an
281 | example for Interface-1. In many cases a common implementation is possible.
282 |
283 | Figure 1: Example with [=CMAF Ingest=].
284 |
285 |
286 | The second workflow constitutes ingest to a passive delivery system such as a
287 | cloud storage or a CDN. In this case, Interface-2 ([=DASH Ingest=] or
288 | [=HLS Ingest=]) is used to ingest a stream already formatted to be ready for
289 | delivery to an end client. Figure 2 shows an example for Interface-2.
290 |
291 | Figure 2: Example with [=DASH Ingest=].
292 |
293 |
294 | A legacy example of a media ingest protocol for the first workflow is the ingest
295 | part of the Microsoft Smooth Streaming protocol [[=MS-SSTR=]]. Interface-1 ([=CMAF Ingest=],
296 | detailed in [[#interface-1]]) improves on Smooth Streaming's ingest protocol by
297 | incorporating lessons learned in the ten years since the initial deployment of
298 | Smooth Streaming in 2009 and several advances in signaling metadata and timed text.
299 | In addition, it includes support for next-generation media codecs such as [[!MPEGHEVC]]
300 | and protocols like DASH [[!MPEGDASH]] by adding explicit support for the MPEG-DASH Media Presentation Description.
301 |
302 | Interface-2 (DASH/HLS Ingest) is included for ingest of media streaming
303 | presentations to a passive receiving entity that provides a pass-through
304 | functionality. In this case, [=manifest objects=] and other client-specific
305 | information also need to be ingested and updated, and segments may be deleted.
306 |
307 | Combining the two interfaces can be considered in many cases.
308 | An example of this is given at the end of the document in [[#examples]].
309 |
310 | Table 1 highlights some of the key differences and practical considerations of
311 | the interfaces. In Interface-1, the ingest source can be simple since the
312 | [=receiving entity=] can do many of the operations related to the delivery such
313 | as encryption or generating the streaming manifests. In addition, the
314 | distribution of functionalities can make it easier to scale a deployment with
315 | concurrent (redundant) live media sources and receiving entities. Besides these
316 | factors, choosing a workflow for a video streaming platform depends on many
317 | other factors.
318 |
319 | Table 1: Different ingest use cases.
320 |
321 |
Global overview, targets duplicate presentations, limited flexibility, no redundancy
335 |
Manifest manipulation, transmission, storage
336 |
337 |
338 |
339 | Figure 3: Workflow with redundant ingest sources and receiving entities.
340 |
341 |
342 | Finally, Figure 3 highlights another aspect that was taken into consideration
343 | for large-scale systems with many users. Often content owners would like to run
344 | multiple ingest sources, multiple receiving entities and make them available to
345 | the clients in a seamless fashion. This approach is already common when serving
346 | web pages, and this architecture also applies to media streaming over HTTP. In
347 | Figure 3, it is highlighted how one or more ingest sources can be sending data
348 | to one or more receiving entities. In such a workflow, it is important to handle
349 | synchronization and the case in which an ingest source or receiving entity fails.
350 | Both the system and client behavior are important considerations in systems
351 | that need to run 24/7. Failovers must be handled robustly and without causing
352 | service interruption. This specification details how this failover and
353 | redundancy support can be achieved and provides recommendations for dual
354 | encoder synchronization.
355 |
356 | # Common Requirements for Interface-1 and Interface-2 # {#interface-1-2}
357 |
358 | Media ingest adheres to the following common requirements for both interfaces.
359 |
360 | ## Ingest Source Identification ## {#interface-1-2-user-agent}
361 |
362 | - The [=ingest source=] SHOULD include a User-Agent header (which provides
363 | information about brand name, version number and build number in a readable
364 | format) in all allowed HTTP messages. The receiving entity can log the
365 | received information along with other relevant HTTP header data to
366 | facilitate troubleshooting. The version number of the current version is
367 | DASH-IF-Ingest 1.1; thus, the header name may be DASH-IF-Ingest and its value may be 1.1.
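
As an illustration of the identification requirement above, the following minimal sketch builds the request headers an ingest source might send. The function name, the brand, version and build values, and the exact User-Agent layout (product/version followed by a comment) are assumptions made for this example; only the DASH-IF-Ingest header name and the value 1.1 come from the text above.

```python
def ingest_headers(brand: str, version: str, build: str,
                   protocol_version: str = "1.1") -> dict:
    """Build identification headers for an ingest request (illustrative)."""
    return {
        # Readable brand name, version number and build number.
        "User-Agent": f"{brand}/{version} (build {build})",
        # Protocol version header as suggested in the text above.
        "DASH-IF-Ingest": protocol_version,
    }


# Hypothetical encoder identity used only for this example.
headers = ingest_headers("ExampleEncoder", "4.2.0", "1337")
```

A receiving entity could log these two headers alongside the request path to make troubleshooting easier.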
368 |
369 | ## General Requirements ## {#interface-1-2-general}
370 |
371 | 1. The [=ingest source=] SHALL communicate using the [=HTTP POST=] or [=HTTP PUT=] as
372 | defined in the HTTP protocol, version 1.1 [[!rfc9112]].
373 |
374 | NOTE: This specification does not imply any functional differentiation
375 | between a POST and PUT command. Either may be used to transfer content to
376 | the [=receiving entity=]. Unless indicated otherwise, the use of the term
377 | POST can be interpreted as POST or PUT.
378 |
379 | 2. The [=ingest source=] SHOULD use HTTP over TLS. If TLS is used, it SHALL
380 | support at least TLS version 1.2; higher versions may also be supported
381 | [[!rfc9110]].
382 | 3. The [=ingest source=] SHOULD use a domain name system for resolving
383 | hostnames to IP addresses such as DNS [[!RFC1035]] or any other system
384 | that is in place. If this is not the case, the domain name<->IP address
385 | mapping(s) MUST be known and static.
386 | 4. When a domain name system is used as described in item 3, the [=ingest source=]
387 | MUST refresh the hostname to IP address resolution, respecting the TTL (time-to-live) from DNS query responses.
388 | This enables better resilience to IP address changes in large-scale
389 | deployments where the IP address of the media processing entities may
390 | change frequently.
391 | 5. In case HTTP over TLS [[!rfc9110]] is used, at least one of HTTP Basic
392 | authentication [[!RFC7617]], TLS client certificates or HTTP
393 | Digest authentication [[!RFC7616]] MUST be supported.
394 | 6. Mutual authentication SHALL be supported. TLS client certificates SHALL
395 | chain to a trusted CA or be self-signed. Self-signed certificates MAY be
396 | used, for example, when the ingest source and receiving entity fall under
397 | the same administration.
398 | 7. As the compatibility profile for TLS encryption, the [=ingest source=]
399 | SHOULD support Mozilla's intermediate compatibility profile
400 | [[=Mozilla-TLS=]].
401 | 8. In case of an authentication error confirmed by an HTTP 403 response, the
402 | ingest source SHALL retry to establish the [=connection=] within a fixed
403 | time period with updated authentication credentials. When that also
404 | results in error, the [=ingest source=] can retry N times, after which the
405 | [=ingest source=] SHOULD stop and log an error. The number of retries N
406 | can be configurable in the [=ingest source=].
407 | 9. The [=ingest source=] SHOULD terminate the [=HTTP POST=] or [=HTTP PUT=] request if data
408 | is not being sent at a rate commensurate with the MP4 fragment duration.
409 | An [=HTTP POST=] or [=HTTP PUT=] command that does not send data can prevent the
410 | [=receiving entity=] from quickly disconnecting from the
411 | [=ingest source=] in the event of a service update.
412 | 10. The HTTP request for sparse data SHOULD be short-lived, terminating as soon
413 | as the data of a fragment is sent.
414 | 11. The HTTP request uses the [=publishing_point_URL=] at the
415 | [=receiving entity=] and SHOULD use an additional relative path when
416 | posting different streams and fragments, for example, to signal the
417 | stream or fragment name.
418 | 12. Both the [=ingest source=] and [=receiving entity=] MUST support IPv4 and
419 | IPv6 transport.
420 | 13. The [=ingest source=] and [=receiving entity=] SHOULD support gzip based
421 | content encoding.
422 | 14. The response from the [=receiving entity=] may, in addition to the response code,
423 | return information in the response body, such as the transfer time or
424 | size of the last HTTP request, especially if this request used HTTP chunked
425 | transfer mode. No specific response format is defined at this time, but one
426 | may be considered in future revisions.
427 | NOTE: More specific response body formatting may be defined in future revisions;
428 | input from implementors is welcome.
429 | 15. The ingest source MUST support the configuration and use of Fully
430 | Qualified Domain Names (per RFC 8499) to identify the receiving entity.
431 | 16. The ingest source MUST support the configuration of the path to which it
432 | will POST all the objects.
433 | 17. The ingest source SHOULD support the configuration of the delivery path
434 | that the receiving entity will use to retrieve the content. When provided,
435 | the ingest source MUST use this path to build absolute URLs in the
436 | manifest files it generates. When absent, use of relative paths is assumed
437 | and the ingest source MUST build the manifest files accordingly.
438 | 18. The ingest source MUST transfer [=media objects=] and
439 | [=manifest objects=] to the receiving entity via individual HTTP/1.1 POST
440 | commands to the configured path.
441 | 19. To avoid delay associated with the TCP handshake, the ingest source SHOULD
442 | use persistent TCP connections.
443 | 20. To avoid head of line blocking, the ingest source SHOULD use multiple
444 | parallel TCP connections to transfer the streaming presentation that it is
445 | generating. For example, the ingest source SHOULD POST each representation
446 | (e.g., CMAF track) in a media presentation over a different TCP
447 | connection.
448 | 21. The ingest source SHOULD use the chunked transfer encoding option for the
449 | HTTP requests when the content length of the request is unknown at the
450 | start of transmission or to support the low-latency use cases.
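
Several of the requirements above (a relative path appended to the publishing point, persistent connections, and chunked transfer encoding) can be sketched together using Python's standard `http.client` module. This is a minimal sketch under stated assumptions, not a reference implementation: the function names, the host `example.com`, and the track and segment names are hypothetical. When `http.client` is given an iterable body without a Content-Length header, it sends the request with `Transfer-Encoding: chunked`.

```python
import http.client
from typing import Iterable, Optional
from urllib.parse import quote, urlsplit


def object_path(publishing_point_url: str, *names: str) -> str:
    """Append percent-encoded stream/fragment names to the publishing point path."""
    base = urlsplit(publishing_point_url).path.rstrip("/")
    return base + "".join("/" + quote(name, safe="") for name in names)


def post_chunked(conn: http.client.HTTPConnection, path: str,
                 chunks: Iterable[bytes],
                 headers: Optional[dict] = None) -> int:
    """POST an object of unknown length; an iterable body is sent chunked."""
    conn.request("POST", path, body=chunks, headers=headers or {})
    response = conn.getresponse()
    response.read()  # drain the body so the persistent connection can be reused
    return response.status


# Hypothetical track and segment names appended to the publishing point.
path = object_path("https://example.com/ingest1", "video-1", "segment 5.cmfv")

# Usage (not executed here because it needs a live receiving entity):
#   conn = http.client.HTTPSConnection("example.com", timeout=6)
#   status = post_chunked(conn, path, [b"<moof bytes>", b"<mdat bytes>"])
```

Reusing the same connection object for consecutive objects of one track keeps the TCP connection open between requests; one connection object per track corresponds to the parallel-connection recommendation.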
451 |
452 | ## Failure Behaviors ## {#interface-1-2-failure}
453 |
454 | 1. The [=ingest source=] SHOULD use a timeout in the order of a segment
455 | duration (e.g., 1-6 seconds) for establishing the TCP connection. If an
456 | attempt to establish the connection takes longer than the timeout, the
457 | ingest source aborts the operation and tries again.
458 | 2. The [=ingest source=] SHOULD resend the [=objects=] for which a connection
459 | was terminated early or when an HTTP 400 or 403 error response was
460 | received if the connection was down for less than three average segment
461 | durations. For connections that were down longer, the [=ingest source=]
462 | can resume sending [=objects=] at the live edge of the media presentation.
463 | 3. After a TCP error, the [=ingest source=] performs the following:
464 |
465 | 3a. The current connection MUST be closed and a new connection MUST be
466 | created for a new [=HTTP POST=] or [=HTTP PUT=] request.
467 |
468 | 3b. The new HTTP [=POST_URL=] MUST be the same as the initial
469 | [=POST_URL=] for the object to be ingested.
470 |
471 | 4. In case the [=receiving entity=] cannot process the HTTP request due to
472 | authentication or permission problems, or incorrect path, it SHALL return
473 | an HTTP 403 Forbidden error.
474 | 5. The following error conditions apply to the receiving entity:
475 |
476 | 5a. If the [=publishing_point_URL=] receiving the HTTP request is not
477 | available, it SHOULD return an HTTP 404 Not Found error to the
478 | [=ingest source=].
479 |
480 | 5b. If the receiving entity can process a fragment in the HTTP request
481 | body but finds the media type is not supported, it may return an HTTP 415
482 | Unsupported Media Type error.
483 |
484 | 5c. If the receiving entity cannot process a fragment in the POST request
485 | body due to missing or incorrect initialization fragment, it may return an
486 | HTTP 412 Precondition Failed error.
487 |
488 | 5d. If there is an error at the receiving entity not particularly relating
489 | to the request from the [=ingest source=], it may return an
490 | appropriate HTTP 5xx error.
491 |
492 | 5e. In all other scenarios, the receiving entity MUST return an HTTP 400
493 | Bad Request error.
494 |
495 | 6. The [=ingest source=] SHOULD support the handling of HTTP 30x redirect
496 | responses from the receiving entity.
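
The failure behaviors above can be summarized in a short sketch. It covers the status code selection on the receiving entity and the resend-or-resume decision on the ingest source. All names are hypothetical, the sketch always returns the optional 415 and 412 codes even though the text only says they may be returned, and the choice of 500 as the "appropriate HTTP 5xx error" is an assumption.

```python
from enum import Enum, auto


class Problem(Enum):
    AUTH_OR_PATH = auto()                  # item 4
    PUBLISHING_POINT_UNAVAILABLE = auto()  # item 5a
    UNSUPPORTED_MEDIA_TYPE = auto()        # item 5b
    BAD_INIT_FRAGMENT = auto()             # item 5c
    SERVER_FAULT = auto()                  # item 5d
    OTHER = auto()                         # item 5e


_STATUS = {
    Problem.AUTH_OR_PATH: 403,
    Problem.PUBLISHING_POINT_UNAVAILABLE: 404,
    Problem.UNSUPPORTED_MEDIA_TYPE: 415,
    Problem.BAD_INIT_FRAGMENT: 412,
    Problem.SERVER_FAULT: 500,  # assumed choice among the 5xx codes
}


def error_status(problem: Problem) -> int:
    """Receiving entity: map a detected problem to an HTTP status code."""
    return _STATUS.get(problem, 400)  # everything else is 400 Bad Request


def next_action(downtime_s: float, avg_segment_s: float) -> str:
    """Ingest source: resend lost objects after a short outage, otherwise
    resume at the live edge (item 2)."""
    if downtime_s < 3 * avg_segment_s:
        return "resend"
    return "resume_at_live_edge"
```

For example, with 2-second segments an outage of 4 seconds leads to a resend, while an outage of 6 seconds or more leads to resuming at the live edge.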
497 |
498 | ## Identifier ## {#interface-1-2-identifier}
499 |
500 | The interfaces described in this document (clauses [[#interface-1]] and [[#interface-2]]) are identified with the following identifier:
501 |
502 |
503 |
504 |
| Identifier | Reference | Sections | Comments |
|---|---|---|---|
| http://dashif.org/ingest/v1.2 | http://dashif.org/ingest/v1.2 | Clause [[#interface-1]] and [[#interface-2]] | Conforming to the requirements of this document |
514 |
515 |
516 |
517 | The above identifier may be used by an entity to signal the support of interfaces defined in clause [[#interface-1]] and [[#interface-2]].
518 |
519 |
520 | # Interface-1: CMAF Ingest # {#interface-1}
521 |
522 | This section describes the protocol behavior specific to Interface-1. Operation
523 | of this interface MUST also adhere to the common requirements given in [[#interface-1-2]].
524 |
525 | ## General Considerations (Informative) ## {#interface-1-general}
526 |
527 | The media format conforms to the track constraints specified
528 | in [[!MPEGCMAF]] clause 7. Note that no CMAF media profile is
529 | needed by this specification unless stated otherwise; only the structural format
530 | based on [[!MPEGCMAF]] clause 7 is used. Supporting CMAF media profiles is optional.
531 |
532 | [=CMAF Ingest=] can also be used for simple transport of
533 | media to an archive, as the combination of CMAF header and CMAF fragments will
534 | result in a valid archived CMAF track file when an ingest is stored on disk by
535 | the receiving entity.
536 |
537 | [=CMAF Ingest=] improves over Smooth Streaming's ingest
538 | protocol [[=MS-SSTR=]] by only using standardized media container formats and
539 | boxes based on [[!ISOBMFF]] and [[!MPEGCMAF]] instead of specific UUID boxes.
540 |
541 | Many new technologies like MPEG HEVC, AV1, HDR have CMAF bindings. Using CMAF
542 | will make it easier to adopt such technologies.
543 |
544 | Some discussions on the early development of the specification have been documented in [[=fmp4git=]].
545 |
546 | Figure 4: CMAF Ingest with multiple ingest sources.
547 |
548 |
549 | Figures 5-7 detail some of the concepts and structures defined in
550 | [[!MPEGCMAF]]. Figure 5 shows the data format structure of the [=CMAF track=].
551 | In this format, media samples and media indexes are interleaved. The
552 | MovieFragmentBox ("[=moof=]") as specified in [[!ISOBMFF]] is used to signal
553 | the information needed to play back and decode the samples stored in the
554 | "[=mdat=]" box. The track header contains the track-specific information and is
555 | referred to as a [=CMAF header=] in [[!MPEGCMAF]]. The combination of
556 | "[=moof=]" and "[=mdat=]" can be referred to as a [=CMAF fragment=] or
557 | [=CMAF chunk=] depending on the structure content and the number of moof-mdat
558 | pairs in the addressable object.
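
As an informative illustration of this structure, the following Python sketch walks the top-level boxes of a byte stream and counts the moof-mdat pairs that follow the header. The helper names (`iter_boxes`, `make_box`, `summarize_track`) and the toy stream are assumptions made for this example and are not defined by [[!MPEGCMAF]] or [[!ISOBMFF]]; only the generic box header layout (32-bit size, four-character type, optional 64-bit largesize) is taken from [[!ISOBMFF]].

```python
import struct


def iter_boxes(data: bytes):
    """Yield (type, payload) for each top-level ISOBMFF box in data."""
    offset = 0
    while offset < len(data):
        if offset + 8 > len(data):
            raise ValueError("truncated box header")
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:  # a 64-bit largesize follows the type field
            if offset + 16 > len(data):
                raise ValueError("truncated largesize")
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:  # box runs to the end of the data
            size = len(data) - offset
        if size < header or offset + size > len(data):
            raise ValueError("malformed or truncated box")
        yield box_type.decode("latin-1"), data[offset + header:offset + size]
        offset += size


def make_box(box_type: str, payload: bytes = b"") -> bytes:
    """Build a box with a 32-bit size field (toy helper for this example)."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("latin-1")) + payload


def summarize_track(data: bytes):
    """Return (header box types, number of moof-mdat pairs)."""
    header_types, pairs, pending_moof = [], 0, False
    for box_type, _payload in iter_boxes(data):
        if box_type in ("ftyp", "moov"):
            header_types.append(box_type)
        elif box_type == "moof":
            pending_moof = True
        elif box_type == "mdat" and pending_moof:
            pairs += 1
            pending_moof = False
    return header_types, pairs


# Toy stream: the box payloads are placeholders, not valid CMAF content.
stream = (
    make_box("ftyp", b"cmfc") + make_box("moov")
    + make_box("moof") + make_box("mdat", b"sample-1")
    + make_box("moof") + make_box("mdat", b"sample-2")
)
print(summarize_track(stream))
```

A receiving entity would additionally inspect the box contents (for example "mfhd" and "tfdt" inside "moof"); the sketch only shows the interleaving of header, index and sample data.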
559 |
560 | Figure 5: CMAF track stream.
561 |
562 |
563 | Figure 6 illustrates the presentation timing model, defined in [[!MPEGCMAF]]
564 | clause 6.6. Different bit-rate tracks and/or media streams are conveyed in
565 | separate CMAF tracks. By having fragment boundaries time aligned for tracks and
566 | applying constraints on tracks, seamless switching can be achieved. By using a
567 | common timeline different streams can be synchronized at the receiver, while
568 | they are in a separate [=CMAF track=], sent over a separate connection, possibly
569 | from a different [=ingest source=].
570 |
571 | For more information on the synchronization model, we refer the readers to
572 | Section 6 of [[!MPEGCMAF]]. For synchronization of tracks coming from different
573 | encoders, sample-time accuracy is required, i.e., samples with identical
574 | timestamps contain identical content.
575 |
576 | In Figure 7, another advantage of this synchronization model is illustrated,
577 | which is the concept of late binding. In the case of late binding, streams are
578 | combined on playout/streaming in a presentation (see Section 7.3.6 of
579 | [[!MPEGCMAF]]).
580 |
581 | NOTE: As defined in [[!MPEGCMAF]], different CMAF tracks have the same starting
582 | time sharing an implicit timeline. A stream becoming available from a different
583 | source needs to be synchronized and time-aligned with other streams.
584 |
585 | Figure 6: CMAF track synchronization.
586 |
587 |
588 | Figure 7: CMAF late binding.
589 |
590 |
591 | Figure 8 shows the flow diagram of the protocol. It starts with a DNS resolution
592 | (if needed) and an authentication step (using two-factor authentication, TLS
593 | certificates or HTTP Digest Authentication) to establish a secure [=TCP=]
594 | connection.
595 |
596 | In private datacenter deployments where nodes are not reachable from outside, a
597 | non-authenticated connection may also be used. The ingest source then issues an
598 | [=HTTP POST=] or [=HTTP PUT=] request to test that the [=receiving entity=] is listening.
599 | This request includes the [=CMAF header=] or may be empty. If the test is successful,
600 | it is followed by the CMAF header and fragments composing the [=CMAFstream=]. At
601 | the end of the session, the source may send an empty [=mfra (deprecated)=] box
602 | or a segment with the *lmsg* brand. Then, the
603 | [=ingest source=] can follow up by closing the TCP connection using a TCP FIN
604 | packet.
605 |
606 | NOTE: If the HTTP POST is using the chunked transfer encoding option, the
607 | [=ingest source=] sends a zero-length terminating chunk per [[!rfc9112]] after
608 | sending the *lmsg* brand letting the [=receiving entity=] know that the POST
609 | command has been concluded.
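
The chunked framing mentioned in the note can be sketched as follows. This is an informative illustration of the HTTP/1.1 chunked transfer coding in [[!rfc9112]] (hexadecimal size, CRLF, data, CRLF, then a zero-length last chunk); the function names and the placeholder payloads are assumptions made for this example, and a real ingest source would normally rely on its HTTP library for this framing.

```python
def encode_chunk(data: bytes) -> bytes:
    """Frame one non-empty chunk: hex size, CRLF, data, CRLF."""
    if not data:
        raise ValueError("an empty chunk would terminate the body; use LAST_CHUNK")
    return b"%x\r\n" % len(data) + data + b"\r\n"


# Zero-length terminating chunk with no trailer fields.
LAST_CHUNK = b"0\r\n\r\n"


def chunked_body(parts) -> bytes:
    """Concatenate the framed chunks and append the terminating chunk."""
    return b"".join(encode_chunk(part) for part in parts) + LAST_CHUNK


# Placeholder payloads standing in for the bytes of the final segment.
body = chunked_body([b"<styp with lmsg brand + moof>", b"<mdat>"])
print(body)
```

The terminating chunk is what lets the [=receiving entity=] conclude the request body without a Content-Length header.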
610 |
611 | Figure 8: CMAF Ingest flow.
612 |
613 |
614 | ## General Protocol, Manifest and Track Format Requirements ## {#interface-1-requirements}
615 |
616 | The ingest source transmits media content to the receiving entity using HTTP
617 | POST or PUT. The receiving entity listens for content at the [=publishing_point_URL=]
618 | that is known by both the ingest source and receiving entity. The [=POST_URL=]
619 | may contain an extended path, added by the ingest source, that identifies the
620 | stream name, switching set or fragment. It is assumed that the ingest source
621 | can retrieve these paths and use them.
622 |
623 | In Interface-1, the container format is based on CMAF, conforming to the track
624 | constraints specified in [[!MPEGCMAF]] clause 7. Unless stated otherwise, no
625 | conformance to a specific CMAF media profile is REQUIRED.
626 |
627 | 1. The ingest source SHALL start with an [=HTTP POST=] or [=HTTP PUT=] request with the CMAF
628 | header, or an empty request, to the POST_URL. This can help the ingest
629 | source quickly detect whether the [=publishing_point_URL=] is valid, and
630 | if there are any authentication or other conditions required.
631 | 2. The ingest source MUST initiate a media ingest connection by posting at
632 | least one CMAF header after step 1 for each track. Before doing so,
633 | it SHOULD post a DASH manifest with the file name extension .mpd
634 | to the [=publishing_point_URL=], with no relative path other than
635 | the manifest file name, and conforming to clause 16 of this section.
636 | If no manifest is posted, the grouping of the CMAF tracks
637 | is trivial and the Streams() keyword is used to identify CMAF tracks.
638 | 3. The ingest source SHALL transmit one or more CMAF segments composing the
639 | track to the receiving entity once they become available. In this case, a
640 | single HTTP POST or PUT request message body MUST contain one CMAF segment.
641 | 4. The ingest source MAY use the chunked transfer encoding option of the HTTP
642 | POST command [[!rfc9112]] when the content length is unknown at the start
643 | of transmission or to support use cases that require low latency.
644 | 5. If the HTTP request terminates or times out with a TCP error, the
645 | ingest source MUST establish a new connection and follow the preceding
646 | requirements. Additionally, the ingest source MAY resend the segment in
647 | which the timeout or TCP error occurred.
648 | 6. The ingest source MUST handle any error responses received from the
649 | receiving entity, as described in general requirements, and by
650 | retransmitting the [=CMAF header=].
651 | 7. *(deprecated)* In case the [=live stream session=] is over, the ingest
652 | source MAY signal the stop by transmitting an empty [=mfra (deprecated)=]
653 | box towards the receiving entity. After that, it SHALL send an empty HTTP
654 | chunk and wait for the HTTP response before closing the TCP connection.
655 | 8. The ingest source SHOULD use a separate parallel TCP connection for ingest
656 | of each different CMAF track.
657 | 9. The ingest source MAY use a separate relative path in the [=POST_URL=] for
658 | ingesting each different track or track segment by appending it to the
659 | [=POST_URL=]. This makes it easy to detect redundant streams from
660 | different ingest sources. Specific naming convention of the segments and
661 | paths can be derived from the MPEG-DASH manifest, SegmentTemplate@media and
662 | @initialization. If not, the Streams(stream_name) keyword (deprecated)
663 | shall be used to signal the name of the CMAF track representation.
664 | 10. The [=baseMediaDecodeTime=] timestamps in "tfdt" of fragments in the
665 | [=CMAFstream=] SHOULD arrive in increasing order for each of the
666 | fragments in the different tracks/streams that are ingested.
667 | 11. The fragment sequence numbers in the [=CMAFstream=] signaled in the
668 | "mfhd" box SHOULD arrive in increasing order for each of the different
669 | tracks/streams that are ingested. Using both [=baseMediaDecodeTime=] and
670 | sequence number based indexing helps the receiving entities identify
671 | discontinuities. In this case sequence numbers SHOULD increase by one.
672 | 12. The average and maximum bitrate of each track SHOULD be signaled in the
673 | "btrt" box in the sample entry of the CMAF header. These can be used to
674 | signal the bitrate later on, such as in the manifest.
675 | 13. In case a track is part of a [=switching set=], all properties in
676 | Sections 6.4 and 7.3.4 of [[!MPEGCMAF]] MUST be satisfied, enabling the
677 | receiver to group the tracks in the respective switching sets.
678 | 14. Ingested tracks MUST conform to CMAF track structure defined in
679 | [[!MPEGCMAF]]. Additional constraints on the CMAF track structure are
680 | defined in later sections for specific media types.
681 | 15. CMAF tracks MAY use SegmentTypeBox to signal brands like chunk, fragment
682 | or segment. Such signaling may also be inserted in a later stage by the
683 | receiving entity.
684 | 16. The MPEG-DASH manifest shall use SegmentTemplate in each AdaptationSet
685 | (or in each contained Representation).
686 | - a. The SegmentTemplate@initialization in the MPEG-DASH manifest
687 | shall contain the single substring $RepresentationID$ and the
688 | SegmentTemplate@media shall contain the single substring $RepresentationID$ and
689 | the substring $Number$ or $Time$ (not both). For best interoperability, a non-digit
690 | separator character should be placed between the substituted substrings.
691 | This is especially important in case the $RepresentationID$ substitution
692 | ends with a digit.
693 | - b. SegmentTemplate@media shall be identical for each
694 | SegmentTemplate Element in the MPEG-DASH manifest.
695 | - c. SegmentTemplate@initialization shall be identical for each
696 | SegmentTemplate Element in the MPEG-DASH manifest.
697 | - d. The BaseURL element shall be absent.
698 | - e. The AvailabilityStartTime SHOULD be set to 1970-01-01T00:00:00Z (Unix epoch)
699 | and the period @start to PT0S (if this is not the case it may be more difficult to
700 | synchronize more than one ingest source).
701 | - f. Each Representation in the MPEG-DASH manifest represents a CMAF track,
702 | each AdaptationSet in the MPD represents a CMAF SwitchingSet.
703 | - g. In case an ingest source issues an HTTP Request with an updated MPEG-DASH
704 | manifest, identical naming conventions apply. A receiver may ignore such an updated MPD
705 | sent by an ingest source.
706 | - h. The MPEG-DASH manifest shall contain a single Period Element.
707 | 17. The ingest source may send an HTTP Live Streaming manifest, but its structure
708 | and naming shall be derived from or match the MPEG-DASH manifest
709 | described in clause 16 above. In particular:
710 | - a. In a master playlist, the groupings identified represent CMAF switching sets.
711 | For media playlists named X.m3u8, X shall match the name of the corresponding Representation@id.
712 | - b. The segment URI announced in media playlists shall follow a structure that can be derived using
713 | the SegmentTemplate@media from the MPEG-DASH manifest.
714 | - c. The EXT-X-MAP URI attribute in media playlists shall follow a naming structure
715 | that can be derived using a SegmentTemplate@initialization from the MPEG-DASH manifest.
716 | - d. A receiver may ignore EXT-X-DATERANGE tags in the manifest;
717 | timed metadata shall be carried as described in the section on timed metadata
718 | [[#interface-1-timed-metadata]].
719 | - e. A receiver may ignore updated HTTP Live Streaming manifests.
720 |
721 | 18. In case the ingest source loses its own input or input is absent, it
722 | SHALL insert filler or replacement content, and output these as valid
723 | CMAF segments. Examples may be black frames, silent audio, or empty timed
724 | text segments. Such segments SHOULD be labelled by using a SegmentTypeBox
725 | ("styp") with the *slat* brand. This allows a receiver to still replace
726 | those segments with valid content segments at a later time.
727 | 19. The last segment in a CMAF track SHOULD be labelled with a
728 | SegmentTypeBox ("styp") with the *lmsg* brand. This way, the receiver
729 | knows that no more media segments are expected for this track. In case
730 | the track is restarted, a request with a [=CMAF header=] with identical
731 | properties must be issued to the same [=POST_URL=].
732 | 20. CMAF segments may include one or more DASHEventMessageBox'es ("emsg")
733 | containing timed metadata.
734 |
735 | NOTE: According to [[!MPEGDASH]], all DASHEventMessageBox'es ("emsg")
736 | must have a presentation_time later than the segment's earliest
737 | presentation time. This can make re-signaling of continuation events
738 | (events that are still active) troublesome (this is fixed in MPEG-DASH 5th edition).
739 |
740 | NOTE: Including DASHEventMessageBox'es ("emsg") in media segments
741 | may result in a loss of performance for just-in-time (re-)packaging. In this
742 | case, timed metadata tracks ([[#interface-1-timed-metadata]]) should be
743 | considered.
744 |
745 | 20. CMAF media (audio and video) tracks SHALL include the
746 | ProducerReferenceTimeBox'es ("[=prft=]") in the ingest. In these media
747 | tracks, all segments SHALL include a "[=prft=]" box. The "[=prft=]" box
748 | permits the end client to compute the end-to-end latency or the encoding
749 | plus distribution latency.
750 |
751 | 21. In case the input to the ingest source is MPEG-2 TS based, the ingest
752 | source is responsible for converting the presentation timestamps and
753 | program clock reference (PCR) to a timeline suitable for [[!MPEGDASH]]
754 | and [[!ISOBMFF]] with the correct anchor and timescales. The RECOMMENDED
755 | timescales and anchors are provided in next sections for each track type.
756 | For dual-encoder synchronization, it is also RECOMMENDED to use the Unix
757 | epoch or another similar well known time anchor (e.g.
758 | 2:14 a.m., EDT, on August 29, 1997, the time sky-net became self-aware
759 | is sometimes used).
760 |
761 | 22. In case a receiving entity cannot process a request from an ingest source
762 | correctly, it can send an HTTP error code. See [[#interface-1-failover]] or
763 | [[#interface-1-2]] for details.
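
As an informative illustration of the naming rules in clause 16 above, the following Python sketch expands a SegmentTemplate string and checks the substring constraints of item 16a. It is a simplified sketch, not a complete MPEG-DASH template processor: it only handles $RepresentationID$, $Number$ and $Time$ (with an optional %0Nd width tag), and the function names, file names and the .cmfv extension are assumptions chosen for the example.

```python
import re

# $RepresentationID$, $Number$ or $Time$, optionally with a %0Nd width tag.
_TOKEN = re.compile(r"\$(RepresentationID|Number|Time)(%0\d+d)?\$")


def expand(template: str, representation_id: str, number=None, time=None) -> str:
    """Substitute the template identifiers to obtain a relative segment path."""
    values = {"RepresentationID": representation_id, "Number": number, "Time": time}

    def substitute(match):
        name, width_tag = match.group(1), match.group(2)
        value = values[name]
        if value is None:
            raise ValueError(f"no value supplied for ${name}$")
        if name == "RepresentationID":
            if width_tag:
                raise ValueError("$RepresentationID$ takes no format tag")
            return str(value)
        return width_tag % value if width_tag else str(value)

    return _TOKEN.sub(substitute, template)


def check_templates(initialization: str, media: str) -> list:
    """Return a list of violations of the substring rules in item 16a."""
    problems = []
    if initialization.count("$RepresentationID$") != 1:
        problems.append("@initialization needs exactly one $RepresentationID$")
    if media.count("$RepresentationID$") != 1:
        problems.append("@media needs exactly one $RepresentationID$")
    if ("$Number" in media) == ("$Time" in media):
        problems.append("@media needs exactly one of $Number$ or $Time$")
    return problems


print(expand("$RepresentationID$-$Number%05d$.cmfv", "video1", number=42))
print(check_templates("$RepresentationID$-init.cmfv", "$RepresentationID$-$Time$.cmfv"))

# Without a separator, different inputs can collide on the same path.
print(expand("$RepresentationID$$Number$", "video1", number=2)
      == expand("$RepresentationID$$Number$", "video", number=12))
```

The last line shows why a non-digit separator is recommended: "video1" with segment 2 and "video" with segment 12 would otherwise both map to the same name.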
764 |
765 | ## Requirements for Formatting Media Tracks ## {#interface-1-media-tracks}
766 |
767 | [[!MPEGCMAF]] has the notion of [=CMAF track=]s, which are composed of
768 | [=CMAF fragment=]s and [=CMAF chunk=]s. A fragment can be composed of one or more
769 | chunks. The [=media fragment=] defined in ISOBMFF predates the definition in
770 | CMAF. It is assumed that the ingest source uses [=HTTP POST=] or
771 | [=HTTP PUT=] requests to transmit CMAF
772 | fragment(s) to the receiving entity. The following are additional requirements
773 | imposed on the formatting of CMAF media tracks.
774 |
775 | 1. Media tracks SHALL be formatted using boxes according to Section 7 of
776 | [[!MPEGCMAF]]. Media tracks SHOULD NOT use media-level encryption (e.g.,
777 | common encryption), as HTTP over TLS (HTTPS) should provide sufficient
778 | transport layer security. However, in case common encryption is used, the
779 | decryption key shall be made available out of band by supported means such
780 | as CPIX defined by DASH-IF.
781 | 2. The [=CMAF fragment=] durations SHOULD be constant; the duration MAY
782 | fluctuate to compensate for non-integer frame rates. By choosing an
783 | appropriate timescale (a multiple of the frame rate is recommended) this
784 | issue should be avoided. A last fragment of a track may have a
785 | different duration.
786 | 3. The [=CMAF fragment=] durations SHOULD be between approximately one and
787 | six seconds.
788 | 4. Media tracks SHOULD use a timescale for video streams based on the
789 | frame rate and 44.1 kHz or 48 kHz for audio streams or any other
790 | timescale that enables integer increments of the decode times of fragments
791 | signaled in the "tfdt" box based on this scale. If necessary, integer
792 | multiples of these timescales could be used.
793 | 5. The language of the CMAF track SHOULD be signaled in the "[=mdhd=]" box or
794 | "[=elng=]" boxes in the CMAF header.
795 | 6. Media tracks SHOULD contain the ("btrt") box specifying the target average
796 | and maximum bitrate of the CMAF fragments in the sample entry container in
797 | the CMAF header.
798 | 7. Media tracks MAY be composed of CMAF chunks [[!MPEGCMAF]] clause 7.3.2.3.
799 | In this case, they SHOULD be signaled using SegmentTypeBox ("styp") to
800 | make it easy for the receiving entity to differentiate them from CMAF
801 | fragments. The brand type of a chunk is *cmfl*. CMAF chunks should only be
802 | signaled if they are not the first chunk in a CMAF fragment.
803 | 8. In video tracks, profiles like avc1 and hvc1 MAY be used that signal the
804 | sequence parameter set in the CMAF header. In this case, these codec
805 | parameters do not change dynamically during the live session in the media
806 | track.
807 | 9. However, video tracks SHOULD use profiles like avc3 or hev1 that signal
808 | the parameter sets (PPS, SPS, VPS) in the media samples. This allows
809 | inband signaling of parameter changes, which is useful because in live content
810 | the codec configuration may change slightly over time.
811 | 10. In case the language of a track changes, a new CMAF header with updated
812 | "[=mdhd=]" and/or "[=elng=]" SHOULD be present. The CMAF header MUST be
813 | identical, except for the "elng" tag.
814 | 11. Track roles SHOULD be signaled in the ingest by using a "kind" box in
815 | UserDataBox ("udta"). The "kind" box MUST contain a schemeURI
816 | urn:mpeg:dash:role:2011 and a value containing a Role as defined in
817 | [[!MPEGDASH]]. In case this signaling does not occur, the processing
818 | entity can define the role for the track independently.
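
Items 2 and 4 above can be illustrated with a short calculation. The following informative Python sketch uses exact rational arithmetic to check whether a timescale yields integer frame and fragment durations for a non-integer frame rate; the function names and the choice of 60 frames per fragment are assumptions made for this example.

```python
from fractions import Fraction


def ticks_per_frame(timescale: int, frame_rate: Fraction) -> Fraction:
    """Duration of one frame in timescale ticks (exact rational arithmetic)."""
    return Fraction(timescale) / frame_rate


def fragment_decode_times(timescale, frame_rate, frames_per_fragment, count, start=0):
    """Return the first `count` baseMediaDecodeTime values for constant-size fragments."""
    ticks = ticks_per_frame(timescale, frame_rate) * frames_per_fragment
    if ticks.denominator != 1:
        raise ValueError(f"timescale {timescale} gives a non-integer fragment duration")
    return [start + index * ticks.numerator for index in range(count)]


NTSC_30 = Fraction(30000, 1001)  # 29.97 frames per second

print(ticks_per_frame(30000, NTSC_30))  # integer number of ticks per frame
print(ticks_per_frame(1000, NTSC_30))   # non-integer: rounding would be needed
print(fragment_decode_times(30000, NTSC_30, frames_per_fragment=60, count=3))
```

With a timescale of 30000, each 29.97 fps frame lasts exactly 1001 ticks, so every 60-frame fragment has the same duration of 60060 ticks and the "tfdt" values advance in integer steps. With a timescale of 1000, the frame duration is 1001/30 ticks and the fragment durations would have to fluctuate.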
819 |
820 | ## Requirements for Signaling Switching Sets ## {#interface-1-switchingsets}
821 |
822 | In live streaming, a [=CMAF presentation=] of streams corresponding to a channel
823 | is ingested by posting to a [=publishing_point_URL=] at the receiving entity.
824 | CMAF has the notion of switching sets [[!MPEGCMAF]] that map to similar
825 | streaming protocol concepts like Adaptation Set in DASH. To signal a switching
826 | set in a CMAF presentation, CMAF media tracks MUST correspond to the constraints
827 | defined in [[!MPEGCMAF]] clause 7.3.4.
828 |
829 | In addition, optional explicit signaling is defined in this clause. This would
830 | mean the following steps could be implemented by the live ingest source.
831 |
832 | 1. A live ingest source MAY generate a [=switching set ID=] that is unique
833 | for each switching set in a live stream session. Tracks with the same
834 | [=switching set ID=] belong to the same switching set. The switching set
835 | ID can be a string or (small) integer number. Characters in
836 | [=switching set ID=] SHALL be unreserved, i.e., A-Za-z0-9_.-~ in order to
837 | avoid introducing delimiters.
838 | 2. The [=switching set ID=] may be added in a relative path to the
839 | [=POST_URL=] using the Switching() keyword. In this case, a CMAF segment
840 | is sent from the live ingest source as POST chunk.cmfv
841 | POST_URL/Switching([=switching set ID=])/Streams(stream_id) (deprecated, not
842 | commonly supported). This option is only recommended when the Streams() keyword
843 | is used and the option to signal switching sets in the MPD is not used.
844 |
845 | 3. The live ingest source MAY add a "kind" box in the "udta" box in each
846 | track to signal the switching set it belongs to. The schemeURI of this
847 | "kind" box SHALL be urn:dashif:ingest:switchingset_id and the value field
848 | of the "kind" box SHALL be the [=switching set ID=].
849 | 4. The switching sets are grouped as adaptation sets present in the DASH
850 | manifest in a POST request issued earlier, i.e., before the segments of
851 | that switching set are transmitted. In this case, the naming of the
852 | segment URIs follows the naming defined in the DASH manifest based on
853 | SegmentTemplate elements. In this case, the switching set ID corresponds
854 | to the AdaptationSet @id attribute.
855 | 5. SwitchingSet grouping may be derived from the HTTP Live Streaming master playlist.
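
Options 1 and 3 above can be sketched as follows. This informative Python example validates a [=switching set ID=] against the unreserved character set and serializes a "kind" box carrying it. The layout used (a FullBox with version 0 and flags 0, followed by two null-terminated UTF-8 strings) reflects the KindBox definition in [[!ISOBMFF]] as understood here; the function names and the example ID "video-hd" are assumptions made for illustration.

```python
import re
import struct

# Unreserved characters only: A-Z a-z 0-9 _ . - ~
_UNRESERVED = re.compile(r"[A-Za-z0-9_.~-]+")

SWITCHING_SET_SCHEME = "urn:dashif:ingest:switchingset_id"


def check_switching_set_id(value: str) -> str:
    """Return value unchanged if it only uses unreserved characters."""
    if not _UNRESERVED.fullmatch(value):
        raise ValueError(f"switching set ID {value!r} contains reserved characters")
    return value


def kind_box(scheme_uri: str, value: str) -> bytes:
    """Serialize a "kind" box: version/flags, schemeURI, value."""
    payload = (
        b"\x00\x00\x00\x00"                  # version 0, flags 0
        + scheme_uri.encode("utf-8") + b"\x00"
        + value.encode("utf-8") + b"\x00"
    )
    return struct.pack(">I4s", 8 + len(payload), b"kind") + payload


def switching_set_kind(switching_set_id: str) -> bytes:
    """Build the "kind" box that signals the switching set of a track."""
    return kind_box(SWITCHING_SET_SCHEME, check_switching_set_id(switching_set_id))


box = switching_set_kind("video-hd")
print(len(box), box[4:8])
```

The resulting box would be placed in the "udta" box of the track in the CMAF header; writing the enclosing boxes is outside the scope of this sketch.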
856 |
857 | Table 2: Switching set signaling options.
858 |
| Signaling option | Requirement |
|---|---|
| Implicit signaling based on switching set constraints [[!MPEGCMAF]] clause 7.3.4. | Mandatory |
| Signaling using [=switching set ID=] in the [=POST_URL=] using Switching() keyword (only when not MPD and Streams() is used) | Optional |
| Signaling using DASH AdaptationSet and defined naming structure based on SegmentTemplate and SegmentTimeline | Optional |
| Signaling using HTTP Live Streaming master playlist | Optional |
| Signaling using [=switching set ID=] in the track using "kind" box with schemeURI urn:dashif:ingest:switchingset_id and value set to [=switching set ID=] | Optional |
884 |
885 | ## Requirements for Timed Text, Captions and Subtitle Tracks ## {#interface-1-timed-text-captions}
886 |
887 | This clause specifies requirements for ingesting a track
888 | with timed text, captions and/or subtitle streams. The recommendations for
889 | formatting subtitle and timed text tracks are defined in [[!MPEGCMAF]] and
890 | [[!MPEG4-30]].
891 |
892 | We provide additional guidelines and best practices for formatting timed text
893 | and subtitle tracks.
894 |
895 | 1. CMAF tracks carrying WebVTT signaled by the *cwvt* brand or TTML Text
896 | signaled by the *im1t* brand are preferred. [[!MPEG4-30]] defines the
897 | track format selected in [[!MPEGCMAF]].
898 | 2. Based on [[!ISOBMFF]], the track handler ("hdlr") SHALL be set to "text"
899 | for WebVTT and "subt" for TTML.
900 | 3. The "[=ftyp=]" box in the CMAF header for the track containing timed text,
901 | images, captions and subtitles MAY use signaling using CMAF profiles based
902 | on [[!MPEGCMAF]]:
903 |
904 | 4. The BitRateBox ("btrt") SHOULD be used to signal the average and maximum
905 | bitrate in the sample entry box. This is most relevant for bitmap or XML
906 | based timed text subtitles that may consume significant bandwidth (e.g.,
907 | im1i or im1t).
908 | 5. In case the language of a track changes, a new CMAF header with updated
909 | "[=mdhd=]" and/or "[=elng=]" SHOULD be sent from the ingest source to the
910 | receiving entity.
911 | 6. Track roles can be signaled in the ingest, by using a "kind" box in the
912 | "udta" box. The "kind" box MUST contain a schemeURI
913 | urn:mpeg:dash:role:2011 and a value containing a role as defined in
914 | [[!MPEGDASH]].
915 |
916 | NOTE: [[!MPEGCMAF]] allows multiple "kind" boxes, hence, multiple roles can be
917 | signaled. By default, one should signal the DASH role urn:mpeg:dash:role:2011. A
918 | receiver may derive corresponding configuration for other streaming protocols
919 | such as HLS. In case this is not desired, additional "kind" boxes with
920 | corresponding schemeURI and values can be used to explicitly signal this
921 | information for other protocol schemes.
922 |
923 | An informative mapping between the roles defined in DASH and the corresponding
924 | characteristics in HLS is given below. Additionally, the forced subtitle in HLS might
925 | be derived from a DASH forced subtitle role as well by a [=receiving entity=].
926 |
927 | Table 3: Roles for subtitle and audio tracks and HLS characteristics.
928 |
| HLS characteristic | urn:mpeg:dash:role:2011 |
|---|---|
| transcribes-spoken-dialog | subtitle |
| easy-to-read | easyreader |
| describes-video | description |
| describes-music-and-sound | caption |
950 |
951 | DASH roles are defined in urn:mpeg:dash:role:2011 [[!MPEGDASH]]. Another example
952 | for explicitly signaling roles could be DVB DASH [[!DVB-DASH]]:
953 |
954 |
958 |
959 | ## Requirements for Timed Metadata Tracks ## {#interface-1-timed-metadata}
960 |
961 | This section discusses the specific formatting requirements for [=CMAF Ingest=]
962 | of timed metadata. Examples of timed metadata are opportunities for splice
963 | points and program information signaled by SCTE-35 markers. Such event signaling
964 | is different from regular audio/video information because of its sparse nature.
965 | In this case, the signaling data usually does not happen continuously and the
966 | intervals may be hard to predict. Other examples of timed metadata are ID3 tags
967 | [[!ID3v2]], SCTE-35 markers [[!SCTE35]] and DASHEventMessageBox'es defined in
968 | Section 5.9.8.3 of [[!MPEGDASH]].
969 |
970 | Table 4 provides some example URN schemes to be signaled. Table 5 illustrates an
971 | example of a SCTE-35 marker stored in a DASHEventMessageBox that is in turn
972 | stored as a metadata sample in a metadata track. The presented approach enables
973 | ingest of timed metadata from different sources, because data is not interleaved
974 | with the media.
975 |
976 | By using a CMAF timed metadata track, the same track and presentation formatting
977 | are applied for metadata as for other tracks ingested, and the metadata is part
978 | of the [=CMAF presentation=].
979 |
980 | By embedding the DASHEventMessageBox structure in timed metadata samples, some
981 | of the benefits of its usage in DASH and CMAF are kept. In addition, it enables
982 | signaling of gaps, overlapping events and multiple events starting at the same
983 | time in a single timed metadata track for this scheme. Furthermore, the parsing
984 | and processing of DASHEventMessageBox'es is supported in many players. The
985 | requirements for this DASHEventMessageBox-based timed metadata track
986 | are described in this section.
987 |
988 | An example of adding an ID3 tag in a DASHEventMessageBox can be found in
989 | [[=aomid3=]].
990 |
991 | Table 4: Example URN schemes for timed metadata tracks.
992 |
| URI | Reference |
|---|---|
| urn:mpeg:dash:event:2012 | [[!MPEGDASH]] |
| urn:dvb:iptv:cpm:2014 | [[!DVB-DASH]] |
| urn:scte:scte35:2013:bin | [[!SCTE214-3]] |
| www.nielsen.com:id3:v1 | Nielsen ID3 in DASH [[!ID3v2]] |
1014 |
1015 | Table 5: Example of a SCTE-35 marker embedded in a DASH EventMessageBox.
1016 |
| Tag | Value |
|---|---|
| scheme_id_uri | urn:scte:scte35:2013:bin |
| value | value used to signal subscheme |
| timescale | positive number, ticks per second, similar to track timescale |
| presentation_time_delta | non-negative number |
| event_duration | duration of event, "0xFFFFFFFF" if unknown |
| id | unique identifier for message |
| message_data | splice info section including CRC |
1050 |
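As an informative illustration of Table 5, the following Python sketch serializes the listed fields into a DASHEventMessageBox ("emsg"). The field order for version 0 (strings first, with a 32-bit presentation_time_delta) and for version 1 (integers first, with a 64-bit presentation_time) follows the box definition in [[!MPEGDASH]] as understood here. The function name, the numeric values and the message_data placeholder are assumptions made for this example; the placeholder is not a real SCTE-35 splice info section.

```python
import struct


def emsg_box(scheme_id_uri, value, timescale, time, event_duration,
             event_id, message_data, version=1):
    """Serialize an "emsg" box.

    For version 0, `time` is presentation_time_delta (32 bits).
    For version 1, `time` is presentation_time (64 bits).
    """
    strings = scheme_id_uri.encode("utf-8") + b"\x00" + value.encode("utf-8") + b"\x00"
    if version == 0:
        body = strings + struct.pack(">IIII", timescale, time, event_duration, event_id)
    elif version == 1:
        body = struct.pack(">IQII", timescale, time, event_duration, event_id) + strings
    else:
        raise ValueError("unsupported emsg version")
    payload = bytes([version, 0, 0, 0]) + body + message_data  # version + 24-bit flags
    return struct.pack(">I4s", 8 + len(payload), b"emsg") + payload


SCTE35_SCHEME = "urn:scte:scte35:2013:bin"
PLACEHOLDER = b"<binary splice_info_section>"  # hypothetical payload bytes

box_v0 = emsg_box(SCTE35_SCHEME, "", 90000, 0, 0xFFFFFFFF, 1, PLACEHOLDER, version=0)
box_v1 = emsg_box(SCTE35_SCHEME, "", 90000, 180000, 0xFFFFFFFF, 1, PLACEHOLDER, version=1)
print(len(box_v0), len(box_v1))
```

The version 1 box is four bytes longer than the version 0 box because it carries a 64-bit presentation time instead of a 32-bit delta.
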
1051 | The following are requirements and recommendations that apply to the timed
1052 | metadata ingest of information related to events, tags, ad markers and program
1053 | information and others:
1054 |
1055 | 1. Timed metadata SHALL be conveyed in a CMAF track, where the media handler (hdlr)
1056 | is "meta" and the track handler box is a NullMediaHeaderBox ("[=nmhd=]") as
1057 | defined for timed metadata tracks in [[!ISOBMFF]] clause 12.3.
1058 | 2. The CMAF timed metadata track applies to the [=CMAF presentation=]
1059 | ingested to a [=publishing_point_URL=] at the receiving entity.
1060 | 3. To fulfill CMAF track requirements in [[!MPEGCMAF]] clause 7.3, such as
1061 | not having gaps in the media timeline, filler data may be needed. Such
1062 | filler data SHALL be defined by the metadata scheme signaled in
1063 | URIMetaSampleEntry. For example, WebVTT tracks define a VTTEmptyCueBox in
1064 | [[!MPEG4-30]] clause 6.6. This cue is to be carried in samples in which no
1065 | active cue occurs. Other schemes could define empty fillers along
1066 | similar lines, such as the EventMessageEmptyBox (emeb) in ISO/IEC 23001-18.
1067 | 4. CMAF track files do not support overlapping, multiple concurrently active
1068 | or zero duration samples. In case metadata or events are concurrent,
1069 | overlapping or of zero duration, such semantics MUST be defined by the
1070 | scheme signaled in the URIMetaSampleEntry. The timed metadata track MUST
1071 | still conform to [[!MPEGCMAF]] clause 7.3.
1072 | 5. CMAF timed metadata tracks MAY carry DASH Events as defined in
1073 | [[!MPEGDASH]] clause 5.9.8.3 in the metadata samples. The best way to
1074 | create such a track is based on ISO/IEC 23001-18. Some
1075 | older implementations may use DASHEventMessageBox'es as defined in
1076 | ISO/IEC 23009-1. Using DASHEventMessageBox'es directly in samples may be
1077 | implemented as follows:
1078 |
1079 | 5a. Version 1 SHOULD be used. In case version 0 is used, the
1080 | presentation_time_delta refers to presentation time of the sample
1081 | enclosing the DASHEventMessageBox.
1082 |
1083 | 5b. The URIMetaSampleEntry SHOULD contain the URN
1084 | "urn:mpeg:dash:event:2012" or an equivalent URN to signal the presence of
1085 | DASHEventMessageBox'es.
1086 |
1087 | 5c. The timescale of the DASHEventMessageBox SHALL match the value
1088 | specified in the MediaHeaderBox ("mdhd") of the timed metadata track.
1089 |
1090 | 5d. The sample SHOULD contain all DASHEventMessageBox'es that are active
1091 | during the presentation time of the sample.
1092 |
1093 | 5e. A single metadata sample MAY contain multiple DASHEventMessageBox'es.
1094 | This happens if multiple DASHEventMessageBox'es have the same presentation
1095 | time or if an earlier event is still active in a sample containing a newly
1096 | started and overlapping event.
1097 |
1098 | 5f. The scheme_id_uri in the DASHEventMessageBox can be used to signal the
1099 | scheme of the data carried in the message data field. This enables
1100 | carriage of multiple metadata schemes in a track.
1101 |
1102 | 5g. For SCTE-35 ingest, the scheme_id_uri in the DASHEventMessageBox MUST
1103 | be "urn:scte:scte35:2013:bin" as defined in [[!SCTE214-3]]. A binary
1104 | SCTE-35 payload is carried in the message_data field of a
1105 | DASHEventMessageBox. If a splice point is signaled, media tracks MUST
1106 | insert an IDR sample at the time corresponding to the event presentation
1107 | time.
1108 |
1109 | 5h. It may be necessary to add filler samples to avoid gaps in the CMAF
1110 | track timeline. This may be done using EventMessageEmptyBox (8 bytes) with
1111 | 4cc code of "emeb" defined in ISO/IEC 23001-18.
1112 |
1113 | 5i. If ID3 tags are carried, the DASHEventMessageBox MUST be formatted as
1114 | defined in [[=aomid3=]].
1115 |
1116 | 5j. The value and id field of the DASHEventMessageBox can be used by the
1117 | receiving entity to detect duplicate events.
1118 |
1119 | 6. The ingest source SHOULD NOT embed inband top-level DASHEventMessageBox'es
1120 | ("emsg") in the timed metadata track.
1121 |
1122 | 7. Timed metadata tracks, similar to other CMAF tracks, SHOULD use a constant
1123 | segment duration. As actual timed metadata durations may vary in practice,
1124 | timed metadata schemes should support mechanisms for re-signaling all active
1125 | timed metadata in each sample. This way, constant duration segments
1126 | (e.g., two-second segments) can still be used and metadata that is still
1127 | active can be repeated in later segments. ISO/IEC 23001-18 has explicit
1128 | support for this feature by repeating the event message instance boxes
1129 | in subsequent samples.
1130 | 8. A change in the set of active events shall trigger a sample boundary in
1131 | the timed metadata track.
1132 |
1133 | 9. In case the timed metadata track is also signaled in the manifest, the
1134 | @codecs string should be set to the 4cc code of the sample entry, e.g.,
1135 | "urim" for URIMetaSampleEntry or "evte" for ISO/IEC 23001-18.
1136 | The contentType field should be set to "meta" and mimeType field to "application/mp4".
1137 | Additional SupplementalProperty or EssentialProperty descriptors may
1138 | be used to further describe the content of the metadata track in the manifest.
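
The re-signaling and sample boundary rules in items 7 and 8 can be illustrated with a minimal Python sketch. It is not part of this specification: the `Event` type and the `build_samples` function are hypothetical names, times are assumed to be integer ticks of the track timescale, and an event is assumed to be active over the half-open interval from its start to its start plus duration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: int
    start: int      # presentation time in track timescale ticks
    duration: int   # duration in ticks


def build_samples(events, seg_start, seg_end):
    """Split the segment [seg_start, seg_end) into metadata samples.

    A sample boundary is placed wherever the set of active events changes,
    and every sample lists all events that are active during it.
    Returns a list of (sample_start, sample_duration, active_event_ids).
    """
    cuts = {seg_start, seg_end}
    for ev in events:
        for t in (ev.start, ev.start + ev.duration):
            if seg_start < t < seg_end:
                cuts.add(t)
    times = sorted(cuts)
    samples = []
    for begin, end in zip(times, times[1:]):
        active = [ev.event_id for ev in events
                  if ev.start < end and ev.start + ev.duration > begin]
        # An empty list would correspond to a filler sample (assumption).
        samples.append((begin, end - begin, active))
    return samples
```

With a timescale of 1000 and two-second segments, an event that starts at tick 1000 and lasts 3000 ticks splits the first segment into two samples and is repeated in the single sample of the next segment, so the segment duration stays constant.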
1139 |
1140 | ## Requirements for Signaling and Conditioning Splice Points ## {#interface-1-splicing}
1141 |
1142 | Splicing is important for use cases like ad insertion or clipping of content.
1143 | The requirements for signaling splice points and content conditioning at
1144 | respective splice points are as follows.
1145 |
1. The preferred method for signaling a splice point uses a timed metadata
track sample with a presentation time corresponding to the splice point.
The timed metadata track sample carries events containing binary SCTE-35
based on the scheme urn:scte:scte35:2013:bin as defined in
[[!SCTE214-3]]. The binary SCTE-35 SHALL carry a splice info section with
a splice_insert command with out_of_network_indicator set to 1 and a
break_duration matching the actual break duration.
1154 |
1155 | 2. Information related to splicing, whether SCTE-35 based or by other means,
1156 | whether in an EventMessageBox or timed metadata track sample or event MUST
1157 | be available to the receiver at least four seconds before the media
1158 | segment with the intended splice point.
1159 |
1160 | 3. The splice time SHALL equal the presentation time of the metadata sample
1161 | or event message, as the SCTE-35 timing is based on MPEG-2 TS and has no
meaning in CMAF or DASH. The media ingest source is responsible for the
frame-accurate conversion of this time, as it is for the media segments.
1164 |
4. In case a separate SCTE-35 command is used with out_of_network_indicator=0,
the actual duration of the break SHALL match the break duration announced in
the earlier SCTE-35 splice_insert command with out_of_network_indicator=1.
1169 |
1170 | 5. In case segmentation descriptors are used and multiple descriptors are
1171 | present, a separate event message with a duration corresponding to each of
1172 | the descriptors SHOULD be used.
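
As an illustration of items 2 and 4 above, the following Python sketch checks the four-second announcement lead time and the break duration. The function and parameter names are hypothetical, and all times are assumed to be integer ticks on one shared timescale.

```python
MIN_LEAD_SECONDS = 4  # minimum lead time required by item 2


def splice_announced_in_time(segment_start, available_at, timescale):
    """True if the splice information was available to the receiver at least
    four seconds before the media segment containing the splice point."""
    return segment_start - available_at >= MIN_LEAD_SECONDS * timescale


def break_duration_matches(out_time, in_time, break_duration):
    """True if the actual break (from the splice out to the return to the
    network) equals the break_duration announced in the earlier command."""
    return in_time - out_time == break_duration
```

A receiving entity could use such checks for logging or validation; how a violation is handled is outside what this sketch assumes.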
1173 |
1174 | The conditioning follows [[=DASH-IFad=]] shown in Figure 9:
1175 |
1176 | Figure 9: Splice point conditioning
1177 |
1178 |
1179 |
The splice point conditioning options in [[=DASH-IFad=]] are defined as follows:
1181 |
1182 | 1. Option 1 (splice conditioned packaging): Both a fragment boundary and a
1183 | SAP 1 or SAP 2 (stream access point) at the splice point.
1184 | 2. Option 2 (splice conditioned encoding): A SAP 1 or SAP 2 stream access
1185 | point at the frame at the boundary.
1186 | 3. Option 3 (splice point signaling): No specific content conditioning at the
1187 | splice point.
1188 |
1189 | This specification requires option 1 or 2 to be applied. Option 2 is required
1190 | for dual-encoder synchronization to avoid variation of the segment durations.
1191 |
1192 | ## Requirements for Failovers and Connection Error Handling ## {#interface-1-failover}
1193 |
1194 | Given the nature of live streaming, good failover support is critical for
1195 | ensuring the availability of the service. Typically, media services are designed
1196 | to handle various types of failures, including network errors, server errors,
1197 | and storage issues. When used in conjunction with proper failover logic from the
1198 | ingest source side, highly reliable live streaming setups can be built. In this
1199 | section, we discuss requirements for failover scenarios.
1200 |
1201 | When the [=receiving entity=] fails:
1202 |
1203 | - A new instance SHOULD be created listening to the same
1204 | [=publishing_point_URL=] for the ingest stream.
1205 |
1206 | When the [=ingest source=] fails:
1207 |
1208 | 1. A new instance SHOULD be instantiated to continue the ingest for the live
1209 | streaming session.
2. The new instance MUST use the same URLs for HTTP requests as the
failed instance for segments.
3. The new instance's POST request MUST include the same [=CMAF header=]
as the failed instance.
4. The new instance MUST be properly synced with all other running ingest
sources for the same live presentation to generate synced audio/video
samples with aligned fragment boundaries in the track. This implies that
the [=baseMediaDecodeTime=] timestamps in the "tfdt" box match.
1218 | 5. The new stream MUST be semantically equivalent with the previous stream,
1219 | and interchangeable at the header and media fragment levels.
1220 | 6. The new instance SHOULD try to minimize data loss. The
1221 | [=baseMediaDecodeTime=] of fragments SHOULD increase from the point where
1222 | the encoder last stopped. The [=baseMediaDecodeTime=] in the "tfdt" box
1223 | SHOULD increase in a continuous manner, but it is permissible to introduce
1224 | a discontinuity, if necessary. A receiving entity can ignore fragments
1225 | that it has already received and processed, so it is better to err on the
1226 | side of resending fragments than to introduce discontinuities in the media
1227 | timeline.
1228 | 7. In some cases, an alternative source can be used by the receiving entity
1229 | to request the missing segments through additional signaling, which is out
1230 | of the scope of this specification.
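
Item 6 notes that a receiving entity can ignore fragments it has already processed. A minimal Python sketch of such a check is given below, assuming one tracker per track. The `TrackTimeline` class and its return values are hypothetical and only illustrate the idea.

```python
class TrackTimeline:
    """Tracks the expected baseMediaDecodeTime of the next fragment of one track."""

    def __init__(self):
        self.next_decode_time = None

    def accept(self, base_media_decode_time, duration):
        """Classify an incoming fragment as 'append', 'duplicate' or 'discontinuity'."""
        expected = self.next_decode_time
        if expected is not None and base_media_decode_time < expected:
            # Already received, e.g. resent by a restarted ingest source.
            return "duplicate"
        self.next_decode_time = base_media_decode_time + duration
        if expected is None or base_media_decode_time == expected:
            return "append"
        # A gap in the timeline: the fragment is kept and the gap is noted.
        return "discontinuity"
```

Resent fragments are dropped without side effects, which is why resending is the safer choice for a restarted ingest source.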
1231 |
1232 | ## Requirements for Ingest Source Synchronization ## {#interface-1-dualsync}
1233 |
In the case of more than one redundant ingest source, synchronization between
them can be achieved as follows. A fixed segment duration is chosen, for example
based on a fixed GoP duration (e.g., two seconds), and is used by all ingest
sources and all CMAF tracks (not only the video tracks). The CMAF tracks use a
fixed anchor T as the timeline origin; this should be 1-1-1970 (Unix epoch) or
another well-known time anchor. The segment boundaries in this case are at
K * segment duration (since anchor T) for an integer K > 0. Any media source
joining or starting can compute the fragment boundary and produce segments with
equivalent segment boundaries corresponding to approximately the current time by
choosing K sufficiently large.
1245 |
It is assumed that media sources generate signals from a synchronized input source and
can use timing information from this source, e.g., MPEG-2 TS presentation
timestamps or SDI signals, to compute such timestamps for each segment. For example,
in the case of MPEG-2 TS, the program clock reference (PCR) and presentation
timestamps can be used. Based on this conversion, different media sources will
produce segments with identical durations, per-frame timestamps and enclosing frames.
By this conversion to a common timeline based on a common anchor (in this case the
Unix epoch) and fixed segment durations, ingest sources can join and leave the
synchronized operation, enabling both synchronization and redundancy. Each
time a source joins, it can compute a suitable value for K and the CMAF base
media decode times from the anchor, the fixed segment duration and the current time.
1257 |
1258 | In this setup, a first ingest source can be seamlessly replaced by a redundant
1259 | second ingest source. In case of splicing, it is important that the ingest
1260 | source inserts an IDR frame but not a segment or fragment boundary.
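
The computation of K can be sketched in Python as follows. This is an illustration only: `next_boundary` is a hypothetical helper, the anchor is assumed to be the Unix epoch unless stated otherwise, and the segment duration is assumed to be an integer number of timescale ticks.

```python
import math
from fractions import Fraction


def next_boundary(now, segment_duration, timescale, anchor=0):
    """Return (K, baseMediaDecodeTime) for the first segment boundary at or
    after `now`, where boundaries lie at K * segment_duration after `anchor`.

    `now`, `segment_duration` and `anchor` are in seconds.
    """
    seg = Fraction(segment_duration)
    # Exact rational arithmetic avoids rounding drift between encoders.
    k = max(1, math.ceil((Fraction(now) - anchor) / seg))
    return k, int(k * seg * timescale)
```

Two ingest sources that start at slightly different wall-clock times within the same segment interval obtain the same K, and therefore the same base media decode time for their first fragment.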
1261 |
1262 | ## Identifier ## {#interface-1-identifier}
1263 |
1264 | The interface described in this clause is identified with the following identifier:
1265 |
1266 |
1267 |

| Identifier | Reference | Sections | Comments |
|---|---|---|---|
| http://dashif.org/ingest/v1.2/interface-1 | http://dashif.org/ingest/v1.2 | Clause [[#interface-1]] | Conforming to the requirements of clause [[#interface-1]] |

1279 |
1280 |
1281 | The above identifier may be used by an entity to signal the support of the interface defined in clause [[#interface-1]].
1282 |
1283 | # Interface-2: DASH and HLS Ingest # {#interface-2}
1284 |
1285 | Interface-2 defines the protocol specific behavior required to ingest a
1286 | [=streaming presentation=] composed of mandatory [=manifest objects=] and
1287 | [=media objects=] to receiving entities. In this mode, the ingest source
1288 | prepares and delivers to the receiving entity all the [=objects=] intended for
1289 | consumption by a client. These are a complete streaming presentation including
1290 | all manifest and media objects.
1291 |
1292 | This interface is intended to be used by workflows that do not require active
1293 | media processing after encoding. It leverages the fact that many encoders
1294 | provide DASH and HLS packaging capabilities and that the resulting packaged
1295 | content can easily be transferred via HTTP to standard web servers. However,
neither DASH nor HLS has specified how such a workflow is intended to work,
leaving the industry to self-specify key decisions such as how to secure and
authenticate ingest sources, who is responsible for managing the content life
cycle, the order of operations, failover features, robustness methods, etc. In
most cases, a working solution can be had using a readily available web server
such as Nginx or Varnish and the standard complement of HTTP methods. In many
1302 | cases, Interface-2 simply documents what is considered an industry best practice
1303 | while attempting to provide guidance to areas less commonly considered.
1304 |
1305 | The requirements below (in addition to the common requirements listed in
1306 | [[#interface-1-2]]) encapsulate all the needed functionality to support
Interface-2. In case [[!MPEGCMAF]] media is used, the media track and segment
formatting is similar to that defined in Interface-1.
1309 |
1310 | ## General Requirements ## {#interface-2-requirements}
1311 | 1. The ingest source MUST be able to create a compliant streaming
1312 | presentation for DASH and/or HLS. The ingest source may create both DASH
1313 | and HLS streaming presentations using common media objects (i.e., CMAF),
1314 | but the ingest source MUST generate format-specific manifest objects.
1315 |
1316 | ### HTTP Sessions ### {#interface-2-http-sessions}
1317 |
1318 |
1319 | 1. The ingest source SHOULD remove media objects from the receiving entity
1320 | that are no longer referenced in the corresponding manifest objects via an
1321 | HTTP DELETE command. How long the ingest source waits to remove
1322 | unreferenced content can be configurable. Upon receiving an HTTP DELETE
1323 | command, the receiving entity SHOULD:
1324 |
1325 | 1a. delete the referenced content and return an HTTP 200 OK status code,
1326 |
1b. delete the corresponding folder if the last file in the folder is
deleted and the folder is not a root folder. Empty parent folders need
not be deleted recursively.
1330 |
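The DELETE behavior in 1a and 1b could be sketched as follows for a receiving entity that stores objects on a local file system. This is a non-normative Python illustration: `handle_delete` is a hypothetical function, and the 403 response for paths outside the root and the 404 response for missing objects are assumptions of this sketch.

```python
from pathlib import Path


def handle_delete(root: Path, relative_path: str) -> int:
    """Delete one object below `root` and return an HTTP status code."""
    root = root.resolve()
    target = (root / relative_path).resolve()
    if root not in target.parents:
        return 403  # outside the configured path (assumption)
    if not target.is_file():
        return 404  # nothing to delete (assumption)
    target.unlink()
    parent = target.parent
    # Remove the folder once its last file is gone, but never the root
    # folder, and do not walk further up the tree.
    if parent != root and not any(parent.iterdir()):
        parent.rmdir()
    return 200
```
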
1331 | ### Unique Segment and Manifest Naming ### {#interface-2-naming}
1332 |
1333 | 1. The ingest source MUST ensure all [=media objects=] (video segments, audio
1334 | segments, initialization segments and caption segments) have unique paths.
1335 | This uniqueness applies across all ingested content in previous sessions
1336 | as well as the current session. This requirement ensures previously cached
1337 | content (i.e., by a CDN) is not inadvertently served instead of newer
1338 | content of the same name.
1339 | 2. The ingest source MUST ensure all objects in a [=live stream session=] are
1340 | contained within the configured path. Should the receiving entity receive
1341 | media objects outside of the allowed path, it SHOULD return an HTTP 403
1342 | Forbidden response.
1343 | 3. For each live stream session, the ingest source MUST provide unique paths
1344 | for the [=manifest objects=]. One suggested method of achieving this is to
1345 | introduce a timestamp of the start of the live stream session into the
1346 | manifest path. A session is defined by the explicit start and stop of the
1347 | encoding process.
1348 | 4. When receiving objects with the same path as an existing object, the
1349 | receiving entity MUST overwrite the existing objects with the newer
1350 | objects of the same path.
1351 | 5. To support unique naming and consistency, the ingest source SHOULD include
1352 | a number, which is monotonically increasing with each new media object at
1353 | the end of media object's name, separated by a non-numeric character. This
1354 | way it is possible to retrieve this numeric suffix via a regular
1355 | expression.
1356 |
NOTE: Using DASH SegmentTemplate with @media and @initialization and a single period
can achieve this.
1359 |
1360 | 6. The ingest source MUST identify media objects containing initialization
1361 | fragments by using the .init file extension.
1362 | 7. The ingest source MUST include a file extension and a MIME type for all
1363 | media objects. Table 6 outlines the formats that manifest and media
1364 | objects are expected to follow based on their file extension. Segments may
be formatted as MPEG-4 (.mp4, .m4v, .m4a), [[!MPEGCMAF]] (.cmfv, .cmfa,
.cmfm, .cmft) or [[!MPEG2TS]] (.ts, HLS only). Manifests may be formatted
1367 | as DASH (.mpd) or HLS (.m3u8).
1368 |
1369 | NOTE: Using MPEG-2 TS breaks consistency with Interface-1, which uses a CMAF
1370 | container format structure.
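
The numeric suffix described in item 5 can be retrieved with a regular expression, as in the Python sketch below. The object names used here are made-up examples, `media_sequence` is a hypothetical helper, and the number is assumed to sit directly before the file extension.

```python
import re

# A non-numeric separator, the number, then the file extension.
_SUFFIX = re.compile(r"[^0-9](\d+)\.[A-Za-z0-9]+$")


def media_sequence(object_name):
    """Return the trailing number of a media object name, or None if absent."""
    match = _SUFFIX.search(object_name)
    return int(match.group(1)) if match else None
```

For example, `media_sequence("video-1500k-12345.cmfv")` yields 12345, while an object without a numeric suffix such as `init.mp4` yields `None`.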
1371 |
1372 | Table 6: List of the permissible combinations of file extensions and MIME types.
1373 |
1374 |

| File extension | MIME type |
|---|---|
| .m3u8 [[!RFC8216]] | application/x-mpegURL or application/vnd.apple.mpegURL |
| .mpd [[!MPEGDASH]] | application/dash+xml |
| .cmfv [[!MPEGCMAF]] | video/mp4 |
| .cmfa [[!MPEGCMAF]] | audio/mp4 |
| .cmft [[!MPEGCMAF]] | application/mp4 |
| .cmfm [[!MPEGCMAF]] | application/mp4 |
| .mp4 [[!ISOBMFF]] | video/mp4 or application/mp4 |
| .m4v [[!ISOBMFF]] | video/mp4 |
| .m4a [[!ISOBMFF]] | audio/mp4 |
| .m4s [[!ISOBMFF]] | video/iso.segment |
| .init | video/mp4 |
| .header [[!ISOBMFF]] | video/mp4 |
| .key | application/octet-stream |

1430 |
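A receiving entity could validate the Content-Type of an uploaded object against Table 6 with a simple lookup, as in the Python sketch below. The mapping mirrors the table; `allowed_mime_types` is a hypothetical helper, and reading the second .m3u8 entry as `application/vnd.apple.mpegURL` is an assumption.

```python
from pathlib import PurePosixPath

ALLOWED_MIME_TYPES = {
    ".m3u8": ("application/x-mpegURL", "application/vnd.apple.mpegURL"),
    ".mpd": ("application/dash+xml",),
    ".cmfv": ("video/mp4",),
    ".cmfa": ("audio/mp4",),
    ".cmft": ("application/mp4",),
    ".cmfm": ("application/mp4",),
    ".mp4": ("video/mp4", "application/mp4"),
    ".m4v": ("video/mp4",),
    ".m4a": ("audio/mp4",),
    ".m4s": ("video/iso.segment",),
    ".init": ("video/mp4",),
    ".header": ("video/mp4",),
    ".key": ("application/octet-stream",),
}


def allowed_mime_types(object_path):
    """Return the MIME types permitted for this path, or () if the file
    extension does not appear in the table."""
    suffix = PurePosixPath(object_path).suffix.lower()
    return ALLOWED_MIME_TYPES.get(suffix, ())
```
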
1431 |
1432 | ### Additional Failure Behaviors ### {#interface-2-failure-behaviors}
1433 |
The following items define additional behavior of an ingest source when
1435 | encountering certain error responses from the receiving entity.
1436 |
1437 | 1. When the ingest source receives a TCP connection attempt timeout, abort
1438 | midstream, response timeout, TCP send/receive timeout or an HTTP 5xx error
1439 | code when attempting to POST content to the receiving entity, it MUST:
1440 |
1441 | 1a. For manifest objects: Re-resolve DNS on each retry (per the DNS TTL)
1442 | and retry as defined in [[#interface-1-2]].
1443 |
1444 | 1b. For media objects: Re-resolve DNS on each retry (per the DNS TTL) and
1445 | continue uploading for n seconds, where n is the segment duration. After
1446 | it reaches the media object duration value, the ingest source MUST
1447 | continue with the next media object and update the manifest object with a
1448 | discontinuity marker appropriate for the protocol format. To maintain
1449 | continuity of the timeline, the ingest source SHOULD continue to upload
1450 | the missing media object with a lower priority. The reason for this is to
1451 | maintain an archive without discontinuity in case the stream is played
1452 | back at a later time. Once a media object is successfully uploaded, the
1453 | ingest source SHOULD update the corresponding manifest object to reflect
1454 | the now available media object.
1455 |
1456 | NOTE: Some clients may not like changes made in the manifest about the
1457 | past media objects (e.g., removing a previously present discontinuity).
1458 | Thus, care should be taken when making such changes.
1459 |
1460 | 2. Upon receipt of an HTTP 403 or 400 error code, the ingest source MAY be
1461 | configured to not retry sending the fragments (N, as described in
1462 | [[#interface-1-2]], will be 0 in this case).
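
The retry behavior for media objects described above can be sketched in Python as follows. This is an illustration under several assumptions: `post` is a hypothetical callable that opens a new connection on each call (so that DNS is re-resolved), returns the HTTP status code, and raises `OSError` on timeouts or aborted connections. The retry delay and the treatment of status codes other than 2xx and 5xx as final are also assumptions.

```python
import time


def upload_with_deadline(post, segment_duration,
                         clock=time.monotonic, sleep=time.sleep,
                         retry_delay=0.5):
    """Try to upload one media object for at most `segment_duration` seconds.

    Returns 'uploaded', 'rejected' or 'skipped'. On 'skipped' the caller
    continues with the next media object and marks a discontinuity.
    """
    deadline = clock() + segment_duration
    while True:
        try:
            status = post()
        except OSError:  # covers TimeoutError and ConnectionError
            status = None
        if status is not None and 200 <= status < 300:
            return "uploaded"
        if status is not None and not 500 <= status < 600:
            return "rejected"  # e.g. 400 or 403: configured not to retry
        if clock() >= deadline:
            return "skipped"
        sleep(retry_delay)
```

A skipped media object can still be uploaded later at lower priority, as recommended in 1b, so that the archive stays free of discontinuities.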
1463 |
1464 | ## DASH-Specific Requirements ## {#dash-ingest-requirements}
1465 |
1466 | ### File Extensions and MIME Types ### {#dash-ingest-extensions-mime}
1467 |
1468 | 1. The ingest source MUST use an .mpd file extension for the manifest.
1469 | 2. The ingest source MUST use one of the allowed file extensions (see Table
1470 | 6) for the media objects.
1471 |
1472 | ### Relative Paths ### {#dash-ingest-relative-paths}
1473 |
1474 | - The ingest source SHOULD use relative URLs to address each segment within
1475 | the manifest.
1476 |
1477 | ## HLS-Specific Requirements ## {#hls-ingest-requirements}
1478 |
1479 | ### File Extensions and MIME Types ### {#hls-ingest-extensions-mime}
1480 |
1481 | 1. The ingest source MUST use an .m3u8 file extension for master and variant
1482 | playlists.
1483 | 2. The ingest source SHOULD use a .key file extension for any keyfile posted
1484 | to the receiving entity for client delivery.
1485 | 3. The ingest source MUST use a .ts file extension for segments encapsulated
1486 | in an MPEG-2 TS file format.
1487 | 4. The ingest source MUST use one of the allowed file extensions (see Table
1488 | 6) appropriate for the MIME type of the content encapsulated using
1489 | [[!MPEGCMAF]].
1490 |
1491 | ### Relative Paths ### {#hls-ingest-relative-paths}
1492 |
1493 | 1. The ingest source SHOULD use relative URLs to address each segment within
1494 | the variant playlist.
1495 | 2. The ingest source SHOULD use relative URLs to address each variant
1496 | playlist within the master playlist.
1497 |
1498 | ### Encryption ### {#hls-ingest-encryption}
1499 |
1500 | - The ingest source may choose to encrypt the media segments and publish the
1501 | corresponding keyfile to the receiving entity.
1502 |
1503 | ### Upload Order ### {#hls-ingest-upload_order}
1504 |
In accordance with the [[!RFC8216]] recommendation, ingest sources MUST upload all
required files for a specific bitrate and segment before proceeding to the next
segment. For example, for a bitrate whose playlist is updated with every segment
and that uses key files, the ingest source uploads the segment file, followed by
the key file (optional) and the playlist file, in serial fashion. The
encoder MUST only move to the next segment after the previous segment has been
successfully uploaded or after the segment duration time has elapsed. The order
of operation should be:
1513 |
1514 | 1. Upload the media segment,
1515 | 2. Upload the key file (if required),
1516 | 3. Upload the playlist.
1517 |
1518 | If there is a problem with any of the steps, retry. Do not proceed to step 3
1519 | until step 1 succeeds or times out as described above. Failed uploads MUST
1520 | result in a stream manifest discontinuity per [[!RFC8216]].
1521 |
1522 | ### Resiliency ### {#hls-ingest-resiliency}
1523 |
1524 | 1. When ingesting media objects to multiple receiving entities, the ingest
1525 | source MUST send identical media objects with identical names.
1526 | 2. When multiple ingest sources are used, they MUST use consistent media
1527 | object names including when reconnecting due to an application or
1528 | transport error. A common approach is to use (epoch time)/(segment
1529 | duration) as the object name.
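
The naming approach in item 2 can be sketched in Python as follows. The prefix and file extension are made-up examples, `segment_name` is a hypothetical helper, and integer seconds are assumed for both arguments.

```python
def segment_name(segment_start_epoch, segment_duration,
                 prefix="video", extension="cmfv"):
    """Derive an object name from (epoch time) / (segment duration), so that
    independent ingest sources choose the same name for the same segment."""
    number = segment_start_epoch // segment_duration
    return f"{prefix}-{number}.{extension}"
```

Because the name depends only on the segment start time and the shared segment duration, an ingest source that reconnects after an error continues with the same names as before.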
1530 |
1531 | ## Identifier ## {#interface-2-identifier}
1532 |
1533 | The interface described in this clause is identified with the following identifier:
1534 |
1535 |
1536 |

| Identifier | Reference | Sections | Comments |
|---|---|---|---|
| http://dashif.org/ingest/v1.2/interface-2 | http://dashif.org/ingest/v1.2 | Clause [[#interface-2]] | Conforming to the requirements of clause [[#interface-2]] |

1548 |
1549 |
1550 | The above identifier may be used by an entity to signal the support of the interface defined in clause [[#interface-2]].
1551 |
1552 | # Examples (Informative) # {#examples}
1553 |
1554 | In this section, we provide some example deployments for live streaming.
1555 |
1556 | ## Example 1: CMAF Ingest and a Just-in-Time Packager ## {##example-1}
1557 |
1558 | Figure 10 shows an example where a separate packager and origin server are used.
1559 |
1560 | Figure 10: Example setup with CMAF Ingest and DASH/HLS Ingest.
1561 |
1562 |
1563 | The broadcast source is used as input to the [=live encoder=]. The broadcast
1564 | sources can be the SDI signals from a broadcast facility or MPEG-2 TS streams
1565 | intercepted from a broadcast that need to be re-used in an [=OTT=] distribution
1566 | workflow. The live encoder performs the encoding of the tracks into CMAF tracks
1567 | and functions as the ingest source in the CMAF Ingest interface. Multiple live
1568 | encoders can be used, providing redundant inputs to the packager using
1569 | dual-encoder synchronization. In this case, the segments are of constant
duration, and audio and video segment boundaries are aligned. Segments should
use timing relative to a shared anchor such as the Unix epoch to support
synchronization based on epoch locking (see [[#interface-1-dualsync]]).
1573 |
1574 | Following the CMAF Ingest specification in this document allows for failover and
1575 | many other features related to the content tracks. The live encoder performs the
1576 | following tasks:
1577 |
- It receives and demuxes the MPEG-2 TS and/or SDI signal.
1579 |
1580 | - It translates the metadata in these streams such as SCTE-35 or SCTE-104 to
1581 | timed metadata tracks.
1582 |
1583 | - It performs a high quality [=ABR=] encoding in different bitrates with
1584 | aligned switching points.
1585 |
1586 | - It packages all media and timed text tracks as CMAF-compliant tracks and
1587 | signals track roles in "kind" boxes.
1588 |
1589 | - It posts the addressable media objects composing the tracks to the packager
1590 | according to the CMAF Ingest interface defined in [[#interface-1]], and
1591 | optionally a manifest describing the groupings and naming of the inputs.
1592 |
1593 | - The CMAF Ingest allows multiple live encoders and packagers to be deployed
1594 | benefiting from redundant stream creation avoiding timeline discontinuities
1595 | due to failures as much as possible.
1596 |
- In case the receiving entity fails, the live encoder reconnects and resends
as defined in [[#interface-1-2]] and [[#interface-1-failover]].
1599 |
1600 | - In case the ingest source itself fails, it restarts and performs the steps
1601 | as in [[#interface-1-failover]].
1602 |
The live encoder can be deployed in the cloud, on a bare metal server or even
as dedicated hardware. The live encoder may have some tools or configuration
APIs to author the CMAF tracks and feed instructions/properties from the SDI or
broadcast feed into the CMAF tracks. The packager receives the ingested streams
and performs the following tasks:
1608 |
1609 | - It receives the CMAF tracks, grouping switching sets based on switching set
1610 | constraints, based on the "kind" box or information in the URI or MPD.
1611 |
1612 | - When packaging to DASH, an adaptation set is created for each switching set
1613 | ingested.
1614 |
- The near constant fragment duration is used to generate a segment template
based presentation using either $Number$ or $Time$.

- In case a splice point occurs, an IDR frame is inserted in the segment
without introducing a segment boundary (this is important if more than one
synchronized encoder is used). The SCTE-35 signal is included as timed
metadata.
1622 |
1623 | - In case changes happen, the packager can update the manifest and embed
1624 | inband events to trigger manifest updates in the fragments.
1625 |
1626 | - The DASH packager encrypts media segments according to key information
1627 | available. This key information is typically exchanged by protocols defined
1628 | in CPIX. This allows configuration of the content keys, initialization
1629 | vectors and embedding encryption information in the manifest.
1630 |
1631 | - The DASH packager signals subtitles in the manifest based on received CMAF
1632 | streams and roles signaled in the "kind" box.
1633 |
1634 | - In case a fragment is missing and SegmentTimeline is used, the packager
1635 | signals a discontinuity in the MPD.
1636 |
1637 | - In case the low-latency mode is used, the packager may make output
1638 | available before the entire fragment is received using HTTP chunked
1639 | transfer encoding.
1640 |
1641 | - The packager may have a proprietary API similar to the live encoder for
1642 | configuration of aspects like the timeShiftBuffer, DVR window, encryption
1643 | modes enabled, etc.
1644 |
1645 | - The packager uses DASH/HLS Ingest (as specified in [[#interface-2]]) to
1646 | push content to the origin server of a CDN. Alternatively, it could also
1647 | make content directly available as an origin server. In this case, DASH/HLS
1648 | Ingest is avoided and the packager also serves as the origin server.
1649 |
- The packager converts the timed metadata track into either MPD events or
inband events signaled in the manifest. The packager creates a segment
boundary in case a SCTE-35 splice event was received and the boundary was
not present in the original ingest.
1654 |
1655 | - The packager may also generate HLS or other streaming media presentations
1656 | based on the input.
1657 |
1658 | - In case the packager crashes or fails, it restarts and waits for the ingest
1659 | source to perform the actions detailed in [[#interface-1-failover]].
1660 |
1661 | The CDN consumes a DASH/HLS Ingest or serves as a proxy for content delivered to
1662 | a client. The CDN, in case it is consuming the POST-based DASH/HLS Ingest,
1663 | performs the following tasks:
1664 |
1665 | - It stores all posted content and makes them available for HTTP GET requests
1666 | from locations corresponding to the paths signaled in the manifest.
1667 |
1668 | - It occasionally deletes content based on instructions from the ingest
1669 | source, which is the packager in this setup.
1670 |
- In case the low-latency mode is used, content could be made available
before it has been received in its entirety.
1673 |
1674 | - It updates the manifest accordingly when a manifest update is received.
1675 |
1676 | - It serves as a proxy for HTTP GET requests forwarded to the packager.
1677 |
1678 | In case the CDN serves as a proxy, it only forwards requests for content to the
1679 | packager to receive the content and caches the relevant segments for a certain
1680 | duration.
1681 |
1682 | The client receives DASH or HLS streams and is not affected by the specification
1683 | of this work. Nevertheless, it is expected that by using a common streaming
1684 | format, less caching and less overhead in the network will result in a better
1685 | user experience. The client still needs to retrieve license and key information
1686 | by steps defined outside of this specification. Information on how to retrieve
1687 | this information will typically be signaled in the manifest prepared by the
1688 | packager.
1689 |
1690 | ## Example 2: Low-Latency DASH, and Combination of Interface-1 and Interface-2 ## {##example-2}
1691 |
1692 | A second example is given in Figure 11. It constitutes the reference workflow
1693 | for live chunked CMAF developed by DASH-IF and DVB. In this workflow, a
1694 | contribution encoder produces an [=RTP=] mezzanine stream that is transmitted to
1695 | FFmpeg, an example open-source encoder/packager running on a server.
1696 | Alternatively, a file resource may be used. In this workflow, the encoder
1697 | functions as the ingest source. FFmpeg produces the ingest stream with
different ABR encoded CMAF tracks. In addition, it sends a manifest that
complies with the DASH-IF and DVB low-latency CMAF specifications, and MPD updates.
The CMAF tracks also contain respective timing information (i.e., "[=prft=]").
In this case, the ingest source implements Interface-1 and Interface-2 based
ingest at once. By also resending CMAF headers in case of failures, both
interfaces may be satisfied. In some cases, URI rewrite rules are needed to
1704 | achieve the compatibility between Interface-1 and Interface-2. For example, the
1705 | DASH segment naming structure can be used to derive the explicit Streams()
1706 | keywords.
1707 |
1708 | The origin server is used to pass the streams to the client and may in some
1709 | cases also perform a re-encryption or re-packaging of the streaming presentation
1710 | as needed by the clients. The example client is DASH.js and a maximum end-to-end
1711 | latency of 3500 ms is targeted.
1712 |
1713 | The approaches for authentication and DNS resolution are similar for the two
1714 | interfaces, as are the track formatting in case CMAF is used. This example does
1715 | not use timed metadata. The ingest source may resend the CMAF header or
1716 | initialization segment in case of connection failures to conform to the CMAF
1717 | Ingest specification.
1718 |
1719 | Figure 11: DASH-IF/DVB reference live chunked CMAF workflow.
1720 |
1721 |
1722 |
1723 | # Implementations (Informative) # {#implementations}
1724 |
1725 | ## Implementation 1: FFmpeg Support for Interface-1 and Interface-2 ## {##implementation1}
1726 |
Ingest of a single track (or multiple tracks) can be achieved in FFmpeg with the MP4
1728 | and CMAF muxer. This example shows the ingest of a single SMPTE header bar video
1729 | track with FFmpeg.
1730 |
1731 |
1744 |
1745 | A more extensive example with epoch locking (dual-encoder synchronization) is
1746 | available from [=PythonFFmpegIngest=]. In this case, a patch is used to add
1747 | correct audio timescale and epoch time offset to FFmpeg.
1748 |
1749 | An example of CMAF and DASH/HLS ingest can be achieved using the DASH muxer. An
1750 | example script is shown below as provided by FFlabs.
1751 |
1752 |
1827 |
1828 | ## Implementation 2: Ingesting CMAF Track Files Based on fmp4 Tools ## {##implementation2}
1829 |
1830 | Another example of ingesting CMAF track files is provided by [=fmp4tools=] as
1831 | described in [=LiveCMAF=]. In this case, stored track files are used. The tool
1832 | can patch the timestamp of the input tracks to a real time and upload the
1833 | segments in real time. The tool can upload timed text and timed metadata tracks.
1834 | Also, the tools support conversion and creation of timed metadata tracks, and
1835 | on-the-fly generation of avail cues based on SCTE-35.
1836 |
1837 | Options available when using fmp4 tools:
1838 |
1839 | Usage: fmp4ingest [options]
1840 | [-u url] Publishing Point URL
1841 | [-r, --realtime] Enable realtime mode
1842 | [-l, --loop] Enable looping arg1 + 1 times
[--wc_offset] (boolean) Add a wallclock time offset for converting VoD (0) asset to Live
[--ism_offset] insert a fixed value for the wallclock time offset instead of using a remote time source uri
[--wc_uri] uri for fetching wall clock time default time.akamai.com
[--initialization] SegmentTemplate@initialization sets the relative path for init segments, shall include $RepresentationID$
[--media] SegmentTemplate@media sets the relative path for media segments, shall include $RepresentationID$ and $Time$ or $Number$
[--avail] signal an advertisement slot every arg1 ms with duration of arg2 ms
[--dry_run] Do a dry run and write the output files to disk directly for checking file and box integrity
[--announce] specify the number of seconds in advance to presentation time to send an avail
1851 | [--auth] Basic Auth Password
1852 | [--aname] Basic Auth User Name
1853 | [--sslcert] TLS 1.2 client certificate
1854 | [--sslkey] TLS private Key
1855 | [--sslkeypass] passphrase
1856 | CMAF files to ingest (.cmf[atvm])
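
The `--initialization` and `--media` options take DASH SegmentTemplate strings.
As an illustration of how such a template resolves to a relative path, the
sketch below substitutes the `$RepresentationID$`, `$Number$` and `$Time$`
identifiers, including the optional `%0[width]d` padding that DASH allows on
`$Number$` and `$Time$`. It is a simplified sketch, not the tool's
implementation: the function name is hypothetical, and other DASH identifiers
and the `$$` escape are not handled.

```python
import re

# Matches $RepresentationID$, $Number$, $Time$ and padded forms such as $Number%05d$.
_IDENTIFIER = re.compile(r"\$(RepresentationID|Number|Time)(?:%0(\d+)d)?\$")


def expand_template(template, representation_id, number=None, time=None):
    """Resolve a SegmentTemplate-style string to a relative path (illustrative)."""
    values = {"RepresentationID": representation_id, "Number": number, "Time": time}

    def substitute(match):
        name, width = match.group(1), match.group(2)
        value = values[name]
        if value is None:
            raise ValueError(f"template uses ${name}$ but no value was given")
        text = str(value)
        # Zero-padding only applies to the numeric identifiers.
        if width and name != "RepresentationID":
            text = text.zfill(int(width))
        return text

    return _IDENTIFIER.sub(substitute, template)


print(expand_template("$RepresentationID$/init.cmfv", "video-750k"))
print(expand_template("$RepresentationID$/$Time$.cmfv", "video-750k", time=96000))
```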
1857 |
1858 |
1859 | Example command line using fmp4 tools:
1860 |
## Example inserting 9600 ms breaks every 57.6 seconds with three track files for audio, video and timed text
## A wallclock time offset is also added
1864 | fmp4ingest -r -u publishing_point_url --wc_offset --avail 57600 9600 tos-096-750k.cmfv tos-096s-128k.cmfa tears-of-steel-nl.cmft
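
To make the `--avail 57600 9600` arguments concrete, the sketch below lists the
avail slots that such a setting implies on the media timeline, together with
the time at which each cue would be sent when an announce lead time is
configured. This only illustrates the arithmetic; the function name is
hypothetical, and placing the first avail one full interval after time zero is
an assumption rather than documented tool behavior.

```python
def avail_schedule(interval_ms, duration_ms, total_ms, announce_s=0):
    """Return a list of (send_ms, start_ms, end_ms) tuples, all in milliseconds.

    interval_ms -- distance between the starts of consecutive avails
    duration_ms -- duration of each avail
    total_ms    -- length of the timeline to cover
    announce_s  -- how many seconds before its start an avail is sent
    Assumption: the first avail starts one full interval after media time 0.
    """
    if interval_ms <= 0 or duration_ms <= 0:
        raise ValueError("interval and duration must be positive")
    schedule = []
    for start_ms in range(interval_ms, total_ms, interval_ms):
        send_ms = max(0, start_ms - announce_s * 1000)
        schedule.append((send_ms, start_ms, start_ms + duration_ms))
    return schedule


# Three minutes of content with the settings from the command line above.
for slot in avail_schedule(57600, 9600, 180000, announce_s=4):
    print(slot)
```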
1865 |
1866 |
1867 | Example creating a timed metadata track from a DASH manifest:
1868 |
1869 | ## Example converting an MPD with DASH events to a timed metadata track
1870 | dashEventfmp4 scte-35.mpd scte-35.cmfm
1871 |
1872 |
1873 | # List of Versions and Changes # {#changes}
1874 |
1875 | ## Version 1.0 ## {#version-1-0}
1876 |
1877 | This initial version with Interface-1 and Interface-2 was published in April 2020.
1878 |
1879 | ## Version 1.1 ## {#version-1-1}
1880 |
1881 | Technical updates completed:
1882 |
1883 | 1. Added a section on encoder synchronization (issues #126 and #140)
2. Added restriction for single segment per POST or PUT (issue #112)
1885 | 3. Added text on encoder input loss (issue #113)
1886 | 4. Added guidance on the manifest formatting (issue #111)
1887 | 5. Added reference to MPEG-B part 18 for timed metadata track (issue #31)
1888 | 6. Clarified emsg time is leading (issue #129)
1889 | 7. Added the brand for the last segment (issue #114)
1890 | 8. Deprecated the usage of mfra to close the ingest (issue #124)
1891 | 9. Allowed common encryption of media tracks (issue #117)
1892 | 10. Added text on requesting segments from an alternative server (issue #119)
11. Swapped the priority of the preferred sample entry to hev1/avc3 (issue #115)
1894 | 12. Clarified SCTE-35 carriage (issues #128, #133, #130, #121 and #127)
1895 | 13. Added text for the prft box and made it a requirement (issue #116)
1896 | 14. Added guidelines for constant segment duration for timed metadata (issue #145)
1897 | 15. Added text on conversion of MPEG-2 TS to DASH timeline (issue #131)
1898 | 16. Added an informative section with example implementations (issue #147)
17. Added additional requirements on the formatting of DASH MPD for CMAF ingest (issue #125)
1900 | 18. Added additional requirements on the formatting of HTTP Live Streaming playlist (issue #148)
19. Deprecated streams keyword in favor of manifest + SegmentTemplate signals (issue #125)
1902 |
1903 | Editorial updates completed:
1904 |
1905 | 1. Fixed capitalization errors, cross reference errors and some terms
1906 | 2. Updated the references
1907 | 3. Clarified POST_URL vs. publishing_point_URL
1908 | 4. Cleaned up the informative sections
1909 | 5. Updated the diagrams including the fixes
1910 | 6. Updated/simplified the text for the examples
1911 | 7. Fixed several references (including new/updated section numbers)
1912 | 8. Made text referring to CMAF less verbose
9. Moved some of the common requirements of Interface 2 to the general requirements for Interfaces 1 and 2
1914 |
1915 | ## Version 1.2 ## {#version-1-2}
1916 |
1917 | Technical updates completed:
1918 |
1919 | 1. Added an identifier for the protocols
1920 | 2. Added an interface identifier for both interfaces
1921 |
1922 |
1923 | # Acknowledgements # {#contributors}
1924 |
1925 | We thank the contributors from the following companies for their comments and
1926 | support: Huawei, Akamai, BBC, CenturyLink, Microsoft, Unified Streaming,
1927 | Facebook, Hulu, Comcast, ITV, Qualcomm, Tencent, Samsung, MediaExcel, Harmonic,
1928 | Sony, Arris, Bitmovin, ATEME, EZDRM, DSR, Broadpeak and AWS Elemental.
1929 |
1930 | # URL References # {#url-references}
1931 |
1932 | fmp4git: Unified Streaming fmp4-ingest:
1933 | https://github.com/unifiedstreaming/fmp4-ingest
1934 |
1935 | aomid3: Carriage of ID3 Timed Metadata in the Common Media
1936 | Application Format (CMAF): https://aomediacodec.github.io/id3-emsg
1937 |
1938 | Mozilla-TLS: Mozilla Wiki Security/Server Side TLS:
1939 | https://wiki.mozilla.org/Security/Server_Side_TLS#Intermediate_compatibility_.28recommended.29
1940 |
1941 | MS-SSTR: Smooth Streaming Protocol:
1942 | https://msdn.microsoft.com/en-us/library/ff469518.aspx
1943 |
1944 | fmp4tools: fmp4 Ingest Tools:
1945 | https://github.com/unifiedstreaming/fmp4-ingest/tree/master/ingest-tools
1946 |
1947 | LiveCMAF: Tools for Live CMAF Ingest:
1948 | https://dl.acm.org/doi/abs/10.1145/3339825.3394933
1949 |
1950 | DASH-IFad: Advanced Ad Insertion in DASH (under community
1951 | review): https://dashif.org/docs/CR-Ad-Insertion-r4.pdf
1952 |
1953 | PythonFFmpegIngest: Python Script for Generating Interface-1 with
1954 | FFmpeg:
1955 | https://github.com/unifiedstreaming/live-demo-cmaf/blob/master/ffmpeg/entrypoint.py
1956 |
1957 |
1958 |