├── .gitignore ├── Historic ├── .gitignore ├── DASH-IF-Ingest.zip ├── jan3release.txt ├── liason ├── CR_v0.txt └── 00-CommunityReview.inc.md ├── setup.bat ├── .vs ├── ProjectSettings.json ├── slnx.sqlite ├── Ingest │ └── v16 │ │ └── .suo └── VSWorkspaceState.json ├── Images ├── DASH-IF.png ├── Example-1.png ├── Example-2.png ├── CMAF-Ingest.png ├── CMAF-Track.png ├── DASH-Ingest.png ├── Late-Binding.png ├── Splice-Ingest.png ├── CMAF-Track-Sync.png ├── Multiple-Sources.png ├── Redundant-Sources.png └── Ingest-Flow.wsd ├── setup.sh ├── README.md ├── .github └── workflows │ ├── build-pr.yml │ └── publish.yml ├── Diagrams └── Ingest-Flow.wsd └── DASH-IF-Ingest.bs.md /.gitignore: -------------------------------------------------------------------------------- 1 | dist/ 2 | -------------------------------------------------------------------------------- /Historic/.gitignore: -------------------------------------------------------------------------------- 1 | Output 2 | -------------------------------------------------------------------------------- /setup.bat: -------------------------------------------------------------------------------- 1 | @echo off 2 | docker pull dashif/specs-builder:latest -------------------------------------------------------------------------------- /.vs/ProjectSettings.json: -------------------------------------------------------------------------------- 1 | { 2 | "CurrentProjectSetting": null 3 | } -------------------------------------------------------------------------------- /.vs/slnx.sqlite: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dash-Industry-Forum/Ingest/HEAD/.vs/slnx.sqlite -------------------------------------------------------------------------------- /Images/DASH-IF.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dash-Industry-Forum/Ingest/HEAD/Images/DASH-IF.png 
-------------------------------------------------------------------------------- /.vs/Ingest/v16/.suo: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dash-Industry-Forum/Ingest/HEAD/.vs/Ingest/v16/.suo -------------------------------------------------------------------------------- /Images/Example-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dash-Industry-Forum/Ingest/HEAD/Images/Example-1.png -------------------------------------------------------------------------------- /Images/Example-2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dash-Industry-Forum/Ingest/HEAD/Images/Example-2.png -------------------------------------------------------------------------------- /Images/CMAF-Ingest.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dash-Industry-Forum/Ingest/HEAD/Images/CMAF-Ingest.png -------------------------------------------------------------------------------- /Images/CMAF-Track.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dash-Industry-Forum/Ingest/HEAD/Images/CMAF-Track.png -------------------------------------------------------------------------------- /Images/DASH-Ingest.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dash-Industry-Forum/Ingest/HEAD/Images/DASH-Ingest.png -------------------------------------------------------------------------------- /Images/Late-Binding.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dash-Industry-Forum/Ingest/HEAD/Images/Late-Binding.png -------------------------------------------------------------------------------- 
/Images/Splice-Ingest.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dash-Industry-Forum/Ingest/HEAD/Images/Splice-Ingest.png -------------------------------------------------------------------------------- /Historic/DASH-IF-Ingest.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dash-Industry-Forum/Ingest/HEAD/Historic/DASH-IF-Ingest.zip -------------------------------------------------------------------------------- /Images/CMAF-Track-Sync.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dash-Industry-Forum/Ingest/HEAD/Images/CMAF-Track-Sync.png -------------------------------------------------------------------------------- /Images/Multiple-Sources.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dash-Industry-Forum/Ingest/HEAD/Images/Multiple-Sources.png -------------------------------------------------------------------------------- /Images/Redundant-Sources.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Dash-Industry-Forum/Ingest/HEAD/Images/Redundant-Sources.png -------------------------------------------------------------------------------- /setup.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Pull the latest build image 4 | IMG=dashif/specs-builder:latest 5 | docker pull ${IMG} 6 | -------------------------------------------------------------------------------- /.vs/VSWorkspaceState.json: -------------------------------------------------------------------------------- 1 | { 2 | "ExpandedNodes": [ 3 | "" 4 | ], 5 | "SelectedNode": "\\C:\\Users\\tsto\\Source\\Repos\\Ingest", 6 | "PreviewInSolutionExplorer": false 7 | } 
-------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # DASH-IF Ingest Specification 2 | 3 | See [Document Authoring Kit](https://dashif.org/DocumentAuthoring/) for details on document authoring process. 4 | 5 | # Output Documents 6 | 7 | Most recent output from the **master** branch: 8 | 9 | [![Build status of master branch](https://github.com/Dash-Industry-Forum/Ingest/actions/workflows/publish.yml/badge.svg?branch=master)](https://github.com/Dash-Industry-Forum/Ingest/actions/workflows/publish.yml) 10 | 11 | * [HTML document](http://dashif.org/Ingest/) 12 | * [PDF document](http://dashif.org/Ingest/Ingest.pdf) 13 | 14 | # Document authoring 15 | 16 | See [Document Authoring Kit](https://dashif.org/DASH-IF-IOP/authoring/) for details on document authoring process and the relevant tooling. 17 | -------------------------------------------------------------------------------- /.github/workflows/build-pr.yml: -------------------------------------------------------------------------------- 1 | name: Build Pull Request 2 | 3 | on: 4 | pull_request: 5 | branches: 6 | - master 7 | 8 | jobs: 9 | build: 10 | runs-on: ubuntu-latest 11 | container: 12 | image: ghcr.io/dash-industry-forum/dashif-specs:latest 13 | credentials: 14 | username: ${{ github.actor }} 15 | password: ${{ secrets.github_token }} 16 | 17 | steps: 18 | - uses: actions/checkout@v4 19 | - name: Build 20 | env: 21 | # Reset OPTS to empty to make sure we are not using 22 | # interactive mode in CI 23 | OPTS: 24 | run: make -f /tools/Makefile spec SRC=DASH-IF-Ingest.bs.md NAME=Ingest 25 | 26 | - name: Archive 27 | uses: actions/upload-artifact@v4 28 | with: 29 | name: dist 30 | path: dist/ 31 | -------------------------------------------------------------------------------- /Historic/jan3release.txt: -------------------------------------------------------------------------------- 1 | 
release jan 3 2 | #9 document title updated 3 | #12 track sync: references CMAF, half timescale accuracy 4 | #13 naming of profiles: CMAF ingest, DASH/HLS ingest 5 | #17 removed cenc encryption for profile 1 6 | #21 added note on handling of inband events 7 | #22 added a method to name tracks by adding an extension to POST_URL 8 | #23 same as 17 9 | #26 added extra text 10 | #27 added extra text 11 | #28 text on closing of the ingest added (both media and HTTP/TCP) 12 | #18, #15 and #14 addressed 13 | 14 | f2f 15 | referenced cmaf in the timed text subtitle section 16 | HTTP 1.1. TLS 1.2 added 17 | RFC references checked 18 | added text to support language changes by resending init segment 19 | sequence numbering added 20 | arrival/application time defined 21 | text on timeout on reconnect removed 22 | acknowledgement section 23 | several text rephrasings based on feedback 24 | 25 | TODO: some more work on profile 2 can be added in the next revision 26 | -------------------------------------------------------------------------------- /Diagrams/Ingest-Flow.wsd: -------------------------------------------------------------------------------- 1 | @startuml 2 | ingest_source -> receiving_entity: Authentication request 3 | receiving_entity --> ingest_source: Authentication response 4 | 5 | ingest_source -> receiving_entity: POST CMAF header 6 | receiving_entity --> ingest_source: 200 OK 7 | 8 | ingest_source -> receiving_entity: POST CMAF segment 9 | receiving_entity --> ingest_source: 200 OK 10 | 11 | ingest_source -> receiving_entity: POST CMAF segment 12 | receiving_entity --> ingest_source: 200 OK 13 | 14 | ingest_source -> receiving_entity: POST CMAF segment 15 | receiving_entity --> ingest_source: 200 OK 16 | ingest_source -> receiving_entity: POST CMAF segment 17 | receiving_entity --> ingest_source: 200 OK 18 | ingest_source -> receiving_entity: POST CMAF segment 19 | receiving_entity --> ingest_source: 412 Precondition Failed error 20 | ingest_source ->
receiving_entity: POST CMAF header 21 | receiving_entity --> ingest_source: 200 OK 22 | ingest_source -> receiving_entity: POST CMAF segment 23 | receiving_entity --> ingest_source: 200 OK 24 | ingest_source -> receiving_entity: POST CMAF segment with the lmsg brand 25 | receiving_entity --> ingest_source: 200 OK 26 | @enduml 27 | -------------------------------------------------------------------------------- /Images/Ingest-Flow.wsd: -------------------------------------------------------------------------------- 1 | @startuml 2 | ingest_source -> receiving_entity: Authentication request 3 | receiving_entity --> ingest_source: Authentication response 4 | 5 | ingest_source -> receiving_entity: POST CMAF header 6 | receiving_entity --> ingest_source: 200 OK 7 | 8 | ingest_source -> receiving_entity: POST CMAF segment 9 | receiving_entity --> ingest_source: 200 OK 10 | 11 | ingest_source -> receiving_entity: POST CMAF segment 12 | receiving_entity --> ingest_source: 200 OK 13 | 14 | ingest_source -> receiving_entity: POST CMAF segment 15 | receiving_entity --> ingest_source: 200 OK 16 | ingest_source -> receiving_entity: POST CMAF segment 17 | receiving_entity --> ingest_source: 200 OK 18 | ingest_source -> receiving_entity: POST CMAF segment 19 | receiving_entity --> ingest_source: 412 Precondition Failed error 20 | ingest_source -> receiving_entity: POST CMAF header 21 | receiving_entity --> ingest_source: 200 OK 22 | ingest_source -> receiving_entity: POST CMAF segment 23 | receiving_entity --> ingest_source: 200 OK 24 | ingest_source -> receiving_entity: POST CMAF segment with the lmsg brand 25 | receiving_entity --> ingest_source: 200 OK 26 | @enduml 27 | -------------------------------------------------------------------------------- /.github/workflows/publish.yml: -------------------------------------------------------------------------------- 1 | name: Publish 2 | 3 | on: 4 | push: 5 | branches: 6 | - master 7 | 8 | # Sets permissions of the GITHUB_TOKEN to 
allow deployment to GitHub Pages 9 | permissions: 10 | contents: read 11 | packages: read 12 | pages: write 13 | id-token: write 14 | 15 | jobs: 16 | build: 17 | runs-on: ubuntu-latest 18 | container: 19 | image: ghcr.io/dash-industry-forum/dashif-specs:latest 20 | credentials: 21 | username: ${{ github.actor }} 22 | password: ${{ secrets.github_token }} 23 | 24 | steps: 25 | - uses: actions/checkout@v4 26 | - name: Build 27 | env: 28 | # Reset OPTS to empty to make sure we are not using 29 | # interactive mode in CI 30 | OPTS: 31 | run: make -f /tools/Makefile spec SRC=DASH-IF-Ingest.bs.md NAME=Ingest 32 | 33 | - name: Archive 34 | uses: actions/upload-artifact@v4 35 | with: 36 | name: dist 37 | path: dist/ 38 | 39 | package: 40 | runs-on: ubuntu-latest 41 | needs: build 42 | steps: 43 | - uses: actions/download-artifact@v4 44 | with: 45 | name: dist 46 | path: dist 47 | - uses: actions/upload-pages-artifact@v3 48 | with: 49 | path: dist 50 | 51 | publish: 52 | runs-on: ubuntu-latest 53 | needs: package 54 | steps: 55 | - name: Deploy to GitHub Pages 56 | uses: actions/deploy-pages@v4 57 | -------------------------------------------------------------------------------- /Historic/liason: -------------------------------------------------------------------------------- 1 | DASH IF published a live media ingest protocol specification for community review. 2 | 3 | The aim of this new specification is to improve interoperability between server side entities, 4 | 5 | such as live ABR encoders, packagers and content delivery networks. It aims to ease 6 | 7 | this well known interoperability bottleneck in the industry. 8 | 9 | It envisions ingest/egress using Common Media Application Track Format (CMAF) using HTTP POST based transmission, 10 | 11 | and defines two interfaces for two main use cases. 
The first, CMAF or fMP4 based ingest, can be used for 12 | 13 | example to ingest to active streaming origins and packagers; these entities can then do media 14 | processing of the content. The second interface implements a push based DASH 15 | 16 | protocol for ingest to a passive origin server and/or content delivery network that need not repackage the content. 17 | 18 | Particular attention was paid to support of timed metadata, such as SCTE-35 based splice points or program 19 | 20 | information in CMAF based tracks, and redundancy and failover support. The specification will help live 21 | 22 | encoders to implement a consistent output format, and downstream entities a consistent input format. 23 | 24 | The specification is available for public feedback up to the 31st of July, and available under 25 | https://dashif-documents.azurewebsites.net/Ingest/master/DASH-IF-Ingest.html. 26 | 27 | We think this specification is of particular interest to MPEG considering 28 | 29 | - interoperability between media processing entities defined in MPEG Network based media processing 30 | - Technology under consideration in Common Media Application Track Format (CMAF), in particular the signalling 31 | of broadcast related metadata in fMP4 or CMAF track files; the DASH-IF spec would benefit from further standardization 32 | of timed metadata tracks 33 | - DASH, a push based version of DASH introduces some constraints to the manifest and push based protocol operation; 34 | these modes might be of interest to the DASH group for technologies under consideration. 35 | -------------------------------------------------------------------------------- /Historic/CR_v0.txt: -------------------------------------------------------------------------------- 1 | 98 | -------------------------------------------------------------------------------- /Historic/00-CommunityReview.inc.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | 33 | 34 |
**CHANGE REQUEST**

DASH-IF IOP CR, rev: -, Current version: V0.9

Status: Draft | Internal Review | X Community Review | Agreed

| Field | Value |
| --- | --- |
| Title: | DASH-IF Specification of Live Media Ingest |
| Source: | DASH-IF IOP Ingest TF |
| Supporting Companies: | CenturyLink, Qualcomm Inc., MediaExcel, Harmonic, Bitmovin, Hulu, Microsoft, Unified Streaming, Akamai, Comcast, AWS Elemental, Sony, Tencent, &lt;others&gt; |
| Category: | A |
| Date: | 2019-06-14 |

Use one of the following categories: C (correction), A (addition of feature), B (editorial modification).

| Field | Value |
| --- | --- |
| Reason for change: | Improve interoperability between cloud and server side streaming entities. In particular, between ABR live encoders, origin servers and content delivery networks. |
| Summary of change: | This document specifies protocol interfaces for live ingest/egress of media content. It can be used between live ABR encoders, streaming origins, packagers and content delivery networks. It features support for redundant workflows with failover support and timed metadata. |
| Consequences if not approved: | Inconsistent implementations, Poor interoperability, less rich metadata and ad insertion support |
| Sections affected: | New Independent Document |
| Other comments: | Feedback during community review is welcome. |

**Disclaimer:**

This document is not yet final. It is provided for public review until the deadline mentioned below. If you have comments on the document, please submit comments by one of the following means:

Please add a detailed description of the problem and the comment.

Based on the received comments a final document will be published latest by the expected publication date below, integrated in a new version of DASH-IF IOP, if the following additional criteria are fulfilled:

* All comments from community review are addressed
* The relevant aspects for the Conformance Software are provided
* Verified IOP test vectors are provided

| Field | Value |
| --- | --- |
| Commenting Deadline: | July 31st, 2019 |
| Expected Publication: | August 31st, 2019 |
281 | 282 | 283 | -------------------------------------------------------------------------------- /DASH-IF-Ingest.bs.md: -------------------------------------------------------------------------------- 1 | # Specification: Live Media Ingest # {#ingestspec} 2 | 3 | ## Abstract ## {#abstract} 4 | 5 | Two closely related protocol interfaces are defined: CMAF Ingest (Interface-1) 6 | based on fragmented MP4 and DASH/HLS Ingest (Interface-2) based on DASH and HLS. 7 | Both interfaces use the HTTP POST (or PUT) method to transmit media objects from 8 | an ingest source to a receiving entity. Smart implementations can implement 9 | and support both at the same time. These interfaces support carriage of 10 | audiovisual media, timed metadata and timed text. Examples of workflows using 11 | these interfaces are provided. In addition, guidelines for synchronization of 12 | multiple ingest sources, redundancy and failover are presented. 13 | 14 | The current version of the protocol is 1.2. 15 | 16 | ## Copyright Notice and Disclaimer ## {#copyrights} 17 | 18 | Review these documents carefully as they describe your rights and restrictions 19 | with respect to this document. Code Components extracted from this document must 20 | include Simplified BSD License text as described in Section 4.e of the Trust 21 | Legal Provisions and are provided without warranty as described in the 22 | Simplified BSD License. 23 | 24 | This is a document made available by DASH-IF. The technology embodied in this 25 | document may involve the use of intellectual property rights, including patents 26 | and patent applications owned or controlled by any of the authors or developers 27 | of this document. No patent license, either implied or express, is granted to 28 | you by this document. DASH-IF has made no search or investigation for such 29 | rights and DASH-IF disclaims any duty to do so. 
The rights and obligations which 30 | apply to DASH-IF documents, as such rights and obligations are set forth and 31 | defined in the DASH-IF Bylaws and IPR Policy including, but not limited to, 32 | patent and other intellectual property license rights and obligations. A copy of 33 | the DASH-IF Bylaws and IPR Policy can be obtained at http://dashif.org/. 34 | 35 | The material contained herein is provided on an AS IS basis. The authors and 36 | developers of this material and DASH-IF hereby disclaim all other warranties and 37 | conditions, either express, implied or statutory, including, but not limited to, 38 | any (if any) implied warranties, duties or conditions of merchantability, of 39 | fitness for a particular purpose, of accuracy or completeness of responses, of 40 | workmanlike effort, and of lack of negligence. In addition, this document may 41 | include references to documents and/or technologies controlled by third parties. 42 | Those third party documents and technologies may be subject to third party rules 43 | and licensing terms. No intellectual property license, either implied or 44 | express, to any third party material is granted to you by this document or 45 | DASH-IF. DASH-IF makes no warranty whatsoever for such third party material. 46 | 47 | # Introduction # {#introduction} 48 | 49 | The main goal of this specification is to define the interoperability points 50 | between an [=ingest source=] and a [=receiving entity=] that typically reside in 51 | the cloud or network. This specification does not impose any new constraints or 52 | requirements to clients that consume media streams. 53 | 54 | Live media ingest happens between an [=ingest source=] such as a 55 | [=live encoder=] and a [=receiving entity=]. The [=receiving entity=] could be a 56 | media packager, streaming origin or a content delivery network (CDN) or another 57 | cloud media service. 
The 58 | combination of ingest sources and receiving entities is common in practical 59 | video streaming deployments, where media processing functionality is distributed 60 | between the ingest sources and receiving entities. Nevertheless, in such 61 | deployments, interoperability can sometimes be challenging. 62 | This challenge arises because 63 | there are multiple levels of interoperability to consider, and vendors may 64 | have different views of what is expected or preferred and of how various 65 | technical specifications apply. First, the choice of data 66 | transmission protocol and the way connections are established and torn down are 67 | important. Handling premature or unexpected disconnects and recovering from 68 | failovers are also critical. 69 | 70 | A second level of interoperability lies with the media container and coded media 71 | formats. MPEG defined several media container formats such as [[!ISOBMFF]] and 72 | [[!MPEG2TS]], which are widely adopted and well supported. However, these are 73 | general purpose formats, targeting several different application areas. To do 74 | so, they provide many different profiles and options. Interoperability 75 | is often achieved through other application standards such as those for 76 | broadcast, storage or streaming. For interoperable live media ingest, this 77 | document provides guidance on how to use [[!ISOBMFF]] and [[!MPEGCMAF]] for 78 | formatting the media content. 79 | 80 | A third level of interoperability lies in the way metadata is inserted in 81 | streams. Live content often needs such metadata to signal opportunities for ad 82 | insertion, program information or other attributes like timed graphics or 83 | general information relating to the broadcast.
Examples of such metadata formats 84 | include [[!SCTE35]] markers, which are often found in broadcast streams and 85 | other metadata such as ID3 tags [[!ID3v2]] containing information relating to 86 | the media presentation. In fact, many more types of metadata relating to the 87 | live event might be ingested and passed on to an over-the-top (OTT) streaming 88 | workflow. 89 | 90 | Fourth, for live media, handling the timeline of the presentation consistently 91 | is important. This includes sampling of the media, avoiding timeline 92 | discontinuities and synchronizing timestamps attached by different ingest 93 | sources such as audio and video. In addition, media timeline discontinuities 94 | must be avoided as much as possible during normal operation. Further, when using 95 | redundant ingest sources, the ingested streams must be synchronized in a sample 96 | accurate manner. 97 | 98 | Fifth, in practice multiple ingest sources and receiving entities are often 99 | used. This requires that multiple ingest sources and receiving entities work 100 | together in a redundant workflow to avoid interruptions when some of the 101 | components fail. Well defined failover behavior is important for 102 | interoperability. 103 | 104 | This document provides a specification for establishing these interoperability 105 | points. The approaches are based on known standardized technologies that have 106 | been tested and deployed in several large-scale streaming deployments. 107 | 108 | To address these interoperability points, two different interfaces and their protocol 109 | specifications have been developed. The first interface (CMAF Ingest) mainly 110 | functions as an ingest format to a packager or active media processor, while the 111 | second interface (DASH/HLS Ingest) works mainly to ingest media presentations to 112 | an origin server, cloud storage or CDN. Smart implementations can implement 113 | both interfaces at once. 
With CMAF being used increasingly by both DASH and HLS in 114 | practice, this would be a preferred implementation option. 115 | 116 | [[#workflows]] provides more background and motivation for the two interfaces. 117 | It also motivates the support for HTTP/1.1 118 | [[!rfc9112]] and [[!ISOBMFF]] in this specification. 119 | 120 | The document is structured as follows: Section 3 presents the conventions and 121 | terminology used throughout this document. Section 4 presents the use cases and 122 | workflows related to media ingest and the two interfaces. Section 5 lists the 123 | common requirements for both interfaces. Sections 6 and 7 detail Interface-1 and 124 | Interface-2, respectively. Section 8 provides example workflows and Section 9 125 | shows example implementations. 126 | 127 | # Conventions and Terminology # {#conventions} 128 | 129 | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", 130 | "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be 131 | interpreted as described in BCP 14, RFC 2119 [[RFC2119]]. 132 | 133 | The following terminology is used in the rest of this document: 134 | 135 | **ABR**: Adaptive bitrate. 136 | 137 | **CMAF chunk**: [=CMAF media object=] defined in 138 | [[!MPEGCMAF]] clause 7.3.2.3. 139 | 140 | **CMAF fragment**: [=CMAF media object=] defined in 141 | [[!MPEGCMAF]] clause 7.3.2.4. 142 | 143 | **CMAF header**: Defined in [[!MPEGCMAF]] clause 7.3.2.1. 144 | 145 | **CMAF Ingest**: Ingest interface defined in this 146 | specification for push-based [[!MPEGCMAF]]. 147 | 148 | **CMAF media object**: Defined in [[!MPEGCMAF]]: a CMAF chunk, 149 | segment, fragment or track. 150 | 151 | **CMAF presentation**: Logical grouping of CMAF tracks 152 | corresponding to a media presentation as defined in [[!MPEGCMAF]] clause 6.
153 | 154 | **CMAFstream**: Byte stream, sent between the ingest source and 155 | receiving entity, that follows the CMAF track structure defined in [[!MPEGCMAF]]. 156 | Due to error control behavior such as retransmission of 157 | CMAF fragments and headers, a CMAFstream may not fully conform to a CMAF 158 | track file. The receiving entity can filter out retransmitted fragments and 159 | headers and restore a valid CMAF track from the CMAFstream. 160 | 161 | **CMAF track**: [=CMAF media object=] defined in 162 | [[!MPEGCMAF]] clause 7.3.2.2. 163 | 164 | **connection**: A connection set up between two hosts, 165 | typically the media [=ingest source=] and [=receiving entity=]. 166 | 167 | **DASH Ingest**: Ingest interface defined in this 168 | specification for push-based DASH. 169 | 170 | **HLS Ingest**: Ingest interface defined in this specification 171 | for push-based HLS. 172 | 173 | **HTTP POST**: HTTP method for sending data from a source to 174 | a destination. 175 | 176 | **HTTP PUT**: HTTP method for sending data from a source to 177 | a destination. 178 | 179 | **ingest source**: A media source ingesting live media content 180 | to a receiving entity. It is typically a [=live encoder=] but not restricted 181 | to this, e.g., it could be a stored media resource. 182 | 183 | **ingest stream**: The stream of media pushed from the ingest 184 | source to the receiving entity. 185 | 186 | **live stream session**: The entire live stream for the ingest 187 | relating to a broadcast event. 188 | 189 | **live encoder**: Entity performing live encoding of a high 190 | quality ingest stream. This can serve as an [=ingest source=]. 191 | 192 | **manifest objects**: Objects ingested that represent 193 | a streaming manifest, e.g., .mpd in DASH and .m3u8 in HLS. 194 | 195 | **media objects**: Objects ingested that represent the media, 196 | timed text or other non-manifest objects.
Typically, these are CMAF 197 | addressable media objects such as CMAF chunks, segments or tracks. 198 | 199 | **media fragment**: Combination of a 200 | MovieFragmentBox ("moof") and a MediaDataBox ("mdat") in the ISOBMFF structure. 201 | This could be a CMAF fragment or chunk. A media fragment may include 202 | top-level boxes defined in CMAF fragments such as "emsg", "prft" and "styp". 203 | The term is used for backward compatibility with fragmented MP4. 204 | 205 | **objects**: [=Manifest objects=] or [=media objects=]. 206 | 207 | **OTT**: Over-the-top. 208 | 209 | **POST_URL**: Target URL of a POST request in the HTTP 210 | protocol for posting data from a source to a destination (e.g., /ingest1). 211 | The POST_URL is known by both the ingest source and receiving entity. The 212 | POST_URL is set up by the receiving entity. The ingest source may add extended 213 | paths to signal track names, fragment names or segment names. 214 | 215 | **publishing_point_URL**: Entry point used to receive an 216 | [=ingest stream=] (e.g., https://example.com/ingest1). 217 | 218 | **receiving entity**: Entity that receives 219 | and consumes an [=ingest stream=]. 220 | 221 | **RTP**: Real-time Transport Protocol as specified in 222 | [[!RFC3550]]. 223 | 224 | **streaming presentation**: Set of [=objects=] composing a 225 | streaming presentation based on a streaming protocol such as DASH. 226 | 227 | **switching set**: Group of tracks corresponding to a 228 | switching set defined in [[!MPEGCMAF]] or an adaptation set defined in 229 | [[!MPEGDASH]]. 230 | 231 | **switching set ID**: Identifier generated by a live ingest 232 | source to group CMAF tracks in a switching set. The switching set ID is 233 | unique for each switching set in a live stream session. 234 | 235 | **TCP**: Transmission Control Protocol (TCP) as specified in 236 | [[!RFC793]].
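
As an illustration of the media fragment definition above, the following minimal sketch walks the top-level boxes of a byte string and checks for a "moof" box followed by an "mdat" box. It assumes the general ISOBMFF box header layout (a 32-bit big-endian size followed by a four-character type, where size 1 means a 64-bit size follows and size 0 means the box runs to the end of the data). The names `iter_boxes`, `is_media_fragment` and `make_box` are hypothetical helpers for this example and are not defined by this specification.

```python
import struct
from typing import Iterator, Tuple


def iter_boxes(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (type, payload) for each top-level box in data."""
    offset = 0
    while offset < len(data):
        if offset + 8 > len(data):
            raise ValueError(f"truncated box header at offset {offset}")
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:  # a 64-bit size follows the type field
            if offset + 16 > len(data):
                raise ValueError(f"truncated large box header at offset {offset}")
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:  # box extends to the end of the data
            size = len(data) - offset
        if size < header or offset + size > len(data):
            raise ValueError(f"malformed box at offset {offset}")
        yield box_type.decode("latin-1"), data[offset + header:offset + size]
        offset += size


def is_media_fragment(data: bytes) -> bool:
    """True if, ignoring styp/prft/emsg boxes, data is exactly moof then mdat."""
    optional = {"styp", "prft", "emsg"}
    core = [t for t, _ in iter_boxes(data) if t not in optional]
    return core == ["moof", "mdat"]


def make_box(box_type: str, payload: bytes = b"") -> bytes:
    """Build a box with a 32-bit size field (demonstration helper only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("latin-1")) + payload
```

For example, `is_media_fragment(make_box("styp") + make_box("moof") + make_box("mdat", b"abc"))` holds, while a buffer containing only "ftyp" and "moov" boxes (a header rather than a fragment) does not. This check is deliberately loose: it does not validate box contents or CMAF constraints.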
237 | 238 | **baseMediaDecodeTime**: Decode time of the first sample in a movie fragment as 239 | signaled in the "[=tfdt=]" box. 240 | 241 | **elng**: The ExtendedLanguageTag box ("elng") as defined in 242 | [[!ISOBMFF]] overrides the language information. 243 | 244 | **ftyp**: The FileTypeBox ("ftyp") as defined in 245 | [[!ISOBMFF]]. 246 | 247 | **mdat**: The MediaDataBox ("mdat") defined in [[!ISOBMFF]]. 248 | 249 | **mdhd**: The MediaHeaderBox ("mdhd") as defined in 250 | [[!ISOBMFF]] contains information about the media such 251 | as timescale, duration, language using ISO 639-2/T [[!iso-639-2]] codes. 252 | 253 | **mfra (deprecated)**: The MovieFragmentRandomAccessBox 254 | ("mfra") defined in [[!ISOBMFF]] signals random access samples 255 | (these are samples that require no prior or other samples for decoding). 256 | 257 | **moof**: The MovieFragmentBox ("moof") as defined in 258 | [[!ISOBMFF]]. 259 | 260 | **nmhd**: The NullMediaHeaderBox ("nmhd") as defined in 261 | [[!ISOBMFF]] signals a track for which no specific media header is defined. 262 | This is used for metadata tracks. 263 | 264 | **prft**: The ProducerReferenceTime ("prft") as defined in 265 | [[!ISOBMFF]] supplies times corresponding to the production of associated 266 | movie fragments. 267 | 268 | **tfdt**: The TrackFragmentBaseMediaDecodeTimeBox ("tfdt") 269 | defined in [[!ISOBMFF]] signals the decode time of the first sample in the 270 | movie fragment. 271 | 272 | # Media Ingest Workflows and Interfaces (Informative) # {#workflows} 273 | 274 | Two workflows have been identified mapping to two protocol interfaces. The first 275 | workflow uses a [=live encoder=] as the [=ingest source=] and a separate 276 | packager as the [=receiving entity=]. In this case, Interface-1 277 | ([=CMAF Ingest=]) is used to ingest a live encoded stream to the packager, which 278 | can perform packaging, encryption or other active media processing. 
Interface-1 279 | is defined in a way that it will be possible to generate DASH or HLS 280 | presentations based on information in the ingested stream. Figure 1 shows an 281 | example for Interface-1. In many cases a common implementation is possible. 282 | 283 | Figure 1: Example with [=CMAF Ingest=]. 284 |
285 | 286 | The second workflow constitutes ingest to a passive delivery system such as a 287 | cloud storage or a CDN. In this case, Interface-2 ([=DASH Ingest=] or 288 | [=HLS Ingest=]) is used to ingest a stream already formatted to be ready for 289 | delivery to an end client. Figure 2 shows an example for Interface-2. 290 | 291 | Figure 2: Example with [=DASH Ingest=]. 292 |
293 | 294 | A legacy example of a media ingest protocol for the first workflow is the ingest 295 | part of the Microsoft Smooth Streaming protocol [[=MS-SSTR=]]. Interface-1 ([=CMAF Ingest=], 296 | detailed in [[#interface-1]]) improves the Smooth Streaming's ingest protocol 297 | including lessons learned over the last ten years after the initial deployment of 298 | Smooth Streaming in 2009 and several advances on signaling metadata and timed text. 299 | In addition, it includes support for next-generation media codecs such as [[!MPEGHEVC]] 300 | and protocols like DASH [[!MPEGDASH]] by adding explicit support for MPEG-DASH Media presentation description. 301 | 302 | Interface-2 (DASH/HLS Ingest) is included for ingest of media streaming 303 | presentations to a passive receiving entity that provides a pass-through 304 | functionality. In this case, [=manifest objects=] and other client-specific 305 | information also need to be ingested and updated, and segments may be deleted. 306 | 307 | Combining the two interfaces can be considered in many cases. 308 | An example of this is given at the end of the document in [[#examples]]. 309 | 310 | Table 1 highlights some of the key differences and practical considerations of 311 | the interfaces. In Interface-1, the ingest source can be simple since the 312 | [=receiving entity=] can do many of the operations related to the delivery such 313 | as encryption or generating the streaming manifests. In addition, the 314 | distribution of functionalities can make it easier to scale a deployment with 315 | concurrent (redundant) live media sources and receiving entities. Besides these 316 | factors, choosing a workflow for a video streaming platform depends on many 317 | other factors. 318 | 319 | Table 1: Different ingest use cases. 320 | 321 | 322 | 323 | 324 | 325 | 326 | 327 | 328 | 329 | 330 | 331 | 332 | 333 | 334 | 335 | 336 | 337 |
| Interface | Ingest source | Receiving entity |
|---|---|---|
| CMAF Ingest | Limited overview, simpler encoder, multiple sources | Re-encryption, transcoding, stitching, watermarking, packaging |
| DASH/HLS Ingest | Global overview, targets duplicate presentations, limited flexibility, no redundancy | Manifest manipulation, transmission, storage |
338 | 339 | Figure 3: Workflow with redundant ingest sources and receiving entities. 340 |
341 | 342 | Finally, Figure 3 highlights another aspect that was taken into consideration 343 | for large-scale systems with many users. Often content owners would like to run 344 | multiple ingest sources and multiple receiving entities, and make them available to 345 | the clients in a seamless fashion. This approach is already common when serving 346 | web pages, and this architecture also applies to media streaming over HTTP. 347 | Figure 3 shows how one or more ingest sources can send data 348 | to one or more receiving entities. In such a workflow, it is important to handle 349 | synchronization as well as the failure of an ingest source or receiving entity. 350 | Both the system and client behavior are important considerations in systems 351 | that need to run 24/7. Failovers must be handled robustly and without causing 352 | service interruption. This specification details how this failover and 353 | redundancy support can be achieved and provides recommendations for dual 354 | encoder synchronization. 355 | 356 | # Common Requirements for Interface-1 and Interface-2 # {#interface-1-2} 357 | 358 | The media ingest adheres to the following common requirements for both interfaces. 359 | 360 | ## Ingest Source Identification ## {#interface-1-2-user-agent} 361 | 362 | - The [=ingest source=] SHOULD include a User-Agent header (which provides 363 | information about brand name, version number and build number in a readable 364 | format) in all allowed HTTP messages. The receiving entity can log the 365 | received information along with other relevant HTTP header data to 366 | facilitate troubleshooting. The version number of the current version is 367 | DASH-IF-Ingest 1.1, thus the header name may be DASH-IF-Ingest and the value may be 1.1. 368 | 369 | ## General Requirements ## {#interface-1-2-general} 370 | 371 | 1.
The [=ingest source=] SHALL communicate using the [=HTTP POST=] or [=HTTP PUT=] as 372 | defined in the HTTP protocol, version 1.1 [[!rfc9112]]. 373 | 374 | NOTE: This specification does not imply any functional differentiation 375 | between a POST and PUT command. Either may be used to transfer content to 376 | the [=receiving entity=]. Unless indicated otherwise, the use of the term 377 | POST can be interpreted as POST or PUT. 378 | 379 | 2. The [=ingest source=] SHOULD use HTTP over TLS, if TLS is used it SHALL 380 | support at least TLS version 1.2, a higher version may also be supported 381 | additionally [[!rfc9110]]. 382 | 3. The [=ingest source=] SHOULD use a domain name system for resolving 383 | hostnames to IP addresses such as DNS [[!RFC1035]] or any other system 384 | that is in place. If this is not the case, the domain name<->IP address 385 | mapping(s) MUST be known and static. 386 | 4. In the case of 3, [=ingest source=] MUST update the IP to hostname 387 | resolution respecting the TTL (time-to-live) from DNS query responses. 388 | This enables better resilience to IP address changes in large-scale 389 | deployments where the IP address of the media processing entities may 390 | change frequently. 391 | 5. In case HTTP over TLS [[!rfc9110]] is used, at least one of the basic 392 | authentication HTTP AUTH [[!RFC7617]], TLS client certificates or HTTP 393 | Digest authentication [[!RFC7616]] MUST be supported. 394 | 6. Mutual authentication SHALL be supported. TLS client certificates SHALL 395 | chain to a trusted CA or be self-signed. Self-signed certificates MAY be 396 | used, for example, when the ingest source and receiving entity fall under 397 | the same administration. 398 | 7. As compatibility profile for the TLS encryption, the [=ingest source=] 399 | SHOULD support the Mozilla's intermediate compatibility profile 400 | [[=Mozilla-TLS=]]. 401 | 8. 
In case of an authentication error confirmed by an HTTP 403 response, the 402 | ingest source SHALL retry to establish the [=connection=] within a fixed 403 | time period with updated authentication credentials. When that also 404 | results in error, the [=ingest source=] can retry N times, after which the 405 | [=ingest source=] SHOULD stop and log an error. The number of retries N 406 | can be configured in the [=ingest source=]. 407 | 9. The [=ingest source=] SHOULD terminate the [=HTTP POST=] or [=HTTP PUT=] request if data 408 | is not being sent at a rate commensurate with the MP4 fragment duration. 409 | An [=HTTP POST=] or [=HTTP PUT=] command that does not send data can prevent the 410 | [=receiving entity=] from quickly disconnecting from the 411 | [=ingest source=] in the event of a service update. 412 | 10. The HTTP request for sparse data SHOULD be short-lived, terminating as soon 413 | as the data of a fragment is sent. 414 | 11. The HTTP request uses the [=publishing_point_URL=] at the 415 | [=receiving entity=] and SHOULD use an additional relative path when 416 | posting different streams and fragments, for example, to signal the 417 | stream or fragment name. 418 | 12. Both the [=ingest source=] and [=receiving entity=] MUST support IPv4 and 419 | IPv6 transport. 420 | 13. The [=ingest source=] and [=receiving entity=] SHOULD support gzip based 421 | content encoding. 422 | 14. The response from the [=receiving entity=] may, in addition to the response code, 423 | return information in the response body, such as the transfer time or 424 | size of the last HTTP request, especially in case this request used HTTP chunked 425 | transfer mode. No specific response format is defined at this time. 426 | 427 | NOTE: More specific response body formatting may be defined in future revisions; 428 | input from implementors is welcome. 429 | 15.
The ingest source MUST support the configuration and use of Fully 430 | Qualified Domain Names (per RFC 8499) to identify the receiving entity. 431 | 16. The ingest source MUST support the configuration of the path, which it 432 | will POST all the objects to. 433 | 17. The ingest source SHOULD support the configuration of the delivery path 434 | that the receiving entity will use to retrieve the content. When provided, 435 | the ingest source MUST use this path to build absolute URLs in the 436 | manifest files it generates. When absent, use of relative paths is assumed 437 | and the ingest source MUST build the manifest files accordingly. 438 | 18. The ingest source MUST transfer [=media objects=] and 439 | [=manifest objects=] to the receiving entity via individual HTTP/1.1 POST 440 | commands to the configured path. 441 | 19. To avoid delay associated with the TCP handshake, the ingest source SHOULD 442 | use persistent TCP connections. 443 | 20. To avoid head of line blocking, the ingest source SHOULD use multiple 444 | parallel TCP connections to transfer the streaming presentation that it is 445 | generating. For example, the ingest source SHOULD POST each representation 446 | (e.g., CMAF track) in a media presentation over a different TCP 447 | connection. 448 | 21. The ingest source SHOULD use the chunked transfer encoding option for the 449 | HTTP requests when the content length of the request is unknown at the 450 | start of transmission or to support the low-latency use cases. 451 | 452 | ## Failure Behaviors ## {#interface-1-2-failure} 453 | 454 | 1. The [=ingest source=] SHOULD use a timeout in the order of a segment 455 | duration (e.g., 1-6 seconds) for establishing the TCP connection. If an 456 | attempt to establish the connection takes longer than the timeout, the 457 | ingest source aborts the operation and tries again. 458 | 2. 
The [=ingest source=] SHOULD resend the [=objects=] for which a connection 459 | was terminated early or when an HTTP 400 or 403 error response was 460 | received if the connection was down for less than three average segments 461 | durations. For connections that were down longer, the [=ingest source=] 462 | can resume sending [=objects=] at the live edge of the media presentation. 463 | 3. After a TCP error, the [=ingest source=] performs the following: 464 | 465 | 3a. The current connection MUST be closed and a new connection MUST be 466 | created for a new [=HTTP POST=] or [=HTTP PUT=] request. 467 | 468 | 3b. The new HTTP [=POST_URL=] MUST be the same as the initial 469 | [=POST_URL=] for the object to be ingested. 470 | 471 | 4. In case the [=receiving entity=] cannot process the HTTP request due to 472 | authentication or permission problems, or incorrect path, it SHALL return 473 | an HTTP 403 Forbidden error. 474 | 5. The following error conditions apply to the receiving entity: 475 | 476 | 5a. If the [=publishing_point_URL=] receiving the HTTP request is not 477 | available, it SHOULD return an HTTP 404 Not Found error to the 478 | [=ingest source=]. 479 | 480 | 5b. If the receiving entity can process a fragment in the HTTP request 481 | body but finds the media type is not supported, it may return an HTTP 415 482 | Unsupported Media Type error. 483 | 484 | 5c. If the receiving entity cannot process a fragment in the POST request 485 | body due to missing or incorrect initialization fragment, it may return an 486 | HTTP 412 Precondition Failed error. 487 | 488 | 5d. If there is an error at the receiving entity not particularly relating 489 | to the request from the [=ingest source=], it may return an 490 | appropriate HTTP 5xx error. 491 | 492 | 5e. In all other scenarios, the receiving entity MUST return an HTTP 400 493 | Bad Request error. 494 | 495 | 6. 
The [=ingest source=] SHOULD support the handling of HTTP 30x redirect 496 | responses from the receiving entity. 497 | 498 | ## Identifier ## {#interface-1-2-identifier} 499 | 500 | The interfaces described in this document (clauses [[#interface-1]] and [[#interface-2]]) are identified with the following identifier: 501 | 502 | 503 | 504 | 505 | 506 | 507 | 508 | 509 | 510 | 511 | 512 | 513 | 514 | 515 |
| Identifier | Reference | Sections | Comments |
|---|---|---|---|
| http://dashif.org/ingest/v1.2 | http://dashif.org/ingest/v1.2 | Clause [[#interface-1]] and [[#interface-2]] | Conforming to the requirements of this document |
516 | 517 | The above identifier may be used by an entity to signal the support of interfaces defined in clause [[#interface-1]] and [[#interface-2]]. 518 | 519 | 520 | # Interface-1: CMAF Ingest # {#interface-1} 521 | 522 | This section describes the protocol behavior specific to Interface-1. Operation 523 | of this interface MUST also adhere to the common requirements given in [[#interface-1-2]]. 524 | 525 | ## General Considerations (Informative) ## {#interface-1-general} 526 | 527 | The media format is conforming to the track constraints specified 528 | in [[!MPEGCMAF]] clause 7. Note that no CMAF media profile is 529 | needed by this specification unless stated otherwise; only the structural format 530 | based on [[!MPEGCMAF]] clause 7 is used. Supporting CMAF media profiles is optional. 531 | 532 | [=CMAF Ingest=] can also be used for simple transport of 533 | media to an archive, as the combination of CMAF header and CMAF fragments will 534 | result in a valid archived CMAF track file when an ingest is stored on disk by 535 | the receiving entity. 536 | 537 | [=CMAF Ingest=] improves over Smooth Streaming's ingest 538 | protocol [[=MS-SSTR=]] by only using standardized media container formats and 539 | boxes based on [[!ISOBMFF]] and [[!MPEGCMAF]] instead of specific UUID boxes. 540 | 541 | Many new technologies like MPEG HEVC, AV1, HDR have CMAF bindings. Using CMAF 542 | will make it easier to adopt such technologies. 543 | 544 | Some discussions on the early development of the specification have been documented in [[=fmp4git=]]. 545 | 546 | Figure 4: CMAF Ingest with multiple ingest sources. 547 |
548 | 549 | Figures 5-7 detail some of the concepts and structures defined in 550 | [[!MPEGCMAF]]. Figure 5 shows the data format structure of the [=CMAF track=]. 551 | In this format, media samples and media indexes are interleaved. The 552 | MovieFragmentBox "[=moof=]" box as specified in [[!ISOBMFF]] is used to signal 553 | the information to playback and decode properties of the samples stored in the 554 | "[=mdat=]" box. The CMAF header contains the track specific information and is 555 | referred to as a [=CMAF header=] in [[!MPEGCMAF]]. The combination of 556 | "[=moof=]" and "[=mdat=]" can be referred as a [=CMAF fragment=] or 557 | [=CMAF chunk=] depending on the structure content and the number of moof-mdat 558 | pairs in the addressable object. 559 | 560 | Figure 5: CMAF track stream. 561 |
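The interleaved box structure described above can be explored with a small parser. The following is a minimal sketch, not part of this specification: it walks the top-level ISOBMFF boxes in a byte buffer and reports their types, which is enough to see the "moof"/"mdat" pairing. The helper names `iter_boxes` and `make_box` are hypothetical, and the sample data is synthetic rather than real media.

```python
import struct
from typing import Iterator, Tuple


def iter_boxes(data: bytes) -> Iterator[Tuple[str, int, int]]:
    """Yield (box_type, offset, size) for each top-level ISOBMFF box."""
    offset = 0
    while offset + 8 <= len(data):
        # Basic header: 32-bit big-endian size followed by a 4-character type.
        size, raw_type = struct.unpack_from(">I4s", data, offset)
        header_len = 8
        if size == 1:
            # A size of 1 means a 64-bit "largesize" follows the type field.
            if offset + 16 > len(data):
                raise ValueError(f"truncated largesize at offset {offset}")
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header_len = 16
        elif size == 0:
            # A size of 0 means the box extends to the end of the data.
            size = len(data) - offset
        if size < header_len or offset + size > len(data):
            raise ValueError(f"truncated or malformed box at offset {offset}")
        yield raw_type.decode("latin-1"), offset, size
        offset += size


def make_box(box_type: str, payload: bytes = b"") -> bytes:
    """Build a synthetic box (hypothetical helper, for illustration only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("latin-1")) + payload


# Synthetic stand-in for one fragment: "styp", then a "moof"/"mdat" pair.
fragment = make_box("styp") + make_box("moof", b"\x00" * 8) + make_box("mdat", b"abc")
box_types = [box_type for box_type, _, _ in iter_boxes(fragment)]
print(box_types)
```

A receiver could use a walk like this to count moof-mdat pairs in an addressable object; deciding whether the object is a fragment or a chunk needs the further structural rules of [[!MPEGCMAF]], which this sketch does not implement.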
562 | 563 | Figure 6 illustrates the presentation timing model, defined in [[!MPEGCMAF]] 564 | clause 6.6. Different bit-rate tracks and/or media streams are conveyed in 565 | separate CMAF tracks. By having fragment boundaries time aligned for tracks and 566 | applying constraints on tracks, seamless switching can be achieved. By using a 567 | common timeline different streams can be synchronized at the receiver, while 568 | they are in a separate [=CMAF track=], sent over a separate connection, possibly 569 | from a different [=ingest source=]. 570 | 571 | For more information on the synchronization model, we refer the readers to 572 | Section 6 of [[!MPEGCMAF]]. For synchronization of tracks coming from different 573 | encoders, sample-time accuracy is required, i.e., the samples with identical 574 | timestamp contain identical content. 575 | 576 | In Figure 7, another advantage of this synchronization model is illustrated, 577 | which is the concept of late binding. In the case of late binding, streams are 578 | combined on playout/streaming in a presentation (see Section 7.3.6 of 579 | [[!MPEGCMAF]]). 580 | 581 | NOTE: As defined in [[!MPEGCMAF]], different CMAF tracks have the same starting 582 | time sharing an implicit timeline. A stream becoming available from a different 583 | source needs to be synchronized and time-aligned with other streams. 584 | 585 | Figure 6: CMAF track synchronization. 586 |
587 | 588 | Figure 7: CMAF late binding. 589 |
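One way to picture the shared timeline is to derive segment boundaries from a common anchor, so that independent ingest sources arrive at identical timestamps. The following is a minimal sketch under stated assumptions: the anchor is the Unix epoch, segment durations are constant, and the function names `segment_index` and `base_media_decode_time` are hypothetical. It illustrates the arithmetic only and is not a normative algorithm.

```python
from fractions import Fraction


def segment_index(now_seconds: int, segment_duration: Fraction) -> int:
    """Index of the segment containing `now_seconds` (seconds since the anchor)."""
    # Floor division of Fractions returns an int, so no rounding error occurs.
    return Fraction(now_seconds) // segment_duration


def base_media_decode_time(index: int, segment_duration: Fraction, timescale: int) -> int:
    """Decode time of segment `index` in track timescale ticks."""
    ticks = index * segment_duration * timescale
    if ticks.denominator != 1:
        # The timescale cannot express this boundary as an integer.
        raise ValueError("timescale does not yield integer segment boundaries")
    return int(ticks)


# Two sources that start one second apart land on the same 2-second segment.
duration = Fraction(2)
index_a = segment_index(1_700_000_000, duration)
index_b = segment_index(1_700_000_001, duration)
tfdt_a = base_media_decode_time(index_a, duration, 90_000)
tfdt_b = base_media_decode_time(index_b, duration, 90_000)
print(index_a == index_b, tfdt_a == tfdt_b)

# 60 frames at 30000/1001 fps with timescale 30000 gives 60060 ticks per segment.
ntsc_duration = Fraction(60 * 1001, 30_000)
print(base_media_decode_time(10, ntsc_duration, 30_000))
```

Exact rational arithmetic is used here so that non-integer frame rates do not accumulate floating-point drift; the sample-accurate content alignment that the text requires is a property of the encoders and is outside the scope of this sketch.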
590 | 591 | Figure 8 shows the flow diagram of the protocol. It starts with a DNS resolution 592 | (if needed) and an authentication step (using two-factor authentication, TLS 593 | certificates or HTTP Digest Authentication) to establish a secure [=TCP=] 594 | connection. 595 | 596 | In private datacenter deployments where nodes are not reachable from outside, a 597 | non-authenticated connection may also be used. The ingest source then issues an 598 | [=HTTP POST=] or [=HTTP PUT=] request to test that the [=receiving entity=] is listening. 599 | This request includes the [=CMAF header=] or is empty. In case the test is successful, 600 | it is followed by the CMAF header and fragments composing the [=CMAFstream=]. At 601 | the end of the session, the source may send an empty [=mfra (deprecated)=] box 602 | or a segment with the *lmsg* brand. Then, the 603 | [=ingest source=] can follow up by closing the TCP connection using a TCP FIN 604 | packet. 605 | 606 | NOTE: If the HTTP POST is using the chunked transfer encoding option, the 607 | [=ingest source=] sends a zero-length terminating chunk per [[!rfc9112]] after 608 | sending the *lmsg* brand, letting the [=receiving entity=] know that the POST 609 | command has been concluded. 610 | 611 | Figure 8: CMAF Ingest flow. 612 |
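The request order in this flow can be sketched as a small driver. This is an illustration only and rests on assumptions: transport, DNS, TLS and authentication are abstracted behind a caller-supplied `send` callable that returns an HTTP status code; the test request is sent empty (the text also allows it to carry the CMAF header); and the names `ingest_track` and `SendFn` are hypothetical.

```python
from typing import Callable, Iterable

# Hypothetical transport hook: (method, url, body) -> HTTP status code.
SendFn = Callable[[str, str, bytes], int]


def ingest_track(post_url: str, cmaf_header: bytes, segments: Iterable[bytes],
                 send: SendFn, method: str = "POST") -> int:
    """Send one track in the order described above; return the request count."""

    def checked_send(body: bytes) -> None:
        status = send(method, post_url, body)
        if not 200 <= status < 300:
            raise RuntimeError(f"{method} {post_url} failed with HTTP {status}")

    requests_sent = 0
    # Step 1: empty test request to check that the receiving entity is listening.
    checked_send(b"")
    requests_sent += 1
    # Step 2: the CMAF header for the track.
    checked_send(cmaf_header)
    requests_sent += 1
    # Step 3: the fragments or segments, in order, as they become available.
    for segment in segments:
        checked_send(segment)
        requests_sent += 1
    return requests_sent


# Usage with a fake transport that records calls instead of using the network.
calls = []


def fake_send(method: str, url: str, body: bytes) -> int:
    calls.append((method, url, body))
    return 200


total = ingest_track("https://example.com/ingest1", b"HDR", [b"S1", b"S2"], fake_send)
print(total, [body for _, _, body in calls])
```

In a real ingest source, `send` would wrap an HTTP client using persistent connections and, where needed, chunked transfer encoding. End-of-session signaling with the *lmsg* brand and the retry behavior are deliberately left out of this sketch.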
613 | 614 | ## General Protocol, Manifest and Track Format Requirements ## {#interface-1-requirements} 615 | 616 | The ingest source transmits media content to the receiving entity using HTTP 617 | POST or PUT. The receiving entity listens for content at the [=publishing_point_URL=] 618 | that is known by both the ingest source and receiving entity. The [=POST_URL=] 619 | may contain an extended path, added by the ingest source, to identify the stream name, 620 | switching set or fragment. It is assumed that the ingest source 621 | can retrieve these paths and use them. 622 | 623 | In Interface-1, the container format is based on CMAF, conforming to the track 624 | constraints specified in [[!MPEGCMAF]] clause 7. Unless stated otherwise, no 625 | conformance to a specific CMAF media profile is REQUIRED. 626 | 627 | 1. The ingest source SHALL start with an [=HTTP POST=] or [=HTTP PUT=] request with the CMAF 628 | header, or an empty request, to the POST_URL. This can help the ingest 629 | source quickly detect whether the [=publishing_point_URL=] is valid, and 630 | if there are any authentication or other conditions required. 631 | 2. The ingest source MUST initiate a media ingest connection by posting at 632 | least one CMAF header after step 1 for each track. Before doing so, 633 | it SHOULD post a DASH manifest with a file name extension .mpd 634 | to the [=publishing_point_URL=] with no additional relative path 635 | other than the manifest filename, following clause 16 of this section. 636 | If this is not the case, the grouping of the CMAF tracks 637 | is trivial and the Streams() keyword is used to identify CMAF tracks. 638 | 3. The ingest source SHALL transmit one or more CMAF segments composing the 639 | track to the receiving entity once they become available. In this case, a 640 | single HTTP POST or PUT request message body MUST contain one CMAF segment. 641 | 4.
The ingest source MAY use the chunked transfer encoding option of the HTTP 642 | POST command [[!rfc9112]] when the content length is unknown at the start 643 | of transmission or to support use cases that require low latency. 644 | 5. If the HTTP request terminates or times out with a TCP error, the 645 | ingest source MUST establish a new connection and follow the preceding 646 | requirements. Additionally, the ingest source MAY resend the segment in 647 | which the timeout or TCP error occurred. 648 | 6. The ingest source MUST handle any error responses received from the 649 | receiving entity, as described in general requirements, and by 650 | retransmitting the [=CMAF header=]. 651 | 7. *(deprecated)* In case the [=live stream session=] is over the ingest 652 | source MAY signal the stop by transmitting an empty [=mfra (deprecated)=] 653 | box towards the receiving entity. After that it SHALL send an empty HTTP 654 | chunk and wait for the HTTP response before closing TCP connection. 655 | 8. The ingest source SHOULD use a separate parallel TCP connection for ingest 656 | of each different CMAF track. 657 | 9. The ingest source MAY use a separate relative path in the [=POST_URL=] for 658 | ingesting each different track or track segment by appending it to the 659 | [=POST_URL=]. This makes it easy to detect redundant streams from 660 | different ingest sources. Specific naming convention of the segments and 661 | paths can be derived from the MPEG-DASH manifest, SegmentTemplate@media and 662 | @initialization. If not, the Streams(stream_name) keyword (deprecated) 663 | shall be used to signal the name of the cmaf track representation. 664 | 10. The [=baseMediaDecodeTime=] timestamps in "tfdt" of fragments in the 665 | [=CMAFstream=] SHOULD arrive in increasing order for each of the 666 | fragments in the different tracks/streams that are ingested. 667 | 11. 
The fragment sequence numbers in the [=CMAFstream=] signaled in the 668 | "mfhd" box SHOULD arrive in increasing order for each of the different 669 | tracks/streams that are ingested. Using both [=baseMediaDecodeTime=] and 670 | sequence number based indexing helps the receiving entities identify 671 | discontinuities. In this case sequence numbers SHOULD increase by one. 672 | 12. The average and maximum bitrate of each track SHOULD be signaled in the 673 | "btrt" box in the sample entry of the CMAF header. These can be used to 674 | signal the bitrate later on, such as in the manifest. 675 | 13. In case a track is part of a [=switching set=], all properties in 676 | Sections 6.4 and 7.3.4 of [[!MPEGCMAF]] MUST be satisfied, enabling the 677 | receiver to group the tracks in the respective switching sets. 678 | 14. Ingested tracks MUST conform to CMAF track structure defined in 679 | [[!MPEGCMAF]]. Additional constraints on the CMAF track structure are 680 | defined in later sections for specific media types. 681 | 15. CMAF tracks MAY use SegmentTypeBox to signal brands like chunk, fragment 682 | or segment. Such signaling may also be inserted in a later stage by the 683 | receiving entity. 684 | 16. The MPEG-DASH manifest shall use SegmentTemplate in each AdaptationSet 685 | (or in each contained Representation). 686 | - a. The SegmentTemplate@initialization in the MPEG-DASH manifest 687 | shall contain the single substring $RepresentationID$ and the 688 | SegmentTemplate@media shall contain the single substring $RepresentationID$ and 689 | the substring $Number$ or $Time$ (not both). For best interoperability, a separator 690 | character that is not a digit should be placed between the substituted substrings; 691 | this is especially important in case the $RepresentationID$ substitution 692 | ends with a number character. 693 | - b. SegmentTemplate@media shall be identical for each 694 | SegmentTemplate Element in the MPEG-DASH manifest. 695 | - c.
SegmentTemplate@initialization shall be identical for each 696 | SegmentTemplate Element in the MPEG-DASH manifest. 697 | - d. The BaseURL element shall be absent. 698 | - e. The AvailabilityStartTime SHOULD be set to 1970-01-01T00:00:00Z (Unix epoch) 699 | and the period @start to PT0S (if this is not the case it may be more difficult to 700 | synchronize more than one ingest source). 701 | - f. Each Representation in the MPEG-DASH manifest represents a CMAF track, and 702 | each AdaptationSet in the MPD represents a CMAF SwitchingSet. 703 | - g. In case an ingest source issues an HTTP Request with an updated MPEG-DASH 704 | manifest, identical naming conventions apply. A receiver may ignore such an updated MPD 705 | sent by an ingest source. 706 | - h. The MPEG-DASH manifest shall contain a single Period Element. 707 | 17. The ingest source may send an HTTP Live Streaming manifest, but its structure 708 | and naming shall be derived from or match the MPEG-DASH manifest 709 | described in clause 16 above. In particular: 710 | - a. In a master playlist, the groupings identified represent CMAF switching sets. 711 | For media playlists named X.m3u8, X shall match the name of the corresponding Representation@id. 712 | - b. The segment URI announced in media playlists shall follow a structure that can be derived using 713 | the SegmentTemplate@media from the MPEG-DASH manifest. 714 | - c. The EXT-X-MAP URI attribute in media playlists shall follow a naming structure 715 | that can be derived using the SegmentTemplate@initialization from the MPEG-DASH manifest. 716 | - d. A receiver may ignore EXT-X-DATERANGE tags in the manifest; 717 | timed metadata shall be carried as described in the section on timed metadata 718 | [[#interface-1-timed-metadata]]. 719 | - e. A receiver may ignore updated HTTP Live Streaming manifests. 720 | 721 | 18.
In case the ingest source loses its own input or input is absent, it 722 | SHALL insert filler or replacement content, and output these as valid 723 | CMAF segments. Examples may be black frames, silent audio, or empty timed 724 | text segments. Such segments SHOULD be labelled by using a SegmentTypeBox 725 | ("styp") with the *slat* brand. This allows a receiver to still replace 726 | those segments with valid content segments at a later time. 727 | 19. The last segment in a CMAF track, SHOULD be labelled with a 728 | SegmentTypeBox ("styp") with the *lmsg* brand. This way, the receiver 729 | knows that no more media segments are expected for this track. In case 730 | the track is restarted, a request with a [=CMAF header=] with (identical 731 | properties) must be issued to the same [=POST_URL=]. 732 | 20. CMAF segments may include one or more DASHEventMessageBox'es ("emsg") 733 | containing timed metadata. 734 | 735 | NOTE: According to [[!MPEGDASH]], all DASHEventMessageBox'es ("emsg") 736 | must have a presentation_time later as compared to the segment's earliest 737 | presentation time. This can make re-signaling of continuation events 738 | (events that are still active) troublesome (this is fixed in MPEG-DASH 5th edition). 739 | 740 | NOTE: Including DASHEventMessageBox'es ("emsg") boxes in media segments 741 | may result in a loss of performance for just-in-time (re-)packaging. In this 742 | case, timed metadata [[#interface-1-timed-metadata]] should be 743 | considered. 744 | 745 | 20. CMAF media (audio and video) tracks SHALL include the 746 | ProducerReferenceTimeBox'es ("[=prft=]") in the ingest. In these media 747 | tracks, all segments SHALL include a "[=prft=]" box. The "[=prft=]" box 748 | permits the end client to compute the end-to-end latency or the encoding 749 | plus distribution latency. 750 | 751 | 21. 
In case the input to the ingest source is MPEG-2 TS based, the ingest 752 | source is responsible for converting the presentation timestamps and 753 | program clock reference (PCR) to a timeline suitable for [[!MPEGDASH]] 754 | and [[!ISOBMFF]] with the correct anchor and timescales. The RECOMMENDED 755 | timescales and anchors are provided in next sections for each track type. 756 | For dual-encoder synchronization, it is also RECOMMENDED to use the Unix 757 | epoch or another similar well known time anchor (e.g. 758 | 2:14 a.m., EDT, on August 29, 1997, the time sky-net became self-aware 759 | is sometimes used). 760 | 761 | 22. In case a receiving entity cannot process a request from an ingest source 762 | correctly, it can send an HTTP error code. See [[#interface-1-failover]] or 763 | [[#interface-1-2]] for details. 764 | 765 | ## Requirements for Formatting Media Tracks ## {#interface-1-media-tracks} 766 | 767 | [[!MPEGCMAF]] has the notion of [=CMAF track=], which are composed of 768 | [=CMAF fragment=] and [=CMAF chunk=]s. A fragment can be composed of one or more 769 | chunks. The [=media fragment=] defined in ISOBMFF predates the definition in 770 | CMAF. It is assumed that the ingest source uses [=HTTP POST=] or 771 | [=HTTP PUT=] requests to transmit CMAF 772 | fragment(s) to the receiving entity. The following are additional requirements 773 | imposed to the formatting of CMAF media tracks. 774 | 775 | 1. Media tracks SHALL be formatted using boxes according to Section 7 of 776 | [[!MPEGCMAF]]. Media track SHOULD not use media-level encryption (e.g., 777 | common encryption), as HTTP over TLS (HTTPS) should provide sufficient 778 | transport layer security. However, in case common encryption is used, the 779 | decryption key shall be made available out of band by supported means such 780 | as CPIX defined by DASH-IF. 781 | 2. The [=CMAF fragment=] durations SHOULD be constant; the duration MAY 782 | fluctuate to compensate for non-integer frame rates. 
By choosing an 783 | appropriate timescale (a multiple of the frame rate is recommended), this 784 | issue can be avoided. The last fragment of a track may have a 785 | different duration. 786 | 3. The [=CMAF fragment=] durations SHOULD be between approximately one and 787 | six seconds. 788 | 4. Media tracks SHOULD use a timescale for video streams based on the 789 | framerate and 44.1 kHz or 48 kHz for audio streams or any other 790 | timescale that enables integer increments of the decode times of fragments 791 | signaled in the "tfdt" box based on this scale. If necessary, integer 792 | multiples of these timescales could be used. 793 | 5. The language of the CMAF track SHOULD be signaled in the "[=mdhd=]" box or 794 | "[=elng=]" boxes in the CMAF header. 795 | 6. Media tracks SHOULD contain the "btrt" box specifying the target average 796 | and maximum bitrate of the CMAF fragments in the sample entry container in 797 | the CMAF header. 798 | 7. Media tracks MAY be composed of CMAF chunks [[!MPEGCMAF]] clause 7.3.2.3. 799 | In this case, they SHOULD be signaled using SegmentTypeBox ("styp") to 800 | make it easy for the receiving entity to differentiate them from CMAF 801 | fragments. The brand type of a chunk is *cmfl*. CMAF chunks should only be 802 | signaled if they are not the first chunk in a CMAF fragment. 803 | 8. In video tracks, profiles like avc1 and hvc1 MAY be used that signal the 804 | sequence parameter set in the CMAF header. In this case, these codec 805 | parameters do not change dynamically during the live session in the media 806 | track. 807 | 9. However, video tracks SHOULD use profiles like avc3 or hev1 that signal 808 | the parameter sets (PPS, SPS, VPS) in the media samples. This allows 809 | inband signaling of parameter changes, which is useful because in live content the 810 | codec configuration may change slightly over time. 811 | 10.
    In case the language of a track changes, a new CMAF header with updated
    "[=mdhd=]" and/or "[=elng=]" SHOULD be present. The CMAF header MUST
    otherwise be identical, except for the "elng" tag.
11. Track roles SHOULD be signaled in the ingest by using a "kind" box in
    UserDataBox ("udta"). The "kind" box MUST contain a schemeURI
    urn:mpeg:dash:role:2011 and a value containing a Role as defined in
    [[!MPEGDASH]]. In case this signaling does not occur, the processing
    entity can define the role for the track independently.

## Requirements for Signaling Switching Sets ## {#interface-1-switchingsets}

In live streaming, a [=CMAF presentation=] of streams corresponding to a channel
is ingested by posting to a [=publishing_point_URL=] at the receiving entity.
CMAF has the notion of switching sets [[!MPEGCMAF]] that map to similar
streaming protocol concepts like Adaptation Set in DASH. To signal a switching
set in a CMAF presentation, CMAF media tracks MUST conform to the constraints
defined in [[!MPEGCMAF]] clause 7.3.4.

In addition, optional explicit signaling is defined in this clause. The
following options could be implemented by the live ingest source.

1. A live ingest source MAY generate a [=switching set ID=] that is unique
    for each switching set in a live stream session. Tracks with the same
    [=switching set ID=] belong to the same switching set. The switching set
    ID can be a string or (small) integer number. Characters in
    [=switching set ID=] SHALL be unreserved, i.e., A-Za-z0-9_.-~ in order to
    avoid introducing delimiters.
2. The [=switching set ID=] may be added in a relative path to the
    [=POST_URL=] using the Switching() keyword.
    In this case, a CMAF segment
    is sent from the live ingest source as POST chunk.cmfv
    POST_URL/Switching([=switching set ID=])/Streams(stream_id) (deprecated,
    not commonly supported). This option is only recommended when the
    Streams() keyword is used and the option to signal switching sets in the
    MPD is not used.

3. The live ingest source MAY add a "kind" box in the "udta" box in each
    track to signal the switching set it belongs to. The schemeURI of this
    "kind" box SHALL be urn:dashif:ingest:switchingset_id and the value field
    of the "kind" box SHALL be the [=switching set ID=].
4. The switching sets are grouped as adaptation sets present in the DASH
    manifest in a POST request issued earlier, i.e., before the segments of
    that switching set are transmitted. In this case, the naming of the
    segment URIs follows the naming defined in the DASH manifest based on
    SegmentTemplate elements, and the [=switching set ID=] corresponds
    to the AdaptationSet @id attribute.
5. Switching set grouping may be derived from the HTTP Live Streaming master
    playlist.

Table 2: Switching set signaling options.
<table>
  <tr><th>Signaling option</th><th>Requirement</th></tr>
  <tr><td>Implicit signaling based on switching set constraints [[!MPEGCMAF]] clause 7.3.4.</td><td>Mandatory</td></tr>
  <tr><td>Signaling using [=switching set ID=] in the [=POST_URL=] using Switching() keyword (only when not MPD and Streams() is used)</td><td>Optional</td></tr>
  <tr><td>Signaling using DASH AdaptationSet and defined naming structure based on SegmentTemplate and SegmentTimeline</td><td>Optional</td></tr>
  <tr><td>Signaling using HTTP Live Streaming master playlist</td><td>Optional</td></tr>
  <tr><td>Signaling using [=switching set ID=] in the track using "kind" box with schemeURI urn:dashif:ingest:switchingset_id and value set to [=switching set ID=]</td><td>Optional</td></tr>
</table>
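
The explicit signaling with a switching set ID and the Switching() keyword can be sketched in Python as follows. This is a minimal sketch, not a normative implementation: the function names and the example URL are assumptions made for illustration, and only the character set and the path layout are taken from the text above.

```python
import re

# Unreserved characters permitted in a switching set ID: A-Za-z0-9_.-~
_UNRESERVED = re.compile(r"[A-Za-z0-9_.~-]+")


def is_valid_switching_set_id(switching_set_id: str) -> bool:
    """Return True if the ID is non-empty and uses only unreserved characters."""
    return _UNRESERVED.fullmatch(switching_set_id) is not None


def switching_post_path(post_url: str, switching_set_id: str, stream_id: str) -> str:
    """Build POST_URL/Switching(id)/Streams(stream_id) (hypothetical helper)."""
    if not is_valid_switching_set_id(switching_set_id):
        raise ValueError(f"invalid switching set ID: {switching_set_id!r}")
    base = post_url.rstrip("/")
    return f"{base}/Switching({switching_set_id})/Streams({stream_id})"


# Example with an assumed publishing point URL.
print(switching_post_path("https://example.com/live/channel1", "1", "video-1"))
```

Rejecting reserved characters before building the path keeps delimiters such as "/" or "(" out of the ID, which is the stated reason for the unreserved-character rule.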

## Requirements for Timed Text, Captions and Subtitle Tracks ## {#interface-1-timed-text-captions}

The live media ingest specification follows requirements for ingesting a track
with timed text, captions and/or subtitle streams. The recommendations for
formatting subtitle and timed text tracks are defined in [[!MPEGCMAF]] and
[[!MPEG4-30]].

We provide additional guidelines and best practices for formatting timed text
and subtitle tracks.

1. CMAF tracks carrying WebVTT signaled by the *cwvt* brand or TTML Text
    signaled by the *im1t* brand are preferred. [[!MPEG4-30]] defines the
    track format selected in [[!MPEGCMAF]].
2. Based on [[!ISOBMFF]], the handler type in the "hdlr" box SHALL be set to
    "text" for WebVTT and "subt" for TTML.
3. The "[=ftyp=]" box in the CMAF header for the track containing timed text,
    images, captions and subtitles MAY use signaling using CMAF profiles based
    on [[!MPEGCMAF]]:

4. The BitRateBox ("btrt") SHOULD be used to signal the average and maximum
    bitrate in the sample entry box. This is most relevant for bitmap or XML
    based timed text subtitles that may consume significant bandwidth (e.g.,
    im1i or im1t).
5. In case the language of a track changes, a new CMAF header with updated
    "[=mdhd=]" and/or "[=elng=]" SHOULD be sent from the ingest source to the
    receiving entity.
6. Track roles can be signaled in the ingest by using a "kind" box in the
    "udta" box. The "kind" box MUST contain a schemeURI
    urn:mpeg:dash:role:2011 and a value containing a role as defined in
    [[!MPEGDASH]].

NOTE: [[!MPEGCMAF]] allows multiple "kind" boxes; hence, multiple roles can be
signaled. By default, one should signal the DASH role urn:mpeg:dash:role:2011. A
receiver may derive corresponding configuration for other streaming protocols
such as HLS.
In case this is not desired, additional "kind" boxes with
corresponding schemeURI and values can be used to explicitly signal this
information for other protocol schemes.

An informative mapping of roles defined in DASH to the corresponding
characteristics in HLS can be found below. Additionally, the forced subtitle in
HLS might be derived from a DASH forced subtitle role by a [=receiving entity=].

Table 3: Roles for subtitle and audio tracks and HLS characteristics.
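<table>
  <tr><th>HLS characteristic</th><th>urn:mpeg:dash:role:2011</th></tr>
  <tr><td>transcribes-spoken-dialog</td><td>subtitle</td></tr>
  <tr><td>easy-to-read</td><td>easyreader</td></tr>
  <tr><td>describes-video</td><td>description</td></tr>
  <tr><td>describes-music-and-sound</td><td>caption</td></tr>
</table>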
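
A receiving entity that derives HLS characteristics from DASH roles could use a lookup like the following Python sketch. The dictionary simply transcribes Table 3; the function name and the choice to skip unmapped roles are assumptions made for illustration.

```python
# DASH role value (urn:mpeg:dash:role:2011) -> HLS characteristic, per Table 3.
DASH_ROLE_TO_HLS = {
    "subtitle": "transcribes-spoken-dialog",
    "easyreader": "easy-to-read",
    "description": "describes-video",
    "caption": "describes-music-and-sound",
}


def hls_characteristics(dash_roles):
    """Map DASH role values to HLS characteristics, skipping unmapped roles."""
    return [DASH_ROLE_TO_HLS[role] for role in dash_roles if role in DASH_ROLE_TO_HLS]


# Roles without an entry in Table 3 (here "main") are ignored in this sketch.
print(hls_characteristics(["caption", "main"]))
```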

DASH roles are defined in urn:mpeg:dash:role:2011 [[!MPEGDASH]]. Another example
for explicitly signaling roles could be DVB DASH [[!DVB-DASH]]:

<pre>
kind.schemeURI="urn:tva:metadata:cs:AudioPurposeCS:2007@1"
kind.value="Alternate"
</pre>
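
For illustration, the "kind" boxes discussed above could be serialized as in the following Python sketch. It assumes the KindBox layout of [[!ISOBMFF]], i.e., a FullBox (version 0, flags 0) holding a null-terminated schemeURI followed by a null-terminated value. The helper names are hypothetical and the role values are examples only.

```python
import struct


def full_box(box_type: bytes, payload: bytes, version: int = 0, flags: int = 0) -> bytes:
    """Serialize a FullBox: 32-bit size, 4cc type, version, 24-bit flags, payload."""
    body = struct.pack(">B", version) + flags.to_bytes(3, "big") + payload
    return struct.pack(">I4s", 8 + len(body), box_type) + body


def kind_box(scheme_uri: str, value: str) -> bytes:
    """Serialize a "kind" box with null-terminated UTF-8 schemeURI and value."""
    payload = scheme_uri.encode("utf-8") + b"\x00" + value.encode("utf-8") + b"\x00"
    return full_box(b"kind", payload)


def udta_box(children) -> bytes:
    """Wrap already serialized child boxes in a UserDataBox ("udta")."""
    payload = b"".join(children)
    return struct.pack(">I4s", 8 + len(payload), b"udta") + payload


# A DASH role plus an explicit DVB DASH role in the same "udta" box.
udta = udta_box([
    kind_box("urn:mpeg:dash:role:2011", "alternate"),
    kind_box("urn:tva:metadata:cs:AudioPurposeCS:2007@1", "Alternate"),
])
print(len(udta), udta[4:8])
```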

## Requirements for Timed Metadata Tracks ## {#interface-1-timed-metadata}

This section discusses the specific formatting requirements for [=CMAF Ingest=]
of timed metadata. Examples of timed metadata are opportunities for splice
points and program information signaled by SCTE-35 markers. Such event signaling
is different from regular audio/video information because of its sparse nature.
In this case, the signaling data usually does not occur continuously and the
intervals may be hard to predict. Other examples of timed metadata are ID3 tags
[[!ID3v2]], SCTE-35 markers [[!SCTE35]] and DASHEventMessageBox'es defined in
Section 5.9.8.3 of [[!MPEGDASH]].

Table 4 provides some example URN schemes to be signaled. Table 5 illustrates an
example of a SCTE-35 marker stored in a DASHEventMessageBox that is in turn
stored as a metadata sample in a metadata track. The presented approach enables
ingest of timed metadata from different sources, because data is not interleaved
with the media.

By using a CMAF timed metadata track, the same track and presentation formatting
are applied for metadata as for other tracks ingested, and the metadata is part
of the [=CMAF presentation=].

By embedding the DASHEventMessageBox structure in timed metadata samples, some
of the benefits of its usage in DASH and CMAF are kept. In addition, it enables
signaling of gaps, overlapping events and multiple events starting at the same
time in a single timed metadata track for this scheme. Furthermore, the parsing
and processing of DASHEventMessageBox'es is supported in many players. Support
for this DASHEventMessageBox embedded timed metadata track instantiation is
described in the requirements below.

An example of adding an ID3 tag in a DASHEventMessageBox can be found in
[[=aomid3=]].

Table 4: Example URN schemes for timed metadata tracks.
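<table>
  <tr><th>URI</th><th>Reference</th></tr>
  <tr><td>urn:mpeg:dash:event:2012</td><td>[[!MPEGDASH]]</td></tr>
  <tr><td>urn:dvb:iptv:cpm:2014</td><td>[[!DVB-DASH]]</td></tr>
  <tr><td>urn:scte:scte35:2013:bin</td><td>[[!SCTE214-3]]</td></tr>
  <tr><td>www.nielsen.com:id3:v1</td><td>Nielsen ID3 in DASH [[!ID3v2]]</td></tr>
</table>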

Table 5: Example of a SCTE-35 marker embedded in a DASH EventMessageBox.
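<table>
  <tr><th>Tag</th><th>Value</th></tr>
  <tr><td>scheme_id_uri</td><td>urn:scte:scte35:2013:bin</td></tr>
  <tr><td>value</td><td>value used to signal subscheme</td></tr>
  <tr><td>timescale</td><td>positive number, ticks per second, similar to track timescale</td></tr>
  <tr><td>presentation_time_delta</td><td>non-negative number</td></tr>
  <tr><td>event_duration</td><td>duration of event, "0xFFFFFFFF" if unknown</td></tr>
  <tr><td>id</td><td>unique identifier for message</td></tr>
  <tr><td>message_data</td><td>splice info section including CRC</td></tr>
</table>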
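
The fields in Table 5 can be serialized as in the following Python sketch. It assumes the version 0 DASHEventMessageBox layout of [[!MPEGDASH]] (the one that carries presentation_time_delta): two null-terminated strings followed by four 32-bit fields and the message data. Version 1 orders the fields differently and uses a 64-bit presentation_time. The function name is hypothetical, and the message_data bytes are a placeholder rather than a valid SCTE-35 splice info section.

```python
import struct


def emsg_v0(scheme_id_uri: str, value: str, timescale: int,
            presentation_time_delta: int, event_duration: int,
            event_id: int, message_data: bytes) -> bytes:
    """Serialize a version 0 DASHEventMessageBox ("emsg")."""
    payload = (
        scheme_id_uri.encode("utf-8") + b"\x00"
        + value.encode("utf-8") + b"\x00"
        + struct.pack(">IIII", timescale, presentation_time_delta,
                      event_duration, event_id)
        + message_data
    )
    body = b"\x00" + b"\x00\x00\x00" + payload  # version 0, flags 0
    return struct.pack(">I4s", 8 + len(body), b"emsg") + body


box = emsg_v0(
    scheme_id_uri="urn:scte:scte35:2013:bin",
    value="",
    timescale=90000,
    presentation_time_delta=0,
    event_duration=0xFFFFFFFF,    # unknown duration
    event_id=1,
    message_data=b"\xfc\x30",     # placeholder bytes only
)
print(len(box), box[4:8])
```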

The following are requirements and recommendations that apply to the timed
metadata ingest of information related to events, tags, ad markers, program
information and others:

1. Timed metadata SHALL be conveyed in a CMAF track, where the media handler
    ("hdlr") is "meta" and the track handler box is a NullMediaHeaderBox
    ("[=nmhd=]"), as defined for timed metadata tracks in [[!ISOBMFF]] clause
    12.3.
2. The CMAF timed metadata track applies to the [=CMAF presentation=]
    ingested to a [=publishing_point_URL=] at the receiving entity.
3. To fulfill CMAF track requirements in [[!MPEGCMAF]] clause 7.3, such as
    not having gaps in the media timeline, filler data may be needed. Such
    filler data SHALL be defined by the metadata scheme signaled in
    URIMetaSampleEntry. For example, WebVTT tracks define a VTTEmptyCueBox in
    [[!MPEG4-30]] clause 6.6. This cue is to be carried in samples in which no
    active cue occurs. Other schemes could define empty fillers along
    similar lines, such as the EventMessageEmptyBox ("emeb") in ISO/IEC 23001-18.
4. CMAF track files do not support overlapping, multiple concurrently active
    or zero-duration samples. In case metadata or events are concurrent,
    overlapping or of zero duration, such semantics MUST be defined by the
    scheme signaled in the URIMetaSampleEntry. The timed metadata track MUST
    still conform to [[!MPEGCMAF]] clause 7.3.
5. CMAF timed metadata tracks MAY carry DASH Events as defined in
    [[!MPEGDASH]] clause 5.9.8.3 in the metadata samples. The best way to
    create such a track is based on ISO/IEC 23001-18. Some
    older implementations may use DASHEventMessageBox'es as defined in
    ISO/IEC 23009-1. Using DASHEventMessageBox'es directly in samples may be
    implemented as follows:

    5a. Version 1 SHOULD be used.
    In case version 0 is used, the
    presentation_time_delta refers to the presentation time of the sample
    enclosing the DASHEventMessageBox.

    5b. The URIMetaSampleEntry SHOULD contain the URN
    "urn:mpeg:dash:event:2012" or an equivalent URN to signal the presence of
    DASHEventMessageBox'es.

    5c. The timescale of the DASHEventMessageBox SHALL match the value
    specified in the MediaHeaderBox ("mdhd") of the timed metadata track.

    5d. The sample SHOULD contain all DASHEventMessageBox'es that are active
    during the presentation time of the sample.

    5e. A single metadata sample MAY contain multiple DASHEventMessageBox'es.
    This happens if multiple DASHEventMessageBox'es have the same presentation
    time or if an earlier event is still active in a sample containing a newly
    started and overlapping event.

    5f. The scheme_id_uri in the DASHEventMessageBox can be used to signal the
    scheme of the data carried in the message_data field. This enables
    carriage of multiple metadata schemes in a track.

    5g. For SCTE-35 ingest, the scheme_id_uri in the DASHEventMessageBox MUST
    be "urn:scte:scte35:2013:bin" as defined in [[!SCTE214-3]]. A binary
    SCTE-35 payload is carried in the message_data field of a
    DASHEventMessageBox. If a splice point is signaled, media tracks MUST
    insert an IDR sample at the time corresponding to the event presentation
    time.

    5h. It may be necessary to add filler samples to avoid gaps in the CMAF
    track timeline. This may be done using EventMessageEmptyBox (8 bytes) with
    4cc code of "emeb" defined in ISO/IEC 23001-18.

    5i. If ID3 tags are carried, the DASHEventMessageBox MUST be formatted as
    defined in [[=aomid3=]].

    5j.
    The value and id fields of the DASHEventMessageBox can be used by the
    receiving entity to detect duplicate events.

6. The ingest source SHOULD NOT embed inband top-level DASHEventMessageBox'es
    ("emsg") in the timed metadata track.

7. Timed metadata tracks, similar to other CMAF tracks, SHOULD use a constant
    segment duration. As actual timed metadata durations may vary in practice,
    timed metadata schemes should support re-signaling all active
    timed metadata in each sample. This way, constant duration segments
    (e.g., two-second segments) can still be used and metadata that is still
    active can be repeated in later segments. ISO/IEC 23001-18 has explicit
    support for this feature by repeating the event message instance boxes
    in subsequent samples.
8. A change in the set of active events shall trigger a sample boundary in
    the timed metadata track.

9. In case the timed metadata track is also signaled in the manifest, the
    @codecs string should be set to the 4cc code of the sample entry, e.g.,
    "urim" for URIMetaSampleEntry or "evte" for ISO/IEC 23001-18.
    The contentType field should be set to "meta" and the mimeType field to
    "application/mp4". Additional SupplementalProperty or EssentialProperty
    descriptors may be used to further describe the content of the metadata
    track in the manifest.

## Requirements for Signaling and Conditioning Splice Points ## {#interface-1-splicing}

Splicing is important for use cases like ad insertion or clipping of content.
The requirements for signaling splice points and content conditioning at
respective splice points are as follows.

1. The preferred method for signaling a splice point uses a timed metadata
    track sample with a presentation time corresponding to the splice point.
    The timed metadata track sample carries events with binary SCTE-35
    based on the scheme urn:scte:scte35:2013:bin as defined in
    [[!SCTE214-3]]. The binary SCTE-35 SHALL carry a
    splice info section with a splice_insert command with
    out_of_network_indicator set to 1 and a break_duration matching the actual
    break duration.

2. Information related to splicing, whether SCTE-35 based or by other means,
    and whether in an EventMessageBox, timed metadata track sample or event,
    MUST be available to the receiver at least four seconds before the media
    segment with the intended splice point.

3. The splice time SHALL equal the presentation time of the metadata sample
    or event message, as the SCTE-35 timing is based on MPEG-2 TS and has no
    meaning in CMAF or DASH. The media ingest source is responsible for the
    frame-accurate conversion of this time, as it is for the media segments.

4. In case a separate SCTE-35 command is used with out_of_network_indicator=0,
    the actual duration of the break SHALL match the break duration announced
    in the earlier SCTE-35 splice_insert command with
    out_of_network_indicator=1.

5. In case segmentation descriptors are used and multiple descriptors are
    present, a separate event message with a duration corresponding to each of
    the descriptors SHOULD be used.

The conditioning follows [[=DASH-IFad=]], as shown in Figure 9:

Figure 9: Splice point conditioning

The splice point conditioning options in [[=DASH-IFad=]] are defined as follows:

1. Option 1 (splice conditioned packaging): Both a fragment boundary and a
    SAP 1 or SAP 2 (stream access point) at the splice point.
2. Option 2 (splice conditioned encoding): A SAP 1 or SAP 2 stream access
    point at the frame at the boundary.
3. Option 3 (splice point signaling): No specific content conditioning at the
    splice point.

This specification requires option 1 or 2 to be applied. Option 2 is required
for dual-encoder synchronization to avoid variation of the segment durations.

## Requirements for Failovers and Connection Error Handling ## {#interface-1-failover}

Given the nature of live streaming, good failover support is critical for
ensuring the availability of the service. Typically, media services are designed
to handle various types of failures, including network errors, server errors,
and storage issues. When used in conjunction with proper failover logic from the
ingest source side, highly reliable live streaming setups can be built. In this
section, we discuss requirements for failover scenarios.

When the [=receiving entity=] fails:

- A new instance SHOULD be created listening to the same
    [=publishing_point_URL=] for the ingest stream.

When the [=ingest source=] fails:

1. A new instance SHOULD be instantiated to continue the ingest for the live
    streaming session.
2. The new instance MUST use the same URLs for HTTP requests as the
    failed instance for segments.
3. The new instance's POST request MUST include the same [=CMAF header=]
    as the failed instance.
4.
    The new instance MUST be properly synced with all other running ingest
    sources for the same live presentation to generate synced audio/video
    samples with aligned fragment boundaries in the track. This implies that
    the [=baseMediaDecodeTime=] timestamps in the "tfdt" box match.
5. The new stream MUST be semantically equivalent with the previous stream,
    and interchangeable at the header and media fragment levels.
6. The new instance SHOULD try to minimize data loss. The
    [=baseMediaDecodeTime=] of fragments SHOULD increase from the point where
    the encoder last stopped. The [=baseMediaDecodeTime=] in the "tfdt" box
    SHOULD increase in a continuous manner, but it is permissible to introduce
    a discontinuity, if necessary. A receiving entity can ignore fragments
    that it has already received and processed, so it is better to err on the
    side of resending fragments than to introduce discontinuities in the media
    timeline.
7. In some cases, an alternative source can be used by the receiving entity
    to request the missing segments through additional signaling, which is out
    of the scope of this specification.

## Requirements for Ingest Source Synchronization ## {#interface-1-dualsync}

In the case of more than one redundant ingest source, synchronization between
them can be achieved as follows. A fixed segment duration is chosen, for
example based on the fixed GoP duration (e.g., two seconds), and is used by all
ingest sources and CMAF tracks.
The CMAF segment duration is thus fixed for all CMAF tracks (not only the video
tracks). The CMAF tracks use a fixed anchor T as the timeline origin; this
should be 1-1-1970 (Unix epoch) or another well-known time anchor. The
segment boundaries in this case are K * segment duration (since anchor T) for an
integer K > 0.
Any media source joining or starting can compute the fragment
boundary and produce segments with equivalent segment boundaries corresponding
to approximately the current time by choosing K sufficiently large. For
example, with a two-second segment duration and the Unix epoch as anchor, a
source starting at Unix time 1700000001 s can choose K = 850000001, so that its
first segment starts at 1700000002 s.

It is assumed that media sources generate signals from a synchronized input
source and can use timing information from this source, e.g., MPEG-2 TS
presentation timestamps or SDI signals, to compute such timestamps for each
segment. For example, in the case of MPEG-2 TS, the program clock reference
(PCR) and presentation timestamps can be used. Based on this conversion,
different media sources will produce segments with identical durations,
per-frame timestamps and enclosing frames.
By this conversion to a common timeline based on a common anchor (in this case
the Unix epoch) and fixed segment durations, ingest sources can join and leave
the synchronized operation, enabling both synchronization and redundancy. Each
time a source joins, it can compute a suitable value for K and the CMAF base
media decode times based on the anchor, the fixed segment duration and the
current time.

In this setup, a first ingest source can be seamlessly replaced by a redundant
second ingest source. In case of splicing, it is important that the ingest
source inserts an IDR frame but not a segment or fragment boundary.

## Identifier ## {#interface-1-identifier}

The interface described in this clause is identified with the following identifier:
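<table>
  <tr><th>Identifier</th><th>Reference</th><th>Sections</th><th>Comments</th></tr>
  <tr><td>http://dashif.org/ingest/v1.2/interface-1</td><td>http://dashif.org/ingest/v1.2</td><td>Clause [[#interface-1]]</td><td>Conforming to the requirements of clause [[#interface-1]]</td></tr>
</table>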

The above identifier may be used by an entity to signal the support of the interface defined in clause [[#interface-1]].

# Interface-2: DASH and HLS Ingest # {#interface-2}

Interface-2 defines the protocol specific behavior required to ingest a
[=streaming presentation=] composed of mandatory [=manifest objects=] and
[=media objects=] to receiving entities. In this mode, the ingest source
prepares and delivers to the receiving entity all the [=objects=] intended for
consumption by a client. These form a complete streaming presentation including
all manifest and media objects.

This interface is intended to be used by workflows that do not require active
media processing after encoding. It leverages the fact that many encoders
provide DASH and HLS packaging capabilities and that the resulting packaged
content can easily be transferred via HTTP to standard web servers. However,
neither DASH nor HLS has specified how such a workflow is intended to work,
leaving the industry to self-specify key decisions such as how to secure and
authenticate ingest sources, who is responsible for managing the content life
cycle, the order of operations, failover features, robustness methods, etc. In
most cases, a working solution can be built using a readily available web server
such as Nginx or Varnish and the standard complement of HTTP methods. In many
cases, Interface-2 simply documents what is considered an industry best practice
while attempting to provide guidance in areas less commonly considered.

The requirements below (in addition to the common requirements listed in
[[#interface-1-2]]) encapsulate all the needed functionality to support
Interface-2. In case [[!MPEGCMAF]] media is used, the media track and segment
formatting will be similar to that defined in Interface-1.

## General Requirements ## {#interface-2-requirements}
1. The ingest source MUST be able to create a compliant streaming
    presentation for DASH and/or HLS. The ingest source may create both DASH
    and HLS streaming presentations using common media objects (i.e., CMAF),
    but the ingest source MUST generate format-specific manifest objects.

### HTTP Sessions ### {#interface-2-http-sessions}


1. The ingest source SHOULD remove media objects from the receiving entity
    that are no longer referenced in the corresponding manifest objects via an
    HTTP DELETE command. How long the ingest source waits to remove
    unreferenced content can be configurable. Upon receiving an HTTP DELETE
    command, the receiving entity SHOULD:

    1a. delete the referenced content and return an HTTP 200 OK status code,

    1b. delete the corresponding folder if the last file in the folder is
    deleted and the folder is not a root folder; empty parent folders need not
    be deleted recursively.

### Unique Segment and Manifest Naming ### {#interface-2-naming}

1. The ingest source MUST ensure all [=media objects=] (video segments, audio
    segments, initialization segments and caption segments) have unique paths.
    This uniqueness applies across all ingested content in previous sessions
    as well as the current session. This requirement ensures previously cached
    content (i.e., by a CDN) is not inadvertently served instead of newer
    content of the same name.
2. The ingest source MUST ensure all objects in a [=live stream session=] are
    contained within the configured path. Should the receiving entity receive
    media objects outside of the allowed path, it SHOULD return an HTTP 403
    Forbidden response.
3.
    For each live stream session, the ingest source MUST provide unique paths
    for the [=manifest objects=]. One suggested method of achieving this is to
    introduce a timestamp of the start of the live stream session into the
    manifest path. A session is defined by the explicit start and stop of the
    encoding process.
4. When receiving objects with the same path as an existing object, the
    receiving entity MUST overwrite the existing objects with the newer
    objects of the same path.
5. To support unique naming and consistency, the ingest source SHOULD include
    a number, which is monotonically increasing with each new media object, at
    the end of the media object's name, separated by a non-numeric character.
    This way it is possible to retrieve this numeric suffix via a regular
    expression.

    NOTE: Using DASH SegmentTemplate with @media and @initialization and a
    single period can achieve this.

6. The ingest source MUST identify media objects containing initialization
    fragments by using the .init file extension.
7. The ingest source MUST include a file extension and a MIME type for all
    media objects. Table 6 outlines the formats that manifest and media
    objects are expected to follow based on their file extension. Segments may
    be formatted as MPEG4 (.mp4, .m4v, .m4a), [[!MPEGCMAF]] (.cmfv, .cmfa,
    .cmfm, .cmft) or [[!MPEG2TS]] .ts (HLS only). Manifests may be formatted
    as DASH (.mpd) or HLS (.m3u8).

NOTE: Using MPEG-2 TS breaks consistency with Interface-1, which uses a CMAF
container format structure.

Table 6: List of the permissible combinations of file extensions and MIME types.
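<table>
  <tr><th>File extension</th><th>MIME type</th></tr>
  <tr><td>.m3u8 [[!RFC8216]]</td><td>application/x-mpegURL or application/vnd.apple.mpegURL</td></tr>
  <tr><td>.mpd [[!MPEGDASH]]</td><td>application/dash+xml</td></tr>
  <tr><td>.cmfv [[!MPEGCMAF]]</td><td>video/mp4</td></tr>
  <tr><td>.cmfa [[!MPEGCMAF]]</td><td>audio/mp4</td></tr>
  <tr><td>.cmft [[!MPEGCMAF]]</td><td>application/mp4</td></tr>
  <tr><td>.cmfm [[!MPEGCMAF]]</td><td>application/mp4</td></tr>
  <tr><td>.mp4 [[!ISOBMFF]]</td><td>video/mp4 or application/mp4</td></tr>
  <tr><td>.m4v [[!ISOBMFF]]</td><td>video/mp4</td></tr>
  <tr><td>.m4a [[!ISOBMFF]]</td><td>audio/mp4</td></tr>
  <tr><td>.m4s [[!ISOBMFF]]</td><td>video/iso.segment</td></tr>
  <tr><td>.init</td><td>video/mp4</td></tr>
  <tr><td>.header [[!ISOBMFF]]</td><td>video/mp4</td></tr>
  <tr><td>.key</td><td>application/octet-stream</td></tr>
</table>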
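
A receiving entity could check incoming objects against Table 6 as in the following Python sketch. The dictionary transcribes the table in lower case; the function name is hypothetical, and reading the second .m3u8 entry as application/vnd.apple.mpegurl is an assumption.

```python
import posixpath

# File extension -> permitted MIME types, transcribed from Table 6 (lower case).
PERMITTED_MIME_TYPES = {
    # Assumption: "vnd.apple.mpegURL" is read as application/vnd.apple.mpegurl.
    ".m3u8": {"application/x-mpegurl", "application/vnd.apple.mpegurl"},
    ".mpd": {"application/dash+xml"},
    ".cmfv": {"video/mp4"},
    ".cmfa": {"audio/mp4"},
    ".cmft": {"application/mp4"},
    ".cmfm": {"application/mp4"},
    ".mp4": {"video/mp4", "application/mp4"},
    ".m4v": {"video/mp4"},
    ".m4a": {"audio/mp4"},
    ".m4s": {"video/iso.segment"},
    ".init": {"video/mp4"},
    ".header": {"video/mp4"},
    ".key": {"application/octet-stream"},
}


def is_permitted(path: str, mime_type: str) -> bool:
    """Return True if the path's extension and MIME type are paired in Table 6."""
    extension = posixpath.splitext(path)[1].lower()
    return mime_type.lower() in PERMITTED_MIME_TYPES.get(extension, set())


print(is_permitted("live/video-1.cmfv", "video/mp4"))
print(is_permitted("live/video-1.cmfv", "audio/mp4"))
```

Both sides are compared in lower case because MIME types are case-insensitive.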

### Additional Failure Behaviors ### {#interface-2-failure-behaviors}

The following items define additional behavior of an ingest source when
encountering certain error responses from the receiving entity.

1. When the ingest source receives a TCP connection attempt timeout, abort
    midstream, response timeout, TCP send/receive timeout or an HTTP 5xx error
    code when attempting to POST content to the receiving entity, it MUST:

    1a. For manifest objects: Re-resolve DNS on each retry (per the DNS TTL)
    and retry as defined in [[#interface-1-2]].

    1b. For media objects: Re-resolve DNS on each retry (per the DNS TTL) and
    continue uploading for n seconds, where n is the segment duration. After
    it reaches the media object duration value, the ingest source MUST
    continue with the next media object and update the manifest object with a
    discontinuity marker appropriate for the protocol format. To maintain
    continuity of the timeline, the ingest source SHOULD continue to upload
    the missing media object with a lower priority. The reason for this is to
    maintain an archive without discontinuity in case the stream is played
    back at a later time. Once a media object is successfully uploaded, the
    ingest source SHOULD update the corresponding manifest object to reflect
    the now available media object.

    NOTE: Some clients may not tolerate changes made in the manifest to
    past media objects (e.g., removing a previously present discontinuity).
    Thus, care should be taken when making such changes.

2. Upon receipt of an HTTP 403 or 400 error code, the ingest source MAY be
    configured to not retry sending the fragments (N, as described in
    [[#interface-1-2]], will be 0 in this case).
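
The media object behavior in item 1b and item 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `send` is a hypothetical callable that performs one POST attempt (and is expected to re-resolve DNS per the TTL), the retry delay is arbitrary, and treating every non-2xx response below 500 as "do not retry" is a simplification chosen for this example.

```python
import time


def upload_media_object(send, segment_duration, now=time.monotonic,
                        sleep=time.sleep, retry_delay=0.5):
    """Upload one media object, retrying for at most segment_duration seconds.

    send() returns the HTTP status code of one attempt, or raises OSError on a
    connection failure or timeout. Returns True on success and False when the
    caller should continue with the next media object and add a discontinuity
    marker to the manifest.
    """
    deadline = now() + segment_duration
    while True:
        try:
            status = send()
        except OSError:  # includes TimeoutError and ConnectionError
            status = None
        if status is not None and 200 <= status < 300:
            return True
        if status is not None and status < 500:
            # e.g., 400 or 403: this sketch is configured not to retry (N = 0)
            return False
        if now() >= deadline:
            return False
        sleep(retry_delay)


# Usage with a stand-in sender that fails twice with 503 before succeeding.
attempts = []

def flaky_send():
    attempts.append(1)
    return 503 if len(attempts) < 3 else 200

print(upload_media_object(flaky_send, segment_duration=2.0, sleep=lambda s: None))
```

Passing `now` and `sleep` as parameters keeps the retry window testable without real waiting. A lower-priority re-upload of a skipped object, as recommended in item 1b, is left out of this sketch.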

## DASH-Specific Requirements ## {#dash-ingest-requirements}

### File Extensions and MIME Types ### {#dash-ingest-extensions-mime}

1. The ingest source MUST use an .mpd file extension for the manifest.
2. The ingest source MUST use one of the allowed file extensions (see Table
    6) for the media objects.

### Relative Paths ### {#dash-ingest-relative-paths}

- The ingest source SHOULD use relative URLs to address each segment within
    the manifest.

## HLS-Specific Requirements ## {#hls-ingest-requirements}

### File Extensions and MIME Types ### {#hls-ingest-extensions-mime}

1. The ingest source MUST use an .m3u8 file extension for master and variant
    playlists.
2. The ingest source SHOULD use a .key file extension for any keyfile posted
    to the receiving entity for client delivery.
3. The ingest source MUST use a .ts file extension for segments encapsulated
    in an MPEG-2 TS file format.
4. The ingest source MUST use one of the allowed file extensions (see Table
    6) appropriate for the MIME type of the content encapsulated using
    [[!MPEGCMAF]].

### Relative Paths ### {#hls-ingest-relative-paths}

1. The ingest source SHOULD use relative URLs to address each segment within
    the variant playlist.
2. The ingest source SHOULD use relative URLs to address each variant
    playlist within the master playlist.

### Encryption ### {#hls-ingest-encryption}

- The ingest source may choose to encrypt the media segments and publish the
    corresponding keyfile to the receiving entity.
1502 | 1503 | ### Upload Order ### {#hls-ingest-upload_order} 1504 | 1505 | In accordance with [[!RFC8216]] recommendation, ingest sources MUST upload all 1506 | required files for a specific bitrate and segment before proceeding to the next 1507 | segment. For example, for a bitrate that has segments and a playlist that 1508 | updates every segment and key files, ingest sources upload the segment file 1509 | followed by a key file (optional) and the playlist file in serial fashion. The 1510 | encoder MUST only move to the next segment after the previous segment has been 1511 | successfully uploaded or after the segment duration time has elapsed. The order 1512 | of operation should be: 1513 | 1514 | 1. Upload the media segment, 1515 | 2. Upload the key file (if required), 1516 | 3. Upload the playlist. 1517 | 1518 | If there is a problem with any of the steps, retry. Do not proceed to step 3 1519 | until step 1 succeeds or times out as described above. Failed uploads MUST 1520 | result in a stream manifest discontinuity per [[!RFC8216]]. 1521 | 1522 | ### Resiliency ### {#hls-ingest-resiliency} 1523 | 1524 | 1. When ingesting media objects to multiple receiving entities, the ingest 1525 | source MUST send identical media objects with identical names. 1526 | 2. When multiple ingest sources are used, they MUST use consistent media 1527 | object names including when reconnecting due to an application or 1528 | transport error. A common approach is to use (epoch time)/(segment 1529 | duration) as the object name. 1530 | 1531 | ## Identifier ## {#interface-2-identifier} 1532 | 1533 | The interface described in this clause is identified with the following identifier: 1534 | 1535 | 1536 | 1537 | 1538 | 1539 | 1540 | 1541 | 1542 | 1543 | 1544 | 1545 | 1546 | 1547 | 1548 |
| Identifier | Reference | Sections | Comments |
|---|---|---|---|
| http://dashif.org/ingest/v1.2/interface-2 | http://dashif.org/ingest/v1.2 | Clause [[#interface-2]] | Conforming to the requirements of clause [[#interface-2]] |
1549 | 1550 | The above identifier may be used by an entity to signal the support of the interface defined in clause [[#interface-2]]. 1551 | 1552 | # Examples (Informative) # {#examples} 1553 | 1554 | In this section, we provide some example deployments for live streaming. 1555 | 1556 | ## Example 1: CMAF Ingest and a Just-in-Time Packager ## {##example-1} 1557 | 1558 | Figure 10 shows an example where a separate packager and origin server are used. 1559 | 1560 | Figure 10: Example setup with CMAF Ingest and DASH/HLS Ingest. 1561 |

The broadcast source is used as input to the [=live encoder=]. The broadcast
sources can be the SDI signals from a broadcast facility or MPEG-2 TS streams
intercepted from a broadcast that need to be re-used in an [=OTT=] distribution
workflow. The live encoder performs the encoding of the tracks into CMAF tracks
and functions as the ingest source in the CMAF Ingest interface. Multiple live
encoders can be used, providing redundant inputs to the packager using
dual-encoder synchronization. In this case, the segments are of constant
duration, and audio and video segment boundaries are aligned. Segments should
use a timing relative to a shared anchor such as the Unix epoch so as to support
synchronization based on epoch locking (see section on ingest source synchronization).

Following the CMAF Ingest specification in this document allows for failover and
many other features related to the content tracks. The live encoder performs the
following tasks:

- It receives and demuxes the MPEG-2 TS and/or SDI signal.

- It translates the metadata in these streams such as SCTE-35 or SCTE-104 to
    timed metadata tracks.

- It performs a high-quality [=ABR=] encoding in different bitrates with
    aligned switching points.

- It packages all media and timed text tracks as CMAF-compliant tracks and
    signals track roles in "kind" boxes.

- It posts the addressable media objects composing the tracks to the packager
    according to the CMAF Ingest interface defined in [[#interface-1]], and
    optionally a manifest describing the groupings and naming of the inputs.

- The CMAF Ingest allows multiple live encoders and packagers to be deployed,
    benefiting from redundant stream creation and avoiding timeline
    discontinuities due to failures as much as possible.

- In case the receiving entity fails, it reconnects and resends as defined in
    [[#interface-1-2]] and [[#interface-1-failover]].

- In case the ingest source itself fails, it restarts and performs the steps
    as in [[#interface-1-failover]].

The live encoder can be deployed in the cloud, on a bare metal server or even
as dedicated hardware. The live encoder may have some tools or configuration
APIs to author the CMAF tracks and feed instructions/properties from the SDI or
broadcast feed into the CMAF tracks. The packager receives the ingested streams
and performs the following tasks:

- It receives the CMAF tracks, grouping switching sets based on switching set
    constraints, based on the "kind" box or information in the URI or MPD.

- When packaging to DASH, an adaptation set is created for each switching set
    ingested.

- The near-constant fragment duration is used to generate a segment-template-based
    presentation using either $Number$ or $Time$.

- In case a splice point occurs, an IDR frame is inserted in the segment
    without introducing a segment boundary (this is important if more than one
    synchronized encoder is used). The SCTE-35 signal is included as timed
    metadata.

- In case changes happen, the packager can update the manifest and embed
    inband events to trigger manifest updates in the fragments.

- The DASH packager encrypts media segments according to key information
    available. This key information is typically exchanged by protocols defined
    in CPIX. This allows configuration of the content keys, initialization
    vectors and embedding encryption information in the manifest.

- The DASH packager signals subtitles in the manifest based on received CMAF
    streams and roles signaled in the "kind" box.
1633 | 1634 | - In case a fragment is missing and SegmentTimeline is used, the packager 1635 | signals a discontinuity in the MPD. 1636 | 1637 | - In case the low-latency mode is used, the packager may make output 1638 | available before the entire fragment is received using HTTP chunked 1639 | transfer encoding. 1640 | 1641 | - The packager may have a proprietary API similar to the live encoder for 1642 | configuration of aspects like the timeShiftBuffer, DVR window, encryption 1643 | modes enabled, etc. 1644 | 1645 | - The packager uses DASH/HLS Ingest (as specified in [[#interface-2]]) to 1646 | push content to the origin server of a CDN. Alternatively, it could also 1647 | make content directly available as an origin server. In this case, DASH/HLS 1648 | Ingest is avoided and the packager also serves as the origin server. 1649 | 1650 | - The packager converts the timed metadata track and uses it to convert to 1651 | either MPD events or inband events signaled in the manifest. The packager 1652 | creates a segment boundary in case this was not present in the original 1653 | ingest and in case a SCTE-35 splice event was received. 1654 | 1655 | - The packager may also generate HLS or other streaming media presentations 1656 | based on the input. 1657 | 1658 | - In case the packager crashes or fails, it restarts and waits for the ingest 1659 | source to perform the actions detailed in [[#interface-1-failover]]. 1660 | 1661 | The CDN consumes a DASH/HLS Ingest or serves as a proxy for content delivered to 1662 | a client. The CDN, in case it is consuming the POST-based DASH/HLS Ingest, 1663 | performs the following tasks: 1664 | 1665 | - It stores all posted content and makes them available for HTTP GET requests 1666 | from locations corresponding to the paths signaled in the manifest. 1667 | 1668 | - It occasionally deletes content based on instructions from the ingest 1669 | source, which is the packager in this setup. 
1670 | 1671 | - In case the low-latency mode is used, content could be made available 1672 | before the entire pieces of content are available. 1673 | 1674 | - It updates the manifest accordingly when a manifest update is received. 1675 | 1676 | - It serves as a proxy for HTTP GET requests forwarded to the packager. 1677 | 1678 | In case the CDN serves as a proxy, it only forwards requests for content to the 1679 | packager to receive the content and caches the relevant segments for a certain 1680 | duration. 1681 | 1682 | The client receives DASH or HLS streams and is not affected by the specification 1683 | of this work. Nevertheless, it is expected that by using a common streaming 1684 | format, less caching and less overhead in the network will result in a better 1685 | user experience. The client still needs to retrieve license and key information 1686 | by steps defined outside of this specification. Information on how to retrieve 1687 | this information will typically be signaled in the manifest prepared by the 1688 | packager. 1689 | 1690 | ## Example 2: Low-Latency DASH, and Combination of Interface-1 and Interface-2 ## {##example-2} 1691 | 1692 | A second example is given in Figure 11. It constitutes the reference workflow 1693 | for live chunked CMAF developed by DASH-IF and DVB. In this workflow, a 1694 | contribution encoder produces an [=RTP=] mezzanine stream that is transmitted to 1695 | FFmpeg, an example open-source encoder/packager running on a server. 1696 | Alternatively, a file resource may be used. In this workflow, the encoder 1697 | functions as the ingest source. FFmpeg produces the ingest stream with 1698 | different ABR encoded CMAF tracks. In addition, it sends a manifest that 1699 | complies with DASH-IF and DVB low-latency CMAF specification and MPD updates. 1700 | The CMAF tracks also contain respective timing information (i.e., "[=prft=]"). 
1701 | In this case, the ingest source implements Interface-1 and Interface-2 based 1702 | ingest at once. By also resending CMAF headers in case of failures both 1703 | interfaces may be satisfied. In some cases, URI rewrite rules are needed to 1704 | achieve the compatibility between Interface-1 and Interface-2. For example, the 1705 | DASH segment naming structure can be used to derive the explicit Streams() 1706 | keywords. 1707 | 1708 | The origin server is used to pass the streams to the client and may in some 1709 | cases also perform a re-encryption or re-packaging of the streaming presentation 1710 | as needed by the clients. The example client is DASH.js and a maximum end-to-end 1711 | latency of 3500 ms is targeted. 1712 | 1713 | The approaches for authentication and DNS resolution are similar for the two 1714 | interfaces, as are the track formatting in case CMAF is used. This example does 1715 | not use timed metadata. The ingest source may resend the CMAF header or 1716 | initialization segment in case of connection failures to conform to the CMAF 1717 | Ingest specification. 1718 | 1719 | Figure 11: DASH-IF/DVB reference live chunked CMAF workflow. 1720 |

# Implementations (Informative) # {#implementations}

## Implementation 1: FFmpeg Support for Interface-1 and Interface-2 ## {##implementation1}

Ingest of one or more tracks can be achieved in FFmpeg with the MP4 muxer and
its `cmaf` movflag. This example shows the ingest of a single video track
carrying SMPTE HD color bars (the `smptehdbars` test source) with FFmpeg.


#!/bin/bash
# Publishing point URL is ${PROTO}://${SERVER}:${PORT}/${ID}/ with default ID=live
SERVER="${1}"
PORT="${2}"
FF="${3:-ffmpeg}"   # optional path to the ffmpeg binary, defaults to ffmpeg
ID=live
PROTO=http

# smptehdbars is a libavfilter source, so it is read through the lavfi input device.
# The output URL is quoted because it contains parentheses.
"${FF}" -nostats -f lavfi -i smptehdbars=size=1280x720:rate=25 -fflags genpts \
  -write_prft pts -movflags empty_moov+separate_moof+default_base_moof+cmaf \
  -f mp4 "${PROTO}://${SERVER}:${PORT}/${ID}/Streams(video-1280x720-700k.cmfv)"

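
The URL layout used by the script can also be expressed as a short, informative sketch. The helper names `publishing_point_url` and `track_post_url` are hypothetical; the layout itself is taken from the script's comment and its `Streams(...)` output URL. Note that the version history of this document lists the Streams keyword as deprecated in favor of manifest and SegmentTemplate signaling.

```python
def publishing_point_url(server: str, port: int, ident: str = "live",
                         proto: str = "http") -> str:
    """Build ${PROTO}://${SERVER}:${PORT}/${ID}/ as described in the script comment."""
    return f"{proto}://{server}:{port}/{ident}/"


def track_post_url(base: str, track_file: str) -> str:
    """Append the Streams(<track file>) keyword to a publishing point URL.

    The parentheses are left unescaped, mirroring the URL passed to ffmpeg.
    """
    if not base.endswith("/"):
        base += "/"
    return f"{base}Streams({track_file})"


if __name__ == "__main__":
    # example.com and 8080 are placeholder values.
    base = publishing_point_url("example.com", 8080)
    print(track_post_url(base, "video-1280x720-700k.cmfv"))
```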

A more extensive example with epoch locking (dual-encoder synchronization) is
available from [=PythonFFmpegIngest=]. In this case, a patch is used to add the
correct audio timescale and epoch time offset to FFmpeg.

Combined CMAF and DASH/HLS ingest can be achieved using the DASH muxer. An
example script, provided by FFlabs, is shown below.

 
#!/bin/bash
## Example provided by FFlabs of low latency CMAF+DASH+HLS ingest
## Period starts from current time
# Publishing point URL is ${PROTO}://${SERVER}:${PORT}/${ID}/ with default ID=live
SERVER="${1}"
PORT="${2}"
FF="${3}"

# Set your TLS files here
#TLS_KEY="/home/borgmann/dash/certs/ingest_client_thilo.key"
#TLS_CRT="/home/borgmann/dash/certs/ingest_client_thilo.crt"
#TLS_CA="/home/borgmann/dash/certs/ca.crt"
#TS_OUT="/home/borgmann/dash/ts"

# Linux camera input may be used as input
INPUT="/dev/video0"
INPUT_FPS="10"
ID=live
ACODEC=aac
# Hardware encoder alternative (disabled; the software encoder below is used)
#VCODEC=h264_vaapi
VCODEC=libx264
COLOR=bt709
TARGET_LATENCY="3.5"

if [ "$SERVER" == "" -o "$PORT" == "" ]
then
    echo "Usage: $0 <SERVER> <PORT> [<FFMPEG>]"
    exit 1
else
    if [ "$FF" == "" ]
    then
        FF=ffmpeg
    fi

    if [ "${TLS_KEY}" != "" -a "${TLS_CRT}" != "" -a "${TLS_CA}" != "" ]
    then
        PROTO=https
        HTTP_OPTS="-http_opts key_file=${TLS_KEY},cert_file=${TLS_CRT},ca_file=${TLS_CA},tls_verify=1"
    else
        PROTO=http
        HTTP_OPTS=""
    fi

    echo "Ingesting to: ${PROTO}://${SERVER}:${PORT}/${ID}/${ID}.mpd"

fi

# DASH HLS CMAF
${FF} \
-framerate ${INPUT_FPS} \
-i ${INPUT} \
-f lavfi -i sine \
-pix_fmt yuv420p \
-c:v ${VCODEC} -b:v:0 500K -b:v:1 200K -s:v:0 960x400 -s:v:1 720x300 \
-map 0:v:0 -map 0:v:0 \
-c:a ${ACODEC} -b:a 96K -ac 2 \
-map 1:a:0 \
-use_timeline 1 \
-media_seg_name "chunk-stream\$RepresentationID\$-\$Time\$.\$ext\$" \
-mpd_profile dvb_dash \
-utc_timing_url "http://time.akamai.com" \
-format_options "movflags=cmaf" \
-frag_type duration \
-adaptation_sets "id=0,seg_duration=7.68,frag_duration=1.92,streams=0,1 id=1,seg_duration=1,frag_type=none,streams=2" \
-g:v 20 -keyint_min:v 20 -sc_threshold:v 0 -streaming 1 -ldash 1 -tune zerolatency \
-export_side_data prft \
-write_prft 1 \
-target_latency ${TARGET_LATENCY} \
-color_primaries ${COLOR} -color_trc ${COLOR} -colorspace ${COLOR} \
-f dash \
${HTTP_OPTS} \
${PROTO}://${SERVER}:${PORT}/${ID}/${ID}.mpd
1826 |  
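
As an informative aside, the timing parameters in the script above can be sanity-checked with a few lines of arithmetic. The sketch below only restates numbers that appear in the script (`seg_duration=7.68`, `frag_duration=1.92`, `-g:v 20`, `INPUT_FPS=10`). The helper names are hypothetical, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction
from typing import Union

Number = Union[int, str]


def fragments_per_segment(seg_duration: Number, frag_duration: Number) -> int:
    """Return how many fragments fit in one segment, requiring an exact fit."""
    ratio = Fraction(seg_duration) / Fraction(frag_duration)
    if ratio.denominator != 1:
        raise ValueError("segment duration is not a whole multiple of fragment duration")
    return int(ratio)


def gop_seconds(gop_frames: Number, fps: Number) -> Fraction:
    """Return the GOP duration in seconds for a GOP length in frames."""
    return Fraction(gop_frames) / Fraction(fps)


if __name__ == "__main__":
    # Values copied from the script: video adaptation set and encoder settings.
    print(fragments_per_segment("7.68", "1.92"))  # fragments per video segment
    print(gop_seconds(20, 10))                    # GOP duration in seconds
```

With the values from the script, this gives four fragments per video segment and a GOP duration of two seconds at 10 frames per second.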
1827 | 1828 | ## Implementation 2: Ingesting CMAF Track Files Based on fmp4 Tools ## {##implementation2} 1829 | 1830 | Another example of ingesting CMAF track files is provided by [=fmp4tools=] as 1831 | described in [=LiveCMAF=]. In this case, stored track files are used. The tool 1832 | can patch the timestamp of the input tracks to a real time and upload the 1833 | segments in real time. The tool can upload timed text and timed metadata tracks. 1834 | Also, the tools support conversion and creation of timed metadata tracks, and 1835 | on-the-fly generation of avail cues based on SCTE-35. 1836 | 1837 | Options available when using fmp4 tools: 1838 |

Usage: fmp4ingest [options] 
 [-u url]                       Publishing Point URL
 [-r, --realtime]               Enable realtime mode
 [-l, --loop]                   Enable looping arg1 + 1 times
 [--wc_offset]                  (boolean) Add a wallclock time offset for converting VoD (0) asset to Live
 [--ism_offset]                 insert a fixed value for the wallclock time offset instead of using a remote time source uri
 [--wc_uri]                     uri for fetching wall clock time default time.akamai.com
 [--initialization]             SegmentTemplate@initialization sets the relative path for init segments, shall include $RepresentationID$
 [--media]                      SegmentTemplate@media sets the relative path for media segments, shall include $RepresentationID$ and $Time$ or $Number$
 [--avail]                      signal an advertisement slot every arg1 ms with duration of arg2 ms
 [--dry_run]                    Do a dry run and write the output files to disk directly for checking file and box integrity
 [--announce]                   specify the number of seconds in advance to presentation time to send an avail
 [--auth]                       Basic Auth Password
 [--aname]                      Basic Auth User Name
 [--sslcert]                    TLS 1.2 client certificate
 [--sslkey]                     TLS private Key
 [--sslkeypass]                 passphrase
                   CMAF files to ingest (.cmf[atvm])

1858 | 1859 | Example command line using fmp4 tools: 1860 |

## Example with inserting 9600 ms breaks every 57.6 seconds with three track
## files for audio, video and timed text
## Also a wallclock time is added
fmp4ingest -r -u publishing_point_url --wc_offset --avail 57600 9600 tos-096-750k.cmfv tos-096s-128k.cmfa tears-of-steel-nl.cmft
1865 | 
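
To make the `--avail 57600 9600` arguments concrete, the following informative sketch lists the resulting break windows on the media timeline. The function name `avail_slots` is hypothetical. Placing the first slot one full interval after the start is an assumption of this sketch. The tool's actual placement of the first slot may differ.

```python
from typing import Iterator, Tuple


def avail_slots(interval_ms: int, duration_ms: int,
                total_ms: int) -> Iterator[Tuple[int, int]]:
    """Yield (start_ms, end_ms) for every slot that starts before total_ms.

    Assumption: slots start at interval_ms, 2 * interval_ms, and so on.
    """
    if interval_ms <= 0 or duration_ms <= 0:
        raise ValueError("interval and duration must be positive")
    if duration_ms > interval_ms:
        raise ValueError("a slot cannot be longer than the interval between slots")
    start = interval_ms
    while start < total_ms:
        yield (start, start + duration_ms)
        start += interval_ms


if __name__ == "__main__":
    # Two minutes of content with the values from the example command line.
    for start, end in avail_slots(57600, 9600, 120000):
        print(f"avail from {start} ms to {end} ms")
```

With these values, one sixth of the timeline (9600 of every 57600 ms) is signaled as an advertisement slot.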
1866 | 1867 | Example creating a timed metadata track from a DASH manifest: 1868 |

## Example converting an MPD with DASH events to a timed metadata track
dashEventfmp4 scte-35.mpd scte-35.cmfm
1871 | 

# List of Versions and Changes # {#changes}

## Version 1.0 ## {#version-1-0}

This initial version with Interface-1 and Interface-2 was published in April 2020.

## Version 1.1 ## {#version-1-1}

Technical updates completed:

1. Added a section on encoder synchronization (issues #126 and #140)
2. Added restriction for single segment per POST or PUT (issue #112)
3. Added text on encoder input loss (issue #113)
4. Added guidance on the manifest formatting (issue #111)
5. Added reference to MPEG-B part 18 for timed metadata track (issue #31)
6. Clarified emsg time is leading (issue #129)
7. Added the brand for the last segment (issue #114)
8. Deprecated the usage of mfra to close the ingest (issue #124)
9. Allowed common encryption of media tracks (issue #117)
10. Added text on requesting segments from an alternative server (issue #119)
11. Swapped priority preferred sample entry to hev1/avc3 (issue #115)
12. Clarified SCTE-35 carriage (issues #128, #133, #130, #121 and #127)
13. Added text for the prft box and made it a requirement (issue #116)
14. Added guidelines for constant segment duration for timed metadata (issue #145)
15. Added text on conversion of MPEG-2 TS to DASH timeline (issue #131)
16. Added an informative section with example implementations (issue #147)
17. Added additional requirements on the formatting of DASH MPD for CMAF ingest (issue #125)
18. Added additional requirements on the formatting of HTTP Live Streaming playlist (issue #148)
19. Deprecated Streams keyword in favor of manifest + SegmentTemplate signals (issue #125)

Editorial updates completed:

1. Fixed capitalization errors, cross reference errors and some terms
2. Updated the references
3. Clarified POST_URL vs. publishing_point_URL
4. Cleaned up the informative sections
5. Updated the diagrams including the fixes
6. Updated/simplified the text for the examples
7. Fixed several references (including new/updated section numbers)
8. Made text referring to CMAF less verbose
9. Moved some of the common requirements of Interface 2 to general 1-2 requirements

## Version 1.2 ## {#version-1-2}

Technical updates completed:

1. Added an identifier for the protocols
2. Added an interface identifier for both interfaces


# Acknowledgements # {#contributors}

We thank the contributors from the following companies for their comments and
support: Huawei, Akamai, BBC, CenturyLink, Microsoft, Unified Streaming,
Facebook, Hulu, Comcast, ITV, Qualcomm, Tencent, Samsung, MediaExcel, Harmonic,
Sony, Arris, Bitmovin, ATEME, EZDRM, DSR, Broadpeak and AWS Elemental.

# URL References # {#url-references}

fmp4git: Unified Streaming fmp4-ingest:
https://github.com/unifiedstreaming/fmp4-ingest

aomid3: Carriage of ID3 Timed Metadata in the Common Media
Application Format (CMAF): https://aomediacodec.github.io/id3-emsg

Mozilla-TLS: Mozilla Wiki Security/Server Side TLS:
https://wiki.mozilla.org/Security/Server_Side_TLS#Intermediate_compatibility_.28recommended.29

MS-SSTR: Smooth Streaming Protocol:
https://msdn.microsoft.com/en-us/library/ff469518.aspx

fmp4tools: fmp4 Ingest Tools:
https://github.com/unifiedstreaming/fmp4-ingest/tree/master/ingest-tools

LiveCMAF: Tools for Live CMAF Ingest:
https://dl.acm.org/doi/abs/10.1145/3339825.3394933

DASH-IFad: Advanced Ad Insertion in DASH (under community
review): https://dashif.org/docs/CR-Ad-Insertion-r4.pdf

PythonFFmpegIngest: Python Script for Generating Interface-1 with
FFmpeg:
https://github.com/unifiedstreaming/live-demo-cmaf/blob/master/ffmpeg/entrypoint.py

1959 | Revision: 1.0
1960 | 
1961 | Title: DASH-IF Live Media Ingest Protocol
1962 | Status: iso/TS
1963 | Deadline: 2019-07-31
1964 | Status Text: IOP Approved Technical Specification
1965 | Shortname: live ingest
1966 | URL: https://dashif.org/guidelines/
1967 | Issue Tracking: GitHub https://github.com/Dash-Industry-Forum/Ingest/issues
1968 | Repository: https://github.com/Dash-Industry-Forum/Ingest GitHub
1969 | Editor: DASH-IF Ingest TF
1970 | 
1971 | Default Highlight: text
1972 | Line Numbers: off
1973 | Markup Shorthands: markdown yes
1974 | Boilerplate: copyright off, abstract off
1975 | Abstract: None
1976 | Image Auto Size: false
1977 | Date: 2024-02-28
1978 | 
1979 | 1980 | 1981 |
{}
1982 | 1983 | 1984 |

1985 | 
1986 | 
1987 |  
1988 | 
1989 | --------------------------------------------------------------------------------