├── .DS_Store ├── .github └── PULL_REQUEST_TEMPLATE.md ├── .gitignore ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE ├── README.md ├── requirements.txt ├── src ├── DataCollection.srt ├── License.txt ├── Notice.txt ├── THIRD_PARTY_LICENSES.txt ├── __pycache__ │ ├── audioUtils.cpython-37.pyc │ └── srtUtils.cpython-37.pyc ├── audioUtils.py ├── concatenateVideos.py ├── makevideo.bat ├── srt.py ├── srtUtils.py ├── transcribeUtils.py ├── translatevideo.py └── videoUtils.py └── tools ├── srtUtils.py ├── testWebVTT.py └── webvttUtils.py /.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aws-transcribe-captioning-tools/14d8b58186fd71c2145e6cc76719ffc0be3a7087/.DS_Store -------------------------------------------------------------------------------- /.github/PULL_REQUEST_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | *Issue #, if available:* 2 | 3 | *Description of changes:* 4 | 5 | 6 | By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice. 7 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | **/__pycache__/ 2 | *.pyc 3 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | ## Code of Conduct 2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 4 | opensource-codeofconduct@amazon.com with any additional questions or comments. 
5 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 2 | 3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional 4 | documentation, we greatly value feedback and contributions from our community. 5 | 6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary 7 | information to effectively respond to your bug report or contribution. 8 | 9 | 10 | ## Reporting Bugs/Feature Requests 11 | 12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features. 13 | 14 | When filing an issue, please check [existing open](https://github.com/aws-samples/aws-transcribe-captioning-tools/issues), or [recently closed](https://github.com/aws-samples/aws-transcribe-captioning-tools/issues?utf8=%E2%9C%93&q=is%3Aissue%20is%3Aclosed%20), issues to make sure somebody else hasn't already 15 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful: 16 | 17 | * A reproducible test case or series of steps 18 | * The version of our code being used 19 | * Any modifications you've made relevant to the bug 20 | * Anything unusual about your environment or deployment 21 | 22 | 23 | ## Contributing via Pull Requests 24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: 25 | 26 | 1. You are working against the latest source on the *master* branch. 27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted. 29 | 30 | To send us a pull request, please: 31 | 32 | 1. Fork the repository. 33 | 2. 
Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 34 | 3. Ensure local tests pass. 35 | 4. Commit to your fork using clear commit messages. 36 | 5. Send us a pull request, answering any default questions in the pull request interface. 37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. 38 | 39 | GitHub provides additional documentation on [forking a repository](https://help.github.com/articles/fork-a-repo/) and 40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/). 41 | 42 | 43 | ## Finding contributions to work on 44 | Looking at the existing issues is a great way to find something to contribute to. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any ['help wanted'](https://github.com/aws-samples/aws-transcribe-captioning-tools/labels/help%20wanted) issues is a great place to start. 45 | 46 | 47 | ## Code of Conduct 48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 50 | opensource-codeofconduct@amazon.com with any additional questions or comments. 51 | 52 | 53 | ## Security issue notifications 54 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public GitHub issue. 55 | 56 | 57 | ## Licensing 58 | 59 | See the [LICENSE](https://github.com/aws-samples/aws-transcribe-captioning-tools/blob/master/LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution. 
60 | 61 | We may ask you to sign a [Contributor License Agreement (CLA)](http://en.wikipedia.org/wiki/Contributor_License_Agreement) for larger changes. 62 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining a copy of this 4 | software and associated documentation files (the "Software"), to deal in the Software 5 | without restriction, including without limitation the rights to use, copy, modify, 6 | merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 7 | permit persons to whom the Software is furnished to do so. 8 | 9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 10 | INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 11 | PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 12 | HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 13 | OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 14 | SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 15 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # AWS VOD Captioning using AWS Transcribe 2 | 3 | > Add subtitles to video with AWS machine learning services, including AWS Polly, AWS Transcribe, and AWS Translate. 4 | 5 | ## Overview 6 | This repository contains code for VOD subtitle creation, described in the AWS blog post [“Create video subtitles with translation using machine learning”](https://aws.amazon.com/blogs/machine-learning/create-video-subtitles-with-translation-using-machine-learning/). 
7 | 8 | ## Prerequisites 9 | 10 | - Set up an AWS account. ([instructions](https://AWS.amazon.com/free/?sc_channel=PS&sc_campaign=acquisition_US&sc_publisher=google&sc_medium=cloud_computing_b&sc_content=AWS_account_bmm_control_q32016&sc_detail=%2BAWS%20%2Baccount&sc_category=cloud_computing&sc_segment=102882724242&sc_matchtype=b&sc_country=US&s_kwcid=AL!4422!3!102882724242!b!!g!!%2BAWS%20%2Baccount&ef_id=WS3s1AAAAJur-Oj2:20170825145941:s)) 11 | - Clone this repo. 12 | - The other requirements are listed in this [blog post](https://aws.amazon.com/blogs/machine-learning/create-video-subtitles-with-translation-using-machine-learning/). 13 | - Configure AWS CLI and a local credentials file. ([instructions](http://docs.AWS.amazon.com/cli/latest/userguide/cli-chap-welcome.html)) 14 | 15 | 16 | ## Getting Started 17 | 18 | Head on over to this blog post to see the instructions to create captions with AWS Transcribe in the SRT format, create alternate language SRT files with AWS Translate, and use AWS Polly to create alternate language video files: 19 | https://aws.amazon.com/blogs/machine-learning/create-video-subtitles-with-translation-using-machine-learning/ 20 | 21 | 22 | 23 | 24 | ## More AWS Transcribe Tools for Video 25 | 26 | If you just want to create an SRT or a VTT file, the tools directory contains Python code to convert AWS Transcribe JSON to an SRT or a VTT file. These files can be imported and used on web or desktop video players. 27 | 28 | ```shell 29 | python srt.py output_file_from_transcribe.json output.srt 30 | ``` 31 | 32 | 33 | | name | description | 34 | |-------|-------------| 35 | |srt.py | Takes the JSON response from AWS Transcribe and converts to a captions.srt file | 36 | |vtt.py | Takes the JSON response from AWS Transcribe and converts to a captions.vtt file | 37 | 38 | 39 | ## License Summary 40 | 41 | This sample code is made available under a modified MIT license. See the LICENSE file. 
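As a rough illustration of the JSON-to-SRT conversion described under "More AWS Transcribe Tools for Video", the sketch below groups the words of an Amazon Transcribe result into fixed-size cues. It is not the repository's `srt.py`: the function names `format_timestamp` and `transcript_to_srt` and the `words_per_cue` parameter are hypothetical, and the code assumes the usual Transcribe layout in which `results.items` holds `pronunciation` entries with `start_time` and `end_time` plus untimed `punctuation` entries.

```python
def format_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = int(round(float(seconds) * 1000))
    hours, millis = divmod(millis, 3600000)
    minutes, millis = divmod(millis, 60000)
    secs, millis = divmod(millis, 1000)
    return "{:02d}:{:02d}:{:02d},{:03d}".format(hours, minutes, secs, millis)


def transcript_to_srt(transcript, words_per_cue=10):
    """Build SRT text from a parsed Transcribe result (hypothetical helper)."""
    cues = []  # list of (start_time, end_time, text)
    words, start, end = [], None, None

    for item in transcript["results"]["items"]:
        content = item["alternatives"][0]["content"]

        if item["type"] == "punctuation":
            # Punctuation has no timing; attach it to the preceding word.
            if words:
                words[-1] += content
            elif cues:
                cue_start, cue_end, text = cues[-1]
                cues[-1] = (cue_start, cue_end, text + content)
            continue

        if start is None:
            start = item["start_time"]
        end = item["end_time"]
        words.append(content)

        # Close the cue once it holds the requested number of words.
        if len(words) == words_per_cue:
            cues.append((start, end, " ".join(words)))
            words, start, end = [], None, None

    if words:
        cues.append((start, end, " ".join(words)))

    blocks = []
    for index, (cue_start, cue_end, text) in enumerate(cues, start=1):
        blocks.append("{}\n{} --> {}\n{}\n".format(
            index, format_timestamp(cue_start), format_timestamp(cue_end), text))
    return "\n".join(blocks)


# Tiny hand-written sample in the assumed Transcribe layout.
sample = {"results": {"items": [
    {"start_time": "10.24", "end_time": "10.6", "type": "pronunciation",
     "alternatives": [{"content": "Hello"}]},
    {"type": "punctuation", "alternatives": [{"content": ","}]},
    {"start_time": "10.7", "end_time": "11.2", "type": "pronunciation",
     "alternatives": [{"content": "world"}]},
    {"type": "punctuation", "alternatives": [{"content": "."}]},
]}}

print(transcript_to_srt(sample, words_per_cue=1))
```

For a real job result, the dictionary would come from `json.load` on the Transcribe output file, and the returned string would be written to the `.srt` file. Splitting on a fixed word count is only one possible choice; breaking on sentence punctuation or on pauses between words may read better.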
42 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | boto3==1.9.136 2 | moviepy==1.0.3 -------------------------------------------------------------------------------- /src/DataCollection.srt: -------------------------------------------------------------------------------- 1 | 1 2 | 00:00:10,240 --> 00:00:14,529 3 | Hello, Statistic body. I'm yoga. Welcome to 4 | 5 | 2 6 | 00:00:14,539 --> 00:00:21,510 7 | competition. All statistic subject. Okay, in this 8 | 9 | 3 10 | 00:00:21,510 --> 00:00:32,600 11 | section I will tell you about data collection, the 12 | 13 | 4 14 | 00:00:32,600 --> 00:00:39,299 15 | objectives off the session, our definition off data types 16 | 17 | 5 18 | 00:00:39,310 --> 00:00:45,579 19 | , off data primary and secondary data and the last 20 | 21 | 6 22 | 00:00:45,590 --> 00:00:53,719 23 | IHS data collection techniques. Okay, Now I start 24 | 25 | 7 26 | 00:00:53,729 --> 00:01:00,490 27 | from definition off data, as we know from the 28 | 29 | 8 30 | 00:01:00,490 --> 00:01:06,930 31 | previous chapter that statistics is important and closely related to 32 | 33 | 9 34 | 00:01:06,930 --> 00:01:11,269 35 | data. Now, I will explain about the definition 36 | 37 | 10 38 | 00:01:11,280 --> 00:01:17,200 39 | off data itself. In definition one, data are 40 | 41 | 11 42 | 00:01:17,200 --> 00:01:23,200 43 | plain facts usually roll numbers and in the definition to 44 | 45 | 12 46 | 00:01:23,939 --> 00:01:30,140 47 | data are individual pieces off, factual information recorded and 48 | 49 | 13 50 | 00:01:30,140 --> 00:01:37,000 51 | used for the purpose off analysis. So from the 52 | 53 | 14 54 | 00:01:37,010 --> 00:01:40,969 55 | two definitions, we know that data is the part 56 | 57 | 15 58 | 00:01:40,969 --> 00:01:49,319 59 | off information. 
Now, I will explain about types 60 | 61 | 16 62 | 00:01:49,329 --> 00:01:56,659 63 | off data data is divided into two kinds, namely 64 | 65 | 17 66 | 00:01:57,040 --> 00:02:08,120 67 | qualitative and quantitative. Qualitative data itself is divided into 68 | 69 | 18 70 | 00:02:08,120 --> 00:02:15,650 71 | nominal and or denial for quantitative is defected into inter 72 | 73 | 19 74 | 00:02:15,650 --> 00:02:21,680 75 | file and rescue each off. The inter fall and 76 | 77 | 20 78 | 00:02:21,680 --> 00:02:32,370 79 | rescue have discrete data and continuous data. Okay, 80 | 81 | 21 82 | 00:02:34,439 --> 00:02:37,960 83 | Now I will explain more detail about the types off 84 | 85 | 22 86 | 00:02:37,969 --> 00:02:43,939 87 | data. Qualitative. Okay. Qualitative is a data 88 | 89 | 23 90 | 00:02:43,939 --> 00:02:49,469 91 | concerned with descriptions which can be observed but cannot be 92 | 93 | 24 94 | 00:02:49,469 --> 00:02:55,969 95 | computed. And as we know, that qualitative data 96 | 97 | 25 98 | 00:02:57,939 --> 00:03:02,050 99 | is divided into nominal an orginal scale. No, 100 | 101 | 26 102 | 00:03:02,939 --> 00:03:07,840 103 | I will explain about the nominal scale nominal scale called 104 | 105 | 27 106 | 00:03:07,840 --> 00:03:14,449 107 | simply because levels you can check the examples below to 108 | 109 | 28 110 | 00:03:14,449 --> 00:03:27,259 111 | understand what the nominal is now for the orginal scale 112 | 113 | 29 114 | 00:03:28,539 --> 00:03:30,990 115 | , the orginal scale have order off the values. 116 | 117 | 30 118 | 00:03:31,610 --> 00:03:37,979 119 | The order is important and significant, but the differences 120 | 121 | 31 122 | 00:03:37,979 --> 00:03:44,020 123 | between each one is not really known. And you 124 | 125 | 32 126 | 00:03:44,020 --> 00:03:53,840 127 | can see the example below. Okay. 
Now its 128 | 129 | 33 130 | 00:03:53,840 --> 00:04:01,569 131 | quantitative quantitative is the one that focus on numbers and 132 | 133 | 34 134 | 00:04:01,770 --> 00:04:10,159 135 | mathematical calculations and can be calculated and computed and quantitative 136 | 137 | 35 138 | 00:04:11,340 --> 00:04:15,529 139 | . Uh huh. Toe rescue. The first is 140 | 141 | 36 142 | 00:04:15,540 --> 00:04:23,839 143 | interval scale and interval skills are numbering skills in which 144 | 145 | 37 146 | 00:04:23,850 --> 00:04:28,209 147 | we know both the order and the exact differences between 148 | 149 | 38 150 | 00:04:28,230 --> 00:04:33,129 151 | the values and second is ratio skills. Raise your 152 | 153 | 39 154 | 00:04:33,129 --> 00:04:39,709 155 | skills are data measurement skills because they tell us about 156 | 157 | 40 158 | 00:04:39,709 --> 00:04:43,589 159 | the order. They tell us the exact value between 160 | 161 | 41 162 | 00:04:43,589 --> 00:04:47,949 163 | units, and they also have a new absolute zero 164 | 165 | 42 166 | 00:04:48,189 --> 00:04:54,930 167 | , which allows for a wide range off both descriptive 168 | 169 | 43 170 | 00:04:54,939 --> 00:05:00,829 171 | and inferential statistics to be applied now I will explain 172 | 173 | 44 174 | 00:05:00,829 --> 00:05:11,980 175 | about discrete and continuous data for discrete data can only 176 | 177 | 45 178 | 00:05:11,980 --> 00:05:19,629 179 | take certain values. And that's the example, the 180 | 181 | 46 182 | 00:05:19,629 --> 00:05:26,850 183 | number off student and the number that appear after you 184 | 185 | 47 186 | 00:05:26,850 --> 00:05:40,860 187 | rolling dies and continuous data for continuous data can take 188 | 189 | 48 190 | 00:05:41,439 --> 00:05:46,699 191 | any value within a range. And the examples are 192 | 193 | 49 194 | 00:05:46,709 --> 00:05:51,930 195 | the first is a person's hey and then the time 196 | 197 | 50 198 | 00:05:51,939 --> 00:05:58,050 199 | in a race. 
And then a talks waked and 200 | 201 | 51 202 | 00:05:58,240 --> 00:06:05,930 203 | the length off a leaf. No, If there's 204 | 205 | 52 206 | 00:06:06,230 --> 00:06:15,230 207 | a question how we to get the data, the 208 | 209 | 53 210 | 00:06:15,230 --> 00:06:20,750 211 | answer is they're too option to get data. First 212 | 213 | 54 214 | 00:06:21,639 --> 00:06:27,089 215 | is get the data by ourselves. For example, 216 | 217 | 55 218 | 00:06:27,180 --> 00:06:31,160 219 | a researcher conduct some research, and he gathered the 220 | 221 | 56 222 | 00:06:31,160 --> 00:06:35,730 223 | data by himself. We called the data as primary 224 | 225 | 57 226 | 00:06:35,740 --> 00:06:42,790 227 | data. Second is Katie data from another source. 228 | 229 | 58 230 | 00:06:43,540 --> 00:06:46,259 231 | For example, I collect the data from Internet or 232 | 233 | 59 234 | 00:06:46,259 --> 00:06:49,769 235 | I ask my fellow researcher to give his data. 236 | 237 | 60 238 | 00:06:50,230 --> 00:06:54,769 239 | The data that I get is called secondary data, 240 | 241 | 61 242 | 00:06:59,139 --> 00:07:01,680 243 | and there are many techniques to get the data. 244 | 245 | 62 246 | 00:07:01,769 --> 00:07:05,089 247 | But in this session I only mentioned five techniques, 248 | 249 | 63 250 | 00:07:05,439 --> 00:07:14,649 251 | namely, record station senses, survey experiment and observation 252 | 253 | 64 254 | 00:07:17,439 --> 00:07:24,560 255 | . Registration is a method which in places more on 256 | 257 | 65 258 | 00:07:24,560 --> 00:07:35,610 259 | structured recording through various institutions, and census is a 260 | 261 | 66 262 | 00:07:35,610 --> 00:07:42,060 263 | complete way off collecting data where all elements in the 264 | 265 | 67 266 | 00:07:42,060 --> 00:07:46,800 267 | population that are object off the research are investigated or 268 | 269 | 68 270 | 00:07:46,810 --> 00:07:56,889 271 | enumerated one by one. 
And then the next ISS 272 | 273 | 69 274 | 00:07:56,889 --> 00:08:01,300 275 | survey survey is collecting information from a sample group to 276 | 277 | 70 278 | 00:08:01,300 --> 00:08:09,649 279 | learn about the entire population and next ISS experiment. 280 | 281 | 71 282 | 00:08:11,110 --> 00:08:16,930 283 | An experimental study has the researcher purposely attempting to influence 284 | 285 | 72 286 | 00:08:16,939 --> 00:08:20,939 287 | the result. The goal is to do their mind 288 | 289 | 73 290 | 00:08:20,949 --> 00:08:26,550 291 | what effect a particular treatment has on the outcome. 292 | 293 | 74 294 | 00:08:28,439 --> 00:08:33,570 295 | Researchers take measurements or surface off the sample population, 296 | 297 | 75 298 | 00:08:35,240 --> 00:08:46,169 299 | and you can read the example below. No, 300 | 301 | 76 302 | 00:08:46,309 --> 00:08:52,710 303 | the observational in the observational study, the simple population 304 | 305 | 77 306 | 00:08:52,720 --> 00:08:58,529 307 | being studied ISS miserable or surveilled as it ISS. 308 | 309 | 78 310 | 00:08:58,340 --> 00:09:05,879 311 | The researcher observes the subjects and missiles variables but doesn't 312 | 313 | 79 314 | 00:09:05,879 --> 00:09:09,779 315 | influence the population in any way or attempt to intervene 316 | 317 | 80 318 | 00:09:09,970 --> 00:09:13,990 319 | in the study. There is no manipulation by the 320 | 321 | 81 322 | 00:09:13,990 --> 00:09:20,659 323 | researcher, and the last is that those that we 324 | 325 | 82 326 | 00:09:20,659 --> 00:09:26,480 327 | can use to collect the data. You can use 328 | 329 | 83 330 | 00:09:26,480 --> 00:09:33,049 331 | questionnaire in their view checklist or any digital tools. 332 | 333 | 84 334 | 00:09:35,240 --> 00:09:41,110 335 | Okay, I think enough for the session. I 336 | 337 | 85 338 | 00:09:41,110 --> 00:09:45,769 339 | hope you enjoy that. 
And CIA 340 | 341 | -------------------------------------------------------------------------------- /src/License.txt: -------------------------------------------------------------------------------- 1 | MIT No Attribution 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining a copy of this 4 | software and associated documentation files (the "Software"), to deal in the Software 5 | without restriction, including without limitation the rights to use, copy, modify, 6 | merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 7 | permit persons to whom the Software is furnished to do so. 8 | 9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 10 | INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 11 | PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 12 | HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 13 | OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 14 | SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. -------------------------------------------------------------------------------- /src/Notice.txt: -------------------------------------------------------------------------------- 1 | Transcribing and Subtitling Videos Using Amazon Services 2 | Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. 
3 | -------------------------------------------------------------------------------- /src/THIRD_PARTY_LICENSES.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aws-transcribe-captioning-tools/14d8b58186fd71c2145e6cc76719ffc0be3a7087/src/THIRD_PARTY_LICENSES.txt -------------------------------------------------------------------------------- /src/__pycache__/audioUtils.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aws-transcribe-captioning-tools/14d8b58186fd71c2145e6cc76719ffc0be3a7087/src/__pycache__/audioUtils.cpython-37.pyc -------------------------------------------------------------------------------- /src/__pycache__/srtUtils.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aws-transcribe-captioning-tools/14d8b58186fd71c2145e6cc76719ffc0be3a7087/src/__pycache__/srtUtils.cpython-37.pyc -------------------------------------------------------------------------------- /src/audioUtils.py: -------------------------------------------------------------------------------- 1 | # ================================================================================== 2 | # Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. 3 | 4 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 5 | # software and associated documentation files (the "Software"), to deal in the Software 6 | # without restriction, including without limitation the rights to use, copy, modify, 7 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 8 | # permit persons to whom the Software is furnished to do so. 
9 | 10 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 11 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 12 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 13 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 14 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 15 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 16 | # ================================================================================== 17 | # 18 | # audioUtils.py 19 | # by: Rob Dachowski 20 | # For questions or feedback, please contact robdac@amazon.com 21 | # 22 | # Purpose: The program provides a number of utility audio functions used to create 23 | # transcribed, translated, and subtitled videos using Amazon Transcribe, 24 | # Amazon Translate, Amazon Polly, and MoviePy 25 | # 26 | # Change Log: 27 | # 6/29/2018: Initial version 28 | # 29 | # ================================================================================== 30 | 31 | 32 | import boto3 33 | import os 34 | import json 35 | import sys 36 | from moviepy.editor import * 37 | from moviepy import editor 38 | from contextlib import closing 39 | 40 | # ================================================================================== 41 | # Function: writeAudio 42 | # Purpose: writes the bytes associated with the stream to a binary file 43 | # Parameters: 44 | # output_file - the name + extension of the output file (e.g. 
"abc.mp3") 45 | # stream - the stream of bytes to write to the output_file 46 | # ================================================================================== 47 | def writeAudio( output_file, stream ): 48 | 49 | audio_bytes = stream.read() 50 | 51 | print("\t==> Writing {:d} bytes to audio file: {:s}".format(len(audio_bytes), output_file)) 52 | try: 53 | # Open a file for writing the output as a binary stream 54 | with open(output_file, "wb") as file: 55 | file.write(audio_bytes) 56 | 57 | if file.closed: 58 | print("\t==> {:s} is closed".format(output_file)) 59 | else: 60 | print("\t==> {:s} is NOT closed".format(output_file)) 61 | except IOError as error: 62 | # Could not write to file, exit gracefully 63 | print(error) 64 | sys.exit(-1) 65 | 66 | # ================================================================================== 67 | # Function: createAudioTrackFromTranslation 68 | # Purpose: Using the provided transcript, get a translation from Amazon Translate, then use Amazon Polly to synthesize speech 69 | # Parameters: 70 | # region - the aws region in which to run the service 71 | # transcript - the Amazon Transcribe JSON structure to translate 72 | # sourceLangCode - the language code for the original content (e.g. English = "EN") 73 | # targetLangCode - the language code for the translated content (e.g. Spanish = "ES") 74 | # audioFileName - the name (including extension) of the target audio file (e.g. 
"abc.mp3") 75 | # ================================================================================== 76 | def createAudioTrackFromTranslation( region, transcript, sourceLangCode, targetLangCode, audioFileName ): 77 | print( "\n==> createAudioTrackFromTranslation " ) 78 | 79 | # Set up the polly and translate services 80 | client = boto3.client('polly') 81 | translate = boto3.client(service_name='translate', region_name=region, use_ssl=True) 82 | 83 | #get the transcript text 84 | temp = json.loads( transcript) 85 | transcript_txt = temp["results"]["transcripts"][0]["transcript"] 86 | 87 | voiceId = getVoiceId( targetLangCode ) 88 | 89 | # Now translate it. 90 | translated_txt = translate.translate_text(Text=transcript_txt, SourceLanguageCode=sourceLangCode, TargetLanguageCode=targetLangCode)["TranslatedText"][:2999] # truncate to 2999 characters before sending to Polly 91 | 92 | # Use the translated text to create the synthesized speech 93 | response = client.synthesize_speech( OutputFormat="mp3", SampleRate="22050", Text=translated_txt, VoiceId=voiceId) 94 | 95 | if response["ResponseMetadata"]["HTTPStatusCode"] == 200: 96 | print( "\t==> Successfully called Polly for speech synthesis") 97 | writeAudioStream( response, audioFileName ) 98 | else: 99 | print( "\t==> Error calling Polly for speech synthesis") 100 | 101 | 102 | # ================================================================================== 103 | # Function: writeAudioStream 104 | # Purpose: Utility to write an audio file from the response from the Amazon Polly API 105 | # Parameters: 106 | # response - the Amazon Polly JSON response 107 | # audioFileName - the name (including extension) of the target audio file (e.g. 
"abc.mp3") 108 | # ================================================================================== 109 | def writeAudioStream( response, audioFileName ): 110 | # Take the resulting stream and write it to an mp3 file 111 | if "AudioStream" in response: 112 | with closing(response["AudioStream"]) as stream: 113 | output = audioFileName 114 | writeAudio( output, stream ) 115 | 116 | 117 | 118 | # ================================================================================== 119 | # Function: getVoiceId 120 | # Purpose: Utility to return the name of the voice to use given a language code. Note: this is only populated with the 121 | # VoiceIds used for this example. Refer to the Amazon Polly API documentation for other voiceId names 122 | # Parameters: 123 | # targetLangCode - the language code used for the target Amazon Polly output 124 | # ================================================================================== 125 | def getVoiceId( targetLangCode ): 126 | 127 | # Feel free to add others as desired 128 | if targetLangCode == "es": 129 | voiceId = "Penelope" 130 | elif targetLangCode == "de": 131 | voiceId = "Marlene" 132 | else: raise ValueError("No Polly voice configured for language code: " + targetLangCode) 133 | return voiceId 134 | 135 | 136 | # ================================================================================== 137 | # Function: getSecondsFromTranslation 138 | # Purpose: Utility to determine how long in seconds it will take for a particular phrase of translated text to be spoken 139 | # Parameters: 140 | # textToSynthesize - the raw text to be synthesized 141 | # targetLangCode - the language code used for the target Amazon Polly output 142 | # audioFileName - the name (including extension) of the target audio file (e.g. 
"abc.mp3") 143 | # ================================================================================== 144 | def getSecondsFromTranslation( textToSynthesize, targetLangCode, audioFileName ): 145 | 146 | # Set up the polly and translate services 147 | client = boto3.client('polly') 148 | translate = boto3.client(service_name='translate', region_name="us-east-1", use_ssl=True) 149 | 150 | # Use the translated text to create the synthesized speech 151 | response = client.synthesize_speech( OutputFormat="mp3", SampleRate="22050", Text=textToSynthesize, VoiceId=getVoiceId( targetLangCode ) ) 152 | 153 | # write the stream out to disk so that we can load it into an AudioClip 154 | writeAudioStream( response, audioFileName ) 155 | 156 | # Load the temporary audio clip into an AudioFileClip 157 | audio = AudioFileClip( audioFileName) 158 | 159 | # return the duration 160 | return audio.duration 161 | 162 | 163 | -------------------------------------------------------------------------------- /src/concatenateVideos.py: -------------------------------------------------------------------------------- 1 | # ================================================================================== 2 | # Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. 3 | 4 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 5 | # software and associated documentation files (the "Software"), to deal in the Software 6 | # without restriction, including without limitation the rights to use, copy, modify, 7 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 8 | # permit persons to whom the Software is furnished to do so. 9 | 10 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 11 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 12 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 13 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 14 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 15 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 16 | # ================================================================================== 17 | # 18 | # concatenateVideos.py 19 | # by: Rob Dachowski 20 | # For questions or feedback, please contact robdac@amazon.com 21 | # 22 | # Purpose: This code uses the output of makevideo.bat to combine the clips into a short demo consisting of 23 | # short subclips and some title frames 24 | 25 | # Change Log: 26 | # 6/29/2018: Initial version 27 | # 28 | # ================================================================================== 29 | 30 | # Import everything needed to edit video clips 31 | from moviepy.editor import * 32 | from moviepy import editor 33 | from moviepy.video.tools.subtitles import SubtitlesClip 34 | #import moviepy.video.fx.all as vfx 35 | from time import gmtime, strftime 36 | 37 | 38 | # Load the clips output by makevideo.bat 39 | print(strftime("%H:%M:%S", gmtime()) + " Reading video English clip...") 40 | english = VideoFileClip("subtitledVideo-en.mp4") 41 | english = english.subclip( 0, 15).set_duration(15) 42 | 43 | print(strftime("%H:%M:%S", gmtime()) + " Reading video Spanish clip...") 44 | spanish = VideoFileClip("subtitledVideo-es.mp4") 45 | spanish = spanish.subclip( 15, 30).set_duration(15) 46 | 47 | print(strftime("%H:%M:%S", gmtime()) + " Reading video German clip...") 48 | german = VideoFileClip("subtitledVideo-de.mp4") 49 | german = german.subclip( 30, 45).set_duration(15) 50 | 51 | 52 | 53 | print(strftime("%H:%M:%S", gmtime()) + " Creating title...") 54 | # Generate a text clip. You can customize the font, color, etc. 
55 | toptitle = TextClip("Creating Subtitles and Translations Using Amazon Services:\n\nAmazon Transcribe\nAmazon Translate\nAmazon Polly",fontsize=36,color='white', bg_color='black', method="caption", align="center", size=english.size) 56 | toptitle.set_duration(5) 57 | 58 | 59 | subtitle1 = TextClip("re:Invent 2017 Keynote Address",fontsize=36,color='white', bg_color='black', method="caption", align="center", size=english.size) 60 | subtitle1.set_duration(5) 61 | 62 | subtitle2 = TextClip( "\nAndy Jassy, President and CEO of Amazon Web Services", fontsize=28, color='white', bg_color='black', method="caption", align="center", size=english.size) 63 | subtitle2.set_duration(5) 64 | 65 | # Composite the video clips into a title page 66 | title = CompositeVideoClip( [ toptitle, subtitle1.set_start(5), subtitle2.set_start(9)] ).set_duration(15) 67 | 68 | 69 | #Create text clips for the various different translations 70 | est = TextClip("English Subtitles\nUsing Amazon Transcribe",fontsize=24,color='white', bg_color='black', method="caption", align="center", size=english.size) 71 | est = est.set_pos('center').set_duration(2.5) 72 | 73 | sst = TextClip("Spanish Subtitles\nUsing Amazon Transcribe, Amazon Translate, and Amazon Polly",fontsize=24,color='white', bg_color='black', method="caption", align="center", size=english.size) 74 | sst = sst.set_pos('center').set_duration(2.5) 75 | 76 | dst = TextClip("German Subtitles\nUsing Amazon Transcribe, Amazon Translate, and Amazon Polly",fontsize=24,color='white', bg_color='black', method="caption", align="center", size=english.size) 77 | dst = dst.set_pos('center').set_duration(2.5) 78 | 79 | print strftime("%H:%M:%S", gmtime()), "Concatenating videos" 80 | 81 | # concatenate the various titles, subtitles, and clips together 82 | combined = concatenate_videoclips( [title.crossfadeout(2), est, english, sst, spanish, dst, german] ) 83 | 84 | # Write the result to a file (many options available !)
85 | print strftime("%H:%M:%S", gmtime()), "Writing concatenated video" 86 | combined.write_videofile("combined.mp4", codec="libx264", audio_codec="aac", fps=24) -------------------------------------------------------------------------------- /src/makevideo.bat: -------------------------------------------------------------------------------- 1 | REM ================================================================================== 2 | REM Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. 3 | 4 | REM Permission is hereby granted, free of charge, to any person obtaining a copy of this 5 | REM software and associated documentation files (the "Software"), to deal in the Software 6 | REM without restriction, including without limitation the rights to use, copy, modify, 7 | REM merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 8 | REM permit persons to whom the Software is furnished to do so. 9 | 10 | REM THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 11 | REM INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 12 | REM PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 13 | REM HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 14 | REM OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 15 | REM SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
16 | REM ================================================================================== 17 | REM 18 | REM makevideo.bat 19 | REM by: Rob Dachowski 20 | REM For questions or feedback, please contact robdac@amazon.com 21 | REM 22 | REM Purpose: This batchfile invokes the translatevideo.py file with parameters 23 | REM 24 | REM Change Log: 25 | REM 6/29/2018: Initial version 26 | REM 27 | REM ================================================================================== 28 | 29 | cls 30 | python translatevideo.py -region us-east-1 -inbucket robdac-aiml-test/ -infile AWS_reInvent_2017.mp4 -outbucket robdac-aiml-test/ -outfilename subtitledVideo -outfiletype mp4 -outlang es de 31 | -------------------------------------------------------------------------------- /src/srt.py: -------------------------------------------------------------------------------- 1 | import sys 2 | 3 | from srtUtils import * 4 | 5 | input_file = sys.argv[1] 6 | output_file = sys.argv[2] 7 | 8 | with open(input_file, "r") as f: 9 | data = writeTranscriptToSRT(f.read(), 'en', output_file ) -------------------------------------------------------------------------------- /src/srtUtils.py: -------------------------------------------------------------------------------- 1 | # ================================================================================== 2 | # Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. 3 | 4 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 5 | # software and associated documentation files (the "Software"), to deal in the Software 6 | # without restriction, including without limitation the rights to use, copy, modify, 7 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 8 | # permit persons to whom the Software is furnished to do so. 
9 | 10 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 11 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 12 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 13 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 14 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 15 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 16 | # ================================================================================== 17 | # 18 | # srtUtils.py 19 | # by: Rob Dachowski 20 | # For questions or feedback, please contact robdac@amazon.com 21 | # 22 | # Purpose: The program provides a number of utility functions for creating SubRip Subtitle files (.SRT) 23 | # 24 | # Change Log: 25 | # 6/29/2018: Initial version 26 | # 27 | # ================================================================================== 28 | 29 | import json 30 | import boto3 31 | import re 32 | import codecs 33 | import time 34 | import math 35 | from audioUtils import * 36 | 37 | 38 | 39 | # ================================================================================== 40 | # Function: newPhrase 41 | # Purpose: simply create a phrase tuple 42 | # Parameters: 43 | # None 44 | # ================================================================================== 45 | def newPhrase(): 46 | return { 'start_time': '', 'end_time': '', 'words' : [] } 47 | 48 | 49 | 50 | # ================================================================================== 51 | # Function: getTimeCode 52 | # Purpose: Format and return a string that contains the converted number of seconds into SRT format 53 | # Parameters: 54 | # seconds - the duration in seconds to convert to HH:MM:SS,mmm 55 | # ================================================================================== 56 | # Format and return a string that contains the converted number of 
seconds into SRT format 57 | def getTimeCode(seconds): 58 | # ....t_hund = int(seconds % 1 * 1000) 59 | # ....t_seconds = int( seconds ) 60 | # ....t_secs = ((float( t_seconds) / 60) % 1) * 60 61 | # ....t_mins = int( t_seconds / 60 ) 62 | # ....return str( "%02d:%02d:%02d,%03d" % (00, t_mins, int(t_secs), t_hund )) 63 | (frac, whole) = math.modf(seconds) 64 | frac = frac * 1000 65 | return str('%s,%03d' % (time.strftime('%H:%M:%S',time.gmtime(whole)), frac)) 66 | 67 | 68 | # ================================================================================== 69 | # Function: writeTranscriptToSRT 70 | # Purpose: Function to get the phrases from the transcript and write it out to an SRT file 71 | # Parameters: 72 | # transcript - the JSON output from Amazon Transcribe 73 | # sourceLangCode - the language code for the original content (e.g. English = "EN") 74 | # srtFileName - the name of the SRT file (e.g. "mySRT.SRT") 75 | # ================================================================================== 76 | def writeTranscriptToSRT( transcript, sourceLangCode, srtFileName ): 77 | # Write the SRT file for the original language 78 | print( "==> Creating SRT from transcript") 79 | phrases = getPhrasesFromTranscript( transcript ) 80 | writeSRT( phrases, srtFileName ) 81 | 82 | 83 | 84 | 85 | # ================================================================================== 86 | # Function: writeTranslationToSRT 87 | # Purpose: Based on the JSON transcript provided by Amazon Transcribe, get the phrases from the translation 88 | # and write it out to an SRT file 89 | # Parameters: 90 | # transcript - the JSON output from Amazon Transcribe 91 | # sourceLangCode - the language code for the original content (e.g. English = "EN") 92 | # targetLangCode - the language code for the translated content (e.g. Spanish = "ES") 93 | # srtFileName - the name of the SRT file (e.g.
"mySRT.SRT") 94 | # ================================================================================== 95 | def writeTranslationToSRT( transcript, sourceLangCode, targetLangCode, srtFileName, region ): 96 | # First get the translation 97 | print( "\n\n==> Translating from " + sourceLangCode + " to " + targetLangCode ) 98 | translation = translateTranscript( transcript, sourceLangCode, targetLangCode, region ) 99 | #print( "\n\n==> Translation: " + str(translation)) 100 | 101 | # Now create phrases from the translation 102 | textToTranslate = unicode(translation["TranslatedText"]) 103 | phrases = getPhrasesFromTranslation( textToTranslate, targetLangCode ) 104 | writeSRT( phrases, srtFileName ) 105 | 106 | 107 | # ================================================================================== 108 | # Function: getPhrasesFromTranslation 109 | # Purpose: Based on the JSON translation provided by Amazon Translate, get the phrases from the translation 110 | # and write it out to an SRT file. Note that since we are using a block of translated text rather than 111 | # a JSON structure with the timing for the start and end of each word as in the output of Transcribe, 112 | # we will need to calculate the start and end-time for each phrase 113 | # Parameters: 114 | # translation - the JSON output from Amazon Translate 115 | # targetLangCode - the language code for the translated content (e.g. 
Spanish = "ES") 116 | # ================================================================================== 117 | def getPhrasesFromTranslation( translation, targetLangCode ): 118 | 119 | # Now create phrases from the translation 120 | words = translation.split() 121 | 122 | #print( words ) #debug statement 123 | 124 | #set up some variables for the first pass 125 | phrase = newPhrase() 126 | phrases = [] 127 | nPhrase = True 128 | x = 0 129 | c = 0 130 | seconds = 0 131 | 132 | print("==> Creating phrases from translation...") 133 | 134 | for word in words: 135 | 136 | # if it is a new phrase, then get the start_time of the first item 137 | if nPhrase == True: 138 | phrase["start_time"] = getTimeCode( seconds ) 139 | nPhrase = False 140 | c += 1 141 | 142 | # Append the word to the phrase... 143 | phrase["words"].append(word) 144 | x += 1 145 | 146 | 147 | # now add the phrase to the phrases, generate a new phrase, etc. 148 | if x == 10: 149 | 150 | # For Translations, we now need to calculate the end time for the phrase 151 | psecs = getSecondsFromTranslation( getPhraseText( phrase), targetLangCode, "phraseAudio" + str(c) + ".mp3" ) 152 | seconds += psecs 153 | phrase["end_time"] = getTimeCode( seconds ) 154 | 155 | #print c, phrase 156 | phrases.append(phrase) 157 | phrase = newPhrase() 158 | nPhrase = True 159 | #seconds += .001 160 | x = 0 161 | 162 | # This if statement is to address a defect in the SubtitleClip. If the Subtitles end up being 163 | # a different duration than the content, MoviePy will sometimes fail with unexpected errors while 164 | # processing the subclip. This is limiting it to something less than the total duration for our example 165 | # however, you may need to modify or eliminate this line depending on your content.
166 | if c == 30: 167 | break 168 | 169 | return phrases 170 | 171 | 172 | # ================================================================================== 173 | # Function: getPhrasesFromTranscript 174 | # Purpose: Based on the JSON transcript provided by Amazon Transcribe, get the phrases from the transcript 175 | # and return them so they can be written out to an SRT file 176 | # Parameters: 177 | # transcript - the JSON output from Amazon Transcribe 178 | # ================================================================================== 179 | def getPhrasesFromTranscript( transcript ): 180 | 181 | # This function is intended to be called with the JSON structure output from the Transcribe service. However, 182 | # if you only have the translation of the transcript, then you should call getPhrasesFromTranslation instead 183 | 184 | # Now create phrases from the transcript 185 | ts = json.loads( transcript ) 186 | items = ts['results']['items'] 187 | #print( items ) 188 | 189 | #set up some variables for the first pass 190 | phrase = newPhrase() 191 | phrases = [] 192 | nPhrase = True 193 | x = 0 194 | c = 0 195 | lastEndTime = "" 196 | 197 | print("==> Creating phrases from transcript...") 198 | 199 | for item in items: 200 | 201 | # if it is a new phrase, then get the start_time of the first item 202 | if nPhrase == True: 203 | if item["type"] == "pronunciation": 204 | phrase["start_time"] = getTimeCode( float(item["start_time"]) ) 205 | nPhrase = False 206 | lastEndTime = getTimeCode( float(item["end_time"]) ) 207 | c+= 1 208 | else: 209 | # get the end_time if the item is a pronunciation and store it 210 | # We need to determine if this is pronunciation or punctuation here 211 | # Punctuation doesn't contain timing information, so we'll want 212 | # to set the end_time to whatever the last word in the phrase is. 213 | if item["type"] == "pronunciation": 214 | phrase["end_time"] = getTimeCode( float(item["end_time"]) ) 215 | 216 | # in either case, append the word to the phrase...
217 | phrase["words"].append(item['alternatives'][0]["content"]) 218 | x += 1 219 | 220 | # now add the phrase to the phrases, generate a new phrase, etc. 221 | if x == 10: 222 | #print c, phrase 223 | phrases.append(phrase) 224 | phrase = newPhrase() 225 | nPhrase = True 226 | x = 0 227 | 228 | # if there are any words in the final phrase add to phrases 229 | if(len(phrase["words"]) > 0): 230 | if phrase['end_time'] == '': 231 | phrase['end_time'] = lastEndTime 232 | phrases.append(phrase) 233 | 234 | return phrases 235 | 236 | 237 | 238 | 239 | # ================================================================================== 240 | # Function: translateTranscript 241 | # Purpose: Based on the JSON transcript provided by Amazon Transcribe, get the JSON response of translated text 242 | # Parameters: 243 | # transcript - the JSON output from Amazon Transcribe 244 | # sourceLangCode - the language code for the original content (e.g. English = "EN") 245 | # targetLangCode - the language code for the translated content (e.g. Spanish = "ES") 246 | # region - the AWS region in which to run the Translation (e.g. "us-east-1") 247 | # ================================================================================== 248 | def translateTranscript( transcript, sourceLangCode, targetLangCode, region ): 249 | # Get the translation in the target language. We want to do this first so that the translation is in the full context 250 | # of what is said vs. 1 phrase at a time. This really matters in some languages 251 | 252 | # parse the transcript JSON 253 | ts = json.loads( transcript ) 254 | 255 | # pull out the transcript text and put it in the txt variable 256 | txt = ts["results"]["transcripts"][0]["transcript"] 257 | 258 | #set up the Amazon Translate client 259 | translate = boto3.client(service_name='translate', region_name=region, use_ssl=True) 260 | 261 | # call Translate with the text, source language code, and target language code.
The result is a JSON structure containing the 262 | # translated text 263 | translation = translate.translate_text(Text=txt,SourceLanguageCode=sourceLangCode, TargetLanguageCode=targetLangCode) 264 | 265 | return translation 266 | 267 | 268 | 269 | # ================================================================================== 270 | # Function: writeSRT 271 | # Purpose: Iterate through the phrases and write them to the SRT file 272 | # Parameters: 273 | # phrases - the array of JSON tuples containing the phrases to show up as subtitles 274 | # filename - the name of the SRT output file (e.g. "mySRT.srt") 275 | # ================================================================================== 276 | def writeSRT( phrases, filename ): 277 | print ("==> Writing phrases to disk...") 278 | 279 | # open the files 280 | e = codecs.open(filename,"w+", "utf-8") 281 | x = 1 282 | 283 | for phrase in phrases: 284 | 285 | # determine how many words are in the phrase 286 | length = len(phrase["words"]) 287 | 288 | # write out the phrase number 289 | e.write( str(x) + "\n" ) 290 | x += 1 291 | 292 | # write out the start and end time 293 | e.write( phrase["start_time"] + " --> " + phrase["end_time"] + "\n" ) 294 | 295 | # write out the full phrase.
Use spacing if it is a word, or punctuation without spacing 296 | out = getPhraseText( phrase ) 297 | 298 | # write out the srt file 299 | e.write(out + "\n\n" ) 300 | 301 | 302 | #print out 303 | 304 | e.close() 305 | 306 | 307 | # ================================================================================== 308 | # Function: getPhraseText 309 | # Purpose: For a given phrase, return the string of words including punctuation 310 | # Parameters: 311 | # phrase - the array of JSON tuples containing the words to show up as subtitles 312 | # ================================================================================== 313 | 314 | def getPhraseText( phrase ): 315 | 316 | length = len(phrase["words"]) 317 | 318 | out = "" 319 | for i in range( 0, length ): 320 | if re.match( '[a-zA-Z0-9]', phrase["words"][i]): 321 | if i > 0: 322 | out += " " + phrase["words"][i] 323 | else: 324 | out += phrase["words"][i] 325 | else: 326 | out += phrase["words"][i] 327 | 328 | return out 329 | 330 | 331 | 332 | 333 | 334 | 335 | 336 | 337 | 338 | -------------------------------------------------------------------------------- /src/transcribeUtils.py: -------------------------------------------------------------------------------- 1 | # ================================================================================== 2 | # Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. 3 | 4 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 5 | # software and associated documentation files (the "Software"), to deal in the Software 6 | # without restriction, including without limitation the rights to use, copy, modify, 7 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 8 | # permit persons to whom the Software is furnished to do so. 
9 | 10 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 11 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 12 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 13 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 14 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 15 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 16 | # ================================================================================== 17 | # 18 | # transcribeUtils.py 19 | # by: Rob Dachowski 20 | # For questions or feedback, please contact robdac@amazon.com 21 | # 22 | # Purpose: The program provides a number of utility functions for leveraging the Amazon Transcribe API 23 | # 24 | # Change Log: 25 | # 6/29/2018: Initial version 26 | # 27 | # ================================================================================== 28 | 29 | import boto3 30 | import uuid 31 | import requests 32 | 33 | 34 | 35 | # ================================================================================== 36 | # Function: createTranscribeJob 37 | # Purpose: Function to format the input parameters and invoke the Amazon Transcribe service 38 | # Parameters: 39 | # region - the AWS region in which to run AWS services (e.g. "us-east-1") 40 | # bucket - the Amazon S3 bucket name (e.g. "mybucket/") found in region that contains the media file for processing. 41 | # mediaFile - the content to process (e.g. 
"myvideo.mp4") 42 | # 43 | # ================================================================================== 44 | def createTranscribeJob( region, bucket, mediaFile ): 45 | 46 | # Set up the Transcribe client 47 | transcribe = boto3.client('transcribe') 48 | 49 | # Set up the full uri for the bucket and media file 50 | mediaUri = "https://" + "s3-" + region + ".amazonaws.com/" + bucket + mediaFile 51 | 52 | print( "Creating Job: " + "transcribe" + mediaFile + " for " + mediaUri ) 53 | 54 | # Use the uuid functionality to generate a unique job name. Otherwise, the Transcribe service will return an error 55 | response = transcribe.start_transcription_job( TranscriptionJobName="transcribe_" + uuid.uuid4().hex + "_" + mediaFile , \ 56 | LanguageCode = "en-US", \ 57 | MediaFormat = "mp4", \ 58 | Media = { "MediaFileUri" : mediaUri }, \ 59 | Settings = { "VocabularyName" : "MyVocabulary" } \ 60 | ) 61 | 62 | # return the response structure found in the Transcribe Documentation 63 | return response 64 | 65 | 66 | # ================================================================================== 67 | # Function: getTranscriptionJobStatus 68 | # Purpose: Helper function to return the status of a job running the Amazon Transcribe service 69 | # Parameters: 70 | # jobName - the unique jobName used to start the Amazon Transcribe job 71 | # ================================================================================== 72 | def getTranscriptionJobStatus( jobName ): 73 | transcribe = boto3.client('transcribe') 74 | 75 | response = transcribe.get_transcription_job( TranscriptionJobName=jobName ) 76 | return response 77 | 78 | 79 | # ================================================================================== 80 | # Function: getTranscript 81 | # Purpose: Helper function to return the transcript based on the signed URI in S3 as produced by the Transcript job 82 | # Parameters: 83 | # transcriptURI - the signed S3 URI for the Transcribe output 84 | # 
================================================================================== 85 | def getTranscript( transcriptURI ): 86 | # Get the resulting Transcription Job and store the JSON response in transcript 87 | result = requests.get( transcriptURI ) 88 | 89 | return result.text 90 | 91 | -------------------------------------------------------------------------------- /src/translatevideo.py: -------------------------------------------------------------------------------- 1 | # ================================================================================== 2 | # Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. 3 | 4 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 5 | # software and associated documentation files (the "Software"), to deal in the Software 6 | # without restriction, including without limitation the rights to use, copy, modify, 7 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 8 | # permit persons to whom the Software is furnished to do so. 9 | 10 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 11 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 12 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 13 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 14 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 15 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
16 | # ================================================================================== 17 | # 18 | # translatevideo.py 19 | # by: Rob Dachowski 20 | # For questions or feedback, please contact robdac@amazon.com 21 | # 22 | # Purpose: This code drives the process to create a transcription job, translate it into another language, 23 | # create subtitles, use Amazon Polly to synthesize an alternate audio track, and finally put it all together 24 | # into a new video. 25 | # 26 | # Change Log: 27 | # 6/29/2018: Initial version 28 | # 29 | # ================================================================================== 30 | 31 | 32 | import argparse 33 | from transcribeUtils import * 34 | from srtUtils import * 35 | import time 36 | from videoUtils import * 37 | from audioUtils import * 38 | 39 | 40 | 41 | # Get the command line arguments and parse them 42 | parser = argparse.ArgumentParser( prog='translatevideo.py', description='Process a video found in the input file and write it out to the output file') 43 | parser.add_argument('-region', required=True, help="The AWS region containing the S3 buckets" ) 44 | parser.add_argument('-inbucket', required=True, help='The S3 bucket containing the input file') 45 | parser.add_argument('-infile', required=True, help='The input file to process') 46 | parser.add_argument('-outbucket', required=True, help='The S3 bucket for the output file') 47 | parser.add_argument('-outfilename', required=True, help='The file name without the extension') 48 | parser.add_argument('-outfiletype', required=True, help='The output file type. E.g. mp4, mov') 49 | parser.add_argument('-outlang', required=True, nargs='+', help='The language codes for the desired output. E.g.
en = English, de = German') 50 | args = parser.parse_args() 51 | 52 | # print out parameters and key header information for the user 53 | print( "==> translatevideo.py:\n") 54 | print( "==> Parameters: ") 55 | print("\tInput bucket/object: " + args.inbucket + args.infile ) 56 | print( "\tOutput bucket/object: " + args.outbucket + args.outfilename + "." + args.outfiletype ) 57 | 58 | print( "\n==> Target Language Translation Output: " ) 59 | 60 | for lang in args.outlang: 61 | print( "\t" + args.outbucket + args.outfilename + "-" + lang + "." + args.outfiletype) 62 | 63 | 64 | # Create Transcription Job 65 | response = createTranscribeJob( args.region, args.inbucket, args.infile ) 66 | 67 | # loop until the job successfully completes 68 | print( "\n==> Transcription Job: " + response["TranscriptionJob"]["TranscriptionJobName"] + "\n\tIn Progress"), 69 | 70 | while( response["TranscriptionJob"]["TranscriptionJobStatus"] == "IN_PROGRESS"): 71 | print( "."), 72 | time.sleep( 30 ) 73 | response = getTranscriptionJobStatus( response["TranscriptionJob"]["TranscriptionJobName"] ) 74 | 75 | print( "\nJob Complete") 76 | print( "\tStart Time: " + str(response["TranscriptionJob"]["CreationTime"]) ) 77 | print( "\tEnd Time: " + str(response["TranscriptionJob"]["CompletionTime"]) ) 78 | print( "\tTranscript URI: " + str(response["TranscriptionJob"]["Transcript"]["TranscriptFileUri"]) ) 79 | 80 | # Now get the transcript JSON from AWS Transcribe 81 | transcript = getTranscript( str(response["TranscriptionJob"]["Transcript"]["TranscriptFileUri"]) ) 82 | # print( "\n==> Transcript: \n" + transcript) 83 | 84 | # Create the SRT File for the original transcript and write it out. 85 | writeTranscriptToSRT( transcript, 'en', "subtitles-en.srt" ) 86 | createVideo( args.infile, "subtitles-en.srt", args.outfilename + "-en." 
+ args.outfiletype, "audio-en.mp3", True) 87 | 88 | 89 | # Now write out the translation to the transcript for each of the target languages 90 | for lang in args.outlang: 91 | writeTranslationToSRT(transcript, 'en', lang, "subtitles-" + lang + ".srt", args.region ) 92 | 93 | #Now that we have the subtitle files, let's create the audio track 94 | createAudioTrackFromTranslation( args.region, transcript, 'en', lang, "audio-" + lang + ".mp3" ) 95 | 96 | # Finally, create the composited video 97 | createVideo( args.infile, "subtitles-" + lang + ".srt", args.outfilename + "-" + lang + "." + args.outfiletype, "audio-" + lang + ".mp3", False) 98 | 99 | 100 | 101 | 102 | 103 | -------------------------------------------------------------------------------- /src/videoUtils.py: -------------------------------------------------------------------------------- 1 | # ================================================================================== 2 | # Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. 3 | 4 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 5 | # software and associated documentation files (the "Software"), to deal in the Software 6 | # without restriction, including without limitation the rights to use, copy, modify, 7 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 8 | # permit persons to whom the Software is furnished to do so. 9 | 10 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 11 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 12 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 13 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 14 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 15 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
16 | # ================================================================================== 17 | # 18 | # videoUtils.py 19 | # by: Rob Dachowski 20 | # For questions or feedback, please contact robdac@amazon.com 21 | # 22 | # Purpose: This code drives the MoviePy functions needed to create the subtitled video 23 | # 24 | # Change Log: 25 | # 6/29/2018: Initial version 26 | # 27 | # ================================================================================== 28 | 29 | from moviepy.editor import * 30 | from moviepy import editor 31 | from moviepy.video.tools.subtitles import SubtitlesClip 32 | from time import gmtime, strftime 33 | from audioUtils import * 34 | 35 | 36 | # ================================================================================== 37 | # Function: annotate 38 | # Purpose: This function creates a TextClip based on the provided text and composites the subtitle onto the provided clip. 39 | # Defaults are used for txt_color, fontsize, and font. You can override them as desired 40 | # Parameters: 41 | # clip - the clip to composite the text on 42 | # txt - the block of text to composite on the clip 43 | # txt_color - the color of the text on the screen 44 | # font_size - the size of the font to display 45 | # font - the font to use for the text 46 | # 47 | # ================================================================================== 48 | def annotate(clip, txt, txt_color='white', fontsize=24, font='Arial-Bold'): 49 | # Writes a text at the bottom of the clip 'Xolonium-Bold' 50 | txtclip = editor.TextClip(txt, fontsize=fontsize, font=font, color=txt_color).on_color(color=[0,0,0]) 51 | cvc = editor.CompositeVideoClip([clip, txtclip.set_pos(('center', 50))]) 52 | return cvc.set_duration(clip.duration) 53 | 54 | # ================================================================================== 55 | # Function: createVideo 56 | # Purpose: This function drives the MoviePy code needed to put all of the pieces together and create a new 
subtitled video 57 | # Parameters: 58 | # originalClipName - the filename of the original content (e.g. "originalVideo.mp4") 59 | # subtitlesFileName - the filename of the SRT file (e.g. "mySRT.srt") 60 | # outputFileName - the filename of the output video file (e.g. "outputFileName.mp4") 61 | # alternateAudioFileName - the filename of an MP3 file that should be used to replace the audio track 62 | # useOriginalAudio - boolean value as to whether we should leave the original audio in place or replace it with the alternate track 63 | # 64 | # ================================================================================== 65 | def createVideo( originalClipName, subtitlesFileName, outputFileName, alternateAudioFileName, useOriginalAudio=True ): 66 | # This function is used to put all of the pieces together. 67 | # Note that if we need to use an alternate audio track, the last parameter should be False 68 | 69 | print( "\n==> createVideo " ) 70 | 71 | # Load the original clip 72 | print "\t" + strftime("%H:%M:%S", gmtime()), "Reading video clip: " + originalClipName 73 | clip = VideoFileClip(originalClipName) 74 | print "\t\t==> Original clip duration: " + str(clip.duration) 75 | 76 | if useOriginalAudio == False: 77 | print strftime( "\t" + "%H:%M:%S", gmtime()), "Reading alternate audio track: " + alternateAudioFileName 78 | audio = AudioFileClip(alternateAudioFileName) 79 | audio = audio.subclip( 0, clip.duration ) 80 | audio = audio.set_duration(clip.duration) 81 | print "\t\t==> Audio duration: " + str(audio.duration) 82 | clip = clip.set_audio( audio ) 83 | else: 84 | print strftime( "\t" + "%H:%M:%S", gmtime()), "Using original audio track..." 
85 | 86 | # Create a lambda function that will be used to generate the subtitles for each sequence in the SRT 87 | generator = lambda txt: TextClip(txt, font='Arial-Bold', fontsize=24, color='white') 88 | 89 | # read in the subtitles files 90 | print "\t" + strftime("%H:%M:%S", gmtime()), "Reading subtitle file: " + subtitlesFileName 91 | subs = SubtitlesClip(subtitlesFileName, generator) 92 | print "\t\t==> Subtitles duration before: " + str(subs.duration) 93 | subs = subs.subclip( 0, clip.duration - .001) 94 | subs.set_duration( clip.duration - .001 ) 95 | print "\t\t==> Subtitles duration after: " + str(subs.duration) 96 | print "\t" + strftime("%H:%M:%S", gmtime()), "Reading subtitle file complete: " + subtitlesFileName 97 | 98 | 99 | print "\t" + strftime( "%H:%M:%S", gmtime()), "Creating Subtitles Track..." 100 | annotated_clips = [annotate(clip.subclip(from_t, to_t), txt) for (from_t, to_t), txt in subs] 101 | 102 | 103 | 104 | print "\t" + strftime( "%H:%M:%S", gmtime()), "Creating composited video: " + outputFileName 105 | # Overlay the text clip on the first video clip 106 | final = concatenate_videoclips( annotated_clips ) 107 | 108 | print "\t" + strftime( "%H:%M:%S", gmtime()), "Writing video file: " + outputFileName 109 | final.write_videofile(outputFileName) -------------------------------------------------------------------------------- /tools/srtUtils.py: -------------------------------------------------------------------------------- 1 | # ================================================================================== 2 | # Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. 
3 | 4 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 5 | # software and associated documentation files (the "Software"), to deal in the Software 6 | # without restriction, including without limitation the rights to use, copy, modify, 7 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 8 | # permit persons to whom the Software is furnished to do so. 9 | 10 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 11 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 12 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 13 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 14 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 15 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 16 | # ================================================================================== 17 | # 18 | # srtUtils.py 19 | # by: Rob Dachowski 20 | # For questions or feedback, please contact robdac@amazon.com 21 | # 22 | # Purpose: The program provides a number of utility functions for creating SubRip Subtitle files (.SRT) 23 | # 24 | # Change Log: 25 | # 6/29/2018: Initial version 26 | # 27 | # ================================================================================== 28 | 29 | import json 30 | import boto3 31 | import re 32 | import codecs 33 | from audioUtils import * 34 | 35 | 36 | 37 | # ================================================================================== 38 | # Function: newPhrase 39 | # Purpose: simply create a phrase tuple 40 | # Parameters: 41 | # None 42 | # ================================================================================== 43 | def newPhrase(): 44 | return { 'start_time': '', 'end_time': '', 'words' : [] } 45 | 46 | 47 | 48 | # 
================================================================================== 49 | # Function: getTimeCode 50 | # Purpose: Format and return a string that contains the converted number of seconds into SRT format 51 | # Parameters: 52 | # seconds - the duration in seconds to convert to HH:MM:SS,mmm 53 | # ================================================================================== 54 | # Format and return a string that contains the converted number of seconds into SRT format 55 | def getTimeCode( seconds ): 56 | t_hund = int(seconds % 1 * 1000) 57 | t_seconds = int( seconds ) 58 | t_secs = ((float( t_seconds) / 60) % 1) * 60 59 | t_mins = int( t_seconds / 60 ) 60 | return str( "%02d:%02d:%02d,%03d" % (00, t_mins, int(t_secs), t_hund )) 61 | 62 | 63 | # ================================================================================== 64 | # Function: writeTranscriptToSRT 65 | # Purpose: Function to get the phrases from the transcript and write it out to an SRT file 66 | # Parameters: 67 | # transcript - the JSON output from Amazon Transcribe 68 | # sourceLangCode - the language code for the original content (e.g. English = "EN") 69 | # srtFileName - the name of the SRT file (e.g. 
"mySRT.SRT") 70 | # ================================================================================== 71 | def writeTranscriptToSRT( transcript, sourceLangCode, srtFileName ): 72 | # Write the SRT file for the original language 73 | print( "==> Creating SRT from transcript") 74 | phrases = getPhrasesFromTranscript( transcript ) 75 | writeSRT( phrases, srtFileName ) 76 | 77 | 78 | # ================================================================================== 79 | # Function: writeTranscriptToSRT 80 | # Purpose: Based on the JSON transcript provided by Amazon Transcribe, get the phrases from the translation 81 | # and write it out to an SRT file 82 | # Parameters: 83 | # transcript - the JSON output from Amazon Transcribe 84 | # sourceLangCode - the language code for the original content (e.g. English = "EN") 85 | # targetLangCode - the language code for the translated content (e.g. Spanich = "ES") 86 | # srtFileName - the name of the SRT file (e.g. "mySRT.SRT") 87 | # ================================================================================== 88 | def writeTranslationToSRT( transcript, sourceLangCode, targetLangCode, srtFileName, region ): 89 | # First get the translation 90 | print( "\n\n==> Translating from " + sourceLangCode + " to " + targetLangCode ) 91 | translation = translateTranscript( transcript, sourceLangCode, targetLangCode, region ) 92 | #print( "\n\n==> Translation: " + str(translation)) 93 | 94 | # Now create phrases from the translation 95 | textToTranslate = unicode(translation["TranslatedText"]) 96 | phrases = getPhrasesFromTranslation( textToTranslate, targetLangCode ) 97 | writeSRT( phrases, srtFileName ) 98 | 99 | 100 | # ================================================================================== 101 | # Function: getPhrasesFromTranslation 102 | # Purpose: Based on the JSON translation provided by Amazon Translate, get the phrases from the translation 103 | # and write it out to an SRT file. 
Note that since we are using a block of translated text rather than 104 | # a JSON structure with the timing for the start and end of each word as in the output of Transcribe, 105 | # we will need to calculate the start and end time for each phrase 106 | # Parameters: 107 | # translation - the translated text returned by Amazon Translate 108 | # targetLangCode - the language code for the translated content (e.g. Spanish = "ES") 109 | # ================================================================================== 110 | def getPhrasesFromTranslation( translation, targetLangCode ): 111 | 112 | # Now create phrases from the translation 113 | words = translation.split() 114 | 115 | #print( words ) #debug statement 116 | 117 | #set up some variables for the first pass 118 | phrase = newPhrase() 119 | phrases = [] 120 | nPhrase = True 121 | x = 0 122 | c = 0 123 | seconds = 0 124 | 125 | print "==> Creating phrases from translation..." 126 | 127 | for word in words: 128 | 129 | # if it is a new phrase, then get the start_time of the first item 130 | if nPhrase == True: 131 | phrase["start_time"] = getTimeCode( seconds ) 132 | nPhrase = False 133 | c += 1 134 | 135 | # Append the word to the phrase... 136 | phrase["words"].append(word) 137 | x += 1 138 | 139 | 140 | # now add the phrase to the phrases, generate a new phrase, etc. 141 | if x == 10: 142 | 143 | # For Translations, we now need to calculate the end time for the phrase 144 | psecs = getSecondsFromTranslation( getPhraseText( phrase), targetLangCode, "phraseAudio" + str(c) + ".mp3" ) 145 | seconds += psecs 146 | phrase["end_time"] = getTimeCode( seconds ) 147 | 148 | #print c, phrase 149 | phrases.append(phrase) 150 | phrase = newPhrase() 151 | nPhrase = True 152 | #seconds += .001 153 | x = 0 154 | 155 | # This if statement is to address a defect in the SubtitlesClip.
If the Subtitles end up being 156 | # a different duration than the content, MoviePy will sometimes fail with unexpected errors while 157 | # processing the subclip. This is limiting it to something less than the total duration for our example 158 | # however, you may need to modify or eliminate this line depending on your content. 159 | if c == 30: 160 | break 161 | 162 | return phrases 163 | 164 | 165 | # ================================================================================== 166 | # Function: getPhrasesFromTranscript 167 | # Purpose: Based on the JSON transcript provided by Amazon Transcribe, get the phrases from the translation 168 | # and write it out to an SRT file 169 | # Parameters: 170 | # transcript - the JSON output from Amazon Transcribe 171 | # ================================================================================== 172 | def getPhrasesFromTranscript( transcript ): 173 | 174 | # This function is intended to be called with the JSON structure output from the Transcribe service. However, 175 | # if you only have the translation of the transcript, then you should call getPhrasesFromTranslation instead 176 | 177 | # Now create phrases from the translation 178 | ts = json.loads( transcript ) 179 | items = ts['results']['items'] 180 | #print( items ) 181 | 182 | #set up some variables for the first pass 183 | phrase = newPhrase() 184 | phrases = [] 185 | nPhrase = True 186 | x = 0 187 | c = 0 188 | 189 | print "==> Creating phrases from transcript..." 
190 | 191 | for item in items: 192 | 193 | # if it is a new phrase, then get the start_time of the first item 194 | if nPhrase == True: 195 | if item["type"] == "pronunciation": 196 | phrase["start_time"] = getTimeCode( float(item["start_time"]) ) 197 | nPhrase = False 198 | c+= 1 199 | else: 200 | # get the end_time if the item is a pronunciation and store it 201 | # We need to determine whether this is a pronunciation or punctuation item here 202 | # Punctuation doesn't contain timing information, so we'll want 203 | # to set the end_time to whatever the last word in the phrase is. 204 | if item["type"] == "pronunciation": 205 | phrase["end_time"] = getTimeCode( float(item["end_time"]) ) 206 | 207 | # in either case, append the word to the phrase... 208 | phrase["words"].append(item['alternatives'][0]["content"]) 209 | x += 1 210 | 211 | # now add the phrase to the phrases, generate a new phrase, etc. 212 | if x == 10: 213 | #print c, phrase 214 | phrases.append(phrase) 215 | phrase = newPhrase() 216 | nPhrase = True 217 | x = 0 218 | 219 | # if there are any words in the final phrase add to phrases 220 | if(len(phrase["words"]) > 0): 221 | phrases.append(phrase) 222 | 223 | return phrases 224 | 225 | 226 | 227 | 228 | # ================================================================================== 229 | # Function: translateTranscript 230 | # Purpose: Based on the JSON transcript provided by Amazon Transcribe, get the JSON response of translated text 231 | # Parameters: 232 | # transcript - the JSON output from Amazon Transcribe 233 | # sourceLangCode - the language code for the original content (e.g. English = "EN") 234 | # targetLangCode - the language code for the translated content (e.g. Spanish = "ES") 235 | # region - the AWS region in which to run the Translation (e.g. 
"us-east-1") 236 | # ================================================================================== 237 | def translateTranscript( transcript, sourceLangCode, targetLangCode, region ): 238 | # Get the translation in the target language. We want to do this first so that the translation is in the full context 239 | # of what is said vs. 1 phrase at a time. This really matters in some lanaguages 240 | 241 | # stringify the transcript 242 | ts = json.loads( transcript ) 243 | 244 | # pull out the transcript text and put it in the txt variable 245 | txt = ts["results"]["transcripts"][0]["transcript"] 246 | 247 | #set up the Amazon Translate client 248 | translate = boto3.client(service_name='translate', region_name=region, use_ssl=True) 249 | 250 | # call Translate with the text, source language code, and target language code. The result is a JSON structure containing the 251 | # translated text 252 | translation = translate.translate_text(Text=txt,SourceLanguageCode=sourceLangCode, TargetLanguageCode=targetLangCode) 253 | 254 | return translation 255 | 256 | 257 | 258 | # ================================================================================== 259 | # Function: writeSRT 260 | # Purpose: Iterate through the phrases and write them to the SRT file 261 | # Parameters: 262 | # phrases - the array of JSON tuples containing the phrases to show up as subtitles 263 | # filename - the name of the SRT output file (e.g. "mySRT.srt") 264 | # ================================================================================== 265 | def writeSRT( phrases, filename ): 266 | print "==> Writing phrases to disk..." 
267 | 268 | # open the file 269 | e = codecs.open(filename,"w+", "utf-8") 270 | x = 1 271 | 272 | for phrase in phrases: 273 | 274 | # determine how many words are in the phrase 275 | length = len(phrase["words"]) 276 | 277 | # write out the phrase number 278 | e.write( str(x) + "\n" ) 279 | x += 1 280 | 281 | # write out the start and end time 282 | e.write( phrase["start_time"] + " --> " + phrase["end_time"] + "\n" ) 283 | 284 | # write out the full phrase. Use spacing if it is a word, or punctuation without spacing 285 | out = getPhraseText( phrase ) 286 | 287 | # write out the srt file 288 | e.write(out + "\n\n" ) 289 | 290 | 291 | #print out 292 | 293 | e.close() 294 | 295 | 296 | # ================================================================================== 297 | # Function: getPhraseText 298 | # Purpose: For a given phrase, return the string of words including punctuation 299 | # Parameters: 300 | # phrase - the phrase structure whose words should show up as a subtitle 301 | # ================================================================================== 302 | 303 | def getPhraseText( phrase ): 304 | 305 | length = len(phrase["words"]) 306 | 307 | out = "" 308 | for i in range( 0, length ): 309 | if re.match( '[a-zA-Z0-9]', phrase["words"][i]): 310 | if i > 0: 311 | out += " " + phrase["words"][i] 312 | else: 313 | out += phrase["words"][i] 314 | else: 315 | out += phrase["words"][i] 316 | 317 | return out 318 | 319 | 320 | 321 | 322 | 323 | 324 | 325 | 326 | -------------------------------------------------------------------------------- /tools/testWebVTT.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | from transcribeUtils import * 3 | from webvttUtils import * 4 | import requests 5 | from videoUtils import * 6 | from audioUtils import * 7 | 8 | # Get the command line arguments and parse them 9 | parser = argparse.ArgumentParser( prog='testWebVTT.py', description='Process a video 
found in the input file, process it, and write it out to the output file') 10 | parser.add_argument('-region', required=True, help="The AWS region containing the S3 buckets" ) 11 | parser.add_argument('-inbucket', required=True, help='The S3 bucket containing the input file') 12 | parser.add_argument('-infile', required=True, help='The input file to process') 13 | parser.add_argument('-outbucket', required=True, help='The S3 bucket for the output file') 14 | parser.add_argument('-outfilename', required=True, help='The file name without the extension') 15 | parser.add_argument('-outfiletype', required=True, help='The output file type. E.g. mp4, mov') 16 | parser.add_argument('-outlang', required=True, nargs='+', help='The language codes for the desired output. E.g. en = English, de = German') 17 | parser.add_argument('-TranscriptJob', required=True, help='The URI resulting from the transcript job') 18 | args = parser.parse_args() 19 | 20 | 21 | job = getTranscriptionJobStatus( args.TranscriptJob ) 22 | #print( job ) 23 | 24 | 25 | # Now get the transcript JSON from Amazon Transcribe 26 | transcript = getTranscript( str(job["TranscriptionJob"]["Transcript"]["TranscriptFileUri"]) ) 27 | #print( "\n==> Transcript: \n" + transcript) 28 | 29 | # Create the WebVTT File for the original transcript and write it out. 30 | writeTranscriptToWebVTT( transcript, 'en', "subtitles-en.vtt") 31 | #createVideo( args.infile, "subtitles-en.vtt", args.outfilename + "-en." 
+ args.outfiletype, "audio-en.mp3", True) 32 | 33 | 34 | # Now write out the translation to the transcript for each of the target languages 35 | for lang in args.outlang: 36 | writeTranslationToWebVTT(transcript, 'en', lang, "subtitles-" + lang + ".vtt" ) 37 | 38 | #Now that we have the subtitle files, let's create the audio track 39 | #createAudioTrackFromTranslation( args.region, transcript, 'en', lang, "audio-" + lang + ".mp3" ) 40 | 41 | # Finally, create the composited video 42 | #createVideo( args.infile, "subtitles-" + lang + ".WebVTT", args.outfilename + "-" + lang + "." + args.outfiletype, "audio-" + lang + ".mp3", False) 43 | 44 | 45 | -------------------------------------------------------------------------------- /tools/webvttUtils.py: -------------------------------------------------------------------------------- 1 | import json 2 | import boto3 3 | import re 4 | import codecs 5 | from audioUtils import * 6 | 7 | translate = boto3.client(service_name='translate', region_name='us-east-1', use_ssl=True) 8 | 9 | 10 | 11 | # Create a new phrase structure 12 | def newPhrase(): 13 | return { 'start_time': '', 'end_time': '', 'words' : [] } 14 | 15 | # Format and return a string that contains the converted number of seconds into WebVTT format 16 | def getTimeCode( seconds ): 17 | t_hund = int(seconds % 1 * 1000) 18 | t_seconds = int( seconds ) 19 | t_secs = ((float( t_seconds) / 60) % 1) * 60 20 | t_mins = int( t_seconds / 60 ) 21 | return str( "%02d:%02d:%02d.%03d" % (00, t_mins, int(t_secs), t_hund )) 22 | 23 | 24 | 25 | def writeTranscriptToWebVTT( transcript, sourceLangCode, WebVTTFileName ): 26 | # Write the WebVTT file for the original language 27 | print( "==> Creating WebVTT from transcript") 28 | phrases = getPhrasesFromTranscript( transcript ) 29 | writeWebVTT( phrases, WebVTTFileName, "A:middle L:90%" ) 30 | 31 | 32 | def writeTranslationToWebVTT( transcript, sourceLangCode, targetLangCode, WebVTTFileName ): 33 | # First get the translation 34 | 
print( "\n\n==> Translating from " + sourceLangCode + " to " + targetLangCode ) 35 | translation = translateTranscript( transcript, sourceLangCode, targetLangCode ) 36 | #print( "\n\n==> Translation: " + str(translation)) 37 | 38 | # Now create phrases from the translation 39 | textToTranslate = unicode(translation["TranslatedText"]) 40 | phrases = getPhrasesFromTranslation( textToTranslate, targetLangCode ) 41 | writeWebVTT( phrases, WebVTTFileName, "A:middle L:90%" ) 42 | 43 | 44 | def getPhrasesFromTranslation( translation, targetLangCode ): 45 | 46 | # Now create phrases from the translation 47 | words = translation.split() 48 | 49 | #print( words ) #debug statement 50 | 51 | #set up some variables for the first pass 52 | phrase = newPhrase() 53 | phrases = [] 54 | nPhrase = True 55 | x = 0 56 | c = 0 57 | seconds = 0 58 | 59 | print "==> Creating phrases from translation..." 60 | 61 | for word in words: 62 | 63 | # if it is a new phrase, then get the start_time of the first item 64 | if nPhrase == True: 65 | phrase["start_time"] = getTimeCode( seconds ) 66 | nPhrase = False 67 | c += 1 68 | 69 | # Append the word to the phrase... 70 | phrase["words"].append(word) 71 | x += 1 72 | 73 | 74 | # now add the phrase to the phrases, generate a new phrase, etc. 75 | if x == 10: 76 | 77 | # For Translations, we now need to calculate the end time for the phrase 78 | psecs = getSecondsFromTranslation( getPhraseText( phrase), targetLangCode, "phraseAudio" + str(c) + ".mp3" ) 79 | seconds += psecs 80 | phrase["end_time"] = getTimeCode( seconds ) 81 | 82 | #print c, phrase 83 | phrases.append(phrase) 84 | phrase = newPhrase() 85 | nPhrase = True 86 | #seconds += .001 87 | x = 0 88 | 89 | #if c == 30: 90 | # break 91 | 92 | return phrases 93 | 94 | 95 | def getPhrasesFromTranscript( transcript ): 96 | 97 | # This function is intended to be called with the JSON structure output from the Transcribe service. 
However, 98 | # if you only have the translation of the transcript, then you should call getPhrasesFromTranslation instead 99 | 100 | # Now create phrases from the transcript 101 | ts = json.loads( transcript ) 102 | items = ts['results']['items'] 103 | #print( items ) 104 | 105 | #set up some variables for the first pass 106 | phrase = newPhrase() 107 | phrases = [] 108 | nPhrase = True 109 | x = 0 110 | c = 0 111 | 112 | print "==> Creating phrases from transcript..." 113 | 114 | for item in items: 115 | 116 | # if it is a new phrase, then get the start_time of the first item 117 | if nPhrase == True: 118 | if item["type"] == "pronunciation": 119 | phrase["start_time"] = getTimeCode( float(item["start_time"]) ) 120 | nPhrase = False 121 | c+= 1 122 | else: 123 | # get the end_time if the item is a pronunciation and store it 124 | # We need to determine whether this is a pronunciation or punctuation item here 125 | # Punctuation doesn't contain timing information, so we'll want 126 | # to set the end_time to whatever the last word in the phrase is. 127 | if item["type"] == "pronunciation": 128 | phrase["end_time"] = getTimeCode( float(item["end_time"]) ) 129 | 130 | # in either case, append the word to the phrase... 131 | phrase["words"].append(item['alternatives'][0]["content"]) 132 | x += 1 133 | 134 | # now add the phrase to the phrases, generate a new phrase, etc. 135 | if x == 10: 136 | #print c, phrase 137 | phrases.append(phrase) 138 | phrase = newPhrase() 139 | nPhrase = True 140 | x = 0 141 | 142 | # if there are any words in the final phrase add to phrases 143 | if(len(phrase["words"]) > 0): 144 | phrases.append(phrase) 145 | 146 | return phrases 147 | 148 | 149 | 150 | 151 | 152 | def translateTranscript( transcript, sourceLangCode, targetLangCode ): 153 | # Get the translation in the target language. We want to do this first so that the translation is in the full context 154 | # of what is said vs. 1 phrase at a time.
This really matters in some languages 155 | 156 | # parse the transcript JSON 157 | ts = json.loads( transcript ) 158 | 159 | # pull out the transcript text and put it in the txt variable 160 | txt = ts["results"]["transcripts"][0]["transcript"] 161 | 162 | # call Translate with the text, source language code, and target language code. The result is a JSON structure containing the 163 | # translated text 164 | translation = translate.translate_text(Text=txt,SourceLanguageCode=sourceLangCode, TargetLanguageCode=targetLangCode) 165 | 166 | return translation 167 | 168 | 169 | 170 | def writeWebVTT( phrases, filename, style ): 171 | print "==> Writing phrases to disk..." 172 | 173 | # open the file 174 | e = codecs.open(filename,"w+", "utf-8") 175 | x = 1 176 | 177 | # write the header of the webVTT file 178 | e.write( "WEBVTT\n\n") 179 | 180 | for phrase in phrases: 181 | 182 | # determine how many words are in the phrase 183 | length = len(phrase["words"]) 184 | 185 | # write out the phrase number 186 | e.write( str(x) + "\n" ) 187 | x += 1 188 | 189 | # write out the start and end time 190 | e.write( phrase["start_time"] + " --> " + phrase["end_time"] + " " + style + "\n" ) 191 | 192 | # write out the full phrase. Use spacing if it is a word, or punctuation without spacing 193 | out = getPhraseText( phrase ) 194 | 195 | # write out the WebVTT file 196 | e.write(out + "\n\n" ) 197 | 198 | 199 | #print out 200 | 201 | e.close() 202 | 203 | 204 | 205 | def getPhraseText( phrase ): 206 | 207 | length = len(phrase["words"]) 208 | 209 | out = "" 210 | for i in range( 0, length ): 211 | if re.match( '[a-zA-Z0-9]', phrase["words"][i]): 212 | if i > 0: 213 | out += " " + phrase["words"][i] 214 | else: 215 | out += phrase["words"][i] 216 | else: 217 | out += phrase["words"][i] 218 | 219 | return out 220 | 221 | 222 | 223 | 224 | 225 | 226 | 227 | 228 | --------------------------------------------------------------------------------
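Note on `getTimeCode`: the versions in tools/srtUtils.py and tools/webvttUtils.py above always write `00` in the hours field, so a timestamp past 59:59 would come out with a minutes value above 59. The following is a minimal sketch of one possible alternative, not code from the repository. It assumes Python 3, uses a hypothetical name `format_timecode`, and rounds to the nearest millisecond where the original truncates.

```python
# Hypothetical helper (not part of the repository): formats a duration in
# seconds as HH:MM:SS,mmm for SRT or HH:MM:SS.mmm for WebVTT.
def format_timecode(seconds, ms_separator=","):
    # Work in whole milliseconds so the split below is pure integer arithmetic.
    total_ms = int(round(float(seconds) * 1000))
    hours, remainder = divmod(total_ms, 3600000)
    minutes, remainder = divmod(remainder, 60000)
    secs, millis = divmod(remainder, 1000)
    return "%02d:%02d:%02d%s%03d" % (hours, minutes, secs, ms_separator, millis)


if __name__ == "__main__":
    print(format_timecode(3661.5))      # 01:01:01,500
    print(format_timecode(75.25, "."))  # 00:01:15.250
```

Passing `"."` as the separator gives the WebVTT form, so one helper could stand in for both copies of `getTimeCode` if that fits the project.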