├── .gitignore
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── THIRD-PARTY
├── pptx-translator.py
└── requirements.txt


/.gitignore:
--------------------------------------------------------------------------------
.venv
tmp/
.DS_Store


--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
## Code of Conduct
This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct).
For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact
opensource-codeofconduct@amazon.com with any additional questions or comments.


--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
# Contributing Guidelines

Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional
documentation, we greatly value feedback and contributions from our community.

Please read through this document before submitting any issues or pull requests to ensure we have all the necessary
information to effectively respond to your bug report or contribution.


## Reporting Bugs/Feature Requests

We welcome you to use the GitHub issue tracker to report bugs or suggest features.

When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already
reported the issue. Please try to include as much information as you can.
Details like these are incredibly useful:

* A reproducible test case or series of steps
* The version of our code being used
* Any modifications you've made relevant to the bug
* Anything unusual about your environment or deployment


## Contributing via Pull Requests
Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that:

1. You are working against the latest source on the *master* branch.
2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already.
3. You open an issue to discuss any significant work - we would hate for your time to be wasted.

To send us a pull request, please:

1. Fork the repository.
2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change.
3. Ensure local tests pass.
4. Commit to your fork using clear commit messages.
5. Send us a pull request, answering any default questions in the pull request interface.
6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation.

GitHub provides additional documentation on [forking a repository](https://help.github.com/articles/fork-a-repo/) and
[creating a pull request](https://help.github.com/articles/creating-a-pull-request/).


## Finding contributions to work on
Looking at the existing issues is a great way to find something to contribute to. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start.


## Code of Conduct
This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct).
For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact
opensource-codeofconduct@amazon.com with any additional questions or comments.


## Security issue notifications
If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public GitHub issue.


## Licensing

See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution.

We may ask you to sign a [Contributor License Agreement (CLA)](http://en.wikipedia.org/wiki/Contributor_License_Agreement) for larger changes.


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# pptx-translator

A Python script that translates pptx files using the Amazon Translate service.

## Installation

```bash
$ virtualenv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
```

## Usage

Basic translation:
```bash
python pptx-translator.py source_language_code target_language_code input_file_path
```

Example execution:
```bash
python pptx-translator.py ja en input-file.pptx
```

For more information on available options:
```bash
python pptx-translator.py --help
```

## Command-line Arguments

```
usage: Translates pptx files from source language to target language using Amazon Translate service
       [-h] [--terminology TERMINOLOGY]
       source_language_code target_language_code input_file_path

positional arguments:
  source_language_code  The language code for the language of the source text.
                        Example: en
  target_language_code  The language code requested for the language of the
                        target text. Example: pt
  input_file_path       The path of the pptx file that should be translated

optional arguments:
  -h, --help            show this help message and exit
  --terminology TERMINOLOGY
                        The path of the terminology CSV file
```

## Features

- Translates PowerPoint (.pptx) files from one language to another using Amazon Translate
- Supports custom terminology for translation

## Security

See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.

## License

This library is licensed under the MIT-0 License. See the LICENSE file.
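
## Example: preparing a terminology file

The `--terminology` option listed under Command-line Arguments takes the path of a CSV file. As a minimal sketch, assuming the Amazon Translate custom terminology CSV layout (a header row of language codes, then one row per term), such a file could be generated as shown here. The helper name `write_terminology_csv` and the sample file name and terms are hypothetical and not part of this repository.

```python
import csv


def write_terminology_csv(path, source_language_code, target_language_code, term_pairs):
    """Write (source_term, target_term) pairs to a terminology CSV file.

    Hypothetical helper for illustration; it is not part of pptx-translator.py.
    """
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        # Header row: the source language code first, then the target language code.
        writer.writerow([source_language_code, target_language_code])
        # One row per term; the csv module quotes terms that contain commas.
        for source_term, target_term in term_pairs:
            writer.writerow([source_term, target_term])
```

After calling, for example, `write_terminology_csv('terminology.csv', 'en', 'fr', [('Amazon Family', 'Amazon Famille')])`, the resulting file can be passed to the script:

```bash
python pptx-translator.py en fr input-file.pptx --terminology terminology.csv
```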


--------------------------------------------------------------------------------
/THIRD-PARTY:
--------------------------------------------------------------------------------
** python-pptx; version 0.6.18 -- https://github.com/scanny/python-pptx
Copyright (c) 2013 Steve Canny, https://github.com/scanny

The MIT License (MIT)

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.


--------------------------------------------------------------------------------
/pptx-translator.py:
--------------------------------------------------------------------------------
import argparse
import boto3
from botocore.exceptions import ClientError
from pptx import Presentation
from pptx.enum.lang import MSO_LANGUAGE_ID

# Dict that maps Amazon Translate language code to MSO_LANGUAGE_ID enum value.
#
# - Amazon Translate language codes: https://docs.aws.amazon.com/translate/latest/dg/what-is.html#what-is-languages
# - python-pptx MSO_LANGUAGE_ID enum: https://python-pptx.readthedocs.io/en/latest/api/enum/MsoLanguageId.html
#
# python-pptx doesn't support:
# - Azerbaijani (az)
# - Persian (fa)
# - Dari (fa-AF)
# - Tagalog (tl)
LANGUAGE_CODE_TO_LANGUAGE_ID = {
    'af': MSO_LANGUAGE_ID.AFRIKAANS,
    'am': MSO_LANGUAGE_ID.AMHARIC,
    'ar': MSO_LANGUAGE_ID.ARABIC,
    'bg': MSO_LANGUAGE_ID.BULGARIAN,
    'bn': MSO_LANGUAGE_ID.BENGALI,
    'bs': MSO_LANGUAGE_ID.BOSNIAN,
    'cs': MSO_LANGUAGE_ID.CZECH,
    'da': MSO_LANGUAGE_ID.DANISH,
    'de': MSO_LANGUAGE_ID.GERMAN,
    'el': MSO_LANGUAGE_ID.GREEK,
    'en': MSO_LANGUAGE_ID.ENGLISH_US,
    'es': MSO_LANGUAGE_ID.SPANISH,
    'et': MSO_LANGUAGE_ID.ESTONIAN,
    'fi': MSO_LANGUAGE_ID.FINNISH,
    'fr': MSO_LANGUAGE_ID.FRENCH,
    'fr-CA': MSO_LANGUAGE_ID.FRENCH_CANADIAN,
    'ha': MSO_LANGUAGE_ID.HAUSA,
    'he': MSO_LANGUAGE_ID.HEBREW,
    'hi': MSO_LANGUAGE_ID.HINDI,
    'hr': MSO_LANGUAGE_ID.CROATIAN,
    'hu': MSO_LANGUAGE_ID.HUNGARIAN,
    'id': MSO_LANGUAGE_ID.INDONESIAN,
    'it': MSO_LANGUAGE_ID.ITALIAN,
    'ja': MSO_LANGUAGE_ID.JAPANESE,
    'ka': MSO_LANGUAGE_ID.GEORGIAN,
    'ko': MSO_LANGUAGE_ID.KOREAN,
    'lv': MSO_LANGUAGE_ID.LATVIAN,
    'ms': MSO_LANGUAGE_ID.MALAYSIAN,
    'nl': MSO_LANGUAGE_ID.DUTCH,
    'no': MSO_LANGUAGE_ID.NORWEGIAN_BOKMOL,
    'pl': MSO_LANGUAGE_ID.POLISH,
    'ps': MSO_LANGUAGE_ID.PASHTO,
    'pt': MSO_LANGUAGE_ID.BRAZILIAN_PORTUGUESE,
    'ro': MSO_LANGUAGE_ID.ROMANIAN,
    'ru': MSO_LANGUAGE_ID.RUSSIAN,
    'sk': MSO_LANGUAGE_ID.SLOVAK,
    'sl': MSO_LANGUAGE_ID.SLOVENIAN,
    'so': MSO_LANGUAGE_ID.SOMALI,
    'sq': MSO_LANGUAGE_ID.ALBANIAN,
    'sr': MSO_LANGUAGE_ID.SERBIAN_LATIN,
    'sv': MSO_LANGUAGE_ID.SWEDISH,
    'sw': MSO_LANGUAGE_ID.SWAHILI,
    'ta': MSO_LANGUAGE_ID.TAMIL,
    'th': MSO_LANGUAGE_ID.THAI,
    'tr': MSO_LANGUAGE_ID.TURKISH,
    'uk': MSO_LANGUAGE_ID.UKRAINIAN,
    'ur': MSO_LANGUAGE_ID.URDU,
    'vi': MSO_LANGUAGE_ID.VIETNAMESE,
    'zh': MSO_LANGUAGE_ID.CHINESE_SINGAPORE,
    'zh-TW': MSO_LANGUAGE_ID.CHINESE_HONG_KONG_SAR,
}

TERMINOLOGY_NAME = 'pptx-translator-terminology'

translate = boto3.client(service_name='translate')


def translate_presentation(presentation, source_language_code, target_language_code, terminology_names):
    """Translate tables, text frames, and speaker notes on every slide, in place."""
    for slide_index, slide in enumerate(presentation.slides, start=1):
        print(f'Slide {slide_index} of {len(presentation.slides)}')

        for shape in slide.shapes:
            if shape.has_table:
                for row in shape.table.rows:
                    for cell in row.cells:
                        translate_text_frame(cell.text_frame, source_language_code, target_language_code, terminology_names)
            elif shape.has_text_frame:
                translate_text_frame(shape.text_frame, source_language_code, target_language_code, terminology_names)

        if slide.has_notes_slide:
            translate_text_frame(slide.notes_slide.notes_text_frame, source_language_code, target_language_code, terminology_names)


def translate_text_frame(text_frame, source_language_code, target_language_code, terminology_names):
    """Translate each non-empty run of a text frame with one Amazon Translate call per run."""
    for paragraph in text_frame.paragraphs:
        for run in paragraph.runs:
            if run.text.strip():
                try:
                    response = translate.translate_text(
                        Text=run.text,
                        SourceLanguageCode=source_language_code,
                        TargetLanguageCode=target_language_code,
                        TerminologyNames=terminology_names
                    )
                    # Keep the original text if the response contains no translation.
                    run.text = response.get('TranslatedText', run.text)
                except ClientError as client_error:
                    if client_error.response['Error']['Code'] == 'ValidationException':
                        print('Invalid text. Ignoring...')
                    else:
                        # Surface every other error instead of silently dropping it.
                        raise


def import_terminology(terminology_file_path):
    """Upload the terminology CSV file, overwriting any terminology with the same name."""
    print(f'Importing terminology data from {terminology_file_path}...')
    with open(terminology_file_path, 'rb') as f:
        translate.import_terminology(Name=TERMINOLOGY_NAME,
                                     MergeStrategy='OVERWRITE',
                                     TerminologyData={'File': bytearray(f.read()), 'Format': 'CSV'})


def main():
    argument_parser = argparse.ArgumentParser(
        'Translates pptx files from source language to target language using Amazon Translate service')
    argument_parser.add_argument(
        'source_language_code', type=str,
        help='The language code for the language of the source text. Example: en')
    argument_parser.add_argument(
        'target_language_code', type=str,
        help='The language code requested for the language of the target text. Example: pt')
    argument_parser.add_argument(
        'input_file_path', type=str,
        help='The path of the pptx file that should be translated')
    argument_parser.add_argument(
        '--terminology', type=str,
        help='The path of the terminology CSV file')
    args = argument_parser.parse_args()

    # Only reference the custom terminology when a CSV file was supplied.
    terminology_names = []
    if args.terminology:
        import_terminology(args.terminology)
        terminology_names = [TERMINOLOGY_NAME]

    print(f'Translating {args.input_file_path} from {args.source_language_code} to {args.target_language_code}...')
    presentation = Presentation(args.input_file_path)
    translate_presentation(presentation,
                           args.source_language_code,
                           args.target_language_code,
                           terminology_names)

    # The output is saved next to the input file, e.g. input-file-en.pptx.
    output_file_path = args.input_file_path.replace(
        '.pptx', f'-{args.target_language_code}.pptx')
    print(f'Saving {output_file_path}...')
    presentation.save(output_file_path)


if __name__ == '__main__':
    main()


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
boto3==1.35.42
python-pptx==1.0.2


--------------------------------------------------------------------------------