├── .deepsource.toml ├── .dockerignore ├── .gitattributes ├── .github ├── copilot-instructions.md └── workflows │ ├── .prettierrc │ ├── 01-build-base-images.yml │ ├── 02-build-model-cache.yml │ ├── 03-build-distil-en.yml │ ├── 04-build-matrix-images.yml │ ├── README.md │ ├── auto_merge.yml │ ├── docker-reused-steps │ └── action.yml │ ├── docker_publish.yml │ ├── scan.yml │ ├── scan │ └── html.tpl │ ├── scan_ubi.yml │ ├── submodule_update.yml │ └── test │ ├── en.webm │ └── zh.webm ├── .gitignore ├── .gitmodules ├── .hadolint.yml ├── .vscode └── settings.json ├── AGENTS.md ├── Dockerfile ├── LICENSE ├── README.md ├── docker-bake.hcl ├── load_align_model.py └── ubi.Dockerfile /.deepsource.toml: -------------------------------------------------------------------------------- 1 | version = 1 2 | 3 | [[analyzers]] 4 | name = "docker" -------------------------------------------------------------------------------- /.dockerignore: -------------------------------------------------------------------------------- 1 | **/.hadolint.yml 2 | **/node_modules 3 | **/*.log 4 | **/.git 5 | **/.gitignore 6 | **/.env 7 | **/.github 8 | **/.vscode 9 | **/bin 10 | **/obj 11 | **/dist 12 | -------------------------------------------------------------------------------- /.gitattributes: -------------------------------------------------------------------------------- 1 | *.sh eol=lf 2 | -------------------------------------------------------------------------------- /.github/copilot-instructions.md: -------------------------------------------------------------------------------- 1 | # GitHub Copilot Instructions for docker-whisperX 2 | 3 | * **Response Language:** `zh-TW 正體中文` 4 | 5 | # Key Directives: 6 | 7 | * Maintain the highest standard of quality in all deliverables by following best practices. 8 | * All code comments and documentation must be written in **English** as per project conventions. 9 | * Proactively consult both core documentation and conversation history to ensure accurate comprehension of all requirements. 10 | * You are neither able to execute `docker`, use `podman` instead. 11 | * When doing Git commit, use the conventional commit format for the title and a brief description in the body. Always commit with `--signoff` and explicitly specify the author on the command: `GitHub Copilot `. Write the commit in English. 12 | 13 | --- 14 | 15 | # Project DevOps 16 | 17 | This project uses GitHub for DevOps management. 18 | 19 | Please use the #github-sudo tool to perform DevOps tasks. 20 | 21 | ***Highest-level restriction: All issue and PR operations are limited to repositories owned by jim60105 only!*** 22 | 23 | * **GitHub repo**: https://github.com/jim60105/docker-whisperX 24 | 25 | * **Backlog & Bugs**: All backlogs and bugs must be managed on GitHub Issues. 26 | 27 | * Each issue represents a specific backlog plan / bug reports / enhancement requests. 28 | * Contains implementation or bug-fix guides from project foundation to deployment 29 | * Each issue(backlogs) includes complete technical design and implementation details 30 | * Each issue(bugs) includes problem description, reproduction steps, and proposed solutions 31 | * Serves as task queue for ongoing maintenance and improvements 32 | 33 | ## DevOps Flow 34 | 35 | ### Planning Stage 36 | 37 | **If we are at planning stage you shouldn't start to implement anything!** 38 | **Planning Stage is to create a detailed development plan and #create_issue on GitHub** 39 | 40 | 1. **Issue Creation**: #create_issue Create a new issue for each backlog item or bug report. Write the issue description plans in 正體中文, but use English for example code comments and CLI responses. The plan should be very detailed (try your best!). Please write that enables anyone to complete the work successfully. 41 | 2. **Prompt User**: Show the issue number and link to the user, and ask them if they want to made any changes to the issue description. If they do, you can edit the issue description using #update_issue . 42 | 43 | ### Implementation Stage 44 | 45 | **Only start to implement stage when user prompt you to do so!** 46 | **Implementation Stage is to implement the plan step by step, following the instructions provided in the issue and submit a work report PR at last** 47 | 48 | 1. **Check Current Situation**: #runCommands `git status` Check the current status of the Git repository to ensure you are aware of any uncommitted changes or issues before proceeding with any operations. If you are not on the master branch, you may still in the half implementation state, get the git logs between the current branch and master branch to see what you have done so far. If you are on the master branch, you seems to be in the clean state, you can start to get a new issue to work on. 49 | 2. **Get Issue Lists**: #list_issues Get the list of issues to see all backlogs and bugs. Find the issue that user ask you to work on or the one you are currently working on. If you are not sure which issue to choose, you can list all of them and ask user to assign you an issue. 50 | 3. **Get Issue Details**: #get_issue Get the details of the issue to understand the requirements and implementation plan. Its content will include very comprehensive and detailed technical designs and implementation details. Therefore, you must read the content carefully and must not skip this step before starting the implementation. 51 | 4. **Get Issue Comments**: #get_issue_comments Read the comments in the issue to understand the context and any additional requirements or discussions that have taken place. Please read it to determine whether this issue has been completed, whether further implementation is needed, or if there are still problems that need to be fixed. This step must not be skipped before starting implementation. 52 | 5. **Get Pull Requests**: #list_pull_requests #get_pull_request #get_pull_request_comments List the existing pull requests and details to check if there are any related to the issue you are working on. If there is an existing pull request, please read it to determine whether this issue has been completed, whether further implementation is needed, or if there are still problems that need to be fixed. This step must not be skipped before starting implementation. 53 | 6. **Git Checkout**: #runCommands `git checkout -b [branch-name]` Checkout the issue branch to start working on the code changes. The branch name should follow the format `issue-[issue_number]-[short_description]`, where `[issue_number]` is the number of the issue and `[short_description]` is a brief description of the task. Skip this step if you are already on the correct branch. 54 | 7. **Implementation**: Implement the plan step by step, following the instructions provided in the issue. Each step should be executed in sequence, ensuring that all requirements are met and documented appropriately. 55 | 8. **Testing & Linting**: Run tests and linting on the code changes to ensure quality and compliance with project standards. 56 | 9. **Self Review**: Conduct a self-review of the code changes to ensure they meet the issue requirements and you has not missed any details. 57 | 10. **Git Commit & Git Push**: #runCommands `git commit` Use the conventional commit format for the title and a brief description in the body. Always commit with `--signoff` and explicitly specify the author on the command: `GitHub Copilot `. Write the commit in English. Link the issue number in the commit message body. #runCommands `git push` Push the changes to the remote repository. 58 | 11. **Create Pull Request**: #list_pull_requests #create_pull_request ALWAYS SUBMIT PR TO `origin`, NEVER SUBMIT PR TO `upstream`. Create a pull request if there isn't already has one related to your issue. Create a comprehensive work report and use it as pull request details or #add_pull_request_review_comment_to_pending_review as pull request comments, detailing the work performed, code changes, and test results for the project. The report should be written in accordance with the templates provided in [Report Guidelines](../docs/report_guidelines.md) and [REPORT_TEMPLATE](../docs/REPORT_TEMPLATE.md). Follow the template exactly. Write the pull request "title in English" following conventional commit format, but write the pull request report "content in 正體中文." Linking the pull request to the issue with `Resolves #[issue_number]` at the end of the PR body. ALWAYS SUBMIT PR TO `origin`, NEVER SUBMIT PR TO `upstream`. ALWAYS SUBMIT PR TO `origin`, NEVER SUBMIT PR TO `upstream`. ALWAYS SUBMIT PR TO `origin`, NEVER SUBMIT PR to `upstream`. 59 | 60 | ***Highest-level restriction: All issue and PR operations are limited to repositories owned by jim60105 only!*** 61 | ***Highest-level restriction: All issue and PR operations are limited to repositories owned by jim60105 only!*** 62 | ***Highest-level restriction: All issue and PR operations are limited to repositories owned by jim60105 only!*** 63 | 64 | --- 65 | 66 | ## Project Overview 67 | 68 | This project provides a **Docker containerization** for [WhisperX](https://github.com/m-bain/whisperX). 69 | 70 | The project focuses on **continuous integration optimization** for building 175+ Docker images (10GB each) weekly on GitHub Free runners, emphasizing efficient docker layer caching, parallel builds, and minimal image sizes. 71 | 72 | The focus of this project is on the Dockerfile and CI workflow, not on the WhisperX project itself. 73 | 74 | ## Project Structure 75 | 76 | ``` 77 | docker-whisperX/ 78 | ├── Dockerfile # Main Docker build configuration (For docker compatibility) 79 | ├── ubi.Dockerfile # Red Hat UBI-based alternative (For podman compatibility) 80 | ├── docker-bake.hcl # Docker Buildx bake configuration for matrix builds 81 | ├── load_align_model.py # Preloads alignment models for supported languages 82 | ├── whisperX/ # Git submodule containing WhisperX source code 83 | │ ├── pyproject.toml # Python package configuration 84 | │ └── whisperx/ # Main WhisperX Python package 85 | └── .github/ 86 | └── workflows/ # CI/CD pipeline configurations 87 | ``` 88 | 89 | ## Coding Standards and Conventions 90 | 91 | ### Docker Best Practices 92 | - Use **multi-stage builds** to minimize final image size 93 | - Leverage **BuildKit features** like `--mount=type=cache` for dependency caching 94 | - Apply **layer caching strategies** to optimize CI build times 95 | - Use **ARG** variables for build-time configuration (WHISPER_MODEL, LANG, etc.) 96 | - Follow **security best practices**: run as non-root user, minimize installed packages 97 | - Do not use `--link` in ubi.Dockerfile, as it is not supported by Podman. 98 | - Do not use `,z` or `,Z` in Dockerfile, as it is not supported by Docker buildx. 99 | 100 | ### Documentation Standards 101 | - Write documentation in English for user-facing content 102 | - Use **English** for technical comments in code and commit messages 103 | - Include **clear examples** in README files showing actual usage commands 104 | - Document **build arguments** and their acceptable values 105 | - Provide **troubleshooting guidance** for common issues 106 | 107 | ## Key Technologies and Dependencies 108 | 109 | ### Build Tools 110 | - **uv**: Modern Python package manager for dependency resolution (Used in Dockerfile) 111 | - **Docker Buildx**: Extended build capabilities with bake support 112 | - **GitHub Actions**: CI/CD automation for multi-architecture builds 113 | 114 | ## Development Guidelines 115 | 116 | ### When Working with Docker Configuration 117 | - **Dockerfile modifications**: Always test both `amd64` and `arm64` architectures 118 | - **Build arguments**: Validate that ARG values match supported languages in `load_align_model.py` 119 | - **Cache optimization**: Consider layer ordering impact on CI build performance 120 | - **Multi-stage builds**: Ensure each stage serves a clear purpose (build → no_model → load_whisper) 121 | 122 | ### When Working with CI/CD 123 | - **Parallel builds**: Consider the large amount of build matrix impact on GitHub runner resources 124 | - **Caching strategy**: Optimize for both build time and cache storage efficiency 125 | - **Multi-architecture**: Ensure changes work correctly on both x86_64 and arm64 126 | 127 | ## Project-Specific Conventions 128 | 129 | ## Additional Notes for Contributors 130 | 131 | When suggesting changes, always consider the impact on: 132 | 1. **Build time efficiency** for the CI pipeline 133 | 2. **Multi-architecture compatibility** (amd64/arm64) 134 | 135 | --- 136 | 137 | When contributing to this codebase, adhere strictly to these directives to ensure consistency with the existing architectural conventions and stylistic norms. 138 | -------------------------------------------------------------------------------- /.github/workflows/.prettierrc: -------------------------------------------------------------------------------- 1 | { 2 | "tabWidth": 2, 3 | "useTabs": false 4 | } 5 | -------------------------------------------------------------------------------- /.github/workflows/01-build-base-images.yml: -------------------------------------------------------------------------------- 1 | # Build base images workflow 2 | # This workflow builds the ubi-no_model and no_model base images 3 | name: "01-build-base-images" 4 | 5 | on: 6 | workflow_run: 7 | workflows: ["docker_publish"] 8 | types: [completed] 9 | 10 | # Sets the permissions granted to the GITHUB_TOKEN for the actions in this job. 11 | permissions: 12 | contents: read 13 | packages: write 14 | id-token: write 15 | attestations: write 16 | 17 | env: 18 | REGISTRY_IMAGE: ${{ github.repository_owner }}/whisperx 19 | 20 | jobs: 21 | # Build the ubi-no_model in parallel across multiple platforms 22 | build_ubi: 23 | runs-on: ${{ matrix.runner }} 24 | if: ${{ github.event.workflow_run.conclusion == 'success' }} 25 | strategy: 26 | fail-fast: false 27 | matrix: 28 | include: 29 | - platform: linux/amd64 30 | runner: ubuntu-latest 31 | - platform: linux/arm64 32 | runner: ubuntu-24.04-arm 33 | outputs: 34 | digest: ${{ steps.build.outputs.digest }} 35 | steps: 36 | - name: Checkout 37 | uses: actions/checkout@v4 38 | with: 39 | submodules: true 40 | 41 | - name: Prepare platform variables 42 | run: | 43 | platform=${{ matrix.platform }} 44 | echo "PLATFORM_PAIR=${platform//\//-}" >> $GITHUB_ENV 45 | 46 | - name: Setup docker 47 | id: setup 48 | uses: ./.github/workflows/docker-reused-steps 49 | with: 50 | tag: ubi-no_model 51 | 52 | - name: Build and push by digest (ubi-no_model) 53 | uses: docker/build-push-action@v6 54 | id: build 55 | with: 56 | context: . 57 | file: ./ubi.Dockerfile 58 | target: no_model 59 | platforms: ${{ matrix.platform }} 60 | labels: ${{ steps.setup.outputs.labels }} 61 | build-args: | 62 | VERSION=${{ github.event.workflow_run.head_sha }} 63 | RELEASE=${{ github.event.workflow_run.run_number }} 64 | cache-from: | 65 | type=registry,ref=ghcr.io/${{ env.REGISTRY_IMAGE }}:cache-${{ env.PLATFORM_PAIR }} 66 | cache-to: | 67 | type=registry,ref=ghcr.io/${{ env.REGISTRY_IMAGE }}:cache-${{ env.PLATFORM_PAIR }},mode=max 68 | outputs: | 69 | type=image,name=ghcr.io/${{ env.REGISTRY_IMAGE }},push-by-digest=true,name-canonical=true,push=true 70 | sbom: true 71 | provenance: true 72 | 73 | - name: Export digest 74 | run: | 75 | mkdir -p /tmp/digests 76 | digest="${{ steps.build.outputs.digest }}" 77 | echo "${digest#sha256:}" > /tmp/digests/${{ env.PLATFORM_PAIR }} 78 | 79 | - name: Upload digest 80 | uses: actions/upload-artifact@v4 81 | with: 82 | name: digests-ubi-${{ env.PLATFORM_PAIR }} 83 | path: /tmp/digests/* 84 | if-no-files-found: error 85 | retention-days: 1 86 | 87 | # Test ubi-no_model on amd64 platform only (for performance) 88 | test_ubi: 89 | runs-on: ubuntu-latest 90 | needs: build_ubi 91 | if: ${{ github.event.workflow_run.conclusion == 'success' }} 92 | steps: 93 | - name: Checkout 94 | uses: actions/checkout@v4 95 | with: 96 | submodules: true 97 | 98 | - name: Setup docker 99 | id: setup 100 | uses: ./.github/workflows/docker-reused-steps 101 | with: 102 | tag: ubi-no_model 103 | 104 | - name: Download build digests 105 | uses: actions/download-artifact@v4 106 | with: 107 | name: digests-ubi-linux-amd64 108 | path: /tmp/test-digests 109 | 110 | - name: Pull test image from registry 111 | id: pull 112 | run: | 113 | # Get the digest for amd64 platform 114 | DIGEST=$(cat /tmp/test-digests/linux-amd64) 115 | IMAGE_REF="ghcr.io/${{ env.REGISTRY_IMAGE }}@sha256:${DIGEST}" 116 | echo "Pulling image: $IMAGE_REF" 117 | docker pull "$IMAGE_REF" 118 | # Tag the image for easier reference in tests 119 | docker tag "$IMAGE_REF" "test-image:ubi-no_model" 120 | echo "imageid=test-image:ubi-no_model" >> $GITHUB_OUTPUT 121 | 122 | - name: Test ubi-no_model docker image 123 | run: | 124 | docker run --group-add 0 -v ".:/app" ${{ steps.pull.outputs.imageid }} -- --model base --language en --device cpu --compute_type int8 --output_format srt .github/workflows/test/en.webm; 125 | if [ ! -f en.srt ]; then 126 | echo "The en.srt file does not exist" 127 | exit 1 128 | fi 129 | echo "cat en.srt:"; 130 | cat en.srt; 131 | if ! grep -qi 'no' en.srt; then 132 | echo "The en.srt file does not contain the word 'no'" 133 | exit 1 134 | fi 135 | echo "Test passed." 136 | 137 | # Merge all platform builds into manifest list for ubi-no_model 138 | merge_ubi: 139 | runs-on: ubuntu-latest 140 | needs: [build_ubi, test_ubi] 141 | steps: 142 | - name: Checkout 143 | uses: actions/checkout@v4 144 | 145 | - name: Download digests 146 | uses: actions/download-artifact@v4 147 | with: 148 | path: /tmp/digests 149 | pattern: digests-ubi-* 150 | merge-multiple: true 151 | 152 | - name: Setup docker 153 | id: setup 154 | uses: ./.github/workflows/docker-reused-steps 155 | with: 156 | tag: ubi-no_model 157 | 158 | - name: Create GHCR manifest list 159 | run: | 160 | echo "Creating manifest list for GHCR..." 161 | cd /tmp/digests 162 | echo "Files in /tmp/digests:" 163 | ls -la 164 | echo "Building digest references..." 165 | digest_refs="" 166 | for file in linux-*; do 167 | if [[ -f "$file" ]]; then 168 | digest=$(cat "$file") 169 | echo "Processing $file with digest: $digest" 170 | digest_refs+="ghcr.io/${{ env.REGISTRY_IMAGE }}@sha256:$digest " 171 | fi 172 | done 173 | echo "Final digest references: $digest_refs" 174 | docker buildx imagetools create \ 175 | $(jq -cr '.tags | map("-t " + .) | join(" ")' <<<$DOCKER_METADATA_OUTPUT_JSON) \ 176 | $digest_refs 177 | 178 | - name: Get final manifest digest 179 | id: get_digest 180 | run: | 181 | # Get the digest of the manifest list we just created 182 | echo "Available GHCR tags:" 183 | jq -cr '.tags[] | select(startswith("ghcr.io/"))' <<<$DOCKER_METADATA_OUTPUT_JSON 184 | IMAGE_TAG=$(jq -cr '.tags[] | select(startswith("ghcr.io/"))' <<<$DOCKER_METADATA_OUTPUT_JSON | head -n1) 185 | echo "Using tag for digest lookup: $IMAGE_TAG" 186 | # Get the raw digest output and extract only the sha256 part 187 | DIGEST_RAW=$(docker buildx imagetools inspect "$IMAGE_TAG" --format "{{.Manifest.Digest}}") 188 | echo "Raw digest output: $DIGEST_RAW" 189 | # Extract only the digest hash, removing any MediaType or other formatting 190 | DIGEST=$(echo "$DIGEST_RAW" | grep -oE 'sha256:[a-f0-9]{64}' | head -n1) 191 | echo "Extracted digest: $DIGEST" 192 | echo "manifest_digest=$DIGEST" >> $GITHUB_OUTPUT 193 | 194 | - name: Attest GHCR image (ubi-no_model) 195 | uses: actions/attest-build-provenance@v2 196 | with: 197 | subject-name: ghcr.io/${{ github.repository_owner }}/whisperx 198 | subject-digest: ${{ steps.get_digest.outputs.manifest_digest }} 199 | 200 | # Build the no_model in parallel across multiple platforms 201 | build_no_model: 202 | runs-on: ${{ matrix.runner }} 203 | if: ${{ github.event.workflow_run.conclusion == 'success' }} 204 | strategy: 205 | fail-fast: false 206 | matrix: 207 | include: 208 | - platform: linux/amd64 209 | runner: ubuntu-latest 210 | - platform: linux/arm64 211 | runner: ubuntu-24.04-arm 212 | outputs: 213 | digest: ${{ steps.build.outputs.digest }} 214 | steps: 215 | - name: Checkout 216 | uses: actions/checkout@v4 217 | with: 218 | submodules: true 219 | 220 | - name: Prepare platform variables 221 | run: | 222 | platform=${{ matrix.platform }} 223 | echo "PLATFORM_PAIR=${platform//\//-}" >> $GITHUB_ENV 224 | 225 | - name: Setup docker 226 | id: setup 227 | uses: ./.github/workflows/docker-reused-steps 228 | 229 | - name: Build and push by digest (no_model) 230 | uses: docker/build-push-action@v6 231 | id: build 232 | with: 233 | context: . 234 | file: ./Dockerfile 235 | target: no_model 236 | platforms: ${{ matrix.platform }} 237 | labels: ${{ steps.setup.outputs.labels }} 238 | build-args: | 239 | VERSION=${{ github.event.workflow_run.head_sha }} 240 | RELEASE=${{ github.event.workflow_run.run_number }} 241 | cache-from: | 242 | type=registry,ref=ghcr.io/${{ env.REGISTRY_IMAGE }}:cache-${{ env.PLATFORM_PAIR }} 243 | cache-to: | 244 | type=registry,ref=ghcr.io/${{ env.REGISTRY_IMAGE }}:cache-${{ env.PLATFORM_PAIR }},mode=max 245 | outputs: | 246 | type=image,name=ghcr.io/${{ env.REGISTRY_IMAGE }},push-by-digest=true,name-canonical=true,push=true 247 | sbom: true 248 | provenance: true 249 | 250 | - name: Export digest 251 | run: | 252 | mkdir -p /tmp/digests 253 | digest="${{ steps.build.outputs.digest }}" 254 | echo "${digest#sha256:}" > /tmp/digests/${{ env.PLATFORM_PAIR }} 255 | 256 | - name: Upload digest 257 | uses: actions/upload-artifact@v4 258 | with: 259 | name: digests-no_model-${{ env.PLATFORM_PAIR }} 260 | path: /tmp/digests/* 261 | if-no-files-found: error 262 | retention-days: 1 263 | 264 | # Test no_model on amd64 platform only (for performance) 265 | test_no_model: 266 | runs-on: ubuntu-latest 267 | needs: build_no_model 268 | if: ${{ github.event.workflow_run.conclusion == 'success' }} 269 | steps: 270 | - name: Checkout 271 | uses: actions/checkout@v4 272 | with: 273 | submodules: true 274 | 275 | - name: Setup docker 276 | id: setup 277 | uses: ./.github/workflows/docker-reused-steps 278 | 279 | - name: Download build digests 280 | uses: actions/download-artifact@v4 281 | with: 282 | name: digests-no_model-linux-amd64 283 | path: /tmp/test-digests 284 | 285 | - name: Pull test image from registry 286 | id: pull 287 | run: | 288 | # Get the digest for amd64 platform 289 | DIGEST=$(cat /tmp/test-digests/linux-amd64) 290 | IMAGE_REF="ghcr.io/${{ env.REGISTRY_IMAGE }}@sha256:${DIGEST}" 291 | echo "Pulling image: $IMAGE_REF" 292 | docker pull "$IMAGE_REF" 293 | # Tag the image for easier reference in tests 294 | docker tag "$IMAGE_REF" "test-image:no_model" 295 | echo "imageid=test-image:no_model" >> $GITHUB_OUTPUT 296 | 297 | - name: Test no_model docker image 298 | run: | 299 | docker run --group-add 0 -v ".:/app" ${{ steps.pull.outputs.imageid }} -- --model base --language en --device cpu --compute_type int8 --output_format srt .github/workflows/test/en.webm; 300 | if [ ! -f en.srt ]; then 301 | echo "The en.srt file does not exist" 302 | exit 1 303 | fi 304 | echo "cat en.srt:"; 305 | cat en.srt; 306 | if ! grep -qi 'no' en.srt; then 307 | echo "The en.srt file does not contain the word 'no'" 308 | exit 1 309 | fi 310 | echo "Test passed." 311 | 312 | # Merge all platform builds into manifest list for no_model 313 | merge_no_model: 314 | runs-on: ubuntu-latest 315 | needs: [build_no_model, test_no_model] 316 | outputs: 317 | digest: ${{ steps.get_digest.outputs.manifest_digest }} 318 | steps: 319 | - name: Checkout 320 | uses: actions/checkout@v4 321 | 322 | - name: Download digests 323 | uses: actions/download-artifact@v4 324 | with: 325 | path: /tmp/digests 326 | pattern: digests-no_model-* 327 | merge-multiple: true 328 | 329 | - name: Setup docker 330 | id: setup 331 | uses: ./.github/workflows/docker-reused-steps 332 | 333 | - name: Create GHCR manifest list 334 | run: | 335 | echo "Creating manifest list for GHCR..." 336 | cd /tmp/digests 337 | echo "Files in /tmp/digests:" 338 | ls -la 339 | echo "Building digest references..." 340 | digest_refs="" 341 | for file in linux-*; do 342 | if [[ -f "$file" ]]; then 343 | digest=$(cat "$file") 344 | echo "Processing $file with digest: $digest" 345 | digest_refs+="ghcr.io/${{ env.REGISTRY_IMAGE }}@sha256:$digest " 346 | fi 347 | done 348 | echo "Final digest references: $digest_refs" 349 | docker buildx imagetools create \ 350 | $(jq -cr '.tags | map("-t " + .) | join(" ")' <<<$DOCKER_METADATA_OUTPUT_JSON) \ 351 | $digest_refs 352 | 353 | - name: Get final manifest digest 354 | id: get_digest 355 | run: | 356 | # Get the digest of the manifest list we just created 357 | echo "Available GHCR tags:" 358 | jq -cr '.tags[] | select(startswith("ghcr.io/"))' <<<$DOCKER_METADATA_OUTPUT_JSON 359 | IMAGE_TAG=$(jq -cr '.tags[] | select(startswith("ghcr.io/"))' <<<$DOCKER_METADATA_OUTPUT_JSON | head -n1) 360 | echo "Using tag for digest lookup: $IMAGE_TAG" 361 | # Get the raw digest output and extract only the sha256 part 362 | DIGEST_RAW=$(docker buildx imagetools inspect "$IMAGE_TAG" --format "{{.Manifest.Digest}}") 363 | echo "Raw digest output: $DIGEST_RAW" 364 | # Extract only the digest hash, removing any MediaType or other formatting 365 | DIGEST=$(echo "$DIGEST_RAW" | grep -oE 'sha256:[a-f0-9]{64}' | head -n1) 366 | echo "Extracted digest: $DIGEST" 367 | echo "manifest_digest=$DIGEST" >> $GITHUB_OUTPUT 368 | 369 | - name: Attest GHCR image (no_model) 370 | uses: actions/attest-build-provenance@v2 371 | with: 372 | subject-name: ghcr.io/${{ github.repository_owner }}/whisperx 373 | subject-digest: ${{ steps.get_digest.outputs.manifest_digest }} 374 | -------------------------------------------------------------------------------- /.github/workflows/02-build-model-cache.yml: -------------------------------------------------------------------------------- 1 | # Build model cache workflow 2 | # This workflow builds the Whisper model cache images for all supported models 3 | name: "02-build-model-cache" 4 | 5 | on: 6 | workflow_run: 7 | workflows: ["01-build-base-images"] 8 | types: [completed] 9 | 10 | # Sets the permissions granted to the GITHUB_TOKEN for the actions in this job. 11 | permissions: 12 | contents: read 13 | packages: write 14 | id-token: write 15 | attestations: write 16 | 17 | env: 18 | REGISTRY_IMAGE: ${{ github.repository_owner }}/whisperx 19 | 20 | jobs: 21 | # Build model cache in parallel across multiple platforms 22 | build_cache: 23 | runs-on: ${{ matrix.runner }} 24 | if: ${{ github.event.workflow_run.conclusion == 'success' }} 25 | strategy: 26 | fail-fast: false 27 | matrix: 28 | model: 29 | - tiny 30 | - base 31 | - small 32 | - medium 33 | - large-v3 34 | - distil-large-v3 35 | platform: 36 | - linux/amd64 37 | - linux/arm64 38 | include: 39 | - platform: linux/amd64 40 | runner: ubuntu-latest 41 | - platform: linux/arm64 42 | runner: ubuntu-24.04-arm 43 | outputs: 44 | digest: ${{ steps.build.outputs.digest }} 45 | steps: 46 | - name: Checkout 47 | uses: actions/checkout@v4 48 | with: 49 | submodules: true 50 | 51 | - name: Setup docker 52 | id: setup 53 | uses: ./.github/workflows/docker-reused-steps 54 | with: 55 | tag: cache-${{ matrix.model }} 56 | 57 | - name: Get no_model digest from triggering workflow 58 | id: get-base-digest 59 | run: | 60 | # Get the triggering workflow run information 61 | WORKFLOW_RUN_ID="${{ github.event.workflow_run.id }}" 62 | 63 | # Get the no_model digest from the artifacts or outputs 64 | # For now, we'll use the latest no_model image with the same commit SHA 65 | COMMIT_SHA="${{ github.event.workflow_run.head_sha }}" 66 | SHORT_SHA=$(echo $COMMIT_SHA | cut -c 1-7) 67 | 68 | # Use the no_model image with short SHA tag 69 | NO_MODEL_IMAGE="ghcr.io/jim60105/whisperx:no_model-$SHORT_SHA" 70 | echo "no_model_image=$NO_MODEL_IMAGE" >> $GITHUB_OUTPUT 71 | 72 | - name: Build and push by digest (cache-${{ matrix.model }}) 73 | uses: docker/build-push-action@v6 74 | id: build 75 | with: 76 | context: . 77 | file: ./Dockerfile 78 | target: load_whisper 79 | platforms: ${{ matrix.platform }} 80 | labels: ${{ steps.setup.outputs.labels }} 81 | build-args: | 82 | WHISPER_MODEL=${{ matrix.model }} 83 | NO_MODEL_STAGE=${{ steps.get-base-digest.outputs.no_model_image }} 84 | VERSION=${{ github.event.workflow_run.head_sha }} 85 | RELEASE=${{ github.event.workflow_run.run_number }} 86 | cache-from: | 87 | type=registry,ref=ghcr.io/${{ env.REGISTRY_IMAGE }}:cache-model-${{ matrix.model }} 88 | cache-to: | 89 | type=registry,ref=ghcr.io/${{ env.REGISTRY_IMAGE }}:cache-model-${{ matrix.model }},mode=max 90 | outputs: | 91 | type=image,name=ghcr.io/${{ env.REGISTRY_IMAGE }},push-by-digest=true,name-canonical=true,push=true 92 | sbom: true 93 | provenance: true 94 | 95 | - name: Export digest 96 | run: | 97 | mkdir -p /tmp/digests 98 | digest="${{ steps.build.outputs.digest }}" 99 | platform="${{ matrix.platform }}" 100 | platform_safe="${platform//\//-}" 101 | echo "${digest#sha256:}" > "/tmp/digests/${platform_safe}" 102 | 103 | - name: Upload digest 104 | uses: actions/upload-artifact@v4 105 | with: 106 | name: digests-cache-${{ matrix.model }}-${{ matrix.platform == 'linux/amd64' && 'linux-amd64' || 'linux-arm64' }} 107 | path: /tmp/digests/* 108 | if-no-files-found: error 109 | retention-days: 1 110 | 111 | # Test model cache builds by pulling from registry 112 | test_cache: 113 | runs-on: ${{ matrix.runner }} 114 | needs: build_cache 115 | strategy: 116 | fail-fast: false 117 | matrix: 118 | model: 119 | - tiny 120 | - base 121 | - small 122 | - medium 123 | - large-v3 124 | - distil-large-v3 125 | platform: 126 | - linux/amd64 127 | - linux/arm64 128 | include: 129 | - platform: linux/amd64 130 | runner: ubuntu-latest 131 | - platform: linux/arm64 132 | runner: ubuntu-24.04-arm 133 | steps: 134 | - name: Checkout 135 | uses: actions/checkout@v4 136 | with: 137 | submodules: true 138 | 139 | - name: Download digests 140 | uses: actions/download-artifact@v4 141 | with: 142 | name: digests-cache-${{ matrix.model }}-${{ matrix.platform == 'linux/amd64' && 'linux-amd64' || 'linux-arm64' }} 143 | path: /tmp/digests 144 | 145 | - name: Setup Docker Buildx 146 | uses: docker/setup-buildx-action@v3 147 | 148 | - name: Log in to Container Registry 149 | uses: docker/login-action@v3 150 | with: 151 | registry: ghcr.io 152 | username: ${{ github.actor }} 153 | password: ${{ secrets.GITHUB_TOKEN }} 154 | 155 | - name: Test pull cache-${{ matrix.model }} image 156 | run: | 157 | cd /tmp/digests 158 | digest=$(cat linux-*) 159 | echo "Testing pull of cache-${{ matrix.model }} for platform ${{ matrix.platform }}" 160 | docker pull --platform ${{ matrix.platform }} ghcr.io/${{ env.REGISTRY_IMAGE }}@sha256:$digest 161 | echo "Successfully pulled cache-${{ matrix.model }} image for ${{ matrix.platform }}" 162 | 163 | # Merge all platform builds into manifest list for each model cache 164 | merge_cache: 165 | runs-on: ubuntu-latest 166 | needs: test_cache 167 | strategy: 168 | matrix: 169 | model: 170 | - tiny 171 | - base 172 | - small 173 | - medium 174 | - large-v3 175 | - distil-large-v3 176 | steps: 177 | - name: Checkout 178 | uses: actions/checkout@v4 179 | with: 180 | submodules: true 181 | 182 | - name: Download digests 183 | uses: actions/download-artifact@v4 184 | with: 185 | path: /tmp/digests 186 | pattern: digests-cache-${{ matrix.model }}-* 187 | merge-multiple: true 188 | 189 | - name: Setup docker 190 | id: setup 191 | uses: ./.github/workflows/docker-reused-steps 192 | with: 193 | tag: cache-${{ matrix.model }} 194 | 195 | - name: Create GHCR manifest list 196 | run: | 197 | echo "Creating manifest list for cache-${{ matrix.model }}..." 198 | cd /tmp/digests 199 | echo "Files in /tmp/digests:" 200 | ls -la 201 | echo "Building digest references..." 202 | digest_refs="" 203 | for file in linux-*; do 204 | if [[ -f "$file" ]]; then 205 | digest=$(cat "$file") 206 | echo "Processing $file with digest: $digest" 207 | digest_refs+="ghcr.io/${{ env.REGISTRY_IMAGE }}@sha256:$digest " 208 | fi 209 | done 210 | echo "Final digest references: $digest_refs" 211 | docker buildx imagetools create \ 212 | $(jq -cr '.tags | map("-t " + .) | join(" ")' <<<$DOCKER_METADATA_OUTPUT_JSON) \ 213 | $digest_refs 214 | 215 | - name: Get final manifest digest 216 | id: get_digest 217 | run: | 218 | # Get the digest of the manifest list we just created 219 | echo "Available GHCR tags:" 220 | jq -cr '.tags[] | select(startswith("ghcr.io/"))' <<<$DOCKER_METADATA_OUTPUT_JSON 221 | IMAGE_TAG=$(jq -cr '.tags[] | select(startswith("ghcr.io/"))' <<<$DOCKER_METADATA_OUTPUT_JSON | head -n1) 222 | echo "Using tag for digest lookup: $IMAGE_TAG" 223 | # Get the raw digest output and extract only the sha256 part 224 | DIGEST_RAW=$(docker buildx imagetools inspect "$IMAGE_TAG" --format "{{.Manifest.Digest}}") 225 | echo "Raw digest output: $DIGEST_RAW" 226 | # Extract only the digest hash, removing any MediaType or other formatting 227 | DIGEST=$(echo "$DIGEST_RAW" | grep -oE 'sha256:[a-f0-9]{64}' | head -n1) 228 | echo "Extracted digest: $DIGEST" 229 | echo "manifest_digest=$DIGEST" >> $GITHUB_OUTPUT 230 | 231 | - name: Attest GHCR image (cache-${{ matrix.model }}) 232 | uses: actions/attest-build-provenance@v2 233 | with: 234 | subject-name: ghcr.io/${{ github.repository_owner }}/whisperx 235 | subject-digest: ${{ steps.get_digest.outputs.manifest_digest }} 236 | -------------------------------------------------------------------------------- /.github/workflows/03-build-distil-en.yml: -------------------------------------------------------------------------------- 1 | # Build distil-large-v3-en workflow 2 | # This workflow builds the distil-large-v3-en specialized image 3 | name: "03-build-distil-en" 4 | 5 | on: 6 | workflow_run: 7 | workflows: ["02-build-model-cache"] 8 | types: [completed] 9 | 10 | # Sets the permissions granted to the GITHUB_TOKEN for the actions in this job. 11 | permissions: 12 | contents: read 13 | packages: write 14 | id-token: write 15 | attestations: write 16 | 17 | env: 18 | REGISTRY_IMAGE: ${{ github.repository_owner }}/whisperx 19 | 20 | jobs: 21 | # Build distil-large-v3-en on multiple platforms in parallel 22 | build_distil: 23 | runs-on: ${{ matrix.runner }} 24 | if: ${{ github.event.workflow_run.conclusion == 'success' }} 25 | strategy: 26 | fail-fast: false 27 | matrix: 28 | include: 29 | - platform: linux/amd64 30 | runner: ubuntu-latest 31 | - platform: linux/arm64 32 | runner: ubuntu-24.04-arm 33 | outputs: 34 | digest: ${{ steps.build.outputs.digest }} 35 | steps: 36 | - name: Checkout 37 | uses: actions/checkout@v4 38 | with: 39 | submodules: true 40 | 41 | - name: Setup docker 42 | id: setup 43 | uses: ./.github/workflows/docker-reused-steps 44 | with: 45 | tag: distil-large-v3-en 46 | 47 | - name: Get base image references 48 | id: get-refs 49 | run: | 50 | # Get the commit SHA from the triggering workflow 51 | COMMIT_SHA="${{ github.event.workflow_run.head_sha }}" 52 | SHORT_SHA=$(echo $COMMIT_SHA | cut -c 1-7) 53 | 54 | # Set image references 55 | NO_MODEL_IMAGE="ghcr.io/jim60105/whisperx:no_model-$SHORT_SHA" 56 | CACHE_IMAGE="ghcr.io/jim60105/whisperx:cache-distil-large-v3-$SHORT_SHA" 57 | 58 | echo "no_model_image=$NO_MODEL_IMAGE" >> $GITHUB_OUTPUT 59 | echo "cache_image=$CACHE_IMAGE" >> $GITHUB_OUTPUT 60 | 61 | - name: Build and push by digest (distil-large-v3-en) 62 | uses: docker/build-push-action@v6 63 | id: build 64 | with: 65 | context: . 66 | file: ./Dockerfile 67 | target: final 68 | platforms: ${{ matrix.platform }} 69 | labels: ${{ steps.setup.outputs.labels }} 70 | build-args: | 71 | WHISPER_MODEL=distil-large-v3 72 | LANG=en 73 | LOAD_WHISPER_STAGE=${{ steps.get-refs.outputs.cache_image }} 74 | NO_MODEL_STAGE=${{ steps.get-refs.outputs.no_model_image }} 75 | VERSION=${{ github.event.workflow_run.head_sha }} 76 | RELEASE=${{ github.event.workflow_run.run_number }} 77 | outputs: | 78 | type=image,name=ghcr.io/${{ env.REGISTRY_IMAGE }},push-by-digest=true,name-canonical=true,push=true 79 | sbom: true 80 | provenance: true 81 | 82 | - name: Export digest 83 | run: | 84 | mkdir -p /tmp/digests 85 | digest="${{ steps.build.outputs.digest }}" 86 | platform="${{ matrix.platform }}" 87 | platform_safe="${platform//\//-}" 88 | echo "${digest#sha256:}" > "/tmp/digests/${platform_safe}" 89 | 90 | - name: Upload digest 91 | uses: actions/upload-artifact@v4 92 | with: 93 | name: digests-distil-${{ matrix.platform == 'linux/amd64' && 'linux-amd64' || 'linux-arm64' }} 94 | path: /tmp/digests/* 95 | if-no-files-found: error 96 | retention-days: 1 97 | 98 | # Test distil-large-v3-en builds by pulling from registry 99 | test_distil: 100 | runs-on: ${{ matrix.runner }} 101 | needs: build_distil 102 | strategy: 103 | fail-fast: false 104 | matrix: 105 | include: 106 | - platform: linux/amd64 107 | runner: ubuntu-latest 108 | - platform: linux/arm64 109 | runner: ubuntu-24.04-arm 110 | steps: 111 | - name: Checkout 112 | uses: actions/checkout@v4 113 | with: 114 | submodules: true 115 | 116 | - name: Download digests 117 | uses: actions/download-artifact@v4 118 | with: 119 | name: digests-distil-${{ matrix.platform == 'linux/amd64' && 'linux-amd64' || 'linux-arm64' }} 120 | path: /tmp/digests 121 | 122 | - name: Setup Docker Buildx 123 | uses: docker/setup-buildx-action@v3 124 | 125 | - name: Log in to Container Registry 126 | uses: docker/login-action@v3 127 | with: 128 | registry: ghcr.io 129 | username: ${{ github.actor }} 130 | password: ${{ secrets.GITHUB_TOKEN }} 131 | 132 | - name: Test pull distil-large-v3-en image 133 | run: | 134 | cd /tmp/digests 135 | digest=$(cat linux-*) 136 | echo "Testing pull of distil-large-v3-en for platform ${{ matrix.platform }}" 137 | docker pull --platform ${{ matrix.platform }} ghcr.io/${{ env.REGISTRY_IMAGE }}@sha256:$digest 138 | echo "Successfully pulled distil-large-v3-en image for ${{ matrix.platform }}" 139 | 140 | # Merge all platform builds into manifest list 141 | merge_distil: 142 | runs-on: ubuntu-latest 143 | needs: test_distil 144 | steps: 145 | - name: Checkout 146 | uses: actions/checkout@v4 147 | 148 | - name: Download digests 149 | uses: actions/download-artifact@v4 150 | with: 151 | path: /tmp/digests 152 | pattern: digests-distil-* 153 | merge-multiple: true 154 | 155 | - name: Setup docker 156 | id: setup 157 | uses: ./.github/workflows/docker-reused-steps 158 | with: 159 | tag: distil-large-v3-en 160 | 161 | - name: Create GHCR manifest list 162 | run: | 163 | echo "Creating manifest list for distil-large-v3-en..." 164 | cd /tmp/digests 165 | echo "Files in /tmp/digests:" 166 | ls -la 167 | echo "Building digest references..." 168 | digest_refs="" 169 | for file in linux-*; do 170 | if [[ -f "$file" ]]; then 171 | digest=$(cat "$file") 172 | echo "Processing $file with digest: $digest" 173 | digest_refs+="ghcr.io/${{ env.REGISTRY_IMAGE }}@sha256:$digest " 174 | fi 175 | done 176 | echo "Final digest references: $digest_refs" 177 | docker buildx imagetools create \ 178 | $(jq -cr '.tags | map("-t " + .) | join(" ")' <<<$DOCKER_METADATA_OUTPUT_JSON) \ 179 | $digest_refs 180 | 181 | - name: Get final manifest digest 182 | id: get_digest 183 | run: | 184 | # Get the digest of the manifest list we just created 185 | echo "Available GHCR tags:" 186 | jq -cr '.tags[] | select(startswith("ghcr.io/"))' <<<$DOCKER_METADATA_OUTPUT_JSON 187 | IMAGE_TAG=$(jq -cr '.tags[] | select(startswith("ghcr.io/"))' <<<$DOCKER_METADATA_OUTPUT_JSON | head -n1) 188 | echo "Using tag for digest lookup: $IMAGE_TAG" 189 | # Get the raw digest output and extract only the sha256 part 190 | DIGEST_RAW=$(docker buildx imagetools inspect "$IMAGE_TAG" --format "{{.Manifest.Digest}}") 191 | echo "Raw digest output: $DIGEST_RAW" 192 | # Extract only the digest hash, removing any MediaType or other formatting 193 | DIGEST=$(echo "$DIGEST_RAW" | grep -oE 'sha256:[a-f0-9]{64}' | head -n1) 194 | echo "Extracted digest: $DIGEST" 195 | echo "manifest_digest=$DIGEST" >> $GITHUB_OUTPUT 196 | 197 | - name: Attest GHCR image (distil-large-v3-en) 198 | uses: actions/attest-build-provenance@v2 199 | with: 200 | subject-name: ghcr.io/${{ github.repository_owner }}/whisperx 201 | subject-digest: ${{ steps.get_digest.outputs.manifest_digest }} 202 | -------------------------------------------------------------------------------- /.github/workflows/04-build-matrix-images.yml: -------------------------------------------------------------------------------- 1 | # Build matrix images workflow 2 | # This workflow builds the full matrix of Docker images and runs tests 3 | name: "04-build-matrix-images" 4 | 5 | on: 6 | workflow_run: 7 | workflows: ["02-build-model-cache"] 8 | types: [completed] 9 | 10 | # Sets the permissions granted to the GITHUB_TOKEN for the actions in this job. 11 | permissions: 12 | contents: read 13 | packages: write 14 | id-token: write 15 | attestations: write 16 | 17 | env: 18 | REGISTRY_IMAGE: ${{ github.repository_owner }}/whisperx 19 | 20 | # The following languages are excluded because these transcribe model are too large to build on the GitHub Actions 21 | # https://github.com/jim60105/docker-whisperX/actions/runs/8405597972 22 | # - no 23 | # - nn 24 | 25 | jobs: 26 | # Build matrix images for tiny and base models 27 | build_matrix_1: 28 | runs-on: ${{ matrix.runner }} 29 | if: ${{ github.event.workflow_run.conclusion == 'success' }} 30 | strategy: 31 | fail-fast: false 32 | matrix: 33 | lang: 34 | - en 35 | - fr 36 | - de 37 | - es 38 | - it 39 | - ja 40 | - zh 41 | - nl 42 | - uk 43 | - pt 44 | - ar 45 | - cs 46 | - ru 47 | - pl 48 | - hu 49 | - fi 50 | - fa 51 | - el 52 | - tr 53 | - da 54 | - he 55 | - vi 56 | - ko 57 | - ur 58 | - te 59 | - hi 60 | - ca 61 | - ml 62 | - sk 63 | - sl 64 | - hr 65 | - ro 66 | - eu 67 | - gl 68 | - ka 69 | - lv 70 | - tl 71 | model: 72 | - tiny 73 | - base 74 | platform: 75 | - linux/amd64 76 | - linux/arm64 77 | include: 78 | - platform: linux/amd64 79 | runner: ubuntu-latest 80 | - platform: linux/arm64 81 | runner: ubuntu-24.04-arm 82 | outputs: 83 | digest: ${{ steps.build.outputs.digest }} 84 | steps: 85 | - name: Checkout 86 | uses: actions/checkout@v4 87 | with: 88 | submodules: true 89 | 90 | - name: Setup docker 91 | id: setup 92 | uses: ./.github/workflows/docker-reused-steps 93 | with: 94 | tag: ${{ matrix.model }}-${{ matrix.lang }} 95 | 96 | - name: Get base image references 97 | id: get-refs 98 | run: | 99 | # Get the commit SHA from the triggering workflow 100 | COMMIT_SHA="${{ github.event.workflow_run.head_sha }}" 101 | SHORT_SHA=$(echo $COMMIT_SHA | cut -c 1-7) 102 | 103 | # Set image references 104 | NO_MODEL_IMAGE="ghcr.io/jim60105/whisperx:no_model-$SHORT_SHA" 105 | CACHE_IMAGE="ghcr.io/jim60105/whisperx:cache-${{ matrix.model }}-$SHORT_SHA" 106 | 107 | echo "no_model_image=$NO_MODEL_IMAGE" >> $GITHUB_OUTPUT 108 | echo "cache_image=$CACHE_IMAGE" >> $GITHUB_OUTPUT 109 | 110 | - name: Build and push by digest (${{ matrix.model }}-${{ matrix.lang }}) 111 | uses: docker/build-push-action@v6 112 | id: build 113 | with: 114 | context: . 115 | file: ./Dockerfile 116 | target: final 117 | platforms: ${{ matrix.platform }} 118 | labels: ${{ steps.setup.outputs.labels }} 119 | build-args: | 120 | WHISPER_MODEL=${{ matrix.model }} 121 | LANG=${{ matrix.lang }} 122 | LOAD_WHISPER_STAGE=${{ steps.get-refs.outputs.cache_image }} 123 | NO_MODEL_STAGE=${{ steps.get-refs.outputs.no_model_image }} 124 | VERSION=${{ github.event.workflow_run.head_sha }} 125 | RELEASE=${{ github.event.workflow_run.run_number }} 126 | outputs: | 127 | type=image,name=ghcr.io/${{ env.REGISTRY_IMAGE }},push-by-digest=true,name-canonical=true,push=true 128 | sbom: true 129 | provenance: true 130 | 131 | - name: Export digest 132 | run: | 133 | mkdir -p /tmp/digests 134 | digest="${{ steps.build.outputs.digest }}" 135 | platform="${{ matrix.platform }}" 136 | platform_safe="${platform//\//-}" 137 | echo "${digest#sha256:}" > "/tmp/digests/${platform_safe}" 138 | 139 | - name: Upload digest 140 | uses: actions/upload-artifact@v4 141 | with: 142 | name: digests-${{ matrix.model }}-${{ matrix.lang }}-${{ matrix.platform == 'linux/amd64' && 'linux-amd64' || 'linux-arm64' }} 143 | path: /tmp/digests/* 144 | if-no-files-found: error 145 | retention-days: 1 146 | 147 | # Build matrix images for small, medium and large-v3 models 148 | build_matrix_2: 149 | runs-on: ${{ matrix.runner }} 150 | if: ${{ github.event.workflow_run.conclusion == 'success' }} 151 | strategy: 152 | fail-fast: false 153 | matrix: 154 | lang: 155 | - en 156 | - fr 157 | - de 158 | - es 159 | - it 160 | - ja 161 | - zh 162 | - nl 163 | - uk 164 | - pt 165 | - ar 166 | - cs 167 | - ru 168 | - pl 169 | - hu 170 | - fi 171 | - fa 172 | - el 173 | - tr 174 | - da 175 | - he 176 | - vi 177 | - ko 178 | - ur 179 | - te 180 | - hi 181 | - ca 182 | - ml 183 | - sk 184 | - sl 185 | - hr 186 | - ro 187 | - eu 188 | - gl 189 | - ka 190 | - lv 191 | - tl 192 | model: 193 | - small 194 | - medium 195 | - large-v3 196 | platform: 197 | - linux/amd64 198 | - linux/arm64 199 | include: 200 | - platform: linux/amd64 201 | runner: ubuntu-latest 202 | - platform: linux/arm64 203 | runner: ubuntu-24.04-arm 204 | outputs: 205 | digest: ${{ steps.build.outputs.digest }} 206 | steps: 207 | - name: Checkout 208 | uses: actions/checkout@v4 209 | with: 210 | submodules: true 211 | 212 | - name: Setup docker 213 | id: setup 214 | uses: ./.github/workflows/docker-reused-steps 215 | with: 216 | tag: ${{ matrix.model }}-${{ matrix.lang }} 217 | 218 | - name: Get base image references 219 | id: get-refs 220 | run: | 221 | # Get the commit SHA from the triggering workflow 222 | COMMIT_SHA="${{ github.event.workflow_run.head_sha }}" 223 | SHORT_SHA=$(echo $COMMIT_SHA | cut -c 1-7) 224 | 225 | # Set image references 226 | NO_MODEL_IMAGE="ghcr.io/jim60105/whisperx:no_model-$SHORT_SHA" 227 | CACHE_IMAGE="ghcr.io/jim60105/whisperx:cache-${{ matrix.model }}-$SHORT_SHA" 228 | 229 | echo "no_model_image=$NO_MODEL_IMAGE" >> $GITHUB_OUTPUT 230 | echo "cache_image=$CACHE_IMAGE" >> $GITHUB_OUTPUT 231 | 232 | - name: Build and push by digest (${{ matrix.model }}-${{ matrix.lang }}) 233 | uses: docker/build-push-action@v6 234 | id: build 235 | with: 236 | context: . 237 | file: ./Dockerfile 238 | target: final 239 | platforms: ${{ matrix.platform }} 240 | labels: ${{ steps.setup.outputs.labels }} 241 | build-args: | 242 | WHISPER_MODEL=${{ matrix.model }} 243 | LANG=${{ matrix.lang }} 244 | LOAD_WHISPER_STAGE=${{ steps.get-refs.outputs.cache_image }} 245 | NO_MODEL_STAGE=${{ steps.get-refs.outputs.no_model_image }} 246 | VERSION=${{ github.event.workflow_run.head_sha }} 247 | RELEASE=${{ github.event.workflow_run.run_number }} 248 | outputs: | 249 | type=image,name=ghcr.io/${{ env.REGISTRY_IMAGE }},push-by-digest=true,name-canonical=true,push=true 250 | sbom: true 251 | provenance: true 252 | 253 | - name: Export digest 254 | run: | 255 | mkdir -p /tmp/digests 256 | digest="${{ steps.build.outputs.digest }}" 257 | platform="${{ matrix.platform }}" 258 | platform_safe="${platform//\//-}" 259 | echo "${digest#sha256:}" > "/tmp/digests/${platform_safe}" 260 | 261 | - name: Upload digest 262 | uses: actions/upload-artifact@v4 263 | with: 264 | name: digests-${{ matrix.model }}-${{ matrix.lang }}-${{ matrix.platform == 'linux/amd64' && 'linux-amd64' || 'linux-arm64' }} 265 | path: /tmp/digests/* 266 | if-no-files-found: error 267 | retention-days: 1 268 | 269 | # Test matrix images builds by pulling from registry (selective testing) 270 | test_matrix: 271 | runs-on: ${{ matrix.runner }} 272 | needs: [build_matrix_1, build_matrix_2] 273 | strategy: 274 | fail-fast: false 275 | matrix: 276 | # Only test a subset to save resources - focus on large-v3-zh for compatibility 277 | include: 278 | - lang: zh 279 | model: large-v3 280 | platform: linux/amd64 281 | runner: ubuntu-latest 282 | - lang: en 283 | model: tiny 284 | platform: linux/amd64 285 | runner: ubuntu-latest 286 | steps: 287 | - name: Checkout 288 | uses: actions/checkout@v4 289 | 290 | - name: Download digests 291 | uses: actions/download-artifact@v4 292 | with: 293 | name: digests-${{ matrix.model }}-${{ matrix.lang }}-${{ matrix.platform == 'linux/amd64' && 'linux-amd64' || 'linux-arm64' }} 294 | path: /tmp/digests 295 | 296 | - name: Setup Docker Buildx 297 | uses: docker/setup-buildx-action@v3 298 | 299 | - name: Log in to Container Registry 300 | uses: docker/login-action@v3 301 | with: 302 | registry: ghcr.io 303 | username: ${{ github.actor }} 304 | password: ${{ secrets.GITHUB_TOKEN }} 305 | 306 | - name: Test pull ${{ matrix.model }}-${{ matrix.lang }} image 307 | run: | 308 | cd /tmp/digests 309 | digest=$(cat linux-*) 310 | echo "Testing pull of ${{ matrix.model }}-${{ matrix.lang }} for platform ${{ matrix.platform }}" 311 | docker pull --platform ${{ matrix.platform }} ghcr.io/${{ env.REGISTRY_IMAGE }}@sha256:$digest 312 | echo "Successfully pulled ${{ matrix.model }}-${{ matrix.lang }} image for ${{ matrix.platform }}" 313 | 314 | # Merge all platform builds into manifest lists for each model-lang combination 315 | merge_matrix: 316 | runs-on: ubuntu-latest 317 | needs: test_matrix 318 | strategy: 319 | fail-fast: false 320 | matrix: 321 | lang: 322 | - en 323 | - fr 324 | - de 325 | - es 326 | - it 327 | - ja 328 | - zh 329 | - nl 330 | - uk 331 | - pt 332 | - ar 333 | - cs 334 | - ru 335 | - pl 336 | - hu 337 | - fi 338 | - fa 339 | - el 340 | - tr 341 | - da 342 | - he 343 | - vi 344 | - ko 345 | - ur 346 | - te 347 | - hi 348 | - ca 349 | - ml 350 | - sk 351 | - sl 352 | - hr 353 | - ro 354 | - eu 355 | - gl 356 | - ka 357 | - lv 358 | - tl 359 | model: 360 | - tiny 361 | - base 362 | - small 363 | - medium 364 | - large-v3 365 | steps: 366 | - name: Checkout 367 | uses: actions/checkout@v4 368 | with: 369 | submodules: true 370 | 371 | - name: Download digests 372 | uses: actions/download-artifact@v4 373 | with: 374 | path: /tmp/digests 375 | pattern: digests-${{ matrix.model }}-${{ matrix.lang }}-* 376 | merge-multiple: true 377 | 378 | - name: Setup docker 379 | id: setup 380 | uses: ./.github/workflows/docker-reused-steps 381 | with: 382 | tag: ${{ matrix.model }}-${{ matrix.lang }} 383 | 384 | - name: Create GHCR manifest list 385 | run: | 386 | echo "Creating manifest list for ${{ matrix.model }}-${{ matrix.lang }}..." 387 | cd /tmp/digests 388 | echo "Files in /tmp/digests:" 389 | ls -la 390 | echo "Building digest references..." 391 | digest_refs="" 392 | for file in linux-*; do 393 | if [[ -f "$file" ]]; then 394 | digest=$(cat "$file") 395 | echo "Processing $file with digest: $digest" 396 | digest_refs+="ghcr.io/${{ env.REGISTRY_IMAGE }}@sha256:$digest " 397 | fi 398 | done 399 | echo "Final digest references: $digest_refs" 400 | docker buildx imagetools create \ 401 | $(jq -cr '.tags | map("-t " + .) | join(" ")' <<<$DOCKER_METADATA_OUTPUT_JSON) \ 402 | $digest_refs 403 | 404 | - name: Get final manifest digest 405 | id: get_digest 406 | run: | 407 | # Get the digest of the manifest list we just created 408 | echo "Available GHCR tags:" 409 | jq -cr '.tags[] | select(startswith("ghcr.io/"))' <<<$DOCKER_METADATA_OUTPUT_JSON 410 | IMAGE_TAG=$(jq -cr '.tags[] | select(startswith("ghcr.io/"))' <<<$DOCKER_METADATA_OUTPUT_JSON | head -n1) 411 | echo "Using tag for digest lookup: $IMAGE_TAG" 412 | # Get the raw digest output and extract only the sha256 part 413 | DIGEST_RAW=$(docker buildx imagetools inspect "$IMAGE_TAG" --format "{{.Manifest.Digest}}") 414 | echo "Raw digest output: $DIGEST_RAW" 415 | # Extract only the digest hash, removing any MediaType or other formatting 416 | DIGEST=$(echo "$DIGEST_RAW" | grep -oE 'sha256:[a-f0-9]{64}' | head -n1) 417 | echo "Extracted digest: $DIGEST" 418 | echo "manifest_digest=$DIGEST" >> $GITHUB_OUTPUT 419 | 420 | - name: Attest GHCR image (${{ matrix.model }}-${{ matrix.lang }}) 421 | uses: actions/attest-build-provenance@v2 422 | with: 423 | subject-name: ghcr.io/${{ github.repository_owner }}/whisperx 424 | subject-digest: ${{ steps.get_digest.outputs.manifest_digest }} 425 | 426 | # Comprehensive test for medium-zh after merge 427 | test-medium-zh: 428 | name: Test medium-zh docker image 429 | runs-on: ubuntu-latest 430 | if: ${{ github.event.workflow_run.conclusion == 'success' }} 431 | needs: merge_matrix 432 | steps: 433 | # We require additional space due to the large size of our image. (~10GB) 434 | - name: Free Disk Space (Ubuntu) 435 | uses: jlumbroso/free-disk-space@main 436 | with: 437 | tool-cache: true 438 | android: true 439 | dotnet: true 440 | haskell: true 441 | large-packages: true 442 | docker-images: true 443 | swap-storage: false 444 | 445 | - name: Checkout 446 | uses: actions/checkout@v4 447 | with: 448 | sparse-checkout: | 449 | .github/workflows/test/** 450 | sparse-checkout-cone-mode: false 451 | 452 | - name: Get image reference 453 | id: get-ref 454 | run: | 455 | # Get the commit SHA from the triggering workflow 456 | COMMIT_SHA="${{ github.event.workflow_run.head_sha }}" 457 | SHORT_SHA=$(echo $COMMIT_SHA | cut -c 1-7) 458 | 459 | # Set image reference 460 | IMAGE="ghcr.io/jim60105/whisperx:medium-zh-$SHORT_SHA" 461 | echo "image=$IMAGE" >> $GITHUB_OUTPUT 462 | 463 | - name: Test medium-zh docker image 464 | run: | 465 | docker run --group-add 0 -v ".:/app" ${{ steps.get-ref.outputs.image }} -- --device cpu --compute_type int8 --output_format srt .github/workflows/test/zh.webm; 466 | if [ ! -f zh.srt ]; then 467 | echo "The zh.srt file does not exist" 468 | exit 1 469 | fi 470 | echo "cat zh.srt:"; 471 | cat zh.srt; 472 | if ! grep -qi -e '充满' -e '充滿' zh.srt; then 473 | echo "The zh.srt file does not contain the word '充满' or '充滿'" 474 | exit 1 475 | fi 476 | echo "Test passed." 477 | -------------------------------------------------------------------------------- /.github/workflows/README.md: -------------------------------------------------------------------------------- 1 | # Docker Workflow Architecture 2 | 3 | This document describes the distributed multi-platform CI/CD workflow architecture for building Docker images in the docker-whisperX project. 4 | 5 | ## Overview 6 | 7 | The workflow architecture has been completely refactored to implement **distributed multi-platform builds** with massive parallel processing capabilities. The original single `docker_publish.yml` workflow has been split into 5 specialized workflow files, each supporting native multi-platform builds (linux/amd64 + linux/arm64) to improve maintainability, parallel processing efficiency, and fault isolation for building 370+ platform-specific Docker images (10GB each). 8 | 9 | ## Workflow Chain 10 | 11 | ```mermaid 12 | graph TD 13 | A[docker_publish.yml] -->|triggers| B[01-build-base-images.yml] 14 | B -->|build_base| C[02-build-model-cache.yml] 15 | B -->|merge_base| C 16 | C -->|build_model| D[03-build-distil-en.yml] 17 | C -->|merge_model| D 18 | C -->|build_model| E[04-build-matrix-images.yml] 19 | C -->|merge_model| E 20 | E -->|build_matrix| F[test-matrix] 21 | E -->|merge_matrix| G[test-large-v3-zh] 22 | 23 | subgraph "Multi-Platform Jobs" 24 | B1[build_base_amd64] 25 | B2[build_base_arm64] 26 | C1[build_model_amd64] 27 | C2[build_model_arm64] 28 | E1[build_matrix_amd64_370jobs] 29 | E2[build_matrix_arm64_370jobs] 30 | end 31 | 32 | B --> B1 33 | B --> B2 34 | C --> C1 35 | C --> C2 36 | E --> E1 37 | E --> E2 38 | ``` 39 | 40 | ## Workflow Files 41 | 42 | ### 1. `docker_publish.yml` (Entry Point) 43 | - **Purpose**: Main trigger and coordination 44 | - **Triggers**: Push to master, tags, manual dispatch 45 | - **Actions**: Logs build chain initiation info 46 | - **Next**: Triggers `01-build-base-images.yml` 47 | 48 | ### 2. `01-build-base-images.yml` 49 | - **Purpose**: Build base images with distributed multi-platform architecture 50 | - **Triggered by**: `docker_publish.yml` 51 | - **Architecture**: Three-stage distributed build process 52 | - **Jobs**: 53 | - `build_base`: 6 parallel platform-specific builds (3 images × 2 platforms) 54 | - `test_base`: Selective testing for key images 55 | - `merge_base`: Create manifest lists for multi-platform images 56 | - **Platforms**: linux/amd64 (ubuntu-latest) + linux/arm64 (ubuntu-24.04-arm) 57 | - **Outputs**: Base image digests and manifest lists for downstream workflows 58 | - **Next**: Triggers `02-build-model-cache.yml` 59 | 60 | ### 3. `02-build-model-cache.yml` 61 | - **Purpose**: Build Whisper model cache images with distributed multi-platform architecture 62 | - **Triggered by**: `01-build-base-images.yml` 63 | - **Architecture**: Three-stage distributed build process 64 | - **Jobs**: 65 | - `build_model`: 10 parallel platform-specific builds (5 models × 2 platforms) 66 | - `test_model`: Selective testing for key model combinations 67 | - `merge_model`: Create manifest lists for multi-platform model images 68 | - **Models**: tiny, base, small, medium, large-v3 69 | - **Platforms**: Native builds on linux/amd64 + linux/arm64 70 | - **Dependencies**: Uses base images from previous workflow 71 | - **Next**: Triggers both `03-build-distil-en.yml` and `04-build-matrix-images.yml` in parallel 72 | 73 | ### 4. `03-build-distil-en.yml` 74 | - **Purpose**: Build distil-large-v3-en specialized image with distributed multi-platform architecture 75 | - **Triggered by**: `02-build-model-cache.yml` 76 | - **Architecture**: Three-stage distributed build process 77 | - **Jobs**: 78 | - `build_distil`: 2 parallel platform-specific builds (1 model × 2 platforms) 79 | - `test_distil`: Comprehensive testing for English optimization 80 | - `merge_distil`: Create manifest list for multi-platform distil image 81 | - **Platforms**: Native builds on linux/amd64 + linux/arm64 82 | - **Specialization**: English-optimized distil model with enhanced performance 83 | - **Parallel with**: `04-build-matrix-images.yml` 84 | 85 | ### 5. `04-build-matrix-images.yml` 86 | - **Purpose**: Build full image matrix with massive-scale distributed multi-platform architecture 87 | - **Triggered by**: `02-build-model-cache.yml` 88 | - **Architecture**: Three-stage distributed build process at unprecedented scale 89 | - **Jobs**: 90 | - `build_matrix`: **370 parallel platform-specific builds** (37 languages × 5 models × 2 platforms) 91 | - `test_matrix`: Selective testing strategy for key language combinations 92 | - `merge_matrix`: 185 manifest list creations for multi-platform matrix images 93 | - `test-large-v3-zh`: Comprehensive Chinese image functionality testing 94 | - **Scale**: Industry-leading 370 parallel Docker builds 95 | - **Platforms**: Native builds on linux/amd64 + linux/arm64 96 | - **Languages**: 37 supported languages with alignment models 97 | - **Models**: tiny, base, small, medium, large-v3 98 | - **Innovation**: Selective testing strategy to manage massive resource usage 99 | - **Parallel with**: `03-build-distil-en.yml` 100 | 101 | ## Key Features 102 | 103 | ### Distributed Multi-Platform Architecture 104 | - **Native Platform Builds**: linux/amd64 (ubuntu-latest) + linux/arm64 (ubuntu-24.04-arm) 105 | - **QEMU Elimination**: Complete removal of emulation overhead for 50-70% build time reduction 106 | - **Digest-based Builds**: Push-by-digest strategy with artifact-based manifest list creation 107 | - **Three-stage Process**: build → test → merge pattern for optimal resource utilization 108 | 109 | ### Massive Scale Parallel Processing 110 | - **370 Parallel Jobs**: Industry-leading scale for Docker matrix builds 111 | - **Intelligent Scheduling**: Strategic max-parallel settings to prevent GitHub Actions overload 112 | - **Selective Testing**: Resource-optimized testing strategy for large-scale matrices 113 | - **Error Isolation**: fail-fast: false allows partial failures without impacting other builds 114 | 115 | ### Advanced Caching Strategy 116 | - **Layered Cache Naming**: cache-base, cache-model-{model}, cache-matrix-{model}-{lang} 117 | - **Platform-specific Caching**: Separate cache spaces for amd64 and arm64 118 | - **Conflict Avoidance**: Sophisticated naming prevents cache collisions 119 | - **Build Time Optimization**: Maximized cache reuse across workflow stages 120 | 121 | ### Workflow Chaining 122 | - Uses `workflow_run` events for dependency management 123 | - Conditional execution based on previous workflow success 124 | - Proper error handling and failure isolation 125 | 126 | ### Image Reference Management 127 | - Base images are referenced by commit SHA tags 128 | - Consistent naming convention across workflows 129 | - Proper digest passing between dependent workflows 130 | 131 | ## Benefits 132 | 133 | ### Performance Revolution 134 | - **50-70% Build Time Reduction**: Native multi-platform builds eliminate QEMU emulation overhead 135 | - **Massive Parallelization**: 370 simultaneous builds vs. previous sequential execution 136 | - **Resource Maximization**: Full utilization of GitHub Actions parallel job capacity 137 | - **Network Optimization**: Layered caching reduces redundant downloads 138 | 139 | ### Maintainability 140 | - **Modular Design**: Each workflow focuses on specific build stages 141 | - **Independent Debugging**: Can test and fix specific stages in isolation 142 | - **Code Reuse**: Shared logic through reusable actions 143 | - **Platform Isolation**: Separate troubleshooting for amd64 vs arm64 issues 144 | 145 | ### Scalability & Resource Management 146 | - **Distributed Architecture**: Horizontal scaling across multiple workflow files 147 | - **Intelligent Resource Usage**: Selective testing prevents resource exhaustion 148 | - **Future-proof Design**: Easy addition of new languages, models, or platforms 149 | - **Cost Optimization**: Significant reduction in GitHub Actions minutes usage 150 | 151 | ### Fault Tolerance 152 | - **Platform Independence**: Single platform failure doesn't affect other platform 153 | - **Stage Isolation**: Build failures don't immediately impact other independent stages 154 | - **Graceful Degradation**: Partial matrix success allows useful outputs 155 | - **Retry Mechanisms**: Built-in GitHub Actions retry for transient failures 156 | 157 | ## Architecture Details 158 | 159 | ### Three-Stage Build Process 160 | 161 | Each workflow implements a consistent three-stage pattern: 162 | 163 | #### Stage 1: Distributed Build 164 | - **Platform Matrix**: Each image built natively on both linux/amd64 and linux/arm64 165 | - **Parallel Execution**: Maximum utilization of GitHub Actions runner capacity 166 | - **Digest Artifacts**: Each build produces platform-specific image digests 167 | - **Cache Optimization**: Platform-specific caching for optimal performance 168 | 169 | #### Stage 2: Selective Testing 170 | - **Strategic Selection**: Test key combinations to validate functionality without resource exhaustion 171 | - **Platform Coverage**: Ensure both architectures work correctly 172 | - **Quality Gates**: Functional testing before manifest creation 173 | - **Resource Management**: Balanced testing approach for large matrices 174 | 175 | #### Stage 3: Manifest Merge 176 | - **Multi-platform Images**: Combine platform-specific digests into manifest lists 177 | - **Registry Optimization**: Single image tag supports both architectures 178 | - **Backward Compatibility**: Maintains existing image naming conventions 179 | - **Deployment Ready**: Images ready for multi-architecture deployment 180 | 181 | ### Scale Breakdown 182 | 183 | | Workflow | Images | Platforms | Platform Builds | Test Jobs | Merge Jobs | Other | **Total Jobs** | 184 | |------------|--------|-----------|----------------|-----------|------------|-------|---------------| 185 | | 01-base | 2 | 2 | 4 | 2 | 2 | 0 | 8 | 186 | | 02-model | 6 | 2 | 12 | 12 | 6 | 0 | 30 | 187 | | 03-distil | 1 | 2 | 2 | 2 | 1 | 0 | 5 | 188 | | 04-matrix | 185 | 2 | 370 | 2 | 185 | 1 | 558 | 189 | | **Total** | **194**| **2** | **388** | **18** | **194** | **1** | **601** | 190 | 191 | ### Resource Optimization Strategies 192 | 193 | #### Caching Hierarchy 194 | ``` 195 | cache-base-{image} # Base image layer cache 196 | cache-model-{model} # Model-specific dependency cache 197 | cache-matrix-{model}-{lang} # Language-specific alignment cache 198 | ``` 199 | 200 | #### Testing Strategy 201 | - **Base Images**: Test critical base functionality 202 | - **Model Cache**: Test model loading and initialization 203 | - **Distil English**: Comprehensive English language testing 204 | - **Matrix Images**: Selective testing of key language combinations (large-v3-zh, tiny-en) 205 | 206 | #### Parallel Job Management 207 | - **Max-parallel Settings**: Prevent GitHub Actions infrastructure overload 208 | - **Job Dependencies**: Proper sequencing without blocking parallel execution 209 | - **Artifact Lifecycle**: Short retention periods for intermediate build artifacts 210 | - **Error Handling**: Graceful handling of partial failures in large matrices 211 | 212 | ## Migration Notes 213 | 214 | ### From Original Workflow 215 | - All job functionality preserved and enhanced with multi-platform support 216 | - Same Docker build arguments and caching strategies, now optimized for distributed builds 217 | - Enhanced test procedures with platform-specific validation 218 | - Same output artifacts and attestations, now with manifest list support 219 | - **Major Enhancement**: Native ARM64 builds replace QEMU emulation 220 | 221 | ### Architecture Changes 222 | - **Build Strategy**: From single-job sequential to distributed parallel builds 223 | - **Platform Support**: From QEMU emulation to native multi-platform builds 224 | - **Scale**: From 175 images to 370 platform-specific builds + 185 manifest lists 225 | - **Testing**: From sequential testing to selective parallel testing strategy 226 | - **Caching**: From simple caching to sophisticated layered cache hierarchy 227 | 228 | ### Breaking Changes 229 | - Workflow names changed (affects status badges and external references) 230 | - Build timing significantly improved due to parallel execution 231 | - Different workflow run IDs for different stages and platforms 232 | - **Performance Impact**: 50-70% faster build times expected 233 | 234 | ### Rollback Strategy 235 | - Original workflow backed up as `docker_publish.yml.backup` 236 | - Can be restored by renaming backup file 237 | - All new workflow files can be safely deleted for rollback 238 | - **Note**: Rollback loses multi-platform build benefits 239 | 240 | ## Monitoring and Troubleshooting 241 | 242 | ### Status Monitoring 243 | - **Multi-level Tracking**: Monitor workflow → job → platform level status 244 | - **Platform-specific Visibility**: Separate status for amd64 and arm64 builds 245 | - **Resource Usage Tracking**: Monitor parallel job utilization and build durations 246 | - **Cache Performance**: Track cache hit rates and build acceleration 247 | 248 | ### Performance Metrics 249 | - **Build Time Comparison**: Monitor time reduction vs. previous architecture 250 | - **Parallel Efficiency**: Track simultaneous job execution and bottlenecks 251 | - **Resource Utilization**: GitHub Actions minute usage and cost optimization 252 | - **Error Rates**: Platform-specific failure analysis and trends 253 | 254 | ### Common Issues 255 | 256 | #### Multi-Platform Specific 257 | - **Platform Build Failures**: Check platform-specific runner availability (ubuntu-24.04-arm) 258 | - **Digest Artifact Issues**: Verify artifact upload/download between build stages 259 | - **Manifest Creation Failures**: Check digest availability and format compatibility 260 | - **Cache Conflicts**: Verify cache key uniqueness across platforms and workflows 261 | 262 | #### Scale-Related Issues 263 | - **GitHub Actions Limits**: Monitor parallel job usage against account quotas 264 | - **Resource Exhaustion**: Watch for runner capacity issues during peak usage 265 | - **Network Bottlenecks**: Monitor artifact transfer times for large matrices 266 | - **Storage Limitations**: Manage artifact retention and storage usage 267 | 268 | #### Legacy Issues 269 | - **Image Reference Failures**: Check SHA tag generation and image availability 270 | - **Workflow Chaining**: Verify `workflow_run` triggers are correctly configured 271 | - **Permission Issues**: Ensure all workflows have proper GITHUB_TOKEN permissions 272 | 273 | ### Debugging Strategies 274 | 275 | #### Platform-specific Debugging 276 | - **Single Platform Testing**: Temporarily disable one platform to isolate issues 277 | - **Selective Matrix Testing**: Use workflow_dispatch with reduced matrix for debugging 278 | - **Cache Isolation**: Clear platform-specific caches to eliminate cache-related issues 279 | 280 | #### Large-scale Debugging 281 | - **Staged Rollouts**: Test changes on smaller matrices before full deployment 282 | - **Parallel Limit Adjustment**: Reduce max-parallel settings during debugging 283 | - **Selective Re-runs**: Re-execute only failed combinations rather than entire workflows 284 | 285 | #### Tools and Techniques 286 | - **Workflow Dispatch**: Manual triggering with custom parameters for testing 287 | - **Debug Logging**: Enhanced logging for multi-platform build processes 288 | - **Artifact Inspection**: Download and analyze build artifacts for troubleshooting 289 | - **Performance Profiling**: Use GitHub Actions built-in timing and resource metrics 290 | -------------------------------------------------------------------------------- /.github/workflows/auto_merge.yml: -------------------------------------------------------------------------------- 1 | name: Automatically Approve / Merge PR 2 | on: 3 | pull_request: 4 | 5 | jobs: 6 | Auto_Approve: 7 | name: Auto Approve 8 | runs-on: ubuntu-latest 9 | if: github.actor == 'jim60105' && github.repository == 'jim60105/docker-whisperX' 10 | steps: 11 | - name: Auto approve 12 | uses: hmarr/auto-approve-action@v3 13 | with: 14 | github-token: "${{ secrets.GITHUB_TOKEN }}" 15 | 16 | Auto_Merge_PR: 17 | name: Auto Merge PR 18 | runs-on: ubuntu-latest 19 | needs: "Auto_Approve" 20 | steps: 21 | - name: Git Auto Merge 22 | uses: plm9606/automerge_actions@1.2.3 23 | with: 24 | # Use PAT to trigger another workflow 25 | github-token: ${{ secrets.CR_PAT }} 26 | merge-method: squash 27 | reviewers-number: 0 28 | label-name: "automerge" 29 | 30 | - name: Remove label 31 | if: ${{ success() }} 32 | uses: buildsville/add-remove-label@v1 33 | with: 34 | token: ${{secrets.GITHUB_TOKEN}} 35 | label: "automerge" 36 | type: remove 37 | -------------------------------------------------------------------------------- /.github/workflows/docker-reused-steps/action.yml: -------------------------------------------------------------------------------- 1 | name: Reusable docker workflow 2 | 3 | description: Reusable docker workflow. 4 | 5 | inputs: 6 | tag: 7 | description: "A tag to use for the image" 8 | default: "no_model" 9 | 10 | outputs: 11 | tags: 12 | description: "tags" 13 | value: ${{ steps.meta.outputs.tags }} 14 | labels: 15 | description: "labels" 16 | value: ${{ steps.meta.outputs.labels }} 17 | 18 | runs: 19 | using: composite 20 | steps: 21 | # We require additional space due to the large size of our image. (~10GB) 22 | - name: Free Disk Space (Ubuntu) 23 | uses: jlumbroso/free-disk-space@main 24 | with: 25 | tool-cache: true 26 | android: true 27 | dotnet: true 28 | haskell: true 29 | large-packages: true 30 | docker-images: true 31 | swap-storage: true 32 | 33 | - name: Docker meta:${{ inputs.tag }} 34 | id: meta 35 | uses: docker/metadata-action@v5 36 | with: 37 | images: ghcr.io/${{ github.repository_owner }}/whisperx 38 | tags: | 39 | ${{ inputs.tag }} 40 | type=sha,prefix=${{ inputs.tag }}- 41 | type=raw,value=latest,enable=${{ inputs.tag == 'no_model' }} 42 | 43 | - name: Set up QEMU 44 | uses: docker/setup-qemu-action@v3 45 | 46 | - name: Set up Docker Buildx 47 | uses: docker/setup-buildx-action@v3 48 | 49 | # You may need to manage write and read access of GitHub Actions for repositories in the container settings. 50 | - name: Login to GitHub Container Registry 51 | uses: docker/login-action@v3 52 | with: 53 | registry: ghcr.io 54 | username: ${{ github.repository_owner }} 55 | password: ${{ github.token }} 56 | -------------------------------------------------------------------------------- /.github/workflows/docker_publish.yml: -------------------------------------------------------------------------------- 1 | # Check this guide for more information about publishing to ghcr.io with GitHub Actions: 2 | # https://docs.github.com/en/packages/managing-github-packages-using-github-actions-workflows/publishing-and-installing-a-package-with-github-actions#upgrading-a-workflow-that-accesses-ghcrio 3 | 4 | # Main workflow trigger that initiates the Docker image build chain 5 | # This workflow has been refactored to trigger a sequence of specialized workflows: 6 | # 1. 01-build-base-images.yml - Builds ubi-no_model and no_model base images 7 | # 2. 02-build-model-cache.yml - Builds model cache images (6 models) 8 | # 3. 03-build-distil-en.yml + 04-build-matrix-images.yml - Parallel builds of final images 9 | name: docker_publish 10 | 11 | on: 12 | push: 13 | branches: 14 | - "master" 15 | tags: 16 | - "*" 17 | paths-ignore: 18 | - "*.md" 19 | 20 | # Allows you to run this workflow manually from the Actions tab 21 | workflow_dispatch: 22 | 23 | # Sets the permissions granted to the GITHUB_TOKEN for the actions in this job. 24 | permissions: 25 | contents: read 26 | packages: write 27 | id-token: write 28 | attestations: write 29 | 30 | jobs: 31 | # Trigger workflow to initiate the build chain 32 | trigger-build-chain: 33 | runs-on: ubuntu-latest 34 | steps: 35 | - name: Checkout 36 | uses: actions/checkout@v4 37 | with: 38 | submodules: true 39 | 40 | - name: Workflow chain initialization 41 | run: | 42 | echo "=== Docker Build Chain Initiated ===" 43 | echo "Commit: ${{ github.sha }}" 44 | echo "Ref: ${{ github.ref }}" 45 | echo "Run number: ${{ github.run_number }}" 46 | echo "" 47 | echo "This workflow will trigger the following sequence:" 48 | echo "1. 01-build-base-images.yml - Base images (ubi-no_model, no_model)" 49 | echo "2. 02-build-model-cache.yml - Model cache images (6 models)" 50 | echo "3. 03-build-distil-en.yml - Distil English model (parallel)" 51 | echo "4. 04-build-matrix-images.yml - Full matrix build + tests (parallel)" 52 | echo "" 53 | echo "Total expected images: ~175+" 54 | echo "=== Build Chain Ready ===" 55 | -------------------------------------------------------------------------------- /.github/workflows/scan.yml: -------------------------------------------------------------------------------- 1 | name: scan 2 | 3 | on: 4 | workflow_run: 5 | workflows: ["01-build-base-images"] 6 | types: [completed] 7 | 8 | # Allows you to run this workflow manually from the Actions tab 9 | workflow_dispatch: 10 | 11 | jobs: 12 | scan: 13 | name: Scan Python official base image 14 | runs-on: ubuntu-latest 15 | if: ${{ github.event.workflow_run.conclusion == 'success' || github.event_name == 'workflow_dispatch' }} 16 | steps: 17 | - name: Checkout 18 | uses: actions/checkout@v4 19 | with: 20 | sparse-checkout: | 21 | .github/workflows/scan/html.tpl 22 | sparse-checkout-cone-mode: false 23 | 24 | - name: Run Trivy vulnerability scanner for Python official image 25 | uses: aquasecurity/trivy-action@0.16.1 26 | with: 27 | image-ref: "ghcr.io/jim60105/whisperx:no_model" 28 | vuln-type: "os,library" 29 | scanners: vuln 30 | severity: "CRITICAL,HIGH" 31 | format: "template" 32 | template: "@.github/workflows/scan/html.tpl" 33 | exit-code: '1' 34 | ignore-unfixed: true 35 | output: "trivy-results.html" 36 | 37 | - name: Upload Artifact 38 | uses: actions/upload-artifact@v4 39 | if: always() 40 | with: 41 | name: trivy-results 42 | path: trivy-results.html 43 | retention-days: 90 44 | -------------------------------------------------------------------------------- /.github/workflows/scan/html.tpl: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | {{- if . }} 7 | 56 | {{- escapeXML ( index . 0 ).Target }} - Trivy Report - {{ now }} 57 | 84 | 85 | 86 |

{{- escapeXML ( index . 0 ).Target }} - Trivy Report - {{ now }}

87 | 88 | {{- range . }} 89 | 90 | {{- if (eq (len .Vulnerabilities) 0) }} 91 | 92 | {{- else }} 93 | 94 | 95 | 96 | 97 | 98 | 99 | 100 | 101 | {{- range .Vulnerabilities }} 102 | 103 | 104 | 105 | 106 | 107 | 108 | 113 | 114 | {{- end }} 115 | {{- end }} 116 | {{- if (eq (len .Misconfigurations ) 0) }} 117 | 118 | {{- else }} 119 | 120 | 121 | 122 | 123 | 124 | 125 | 126 | {{- range .Misconfigurations }} 127 | 128 | 129 | 130 | 131 | 132 | 138 | 139 | {{- end }} 140 | {{- end }} 141 | {{- end }} 142 |
{{ .Type | toString | escapeXML }}
No Vulnerabilities found
PackageVulnerability IDSeverityInstalled VersionFixed VersionLinks
{{ escapeXML .PkgName }}{{ escapeXML .VulnerabilityID }}{{ escapeXML .Vulnerability.Severity }}{{ escapeXML .InstalledVersion }}{{ escapeXML .FixedVersion }}
No Misconfigurations found
TypeMisconf IDCheckSeverityMessage
{{ escapeXML .Type }}{{ escapeXML .ID }}{{ escapeXML .Title }}{{ escapeXML .Severity }}
143 | {{- else }} 144 | 145 | 146 |

Trivy Returned Empty Report

147 | {{- end }} 148 | 149 | 150 | -------------------------------------------------------------------------------- /.github/workflows/scan_ubi.yml: -------------------------------------------------------------------------------- 1 | name: scan 2 | 3 | on: 4 | workflow_run: 5 | workflows: ["01-build-base-images"] 6 | types: [completed] 7 | 8 | # Allows you to run this workflow manually from the Actions tab 9 | workflow_dispatch: 10 | 11 | jobs: 12 | scan-ubi: 13 | name: Scan Red Hat UBI base image 14 | runs-on: ubuntu-latest 15 | if: ${{ github.event.workflow_run.conclusion == 'success' || github.event_name == 'workflow_dispatch' }} 16 | steps: 17 | - name: Checkout 18 | uses: actions/checkout@v4 19 | with: 20 | sparse-checkout: | 21 | .github/workflows/scan/html.tpl 22 | sparse-checkout-cone-mode: false 23 | 24 | - name: Run Trivy vulnerability scanner for UBI image 25 | uses: aquasecurity/trivy-action@0.16.1 26 | with: 27 | image-ref: "ghcr.io/jim60105/whisperx:ubi-no_model" 28 | vuln-type: "os,library" 29 | scanners: vuln 30 | severity: "CRITICAL,HIGH" 31 | format: "template" 32 | template: "@.github/workflows/scan/html.tpl" 33 | ignore-unfixed: true 34 | output: "trivy-results-ubi.html" 35 | 36 | - name: Upload Artifact 37 | uses: actions/upload-artifact@v4 38 | with: 39 | name: trivy-results-ubi 40 | path: trivy-results-ubi.html 41 | retention-days: 90 42 | 43 | - name: Run Trivy vulnerability scanner for UBI image (SARIF) 44 | uses: aquasecurity/trivy-action@master 45 | if: always() 46 | with: 47 | image-ref: "ghcr.io/jim60105/whisperx:ubi-no_model" 48 | vuln-type: "os,library" 49 | scanners: vuln 50 | severity: "CRITICAL,HIGH" 51 | format: 'sarif' 52 | exit-code: '1' 53 | ignore-unfixed: true 54 | output: 'trivy-results.sarif' 55 | 56 | - name: Upload Trivy scan results to GitHub Security tab 57 | uses: github/codeql-action/upload-sarif@v3 58 | if: always() 59 | with: 60 | sarif_file: 'trivy-results.sarif' 61 | -------------------------------------------------------------------------------- /.github/workflows/submodule_update.yml: -------------------------------------------------------------------------------- 1 | name: Submodule Updates 2 | 3 | on: 4 | schedule: 5 | - cron: "0 0 * * 0" 6 | workflow_dispatch: 7 | 8 | jobs: 9 | update_submodules: 10 | name: Submodule update 11 | runs-on: ubuntu-latest 12 | env: 13 | PARENT_REPOSITORY: ${{ github.repository_owner }}/docker-whisperX 14 | CHECKOUT_BRANCH: master 15 | PR_AGAINST_BRANCH: master 16 | OWNER: ${{ github.repository_owner }} 17 | 18 | steps: 19 | - name: Checkout Code 20 | uses: actions/checkout@v3 21 | 22 | - name: Update Submodules 23 | uses: releasehub-com/github-action-create-pr-parent-submodule@v1 24 | continue-on-error: true 25 | with: 26 | github_token: ${{ secrets.CR_PAT }} 27 | parent_repository: ${{ env.PARENT_REPOSITORY }} 28 | checkout_branch: ${{ env.CHECKOUT_BRANCH}} 29 | pr_against_branch: ${{ env.PR_AGAINST_BRANCH }} 30 | owner: ${{ env.OWNER }} 31 | label: "automerge" 32 | -------------------------------------------------------------------------------- /.github/workflows/test/en.webm: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jim60105/docker-whisperX/9245177c906bb94f40f1015629d62b6496dce317/.github/workflows/test/en.webm -------------------------------------------------------------------------------- /.github/workflows/test/zh.webm: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jim60105/docker-whisperX/9245177c906bb94f40f1015629d62b6496dce317/.github/workflows/test/zh.webm -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | *.env 2 | cache 3 | -------------------------------------------------------------------------------- /.gitmodules: -------------------------------------------------------------------------------- 1 | [submodule "whisperX"] 2 | path = whisperX 3 | url = https://github.com/jim60105/whisperX 4 | -------------------------------------------------------------------------------- /.hadolint.yml: -------------------------------------------------------------------------------- 1 | ignored: 2 | - DL3041 # Specify version with `dnf install -y -`. 3 | - DL3042 # Avoid use of cache directory with pip. Use `pip install --no-cache-dir ` 4 | - DL4006 # Set the SHELL option -o pipefail before RUN with a pipe in it 5 | - DL3013 # Pin versions in pip. Instead of `pip install ` use `pip install ==` 6 | - SC2015 # Note that A && B || C is not if-then-else. C may run when A is true. 7 | - DL3006 # Always tag the version of an image explicitly 8 | - DL3008 # Pin versions in apt get install. Instead of `apt-get install ` use `apt-get install =` 9 | - DL3040 # `dnf clean all` missing after dnf command. 10 | -------------------------------------------------------------------------------- /.vscode/settings.json: -------------------------------------------------------------------------------- 1 | { 2 | "cSpell.words": [ 3 | "anuragshas", 4 | "aptlists", 5 | "bokmaal", 6 | "buildx", 7 | "catala", 8 | "comodoro", 9 | "CUDA", 10 | "Diarization", 11 | "distil", 12 | "ffprobe", 13 | "findutils", 14 | "ftspeech", 15 | "huggingface", 16 | "imvladikon", 17 | "jonatasgrosman", 18 | "kingabzpro", 19 | "kresnik", 20 | "libgomp", 21 | "libsndfile", 22 | "microdnf", 23 | "mpoyraz", 24 | "nguyenvulebinh", 25 | "nodocs", 26 | "noplugins", 27 | "numpy", 28 | "pipefail", 29 | "pyannote", 30 | "pycache", 31 | "pyproject", 32 | "PYTHONIOENCODING", 33 | "PYTHONPATH", 34 | "PYTHONUNBUFFERED", 35 | "saattrupdan", 36 | "setopt", 37 | "setuptools", 38 | "Silero", 39 | "softcatala", 40 | "submuodules", 41 | "theainerd", 42 | "tmpfs", 43 | "torchaudio", 44 | "tsflags", 45 | "venv", 46 | "VOXPOPULI", 47 | "whisperx", 48 | "xlsr", 49 | "Yehor" 50 | ], 51 | "[python]": { 52 | "editor.defaultFormatter": "ms-python.black-formatter" 53 | }, 54 | "python.formatting.provider": "none" 55 | } 56 | -------------------------------------------------------------------------------- /AGENTS.md: -------------------------------------------------------------------------------- 1 | # GitHub Copilot Instructions for docker-whisperX 2 | 3 | * **Response Language:** `zh-TW 正體中文` 4 | 5 | # Key Directives: 6 | 7 | * Maintain the highest standard of quality in all deliverables by following best practices. 8 | * All code comments and documentation must be written in **English** as per project conventions. 9 | * Proactively consult both core documentation and conversation history to ensure accurate comprehension of all requirements. 10 | * You are neither able to execute `docker`, use `podman` instead. 11 | * When doing Git commit, use the conventional commit format for the title and a brief description in the body. Always commit with `--signoff` and explicitly specify the author on the command: `GitHub Copilot `. Write the commit in English. 12 | 13 | --- 14 | 15 | # Project DevOps 16 | 17 | This project uses GitHub for DevOps management. 18 | 19 | Use `gh` CLI commands to perform DevOps tasks. 20 | 21 | ***Highest-level restriction: All issue and PR operations are limited to repositories owned by jim60105 only!*** 22 | 23 | * **GitHub repo**: https://github.com/jim60105/standalone-civitai-shortcut 24 | 25 | * **Backlog & Bugs**: All backlogs and bugs must be managed on GitHub Issues. 26 | 27 | * Each issue represents a specific backlog plan / bug reports / enhancement requests. 28 | * Contains implementation or bug-fix guides from project foundation to deployment 29 | * Each issue(backlogs) includes complete technical design and implementation details 30 | * Each issue(bugs) includes problem description, reproduction steps, and proposed solutions 31 | * Serves as task queue for ongoing maintenance and improvements 32 | 33 | ## DevOps Flow 34 | 35 | ### Planning Stage 36 | 37 | **If we are at planning stage you shouldn't start to implement anything!** 38 | **Planning Stage is to create a detailed development plan and create issue on GitHub using `gh issue create`** 39 | 40 | 1. **Issue Creation**: Use `gh issue create --title "Issue Title" --body "Issue Description"` to create a new issue for each backlog item or bug report. Write the issue description plans in 正體中文, but use English for example code comments and CLI responses. The plan should be very detailed (try your best!). Please write that enables anyone to complete the work successfully. 41 | 2. **Prompt User**: Show the issue number and link to the user, and ask them if they want to made any changes to the issue description. If they do, you can edit the issue description using `gh issue edit [number] --body "New Description"`. 42 | 43 | ### Implementation Stage 44 | 45 | **Only start to implement stage when user prompt you to do so!** 46 | **Implementation Stage is to implement the plan step by step, following the instructions provided in the issue and submit a work report PR at last** 47 | 48 | 1. **Check Current Situation**: Run `git status` to check the current status of the Git repository to ensure you are aware of any uncommitted changes or issues before proceeding with any operations. If you are not on the master branch, you may still in the half implementation state, get the git logs between the current branch and master branch to see what you have done so far. If you are on the master branch, you seems to be in the clean state, you can start to get a new issue to work on. 49 | 2. **Get Issue Lists**: Use `gh issue list` to get the list of issues to see all backlogs and bugs. Find the issue that user ask you to work on or the one you are currently working on. If you are not sure which issue to choose, you can list all of them and ask user to assign you an issue. 50 | 3. **Get Issue Details**: Use `gh issue view [number]` to get the details of the issue to understand the requirements and implementation plan. Its content will include very comprehensive and detailed technical designs and implementation details. Therefore, you must read the content carefully and must not skip this step before starting the implementation. 51 | 4. **Get Issue Comments**: Use `gh issue view [number] --comments` to read the comments in the issue to understand the context and any additional requirements or discussions that have taken place. Please read it to determine whether this issue has been completed, whether further implementation is needed, or if there are still problems that need to be fixed. This step must not be skipped before starting implementation. 52 | 5. **Get Pull Requests**: Use `gh pr list`, `gh pr view [number]`, and `gh pr view [number] --comments` to list the existing pull requests and details to check if there are any related to the issue you are working on. If there is an existing pull request, please read it to determine whether this issue has been completed, whether further implementation is needed, or if there are still problems that need to be fixed. This step must not be skipped before starting implementation. 53 | 6. **Git Checkout**: Run `git checkout -b [branch-name]` to checkout the issue branch to start working on the code changes. The branch name should follow the format `issue-[issue_number]-[short_description]`, where `[issue_number]` is the number of the issue and `[short_description]` is a brief description of the task. Skip this step if you are already on the correct branch. 54 | 7. **Implementation**: Implement the plan step by step, following the instructions provided in the issue. Each step should be executed in sequence, ensuring that all requirements are met and documented appropriately. 55 | 8. **Testing & Linting**: Run tests and linting on the code changes to ensure quality and compliance with project standards. 56 | 9. **Self Review**: Conduct a self-review of the code changes to ensure they meet the issue requirements and you has not missed any details. 57 | 10. **Git Commit & Git Push**: Run `git commit` using the conventional commit format for the title and a brief description in the body. Always commit with `--signoff` and explicitly specify the author on the command: `Codex-CLI `. Write the commit in English. Link the issue number in the commit message body. Run `git push` to push the changes to the remote repository. 58 | 11. **Create Pull Request**: Use `gh pr list` and `gh pr create` commands. ALWAYS SUBMIT PR TO `origin`, NEVER SUBMIT PR TO `upstream`. Create a pull request if there isn't already has one related to your issue using `gh pr create --title "PR Title" --body "PR Description"`. Create a comprehensive work report and use it as pull request details, detailing the work performed, code changes, and test results for the project. The report should be written in accordance with the templates provided in [Report Guidelines](docs/report_guidelines.md) and [REPORT_TEMPLATE](docs/REPORT_TEMPLATE.md). Follow the template exactly. Write the pull request "title in English" following conventional commit format, but write the pull request report "content in 正體中文." Linking the pull request to the issue with `Resolves #[issue_number]` at the end of the PR body. ALWAYS SUBMIT PR TO `origin`, NEVER SUBMIT PR TO `upstream`. ALWAYS SUBMIT PR TO `origin`, NEVER SUBMIT PR TO `upstream`. ALWAYS SUBMIT PR TO `origin`, NEVER SUBMIT PR to `upstream`. 59 | 60 | ***Highest-level restriction: All issue and PR operations are limited to repositories owned by jim60105 only!*** 61 | ***Highest-level restriction: All issue and PR operations are limited to repositories owned by jim60105 only!*** 62 | ***Highest-level restriction: All issue and PR operations are limited to repositories owned by jim60105 only!*** 63 | 64 | --- 65 | 66 | ## Project Overview 67 | 68 | This project provides a **Docker containerization** for [WhisperX](https://github.com/m-bain/whisperX). 69 | 70 | The project focuses on **continuous integration optimization** for building 175+ Docker images (10GB each) weekly on GitHub Free runners, emphasizing efficient docker layer caching, parallel builds, and minimal image sizes. 71 | 72 | The focus of this project is on the Dockerfile and CI workflow, not on the WhisperX project itself. 73 | 74 | ## Project Structure 75 | 76 | ``` 77 | docker-whisperX/ 78 | ├── Dockerfile # Main Docker build configuration (For docker compatibility) 79 | ├── ubi.Dockerfile # Red Hat UBI-based alternative (For podman compatibility) 80 | ├── docker-bake.hcl # Docker Buildx bake configuration for matrix builds 81 | ├── load_align_model.py # Preloads alignment models for supported languages 82 | ├── whisperX/ # Git submodule containing WhisperX source code 83 | │ ├── pyproject.toml # Python package configuration 84 | │ └── whisperx/ # Main WhisperX Python package 85 | └── .github/ 86 | └── workflows/ # CI/CD pipeline configurations 87 | ``` 88 | 89 | ## Coding Standards and Conventions 90 | 91 | ### Docker Best Practices 92 | - Use **multi-stage builds** to minimize final image size 93 | - Leverage **BuildKit features** like `--mount=type=cache` for dependency caching 94 | - Apply **layer caching strategies** to optimize CI build times 95 | - Use **ARG** variables for build-time configuration (WHISPER_MODEL, LANG, etc.) 96 | - Follow **security best practices**: run as non-root user, minimize installed packages 97 | - Do not use `--link` in ubi.Dockerfile, as it is not supported by Podman. 98 | - Do not use `,z` or `,Z` in Dockerfile, as it is not supported by Docker buildx. 99 | 100 | ### Documentation Standards 101 | - Write documentation in English for user-facing content 102 | - Use **English** for technical comments in code and commit messages 103 | - Include **clear examples** in README files showing actual usage commands 104 | - Document **build arguments** and their acceptable values 105 | - Provide **troubleshooting guidance** for common issues 106 | 107 | ## Key Technologies and Dependencies 108 | 109 | ### Build Tools 110 | - **uv**: Modern Python package manager for dependency resolution (Used in Dockerfile) 111 | - **Docker Buildx**: Extended build capabilities with bake support 112 | - **GitHub Actions**: CI/CD automation for multi-architecture builds 113 | 114 | ## Development Guidelines 115 | 116 | ### When Working with Docker Configuration 117 | - **Dockerfile modifications**: Always test both `amd64` and `arm64` architectures 118 | - **Build arguments**: Validate that ARG values match supported languages in `load_align_model.py` 119 | - **Cache optimization**: Consider layer ordering impact on CI build performance 120 | - **Multi-stage builds**: Ensure each stage serves a clear purpose (build → no_model → load_whisper) 121 | 122 | ### When Working with CI/CD 123 | - **Parallel builds**: Consider the large amount of build matrix impact on GitHub runner resources 124 | - **Caching strategy**: Optimize for both build time and cache storage efficiency 125 | - **Multi-architecture**: Ensure changes work correctly on both x86_64 and arm64 126 | 127 | ## Project-Specific Conventions 128 | 129 | ## Additional Notes for Contributors 130 | 131 | When suggesting changes, always consider the impact on: 132 | 1. **Build time efficiency** for the CI pipeline 133 | 2. **Multi-architecture compatibility** (amd64/arm64) 134 | 135 | --- 136 | 137 | When contributing to this codebase, adhere strictly to these directives to ensure consistency with the existing architectural conventions and stylistic norms. 138 | -------------------------------------------------------------------------------- /Dockerfile: -------------------------------------------------------------------------------- 1 | # syntax=docker/dockerfile:1 2 | ARG WHISPER_MODEL=base 3 | ARG LANG=en 4 | ARG UID=1001 5 | ARG VERSION=EDGE 6 | ARG RELEASE=0 7 | 8 | # These ARGs are for caching stage builds in CI 9 | # Leave them as is when building locally 10 | ARG LOAD_WHISPER_STAGE=load_whisper 11 | ARG NO_MODEL_STAGE=no_model 12 | 13 | # When downloading diarization model with auth token, it seems that it is not respecting the TORCH_HOME env variable. 14 | # So it is necessary to ensure that the CACHE_HOME is set to the exact same path as the default path. 15 | # https://github.com/jim60105/docker-whisperX/issues/27 16 | ARG CACHE_HOME=/.cache 17 | ARG CONFIG_HOME=/.config 18 | ARG TORCH_HOME=${CACHE_HOME}/torch 19 | ARG HF_HOME=${CACHE_HOME}/huggingface 20 | 21 | ######################################## 22 | # Base stage for amd64 23 | ######################################## 24 | FROM docker.io/library/python:3.11-slim-bullseye AS prepare_base_amd64 25 | 26 | # RUN mount cache for multi-arch: https://github.com/docker/buildx/issues/549#issuecomment-1788297892 27 | ARG TARGETARCH 28 | ARG TARGETVARIANT 29 | 30 | WORKDIR /tmp 31 | 32 | ENV NVIDIA_VISIBLE_DEVICES=all 33 | ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility 34 | 35 | ######################################## 36 | # Base stage for arm64 37 | ######################################## 38 | FROM docker.io/library/python:3.11-slim-bullseye AS prepare_base_arm64 39 | 40 | # RUN mount cache for multi-arch: https://github.com/docker/buildx/issues/549#issuecomment-1788297892 41 | ARG TARGETARCH 42 | ARG TARGETVARIANT 43 | 44 | WORKDIR /tmp 45 | 46 | # Missing dependencies for arm64 (needed for build-time and run-time) 47 | # https://github.com/jim60105/docker-whisperX/issues/14 48 | RUN --mount=type=cache,id=apt-$TARGETARCH$TARGETVARIANT,sharing=locked,target=/var/cache/apt \ 49 | --mount=type=cache,id=aptlists-$TARGETARCH$TARGETVARIANT,sharing=locked,target=/var/lib/apt/lists \ 50 | apt-get update && apt-get install -y --no-install-recommends \ 51 | libgomp1 libsndfile1 52 | 53 | # Select the base stage by target architecture 54 | FROM prepare_base_$TARGETARCH$TARGETVARIANT AS base 55 | 56 | ######################################## 57 | # Build stage 58 | ######################################## 59 | FROM base AS build 60 | 61 | # RUN mount cache for multi-arch: https://github.com/docker/buildx/issues/549#issuecomment-1788297892 62 | ARG TARGETARCH 63 | ARG TARGETVARIANT 64 | 65 | WORKDIR /app 66 | 67 | # Install uv 68 | COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/ 69 | 70 | ENV UV_PROJECT_ENVIRONMENT=/venv 71 | ENV VIRTUAL_ENV=/venv 72 | ENV UV_LINK_MODE=copy 73 | ENV UV_PYTHON_DOWNLOADS=0 74 | 75 | # Install big dependencies separately for layer caching 76 | RUN --mount=type=cache,id=uv-$TARGETARCH$TARGETVARIANT,sharing=locked,target=/root/.cache/uv \ 77 | uv venv --system-site-packages /venv && \ 78 | uv pip install --no-deps --index "https://download.pytorch.org/whl/cu128" \ 79 | "torch==2.7.1+cu128" \ 80 | "torchaudio" \ 81 | "triton" \ 82 | "pyannote.audio==3.3.2" 83 | 84 | # Install whisperX dependencies 85 | RUN --mount=type=cache,id=uv-$TARGETARCH$TARGETVARIANT,sharing=locked,target=/root/.cache/uv \ 86 | --mount=type=bind,source=whisperX/pyproject.toml,target=pyproject.toml \ 87 | --mount=type=bind,source=whisperX/uv.lock,target=uv.lock \ 88 | uv sync --frozen --no-dev --no-install-project --no-editable 89 | 90 | # Install whisperX project 91 | RUN --mount=type=cache,id=uv-$TARGETARCH$TARGETVARIANT,sharing=locked,target=/root/.cache/uv \ 92 | --mount=source=whisperX,target=.,rw \ 93 | uv sync --frozen --no-dev --no-editable 94 | 95 | ######################################## 96 | # Final stage for no_model 97 | ######################################## 98 | FROM base AS no_model 99 | 100 | # We don't need them anymore 101 | RUN pip3.11 uninstall -y pip wheel && \ 102 | rm -rf /root/.cache/pip 103 | 104 | # Create user 105 | ARG UID 106 | RUN groupadd -g $UID $UID && \ 107 | useradd -l -u $UID -g $UID -m -s /bin/sh -N $UID 108 | 109 | ARG CACHE_HOME 110 | ARG CONFIG_HOME 111 | ARG TORCH_HOME 112 | ARG HF_HOME 113 | ENV XDG_CACHE_HOME=${CACHE_HOME} 114 | ENV TORCH_HOME=${TORCH_HOME} 115 | ENV HF_HOME=${HF_HOME} 116 | 117 | RUN install -d -m 775 -o $UID -g 0 /licenses && \ 118 | install -d -m 775 -o $UID -g 0 /root && \ 119 | install -d -m 775 -o $UID -g 0 ${CACHE_HOME} && \ 120 | install -d -m 775 -o $UID -g 0 ${CONFIG_HOME} && \ 121 | install -d -m 775 -o $UID -g 0 /nltk_data 122 | 123 | # ffmpeg 124 | COPY --link --from=ghcr.io/jim60105/static-ffmpeg-upx:8.0 /ffmpeg /usr/local/bin/ 125 | # COPY --link --from=ghcr.io/jim60105/static-ffmpeg-upx:8.0 /ffprobe /usr/local/bin/ 126 | 127 | # dumb-init 128 | COPY --link --from=ghcr.io/jim60105/static-ffmpeg-upx:8.0 /dumb-init /usr/local/bin/ 129 | 130 | # Copy licenses (OpenShift Policy) 131 | COPY --link --chown=$UID:0 --chmod=775 LICENSE /licenses/LICENSE 132 | COPY --link --chown=$UID:0 --chmod=775 whisperX/LICENSE /licenses/whisperX.LICENSE 133 | 134 | # Copy dependencies and code (and support arbitrary uid for OpenShift best practice) 135 | # https://docs.openshift.com/container-platform/4.14/openshift_images/create-images.html#use-uid_create-images 136 | COPY --link --chown=$UID:0 --chmod=775 --from=build /venv /venv 137 | 138 | ENV PATH="/venv/bin${PATH:+:${PATH}}" 139 | ENV PYTHONPATH="/venv/lib/python3.11/site-packages" 140 | ENV LD_LIBRARY_PATH="/venv/lib/python3.11/site-packages/nvidia/cudnn/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" 141 | 142 | # Test whisperX 143 | RUN python3 -c 'import whisperx;' && \ 144 | whisperx -h 145 | 146 | WORKDIR /app 147 | 148 | VOLUME [ "/app" ] 149 | 150 | USER $UID 151 | 152 | STOPSIGNAL SIGINT 153 | 154 | ENTRYPOINT [ "dumb-init", "--", "/bin/sh", "-c", "whisperx \"$@\"" ] 155 | 156 | ARG VERSION 157 | ARG RELEASE 158 | LABEL name="jim60105/docker-whisperX" \ 159 | # Authors for WhisperX 160 | vendor="Bain, Max and Huh, Jaesung and Han, Tengda and Zisserman, Andrew" \ 161 | # Maintainer for this docker image 162 | maintainer="jim60105" \ 163 | # Dockerfile source repository 164 | url="https://github.com/jim60105/docker-whisperX" \ 165 | version=${VERSION} \ 166 | # This should be a number, incremented with each change 167 | release=${RELEASE} \ 168 | io.k8s.display-name="WhisperX" \ 169 | summary="WhisperX: Time-Accurate Speech Transcription of Long-Form Audio" \ 170 | description="This is the docker image for WhisperX: Automatic Speech Recognition with Word-Level Timestamps (and Speaker Diarization) from the community. For more information about this tool, please visit the following website: https://github.com/m-bain/whisperX." 171 | 172 | ######################################## 173 | # load_whisper stage 174 | # This stage will be tagged for caching in CI. 175 | ######################################## 176 | FROM ${NO_MODEL_STAGE} AS load_whisper 177 | 178 | ARG CONFIG_HOME 179 | ARG XDG_CONFIG_HOME=${CONFIG_HOME} 180 | ARG HOME="/root" 181 | 182 | # Preload Silero vad model 183 | RUN python3 < 21 | 22 | 23 | ### Linux, OSX 24 | 25 | Install an NVIDIA GPU Driver if you do not already have one installed. 26 | 27 | 28 | Install the NVIDIA Container Toolkit with this guide. 29 | 30 | 31 | > [!TIP] 32 | > I have a Chinese blog about this topic: 33 | > [Podman GPU Configuration Notes for Fedora/RHEL](https://xn--jgy.tw/Container/configuring-gpu-in-linux-podman/) 34 | 35 | ## 📦 Available Pre-built Image 36 | 37 | ![GitHub Workflow Status (with event)](https://img.shields.io/github/actions/workflow/status/jim60105/docker-whisperX/04-build-matrix-images.yml?label=Docker%20Build) ![GitHub last commit (branch)](https://img.shields.io/github/last-commit/jim60105/docker-whisperX/master?label=Date) 38 | 39 | > [!NOTE] 40 | > The WhisperX code base in these images aligns with the git submodule commit hash. 41 | > I have [a scheduled CI workflow](https://github.com/jim60105/docker-whisperX/actions/workflows/submodule_update.yml) runs weekly to target on [the main branch](https://github.com/m-bain/whisperX/tree/main) and rebuild all docker images. 42 | 43 | ```bash 44 | docker run --gpus all -it -v ".:/app" ghcr.io/jim60105/whisperx:base-en -- --output_format srt audio.mp3 45 | docker run --gpus all -it -v ".:/app" ghcr.io/jim60105/whisperx:large-v3-ja -- --output_format srt audio.mp3 46 | docker run --gpus all -it -v ".:/app" ghcr.io/jim60105/whisperx:no_model -- --model tiny --language en --output_format srt audio.mp3 47 | ``` 48 | 49 | The image tags are formatted as `WHISPER_MODEL`-`LANG`, for example, `tiny-en`, `base-de` or `large-v3-zh`. 50 | Please be aware that the whisper models `*.en`, `large-v1`, `large-v2` have been excluded as I believe they are not frequently used. If you require these models, please refer to the following section to build them on your own. 51 | 52 | You can find the actual build matrix in [04-build-matrix-images.yml](.github/workflows/04-build-matrix-images.yml) and all available tags at [ghcr.io](https://github.com/jim60105/docker-whisperX/pkgs/container/whisperx/versions?filters%5Bversion_type%5D=tagged). 53 | 54 | In addition, there is also a `no_model` tag that does not include any pre-downloaded models, also referred to as `latest`. 55 | 56 | > Added a `distil-large-v3-en` model. 57 | > Only en, distil model seems to only support English. 58 | 59 | ## ⚡️ Preserve the download cache for the align models when working with various languages 60 | 61 | You can mount the `/.cache` to share align models between containers. 62 | Please use tag `no_model` (`latest`) for this scenario. 63 | 64 | ```bash 65 | docker run --gpus all -it -v ".:/app" -v whisper_cache:/.cache ghcr.io/jim60105/whisperx:latest -- --model large-v3 --language en --output_format srt audio.mp3 66 | ``` 67 | 68 | ## 🛠️ Building the Docker Image 69 | 70 | > [!IMPORTANT] 71 | > Clone the Git repository recursively to include submodules: 72 | > `git clone --recursive https://github.com/jim60105/docker-whisperX.git` 73 | 74 | ### Build Arguments 75 | 76 | The [Dockerfile](Dockerfile) builds the image contained models. It accepts two build arguments: `LANG` and `WHISPER_MODEL`. 77 | 78 | - `LANG`: The language to transcribe. The default is `en`. See [supported languages in load_align_model.py](https://github.com/jim60105/docker-whisperX/blob/master/load_align_model.py). 79 | - `WHISPER_MODEL`: The model name. The default is `base`. See [fast-whisper](https://huggingface.co/Systran) for supported models. 80 | 81 | In case of multiple language alignments needed, use space separated list of languages `"LANG=pl fr en"` when building the image. Also note that WhisperX is not doing well to handle multiple languages within the same audio file. Even if you do not provide the language parameter, it will still recognize the language (or fallback to en) and use it for choosing the alignment model. Alignment models are language specific. **This instruction is simply for embedding multiple alignment models into a docker image.** 82 | 83 | ### Build Command 84 | 85 | > [!NOTE] 86 | > If you are using an earlier version of the docker client, it is necessary to [enable the BuildKit mode](https://docs.docker.com/build/buildkit/#getting-started) when building the image. This is because I used the `COPY --link` feature which enhances the build performance and was introduced in Buildx v0.8. 87 | > With the Docker Engine 23.0 and Docker Desktop 4.19, Buildx has become the default build client. So you won't have to worry about this when using the latest version. 88 | 89 | For example, if you want to build the image with `en` language and `large-v3` model: 90 | 91 | ```bash 92 | docker build --build-arg LANG=en --build-arg WHISPER_MODEL=large-v3 -t whisperx:large-v3-en . 93 | ``` 94 | 95 | If you want to build the image without any pre-downloaded models: 96 | 97 | ```bash 98 | docker build --target no_model -t whisperx:no_model . 99 | ``` 100 | 101 | If you want to build all images at once, we have [a Docker bake file](docker-bake.hcl) available: 102 | 103 | > [!WARNING] 104 | > [Bake](https://docs.docker.com/build/bake/) is currently an experimental feature, and it may require additional configuration in order to function correctly. 105 | 106 | ```bash 107 | docker buildx bake build no_model ubi-no_model 108 | ``` 109 | 110 | ### Usage Command 111 | 112 | Mount the current directory as `/app` and run WhisperX with additional input arguments: 113 | 114 | ```bash 115 | docker run --gpus all -it -v ".:/app" whisperx:large-v3-ja -- --output_format srt audio.mp3 116 | ``` 117 | 118 | > [!NOTE] 119 | > Remember to prepend `--` before the arguments. 120 | > `--model` and `--language` args are defined in Dockerfile, no need to specify. 121 | 122 | ## ⛑️ Red Hat UBI based Image 123 | 124 | ![Docker Build](https://img.shields.io/github/actions/workflow/status/jim60105/docker-whisperX/01-build-base-images.yml?label=Docker%20Build) 125 | 126 | I have created an alternative [ubi.Dockerfile](ubi.Dockerfile) that is based on the **Red Hat Universal Base Image (UBI)** image, unlike the default one which used the **Python official image** as the base image. If you are a Red Hat subscriber, I believe you will find its benefits. 127 | 128 | > [!TIP] 129 | > With the release of the Red Hat Universal Base Image (UBI), you can now take advantage of the greater reliability, security, and performance of official Red Hat container images where OCI-compliant Linux containers run - whether you're a customer or not. -- [Red Hat blog](https://www.redhat.com/en/blog/introducing-red-hat-universal-base-image) 130 | 131 | It is important to mention that it is *NOT* necessary obtaining a license from Red Hat to use UBI, however, if you are the subscriber and runs it on RHEL/OpenShift, you may get supports from Red Hat. 132 | 133 | Despite my initial hesitation, I made the decision not to utilize the *UBI* version as the default image. The *Python official image* has a significantly larger user base compared to *UBI*, and I believe that opting for it aligns better with public expectations. Nevertheless, I would still suggest giving the *UBI* version a try. 134 | 135 | Please refer to [the latest vulnerability scan report](https://github.com/jim60105/docker-whisperX/actions/workflows/scan.yml?query=is%3Asuccess) from our scanning workflow artifact. You can see that the *UBI* version has fewer vulnerabilities compared to the *Python official image* version. 136 | 137 | You can get the pre-built image at tag `ubi-no_model`. Notice that only `no_model` is available. Feel free to build your own image with the [ubi.Dockerfile](ubi.Dockerfile) for your needs. This Dockerfile supports the same build arguments as the default one. 138 | 139 | ```bash 140 | docker run --gpus all -it -v ".:/app" ghcr.io/jim60105/whisperx:ubi-no_model -- --model tiny --language en --output_format srt audio.mp3 141 | ``` 142 | 143 | > [!WARNING] 144 | > ***DISCLAIMER***: 145 | > I have created the image in accordance with the specifications outlined in the [Red Hat Container Certification Requirement](https://access.redhat.com/documentation/en-us/red_hat_software_certification/8.72/html/red_hat_openshift_software_certification_policy_guide/assembly-requirements-for-container-images_openshift-sw-cert-policy-introduction) but I am not going to pursue the actual [certification](https://connect.redhat.com/en/partner-with-us/red-hat-container-certification). 146 | 147 | ## 📝 LICENSE 148 | 149 | > The main program, WhisperX, is distributed under [the BSD-4 license](https://github.com/m-bain/whisperX/blob/main/LICENSE). 150 | > Please consult their repository for access to the source code and license. 151 | 152 | The Dockerfile and CI workflow files in this repository are licensed under [the MIT license](LICENSE). 153 | 154 | ## 🌟 Star History 155 | 156 | 157 | 158 | 159 | 160 | Star History Chart 161 | 162 | 163 | -------------------------------------------------------------------------------- /docker-bake.hcl: -------------------------------------------------------------------------------- 1 | group "default" { 2 | targets = ["no_model", "ubi-no_model", "build"] 3 | } 4 | 5 | variable "WHISPER_MODEL" { 6 | default = "base" 7 | } 8 | 9 | variable "LANG" { 10 | default = "en" 11 | } 12 | 13 | target "build" { 14 | matrix = { 15 | "WHISPER_MODEL" = [ 16 | "tiny", 17 | "base", 18 | "small", 19 | "medium", 20 | "large-v3", 21 | "distil-large-v3" 22 | ] 23 | "LANG" = [ 24 | "en", 25 | "fr", 26 | "de", 27 | "es", 28 | "it", 29 | "ja", 30 | "zh", 31 | "nl", 32 | "uk", 33 | "pt", 34 | "ar", 35 | "cs", 36 | "ru", 37 | "pl", 38 | "hu", 39 | "fi", 40 | "fa", 41 | "el", 42 | "tr", 43 | "da", 44 | "he", 45 | "vi", 46 | "ko", 47 | "ur", 48 | "te", 49 | "hi", 50 | "ca", 51 | "ml", 52 | "no", 53 | "nn", 54 | "sk", 55 | "sl", 56 | "hr", 57 | "ro", 58 | "eu", 59 | "gl", 60 | "ka", 61 | "lv", 62 | "tl", 63 | ] 64 | } 65 | 66 | args = { 67 | WHISPER_MODEL = "${WHISPER_MODEL}" 68 | LANG = "${LANG}" 69 | } 70 | 71 | name = "whisperx-${WHISPER_MODEL}-${LANG}" 72 | dockerfile = "Dockerfile" 73 | tags = [ 74 | "ghcr.io/jim60105/whisperx:${WHISPER_MODEL}-${LANG}" 75 | ] 76 | platforms = ["linux/amd64", "linux/arm64"] 77 | cache-from = ["type=local,mode=max,src=cache"] 78 | cache-to = ["type=local,mode=max,dest=cache"] 79 | } 80 | 81 | target "no_model" { 82 | dockerfile = "Dockerfile" 83 | target = "no_model" 84 | tags = [ 85 | "ghcr.io/jim60105/whisperx:latest", 86 | "ghcr.io/jim60105/whisperx:no_model" 87 | ] 88 | platforms = ["linux/amd64", "linux/arm64"] 89 | cache-from = ["type=local,mode=max,src=cache"] 90 | cache-to = ["type=local,mode=max,dest=cache"] 91 | } 92 | 93 | target "ubi-no_model" { 94 | dockerfile = "ubi.Dockerfile" 95 | target = "no_model" 96 | tags = [ 97 | "ghcr.io/jim60105/whisperx:ubi-no_model" 98 | ] 99 | platforms = ["linux/amd64", "linux/arm64"] 100 | cache-from = ["type=local,mode=max,src=cache"] 101 | cache-to = ["type=local,mode=max,dest=cache"] 102 | } 103 | -------------------------------------------------------------------------------- /load_align_model.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import torchaudio 3 | from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor 4 | 5 | lang = sys.argv[1] 6 | 7 | # https://github.com/m-bain/whisperX/blob/v3.1.1/whisperx/alignment.py#L21 8 | DEFAULT_ALIGN_MODELS_TORCH = { 9 | "en": "WAV2VEC2_ASR_BASE_960H", 10 | "fr": "VOXPOPULI_ASR_BASE_10K_FR", 11 | "de": "VOXPOPULI_ASR_BASE_10K_DE", 12 | "es": "VOXPOPULI_ASR_BASE_10K_ES", 13 | "it": "VOXPOPULI_ASR_BASE_10K_IT", 14 | } 15 | 16 | DEFAULT_ALIGN_MODELS_HF = { 17 | "ja": "jonatasgrosman/wav2vec2-large-xlsr-53-japanese", 18 | "zh": "jonatasgrosman/wav2vec2-large-xlsr-53-chinese-zh-cn", 19 | "nl": "jonatasgrosman/wav2vec2-large-xlsr-53-dutch", 20 | "uk": "Yehor/wav2vec2-xls-r-300m-uk-with-small-lm", 21 | "pt": "jonatasgrosman/wav2vec2-large-xlsr-53-portuguese", 22 | "ar": "jonatasgrosman/wav2vec2-large-xlsr-53-arabic", 23 | "cs": "comodoro/wav2vec2-xls-r-300m-cs-250", 24 | "ru": "jonatasgrosman/wav2vec2-large-xlsr-53-russian", 25 | "pl": "jonatasgrosman/wav2vec2-large-xlsr-53-polish", 26 | "hu": "jonatasgrosman/wav2vec2-large-xlsr-53-hungarian", 27 | "fi": "jonatasgrosman/wav2vec2-large-xlsr-53-finnish", 28 | "fa": "jonatasgrosman/wav2vec2-large-xlsr-53-persian", 29 | "el": "jonatasgrosman/wav2vec2-large-xlsr-53-greek", 30 | "tr": "mpoyraz/wav2vec2-xls-r-300m-cv7-turkish", 31 | "da": "saattrupdan/wav2vec2-xls-r-300m-ftspeech", 32 | "he": "imvladikon/wav2vec2-xls-r-300m-hebrew", 33 | "vi": "nguyenvulebinh/wav2vec2-base-vi-vlsp2020", 34 | "ko": "kresnik/wav2vec2-large-xlsr-korean", 35 | "ur": "kingabzpro/wav2vec2-large-xls-r-300m-Urdu", 36 | "te": "anuragshas/wav2vec2-large-xlsr-53-telugu", 37 | "hi": "theainerd/Wav2Vec2-large-xlsr-hindi", 38 | "ca": "softcatala/wav2vec2-large-xlsr-catala", 39 | "ml": "gvs/wav2vec2-large-xlsr-malayalam", 40 | "no": "NbAiLab/nb-wav2vec2-1b-bokmaal-v2", 41 | "nn": "NbAiLab/nb-wav2vec2-1b-nynorsk", 42 | "sk": "comodoro/wav2vec2-xls-r-300m-sk-cv8", 43 | "sl": "anton-l/wav2vec2-large-xlsr-53-slovenian", 44 | "hr": "classla/wav2vec2-xls-r-parlaspeech-hr", 45 | "ro": "gigant/romanian-wav2vec2", 46 | "eu": "stefan-it/wav2vec2-large-xlsr-53-basque", 47 | "gl": "ifrz/wav2vec2-large-xlsr-galician", 48 | "ka": "xsway/wav2vec2-large-xlsr-georgian", 49 | "lv": "jimregan/wav2vec2-large-xlsr-latvian-cv", 50 | "tl": "Khalsuu/filipino-wav2vec2-l-xls-r-300m-official", 51 | } 52 | 53 | # From https://github.com/m-bain/whisperX/issues/189#issuecomment-1523392800 54 | if lang in DEFAULT_ALIGN_MODELS_TORCH: 55 | model_name = DEFAULT_ALIGN_MODELS_TORCH[lang] 56 | bundle = torchaudio.pipelines.__dict__[model_name] 57 | align_model = bundle.get_model() 58 | labels = bundle.get_labels() 59 | 60 | elif lang in DEFAULT_ALIGN_MODELS_HF: 61 | model_name = DEFAULT_ALIGN_MODELS_HF[lang] 62 | processor = Wav2Vec2Processor.from_pretrained(model_name) 63 | align_model = Wav2Vec2ForCTC.from_pretrained(model_name) 64 | else: 65 | raise ValueError(f"Unsupported language: {lang}") 66 | -------------------------------------------------------------------------------- /ubi.Dockerfile: -------------------------------------------------------------------------------- 1 | # syntax=docker/dockerfile:1 2 | ARG WHISPER_MODEL=base 3 | ARG LANG=en 4 | ARG UID=1001 5 | ARG VERSION=EDGE 6 | ARG RELEASE=0 7 | 8 | # These ARGs are for caching stage builds in CI 9 | # Leave them as is when building locally 10 | ARG LOAD_WHISPER_STAGE=load_whisper 11 | ARG NO_MODEL_STAGE=no_model 12 | 13 | # When downloading diarization model with auth token, it seems that it is not respecting the TORCH_HOME env variable. 14 | # So it is necessary to ensure that the CACHE_HOME is set to the exact same path as the default path. 15 | # https://github.com/jim60105/docker-whisperX/issues/27 16 | ARG CACHE_HOME=/.cache 17 | ARG CONFIG_HOME=/.config 18 | ARG TORCH_HOME=${CACHE_HOME}/torch 19 | ARG HF_HOME=${CACHE_HOME}/huggingface 20 | 21 | ######################################## 22 | # Python stage for all bases 23 | ######################################## 24 | FROM registry.access.redhat.com/ubi9/ubi-minimal AS ubi-python 25 | 26 | # RUN mount cache for multi-arch: https://github.com/docker/buildx/issues/549#issuecomment-1788297892 27 | ARG TARGETARCH 28 | ARG TARGETVARIANT 29 | 30 | ENV PYTHON_VERSION=3.11 31 | ENV PYTHONUNBUFFERED=1 32 | ENV PYTHONIOENCODING=UTF-8 33 | 34 | RUN --mount=type=cache,id=dnf-$TARGETARCH$TARGETVARIANT,sharing=locked,target=/var/cache/dnf \ 35 | microdnf -y upgrade --refresh --best --nodocs --noplugins --setopt=install_weak_deps=0 && \ 36 | microdnf -y install --setopt=install_weak_deps=0 --setopt=tsflags=nodocs \ 37 | python3.11 38 | RUN ln -s /usr/bin/python3.11 /usr/bin/python3 && \ 39 | ln -s /usr/bin/python3.11 /usr/bin/python 40 | 41 | ######################################## 42 | # Base stage for amd64 43 | ######################################## 44 | FROM ubi-python AS prepare_base_amd64 45 | 46 | # RUN mount cache for multi-arch: https://github.com/docker/buildx/issues/549#issuecomment-1788297892 47 | ARG TARGETARCH 48 | ARG TARGETVARIANT 49 | 50 | WORKDIR /tmp 51 | 52 | ENV NVIDIA_VISIBLE_DEVICES=all 53 | ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility 54 | 55 | ######################################## 56 | # Base stage for arm64 57 | ######################################## 58 | FROM ubi-python AS prepare_base_arm64 59 | 60 | # RUN mount cache for multi-arch: https://github.com/docker/buildx/issues/549#issuecomment-1788297892 61 | ARG TARGETARCH 62 | ARG TARGETVARIANT 63 | 64 | WORKDIR /tmp 65 | 66 | # Missing dependencies for arm64 67 | # https://github.com/jim60105/docker-whisperX/issues/14 68 | RUN --mount=type=cache,id=dnf-$TARGETARCH$TARGETVARIANT,sharing=locked,target=/var/cache/dnf \ 69 | microdnf -y install --setopt=install_weak_deps=0 --setopt=tsflags=nodocs \ 70 | libgomp libsndfile 71 | 72 | # Select the base stage by target architecture 73 | FROM prepare_base_$TARGETARCH$TARGETVARIANT AS base 74 | 75 | ######################################## 76 | # Build stage 77 | ######################################## 78 | FROM base AS build 79 | 80 | # RUN mount cache for multi-arch: https://github.com/docker/buildx/issues/549#issuecomment-1788297892 81 | ARG TARGETARCH 82 | ARG TARGETVARIANT 83 | 84 | WORKDIR /app 85 | 86 | # Install uv 87 | COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/ 88 | 89 | ENV UV_PROJECT_ENVIRONMENT=/venv 90 | ENV VIRTUAL_ENV=/venv 91 | ENV UV_LINK_MODE=copy 92 | ENV UV_PYTHON_DOWNLOADS=0 93 | 94 | # Install torch separately as required 95 | RUN --mount=type=cache,id=uv-$TARGETARCH$TARGETVARIANT,sharing=locked,target=/root/.cache/uv \ 96 | uv venv --system-site-packages /venv && \ 97 | uv pip install --no-deps --index "https://download.pytorch.org/whl/cu128" \ 98 | "torch==2.7.1+cu128" \ 99 | "torchaudio" \ 100 | "triton" \ 101 | "pyannote.audio==3.3.2" 102 | 103 | # Install whisperX dependencies 104 | RUN --mount=type=cache,id=uv-$TARGETARCH$TARGETVARIANT,sharing=locked,target=/root/.cache/uv \ 105 | --mount=type=bind,source=whisperX/pyproject.toml,target=pyproject.toml \ 106 | --mount=type=bind,source=whisperX/uv.lock,target=uv.lock \ 107 | uv sync --frozen --no-dev --no-install-project --no-editable 108 | 109 | # Install whisperX project 110 | RUN --mount=type=cache,id=uv-$TARGETARCH$TARGETVARIANT,sharing=locked,target=/root/.cache/uv \ 111 | --mount=source=whisperX,target=.,rw \ 112 | uv sync --frozen --no-dev --no-editable 113 | 114 | ######################################## 115 | # Final stage for no_model 116 | ######################################## 117 | FROM base AS no_model 118 | 119 | ARG CACHE_HOME 120 | ARG CONFIG_HOME 121 | ARG TORCH_HOME 122 | ARG HF_HOME 123 | ENV XDG_CACHE_HOME=${CACHE_HOME} 124 | ENV TORCH_HOME=${TORCH_HOME} 125 | ENV HF_HOME=${HF_HOME} 126 | 127 | ARG UID 128 | RUN install -d -m 775 -o $UID -g 0 /licenses && \ 129 | install -d -m 775 -o $UID -g 0 /root && \ 130 | install -d -m 775 -o $UID -g 0 ${CACHE_HOME} && \ 131 | install -d -m 775 -o $UID -g 0 ${CONFIG_HOME} && \ 132 | install -d -m 775 -o $UID -g 0 /nltk_data 133 | 134 | # ffmpeg 135 | COPY --from=ghcr.io/jim60105/static-ffmpeg-upx:8.0 /ffmpeg /usr/local/bin/ 136 | # COPY --from=ghcr.io/jim60105/static-ffmpeg-upx:8.0 /ffprobe /usr/local/bin/ 137 | 138 | # dumb-init 139 | COPY --from=ghcr.io/jim60105/static-ffmpeg-upx:8.0 /dumb-init /usr/local/bin/ 140 | 141 | # Copy licenses (OpenShift Policy) 142 | COPY --chown=$UID:0 --chmod=775 LICENSE /licenses/LICENSE 143 | COPY --chown=$UID:0 --chmod=775 whisperX/LICENSE /licenses/whisperX.LICENSE 144 | 145 | # Copy dependencies and code (and support arbitrary uid for OpenShift best practice) 146 | # https://docs.openshift.com/container-platform/4.14/openshift_images/create-images.html#use-uid_create-images 147 | COPY --chown=$UID:0 --chmod=775 --from=build /venv /venv 148 | 149 | ENV PATH="/venv/bin${PATH:+:${PATH}}" 150 | ENV PYTHONPATH="/venv/lib/python3.11/site-packages" 151 | ENV LD_LIBRARY_PATH="/venv/lib/python3.11/site-packages/nvidia/cudnn/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" 152 | 153 | # Test whisperX 154 | RUN python3 -c 'import whisperx;' && \ 155 | whisperx -h 156 | 157 | WORKDIR /app 158 | 159 | VOLUME [ "/app" ] 160 | 161 | USER $UID 162 | 163 | STOPSIGNAL SIGINT 164 | 165 | ENTRYPOINT [ "dumb-init", "--", "/bin/sh", "-c", "whisperx \"$@\"" ] 166 | 167 | ARG VERSION 168 | ARG RELEASE 169 | LABEL name="jim60105/docker-whisperX" \ 170 | # Authors for WhisperX 171 | vendor="Bain, Max and Huh, Jaesung and Han, Tengda and Zisserman, Andrew" \ 172 | # Maintainer for this docker image 173 | maintainer="jim60105" \ 174 | # Dockerfile source repository 175 | url="https://github.com/jim60105/docker-whisperX" \ 176 | version=${VERSION} \ 177 | # This should be a number, incremented with each change 178 | release=${RELEASE} \ 179 | io.k8s.display-name="WhisperX" \ 180 | summary="WhisperX: Time-Accurate Speech Transcription of Long-Form Audio" \ 181 | description="This is the docker image for WhisperX: Automatic Speech Recognition with Word-Level Timestamps (and Speaker Diarization) from the community. For more information about this tool, please visit the following website: https://github.com/m-bain/whisperX." 182 | 183 | ######################################## 184 | # load_whisper stage 185 | # This stage will be tagged for caching in CI. 186 | ######################################## 187 | FROM ${NO_MODEL_STAGE} AS load_whisper 188 | 189 | ARG CONFIG_HOME 190 | ARG XDG_CONFIG_HOME=${CONFIG_HOME} 191 | ARG HOME="/root" 192 | 193 | # Preload Silero vad model 194 | RUN python3 <