├── .deepsource.toml
├── .dockerignore
├── .gitattributes
├── .github
    ├── copilot-instructions.md
    └── workflows
    │   ├── .prettierrc
    │   ├── 01-build-base-images.yml
    │   ├── 02-build-model-cache.yml
    │   ├── 03-build-distil-en.yml
    │   ├── 04-build-matrix-images.yml
    │   ├── README.md
    │   ├── auto_merge.yml
    │   ├── docker-reused-steps
    │       └── action.yml
    │   ├── docker_publish.yml
    │   ├── scan.yml
    │   ├── scan
    │       └── html.tpl
    │   ├── scan_ubi.yml
    │   ├── submodule_update.yml
    │   └── test
    │       ├── en.webm
    │       └── zh.webm
├── .gitignore
├── .gitmodules
├── .hadolint.yml
├── .vscode
    └── settings.json
├── AGENTS.md
├── Dockerfile
├── LICENSE
├── README.md
├── docker-bake.hcl
├── load_align_model.py
└── ubi.Dockerfile


/.deepsource.toml:
--------------------------------------------------------------------------------
1 | version = 1
2 | 
3 | [[analyzers]]
4 | name = "docker"


--------------------------------------------------------------------------------
/.dockerignore:
--------------------------------------------------------------------------------
 1 | **/.hadolint.yml
 2 | **/node_modules
 3 | **/*.log
 4 | **/.git
 5 | **/.gitignore
 6 | **/.env
 7 | **/.github
 8 | **/.vscode
 9 | **/bin
10 | **/obj
11 | **/dist
12 | 


--------------------------------------------------------------------------------
/.gitattributes:
--------------------------------------------------------------------------------
1 | *.sh eol=lf
2 | 


--------------------------------------------------------------------------------
/.github/copilot-instructions.md:
--------------------------------------------------------------------------------
  1 | # GitHub Copilot Instructions for docker-whisperX
  2 | 
  3 | * **Response Language:** `zh-TW 正體中文`
  4 | 
  5 | # Key Directives:
  6 | 
  7 | * Maintain the highest standard of quality in all deliverables by following best practices.
  8 | * All code comments and documentation must be written in **English** as per project conventions.
  9 | * Proactively consult both core documentation and conversation history to ensure accurate comprehension of all requirements.
 10 | * You are neither able to execute `docker`, use `podman` instead.
 11 | * When doing Git commit, use the conventional commit format for the title and a brief description in the body. Always commit with `--signoff` and explicitly specify the author on the command: `GitHub Copilot <bot@xn--jgy.tw>`. Write the commit in English.
 12 | 
 13 | ---
 14 | 
 15 | # Project DevOps
 16 | 
 17 | This project uses GitHub for DevOps management.
 18 | 
 19 | Please use the #github-sudo tool to perform DevOps tasks.
 20 | 
 21 | ***Highest-level restriction: All issue and PR operations are limited to repositories owned by jim60105 only!***
 22 | 
 23 | * **GitHub repo**: https://github.com/jim60105/docker-whisperX
 24 | 
 25 | * **Backlog & Bugs**: All backlogs and bugs must be managed on GitHub Issues.
 26 | 
 27 |   * Each issue represents a specific backlog plan / bug reports / enhancement requests.
 28 |   * Contains implementation or bug-fix guides from project foundation to deployment
 29 |   * Each issue(backlogs) includes complete technical design and implementation details
 30 |   * Each issue(bugs) includes problem description, reproduction steps, and proposed solutions
 31 |   * Serves as task queue for ongoing maintenance and improvements
 32 | 
 33 | ## DevOps Flow
 34 | 
 35 | ### Planning Stage
 36 | 
 37 | **If we are at planning stage you shouldn't start to implement anything!**
 38 | **Planning Stage is to create a detailed development plan and #create_issue on GitHub**
 39 | 
 40 | 1. **Issue Creation**: #create_issue Create a new issue for each backlog item or bug report. Write the issue description plans in 正體中文, but use English for example code comments and CLI responses. The plan should be very detailed (try your best!). Please write that enables anyone to complete the work successfully.
 41 | 2. **Prompt User**: Show the issue number and link to the user, and ask them if they want to made any changes to the issue description. If they do, you can edit the issue description using #update_issue .
 42 | 
 43 | ### Implementation Stage
 44 | 
 45 | **Only start to implement stage when user prompt you to do so!**
 46 | **Implementation Stage is to implement the plan step by step, following the instructions provided in the issue and submit a work report PR at last**
 47 | 
 48 | 1. **Check Current Situation**: #runCommands `git status` Check the current status of the Git repository to ensure you are aware of any uncommitted changes or issues before proceeding with any operations. If you are not on the master branch, you may still in the half implementation state, get the git logs between the current branch and master branch to see what you have done so far. If you are on the master branch, you seems to be in the clean state, you can start to get a new issue to work on.
 49 | 2. **Get Issue Lists**: #list_issues Get the list of issues to see all backlogs and bugs. Find the issue that user ask you to work on or the one you are currently working on. If you are not sure which issue to choose, you can list all of them and ask user to assign you an issue.
 50 | 3. **Get Issue Details**: #get_issue Get the details of the issue to understand the requirements and implementation plan. Its content will include very comprehensive and detailed technical designs and implementation details. Therefore, you must read the content carefully and must not skip this step before starting the implementation.
 51 | 4. **Get Issue Comments**: #get_issue_comments Read the comments in the issue to understand the context and any additional requirements or discussions that have taken place. Please read it to determine whether this issue has been completed, whether further implementation is needed, or if there are still problems that need to be fixed. This step must not be skipped before starting implementation.
 52 | 5. **Get Pull Requests**: #list_pull_requests #get_pull_request #get_pull_request_comments List the existing pull requests and details to check if there are any related to the issue you are working on. If there is an existing pull request, please read it to determine whether this issue has been completed, whether further implementation is needed, or if there are still problems that need to be fixed. This step must not be skipped before starting implementation.
 53 | 6. **Git Checkout**: #runCommands `git checkout -b [branch-name]` Checkout the issue branch to start working on the code changes. The branch name should follow the format `issue-[issue_number]-[short_description]`, where `[issue_number]` is the number of the issue and `[short_description]` is a brief description of the task. Skip this step if you are already on the correct branch.
 54 | 7. **Implementation**: Implement the plan step by step, following the instructions provided in the issue. Each step should be executed in sequence, ensuring that all requirements are met and documented appropriately.
 55 | 8. **Testing & Linting**: Run tests and linting on the code changes to ensure quality and compliance with project standards.
 56 | 9. **Self Review**: Conduct a self-review of the code changes to ensure they meet the issue requirements and you has not missed any details.
 57 | 10. **Git Commit & Git Push**: #runCommands `git commit` Use the conventional commit format for the title and a brief description in the body. Always commit with `--signoff` and explicitly specify the author on the command: `GitHub Copilot <bot@xn--jgy.tw>`. Write the commit in English. Link the issue number in the commit message body. #runCommands `git push` Push the changes to the remote repository.
 58 | 11. **Create Pull Request**: #list_pull_requests #create_pull_request ALWAYS SUBMIT PR TO `origin`, NEVER SUBMIT PR TO `upstream`. Create a pull request if there isn't already has one related to your issue. Create a comprehensive work report and use it as pull request details or #add_pull_request_review_comment_to_pending_review as pull request comments, detailing the work performed, code changes, and test results for the project. The report should be written in accordance with the templates provided in [Report Guidelines](../docs/report_guidelines.md) and [REPORT_TEMPLATE](../docs/REPORT_TEMPLATE.md). Follow the template exactly. Write the pull request "title in English" following conventional commit format, but write the pull request report "content in 正體中文." Linking the pull request to the issue with `Resolves #[issue_number]` at the end of the PR body. ALWAYS SUBMIT PR TO `origin`, NEVER SUBMIT PR TO `upstream`. ALWAYS SUBMIT PR TO `origin`, NEVER SUBMIT PR TO `upstream`. ALWAYS SUBMIT PR TO `origin`, NEVER SUBMIT PR to `upstream`.
 59 | 
 60 | ***Highest-level restriction: All issue and PR operations are limited to repositories owned by jim60105 only!***
 61 | ***Highest-level restriction: All issue and PR operations are limited to repositories owned by jim60105 only!***
 62 | ***Highest-level restriction: All issue and PR operations are limited to repositories owned by jim60105 only!***
 63 | 
 64 | ---
 65 | 
 66 | ## Project Overview
 67 | 
 68 | This project provides a **Docker containerization** for [WhisperX](https://github.com/m-bain/whisperX).
 69 | 
 70 | The project focuses on **continuous integration optimization** for building 175+ Docker images (10GB each) weekly on GitHub Free runners, emphasizing efficient docker layer caching, parallel builds, and minimal image sizes.
 71 | 
 72 | The focus of this project is on the Dockerfile and CI workflow, not on the WhisperX project itself.
 73 | 
 74 | ## Project Structure
 75 | 
 76 | ```
 77 | docker-whisperX/
 78 | ├── Dockerfile              # Main Docker build configuration (For docker compatibility)
 79 | ├── ubi.Dockerfile          # Red Hat UBI-based alternative (For podman compatibility)
 80 | ├── docker-bake.hcl         # Docker Buildx bake configuration for matrix builds
 81 | ├── load_align_model.py     # Preloads alignment models for supported languages
 82 | ├── whisperX/               # Git submodule containing WhisperX source code
 83 | │   ├── pyproject.toml      # Python package configuration
 84 | │   └── whisperx/           # Main WhisperX Python package
 85 | └── .github/
 86 |     └── workflows/          # CI/CD pipeline configurations
 87 | ```
 88 | 
 89 | ## Coding Standards and Conventions
 90 | 
 91 | ### Docker Best Practices
 92 | - Use **multi-stage builds** to minimize final image size
 93 | - Leverage **BuildKit features** like `--mount=type=cache` for dependency caching
 94 | - Apply **layer caching strategies** to optimize CI build times
 95 | - Use **ARG** variables for build-time configuration (WHISPER_MODEL, LANG, etc.)
 96 | - Follow **security best practices**: run as non-root user, minimize installed packages
 97 | - Do not use `--link` in ubi.Dockerfile, as it is not supported by Podman.
 98 | - Do not use `,z` or `,Z` in Dockerfile, as it is not supported by Docker buildx.
 99 | 
100 | ### Documentation Standards
101 | - Write documentation in English for user-facing content
102 | - Use **English** for technical comments in code and commit messages
103 | - Include **clear examples** in README files showing actual usage commands
104 | - Document **build arguments** and their acceptable values
105 | - Provide **troubleshooting guidance** for common issues
106 | 
107 | ## Key Technologies and Dependencies
108 | 
109 | ### Build Tools
110 | - **uv**: Modern Python package manager for dependency resolution (Used in Dockerfile)
111 | - **Docker Buildx**: Extended build capabilities with bake support
112 | - **GitHub Actions**: CI/CD automation for multi-architecture builds
113 | 
114 | ## Development Guidelines
115 | 
116 | ### When Working with Docker Configuration
117 | - **Dockerfile modifications**: Always test both `amd64` and `arm64` architectures
118 | - **Build arguments**: Validate that ARG values match supported languages in `load_align_model.py`
119 | - **Cache optimization**: Consider layer ordering impact on CI build performance
120 | - **Multi-stage builds**: Ensure each stage serves a clear purpose (build → no_model → load_whisper)
121 | 
122 | ### When Working with CI/CD
123 | - **Parallel builds**: Consider the large amount of build matrix impact on GitHub runner resources
124 | - **Caching strategy**: Optimize for both build time and cache storage efficiency
125 | - **Multi-architecture**: Ensure changes work correctly on both x86_64 and arm64
126 | 
127 | ## Project-Specific Conventions
128 | 
129 | ## Additional Notes for Contributors
130 | 
131 | When suggesting changes, always consider the impact on:
132 | 1. **Build time efficiency** for the CI pipeline
133 | 2. **Multi-architecture compatibility** (amd64/arm64)  
134 | 
135 | ---
136 | 
137 | When contributing to this codebase, adhere strictly to these directives to ensure consistency with the existing architectural conventions and stylistic norms.
138 | 


--------------------------------------------------------------------------------
/.github/workflows/.prettierrc:
--------------------------------------------------------------------------------
1 | {
2 |   "tabWidth": 2,
3 |   "useTabs": false
4 | }
5 | 


--------------------------------------------------------------------------------
/.github/workflows/01-build-base-images.yml:
--------------------------------------------------------------------------------
  1 | # Build base images workflow
  2 | # This workflow builds the ubi-no_model and no_model base images
  3 | name: "01-build-base-images"
  4 | 
  5 | on:
  6 |   workflow_run:
  7 |     workflows: ["docker_publish"]
  8 |     types: [completed]
  9 | 
 10 | # Sets the permissions granted to the GITHUB_TOKEN for the actions in this job.
 11 | permissions:
 12 |   contents: read
 13 |   packages: write
 14 |   id-token: write
 15 |   attestations: write
 16 | 
 17 | env:
 18 |   REGISTRY_IMAGE: ${{ github.repository_owner }}/whisperx
 19 | 
 20 | jobs:
 21 |   # Build the ubi-no_model in parallel across multiple platforms
 22 |   build_ubi:
 23 |     runs-on: ${{ matrix.runner }}
 24 |     if: ${{ github.event.workflow_run.conclusion == 'success' }}
 25 |     strategy:
 26 |       fail-fast: false
 27 |       matrix:
 28 |         include:
 29 |           - platform: linux/amd64
 30 |             runner: ubuntu-latest
 31 |           - platform: linux/arm64
 32 |             runner: ubuntu-24.04-arm
 33 |     outputs:
 34 |       digest: ${{ steps.build.outputs.digest }}
 35 |     steps:
 36 |       - name: Checkout
 37 |         uses: actions/checkout@v4
 38 |         with:
 39 |           submodules: true
 40 | 
 41 |       - name: Prepare platform variables
 42 |         run: |
 43 |           platform=${{ matrix.platform }}
 44 |           echo "PLATFORM_PAIR=${platform//\//-}" >> $GITHUB_ENV
 45 | 
 46 |       - name: Setup docker
 47 |         id: setup
 48 |         uses: ./.github/workflows/docker-reused-steps
 49 |         with:
 50 |           tag: ubi-no_model
 51 | 
 52 |       - name: Build and push by digest (ubi-no_model)
 53 |         uses: docker/build-push-action@v6
 54 |         id: build
 55 |         with:
 56 |           context: .
 57 |           file: ./ubi.Dockerfile
 58 |           target: no_model
 59 |           platforms: ${{ matrix.platform }}
 60 |           labels: ${{ steps.setup.outputs.labels }}
 61 |           build-args: |
 62 |             VERSION=${{ github.event.workflow_run.head_sha }}
 63 |             RELEASE=${{ github.event.workflow_run.run_number }}
 64 |           cache-from: |
 65 |             type=registry,ref=ghcr.io/${{ env.REGISTRY_IMAGE }}:cache-${{ env.PLATFORM_PAIR }}
 66 |           cache-to: |
 67 |             type=registry,ref=ghcr.io/${{ env.REGISTRY_IMAGE }}:cache-${{ env.PLATFORM_PAIR }},mode=max
 68 |           outputs: |
 69 |             type=image,name=ghcr.io/${{ env.REGISTRY_IMAGE }},push-by-digest=true,name-canonical=true,push=true
 70 |           sbom: true
 71 |           provenance: true
 72 | 
 73 |       - name: Export digest
 74 |         run: |
 75 |           mkdir -p /tmp/digests
 76 |           digest="${{ steps.build.outputs.digest }}"
 77 |           echo "${digest#sha256:}" > /tmp/digests/${{ env.PLATFORM_PAIR }}
 78 | 
 79 |       - name: Upload digest
 80 |         uses: actions/upload-artifact@v4
 81 |         with:
 82 |           name: digests-ubi-${{ env.PLATFORM_PAIR }}
 83 |           path: /tmp/digests/*
 84 |           if-no-files-found: error
 85 |           retention-days: 1
 86 | 
 87 |   # Test ubi-no_model on amd64 platform only (for performance)
 88 |   test_ubi:
 89 |     runs-on: ubuntu-latest
 90 |     needs: build_ubi
 91 |     if: ${{ github.event.workflow_run.conclusion == 'success' }}
 92 |     steps:
 93 |       - name: Checkout
 94 |         uses: actions/checkout@v4
 95 |         with:
 96 |           submodules: true
 97 | 
 98 |       - name: Setup docker
 99 |         id: setup
100 |         uses: ./.github/workflows/docker-reused-steps
101 |         with:
102 |           tag: ubi-no_model
103 | 
104 |       - name: Download build digests
105 |         uses: actions/download-artifact@v4
106 |         with:
107 |           name: digests-ubi-linux-amd64
108 |           path: /tmp/test-digests
109 | 
110 |       - name: Pull test image from registry
111 |         id: pull
112 |         run: |
113 |           # Get the digest for amd64 platform
114 |           DIGEST=$(cat /tmp/test-digests/linux-amd64)
115 |           IMAGE_REF="ghcr.io/${{ env.REGISTRY_IMAGE }}@sha256:${DIGEST}"
116 |           echo "Pulling image: $IMAGE_REF"
117 |           docker pull "$IMAGE_REF"
118 |           # Tag the image for easier reference in tests
119 |           docker tag "$IMAGE_REF" "test-image:ubi-no_model"
120 |           echo "imageid=test-image:ubi-no_model" >> $GITHUB_OUTPUT
121 | 
122 |       - name: Test ubi-no_model docker image
123 |         run: |
124 |           docker run --group-add 0 -v ".:/app" ${{ steps.pull.outputs.imageid }} -- --model base --language en --device cpu --compute_type int8 --output_format srt .github/workflows/test/en.webm;
125 |           if [ ! -f en.srt ]; then
126 |             echo "The en.srt file does not exist"
127 |             exit 1
128 |           fi
129 |           echo "cat en.srt:";
130 |           cat en.srt;
131 |           if ! grep -qi 'no' en.srt; then
132 |             echo "The en.srt file does not contain the word 'no'"
133 |             exit 1
134 |           fi
135 |           echo "Test passed."
136 | 
137 |   # Merge all platform builds into manifest list for ubi-no_model
138 |   merge_ubi:
139 |     runs-on: ubuntu-latest
140 |     needs: [build_ubi, test_ubi]
141 |     steps:
142 |       - name: Checkout
143 |         uses: actions/checkout@v4
144 | 
145 |       - name: Download digests
146 |         uses: actions/download-artifact@v4
147 |         with:
148 |           path: /tmp/digests
149 |           pattern: digests-ubi-*
150 |           merge-multiple: true
151 | 
152 |       - name: Setup docker
153 |         id: setup
154 |         uses: ./.github/workflows/docker-reused-steps
155 |         with:
156 |           tag: ubi-no_model
157 | 
158 |       - name: Create GHCR manifest list
159 |         run: |
160 |           echo "Creating manifest list for GHCR..."
161 |           cd /tmp/digests
162 |           echo "Files in /tmp/digests:"
163 |           ls -la
164 |           echo "Building digest references..."
165 |           digest_refs=""
166 |           for file in linux-*; do
167 |             if [[ -f "$file" ]]; then
168 |               digest=$(cat "$file")
169 |               echo "Processing $file with digest: $digest"
170 |               digest_refs+="ghcr.io/${{ env.REGISTRY_IMAGE }}@sha256:$digest "
171 |             fi
172 |           done
173 |           echo "Final digest references: $digest_refs"
174 |           docker buildx imagetools create \
175 |             $(jq -cr '.tags | map("-t " + .) | join(" ")' <<<$DOCKER_METADATA_OUTPUT_JSON) \
176 |             $digest_refs
177 | 
178 |       - name: Get final manifest digest
179 |         id: get_digest
180 |         run: |
181 |           # Get the digest of the manifest list we just created
182 |           echo "Available GHCR tags:"
183 |           jq -cr '.tags[] | select(startswith("ghcr.io/"))' <<<$DOCKER_METADATA_OUTPUT_JSON
184 |           IMAGE_TAG=$(jq -cr '.tags[] | select(startswith("ghcr.io/"))' <<<$DOCKER_METADATA_OUTPUT_JSON | head -n1)
185 |           echo "Using tag for digest lookup: $IMAGE_TAG"
186 |           # Get the raw digest output and extract only the sha256 part
187 |           DIGEST_RAW=$(docker buildx imagetools inspect "$IMAGE_TAG" --format "{{.Manifest.Digest}}")
188 |           echo "Raw digest output: $DIGEST_RAW"
189 |           # Extract only the digest hash, removing any MediaType or other formatting
190 |           DIGEST=$(echo "$DIGEST_RAW" | grep -oE 'sha256:[a-f0-9]{64}' | head -n1)
191 |           echo "Extracted digest: $DIGEST"
192 |           echo "manifest_digest=$DIGEST" >> $GITHUB_OUTPUT
193 | 
194 |       - name: Attest GHCR image (ubi-no_model)
195 |         uses: actions/attest-build-provenance@v2
196 |         with:
197 |           subject-name: ghcr.io/${{ github.repository_owner }}/whisperx
198 |           subject-digest: ${{ steps.get_digest.outputs.manifest_digest }}
199 | 
200 |   # Build the no_model in parallel across multiple platforms
201 |   build_no_model:
202 |     runs-on: ${{ matrix.runner }}
203 |     if: ${{ github.event.workflow_run.conclusion == 'success' }}
204 |     strategy:
205 |       fail-fast: false
206 |       matrix:
207 |         include:
208 |           - platform: linux/amd64
209 |             runner: ubuntu-latest
210 |           - platform: linux/arm64
211 |             runner: ubuntu-24.04-arm
212 |     outputs:
213 |       digest: ${{ steps.build.outputs.digest }}
214 |     steps:
215 |       - name: Checkout
216 |         uses: actions/checkout@v4
217 |         with:
218 |           submodules: true
219 | 
220 |       - name: Prepare platform variables
221 |         run: |
222 |           platform=${{ matrix.platform }}
223 |           echo "PLATFORM_PAIR=${platform//\//-}" >> $GITHUB_ENV
224 | 
225 |       - name: Setup docker
226 |         id: setup
227 |         uses: ./.github/workflows/docker-reused-steps
228 | 
229 |       - name: Build and push by digest (no_model)
230 |         uses: docker/build-push-action@v6
231 |         id: build
232 |         with:
233 |           context: .
234 |           file: ./Dockerfile
235 |           target: no_model
236 |           platforms: ${{ matrix.platform }}
237 |           labels: ${{ steps.setup.outputs.labels }}
238 |           build-args: |
239 |             VERSION=${{ github.event.workflow_run.head_sha }}
240 |             RELEASE=${{ github.event.workflow_run.run_number }}
241 |           cache-from: |
242 |             type=registry,ref=ghcr.io/${{ env.REGISTRY_IMAGE }}:cache-${{ env.PLATFORM_PAIR }}
243 |           cache-to: |
244 |             type=registry,ref=ghcr.io/${{ env.REGISTRY_IMAGE }}:cache-${{ env.PLATFORM_PAIR }},mode=max
245 |           outputs: |
246 |             type=image,name=ghcr.io/${{ env.REGISTRY_IMAGE }},push-by-digest=true,name-canonical=true,push=true
247 |           sbom: true
248 |           provenance: true
249 | 
250 |       - name: Export digest
251 |         run: |
252 |           mkdir -p /tmp/digests
253 |           digest="${{ steps.build.outputs.digest }}"
254 |           echo "${digest#sha256:}" > /tmp/digests/${{ env.PLATFORM_PAIR }}
255 | 
256 |       - name: Upload digest
257 |         uses: actions/upload-artifact@v4
258 |         with:
259 |           name: digests-no_model-${{ env.PLATFORM_PAIR }}
260 |           path: /tmp/digests/*
261 |           if-no-files-found: error
262 |           retention-days: 1
263 | 
264 |   # Test no_model on amd64 platform only (for performance)
265 |   test_no_model:
266 |     runs-on: ubuntu-latest
267 |     needs: build_no_model
268 |     if: ${{ github.event.workflow_run.conclusion == 'success' }}
269 |     steps:
270 |       - name: Checkout
271 |         uses: actions/checkout@v4
272 |         with:
273 |           submodules: true
274 | 
275 |       - name: Setup docker
276 |         id: setup
277 |         uses: ./.github/workflows/docker-reused-steps
278 | 
279 |       - name: Download build digests
280 |         uses: actions/download-artifact@v4
281 |         with:
282 |           name: digests-no_model-linux-amd64
283 |           path: /tmp/test-digests
284 | 
285 |       - name: Pull test image from registry
286 |         id: pull
287 |         run: |
288 |           # Get the digest for amd64 platform
289 |           DIGEST=$(cat /tmp/test-digests/linux-amd64)
290 |           IMAGE_REF="ghcr.io/${{ env.REGISTRY_IMAGE }}@sha256:${DIGEST}"
291 |           echo "Pulling image: $IMAGE_REF"
292 |           docker pull "$IMAGE_REF"
293 |           # Tag the image for easier reference in tests
294 |           docker tag "$IMAGE_REF" "test-image:no_model"
295 |           echo "imageid=test-image:no_model" >> $GITHUB_OUTPUT
296 | 
297 |       - name: Test no_model docker image
298 |         run: |
299 |           docker run --group-add 0 -v ".:/app" ${{ steps.pull.outputs.imageid }} -- --model base --language en --device cpu --compute_type int8 --output_format srt .github/workflows/test/en.webm;
300 |           if [ ! -f en.srt ]; then
301 |             echo "The en.srt file does not exist"
302 |             exit 1
303 |           fi
304 |           echo "cat en.srt:";
305 |           cat en.srt;
306 |           if ! grep -qi 'no' en.srt; then
307 |             echo "The en.srt file does not contain the word 'no'"
308 |             exit 1
309 |           fi
310 |           echo "Test passed."
311 | 
312 |   # Merge all platform builds into manifest list for no_model
313 |   merge_no_model:
314 |     runs-on: ubuntu-latest
315 |     needs: [build_no_model, test_no_model]
316 |     outputs:
317 |       digest: ${{ steps.get_digest.outputs.manifest_digest }}
318 |     steps:
319 |       - name: Checkout
320 |         uses: actions/checkout@v4
321 | 
322 |       - name: Download digests
323 |         uses: actions/download-artifact@v4
324 |         with:
325 |           path: /tmp/digests
326 |           pattern: digests-no_model-*
327 |           merge-multiple: true
328 | 
329 |       - name: Setup docker
330 |         id: setup
331 |         uses: ./.github/workflows/docker-reused-steps
332 | 
333 |       - name: Create GHCR manifest list
334 |         run: |
335 |           echo "Creating manifest list for GHCR..."
336 |           cd /tmp/digests
337 |           echo "Files in /tmp/digests:"
338 |           ls -la
339 |           echo "Building digest references..."
340 |           digest_refs=""
341 |           for file in linux-*; do
342 |             if [[ -f "$file" ]]; then
343 |               digest=$(cat "$file")
344 |               echo "Processing $file with digest: $digest"
345 |               digest_refs+="ghcr.io/${{ env.REGISTRY_IMAGE }}@sha256:$digest "
346 |             fi
347 |           done
348 |           echo "Final digest references: $digest_refs"
349 |           docker buildx imagetools create \
350 |             $(jq -cr '.tags | map("-t " + .) | join(" ")' <<<$DOCKER_METADATA_OUTPUT_JSON) \
351 |             $digest_refs
352 | 
353 |       - name: Get final manifest digest
354 |         id: get_digest
355 |         run: |
356 |           # Get the digest of the manifest list we just created
357 |           echo "Available GHCR tags:"
358 |           jq -cr '.tags[] | select(startswith("ghcr.io/"))' <<<$DOCKER_METADATA_OUTPUT_JSON
359 |           IMAGE_TAG=$(jq -cr '.tags[] | select(startswith("ghcr.io/"))' <<<$DOCKER_METADATA_OUTPUT_JSON | head -n1)
360 |           echo "Using tag for digest lookup: $IMAGE_TAG"
361 |           # Get the raw digest output and extract only the sha256 part
362 |           DIGEST_RAW=$(docker buildx imagetools inspect "$IMAGE_TAG" --format "{{.Manifest.Digest}}")
363 |           echo "Raw digest output: $DIGEST_RAW"
364 |           # Extract only the digest hash, removing any MediaType or other formatting
365 |           DIGEST=$(echo "$DIGEST_RAW" | grep -oE 'sha256:[a-f0-9]{64}' | head -n1)
366 |           echo "Extracted digest: $DIGEST"
367 |           echo "manifest_digest=$DIGEST" >> $GITHUB_OUTPUT
368 | 
369 |       - name: Attest GHCR image (no_model)
370 |         uses: actions/attest-build-provenance@v2
371 |         with:
372 |           subject-name: ghcr.io/${{ github.repository_owner }}/whisperx
373 |           subject-digest: ${{ steps.get_digest.outputs.manifest_digest }}
374 | 


--------------------------------------------------------------------------------
/.github/workflows/02-build-model-cache.yml:
--------------------------------------------------------------------------------
  1 | # Build model cache workflow
  2 | # This workflow builds the Whisper model cache images for all supported models
  3 | name: "02-build-model-cache"
  4 | 
  5 | on:
  6 |   workflow_run:
  7 |     workflows: ["01-build-base-images"]
  8 |     types: [completed]
  9 | 
 10 | # Sets the permissions granted to the GITHUB_TOKEN for the actions in this job.
 11 | permissions:
 12 |   contents: read
 13 |   packages: write
 14 |   id-token: write
 15 |   attestations: write
 16 | 
 17 | env:
 18 |   REGISTRY_IMAGE: ${{ github.repository_owner }}/whisperx
 19 | 
 20 | jobs:
 21 |   # Build model cache in parallel across multiple platforms
 22 |   build_cache:
 23 |     runs-on: ${{ matrix.runner }}
 24 |     if: ${{ github.event.workflow_run.conclusion == 'success' }}
 25 |     strategy:
 26 |       fail-fast: false
 27 |       matrix:
 28 |         model:
 29 |           - tiny
 30 |           - base
 31 |           - small
 32 |           - medium
 33 |           - large-v3
 34 |           - distil-large-v3
 35 |         platform:
 36 |           - linux/amd64
 37 |           - linux/arm64
 38 |         include:
 39 |           - platform: linux/amd64
 40 |             runner: ubuntu-latest
 41 |           - platform: linux/arm64
 42 |             runner: ubuntu-24.04-arm
 43 |     outputs:
 44 |       digest: ${{ steps.build.outputs.digest }}
 45 |     steps:
 46 |       - name: Checkout
 47 |         uses: actions/checkout@v4
 48 |         with:
 49 |           submodules: true
 50 | 
 51 |       - name: Setup docker
 52 |         id: setup
 53 |         uses: ./.github/workflows/docker-reused-steps
 54 |         with:
 55 |           tag: cache-${{ matrix.model }}
 56 | 
 57 |       - name: Get no_model digest from triggering workflow
 58 |         id: get-base-digest
 59 |         run: |
 60 |           # Get the triggering workflow run information
 61 |           WORKFLOW_RUN_ID="${{ github.event.workflow_run.id }}"
 62 |           
 63 |           # Get the no_model digest from the artifacts or outputs
 64 |           # For now, we'll use the latest no_model image with the same commit SHA
 65 |           COMMIT_SHA="${{ github.event.workflow_run.head_sha }}"
 66 |           SHORT_SHA=$(echo $COMMIT_SHA | cut -c 1-7)
 67 |           
 68 |           # Use the no_model image with short SHA tag
 69 |           NO_MODEL_IMAGE="ghcr.io/jim60105/whisperx:no_model-$SHORT_SHA"
 70 |           echo "no_model_image=$NO_MODEL_IMAGE" >> $GITHUB_OUTPUT
 71 | 
 72 |       - name: Build and push by digest (cache-${{ matrix.model }})
 73 |         uses: docker/build-push-action@v6
 74 |         id: build
 75 |         with:
 76 |           context: .
 77 |           file: ./Dockerfile
 78 |           target: load_whisper
 79 |           platforms: ${{ matrix.platform }}
 80 |           labels: ${{ steps.setup.outputs.labels }}
 81 |           build-args: |
 82 |             WHISPER_MODEL=${{ matrix.model }}
 83 |             NO_MODEL_STAGE=${{ steps.get-base-digest.outputs.no_model_image }}
 84 |             VERSION=${{ github.event.workflow_run.head_sha }}
 85 |             RELEASE=${{ github.event.workflow_run.run_number }}
 86 |           cache-from: |
 87 |             type=registry,ref=ghcr.io/${{ env.REGISTRY_IMAGE }}:cache-model-${{ matrix.model }}
 88 |           cache-to: |
 89 |             type=registry,ref=ghcr.io/${{ env.REGISTRY_IMAGE }}:cache-model-${{ matrix.model }},mode=max
 90 |           outputs: |
 91 |             type=image,name=ghcr.io/${{ env.REGISTRY_IMAGE }},push-by-digest=true,name-canonical=true,push=true
 92 |           sbom: true
 93 |           provenance: true
 94 | 
 95 |       - name: Export digest
 96 |         run: |
 97 |           mkdir -p /tmp/digests
 98 |           digest="${{ steps.build.outputs.digest }}"
 99 |           platform="${{ matrix.platform }}"
100 |           platform_safe="${platform//\//-}"
101 |           echo "${digest#sha256:}" > "/tmp/digests/${platform_safe}"
102 | 
103 |       - name: Upload digest
104 |         uses: actions/upload-artifact@v4
105 |         with:
106 |           name: digests-cache-${{ matrix.model }}-${{ matrix.platform == 'linux/amd64' && 'linux-amd64' || 'linux-arm64' }}
107 |           path: /tmp/digests/*
108 |           if-no-files-found: error
109 |           retention-days: 1
110 | 
111 |   # Test model cache builds by pulling from registry
112 |   test_cache:
113 |     runs-on: ${{ matrix.runner }}
114 |     needs: build_cache
115 |     strategy:
116 |       fail-fast: false
117 |       matrix:
118 |         model:
119 |           - tiny
120 |           - base
121 |           - small
122 |           - medium
123 |           - large-v3
124 |           - distil-large-v3
125 |         platform:
126 |           - linux/amd64
127 |           - linux/arm64
128 |         include:
129 |           - platform: linux/amd64
130 |             runner: ubuntu-latest
131 |           - platform: linux/arm64
132 |             runner: ubuntu-24.04-arm
133 |     steps:
134 |       - name: Checkout
135 |         uses: actions/checkout@v4
136 |         with:
137 |           submodules: true
138 | 
139 |       - name: Download digests
140 |         uses: actions/download-artifact@v4
141 |         with:
142 |           name: digests-cache-${{ matrix.model }}-${{ matrix.platform == 'linux/amd64' && 'linux-amd64' || 'linux-arm64' }}
143 |           path: /tmp/digests
144 | 
145 |       - name: Setup Docker Buildx
146 |         uses: docker/setup-buildx-action@v3
147 | 
148 |       - name: Log in to Container Registry
149 |         uses: docker/login-action@v3
150 |         with:
151 |           registry: ghcr.io
152 |           username: ${{ github.actor }}
153 |           password: ${{ secrets.GITHUB_TOKEN }}
154 | 
155 |       - name: Test pull cache-${{ matrix.model }} image
156 |         run: |
157 |           cd /tmp/digests
158 |           digest=$(cat linux-*)
159 |           echo "Testing pull of cache-${{ matrix.model }} for platform ${{ matrix.platform }}"
160 |           docker pull --platform ${{ matrix.platform }} ghcr.io/${{ env.REGISTRY_IMAGE }}@sha256:$digest
161 |           echo "Successfully pulled cache-${{ matrix.model }} image for ${{ matrix.platform }}"
162 | 
163 |   # Merge all platform builds into manifest list for each model cache
164 |   merge_cache:
165 |     runs-on: ubuntu-latest
166 |     needs: test_cache
167 |     strategy:
168 |       matrix:
169 |         model:
170 |           - tiny
171 |           - base
172 |           - small
173 |           - medium
174 |           - large-v3
175 |           - distil-large-v3
176 |     steps:
177 |       - name: Checkout
178 |         uses: actions/checkout@v4
179 |         with:
180 |           submodules: true
181 | 
182 |       - name: Download digests
183 |         uses: actions/download-artifact@v4
184 |         with:
185 |           path: /tmp/digests
186 |           pattern: digests-cache-${{ matrix.model }}-*
187 |           merge-multiple: true
188 | 
189 |       - name: Setup docker
190 |         id: setup
191 |         uses: ./.github/workflows/docker-reused-steps
192 |         with:
193 |           tag: cache-${{ matrix.model }}
194 | 
195 |       - name: Create GHCR manifest list
196 |         run: |
197 |           echo "Creating manifest list for cache-${{ matrix.model }}..."
198 |           cd /tmp/digests
199 |           echo "Files in /tmp/digests:"
200 |           ls -la
201 |           echo "Building digest references..."
202 |           digest_refs=""
203 |           for file in linux-*; do
204 |             if [[ -f "$file" ]]; then
205 |               digest=$(cat "$file")
206 |               echo "Processing $file with digest: $digest"
207 |               digest_refs+="ghcr.io/${{ env.REGISTRY_IMAGE }}@sha256:$digest "
208 |             fi
209 |           done
210 |           echo "Final digest references: $digest_refs"
211 |           docker buildx imagetools create \
212 |             $(jq -cr '.tags | map("-t " + .) | join(" ")' <<<$DOCKER_METADATA_OUTPUT_JSON) \
213 |             $digest_refs
214 | 
215 |       - name: Get final manifest digest
216 |         id: get_digest
217 |         run: |
218 |           # Get the digest of the manifest list we just created
219 |           echo "Available GHCR tags:"
220 |           jq -cr '.tags[] | select(startswith("ghcr.io/"))' <<<$DOCKER_METADATA_OUTPUT_JSON
221 |           IMAGE_TAG=$(jq -cr '.tags[] | select(startswith("ghcr.io/"))' <<<$DOCKER_METADATA_OUTPUT_JSON | head -n1)
222 |           echo "Using tag for digest lookup: $IMAGE_TAG"
223 |           # Get the raw digest output and extract only the sha256 part
224 |           DIGEST_RAW=$(docker buildx imagetools inspect "$IMAGE_TAG" --format "{{.Manifest.Digest}}")
225 |           echo "Raw digest output: $DIGEST_RAW"
226 |           # Extract only the digest hash, removing any MediaType or other formatting
227 |           DIGEST=$(echo "$DIGEST_RAW" | grep -oE 'sha256:[a-f0-9]{64}' | head -n1)
228 |           echo "Extracted digest: $DIGEST"
229 |           echo "manifest_digest=$DIGEST" >> $GITHUB_OUTPUT
230 | 
231 |       - name: Attest GHCR image (cache-${{ matrix.model }})
232 |         uses: actions/attest-build-provenance@v2
233 |         with:
234 |           subject-name: ghcr.io/${{ github.repository_owner }}/whisperx
235 |           subject-digest: ${{ steps.get_digest.outputs.manifest_digest }}
236 | 


--------------------------------------------------------------------------------
/.github/workflows/03-build-distil-en.yml:
--------------------------------------------------------------------------------
  1 | # Build distil-large-v3-en workflow
  2 | # This workflow builds the distil-large-v3-en specialized image
  3 | name: "03-build-distil-en"
  4 | 
  5 | on:
  6 |   workflow_run:
  7 |     workflows: ["02-build-model-cache"]
  8 |     types: [completed]
  9 | 
 10 | # Sets the permissions granted to the GITHUB_TOKEN for the actions in this job.
 11 | permissions:
 12 |   contents: read
 13 |   packages: write
 14 |   id-token: write
 15 |   attestations: write
 16 | 
 17 | env:
 18 |   REGISTRY_IMAGE: ${{ github.repository_owner }}/whisperx
 19 | 
 20 | jobs:
 21 |   # Build distil-large-v3-en on multiple platforms in parallel
 22 |   build_distil:
 23 |     runs-on: ${{ matrix.runner }}
 24 |     if: ${{ github.event.workflow_run.conclusion == 'success' }}
 25 |     strategy:
 26 |       fail-fast: false
 27 |       matrix:
 28 |         include:
 29 |           - platform: linux/amd64
 30 |             runner: ubuntu-latest
 31 |           - platform: linux/arm64
 32 |             runner: ubuntu-24.04-arm
 33 |     outputs:
 34 |       digest: ${{ steps.build.outputs.digest }}
 35 |     steps:
 36 |       - name: Checkout
 37 |         uses: actions/checkout@v4
 38 |         with:
 39 |           submodules: true
 40 | 
 41 |       - name: Setup docker
 42 |         id: setup
 43 |         uses: ./.github/workflows/docker-reused-steps
 44 |         with:
 45 |           tag: distil-large-v3-en
 46 | 
 47 |       - name: Get base image references
 48 |         id: get-refs
 49 |         run: |
 50 |           # Get the commit SHA from the triggering workflow
 51 |           COMMIT_SHA="${{ github.event.workflow_run.head_sha }}"
 52 |           SHORT_SHA=$(echo $COMMIT_SHA | cut -c 1-7)
 53 |           
 54 |           # Set image references
 55 |           NO_MODEL_IMAGE="ghcr.io/jim60105/whisperx:no_model-$SHORT_SHA"
 56 |           CACHE_IMAGE="ghcr.io/jim60105/whisperx:cache-distil-large-v3-$SHORT_SHA"
 57 |           
 58 |           echo "no_model_image=$NO_MODEL_IMAGE" >> $GITHUB_OUTPUT
 59 |           echo "cache_image=$CACHE_IMAGE" >> $GITHUB_OUTPUT
 60 | 
 61 |       - name: Build and push by digest (distil-large-v3-en)
 62 |         uses: docker/build-push-action@v6
 63 |         id: build
 64 |         with:
 65 |           context: .
 66 |           file: ./Dockerfile
 67 |           target: final
 68 |           platforms: ${{ matrix.platform }}
 69 |           labels: ${{ steps.setup.outputs.labels }}
 70 |           build-args: |
 71 |             WHISPER_MODEL=distil-large-v3
 72 |             LANG=en
 73 |             LOAD_WHISPER_STAGE=${{ steps.get-refs.outputs.cache_image }}
 74 |             NO_MODEL_STAGE=${{ steps.get-refs.outputs.no_model_image }}
 75 |             VERSION=${{ github.event.workflow_run.head_sha }}
 76 |             RELEASE=${{ github.event.workflow_run.run_number }}
 77 |           outputs: |
 78 |             type=image,name=ghcr.io/${{ env.REGISTRY_IMAGE }},push-by-digest=true,name-canonical=true,push=true
 79 |           sbom: true
 80 |           provenance: true
 81 | 
 82 |       - name: Export digest
 83 |         run: |
 84 |           mkdir -p /tmp/digests
 85 |           digest="${{ steps.build.outputs.digest }}"
 86 |           platform="${{ matrix.platform }}"
 87 |           platform_safe="${platform//\//-}"
 88 |           echo "${digest#sha256:}" > "/tmp/digests/${platform_safe}"
 89 | 
 90 |       - name: Upload digest
 91 |         uses: actions/upload-artifact@v4
 92 |         with:
 93 |           name: digests-distil-${{ matrix.platform == 'linux/amd64' && 'linux-amd64' || 'linux-arm64' }}
 94 |           path: /tmp/digests/*
 95 |           if-no-files-found: error
 96 |           retention-days: 1
 97 | 
 98 |   # Test distil-large-v3-en builds by pulling from registry
 99 |   test_distil:
100 |     runs-on: ${{ matrix.runner }}
101 |     needs: build_distil
102 |     strategy:
103 |       fail-fast: false
104 |       matrix:
105 |         include:
106 |           - platform: linux/amd64
107 |             runner: ubuntu-latest
108 |           - platform: linux/arm64
109 |             runner: ubuntu-24.04-arm
110 |     steps:
111 |       - name: Checkout
112 |         uses: actions/checkout@v4
113 |         with:
114 |           submodules: true
115 | 
116 |       - name: Download digests
117 |         uses: actions/download-artifact@v4
118 |         with:
119 |           name: digests-distil-${{ matrix.platform == 'linux/amd64' && 'linux-amd64' || 'linux-arm64' }}
120 |           path: /tmp/digests
121 | 
122 |       - name: Setup Docker Buildx
123 |         uses: docker/setup-buildx-action@v3
124 | 
125 |       - name: Log in to Container Registry
126 |         uses: docker/login-action@v3
127 |         with:
128 |           registry: ghcr.io
129 |           username: ${{ github.actor }}
130 |           password: ${{ secrets.GITHUB_TOKEN }}
131 | 
132 |       - name: Test pull distil-large-v3-en image
133 |         run: |
134 |           cd /tmp/digests
135 |           digest=$(cat linux-*)
136 |           echo "Testing pull of distil-large-v3-en for platform ${{ matrix.platform }}"
137 |           docker pull --platform ${{ matrix.platform }} ghcr.io/${{ env.REGISTRY_IMAGE }}@sha256:$digest
138 |           echo "Successfully pulled distil-large-v3-en image for ${{ matrix.platform }}"
139 | 
140 |   # Merge all platform builds into manifest list
141 |   merge_distil:
142 |     runs-on: ubuntu-latest
143 |     needs: test_distil
144 |     steps:
145 |       - name: Checkout
146 |         uses: actions/checkout@v4
147 | 
148 |       - name: Download digests
149 |         uses: actions/download-artifact@v4
150 |         with:
151 |           path: /tmp/digests
152 |           pattern: digests-distil-*
153 |           merge-multiple: true
154 | 
155 |       - name: Setup docker
156 |         id: setup
157 |         uses: ./.github/workflows/docker-reused-steps
158 |         with:
159 |           tag: distil-large-v3-en
160 | 
161 |       - name: Create GHCR manifest list
162 |         run: |
163 |           echo "Creating manifest list for distil-large-v3-en..."
164 |           cd /tmp/digests
165 |           echo "Files in /tmp/digests:"
166 |           ls -la
167 |           echo "Building digest references..."
168 |           digest_refs=""
169 |           for file in linux-*; do
170 |             if [[ -f "$file" ]]; then
171 |               digest=$(cat "$file")
172 |               echo "Processing $file with digest: $digest"
173 |               digest_refs+="ghcr.io/${{ env.REGISTRY_IMAGE }}@sha256:$digest "
174 |             fi
175 |           done
176 |           echo "Final digest references: $digest_refs"
177 |           docker buildx imagetools create \
178 |             $(jq -cr '.tags | map("-t " + .) | join(" ")' <<<$DOCKER_METADATA_OUTPUT_JSON) \
179 |             $digest_refs
180 | 
181 |       - name: Get final manifest digest
182 |         id: get_digest
183 |         run: |
184 |           # Get the digest of the manifest list we just created
185 |           echo "Available GHCR tags:"
186 |           jq -cr '.tags[] | select(startswith("ghcr.io/"))' <<<$DOCKER_METADATA_OUTPUT_JSON
187 |           IMAGE_TAG=$(jq -cr '.tags[] | select(startswith("ghcr.io/"))' <<<$DOCKER_METADATA_OUTPUT_JSON | head -n1)
188 |           echo "Using tag for digest lookup: $IMAGE_TAG"
189 |           # Get the raw digest output and extract only the sha256 part
190 |           DIGEST_RAW=$(docker buildx imagetools inspect "$IMAGE_TAG" --format "{{.Manifest.Digest}}")
191 |           echo "Raw digest output: $DIGEST_RAW"
192 |           # Extract only the digest hash, removing any MediaType or other formatting
193 |           DIGEST=$(echo "$DIGEST_RAW" | grep -oE 'sha256:[a-f0-9]{64}' | head -n1)
194 |           echo "Extracted digest: $DIGEST"
195 |           echo "manifest_digest=$DIGEST" >> $GITHUB_OUTPUT
196 | 
197 |       - name: Attest GHCR image (distil-large-v3-en)
198 |         uses: actions/attest-build-provenance@v2
199 |         with:
200 |           subject-name: ghcr.io/${{ github.repository_owner }}/whisperx
201 |           subject-digest: ${{ steps.get_digest.outputs.manifest_digest }}
202 | 


--------------------------------------------------------------------------------
/.github/workflows/04-build-matrix-images.yml:
--------------------------------------------------------------------------------
  1 | # Build matrix images workflow
  2 | # This workflow builds the full matrix of Docker images and runs tests
  3 | name: "04-build-matrix-images"
  4 | 
  5 | on:
  6 |   workflow_run:
  7 |     workflows: ["02-build-model-cache"]
  8 |     types: [completed]
  9 | 
 10 | # Sets the permissions granted to the GITHUB_TOKEN for the actions in this job.
 11 | permissions:
 12 |   contents: read
 13 |   packages: write
 14 |   id-token: write
 15 |   attestations: write
 16 | 
 17 | env:
 18 |   REGISTRY_IMAGE: ${{ github.repository_owner }}/whisperx
 19 | 
 20 | # The following languages are excluded because these transcribe model are too large to build on the GitHub Actions
 21 | # https://github.com/jim60105/docker-whisperX/actions/runs/8405597972
 22 | # - no
 23 | # - nn
 24 | 
 25 | jobs:
 26 |   # Build matrix images for tiny and base models
 27 |   build_matrix_1:
 28 |     runs-on: ${{ matrix.runner }}
 29 |     if: ${{ github.event.workflow_run.conclusion == 'success' }}
 30 |     strategy:
 31 |       fail-fast: false
 32 |       matrix:
 33 |         lang:
 34 |           - en
 35 |           - fr
 36 |           - de
 37 |           - es
 38 |           - it
 39 |           - ja
 40 |           - zh
 41 |           - nl
 42 |           - uk
 43 |           - pt
 44 |           - ar
 45 |           - cs
 46 |           - ru
 47 |           - pl
 48 |           - hu
 49 |           - fi
 50 |           - fa
 51 |           - el
 52 |           - tr
 53 |           - da
 54 |           - he
 55 |           - vi
 56 |           - ko
 57 |           - ur
 58 |           - te
 59 |           - hi
 60 |           - ca
 61 |           - ml
 62 |           - sk
 63 |           - sl
 64 |           - hr
 65 |           - ro
 66 |           - eu
 67 |           - gl
 68 |           - ka
 69 |           - lv
 70 |           - tl
 71 |         model:
 72 |           - tiny
 73 |           - base
 74 |         platform:
 75 |           - linux/amd64
 76 |           - linux/arm64
 77 |         include:
 78 |           - platform: linux/amd64
 79 |             runner: ubuntu-latest
 80 |           - platform: linux/arm64
 81 |             runner: ubuntu-24.04-arm
 82 |     outputs:
 83 |       digest: ${{ steps.build.outputs.digest }}
 84 |     steps:
 85 |       - name: Checkout
 86 |         uses: actions/checkout@v4
 87 |         with:
 88 |           submodules: true
 89 | 
 90 |       - name: Setup docker
 91 |         id: setup
 92 |         uses: ./.github/workflows/docker-reused-steps
 93 |         with:
 94 |           tag: ${{ matrix.model }}-${{ matrix.lang }}
 95 | 
 96 |       - name: Get base image references
 97 |         id: get-refs
 98 |         run: |
 99 |           # Get the commit SHA from the triggering workflow
100 |           COMMIT_SHA="${{ github.event.workflow_run.head_sha }}"
101 |           SHORT_SHA=$(echo $COMMIT_SHA | cut -c 1-7)
102 |           
103 |           # Set image references
104 |           NO_MODEL_IMAGE="ghcr.io/jim60105/whisperx:no_model-$SHORT_SHA"
105 |           CACHE_IMAGE="ghcr.io/jim60105/whisperx:cache-${{ matrix.model }}-$SHORT_SHA"
106 |           
107 |           echo "no_model_image=$NO_MODEL_IMAGE" >> $GITHUB_OUTPUT
108 |           echo "cache_image=$CACHE_IMAGE" >> $GITHUB_OUTPUT
109 | 
110 |       - name: Build and push by digest (${{ matrix.model }}-${{ matrix.lang }})
111 |         uses: docker/build-push-action@v6
112 |         id: build
113 |         with:
114 |           context: .
115 |           file: ./Dockerfile
116 |           target: final
117 |           platforms: ${{ matrix.platform }}
118 |           labels: ${{ steps.setup.outputs.labels }}
119 |           build-args: |
120 |             WHISPER_MODEL=${{ matrix.model }}
121 |             LANG=${{ matrix.lang }}
122 |             LOAD_WHISPER_STAGE=${{ steps.get-refs.outputs.cache_image }}
123 |             NO_MODEL_STAGE=${{ steps.get-refs.outputs.no_model_image }}
124 |             VERSION=${{ github.event.workflow_run.head_sha }}
125 |             RELEASE=${{ github.event.workflow_run.run_number }}
126 |           outputs: |
127 |             type=image,name=ghcr.io/${{ env.REGISTRY_IMAGE }},push-by-digest=true,name-canonical=true,push=true
128 |           sbom: true
129 |           provenance: true
130 | 
131 |       - name: Export digest
132 |         run: |
133 |           mkdir -p /tmp/digests
134 |           digest="${{ steps.build.outputs.digest }}"
135 |           platform="${{ matrix.platform }}"
136 |           platform_safe="${platform//\//-}"
137 |           echo "${digest#sha256:}" > "/tmp/digests/${platform_safe}"
138 | 
139 |       - name: Upload digest
140 |         uses: actions/upload-artifact@v4
141 |         with:
142 |           name: digests-${{ matrix.model }}-${{ matrix.lang }}-${{ matrix.platform == 'linux/amd64' && 'linux-amd64' || 'linux-arm64' }}
143 |           path: /tmp/digests/*
144 |           if-no-files-found: error
145 |           retention-days: 1
146 | 
147 |   # Build matrix images for small, medium and large-v3 models
148 |   build_matrix_2:
149 |     runs-on: ${{ matrix.runner }}
150 |     if: ${{ github.event.workflow_run.conclusion == 'success' }}
151 |     strategy:
152 |       fail-fast: false
153 |       matrix:
154 |         lang:
155 |           - en
156 |           - fr
157 |           - de
158 |           - es
159 |           - it
160 |           - ja
161 |           - zh
162 |           - nl
163 |           - uk
164 |           - pt
165 |           - ar
166 |           - cs
167 |           - ru
168 |           - pl
169 |           - hu
170 |           - fi
171 |           - fa
172 |           - el
173 |           - tr
174 |           - da
175 |           - he
176 |           - vi
177 |           - ko
178 |           - ur
179 |           - te
180 |           - hi
181 |           - ca
182 |           - ml
183 |           - sk
184 |           - sl
185 |           - hr
186 |           - ro
187 |           - eu
188 |           - gl
189 |           - ka
190 |           - lv
191 |           - tl
192 |         model:
193 |           - small
194 |           - medium
195 |           - large-v3
196 |         platform:
197 |           - linux/amd64
198 |           - linux/arm64
199 |         include:
200 |           - platform: linux/amd64
201 |             runner: ubuntu-latest
202 |           - platform: linux/arm64
203 |             runner: ubuntu-24.04-arm
204 |     outputs:
205 |       digest: ${{ steps.build.outputs.digest }}
206 |     steps:
207 |       - name: Checkout
208 |         uses: actions/checkout@v4
209 |         with:
210 |           submodules: true
211 | 
212 |       - name: Setup docker
213 |         id: setup
214 |         uses: ./.github/workflows/docker-reused-steps
215 |         with:
216 |           tag: ${{ matrix.model }}-${{ matrix.lang }}
217 | 
218 |       - name: Get base image references
219 |         id: get-refs
220 |         run: |
221 |           # Get the commit SHA from the triggering workflow
222 |           COMMIT_SHA="${{ github.event.workflow_run.head_sha }}"
223 |           SHORT_SHA=$(echo $COMMIT_SHA | cut -c 1-7)
224 |           
225 |           # Set image references
226 |           NO_MODEL_IMAGE="ghcr.io/jim60105/whisperx:no_model-$SHORT_SHA"
227 |           CACHE_IMAGE="ghcr.io/jim60105/whisperx:cache-${{ matrix.model }}-$SHORT_SHA"
228 |           
229 |           echo "no_model_image=$NO_MODEL_IMAGE" >> $GITHUB_OUTPUT
230 |           echo "cache_image=$CACHE_IMAGE" >> $GITHUB_OUTPUT
231 | 
232 |       - name: Build and push by digest (${{ matrix.model }}-${{ matrix.lang }})
233 |         uses: docker/build-push-action@v6
234 |         id: build
235 |         with:
236 |           context: .
237 |           file: ./Dockerfile
238 |           target: final
239 |           platforms: ${{ matrix.platform }}
240 |           labels: ${{ steps.setup.outputs.labels }}
241 |           build-args: |
242 |             WHISPER_MODEL=${{ matrix.model }}
243 |             LANG=${{ matrix.lang }}
244 |             LOAD_WHISPER_STAGE=${{ steps.get-refs.outputs.cache_image }}
245 |             NO_MODEL_STAGE=${{ steps.get-refs.outputs.no_model_image }}
246 |             VERSION=${{ github.event.workflow_run.head_sha }}
247 |             RELEASE=${{ github.event.workflow_run.run_number }}
248 |           outputs: |
249 |             type=image,name=ghcr.io/${{ env.REGISTRY_IMAGE }},push-by-digest=true,name-canonical=true,push=true
250 |           sbom: true
251 |           provenance: true
252 | 
253 |       - name: Export digest
254 |         run: |
255 |           mkdir -p /tmp/digests
256 |           digest="${{ steps.build.outputs.digest }}"
257 |           platform="${{ matrix.platform }}"
258 |           platform_safe="${platform//\//-}"
259 |           echo "${digest#sha256:}" > "/tmp/digests/${platform_safe}"
260 | 
261 |       - name: Upload digest
262 |         uses: actions/upload-artifact@v4
263 |         with:
264 |           name: digests-${{ matrix.model }}-${{ matrix.lang }}-${{ matrix.platform == 'linux/amd64' && 'linux-amd64' || 'linux-arm64' }}
265 |           path: /tmp/digests/*
266 |           if-no-files-found: error
267 |           retention-days: 1
268 | 
269 |   # Test matrix images builds by pulling from registry (selective testing)
270 |   test_matrix:
271 |     runs-on: ${{ matrix.runner }}
272 |     needs: [build_matrix_1, build_matrix_2]
273 |     strategy:
274 |       fail-fast: false
275 |       matrix:
276 |         # Only test a subset to save resources - focus on large-v3-zh for compatibility
277 |         include:
278 |           - lang: zh
279 |             model: large-v3
280 |             platform: linux/amd64
281 |             runner: ubuntu-latest
282 |           - lang: en
283 |             model: tiny
284 |             platform: linux/amd64
285 |             runner: ubuntu-latest
286 |     steps:
287 |       - name: Checkout
288 |         uses: actions/checkout@v4
289 | 
290 |       - name: Download digests
291 |         uses: actions/download-artifact@v4
292 |         with:
293 |           name: digests-${{ matrix.model }}-${{ matrix.lang }}-${{ matrix.platform == 'linux/amd64' && 'linux-amd64' || 'linux-arm64' }}
294 |           path: /tmp/digests
295 | 
296 |       - name: Setup Docker Buildx
297 |         uses: docker/setup-buildx-action@v3
298 | 
299 |       - name: Log in to Container Registry
300 |         uses: docker/login-action@v3
301 |         with:
302 |           registry: ghcr.io
303 |           username: ${{ github.actor }}
304 |           password: ${{ secrets.GITHUB_TOKEN }}
305 | 
306 |       - name: Test pull ${{ matrix.model }}-${{ matrix.lang }} image
307 |         run: |
308 |           cd /tmp/digests
309 |           digest=$(cat linux-*)
310 |           echo "Testing pull of ${{ matrix.model }}-${{ matrix.lang }} for platform ${{ matrix.platform }}"
311 |           docker pull --platform ${{ matrix.platform }} ghcr.io/${{ env.REGISTRY_IMAGE }}@sha256:$digest
312 |           echo "Successfully pulled ${{ matrix.model }}-${{ matrix.lang }} image for ${{ matrix.platform }}"
313 | 
314 |   # Merge all platform builds into manifest lists for each model-lang combination
315 |   merge_matrix:
316 |     runs-on: ubuntu-latest
317 |     needs: test_matrix
318 |     strategy:
319 |       fail-fast: false
320 |       matrix:
321 |         lang:
322 |           - en
323 |           - fr
324 |           - de
325 |           - es
326 |           - it
327 |           - ja
328 |           - zh
329 |           - nl
330 |           - uk
331 |           - pt
332 |           - ar
333 |           - cs
334 |           - ru
335 |           - pl
336 |           - hu
337 |           - fi
338 |           - fa
339 |           - el
340 |           - tr
341 |           - da
342 |           - he
343 |           - vi
344 |           - ko
345 |           - ur
346 |           - te
347 |           - hi
348 |           - ca
349 |           - ml
350 |           - sk
351 |           - sl
352 |           - hr
353 |           - ro
354 |           - eu
355 |           - gl
356 |           - ka
357 |           - lv
358 |           - tl
359 |         model:
360 |           - tiny
361 |           - base
362 |           - small
363 |           - medium
364 |           - large-v3
365 |     steps:
366 |       - name: Checkout
367 |         uses: actions/checkout@v4
368 |         with:
369 |           submodules: true
370 | 
371 |       - name: Download digests
372 |         uses: actions/download-artifact@v4
373 |         with:
374 |           path: /tmp/digests
375 |           pattern: digests-${{ matrix.model }}-${{ matrix.lang }}-*
376 |           merge-multiple: true
377 | 
378 |       - name: Setup docker
379 |         id: setup
380 |         uses: ./.github/workflows/docker-reused-steps
381 |         with:
382 |           tag: ${{ matrix.model }}-${{ matrix.lang }}
383 | 
384 |       - name: Create GHCR manifest list
385 |         run: |
386 |           echo "Creating manifest list for ${{ matrix.model }}-${{ matrix.lang }}..."
387 |           cd /tmp/digests
388 |           echo "Files in /tmp/digests:"
389 |           ls -la
390 |           echo "Building digest references..."
391 |           digest_refs=""
392 |           for file in linux-*; do
393 |             if [[ -f "$file" ]]; then
394 |               digest=$(cat "$file")
395 |               echo "Processing $file with digest: $digest"
396 |               digest_refs+="ghcr.io/${{ env.REGISTRY_IMAGE }}@sha256:$digest "
397 |             fi
398 |           done
399 |           echo "Final digest references: $digest_refs"
400 |           docker buildx imagetools create \
401 |             $(jq -cr '.tags | map("-t " + .) | join(" ")' <<<$DOCKER_METADATA_OUTPUT_JSON) \
402 |             $digest_refs
403 | 
404 |       - name: Get final manifest digest
405 |         id: get_digest
406 |         run: |
407 |           # Get the digest of the manifest list we just created
408 |           echo "Available GHCR tags:"
409 |           jq -cr '.tags[] | select(startswith("ghcr.io/"))' <<<$DOCKER_METADATA_OUTPUT_JSON
410 |           IMAGE_TAG=$(jq -cr '.tags[] | select(startswith("ghcr.io/"))' <<<$DOCKER_METADATA_OUTPUT_JSON | head -n1)
411 |           echo "Using tag for digest lookup: $IMAGE_TAG"
412 |           # Get the raw digest output and extract only the sha256 part
413 |           DIGEST_RAW=$(docker buildx imagetools inspect "$IMAGE_TAG" --format "{{.Manifest.Digest}}")
414 |           echo "Raw digest output: $DIGEST_RAW"
415 |           # Extract only the digest hash, removing any MediaType or other formatting
416 |           DIGEST=$(echo "$DIGEST_RAW" | grep -oE 'sha256:[a-f0-9]{64}' | head -n1)
417 |           echo "Extracted digest: $DIGEST"
418 |           echo "manifest_digest=$DIGEST" >> $GITHUB_OUTPUT
419 | 
420 |       - name: Attest GHCR image (${{ matrix.model }}-${{ matrix.lang }})
421 |         uses: actions/attest-build-provenance@v2
422 |         with:
423 |           subject-name: ghcr.io/${{ github.repository_owner }}/whisperx
424 |           subject-digest: ${{ steps.get_digest.outputs.manifest_digest }}
425 | 
426 |   # Comprehensive test for medium-zh after merge
427 |   test-medium-zh:
428 |     name: Test medium-zh docker image
429 |     runs-on: ubuntu-latest
430 |     if: ${{ github.event.workflow_run.conclusion == 'success' }}
431 |     needs: merge_matrix
432 |     steps:
433 |       # We require additional space due to the large size of our image. (~10GB)
434 |       - name: Free Disk Space (Ubuntu)
435 |         uses: jlumbroso/free-disk-space@main
436 |         with:
437 |           tool-cache: true
438 |           android: true
439 |           dotnet: true
440 |           haskell: true
441 |           large-packages: true
442 |           docker-images: true
443 |           swap-storage: false
444 | 
445 |       - name: Checkout
446 |         uses: actions/checkout@v4
447 |         with:
448 |           sparse-checkout: |
449 |             .github/workflows/test/**
450 |           sparse-checkout-cone-mode: false
451 | 
452 |       - name: Get image reference
453 |         id: get-ref
454 |         run: |
455 |           # Get the commit SHA from the triggering workflow
456 |           COMMIT_SHA="${{ github.event.workflow_run.head_sha }}"
457 |           SHORT_SHA=$(echo $COMMIT_SHA | cut -c 1-7)
458 |           
459 |           # Set image reference
460 |           IMAGE="ghcr.io/jim60105/whisperx:medium-zh-$SHORT_SHA"
461 |           echo "image=$IMAGE" >> $GITHUB_OUTPUT
462 | 
463 |       - name: Test medium-zh docker image
464 |         run: |
465 |           docker run --group-add 0 -v ".:/app" ${{ steps.get-ref.outputs.image }} -- --device cpu --compute_type int8 --output_format srt .github/workflows/test/zh.webm;
466 |           if [ ! -f zh.srt ]; then
467 |             echo "The zh.srt file does not exist"
468 |             exit 1
469 |           fi
470 |           echo "cat zh.srt:";
471 |           cat zh.srt;
472 |           if ! grep -qi -e '充满' -e '充滿' zh.srt; then
473 |             echo "The zh.srt file does not contain the word '充满' or '充滿'"
474 |             exit 1
475 |           fi
476 |           echo "Test passed."
477 | 


--------------------------------------------------------------------------------
/.github/workflows/README.md:
--------------------------------------------------------------------------------
  1 | # Docker Workflow Architecture
  2 | 
  3 | This document describes the distributed multi-platform CI/CD workflow architecture for building Docker images in the docker-whisperX project.
  4 | 
  5 | ## Overview
  6 | 
  7 | The workflow architecture has been completely refactored to implement **distributed multi-platform builds** with massive parallel processing capabilities. The original single `docker_publish.yml` workflow has been split into 5 specialized workflow files, each supporting native multi-platform builds (linux/amd64 + linux/arm64) to improve maintainability, parallel processing efficiency, and fault isolation for building 370+ platform-specific Docker images (10GB each).
  8 | 
  9 | ## Workflow Chain
 10 | 
 11 | ```mermaid
 12 | graph TD
 13 |     A[docker_publish.yml] -->|triggers| B[01-build-base-images.yml]
 14 |     B -->|build_base| C[02-build-model-cache.yml]
 15 |     B -->|merge_base| C
 16 |     C -->|build_model| D[03-build-distil-en.yml]
 17 |     C -->|merge_model| D
 18 |     C -->|build_model| E[04-build-matrix-images.yml]
 19 |     C -->|merge_model| E
 20 |     E -->|build_matrix| F[test-matrix]
 21 |     E -->|merge_matrix| G[test-large-v3-zh]
 22 |     
 23 |     subgraph "Multi-Platform Jobs"
 24 |         B1[build_base_amd64]
 25 |         B2[build_base_arm64]
 26 |         C1[build_model_amd64]
 27 |         C2[build_model_arm64]
 28 |         E1[build_matrix_amd64_370jobs]
 29 |         E2[build_matrix_arm64_370jobs]
 30 |     end
 31 |     
 32 |     B --> B1
 33 |     B --> B2
 34 |     C --> C1
 35 |     C --> C2
 36 |     E --> E1
 37 |     E --> E2
 38 | ```
 39 | 
 40 | ## Workflow Files
 41 | 
 42 | ### 1. `docker_publish.yml` (Entry Point)
 43 | - **Purpose**: Main trigger and coordination
 44 | - **Triggers**: Push to master, tags, manual dispatch
 45 | - **Actions**: Logs build chain initiation info
 46 | - **Next**: Triggers `01-build-base-images.yml`
 47 | 
 48 | ### 2. `01-build-base-images.yml`
 49 | - **Purpose**: Build base images with distributed multi-platform architecture
 50 | - **Triggered by**: `docker_publish.yml`
 51 | - **Architecture**: Three-stage distributed build process
 52 | - **Jobs**:
 53 |   - `build_base`: 6 parallel platform-specific builds (3 images × 2 platforms)
 54 |   - `test_base`: Selective testing for key images
 55 |   - `merge_base`: Create manifest lists for multi-platform images
 56 | - **Platforms**: linux/amd64 (ubuntu-latest) + linux/arm64 (ubuntu-24.04-arm)
 57 | - **Outputs**: Base image digests and manifest lists for downstream workflows
 58 | - **Next**: Triggers `02-build-model-cache.yml`
 59 | 
 60 | ### 3. `02-build-model-cache.yml`
 61 | - **Purpose**: Build Whisper model cache images with distributed multi-platform architecture
 62 | - **Triggered by**: `01-build-base-images.yml`
 63 | - **Architecture**: Three-stage distributed build process
 64 | - **Jobs**:
 65 |   - `build_model`: 10 parallel platform-specific builds (5 models × 2 platforms)
 66 |   - `test_model`: Selective testing for key model combinations
 67 |   - `merge_model`: Create manifest lists for multi-platform model images
 68 | - **Models**: tiny, base, small, medium, large-v3
 69 | - **Platforms**: Native builds on linux/amd64 + linux/arm64
 70 | - **Dependencies**: Uses base images from previous workflow
 71 | - **Next**: Triggers both `03-build-distil-en.yml` and `04-build-matrix-images.yml` in parallel
 72 | 
 73 | ### 4. `03-build-distil-en.yml`
 74 | - **Purpose**: Build distil-large-v3-en specialized image with distributed multi-platform architecture
 75 | - **Triggered by**: `02-build-model-cache.yml`
 76 | - **Architecture**: Three-stage distributed build process
 77 | - **Jobs**:
 78 |   - `build_distil`: 2 parallel platform-specific builds (1 model × 2 platforms)
 79 |   - `test_distil`: Comprehensive testing for English optimization
 80 |   - `merge_distil`: Create manifest list for multi-platform distil image
 81 | - **Platforms**: Native builds on linux/amd64 + linux/arm64
 82 | - **Specialization**: English-optimized distil model with enhanced performance
 83 | - **Parallel with**: `04-build-matrix-images.yml`
 84 | 
 85 | ### 5. `04-build-matrix-images.yml`
 86 | - **Purpose**: Build full image matrix with massive-scale distributed multi-platform architecture
 87 | - **Triggered by**: `02-build-model-cache.yml`
 88 | - **Architecture**: Three-stage distributed build process at unprecedented scale
 89 | - **Jobs**:
 90 |   - `build_matrix`: **370 parallel platform-specific builds** (37 languages × 5 models × 2 platforms)
 91 |   - `test_matrix`: Selective testing strategy for key language combinations
 92 |   - `merge_matrix`: 185 manifest list creations for multi-platform matrix images
 93 |   - `test-large-v3-zh`: Comprehensive Chinese image functionality testing
 94 | - **Scale**: Industry-leading 370 parallel Docker builds
 95 | - **Platforms**: Native builds on linux/amd64 + linux/arm64
 96 | - **Languages**: 37 supported languages with alignment models
 97 | - **Models**: tiny, base, small, medium, large-v3
 98 | - **Innovation**: Selective testing strategy to manage massive resource usage
 99 | - **Parallel with**: `03-build-distil-en.yml`
100 | 
101 | ## Key Features
102 | 
103 | ### Distributed Multi-Platform Architecture
104 | - **Native Platform Builds**: linux/amd64 (ubuntu-latest) + linux/arm64 (ubuntu-24.04-arm)
105 | - **QEMU Elimination**: Complete removal of emulation overhead for 50-70% build time reduction
106 | - **Digest-based Builds**: Push-by-digest strategy with artifact-based manifest list creation
107 | - **Three-stage Process**: build → test → merge pattern for optimal resource utilization
108 | 
109 | ### Massive Scale Parallel Processing
110 | - **370 Parallel Jobs**: Industry-leading scale for Docker matrix builds
111 | - **Intelligent Scheduling**: Strategic max-parallel settings to prevent GitHub Actions overload
112 | - **Selective Testing**: Resource-optimized testing strategy for large-scale matrices
113 | - **Error Isolation**: fail-fast: false allows partial failures without impacting other builds
114 | 
115 | ### Advanced Caching Strategy
116 | - **Layered Cache Naming**: cache-base, cache-model-{model}, cache-matrix-{model}-{lang}
117 | - **Platform-specific Caching**: Separate cache spaces for amd64 and arm64
118 | - **Conflict Avoidance**: Sophisticated naming prevents cache collisions
119 | - **Build Time Optimization**: Maximized cache reuse across workflow stages
120 | 
121 | ### Workflow Chaining
122 | - Uses `workflow_run` events for dependency management
123 | - Conditional execution based on previous workflow success
124 | - Proper error handling and failure isolation
125 | 
126 | ### Image Reference Management
127 | - Base images are referenced by commit SHA tags
128 | - Consistent naming convention across workflows
129 | - Proper digest passing between dependent workflows
130 | 
131 | ## Benefits
132 | 
133 | ### Performance Revolution
134 | - **50-70% Build Time Reduction**: Native multi-platform builds eliminate QEMU emulation overhead
135 | - **Massive Parallelization**: 370 simultaneous builds vs. previous sequential execution
136 | - **Resource Maximization**: Full utilization of GitHub Actions parallel job capacity
137 | - **Network Optimization**: Layered caching reduces redundant downloads
138 | 
139 | ### Maintainability
140 | - **Modular Design**: Each workflow focuses on specific build stages
141 | - **Independent Debugging**: Can test and fix specific stages in isolation
142 | - **Code Reuse**: Shared logic through reusable actions
143 | - **Platform Isolation**: Separate troubleshooting for amd64 vs arm64 issues
144 | 
145 | ### Scalability & Resource Management
146 | - **Distributed Architecture**: Horizontal scaling across multiple workflow files
147 | - **Intelligent Resource Usage**: Selective testing prevents resource exhaustion
148 | - **Future-proof Design**: Easy addition of new languages, models, or platforms
149 | - **Cost Optimization**: Significant reduction in GitHub Actions minutes usage
150 | 
151 | ### Fault Tolerance
152 | - **Platform Independence**: Single platform failure doesn't affect other platform
153 | - **Stage Isolation**: Build failures don't immediately impact other independent stages
154 | - **Graceful Degradation**: Partial matrix success allows useful outputs
155 | - **Retry Mechanisms**: Built-in GitHub Actions retry for transient failures
156 | 
157 | ## Architecture Details
158 | 
159 | ### Three-Stage Build Process
160 | 
161 | Each workflow implements a consistent three-stage pattern:
162 | 
163 | #### Stage 1: Distributed Build
164 | - **Platform Matrix**: Each image built natively on both linux/amd64 and linux/arm64
165 | - **Parallel Execution**: Maximum utilization of GitHub Actions runner capacity
166 | - **Digest Artifacts**: Each build produces platform-specific image digests
167 | - **Cache Optimization**: Platform-specific caching for optimal performance
168 | 
169 | #### Stage 2: Selective Testing
170 | - **Strategic Selection**: Test key combinations to validate functionality without resource exhaustion
171 | - **Platform Coverage**: Ensure both architectures work correctly
172 | - **Quality Gates**: Functional testing before manifest creation
173 | - **Resource Management**: Balanced testing approach for large matrices
174 | 
175 | #### Stage 3: Manifest Merge
176 | - **Multi-platform Images**: Combine platform-specific digests into manifest lists
177 | - **Registry Optimization**: Single image tag supports both architectures
178 | - **Backward Compatibility**: Maintains existing image naming conventions
179 | - **Deployment Ready**: Images ready for multi-architecture deployment
180 | 
181 | ### Scale Breakdown
182 | 
183 | | Workflow   | Images | Platforms | Platform Builds | Test Jobs | Merge Jobs | Other | **Total Jobs** |
184 | |------------|--------|-----------|----------------|-----------|------------|-------|---------------|
185 | | 01-base    | 2      | 2         | 4              | 2         | 2          | 0     | 8             |
186 | | 02-model   | 6      | 2         | 12             | 12        | 6          | 0     | 30            |
187 | | 03-distil  | 1      | 2         | 2              | 2         | 1          | 0     | 5             |
188 | | 04-matrix  | 185    | 2         | 370            | 2         | 185        | 1     | 558           |
189 | | **Total**  | **194**| **2**     | **388**        | **18**    | **194**    | **1** | **601**       |
190 | 
191 | ### Resource Optimization Strategies
192 | 
193 | #### Caching Hierarchy
194 | ```
195 | cache-base-{image}          # Base image layer cache
196 | cache-model-{model}         # Model-specific dependency cache  
197 | cache-matrix-{model}-{lang} # Language-specific alignment cache
198 | ```
199 | 
200 | #### Testing Strategy
201 | - **Base Images**: Test critical base functionality
202 | - **Model Cache**: Test model loading and initialization
203 | - **Distil English**: Comprehensive English language testing
204 | - **Matrix Images**: Selective testing of key language combinations (large-v3-zh, tiny-en)
205 | 
206 | #### Parallel Job Management
207 | - **Max-parallel Settings**: Prevent GitHub Actions infrastructure overload
208 | - **Job Dependencies**: Proper sequencing without blocking parallel execution
209 | - **Artifact Lifecycle**: Short retention periods for intermediate build artifacts
210 | - **Error Handling**: Graceful handling of partial failures in large matrices
211 | 
212 | ## Migration Notes
213 | 
214 | ### From Original Workflow
215 | - All job functionality preserved and enhanced with multi-platform support
216 | - Same Docker build arguments and caching strategies, now optimized for distributed builds
217 | - Enhanced test procedures with platform-specific validation
218 | - Same output artifacts and attestations, now with manifest list support
219 | - **Major Enhancement**: Native ARM64 builds replace QEMU emulation
220 | 
221 | ### Architecture Changes
222 | - **Build Strategy**: From single-job sequential to distributed parallel builds
223 | - **Platform Support**: From QEMU emulation to native multi-platform builds  
224 | - **Scale**: From 175 images to 370 platform-specific builds + 185 manifest lists
225 | - **Testing**: From sequential testing to selective parallel testing strategy
226 | - **Caching**: From simple caching to sophisticated layered cache hierarchy
227 | 
228 | ### Breaking Changes
229 | - Workflow names changed (affects status badges and external references)
230 | - Build timing significantly improved due to parallel execution
231 | - Different workflow run IDs for different stages and platforms
232 | - **Performance Impact**: 50-70% faster build times expected
233 | 
234 | ### Rollback Strategy
235 | - Original workflow backed up as `docker_publish.yml.backup`
236 | - Can be restored by renaming backup file
237 | - All new workflow files can be safely deleted for rollback
238 | - **Note**: Rollback loses multi-platform build benefits
239 | 
240 | ## Monitoring and Troubleshooting
241 | 
242 | ### Status Monitoring
243 | - **Multi-level Tracking**: Monitor workflow → job → platform level status
244 | - **Platform-specific Visibility**: Separate status for amd64 and arm64 builds
245 | - **Resource Usage Tracking**: Monitor parallel job utilization and build durations
246 | - **Cache Performance**: Track cache hit rates and build acceleration
247 | 
248 | ### Performance Metrics
249 | - **Build Time Comparison**: Monitor time reduction vs. previous architecture
250 | - **Parallel Efficiency**: Track simultaneous job execution and bottlenecks  
251 | - **Resource Utilization**: GitHub Actions minute usage and cost optimization
252 | - **Error Rates**: Platform-specific failure analysis and trends
253 | 
254 | ### Common Issues
255 | 
256 | #### Multi-Platform Specific
257 | - **Platform Build Failures**: Check platform-specific runner availability (ubuntu-24.04-arm)
258 | - **Digest Artifact Issues**: Verify artifact upload/download between build stages
259 | - **Manifest Creation Failures**: Check digest availability and format compatibility
260 | - **Cache Conflicts**: Verify cache key uniqueness across platforms and workflows
261 | 
262 | #### Scale-Related Issues  
263 | - **GitHub Actions Limits**: Monitor parallel job usage against account quotas
264 | - **Resource Exhaustion**: Watch for runner capacity issues during peak usage
265 | - **Network Bottlenecks**: Monitor artifact transfer times for large matrices
266 | - **Storage Limitations**: Manage artifact retention and storage usage
267 | 
268 | #### Legacy Issues
269 | - **Image Reference Failures**: Check SHA tag generation and image availability
270 | - **Workflow Chaining**: Verify `workflow_run` triggers are correctly configured
271 | - **Permission Issues**: Ensure all workflows have proper GITHUB_TOKEN permissions
272 | 
273 | ### Debugging Strategies
274 | 
275 | #### Platform-specific Debugging
276 | - **Single Platform Testing**: Temporarily disable one platform to isolate issues
277 | - **Selective Matrix Testing**: Use workflow_dispatch with reduced matrix for debugging
278 | - **Cache Isolation**: Clear platform-specific caches to eliminate cache-related issues
279 | 
280 | #### Large-scale Debugging
281 | - **Staged Rollouts**: Test changes on smaller matrices before full deployment
282 | - **Parallel Limit Adjustment**: Reduce max-parallel settings during debugging
283 | - **Selective Re-runs**: Re-execute only failed combinations rather than entire workflows
284 | 
285 | #### Tools and Techniques
286 | - **Workflow Dispatch**: Manual triggering with custom parameters for testing
287 | - **Debug Logging**: Enhanced logging for multi-platform build processes
288 | - **Artifact Inspection**: Download and analyze build artifacts for troubleshooting
289 | - **Performance Profiling**: Use GitHub Actions built-in timing and resource metrics
290 | 


--------------------------------------------------------------------------------
/.github/workflows/auto_merge.yml:
--------------------------------------------------------------------------------
 1 | name: Automatically Approve / Merge PR
 2 | on:
 3 |   pull_request:
 4 | 
 5 | jobs:
 6 |   Auto_Approve:
 7 |     name: Auto Approve
 8 |     runs-on: ubuntu-latest
 9 |     if: github.actor == 'jim60105' && github.repository == 'jim60105/docker-whisperX'
10 |     steps:
11 |       - name: Auto approve
12 |         uses: hmarr/auto-approve-action@v3
13 |         with:
14 |           github-token: "${{ secrets.GITHUB_TOKEN }}"
15 | 
16 |   Auto_Merge_PR:
17 |     name: Auto Merge PR
18 |     runs-on: ubuntu-latest
19 |     needs: "Auto_Approve"
20 |     steps:
21 |       - name: Git Auto Merge
22 |         uses: plm9606/automerge_actions@1.2.3
23 |         with:
24 |           # Use PAT to trigger another workflow
25 |           github-token: ${{ secrets.CR_PAT }}
26 |           merge-method: squash
27 |           reviewers-number: 0
28 |           label-name: "automerge"
29 | 
30 |       - name: Remove label
31 |         if: ${{ success() }}
32 |         uses: buildsville/add-remove-label@v1
33 |         with:
34 |           token: ${{secrets.GITHUB_TOKEN}}
35 |           label: "automerge"
36 |           type: remove
37 | 


--------------------------------------------------------------------------------
/.github/workflows/docker-reused-steps/action.yml:
--------------------------------------------------------------------------------
 1 | name: Reusable docker workflow
 2 | 
 3 | description: Reusable docker workflow.
 4 | 
 5 | inputs:
 6 |   tag:
 7 |     description: "A tag to use for the image"
 8 |     default: "no_model"
 9 | 
10 | outputs:
11 |   tags:
12 |     description: "tags"
13 |     value: ${{ steps.meta.outputs.tags }}
14 |   labels:
15 |     description: "labels"
16 |     value: ${{ steps.meta.outputs.labels }}
17 | 
18 | runs:
19 |   using: composite
20 |   steps:
21 |     # We require additional space due to the large size of our image. (~10GB)
22 |     - name: Free Disk Space (Ubuntu)
23 |       uses: jlumbroso/free-disk-space@main
24 |       with:
25 |         tool-cache: true
26 |         android: true
27 |         dotnet: true
28 |         haskell: true
29 |         large-packages: true
30 |         docker-images: true
31 |         swap-storage: true
32 | 
33 |     - name: Docker meta:${{ inputs.tag }}
34 |       id: meta
35 |       uses: docker/metadata-action@v5
36 |       with:
37 |         images: ghcr.io/${{ github.repository_owner }}/whisperx
38 |         tags: |
39 |           ${{ inputs.tag }}
40 |           type=sha,prefix=${{ inputs.tag }}-
41 |           type=raw,value=latest,enable=${{ inputs.tag == 'no_model' }}
42 | 
43 |     - name: Set up QEMU
44 |       uses: docker/setup-qemu-action@v3
45 |     
46 |     - name: Set up Docker Buildx
47 |       uses: docker/setup-buildx-action@v3
48 | 
49 |     # You may need to manage write and read access of GitHub Actions for repositories in the container settings.
50 |     - name: Login to GitHub Container Registry
51 |       uses: docker/login-action@v3
52 |       with:
53 |         registry: ghcr.io
54 |         username: ${{ github.repository_owner }}
55 |         password: ${{ github.token }}
56 | 


--------------------------------------------------------------------------------
/.github/workflows/docker_publish.yml:
--------------------------------------------------------------------------------
 1 | # Check this guide for more information about publishing to ghcr.io with GitHub Actions:
 2 | # https://docs.github.com/en/packages/managing-github-packages-using-github-actions-workflows/publishing-and-installing-a-package-with-github-actions#upgrading-a-workflow-that-accesses-ghcrio
 3 | 
 4 | # Main workflow trigger that initiates the Docker image build chain
 5 | # This workflow has been refactored to trigger a sequence of specialized workflows:
 6 | # 1. 01-build-base-images.yml - Builds ubi-no_model and no_model base images
 7 | # 2. 02-build-model-cache.yml - Builds model cache images (6 models)
 8 | # 3. 03-build-distil-en.yml + 04-build-matrix-images.yml - Parallel builds of final images
 9 | name: docker_publish
10 | 
11 | on:
12 |   push:
13 |     branches:
14 |       - "master"
15 |     tags:
16 |       - "*"
17 |     paths-ignore:
18 |       - "*.md"
19 | 
20 |   # Allows you to run this workflow manually from the Actions tab
21 |   workflow_dispatch:
22 | 
23 | # Sets the permissions granted to the GITHUB_TOKEN for the actions in this job.
24 | permissions:
25 |   contents: read
26 |   packages: write
27 |   id-token: write
28 |   attestations: write
29 | 
30 | jobs:
31 |   # Trigger workflow to initiate the build chain
32 |   trigger-build-chain:
33 |     runs-on: ubuntu-latest
34 |     steps:
35 |       - name: Checkout
36 |         uses: actions/checkout@v4
37 |         with:
38 |           submodules: true
39 | 
40 |       - name: Workflow chain initialization
41 |         run: |
42 |           echo "=== Docker Build Chain Initiated ==="
43 |           echo "Commit: ${{ github.sha }}"
44 |           echo "Ref: ${{ github.ref }}"
45 |           echo "Run number: ${{ github.run_number }}"
46 |           echo ""
47 |           echo "This workflow will trigger the following sequence:"
48 |           echo "1. 01-build-base-images.yml - Base images (ubi-no_model, no_model)"
49 |           echo "2. 02-build-model-cache.yml - Model cache images (6 models)"
50 |           echo "3. 03-build-distil-en.yml - Distil English model (parallel)"
51 |           echo "4. 04-build-matrix-images.yml - Full matrix build + tests (parallel)"
52 |           echo ""
53 |           echo "Total expected images: ~175+"
54 |           echo "=== Build Chain Ready ==="
55 | 


--------------------------------------------------------------------------------
/.github/workflows/scan.yml:
--------------------------------------------------------------------------------
 1 | name: scan
 2 | 
 3 | on:
 4 |   workflow_run:
 5 |     workflows: ["01-build-base-images"]
 6 |     types: [completed]
 7 | 
 8 |   # Allows you to run this workflow manually from the Actions tab
 9 |   workflow_dispatch:
10 | 
11 | jobs:
12 |   scan:
13 |     name: Scan Python official base image
14 |     runs-on: ubuntu-latest
15 |     if: ${{ github.event.workflow_run.conclusion == 'success' || github.event_name == 'workflow_dispatch' }}
16 |     steps:
17 |       - name: Checkout
18 |         uses: actions/checkout@v4
19 |         with:
20 |           sparse-checkout: |
21 |             .github/workflows/scan/html.tpl
22 |           sparse-checkout-cone-mode: false
23 | 
24 |       - name: Run Trivy vulnerability scanner for Python official image
25 |         uses: aquasecurity/trivy-action@0.16.1
26 |         with:
27 |           image-ref: "ghcr.io/jim60105/whisperx:no_model"
28 |           vuln-type: "os,library"
29 |           scanners: vuln
30 |           severity: "CRITICAL,HIGH"
31 |           format: "template"
32 |           template: "@.github/workflows/scan/html.tpl"
33 |           exit-code: '1'
34 |           ignore-unfixed: true
35 |           output: "trivy-results.html"
36 | 
37 |       - name: Upload Artifact
38 |         uses: actions/upload-artifact@v4
39 |         if: always()
40 |         with:
41 |           name: trivy-results
42 |           path: trivy-results.html
43 |           retention-days: 90
44 | 


--------------------------------------------------------------------------------
/.github/workflows/scan/html.tpl:
--------------------------------------------------------------------------------
  1 | <!-- Template from https://github.com/aquasecurity/trivy/blob/main/contrib/html.tpl -->
  2 | <!DOCTYPE html>
  3 | <html>
  4 |   <head>
  5 |     <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  6 | {{- if . }}
  7 |     <style>
  8 |       * {
  9 |         font-family: Arial, Helvetica, sans-serif;
 10 |       }
 11 |       h1 {
 12 |         text-align: center;
 13 |       }
 14 |       .group-header th {
 15 |         font-size: 200%;
 16 |       }
 17 |       .sub-header th {
 18 |         font-size: 150%;
 19 |       }
 20 |       table, th, td {
 21 |         border: 1px solid black;
 22 |         border-collapse: collapse;
 23 |         white-space: nowrap;
 24 |         padding: .3em;
 25 |       }
 26 |       table {
 27 |         margin: 0 auto;
 28 |       }
 29 |       .severity {
 30 |         text-align: center;
 31 |         font-weight: bold;
 32 |         color: #fafafa;
 33 |       }
 34 |       .severity-LOW .severity { background-color: #5fbb31; }
 35 |       .severity-MEDIUM .severity { background-color: #e9c600; }
 36 |       .severity-HIGH .severity { background-color: #ff8800; }
 37 |       .severity-CRITICAL .severity { background-color: #e40000; }
 38 |       .severity-UNKNOWN .severity { background-color: #747474; }
 39 |       .severity-LOW { background-color: #5fbb3160; }
 40 |       .severity-MEDIUM { background-color: #e9c60060; }
 41 |       .severity-HIGH { background-color: #ff880060; }
 42 |       .severity-CRITICAL { background-color: #e4000060; }
 43 |       .severity-UNKNOWN { background-color: #74747460; }
 44 |       table tr td:first-of-type {
 45 |         font-weight: bold;
 46 |       }
 47 |       .links a,
 48 |       .links[data-more-links=on] a {
 49 |         display: block;
 50 |       }
 51 |       .links[data-more-links=off] a:nth-of-type(1n+5) {
 52 |         display: none;
 53 |       }
 54 |       a.toggle-more-links { cursor: pointer; }
 55 |     </style>
 56 |     <title>{{- escapeXML ( index . 0 ).Target }} - Trivy Report - {{ now }} </title>
 57 |     <script>
 58 |       window.onload = function() {
 59 |         document.querySelectorAll('td.links').forEach(function(linkCell) {
 60 |           var links = [].concat.apply([], linkCell.querySelectorAll('a'));
 61 |           [].sort.apply(links, function(a, b) {
 62 |             return a.href > b.href ? 1 : -1;
 63 |           });
 64 |           links.forEach(function(link, idx) {
 65 |             if (links.length > 3 && 3 === idx) {
 66 |               var toggleLink = document.createElement('a');
 67 |               toggleLink.innerText = "Toggle more links";
 68 |               toggleLink.href = "#toggleMore";
 69 |               toggleLink.setAttribute("class", "toggle-more-links");
 70 |               linkCell.appendChild(toggleLink);
 71 |             }
 72 |             linkCell.appendChild(link);
 73 |           });
 74 |         });
 75 |         document.querySelectorAll('a.toggle-more-links').forEach(function(toggleLink) {
 76 |           toggleLink.onclick = function() {
 77 |             var expanded = toggleLink.parentElement.getAttribute("data-more-links");
 78 |             toggleLink.parentElement.setAttribute("data-more-links", "on" === expanded ? "off" : "on");
 79 |             return false;
 80 |           };
 81 |         });
 82 |       };
 83 |     </script>
 84 |   </head>
 85 |   <body>
 86 |     <h1>{{- escapeXML ( index . 0 ).Target }} - Trivy Report - {{ now }}</h1>
 87 |     <table>
 88 |     {{- range . }}
 89 |       <tr class="group-header"><th colspan="6">{{ .Type | toString | escapeXML }}</th></tr>
 90 |       {{- if (eq (len .Vulnerabilities) 0) }}
 91 |       <tr><th colspan="6">No Vulnerabilities found</th></tr>
 92 |       {{- else }}
 93 |       <tr class="sub-header">
 94 |         <th>Package</th>
 95 |         <th>Vulnerability ID</th>
 96 |         <th>Severity</th>
 97 |         <th>Installed Version</th>
 98 |         <th>Fixed Version</th>
 99 |         <th>Links</th>
100 |       </tr>
101 |         {{- range .Vulnerabilities }}
102 |       <tr class="severity-{{ escapeXML .Vulnerability.Severity }}">
103 |         <td class="pkg-name">{{ escapeXML .PkgName }}</td>
104 |         <td>{{ escapeXML .VulnerabilityID }}</td>
105 |         <td class="severity">{{ escapeXML .Vulnerability.Severity }}</td>
106 |         <td class="pkg-version">{{ escapeXML .InstalledVersion }}</td>
107 |         <td>{{ escapeXML .FixedVersion }}</td>
108 |         <td class="links" data-more-links="off">
109 |           {{- range .Vulnerability.References }}
110 |           <a href={{ escapeXML . | printf "%q" }}>{{ escapeXML . }}</a>
111 |           {{- end }}
112 |         </td>
113 |       </tr>
114 |         {{- end }}
115 |       {{- end }}
116 |       {{- if (eq (len .Misconfigurations ) 0) }}
117 |       <tr><th colspan="6">No Misconfigurations found</th></tr>
118 |       {{- else }}
119 |       <tr class="sub-header">
120 |         <th>Type</th>
121 |         <th>Misconf ID</th>
122 |         <th>Check</th>
123 |         <th>Severity</th>
124 |         <th>Message</th>
125 |       </tr>
126 |         {{- range .Misconfigurations }}
127 |       <tr class="severity-{{ escapeXML .Severity }}">
128 |         <td class="misconf-type">{{ escapeXML .Type }}</td>
129 |         <td>{{ escapeXML .ID }}</td>
130 |         <td class="misconf-check">{{ escapeXML .Title }}</td>
131 |         <td class="severity">{{ escapeXML .Severity }}</td>
132 |         <td class="link" data-more-links="off"  style="white-space:normal;">
133 |           {{ escapeXML .Message }}
134 |           <br>
135 |             <a href={{ escapeXML .PrimaryURL | printf "%q" }}>{{ escapeXML .PrimaryURL }}</a>
136 |           </br>
137 |         </td>
138 |       </tr>
139 |         {{- end }}
140 |       {{- end }}
141 |     {{- end }}
142 |     </table>
143 | {{- else }}
144 |   </head>
145 |   <body>
146 |     <h1>Trivy Returned Empty Report</h1>
147 | {{- end }}
148 |   </body>
149 | </html>
150 | 


--------------------------------------------------------------------------------
/.github/workflows/scan_ubi.yml:
--------------------------------------------------------------------------------
 1 | name: scan
 2 | 
 3 | on:
 4 |   workflow_run:
 5 |     workflows: ["01-build-base-images"]
 6 |     types: [completed]
 7 | 
 8 |   # Allows you to run this workflow manually from the Actions tab
 9 |   workflow_dispatch:
10 | 
11 | jobs:
12 |   scan-ubi:
13 |     name: Scan Red Hat UBI base image
14 |     runs-on: ubuntu-latest
15 |     if: ${{ github.event.workflow_run.conclusion == 'success' || github.event_name == 'workflow_dispatch' }}
16 |     steps:
17 |       - name: Checkout
18 |         uses: actions/checkout@v4
19 |         with:
20 |           sparse-checkout: |
21 |             .github/workflows/scan/html.tpl
22 |           sparse-checkout-cone-mode: false
23 | 
24 |       - name: Run Trivy vulnerability scanner for UBI image
25 |         uses: aquasecurity/trivy-action@0.16.1
26 |         with:
27 |           image-ref: "ghcr.io/jim60105/whisperx:ubi-no_model"
28 |           vuln-type: "os,library"
29 |           scanners: vuln
30 |           severity: "CRITICAL,HIGH"
31 |           format: "template"
32 |           template: "@.github/workflows/scan/html.tpl"
33 |           ignore-unfixed: true
34 |           output: "trivy-results-ubi.html"
35 | 
36 |       - name: Upload Artifact
37 |         uses: actions/upload-artifact@v4
38 |         with:
39 |           name: trivy-results-ubi
40 |           path: trivy-results-ubi.html
41 |           retention-days: 90
42 | 
43 |       - name: Run Trivy vulnerability scanner for UBI image (SARIF)
44 |         uses: aquasecurity/trivy-action@master
45 |         if: always()
46 |         with:
47 |           image-ref: "ghcr.io/jim60105/whisperx:ubi-no_model"
48 |           vuln-type: "os,library"
49 |           scanners: vuln
50 |           severity: "CRITICAL,HIGH"
51 |           format: 'sarif'
52 |           exit-code: '1'
53 |           ignore-unfixed: true
54 |           output: 'trivy-results.sarif'
55 |   
56 |       - name: Upload Trivy scan results to GitHub Security tab
57 |         uses: github/codeql-action/upload-sarif@v3
58 |         if: always()
59 |         with:
60 |           sarif_file: 'trivy-results.sarif'
61 | 


--------------------------------------------------------------------------------
/.github/workflows/submodule_update.yml:
--------------------------------------------------------------------------------
 1 | name: Submodule Updates
 2 | 
 3 | on:
 4 |   schedule:
 5 |     - cron: "0 0 * * 0"
 6 |   workflow_dispatch:
 7 | 
 8 | jobs:
 9 |   update_submodules:
10 |     name: Submodule update
11 |     runs-on: ubuntu-latest
12 |     env:
13 |       PARENT_REPOSITORY: ${{ github.repository_owner }}/docker-whisperX
14 |       CHECKOUT_BRANCH: master
15 |       PR_AGAINST_BRANCH: master
16 |       OWNER: ${{ github.repository_owner }}
17 | 
18 |     steps:
19 |       - name: Checkout Code
20 |         uses: actions/checkout@v3
21 | 
22 |       - name: Update Submodules
23 |         uses: releasehub-com/github-action-create-pr-parent-submodule@v1
24 |         continue-on-error: true
25 |         with:
26 |           github_token: ${{ secrets.CR_PAT }}
27 |           parent_repository: ${{ env.PARENT_REPOSITORY }}
28 |           checkout_branch: ${{ env.CHECKOUT_BRANCH}}
29 |           pr_against_branch: ${{ env.PR_AGAINST_BRANCH }}
30 |           owner: ${{ env.OWNER }}
31 |           label: "automerge"
32 | 


--------------------------------------------------------------------------------
/.github/workflows/test/en.webm:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jim60105/docker-whisperX/9245177c906bb94f40f1015629d62b6496dce317/.github/workflows/test/en.webm


--------------------------------------------------------------------------------
/.github/workflows/test/zh.webm:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jim60105/docker-whisperX/9245177c906bb94f40f1015629d62b6496dce317/.github/workflows/test/zh.webm


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | *.env
2 | cache
3 | 


--------------------------------------------------------------------------------
/.gitmodules:
--------------------------------------------------------------------------------
1 | [submodule "whisperX"]
2 | 	path = whisperX
3 | 	url = https://github.com/jim60105/whisperX
4 | 


--------------------------------------------------------------------------------
/.hadolint.yml:
--------------------------------------------------------------------------------
 1 | ignored:
 2 |   - DL3041 # Specify version with `dnf install -y <package>-<version>`.
 3 |   - DL3042 # Avoid use of cache directory with pip. Use `pip install --no-cache-dir <package>`
 4 |   - DL4006 # Set the SHELL option -o pipefail before RUN with a pipe in it
 5 |   - DL3013 # Pin versions in pip. Instead of `pip install <package>` use `pip install <package>==<version>`
 6 |   - SC2015 # Note that A && B || C is not if-then-else. C may run when A is true.
 7 |   - DL3006 # Always tag the version of an image explicitly
 8 |   - DL3008 # Pin versions in apt get install. Instead of `apt-get install <package>` use `apt-get install <package>=<version>`
 9 |   - DL3040 # `dnf clean all` missing after dnf command.
10 | 


--------------------------------------------------------------------------------
/.vscode/settings.json:
--------------------------------------------------------------------------------
 1 | {
 2 |     "cSpell.words": [
 3 |         "anuragshas",
 4 |         "aptlists",
 5 |         "bokmaal",
 6 |         "buildx",
 7 |         "catala",
 8 |         "comodoro",
 9 |         "CUDA",
10 |         "Diarization",
11 |         "distil",
12 |         "ffprobe",
13 |         "findutils",
14 |         "ftspeech",
15 |         "huggingface",
16 |         "imvladikon",
17 |         "jonatasgrosman",
18 |         "kingabzpro",
19 |         "kresnik",
20 |         "libgomp",
21 |         "libsndfile",
22 |         "microdnf",
23 |         "mpoyraz",
24 |         "nguyenvulebinh",
25 |         "nodocs",
26 |         "noplugins",
27 |         "numpy",
28 |         "pipefail",
29 |         "pyannote",
30 |         "pycache",
31 |         "pyproject",
32 |         "PYTHONIOENCODING",
33 |         "PYTHONPATH",
34 |         "PYTHONUNBUFFERED",
35 |         "saattrupdan",
36 |         "setopt",
37 |         "setuptools",
38 |         "Silero",
39 |         "softcatala",
40 |         "submuodules",
41 |         "theainerd",
42 |         "tmpfs",
43 |         "torchaudio",
44 |         "tsflags",
45 |         "venv",
46 |         "VOXPOPULI",
47 |         "whisperx",
48 |         "xlsr",
49 |         "Yehor"
50 |     ],
51 |     "[python]": {
52 |         "editor.defaultFormatter": "ms-python.black-formatter"
53 |     },
54 |     "python.formatting.provider": "none"
55 | }
56 | 


--------------------------------------------------------------------------------
/AGENTS.md:
--------------------------------------------------------------------------------
  1 | # GitHub Copilot Instructions for docker-whisperX
  2 | 
  3 | * **Response Language:** `zh-TW 正體中文`
  4 | 
  5 | # Key Directives:
  6 | 
  7 | * Maintain the highest standard of quality in all deliverables by following best practices.
  8 | * All code comments and documentation must be written in **English** as per project conventions.
  9 | * Proactively consult both core documentation and conversation history to ensure accurate comprehension of all requirements.
 10 | * You are neither able to execute `docker`, use `podman` instead.
 11 | * When doing Git commit, use the conventional commit format for the title and a brief description in the body. Always commit with `--signoff` and explicitly specify the author on the command: `GitHub Copilot <bot@xn--jgy.tw>`. Write the commit in English.
 12 | 
 13 | ---
 14 | 
 15 | # Project DevOps
 16 | 
 17 | This project uses GitHub for DevOps management.
 18 | 
 19 | Use `gh` CLI commands to perform DevOps tasks.
 20 | 
 21 | ***Highest-level restriction: All issue and PR operations are limited to repositories owned by jim60105 only!***
 22 | 
 23 | * **GitHub repo**: https://github.com/jim60105/standalone-civitai-shortcut
 24 | 
 25 | * **Backlog & Bugs**: All backlogs and bugs must be managed on GitHub Issues.
 26 | 
 27 |   * Each issue represents a specific backlog plan / bug reports / enhancement requests.
 28 |   * Contains implementation or bug-fix guides from project foundation to deployment
 29 |   * Each issue(backlogs) includes complete technical design and implementation details
 30 |   * Each issue(bugs) includes problem description, reproduction steps, and proposed solutions
 31 |   * Serves as task queue for ongoing maintenance and improvements
 32 | 
 33 | ## DevOps Flow
 34 | 
 35 | ### Planning Stage
 36 | 
 37 | **If we are at planning stage you shouldn't start to implement anything!**
 38 | **Planning Stage is to create a detailed development plan and create issue on GitHub using `gh issue create`**
 39 | 
 40 | 1. **Issue Creation**: Use `gh issue create --title "Issue Title" --body "Issue Description"` to create a new issue for each backlog item or bug report. Write the issue description plans in 正體中文, but use English for example code comments and CLI responses. The plan should be very detailed (try your best!). Please write that enables anyone to complete the work successfully.
 41 | 2. **Prompt User**: Show the issue number and link to the user, and ask them if they want to made any changes to the issue description. If they do, you can edit the issue description using `gh issue edit [number] --body "New Description"`.
 42 | 
 43 | ### Implementation Stage
 44 | 
 45 | **Only start to implement stage when user prompt you to do so!**
 46 | **Implementation Stage is to implement the plan step by step, following the instructions provided in the issue and submit a work report PR at last**
 47 | 
 48 | 1. **Check Current Situation**: Run `git status` to check the current status of the Git repository to ensure you are aware of any uncommitted changes or issues before proceeding with any operations. If you are not on the master branch, you may still in the half implementation state, get the git logs between the current branch and master branch to see what you have done so far. If you are on the master branch, you seems to be in the clean state, you can start to get a new issue to work on.
 49 | 2. **Get Issue Lists**: Use `gh issue list` to get the list of issues to see all backlogs and bugs. Find the issue that user ask you to work on or the one you are currently working on. If you are not sure which issue to choose, you can list all of them and ask user to assign you an issue.
 50 | 3. **Get Issue Details**: Use `gh issue view [number]` to get the details of the issue to understand the requirements and implementation plan. Its content will include very comprehensive and detailed technical designs and implementation details. Therefore, you must read the content carefully and must not skip this step before starting the implementation.
 51 | 4. **Get Issue Comments**: Use `gh issue view [number] --comments` to read the comments in the issue to understand the context and any additional requirements or discussions that have taken place. Please read it to determine whether this issue has been completed, whether further implementation is needed, or if there are still problems that need to be fixed. This step must not be skipped before starting implementation.
 52 | 5. **Get Pull Requests**: Use `gh pr list`, `gh pr view [number]`, and `gh pr view [number] --comments` to list the existing pull requests and details to check if there are any related to the issue you are working on. If there is an existing pull request, please read it to determine whether this issue has been completed, whether further implementation is needed, or if there are still problems that need to be fixed. This step must not be skipped before starting implementation.
 53 | 6. **Git Checkout**: Run `git checkout -b [branch-name]` to checkout the issue branch to start working on the code changes. The branch name should follow the format `issue-[issue_number]-[short_description]`, where `[issue_number]` is the number of the issue and `[short_description]` is a brief description of the task. Skip this step if you are already on the correct branch.
 54 | 7. **Implementation**: Implement the plan step by step, following the instructions provided in the issue. Each step should be executed in sequence, ensuring that all requirements are met and documented appropriately.
 55 | 8. **Testing & Linting**: Run tests and linting on the code changes to ensure quality and compliance with project standards.
 56 | 9. **Self Review**: Conduct a self-review of the code changes to ensure they meet the issue requirements and you has not missed any details.
 57 | 10. **Git Commit & Git Push**: Run `git commit` using the conventional commit format for the title and a brief description in the body. Always commit with `--signoff` and explicitly specify the author on the command: `Codex-CLI <bot@xn--jgy.tw>`. Write the commit in English. Link the issue number in the commit message body. Run `git push` to push the changes to the remote repository.
 58 | 11. **Create Pull Request**: Use `gh pr list` and `gh pr create` commands. ALWAYS SUBMIT PR TO `origin`, NEVER SUBMIT PR TO `upstream`. Create a pull request if there isn't already has one related to your issue using `gh pr create --title "PR Title" --body "PR Description"`. Create a comprehensive work report and use it as pull request details, detailing the work performed, code changes, and test results for the project. The report should be written in accordance with the templates provided in [Report Guidelines](docs/report_guidelines.md) and [REPORT_TEMPLATE](docs/REPORT_TEMPLATE.md). Follow the template exactly. Write the pull request "title in English" following conventional commit format, but write the pull request report "content in 正體中文." Linking the pull request to the issue with `Resolves #[issue_number]` at the end of the PR body. ALWAYS SUBMIT PR TO `origin`, NEVER SUBMIT PR TO `upstream`. ALWAYS SUBMIT PR TO `origin`, NEVER SUBMIT PR TO `upstream`. ALWAYS SUBMIT PR TO `origin`, NEVER SUBMIT PR to `upstream`.
 59 | 
 60 | ***Highest-level restriction: All issue and PR operations are limited to repositories owned by jim60105 only!***
 61 | ***Highest-level restriction: All issue and PR operations are limited to repositories owned by jim60105 only!***
 62 | ***Highest-level restriction: All issue and PR operations are limited to repositories owned by jim60105 only!***
 63 | 
 64 | ---
 65 | 
 66 | ## Project Overview
 67 | 
 68 | This project provides a **Docker containerization** for [WhisperX](https://github.com/m-bain/whisperX).
 69 | 
 70 | The project focuses on **continuous integration optimization** for building 175+ Docker images (10GB each) weekly on GitHub Free runners, emphasizing efficient docker layer caching, parallel builds, and minimal image sizes.
 71 | 
 72 | The focus of this project is on the Dockerfile and CI workflow, not on the WhisperX project itself.
 73 | 
 74 | ## Project Structure
 75 | 
 76 | ```
 77 | docker-whisperX/
 78 | ├── Dockerfile              # Main Docker build configuration (For docker compatibility)
 79 | ├── ubi.Dockerfile          # Red Hat UBI-based alternative (For podman compatibility)
 80 | ├── docker-bake.hcl         # Docker Buildx bake configuration for matrix builds
 81 | ├── load_align_model.py     # Preloads alignment models for supported languages
 82 | ├── whisperX/               # Git submodule containing WhisperX source code
 83 | │   ├── pyproject.toml      # Python package configuration
 84 | │   └── whisperx/           # Main WhisperX Python package
 85 | └── .github/
 86 |     └── workflows/          # CI/CD pipeline configurations
 87 | ```
 88 | 
 89 | ## Coding Standards and Conventions
 90 | 
 91 | ### Docker Best Practices
 92 | - Use **multi-stage builds** to minimize final image size
 93 | - Leverage **BuildKit features** like `--mount=type=cache` for dependency caching
 94 | - Apply **layer caching strategies** to optimize CI build times
 95 | - Use **ARG** variables for build-time configuration (WHISPER_MODEL, LANG, etc.)
 96 | - Follow **security best practices**: run as non-root user, minimize installed packages
 97 | - Do not use `--link` in ubi.Dockerfile, as it is not supported by Podman.
 98 | - Do not use `,z` or `,Z` in Dockerfile, as it is not supported by Docker buildx.
 99 | 
100 | ### Documentation Standards
101 | - Write documentation in English for user-facing content
102 | - Use **English** for technical comments in code and commit messages
103 | - Include **clear examples** in README files showing actual usage commands
104 | - Document **build arguments** and their acceptable values
105 | - Provide **troubleshooting guidance** for common issues
106 | 
107 | ## Key Technologies and Dependencies
108 | 
109 | ### Build Tools
110 | - **uv**: Modern Python package manager for dependency resolution (Used in Dockerfile)
111 | - **Docker Buildx**: Extended build capabilities with bake support
112 | - **GitHub Actions**: CI/CD automation for multi-architecture builds
113 | 
114 | ## Development Guidelines
115 | 
116 | ### When Working with Docker Configuration
117 | - **Dockerfile modifications**: Always test both `amd64` and `arm64` architectures
118 | - **Build arguments**: Validate that ARG values match supported languages in `load_align_model.py`
119 | - **Cache optimization**: Consider layer ordering impact on CI build performance
120 | - **Multi-stage builds**: Ensure each stage serves a clear purpose (build → no_model → load_whisper)
121 | 
122 | ### When Working with CI/CD
123 | - **Parallel builds**: Consider the large amount of build matrix impact on GitHub runner resources
124 | - **Caching strategy**: Optimize for both build time and cache storage efficiency
125 | - **Multi-architecture**: Ensure changes work correctly on both x86_64 and arm64
126 | 
127 | ## Project-Specific Conventions
128 | 
129 | ## Additional Notes for Contributors
130 | 
131 | When suggesting changes, always consider the impact on:
132 | 1. **Build time efficiency** for the CI pipeline
133 | 2. **Multi-architecture compatibility** (amd64/arm64)  
134 | 
135 | ---
136 | 
137 | When contributing to this codebase, adhere strictly to these directives to ensure consistency with the existing architectural conventions and stylistic norms.
138 | 


--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
  1 | # syntax=docker/dockerfile:1
  2 | ARG WHISPER_MODEL=base
  3 | ARG LANG=en
  4 | ARG UID=1001
  5 | ARG VERSION=EDGE
  6 | ARG RELEASE=0
  7 | 
  8 | # These ARGs are for caching stage builds in CI
  9 | # Leave them as is when building locally
 10 | ARG LOAD_WHISPER_STAGE=load_whisper
 11 | ARG NO_MODEL_STAGE=no_model
 12 | 
 13 | # When downloading diarization model with auth token, it seems that it is not respecting the TORCH_HOME env variable.
 14 | # So it is necessary to ensure that the CACHE_HOME is set to the exact same path as the default path.
 15 | # https://github.com/jim60105/docker-whisperX/issues/27
 16 | ARG CACHE_HOME=/.cache
 17 | ARG CONFIG_HOME=/.config
 18 | ARG TORCH_HOME=${CACHE_HOME}/torch
 19 | ARG HF_HOME=${CACHE_HOME}/huggingface
 20 | 
 21 | ########################################
 22 | # Base stage for amd64
 23 | ########################################
 24 | FROM docker.io/library/python:3.11-slim-bullseye AS prepare_base_amd64
 25 | 
 26 | # RUN mount cache for multi-arch: https://github.com/docker/buildx/issues/549#issuecomment-1788297892
 27 | ARG TARGETARCH
 28 | ARG TARGETVARIANT
 29 | 
 30 | WORKDIR /tmp
 31 | 
 32 | ENV NVIDIA_VISIBLE_DEVICES=all
 33 | ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility
 34 | 
 35 | ########################################
 36 | # Base stage for arm64
 37 | ########################################
 38 | FROM docker.io/library/python:3.11-slim-bullseye AS prepare_base_arm64
 39 | 
 40 | # RUN mount cache for multi-arch: https://github.com/docker/buildx/issues/549#issuecomment-1788297892
 41 | ARG TARGETARCH
 42 | ARG TARGETVARIANT
 43 | 
 44 | WORKDIR /tmp
 45 | 
 46 | # Missing dependencies for arm64 (needed for build-time and run-time)
 47 | # https://github.com/jim60105/docker-whisperX/issues/14
 48 | RUN --mount=type=cache,id=apt-$TARGETARCH$TARGETVARIANT,sharing=locked,target=/var/cache/apt \
 49 |     --mount=type=cache,id=aptlists-$TARGETARCH$TARGETVARIANT,sharing=locked,target=/var/lib/apt/lists \
 50 |     apt-get update && apt-get install -y --no-install-recommends \
 51 |     libgomp1 libsndfile1
 52 | 
 53 | # Select the base stage by target architecture
 54 | FROM prepare_base_$TARGETARCH$TARGETVARIANT AS base
 55 | 
 56 | ########################################
 57 | # Build stage
 58 | ########################################
 59 | FROM base AS build
 60 | 
 61 | # RUN mount cache for multi-arch: https://github.com/docker/buildx/issues/549#issuecomment-1788297892
 62 | ARG TARGETARCH
 63 | ARG TARGETVARIANT
 64 | 
 65 | WORKDIR /app
 66 | 
 67 | # Install uv
 68 | COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
 69 | 
 70 | ENV UV_PROJECT_ENVIRONMENT=/venv
 71 | ENV VIRTUAL_ENV=/venv
 72 | ENV UV_LINK_MODE=copy
 73 | ENV UV_PYTHON_DOWNLOADS=0
 74 | 
 75 | # Install big dependencies separately for layer caching
 76 | RUN --mount=type=cache,id=uv-$TARGETARCH$TARGETVARIANT,sharing=locked,target=/root/.cache/uv \
 77 |     uv venv --system-site-packages /venv && \
 78 |     uv pip install --no-deps --index "https://download.pytorch.org/whl/cu128" \
 79 |     "torch==2.7.1+cu128" \
 80 |     "torchaudio" \
 81 |     "triton" \
 82 |     "pyannote.audio==3.3.2"
 83 | 
 84 | # Install whisperX dependencies
 85 | RUN --mount=type=cache,id=uv-$TARGETARCH$TARGETVARIANT,sharing=locked,target=/root/.cache/uv \
 86 |     --mount=type=bind,source=whisperX/pyproject.toml,target=pyproject.toml \
 87 |     --mount=type=bind,source=whisperX/uv.lock,target=uv.lock \
 88 |     uv sync --frozen --no-dev --no-install-project --no-editable
 89 | 
 90 | # Install whisperX project
 91 | RUN --mount=type=cache,id=uv-$TARGETARCH$TARGETVARIANT,sharing=locked,target=/root/.cache/uv \
 92 |     --mount=source=whisperX,target=.,rw \
 93 |     uv sync --frozen --no-dev --no-editable
 94 | 
 95 | ########################################
 96 | # Final stage for no_model
 97 | ########################################
 98 | FROM base AS no_model
 99 | 
100 | # We don't need them anymore
101 | RUN pip3.11 uninstall -y pip wheel && \
102 |     rm -rf /root/.cache/pip
103 | 
104 | # Create user
105 | ARG UID
106 | RUN groupadd -g $UID $UID && \
107 |     useradd -l -u $UID -g $UID -m -s /bin/sh -N $UID
108 | 
109 | ARG CACHE_HOME
110 | ARG CONFIG_HOME
111 | ARG TORCH_HOME
112 | ARG HF_HOME
113 | ENV XDG_CACHE_HOME=${CACHE_HOME}
114 | ENV TORCH_HOME=${TORCH_HOME}
115 | ENV HF_HOME=${HF_HOME}
116 | 
117 | RUN install -d -m 775 -o $UID -g 0 /licenses && \
118 |     install -d -m 775 -o $UID -g 0 /root && \
119 |     install -d -m 775 -o $UID -g 0 ${CACHE_HOME} && \
120 |     install -d -m 775 -o $UID -g 0 ${CONFIG_HOME} && \
121 |     install -d -m 775 -o $UID -g 0 /nltk_data
122 | 
123 | # ffmpeg
124 | COPY --link --from=ghcr.io/jim60105/static-ffmpeg-upx:8.0 /ffmpeg /usr/local/bin/
125 | # COPY --link --from=ghcr.io/jim60105/static-ffmpeg-upx:8.0 /ffprobe /usr/local/bin/
126 | 
127 | # dumb-init
128 | COPY --link --from=ghcr.io/jim60105/static-ffmpeg-upx:8.0 /dumb-init /usr/local/bin/
129 | 
130 | # Copy licenses (OpenShift Policy)
131 | COPY --link --chown=$UID:0 --chmod=775 LICENSE /licenses/LICENSE
132 | COPY --link --chown=$UID:0 --chmod=775 whisperX/LICENSE /licenses/whisperX.LICENSE
133 | 
134 | # Copy dependencies and code (and support arbitrary uid for OpenShift best practice)
135 | # https://docs.openshift.com/container-platform/4.14/openshift_images/create-images.html#use-uid_create-images
136 | COPY --link --chown=$UID:0 --chmod=775 --from=build /venv /venv
137 | 
138 | ENV PATH="/venv/bin${PATH:+:${PATH}}"
139 | ENV PYTHONPATH="/venv/lib/python3.11/site-packages"
140 | ENV LD_LIBRARY_PATH="/venv/lib/python3.11/site-packages/nvidia/cudnn/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
141 | 
142 | # Test whisperX
143 | RUN python3 -c 'import whisperx;' && \
144 |     whisperx -h
145 | 
146 | WORKDIR /app
147 | 
148 | VOLUME [ "/app" ]
149 | 
150 | USER $UID
151 | 
152 | STOPSIGNAL SIGINT
153 | 
154 | ENTRYPOINT [ "dumb-init", "--", "/bin/sh", "-c", "whisperx \"$@\"" ]
155 | 
156 | ARG VERSION
157 | ARG RELEASE
158 | LABEL name="jim60105/docker-whisperX" \
159 |     # Authors for WhisperX
160 |     vendor="Bain, Max and Huh, Jaesung and Han, Tengda and Zisserman, Andrew" \
161 |     # Maintainer for this docker image
162 |     maintainer="jim60105" \
163 |     # Dockerfile source repository
164 |     url="https://github.com/jim60105/docker-whisperX" \
165 |     version=${VERSION} \
166 |     # This should be a number, incremented with each change
167 |     release=${RELEASE} \
168 |     io.k8s.display-name="WhisperX" \
169 |     summary="WhisperX: Time-Accurate Speech Transcription of Long-Form Audio" \
170 |     description="This is the docker image for WhisperX: Automatic Speech Recognition with Word-Level Timestamps (and Speaker Diarization) from the community. For more information about this tool, please visit the following website: https://github.com/m-bain/whisperX."
171 | 
172 | ########################################
173 | # load_whisper stage
174 | # This stage will be tagged for caching in CI.
175 | ########################################
176 | FROM ${NO_MODEL_STAGE} AS load_whisper
177 | 
178 | ARG CONFIG_HOME
179 | ARG XDG_CONFIG_HOME=${CONFIG_HOME}
180 | ARG HOME="/root"
181 | 
182 | # Preload Silero vad model
183 | RUN python3 <<EOF
184 | import torch
185 | torch.hub.load(repo_or_dir='snakers4/silero-vad',
186 |                model='silero_vad',
187 |                force_reload=False,
188 |                onnx=False,
189 |                trust_repo=True)
190 | EOF
191 | 
192 | # Preload fast-whisper
193 | ARG WHISPER_MODEL
194 | ENV WHISPER_MODEL=${WHISPER_MODEL}
195 | 
196 | # Preload fast-whisper
197 | RUN echo "Preload whisper model: ${WHISPER_MODEL}" && \
198 |     python3 -c "import faster_whisper; model = faster_whisper.WhisperModel('${WHISPER_MODEL}')"
199 | 
200 | ########################################
201 | # load_align stage
202 | ########################################
203 | FROM ${LOAD_WHISPER_STAGE} AS load_align
204 | 
205 | ARG LANG
206 | ENV LANG=${LANG}
207 | 
208 | # Preload align models
209 | RUN --mount=source=load_align_model.py,target=load_align_model.py \
210 |     for i in ${LANG}; do echo "Preload align model: $i"; python3 load_align_model.py "$i"; done
211 | 
212 | ########################################
213 | # Final stage with model
214 | ########################################
215 | FROM ${NO_MODEL_STAGE} AS final
216 | 
217 | ARG UID
218 | 
219 | ARG CACHE_HOME
220 | COPY --link --chown=$UID:0 --chmod=775 --from=load_align ${CACHE_HOME} ${CACHE_HOME}
221 | 
222 | ARG LANG
223 | ENV LANG=${LANG}
224 | ARG WHISPER_MODEL
225 | ENV WHISPER_MODEL=${WHISPER_MODEL}
226 | 
227 | # Take the first language from LANG env variable
228 | ENTRYPOINT [ "dumb-init", "--", "/bin/sh", "-c", "LANG=$(echo ${LANG} | cut -d ' ' -f1); whisperx --model \"${WHISPER_MODEL}\" --language \"${LANG}\" \"$@\"" ]
229 | 
230 | ARG VERSION
231 | ARG RELEASE
232 | LABEL version=${VERSION} \
233 |     release=${RELEASE}
234 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2023 陳鈞
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # docker-whisperX
  2 | 
  3 | [![CodeFactor](https://www.codefactor.io/repository/github/jim60105/docker-whisperx/badge)](https://www.codefactor.io/repository/github/jim60105/docker-whisperx) ![Docker Build](https://img.shields.io/github/actions/workflow/status/jim60105/docker-whisperX/04-build-matrix-images.yml?label=Docker%20Build)
  4 | 
  5 | This is the docker image for [WhisperX: Automatic Speech Recognition with Word-Level Timestamps (and Speaker Diarization)](https://github.com/m-bain/whisperX) from the community.
  6 | 
  7 | The objective of this project is to efficiently manage the continuous integration docker build workflow on the ***GitHub Free runner*** on a ***weekly basis***. Which includes building ***175*** Docker images ***in parallel***, each with a size of ***10GB.*** To ensure smooth operation, I have concentrated on utilizing docker layer caches efficiently, maximizing layer reuse, carefully managing cache read/write order to prevent any issues, and optimizing to minimize image size and build time.
  8 | 
  9 | Additionally, for my personal preference, I am dedicated to following best practices, industry standards and policies to the best of my ability.
 10 | 
 11 | Get the Dockerfile at [GitHub](https://github.com/jim60105/docker-whisperX), or pull the image from [ghcr.io](https://ghcr.io/jim60105/whisperx).
 12 | 
 13 | ## 🚀 Get your Docker ready for GPU support
 14 | 
 15 | ### Windows
 16 | 
 17 | Once you have installed **Docker Desktop**, **CUDA Toolkit**, **NVIDIA Windows Driver**, and ensured that your Docker is running with **WSL2**, you are ready to go.
 18 | 
 19 | Here is the official documentation for further reference.  
 20 | <https://docs.nvidia.com/cuda/wsl-user-guide/index.html#nvidia-compute-software-support-on-wsl-2>
 21 | <https://docs.docker.com/desktop/wsl/use-wsl/#gpu-support>
 22 | 
 23 | ### Linux, OSX
 24 | 
 25 | Install an NVIDIA GPU Driver if you do not already have one installed.  
 26 | <https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html>
 27 | 
 28 | Install the NVIDIA Container Toolkit with this guide.  
 29 | <https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html>
 30 | 
 31 | > [!TIP]  
 32 | > I have a Chinese blog about this topic:  
 33 | > [Podman GPU Configuration Notes for Fedora/RHEL](https://xn--jgy.tw/Container/configuring-gpu-in-linux-podman/)
 34 | 
 35 | ## 📦 Available Pre-built Image
 36 | 
 37 | ![GitHub Workflow Status (with event)](https://img.shields.io/github/actions/workflow/status/jim60105/docker-whisperX/04-build-matrix-images.yml?label=Docker%20Build) ![GitHub last commit (branch)](https://img.shields.io/github/last-commit/jim60105/docker-whisperX/master?label=Date)
 38 | 
 39 | > [!NOTE]  
 40 | > The WhisperX code base in these images aligns with the git submodule commit hash.  
 41 | > I have [a scheduled CI workflow](https://github.com/jim60105/docker-whisperX/actions/workflows/submodule_update.yml) runs weekly to target on [the main branch](https://github.com/m-bain/whisperX/tree/main) and rebuild all docker images.
 42 | 
 43 | ```bash
 44 | docker run --gpus all -it -v ".:/app" ghcr.io/jim60105/whisperx:base-en     -- --output_format srt audio.mp3
 45 | docker run --gpus all -it -v ".:/app" ghcr.io/jim60105/whisperx:large-v3-ja -- --output_format srt audio.mp3
 46 | docker run --gpus all -it -v ".:/app" ghcr.io/jim60105/whisperx:no_model    -- --model tiny --language en --output_format srt audio.mp3
 47 | ```
 48 | 
 49 | The image tags are formatted as `WHISPER_MODEL`-`LANG`, for example, `tiny-en`, `base-de` or `large-v3-zh`.  
 50 | Please be aware that the whisper models `*.en`,  `large-v1`, `large-v2` have been excluded as I believe they are not frequently used. If you require these models, please refer to the following section to build them on your own.
 51 | 
 52 | You can find the actual build matrix in [04-build-matrix-images.yml](.github/workflows/04-build-matrix-images.yml) and all available tags at [ghcr.io](https://github.com/jim60105/docker-whisperX/pkgs/container/whisperx/versions?filters%5Bversion_type%5D=tagged).
 53 | 
 54 | In addition, there is also a `no_model` tag that does not include any pre-downloaded models, also referred to as `latest`.
 55 | 
 56 | > Added a `distil-large-v3-en` model.  
 57 | > Only en, distil model seems to only support English.
 58 | 
 59 | ## ⚡️ Preserve the download cache for the align models when working with various languages
 60 | 
 61 | You can mount the `/.cache` to share align models between containers.  
 62 | Please use tag `no_model` (`latest`) for this scenario.
 63 | 
 64 | ```bash
 65 | docker run --gpus all -it -v ".:/app" -v whisper_cache:/.cache ghcr.io/jim60105/whisperx:latest -- --model large-v3 --language en --output_format srt audio.mp3
 66 | ```
 67 | 
 68 | ## 🛠️ Building the Docker Image
 69 | 
 70 | > [!IMPORTANT]  
 71 | > Clone the Git repository recursively to include submodules:  
 72 | > `git clone --recursive https://github.com/jim60105/docker-whisperX.git`
 73 | 
 74 | ### Build Arguments
 75 | 
 76 | The [Dockerfile](Dockerfile) builds the image contained models. It accepts two build arguments: `LANG` and `WHISPER_MODEL`.
 77 | 
 78 | - `LANG`: The language to transcribe. The default is `en`. See [supported languages in load_align_model.py](https://github.com/jim60105/docker-whisperX/blob/master/load_align_model.py).
 79 | - `WHISPER_MODEL`: The model name. The default is `base`. See [fast-whisper](https://huggingface.co/Systran) for supported models.
 80 | 
 81 | In case of multiple language alignments needed, use space separated list of languages `"LANG=pl fr en"` when building the image. Also note that WhisperX is not doing well to handle multiple languages within the same audio file. Even if you do not provide the language parameter, it will still recognize the language (or fallback to en) and use it for choosing the alignment model. Alignment models are language specific. **This instruction is simply for embedding multiple alignment models into a docker image.**
 82 | 
 83 | ### Build Command
 84 | 
 85 | > [!NOTE]  
 86 | > If you are using an earlier version of the docker client, it is necessary to [enable the BuildKit mode](https://docs.docker.com/build/buildkit/#getting-started) when building the image. This is because I used the `COPY --link` feature which enhances the build performance and was introduced in Buildx v0.8.  
 87 | > With the Docker Engine 23.0 and Docker Desktop 4.19, Buildx has become the default build client. So you won't have to worry about this when using the latest version.
 88 | 
 89 | For example, if you want to build the image with `en` language and `large-v3` model:
 90 | 
 91 | ```bash
 92 | docker build --build-arg LANG=en --build-arg WHISPER_MODEL=large-v3 -t whisperx:large-v3-en .
 93 | ```
 94 | 
 95 | If you want to build the image without any pre-downloaded models:
 96 | 
 97 | ```bash
 98 | docker build --target no_model -t whisperx:no_model .
 99 | ```
100 | 
101 | If you want to build all images at once, we have [a Docker bake file](docker-bake.hcl) available:
102 | 
103 | > [!WARNING]  
104 | > [Bake](https://docs.docker.com/build/bake/) is currently an experimental feature, and it may require additional configuration in order to function correctly.
105 | 
106 | ```bash
107 | docker buildx bake build no_model ubi-no_model
108 | ```
109 | 
110 | ### Usage Command
111 | 
112 | Mount the current directory as `/app` and run WhisperX with additional input arguments:
113 | 
114 | ```bash
115 | docker run --gpus all -it -v ".:/app" whisperx:large-v3-ja -- --output_format srt audio.mp3
116 | ```
117 | 
118 | > [!NOTE]  
119 | > Remember to prepend `--` before the arguments.  
120 | > `--model` and `--language` args are defined in Dockerfile, no need to specify.
121 | 
122 | ## ⛑️ Red Hat UBI based Image
123 | 
124 | ![Docker Build](https://img.shields.io/github/actions/workflow/status/jim60105/docker-whisperX/01-build-base-images.yml?label=Docker%20Build)
125 | 
126 | I have created an alternative [ubi.Dockerfile](ubi.Dockerfile) that is based on the **Red Hat Universal Base Image (UBI)** image, unlike the default one which used the **Python official image** as the base image. If you are a Red Hat subscriber, I believe you will find its benefits.
127 | 
128 | > [!TIP]
129 | > With the release of the Red Hat Universal Base Image (UBI), you can now take advantage of the greater reliability, security, and performance of official Red Hat container images where OCI-compliant Linux containers run - whether you're a customer or not. -- [Red Hat blog](https://www.redhat.com/en/blog/introducing-red-hat-universal-base-image)
130 | 
131 | It is important to mention that it is *NOT* necessary obtaining a license from Red Hat to use UBI, however, if you are the subscriber and runs it on RHEL/OpenShift, you may get supports from Red Hat.
132 | 
133 | Despite my initial hesitation, I made the decision not to utilize the *UBI* version as the default image. The *Python official image* has a significantly larger user base compared to *UBI*, and I believe that opting for it aligns better with public expectations. Nevertheless, I would still suggest giving the *UBI* version a try.
134 | 
135 | Please refer to [the latest vulnerability scan report](https://github.com/jim60105/docker-whisperX/actions/workflows/scan.yml?query=is%3Asuccess) from our scanning workflow artifact. You can see that the *UBI* version has fewer vulnerabilities compared to the *Python official image* version.
136 | 
137 | You can get the pre-built image at tag `ubi-no_model`. Notice that only `no_model` is available. Feel free to build your own image with the [ubi.Dockerfile](ubi.Dockerfile) for your needs. This Dockerfile supports the same build arguments as the default one.
138 | 
139 | ```bash
140 | docker run --gpus all -it -v ".:/app" ghcr.io/jim60105/whisperx:ubi-no_model -- --model tiny --language en --output_format srt audio.mp3
141 | ```
142 | 
143 | > [!WARNING]
144 | > ***DISCLAIMER***:  
145 | > I have created the image in accordance with the specifications outlined in the [Red Hat Container Certification Requirement](https://access.redhat.com/documentation/en-us/red_hat_software_certification/8.72/html/red_hat_openshift_software_certification_policy_guide/assembly-requirements-for-container-images_openshift-sw-cert-policy-introduction) but I am not going to pursue the actual [certification](https://connect.redhat.com/en/partner-with-us/red-hat-container-certification).
146 | 
147 | ## 📝 LICENSE
148 | 
149 | > The main program, WhisperX, is distributed under [the BSD-4 license](https://github.com/m-bain/whisperX/blob/main/LICENSE).  
150 | > Please consult their repository for access to the source code and license.
151 | 
152 | The Dockerfile and CI workflow files in this repository are licensed under [the MIT license](LICENSE).
153 | 
154 | ## 🌟 Star History
155 | 
156 | <a href="https://www.star-history.com/#jim60105/docker-whisperX&Date">
157 |  <picture>
158 |    <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=jim60105/docker-whisperX&type=Date&theme=dark" />
159 |    <source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=jim60105/docker-whisperX&type=Date" />
160 |    <img alt="Star History Chart" src="https://api.star-history.com/svg?repos=jim60105/docker-whisperX&type=Date" />
161 |  </picture>
162 | </a>
163 | 


--------------------------------------------------------------------------------
/docker-bake.hcl:
--------------------------------------------------------------------------------
  1 | group "default" {
  2 |   targets = ["no_model", "ubi-no_model", "build"]
  3 | }
  4 | 
  5 | variable "WHISPER_MODEL" {
  6 |   default = "base"
  7 | }
  8 | 
  9 | variable "LANG" {
 10 |   default = "en"
 11 | }
 12 | 
 13 | target "build" {
 14 |   matrix = {
 15 |     "WHISPER_MODEL" = [
 16 |       "tiny",
 17 |       "base",
 18 |       "small",
 19 |       "medium",
 20 |       "large-v3",
 21 |       "distil-large-v3"
 22 |     ]
 23 |     "LANG" = [
 24 |       "en",
 25 |       "fr",
 26 |       "de",
 27 |       "es",
 28 |       "it",
 29 |       "ja",
 30 |       "zh",
 31 |       "nl",
 32 |       "uk",
 33 |       "pt",
 34 |       "ar",
 35 |       "cs",
 36 |       "ru",
 37 |       "pl",
 38 |       "hu",
 39 |       "fi",
 40 |       "fa",
 41 |       "el",
 42 |       "tr",
 43 |       "da",
 44 |       "he",
 45 |       "vi",
 46 |       "ko",
 47 |       "ur",
 48 |       "te",
 49 |       "hi",
 50 |       "ca",
 51 |       "ml",
 52 |       "no",
 53 |       "nn",
 54 |       "sk",
 55 |       "sl",
 56 |       "hr",
 57 |       "ro",
 58 |       "eu",
 59 |       "gl",
 60 |       "ka",
 61 |       "lv",
 62 |       "tl",
 63 |     ]
 64 |   }
 65 | 
 66 |   args = {
 67 |     WHISPER_MODEL = "${WHISPER_MODEL}"
 68 |     LANG          = "${LANG}"
 69 |   }
 70 | 
 71 |   name       = "whisperx-${WHISPER_MODEL}-${LANG}"
 72 |   dockerfile = "Dockerfile"
 73 |   tags = [
 74 |     "ghcr.io/jim60105/whisperx:${WHISPER_MODEL}-${LANG}"
 75 |   ]
 76 |   platforms  = ["linux/amd64", "linux/arm64"]
 77 |   cache-from = ["type=local,mode=max,src=cache"]
 78 |   cache-to   = ["type=local,mode=max,dest=cache"]
 79 | }
 80 | 
 81 | target "no_model" {
 82 |   dockerfile = "Dockerfile"
 83 |   target     = "no_model"
 84 |   tags = [
 85 |     "ghcr.io/jim60105/whisperx:latest",
 86 |     "ghcr.io/jim60105/whisperx:no_model"
 87 |   ]
 88 |   platforms  = ["linux/amd64", "linux/arm64"]
 89 |   cache-from = ["type=local,mode=max,src=cache"]
 90 |   cache-to   = ["type=local,mode=max,dest=cache"]
 91 | }
 92 | 
 93 | target "ubi-no_model" {
 94 |   dockerfile = "ubi.Dockerfile"
 95 |   target     = "no_model"
 96 |   tags = [
 97 |     "ghcr.io/jim60105/whisperx:ubi-no_model"
 98 |   ]
 99 |   platforms  = ["linux/amd64", "linux/arm64"]
100 |   cache-from = ["type=local,mode=max,src=cache"]
101 |   cache-to   = ["type=local,mode=max,dest=cache"]
102 | }
103 | 


--------------------------------------------------------------------------------
/load_align_model.py:
--------------------------------------------------------------------------------
 1 | import sys
 2 | import torchaudio
 3 | from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
 4 | 
 5 | lang = sys.argv[1]
 6 | 
 7 | # https://github.com/m-bain/whisperX/blob/v3.1.1/whisperx/alignment.py#L21
 8 | DEFAULT_ALIGN_MODELS_TORCH = {
 9 |     "en": "WAV2VEC2_ASR_BASE_960H",
10 |     "fr": "VOXPOPULI_ASR_BASE_10K_FR",
11 |     "de": "VOXPOPULI_ASR_BASE_10K_DE",
12 |     "es": "VOXPOPULI_ASR_BASE_10K_ES",
13 |     "it": "VOXPOPULI_ASR_BASE_10K_IT",
14 | }
15 | 
16 | DEFAULT_ALIGN_MODELS_HF = {
17 |     "ja": "jonatasgrosman/wav2vec2-large-xlsr-53-japanese",
18 |     "zh": "jonatasgrosman/wav2vec2-large-xlsr-53-chinese-zh-cn",
19 |     "nl": "jonatasgrosman/wav2vec2-large-xlsr-53-dutch",
20 |     "uk": "Yehor/wav2vec2-xls-r-300m-uk-with-small-lm",
21 |     "pt": "jonatasgrosman/wav2vec2-large-xlsr-53-portuguese",
22 |     "ar": "jonatasgrosman/wav2vec2-large-xlsr-53-arabic",
23 |     "cs": "comodoro/wav2vec2-xls-r-300m-cs-250",
24 |     "ru": "jonatasgrosman/wav2vec2-large-xlsr-53-russian",
25 |     "pl": "jonatasgrosman/wav2vec2-large-xlsr-53-polish",
26 |     "hu": "jonatasgrosman/wav2vec2-large-xlsr-53-hungarian",
27 |     "fi": "jonatasgrosman/wav2vec2-large-xlsr-53-finnish",
28 |     "fa": "jonatasgrosman/wav2vec2-large-xlsr-53-persian",
29 |     "el": "jonatasgrosman/wav2vec2-large-xlsr-53-greek",
30 |     "tr": "mpoyraz/wav2vec2-xls-r-300m-cv7-turkish",
31 |     "da": "saattrupdan/wav2vec2-xls-r-300m-ftspeech",
32 |     "he": "imvladikon/wav2vec2-xls-r-300m-hebrew",
33 |     "vi": "nguyenvulebinh/wav2vec2-base-vi-vlsp2020",
34 |     "ko": "kresnik/wav2vec2-large-xlsr-korean",
35 |     "ur": "kingabzpro/wav2vec2-large-xls-r-300m-Urdu",
36 |     "te": "anuragshas/wav2vec2-large-xlsr-53-telugu",
37 |     "hi": "theainerd/Wav2Vec2-large-xlsr-hindi",
38 |     "ca": "softcatala/wav2vec2-large-xlsr-catala",
39 |     "ml": "gvs/wav2vec2-large-xlsr-malayalam",
40 |     "no": "NbAiLab/nb-wav2vec2-1b-bokmaal-v2",
41 |     "nn": "NbAiLab/nb-wav2vec2-1b-nynorsk",
42 |     "sk": "comodoro/wav2vec2-xls-r-300m-sk-cv8",
43 |     "sl": "anton-l/wav2vec2-large-xlsr-53-slovenian",
44 |     "hr": "classla/wav2vec2-xls-r-parlaspeech-hr",
45 |     "ro": "gigant/romanian-wav2vec2",
46 |     "eu": "stefan-it/wav2vec2-large-xlsr-53-basque",
47 |     "gl": "ifrz/wav2vec2-large-xlsr-galician",
48 |     "ka": "xsway/wav2vec2-large-xlsr-georgian",
49 |     "lv": "jimregan/wav2vec2-large-xlsr-latvian-cv",
50 |     "tl": "Khalsuu/filipino-wav2vec2-l-xls-r-300m-official",
51 | }
52 | 
53 | # From https://github.com/m-bain/whisperX/issues/189#issuecomment-1523392800
54 | if lang in DEFAULT_ALIGN_MODELS_TORCH:
55 |     model_name = DEFAULT_ALIGN_MODELS_TORCH[lang]
56 |     bundle = torchaudio.pipelines.__dict__[model_name]
57 |     align_model = bundle.get_model()
58 |     labels = bundle.get_labels()
59 | 
60 | elif lang in DEFAULT_ALIGN_MODELS_HF:
61 |     model_name = DEFAULT_ALIGN_MODELS_HF[lang]
62 |     processor = Wav2Vec2Processor.from_pretrained(model_name)
63 |     align_model = Wav2Vec2ForCTC.from_pretrained(model_name)
64 | else:
65 |     raise ValueError(f"Unsupported language: {lang}")
66 | 


--------------------------------------------------------------------------------
/ubi.Dockerfile:
--------------------------------------------------------------------------------
  1 | # syntax=docker/dockerfile:1
  2 | ARG WHISPER_MODEL=base
  3 | ARG LANG=en
  4 | ARG UID=1001
  5 | ARG VERSION=EDGE
  6 | ARG RELEASE=0
  7 | 
  8 | # These ARGs are for caching stage builds in CI
  9 | # Leave them as is when building locally
 10 | ARG LOAD_WHISPER_STAGE=load_whisper
 11 | ARG NO_MODEL_STAGE=no_model
 12 | 
 13 | # When downloading diarization model with auth token, it seems that it is not respecting the TORCH_HOME env variable.
 14 | # So it is necessary to ensure that the CACHE_HOME is set to the exact same path as the default path.
 15 | # https://github.com/jim60105/docker-whisperX/issues/27
 16 | ARG CACHE_HOME=/.cache
 17 | ARG CONFIG_HOME=/.config
 18 | ARG TORCH_HOME=${CACHE_HOME}/torch
 19 | ARG HF_HOME=${CACHE_HOME}/huggingface
 20 | 
 21 | ########################################
 22 | # Python stage for all bases
 23 | ########################################
 24 | FROM registry.access.redhat.com/ubi9/ubi-minimal AS ubi-python
 25 | 
 26 | # RUN mount cache for multi-arch: https://github.com/docker/buildx/issues/549#issuecomment-1788297892
 27 | ARG TARGETARCH
 28 | ARG TARGETVARIANT
 29 | 
 30 | ENV PYTHON_VERSION=3.11
 31 | ENV PYTHONUNBUFFERED=1
 32 | ENV PYTHONIOENCODING=UTF-8
 33 | 
 34 | RUN --mount=type=cache,id=dnf-$TARGETARCH$TARGETVARIANT,sharing=locked,target=/var/cache/dnf \
 35 |     microdnf -y upgrade --refresh --best --nodocs --noplugins --setopt=install_weak_deps=0 && \
 36 |     microdnf -y install --setopt=install_weak_deps=0 --setopt=tsflags=nodocs \
 37 |     python3.11
 38 | RUN ln -s /usr/bin/python3.11 /usr/bin/python3 && \
 39 |     ln -s /usr/bin/python3.11 /usr/bin/python
 40 | 
 41 | ########################################
 42 | # Base stage for amd64
 43 | ########################################
 44 | FROM ubi-python AS prepare_base_amd64
 45 | 
 46 | # RUN mount cache for multi-arch: https://github.com/docker/buildx/issues/549#issuecomment-1788297892
 47 | ARG TARGETARCH
 48 | ARG TARGETVARIANT
 49 | 
 50 | WORKDIR /tmp
 51 | 
 52 | ENV NVIDIA_VISIBLE_DEVICES=all
 53 | ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility
 54 | 
 55 | ########################################
 56 | # Base stage for arm64
 57 | ########################################
 58 | FROM ubi-python AS prepare_base_arm64
 59 | 
 60 | # RUN mount cache for multi-arch: https://github.com/docker/buildx/issues/549#issuecomment-1788297892
 61 | ARG TARGETARCH
 62 | ARG TARGETVARIANT
 63 | 
 64 | WORKDIR /tmp
 65 | 
 66 | # Missing dependencies for arm64
 67 | # https://github.com/jim60105/docker-whisperX/issues/14
 68 | RUN --mount=type=cache,id=dnf-$TARGETARCH$TARGETVARIANT,sharing=locked,target=/var/cache/dnf \
 69 |     microdnf -y install --setopt=install_weak_deps=0 --setopt=tsflags=nodocs \
 70 |     libgomp libsndfile
 71 | 
 72 | # Select the base stage by target architecture
 73 | FROM prepare_base_$TARGETARCH$TARGETVARIANT AS base
 74 | 
 75 | ########################################
 76 | # Build stage
 77 | ########################################
 78 | FROM base AS build
 79 | 
 80 | # RUN mount cache for multi-arch: https://github.com/docker/buildx/issues/549#issuecomment-1788297892
 81 | ARG TARGETARCH
 82 | ARG TARGETVARIANT
 83 | 
 84 | WORKDIR /app
 85 | 
 86 | # Install uv
 87 | COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
 88 | 
 89 | ENV UV_PROJECT_ENVIRONMENT=/venv
 90 | ENV VIRTUAL_ENV=/venv
 91 | ENV UV_LINK_MODE=copy
 92 | ENV UV_PYTHON_DOWNLOADS=0
 93 | 
 94 | # Install torch separately as required
 95 | RUN --mount=type=cache,id=uv-$TARGETARCH$TARGETVARIANT,sharing=locked,target=/root/.cache/uv \
 96 |     uv venv --system-site-packages /venv && \
 97 |     uv pip install --no-deps --index "https://download.pytorch.org/whl/cu128" \
 98 |     "torch==2.7.1+cu128" \
 99 |     "torchaudio" \
100 |     "triton" \
101 |     "pyannote.audio==3.3.2"
102 | 
103 | # Install whisperX dependencies
104 | RUN --mount=type=cache,id=uv-$TARGETARCH$TARGETVARIANT,sharing=locked,target=/root/.cache/uv \
105 |     --mount=type=bind,source=whisperX/pyproject.toml,target=pyproject.toml \
106 |     --mount=type=bind,source=whisperX/uv.lock,target=uv.lock \
107 |     uv sync --frozen --no-dev --no-install-project --no-editable
108 | 
109 | # Install whisperX project
110 | RUN --mount=type=cache,id=uv-$TARGETARCH$TARGETVARIANT,sharing=locked,target=/root/.cache/uv \
111 |     --mount=source=whisperX,target=.,rw \
112 |     uv sync --frozen --no-dev --no-editable
113 | 
114 | ########################################
115 | # Final stage for no_model
116 | ########################################
117 | FROM base AS no_model
118 | 
119 | ARG CACHE_HOME
120 | ARG CONFIG_HOME
121 | ARG TORCH_HOME
122 | ARG HF_HOME
123 | ENV XDG_CACHE_HOME=${CACHE_HOME}
124 | ENV TORCH_HOME=${TORCH_HOME}
125 | ENV HF_HOME=${HF_HOME}
126 | 
127 | ARG UID
128 | RUN install -d -m 775 -o $UID -g 0 /licenses && \
129 |     install -d -m 775 -o $UID -g 0 /root && \
130 |     install -d -m 775 -o $UID -g 0 ${CACHE_HOME} && \
131 |     install -d -m 775 -o $UID -g 0 ${CONFIG_HOME} && \
132 |     install -d -m 775 -o $UID -g 0 /nltk_data
133 | 
134 | # ffmpeg
135 | COPY --from=ghcr.io/jim60105/static-ffmpeg-upx:8.0 /ffmpeg /usr/local/bin/
136 | # COPY --from=ghcr.io/jim60105/static-ffmpeg-upx:8.0 /ffprobe /usr/local/bin/
137 | 
138 | # dumb-init
139 | COPY --from=ghcr.io/jim60105/static-ffmpeg-upx:8.0 /dumb-init /usr/local/bin/
140 | 
141 | # Copy licenses (OpenShift Policy)
142 | COPY --chown=$UID:0 --chmod=775 LICENSE /licenses/LICENSE
143 | COPY --chown=$UID:0 --chmod=775 whisperX/LICENSE /licenses/whisperX.LICENSE
144 | 
145 | # Copy dependencies and code (and support arbitrary uid for OpenShift best practice)
146 | # https://docs.openshift.com/container-platform/4.14/openshift_images/create-images.html#use-uid_create-images
147 | COPY --chown=$UID:0 --chmod=775 --from=build /venv /venv
148 | 
149 | ENV PATH="/venv/bin${PATH:+:${PATH}}"
150 | ENV PYTHONPATH="/venv/lib/python3.11/site-packages"
151 | ENV LD_LIBRARY_PATH="/venv/lib/python3.11/site-packages/nvidia/cudnn/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
152 | 
153 | # Test whisperX
154 | RUN python3 -c 'import whisperx;' && \
155 |     whisperx -h
156 | 
157 | WORKDIR /app
158 | 
159 | VOLUME [ "/app" ]
160 | 
161 | USER $UID
162 | 
163 | STOPSIGNAL SIGINT
164 | 
165 | ENTRYPOINT [ "dumb-init", "--", "/bin/sh", "-c", "whisperx \"$@\"" ]
166 | 
167 | ARG VERSION
168 | ARG RELEASE
169 | LABEL name="jim60105/docker-whisperX" \
170 |     # Authors for WhisperX
171 |     vendor="Bain, Max and Huh, Jaesung and Han, Tengda and Zisserman, Andrew" \
172 |     # Maintainer for this docker image
173 |     maintainer="jim60105" \
174 |     # Dockerfile source repository
175 |     url="https://github.com/jim60105/docker-whisperX" \
176 |     version=${VERSION} \
177 |     # This should be a number, incremented with each change
178 |     release=${RELEASE} \
179 |     io.k8s.display-name="WhisperX" \
180 |     summary="WhisperX: Time-Accurate Speech Transcription of Long-Form Audio" \
181 |     description="This is the docker image for WhisperX: Automatic Speech Recognition with Word-Level Timestamps (and Speaker Diarization) from the community. For more information about this tool, please visit the following website: https://github.com/m-bain/whisperX."
182 | 
183 | ########################################
184 | # load_whisper stage
185 | # This stage will be tagged for caching in CI.
186 | ########################################
187 | FROM ${NO_MODEL_STAGE} AS load_whisper
188 | 
189 | ARG CONFIG_HOME
190 | ARG XDG_CONFIG_HOME=${CONFIG_HOME}
191 | ARG HOME="/root"
192 | 
193 | # Preload Silero vad model
194 | RUN python3 <<EOF
195 | import torch
196 | torch.hub.load(repo_or_dir='snakers4/silero-vad',
197 |                model='silero_vad',
198 |                force_reload=False,
199 |                onnx=False,
200 |                trust_repo=True)
201 | EOF
202 | 
203 | ARG WHISPER_MODEL
204 | ENV WHISPER_MODEL=${WHISPER_MODEL}
205 | 
206 | # Preload fast-whisper
207 | RUN echo "Preload whisper model: ${WHISPER_MODEL}" && \
208 |     python3 -c "import faster_whisper; model = faster_whisper.WhisperModel('${WHISPER_MODEL}')"
209 | 
210 | ########################################
211 | # load_align stage
212 | ########################################
213 | FROM ${LOAD_WHISPER_STAGE} AS load_align
214 | 
215 | ARG LANG
216 | ENV LANG=${LANG}
217 | 
218 | # Preload align models
219 | RUN --mount=source=load_align_model.py,target=load_align_model.py \
220 |     for i in ${LANG}; do echo "Preload align model: $i"; python3 load_align_model.py "$i"; done
221 | 
222 | ########################################
223 | # Final stage with model
224 | ########################################
225 | FROM ${NO_MODEL_STAGE} AS final
226 | 
227 | ARG UID
228 | 
229 | ARG CACHE_HOME
230 | COPY --chown=$UID:0 --chmod=775 --from=load_align ${CACHE_HOME} ${CACHE_HOME}
231 | 
232 | ARG LANG
233 | ENV LANG=${LANG}
234 | ARG WHISPER_MODEL
235 | ENV WHISPER_MODEL=${WHISPER_MODEL}
236 | 
237 | # Take the first language from LANG env variable
238 | ENTRYPOINT [ "dumb-init", "--", "/bin/sh", "-c", "LANG=$(echo ${LANG} | cut -d ' ' -f1); whisperx --model \"${WHISPER_MODEL}\" --language \"${LANG}\" \"$@\"" ]
239 | 
240 | ARG VERSION
241 | ARG RELEASE
242 | LABEL version=${VERSION} \
243 |     release=${RELEASE}
244 | 


--------------------------------------------------------------------------------