├── data
    └── .gitkeep
├── submission_src
    └── .gitkeep
├── dev-requirements.txt
├── examples
    ├── transcription
    │   ├── assets
    │   │   └── .gitkeep
    │   └── main.py
    └── random
    │   └── main.py
├── pyproject.toml
├── runtime
    ├── apt.txt
    ├── Dockerfile-lock
    ├── tests
    │   └── test_packages.py
    ├── pixi.toml
    ├── entrypoint.sh
    └── Dockerfile
├── MAINTAINERS.md
├── .dockerignore
├── CHANGELOG.md
├── LICENSE
├── .github
    └── workflows
    │   └── build.yml
├── .gitignore
├── Makefile
└── README.md


/data/.gitkeep:
--------------------------------------------------------------------------------
1 | 


--------------------------------------------------------------------------------
/submission_src/.gitkeep:
--------------------------------------------------------------------------------
1 | 


--------------------------------------------------------------------------------
/dev-requirements.txt:
--------------------------------------------------------------------------------
1 | ruff
2 | 


--------------------------------------------------------------------------------
/examples/transcription/assets/.gitkeep:
--------------------------------------------------------------------------------
1 | 


--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
1 | [tool.ruff]
2 | line-length = 99
3 | 


--------------------------------------------------------------------------------
/runtime/apt.txt:
--------------------------------------------------------------------------------
1 | curl
2 | ffmpeg
3 | libxml2
4 | tzdata
5 | wget
6 | zip
7 | 


--------------------------------------------------------------------------------
/MAINTAINERS.md:
--------------------------------------------------------------------------------
1 | # Maintainer notes
2 | 
3 | This file documents notes for maintainers of the repository.
4 | 


--------------------------------------------------------------------------------
/.dockerignore:
--------------------------------------------------------------------------------
1 | # Ignore everything by default
2 | *
3 | 
4 | # Whitelist specific files/directories
5 | !runtime/


--------------------------------------------------------------------------------
/runtime/Dockerfile-lock:
--------------------------------------------------------------------------------
1 | FROM --platform=linux/amd64 ghcr.io/prefix-dev/pixi:0.34.0-jammy
2 | 
3 | USER root
4 | 
5 | RUN mkdir -p /tmp
6 | WORKDIR /tmp
7 | 
8 | ENTRYPOINT ["pixi", "ls", "--manifest-path", "pixi.toml", "--platform", "linux-64", "-v"]
9 | 


--------------------------------------------------------------------------------
/CHANGELOG.md:
--------------------------------------------------------------------------------
 1 | # Changelog
 2 | 
 3 | All notable changes to this project will be documented in this file.
 4 | 
 5 | The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 6 | and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 7 | 
 8 | ## 2024-11-05
 9 | 
10 | ### Added
11 | 
12 | - Initial commit
13 | 
14 | 


--------------------------------------------------------------------------------
/runtime/tests/test_packages.py:
--------------------------------------------------------------------------------
 1 | import importlib
 2 | import subprocess
 3 | 
 4 | import pytest
 5 | 
 6 | packages = [
 7 |     "numpy",
 8 |     "pandas",
 9 |     "scipy",
10 |     "sklearn",
11 |     "torch",
12 |     "torchaudio",
13 |     "transformers",
14 |     "whisper",
15 |     "speechbrain",
16 | ]
17 | 
18 | 
19 | def is_gpu_available():
20 |     try:
21 |         return subprocess.check_call(["nvidia-smi"]) == 0
22 | 
23 |     except FileNotFoundError:
24 |         return False
25 | 
26 | 
27 | GPU_AVAILABLE = is_gpu_available()
28 | 
29 | 
30 | @pytest.mark.parametrize("package_name", packages, ids=packages)
31 | def test_import(package_name):
32 |     """Test that certain dependencies are importable."""
33 |     importlib.import_module(package_name)
34 | 
35 | 
36 | @pytest.mark.skipif(not GPU_AVAILABLE, reason="No GPU available")
37 | def test_allocate_torch():
38 |     import torch
39 | 
40 |     assert torch.cuda.is_available()
41 | 
42 |     torch.zeros(1).cuda()
43 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2023 DrivenData
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/runtime/pixi.toml:
--------------------------------------------------------------------------------
 1 | [project]
 2 | name = "literacy-screening"
 3 | channels = ["conda-forge", "pytorch"]
 4 | platforms = ["linux-64"]
 5 | 
 6 | # conda package dependencies
 7 | [dependencies]
 8 | accelerate = "1.0.1"
 9 | einops = "0.8.0"
10 | langchain = "0.3.*"
11 | librosa = "0.10.2.post1"
12 | loguru = "0.7.*"
13 | numpy = "1.26.*"
14 | pandas = "2.2.3"
15 | pytest = "8.3.3"
16 | python = "3.12.7"
17 | pytorch = {version = "2.4.1", channel = "pytorch"}
18 | scikit-learn = "1.5.*"
19 | scipy = "1.14.*"
20 | torchaudio = {version = "2.4.1", channel = "pytorch"}
21 | torchvision = {version = "0.19.1", channel = "pytorch"}
22 | transformers = "4.46.*"
23 | tqdm = "4.66.*"
24 | xgboost = "2.1.*"
25 | 
26 | [pypi-dependencies]
27 | speechbrain = "==1.0.1"
28 | openai-whisper = "==20240930"
29 | opensmile = "==2.5.0"
30 | 
31 | [feature.cuda]
32 | platforms = ["linux-64"]
33 | channels = ["nvidia", {channel = "pytorch", priority = -1}]
34 | system-requirements = {cuda = "12.1"}
35 | 
36 | [feature.cuda.dependencies]
37 | pytorch-cuda = {version = "12.1.*", channel = "pytorch"}
38 | 
39 | [feature.cpu]
40 | platforms = ["linux-64"]
41 | 
42 | [feature.cuda.tasks]
43 | check_cuda = 'python -c "import torch; print(torch.cuda.is_available())"'
44 | 
45 | [environments]
46 | cpu = ["cpu"]
47 | gpu = ["cuda"]
48 | 


--------------------------------------------------------------------------------
/examples/random/main.py:
--------------------------------------------------------------------------------
 1 | """This is an example submission that just generates random predictions."""
 2 | 
 3 | from pathlib import Path
 4 | 
 5 | import librosa
 6 | from loguru import logger
 7 | import numpy as np
 8 | import pandas as pd
 9 | 
10 | DATA_DIR = Path("data/")
11 | 
12 | 
13 | def main():
14 |     # load the two csvs in the data directory
15 |     df = pd.read_csv(DATA_DIR / "submission_format.csv", index_col="filename")
16 |     metadata = pd.read_csv(DATA_DIR / "test_metadata.csv", index_col="filename")
17 | 
18 |     # set random state for a reproducible submission since we're generating random probabilities
19 |     rng = np.random.RandomState(99)
20 | 
21 |     # iterate over audio files
22 |     scores = []
23 |     for file in df.index:
24 |         logger.info(f"Loading {file}")
25 |         audio, sr = librosa.load(DATA_DIR / file)
26 | 
27 |         # since this is a dummy submission, just assign a random number between 0 and 1
28 |         scores.append(rng.random())
29 | 
30 |     # write the scores to score column in the submission format
31 |     df["score"] = scores
32 | 
33 |     # write out predictions to submission.csv in the main directory
34 |     logger.info("Writing out submission.csv")
35 |     df.to_csv("submission.csv")
36 | 
37 | 
38 | if __name__ == "__main__":
39 |     main()
40 | 


--------------------------------------------------------------------------------
/runtime/entrypoint.sh:
--------------------------------------------------------------------------------
 1 | #!/bin/bash
 2 | 
 3 | set -euxo pipefail
 4 | 
 5 | main () {
 6 |     expected_filename=main.py
 7 | 
 8 |     cd /code_execution
 9 | 
10 |     submission_files=$(zip -sf ./submission/submission.zip)
11 |     if ! grep -q ${expected_filename}<<<$submission_files; then
12 |         echo "Submission zip archive must include $expected_filename"
13 |     return 1
14 |     fi
15 | 
16 |     echo "Unpacking submission"
17 |     unzip ./submission/submission.zip -d ./
18 | 
19 |     ls -alh
20 | 
21 |     if $IS_SMOKE_TEST; then
22 |         echo "Running smoke test"
23 |         pixi run -e $CPU_OR_GPU python main.py
24 |     else
25 |         echo "Running submission using $CPU_OR_GPU"
26 |         pixi run -e $CPU_OR_GPU python main.py &> "/code_execution/submission/private_log.txt"
27 |     fi
28 | 
29 |     echo "Exporting submission.csv result..."
30 | 
31 |     # Valid scripts must create a "submission.csv" file within the same directory as main
32 |     if [ -f "submission.csv" ]
33 |     then
34 |         echo "Script completed its run."
35 |         cp submission.csv ./submission/submission.csv
36 |     else
37 |         echo "ERROR: Script did not produce a submission.csv file in the main directory."
38 |         exit_code=1
39 |     fi
40 | }
41 | 
42 | main |& tee "/code_execution/submission/log.txt"
43 | exit_code=${PIPESTATUS[0]}
44 | 
45 | # Copy for terminationMessagePath
46 | cp /code_execution/submission/log.txt /tmp/log
47 | 
48 | exit $exit_code
49 | 


--------------------------------------------------------------------------------
/runtime/Dockerfile:
--------------------------------------------------------------------------------
 1 | FROM --platform=linux/amd64 ghcr.io/prefix-dev/pixi:0.34.0-jammy-cuda-12.1.1
 2 | 
 3 | USER root
 4 | 
 5 | ARG CPU_OR_GPU=gpu
 6 | 
 7 | ENV DEBIAN_FRONTEND=noninteractive \
 8 |     LANG=C.UTF-8  \
 9 |     LC_ALL=C.UTF-8 \
10 |     PYTHONUNBUFFERED=1 \
11 |     SHELL=/bin/bash
12 | 
13 | # Create user andset permissions
14 | ENV RUNTIME_USER=runtimeuser
15 | ENV RUNTIME_UID=1000
16 | ENV RUNTIME_GID=1000
17 | ENV CPU_OR_GPU=$CPU_OR_GPU
18 | ENV CONDA_OVERRIDE_CUDA="12.1"
19 | 
20 | RUN echo "Creating ${RUNTIME_USER} user..." \
21 |     && groupadd --gid ${RUNTIME_GID} ${RUNTIME_USER}  \
22 |     && useradd --create-home --gid ${RUNTIME_GID} --no-log-init --uid ${RUNTIME_UID} ${RUNTIME_USER}
23 | 
24 | COPY apt.txt apt.txt
25 | RUN apt-get update --fix-missing \
26 |     && apt-get install -y apt-utils 2> /dev/null \
27 |     && xargs -a apt.txt apt-get install -y \
28 |     && apt-get clean \
29 |     && rm -rf /var/lib/apt/lists/* /apt.txt
30 | 
31 | # Set up code execution working directory
32 | RUN mkdir /code_execution
33 | RUN chown -R ${RUNTIME_USER}:${RUNTIME_USER} /code_execution
34 | WORKDIR /code_execution
35 | 
36 | # Switch to runtime user
37 | USER ${RUNTIME_USER}
38 | 
39 | COPY pixi.lock ./pixi.lock
40 | COPY pixi.toml ./pixi.toml
41 | 
42 | RUN pixi install -e ${CPU_OR_GPU} --frozen \
43 |     && pixi clean cache --yes \
44 |     && pixi info
45 | 
46 | COPY entrypoint.sh /entrypoint.sh
47 | COPY --chown=${RUNTIME_USER}:${RUNTIME_USER} tests ./tests
48 | 
49 | CMD ["bash", "/entrypoint.sh"]
50 | 


--------------------------------------------------------------------------------
/examples/transcription/main.py:
--------------------------------------------------------------------------------
 1 | """This is an example submission that uses a pretrained model to generate transcriptions.
 2 | Note: for this submission to work, you must download the whisper model to the assets/ dir first."""
 3 | 
 4 | import string
 5 | from pathlib import Path
 6 | 
 7 | from loguru import logger
 8 | import numpy as np
 9 | import pandas as pd
10 | import torch
11 | import whisper
12 | 
13 | DATA_DIR = Path("data/")
14 | 
15 | 
16 | def download_whisper_model(download_root="assets"):
17 |     """Code to download model locally so we can include it in our submission"""
18 |     whisper.load_model("turbo", download_root=download_root)
19 | 
20 | 
21 | def clean_column(col: pd.Series):
22 |     return col.str.lower().str.strip().replace(f"[{string.punctuation}]", "", regex=True)
23 | 
24 | 
25 | def main():
26 |     # load the metadata that has the expected text for each audio file
27 |     df = pd.read_csv(DATA_DIR / "test_metadata.csv", index_col="filename")
28 | 
29 |     # load whisper model and put on GPU if available
30 |     device = "cuda" if torch.cuda.is_available() else "cpu"
31 |     model = whisper.load_model("assets/large-v3-turbo.pt").to(device)
32 | 
33 |     # iterate over audio files and get transcribed text
34 |     transcribed_texts = []
35 |     for file in df.index:
36 |         logger.info(f"Transcribing {file}")
37 |         # set temperature at 0 for reproducible results
38 |         result = model.transcribe(str(DATA_DIR / file), language="english", temperature=0)
39 |         transcribed_texts.append(result["text"])
40 | 
41 |     df["transcribed_text"] = transcribed_texts
42 | 
43 |     # clean columns to avoid false mismatches
44 |     df["expected_text"] = clean_column(df.expected_text)
45 |     df["transcribed_text"] = clean_column(df.transcribed_text)
46 | 
47 |     # score = 1 if transcribed text matches expected text
48 |     # score = 0.5 if transcription doesn't match; avoids penalizing confident but wrong
49 |     df["score"] = np.where(df.transcribed_text == df.expected_text, 1.0, 0.5)
50 | 
51 |     # ensure index matches submission format
52 |     sub_format = pd.read_csv(DATA_DIR / "submission_format.csv", index_col="filename")
53 |     preds = df[["score"]].loc[sub_format.index]
54 | 
55 |     # write out predictions to submission.csv in the main directory
56 |     logger.info("Writing out submission.csv")
57 |     preds.to_csv("submission.csv")
58 | 
59 | 
60 | if __name__ == "__main__":
61 |     main()
62 | 


--------------------------------------------------------------------------------
/.github/workflows/build.yml:
--------------------------------------------------------------------------------
 1 | 
 2 | name: build
 3 | 
 4 | on:
 5 |   push:
 6 |     branches: [main]
 7 |     paths: ["runtime/**", ".github/workflows/build.yml"]
 8 |   pull_request:
 9 |     paths: ["runtime/**", ".github/workflows/build.yml"]
10 |   workflow_dispatch:
11 |     inputs:
12 |       publishDev:
13 |         description: 'Publish dev image as cpu-{sha} and gpu-{sha}'
14 |         required: true
15 |         default: false
16 |         type: boolean
17 | 
18 | permissions:
19 |   id-token: write
20 |   contents: read
21 | 
22 | jobs:
23 |   build:
24 |     name: Build
25 |     runs-on: ubuntu-latest
26 |     strategy:
27 |       matrix:
28 |         proc: ["cpu", "gpu"]
29 |     env:
30 |       SHOULD_PUBLISH_LATEST: ${{ github.ref == 'refs/heads/main' && vars.PUBLISH_LATEST_ON_MAIN != '' }}
31 |       SHOULD_PUBLISH: |
32 |         ${{
33 |           github.ref == 'refs/heads/main' && vars.PUBLISH_LATEST_ON_MAIN != ''
34 |           || github.event_name == 'workflow_dispatch' && inputs.publishDev
35 |         }}
36 | 
37 |       LOGIN_SERVER: literacyscreening.azurecr.io
38 |       IMAGE: literacy-screening-competition
39 | 
40 |       SHA_TAG: ${{ matrix.proc }}-${{ github.sha }}
41 |       LATEST_TAG: ${{ matrix.proc }}-latest
42 | 
43 |     steps:
44 |       - name: Remove unwanted software
45 |         run: |
46 |             echo "Available storage before:"
47 |             sudo df -h
48 |             echo
49 |             sudo rm -rf /usr/share/dotnet
50 |             sudo rm -rf /usr/local/lib/android
51 |             sudo rm -rf /opt/ghc
52 |             sudo rm -rf /opt/hostedtoolcache/CodeQL
53 |             echo "Available storage after:"
54 |             sudo df -h
55 |             echo
56 | 
57 |       - uses: actions/checkout@v4
58 | 
59 |       - name: Build Image
60 |         run: |
61 |           docker build runtime \
62 |             --build-arg CPU_OR_GPU=${{ matrix.proc }} \
63 |             --tag $LOGIN_SERVER/$IMAGE:$SHA_TAG \
64 |             ${{ fromJson(env.SHOULD_PUBLISH_LATEST) && '--tag $LOGIN_SERVER/$IMAGE:$LATEST_TAG' || '' }}
65 | 
66 |       - name: Check image size
67 |         run: |
68 |           docker image list $LOGIN_SERVER/$IMAGE --format "{{.Tag}}: {{.Size}}" | tee -a $GITHUB_STEP_SUMMARY
69 | 
70 |       - name: Tests packages in container
71 |         run: |
72 |           docker run --network none \
73 |             $LOGIN_SERVER/$IMAGE:$SHA_TAG \
74 |             /code_execution/.pixi/envs/${{ matrix.proc }}/bin/python -m pytest tests
75 | 
76 |       - name: Log into Azure
77 |         if: ${{ fromJson(env.SHOULD_PUBLISH) }}
78 |         uses: azure/login@v1
79 |         with:
80 |           client-id: ${{secrets.AZURE_CLIENT_ID}}
81 |           tenant-id: ${{secrets.AZURE_TENANT_ID}}
82 |           subscription-id: ${{secrets.AZURE_SUBSCRIPTION_ID}}
83 | 
84 |       - name: Log into ACR with Docker
85 |         if: ${{ fromJson(env.SHOULD_PUBLISH) }}
86 |         uses: azure/docker-login@v1
87 |         with:
88 |           login-server: ${{ env.LOGIN_SERVER }}
89 |           username: ${{ secrets.REGISTRY_USERNAME }}
90 |           password: ${{ secrets.REGISTRY_PASSWORD }}
91 | 
92 |       - name: Push image to ACR
93 |         if: ${{ fromJson(env.SHOULD_PUBLISH) }}
94 |         run: |
95 |           docker push $LOGIN_SERVER/$IMAGE --all-tags
96 | 
97 | 


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
  1 | data/*
  2 | submission/*
  3 | submission_src/*
  4 | !.gitkeep
  5 | 
  6 | ## Python.gitignore
  7 | ## https://github.com/github/gitignore/blob/4488915eec0b3a45b5c63ead28f286819c0917de/Python.gitignore
  8 | 
  9 | # Byte-compiled / optimized / DLL files
 10 | __pycache__/
 11 | *.py[cod]
 12 | *$py.class
 13 | 
 14 | # C extensions
 15 | *.so
 16 | 
 17 | # Distribution / packaging
 18 | .Python
 19 | build/
 20 | develop-eggs/
 21 | dist/
 22 | downloads/
 23 | eggs/
 24 | .eggs/
 25 | lib/
 26 | lib64/
 27 | parts/
 28 | sdist/
 29 | var/
 30 | wheels/
 31 | share/python-wheels/
 32 | *.egg-info/
 33 | .installed.cfg
 34 | *.egg
 35 | MANIFEST
 36 | 
 37 | # PyInstaller
 38 | #  Usually these files are written by a python script from a template
 39 | #  before PyInstaller builds the exe, so as to inject date/other infos into it.
 40 | *.manifest
 41 | *.spec
 42 | 
 43 | # Installer logs
 44 | pip-log.txt
 45 | pip-delete-this-directory.txt
 46 | 
 47 | # Unit test / coverage reports
 48 | htmlcov/
 49 | .tox/
 50 | .nox/
 51 | .coverage
 52 | .coverage.*
 53 | .cache
 54 | nosetests.xml
 55 | coverage.xml
 56 | *.cover
 57 | *.py,cover
 58 | .hypothesis/
 59 | .pytest_cache/
 60 | cover/
 61 | 
 62 | # Translations
 63 | *.mo
 64 | *.pot
 65 | 
 66 | # Django stuff:
 67 | *.log
 68 | local_settings.py
 69 | db.sqlite3
 70 | db.sqlite3-journal
 71 | 
 72 | # Flask stuff:
 73 | instance/
 74 | .webassets-cache
 75 | 
 76 | # Scrapy stuff:
 77 | .scrapy
 78 | 
 79 | # Sphinx documentation
 80 | docs/_build/
 81 | 
 82 | # PyBuilder
 83 | .pybuilder/
 84 | target/
 85 | 
 86 | # Jupyter Notebook
 87 | .ipynb_checkpoints
 88 | 
 89 | # IPython
 90 | profile_default/
 91 | ipython_config.py
 92 | 
 93 | # pyenv
 94 | #   For a library or package, you might want to ignore these files since the code is
 95 | #   intended to run in multiple environments; otherwise, check them in:
 96 | # .python-version
 97 | 
 98 | # pipenv
 99 | #   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
100 | #   However, in case of collaboration, if having platform-specific dependencies or dependencies
101 | #   having no cross-platform support, pipenv may install dependencies that don't work, or not
102 | #   install all needed dependencies.
103 | #Pipfile.lock
104 | 
105 | # poetry
106 | #   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
107 | #   This is especially recommended for binary packages to ensure reproducibility, and is more
108 | #   commonly ignored for libraries.
109 | #   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
110 | #poetry.lock
111 | 
112 | # pdm
113 | #   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
114 | #pdm.lock
115 | #   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
116 | #   in version control.
117 | #   https://pdm.fming.dev/#use-with-ide
118 | .pdm.toml
119 | 
120 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
121 | __pypackages__/
122 | 
123 | # Celery stuff
124 | celerybeat-schedule
125 | celerybeat.pid
126 | 
127 | # SageMath parsed files
128 | *.sage.py
129 | 
130 | # Environments
131 | **/.pixi/envs
132 | .env
133 | .venv
134 | env/
135 | venv/
136 | ENV/
137 | env.bak/
138 | venv.bak/
139 | 
140 | # Spyder project settings
141 | .spyderproject
142 | .spyproject
143 | 
144 | # Rope project settings
145 | .ropeproject
146 | 
147 | # mkdocs documentation
148 | /site
149 | 
150 | # mypy
151 | .mypy_cache/
152 | .dmypy.json
153 | dmypy.json
154 | 
155 | # Pyre type checker
156 | .pyre/
157 | 
158 | # pytype static type analyzer
159 | .pytype/
160 | 
161 | # Cython debug symbols
162 | cython_debug/
163 | 
164 | # PyCharm
165 | #  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
166 | #  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
167 | #  and can be added to the global gitignore or merged into this file.  For a more nuclear
168 | #  option (not recommended) you can uncomment the following to ignore the entire idea folder.
169 | #.idea/
170 | 
171 | # Ruff formating
172 | .ruff_cache/
173 | 
174 | # Model weights
175 | **/*.pt
176 | 


--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
  1 | #################################################################################
  2 | # Settings                                                                      #
  3 | #################################################################################
  4 | 
  5 | ifeq (, $(shell which nvidia-smi))
  6 | CPU_OR_GPU ?= cpu
  7 | else
  8 | CPU_OR_GPU ?= gpu
  9 | endif
 10 | 
 11 | BLOCK_INTERNET ?= true
 12 | PLATFORM_ARGS = --platform linux/amd64
 13 | 
 14 | TAG := ${CPU_OR_GPU}-latest
 15 | LOCAL_TAG := ${CPU_OR_GPU}-local
 16 | 
 17 | IMAGE_NAME = literacy-screening-competition
 18 | OFFICIAL_IMAGE = literacyscreening.azurecr.io/${IMAGE_NAME}
 19 | LOCAL_IMAGE = ${IMAGE_NAME}
 20 | 
 21 | # Resolve which image to use in commands. The priority is:
 22 | # 1. User-provided, e.g., SUBMISSION_IMAGE=my-image:gpu-local make test-submission
 23 | # 2. Local image, e.g., literacy-screening-competition:gpu-local
 24 | # 3. Official competition image, e.g., literacyscreening.azurecr.io/literacy-screening-competition
 25 | SUBMISSION_IMAGE ?= ${LOCAL_IMAGE}:${LOCAL_TAG}
 26 | ifeq (,$(shell docker images -q ${SUBMISSION_IMAGE}))
 27 | SUBMISSION_IMAGE = ${OFFICIAL_IMAGE}:${TAG}
 28 | endif
 29 | 
 30 | # Get the image ID
 31 | SUBMISSION_IMAGE_ID := $(shell docker images -q ${SUBMISSION_IMAGE})
 32 | 
 33 | # Name of the running container, i.e., docker run ... --name <CONTAINER_NAME>
 34 | CONTAINER_NAME ?= ${IMAGE_NAME}
 35 | 
 36 | # Enable or disable host GPU access
 37 | ifeq (${CPU_OR_GPU}, gpu)
 38 | GPU_ARGS = --gpus all
 39 | endif
 40 | 
 41 | SKIP_GPU ?= false
 42 | ifeq (${SKIP_GPU}, true)
 43 | GPU_ARGS =
 44 | endif
 45 | 
 46 | # If no TTY (for example GitHub Actions CI) no interactive tty commands for docker
 47 | ifneq (true, ${GITHUB_ACTIONS_NO_TTY})
 48 | TTY_ARGS = -it
 49 | endif
 50 | 
 51 | # Option to block or allow internet access from the submission Docker container
 52 | ifeq (true, ${BLOCK_INTERNET})
 53 | NETWORK_ARGS = --network none
 54 | endif
 55 | 
 56 | # Name of the example submission to pack when running `make pack-example`
 57 | EXAMPLE ?= random
 58 | 
 59 | .PHONY: _check_image _echo_image _submission_write_perms
 60 | 
 61 | # Give write access to the submission folder to everyone so Docker user can write when mounted
 62 | _submission_write_perms:
 63 | 	mkdir -p submission/
 64 | 	chmod -R 0777 submission/
 65 | 
 66 | 
 67 | _check_image:
 68 | # If container does not exist, error and tell user to pull or build
 69 | ifeq (${SUBMISSION_IMAGE_ID},)
 70 | 	$(error To test your submission, you must first run `make pull` (to get official container) or `make build` \
 71 | 		(to build a local version if you have changes).)
 72 | endif
 73 | 
 74 | _echo_image:
 75 | 	@echo
 76 | ifeq (,${SUBMISSION_IMAGE_ID})
 77 | 	@echo "$$(tput bold)Using image:$$(tput sgr0) ${SUBMISSION_IMAGE} (image does not exist locally)"
 78 | 	@echo
 79 | else
 80 | 	@echo "$$(tput bold)Using image:$$(tput sgr0) ${SUBMISSION_IMAGE} (${SUBMISSION_IMAGE_ID})"
 81 | 	@echo "┏"
 82 | 	@echo "┃ NAME(S)"
 83 | 
 84 | 	@docker inspect $(SUBMISSION_IMAGE_ID) --format='{{join .RepoTags "\n"}}' | awk '{print "┃ "$$0}'
 85 | 
 86 | 	@echo "└"
 87 | 	@echo
 88 | endif
 89 | ifeq (,$(shell docker images ${OFFICIAL_IMAGE} -q))
 90 | 	@echo "$$(tput bold)No official images available locally$$(tput sgr0)"
 91 | 	@echo "Run 'make pull' to download the official image."
 92 | 	@echo
 93 | else
 94 | 	@echo "$$(tput bold)Available official images:$$(tput sgr0)"
 95 | 	@echo "┏"
 96 | 	@docker images ${OFFICIAL_IMAGE} | awk '{print "┃ "$$0}'
 97 | 	@echo "└"
 98 | 	@echo
 99 | endif
100 | ifeq (,$(shell docker images ${LOCAL_IMAGE} -q))
101 | 	@echo "$$(tput bold)No local images available$$(tput sgr0)"
102 | 	@echo "Run 'make build' to build the image."
103 | 	@echo
104 | else
105 | 	@echo "$$(tput bold)Available local images:$$(tput sgr0)"
106 | 	@echo "┏"
107 | 	@docker images ${LOCAL_IMAGE} | awk '{print "┃ "$$0}'
108 | 	@echo "└"
109 | 	@echo
110 | endif
111 | 
112 | #################################################################################
113 | # Commands for building the container if you are changing the requirements      #
114 | #################################################################################
115 | .PHONY: build clean interact-container pack-example pack-submission pull test-container test-submission update-lockfile
116 | 
117 | ## Builds the container locally
118 | build:
119 | 	docker build runtime \
120 | 		--build-arg CPU_OR_GPU=${CPU_OR_GPU} \
121 | 		--tag ${LOCAL_IMAGE}:${LOCAL_TAG}
122 | 
123 | ## Updates runtime environment lockfile using Docker
124 | update-lockfile: runtime/pixi.lock
125 | 	@echo Building Docker image to generate lockfile
126 | 	docker build runtime \
127 | 		--file runtime/Dockerfile-lock \
128 | 		--tag pixi-lock:local
129 | 	@echo Running lock container
130 | 	docker run \
131 | 		${PLATFORM_ARGS} \
132 | 		--mount type=bind,source="$(shell pwd)"/runtime/pixi.toml,target=/tmp/pixi.toml \
133 | 		--mount type=bind,source="$(shell pwd)"/runtime/pixi.lock,target=/tmp/pixi.lock \
134 | 		--rm \
135 | 		pixi-lock:local
136 | 
137 | ## Ensures that your locally built image can import all the Python packages successfully when it runs
138 | test-container: _check_image _echo_image _submission_write_perms
139 | 	docker run \
140 | 		${PLATFORM_ARGS} \
141 | 		${GPU_ARGS} \
142 | 		${NETWORK_ARGS} \
143 | 		${TTY_ARGS} \
144 | 		--mount type=bind,source="$(shell pwd)"/runtime/tests,target=/tests,readonly \
145 | 		--pid host \
146 | 		${SUBMISSION_IMAGE_ID} \
147 | 		pixi run -e ${CPU_OR_GPU} python -m pytest tests
148 | 
149 | ## Open an interactive bash shell within the running container (with network access)
150 | interact-container: _check_image _echo_image _submission_write_perms
151 | 	docker run \
152 | 		${PLATFORM_ARGS} \
153 | 		${GPU_ARGS} \
154 | 		${NETWORK_ARGS} \
155 | 		--mount type=bind,source=${shell pwd}/data,target=/code_execution/data,readonly \
156 | 		--mount type=bind,source="$(shell pwd)/submission",target=/code_execution/submission \
157 | 		--shm-size 8g \
158 | 		--pid host \
159 | 		-it \
160 | 		${SUBMISSION_IMAGE_ID} \
161 | 		bash
162 | 
163 | #################################################################################
164 | # Commands for testing and debugging your submission                            #
165 | #################################################################################
166 | 
167 | ## Pulls the official container from Azure Container Registry
168 | pull:
169 | 	docker pull ${OFFICIAL_IMAGE}:${TAG}
170 | 
171 | ## Creates a submission/submission.zip file from the source code in examples
172 | pack-example:
173 | # Don't overwrite so no work is lost accidentally
174 | ifneq (,$(wildcard ./submission/submission.zip))
175 | 	$(error You already have a submission/submission.zip file. Rename or remove that file (e.g., rm submission/submission.zip).)
176 | endif
177 | 	mkdir -p submission/
178 | 	cd examples/${EXAMPLE}; zip -r ../../submission/submission.zip ./*
179 | 
180 | ## Creates a submission/submission.zip file from the source code in submission_src
181 | pack-submission:
182 | # Don't overwrite so no work is lost accidentally
183 | ifneq (,$(wildcard ./submission/submission.zip))
184 | 	$(error You already have a submission/submission.zip file. Rename or remove that file (e.g., rm submission/submission.zip).)
185 | endif
186 | # Note that the glob wildcard excludes hidden/dot files
187 | 	mkdir -p submission/
188 | 	cd submission_src; zip -r ../submission/submission.zip ./*
189 | 
190 | ## Runs container using code from `submission/submission.zip` and data from `/code_execution/data/`
191 | test-submission: _check_image _echo_image _submission_write_perms
192 | # if submission file does not exist
193 | ifeq (,$(wildcard ./submission/submission.zip))
194 | 	$(error To test your submission, you must first put a "submission.zip" file in the "submission" folder. \
195 | 	  If you want to use an example, you can run `make pack-example` first)
196 | endif
197 | 	docker run \
198 | 		${PLATFORM_ARGS} \
199 | 		${TTY_ARGS} \
200 | 		${GPU_ARGS} \
201 | 		${NETWORK_ARGS} \
202 | 		-e LOGURU_LEVEL=INFO \
203 | 		-e IS_SMOKE_TEST=true \
204 | 		--mount type=bind,source=${shell pwd}/data,target=/code_execution/data,readonly \
205 | 		--mount type=bind,source="$(shell pwd)/submission",target=/code_execution/submission \
206 | 		--shm-size 8g \
207 | 		--pid host \
208 | 		--name ${CONTAINER_NAME} \
209 | 		--rm \
210 | 		${SUBMISSION_IMAGE_ID}
211 | 
212 | ## Delete temporary Python cache and bytecode files
213 | clean:
214 | 	find . -type f -name "*.py[co]" -delete
215 | 	find . -type d -name "__pycache__" -delete
216 | 
217 | ## Format code with ruff
218 | format:
219 | 	ruff format
220 | 
221 | #################################################################################
222 | # Self Documenting Commands                                                     #
223 | #################################################################################
224 | 
225 | .DEFAULT_GOAL := help
226 | 
227 | # Inspired by <http://marmelab.com/blog/2016/02/29/auto-documented-makefile.html>
228 | # sed script explained:
229 | # /^##/:
230 | # 	* save line in hold space
231 | # 	* purge line
232 | # 	* Loop:
233 | # 		* append newline + line to hold space
234 | # 		* go to next line
235 | # 		* if line starts with doc comment, strip comment character off and loop
236 | # 	* remove target prerequisites
237 | # 	* append hold space (+ newline) to line
238 | # 	* replace newline plus comments by `---`
239 | # 	* print line
240 | # Separate expressions are necessary because labels cannot be delimited by
241 | # semicolon; see <http://stackoverflow.com/a/11799865/1968>
242 | .PHONY: help
243 | help: _echo_image
244 | 	@echo
245 | 	@echo "$$(tput bold)Available commands:$$(tput sgr0)"
246 | 	@echo
247 | 	@sed -n -e "/^## / { \
248 | 		h; \
249 | 		s/.*//; \
250 | 		:doc" \
251 | 		-e "H; \
252 | 		n; \
253 | 		s/^## //; \
254 | 		t doc" \
255 | 		-e "s/:.*//; \
256 | 		G; \
257 | 		s/\\n## /---/; \
258 | 		s/\\n/ /g; \
259 | 		p; \
260 | 	}" ${MAKEFILE_LIST} \
261 | 	| LC_ALL='C' sort --ignore-case \
262 | 	| awk -F '---' \
263 | 		-v ncol=$$(tput cols) \
264 | 		-v indent=19 \
265 | 		-v col_on="$$(tput setaf 6)" \
266 | 		-v col_off="$$(tput sgr0)" \
267 | 	'{ \
268 | 		printf "%s%*s%s ", col_on, -indent, $$1, col_off; \
269 | 		n = split($$2, words, " "); \
270 | 		line_length = ncol - indent; \
271 | 		for (i = 1; i <= n; i++) { \
272 | 			line_length -= length(words[i]) + 1; \
273 | 			if (line_length <= 0) { \
274 | 				line_length = ncol - indent - length(words[i]) - 1; \
275 | 				printf "\n%*s ", -indent, " "; \
276 | 			} \
277 | 			printf "%s ", words[i]; \
278 | 		} \
279 | 		printf "\n"; \
280 | 	}' \
281 | 	| more $(shell test $(shell uname) = Darwin && echo '--no-init --raw-control-chars')
282 | 	@echo
283 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # Goodnight Moon, Hello Early Literacy Screening
  2 | 
  3 | ![Python 3.12](https://img.shields.io/badge/Python-3.12-blue) [![Goodnight Moon, Hello Early Literacy Screening](https://img.shields.io/badge/DrivenData-Goodnight%20Moon,%20Hello%20Early%20Literacy%20Screening-white?logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAABGdBTUEAALGPC/xhBQAABBlpQ0NQa0NHQ29sb3JTcGFjZUdlbmVyaWNSR0IAADiNjVVdaBxVFD67c2cjJM5TbDSFdKg/DSUNk1Y0obS6f93dNm6WSTbaIuhk9u7OmMnOODO7/aFPRVB8MeqbFMS/t4AgKPUP2z60L5UKJdrUICg+tPiDUOiLpuuZOzOZabqx3mXufPOd75577rln7wXouapYlpEUARaari0XMuJzh4+IPSuQhIegFwahV1EdK12pTAI2Twt3tVvfQ8J7X9nV3f6frbdGHRUgcR9is+aoC4iPAfCnVct2AXr6kR8/6loe9mLotzFAxC96uOFj18NzPn6NaWbkLOLTiAVVU2qIlxCPzMX4Rgz7MbDWX6BNauuq6OWiYpt13aCxcO9h/p9twWiF823Dp8+Znz6E72Fc+ys1JefhUcRLqpKfRvwI4mttfbYc4NuWm5ERPwaQ3N6ar6YR70RcrNsHqr6fpK21iiF+54Q28yziLYjPN+fKU8HYq6qTxZzBdsS3NVry8jsEwIm6W5rxx3L7bVOe8ufl6jWay3t5RPz6vHlI9n1ynznt6Xzo84SWLQf8pZeUgxXEg4h/oUZB9ufi/rHcShADGWoa5Ul/LpKjDlsv411tpujPSwwXN9QfSxbr+oFSoP9Es4tygK9ZBqtRjI1P2i256uv5UcXOF3yffIU2q4F/vg2zCQUomDCHvQpNWAMRZChABt8W2Gipgw4GMhStFBmKX6FmFxvnwDzyOrSZzcG+wpT+yMhfg/m4zrQqZIc+ghayGvyOrBbTZfGrhVxjEz9+LDcCPyYZIBLZg89eMkn2kXEyASJ5ijxN9pMcshNk7/rYSmxFXjw31v28jDNSpptF3Tm0u6Bg/zMqTFxT16wsDraGI8sp+wVdvfzGX7Fc6Sw3UbbiGZ26V875X/nr/DL2K/xqpOB/5Ffxt3LHWsy7skzD7GxYc3dVGm0G4xbw0ZnFicUd83Hx5FcPRn6WyZnnr/RdPFlvLg5GrJcF+mr5VhlOjUSs9IP0h7QsvSd9KP3Gvc19yn3Nfc59wV0CkTvLneO+4S5wH3NfxvZq8xpa33sWeRi3Z+mWa6xKISNsFR4WcsI24VFhMvInDAhjQlHYgZat6/sWny+ePR0OYx/mp/tcvi5WAYn7sQL0Tf5VVVTpcJQpHVZvTTi+QROMJENkjJQ2VPe4V/OhIpVP5VJpEFM7UxOpsdRBD4ezpnagbQL7/B3VqW6yUurSY959AlnTOm7rDc0Vd0vSk2IarzYqlprq6IioGIbITI5oU4fabVobBe/e9I/0mzK7DxNbLkec+wzAvj/x7Psu4o60AJYcgIHHI24Yz8oH3gU484TastvBHZFIfAvg1Pfs9r/6Mnh+/dTp3MRzrOctgLU3O52/3+901j5A/6sAZ41/AaCffFUDXAvvAAAAIGNIUk0AAHomAACAhAAA+gAAAIDoAAB1MAAA6mAAADqYAAAXcJy6UTwAAABEZVhJZk1NACoAAAAIAAIBEgADAAAAAQABAACHaQAEAAAAAQAAACYAAAAAAAKgAgAEAAAAAQAAABCgAwAEAAAAAQAAABAAAAAA/iXkXAAAAVlpVFh0WE1MOmNvbS5hZG9iZS54bXAAAAAAADx4OnhtcG1ldGEgeG1sbnM6eD0iYWRvYmU6bnM6bWV0YS8iIHg6eG1wdGs9IlhNUCBDb3JlIDUuNC4wIj4KICAgPHJkZjpSREYgeG1sbnM6cmRmPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5LzAyLzIyLXJkZi1zeW50YXgtbnMjIj4KICAgICAgPHJkZjpEZXNjcmlwdGlvbiByZGY6YWJvdXQ9IiIKICAgICAgICAgICAgeG1sbnM6dGlmZj0iaHR0cDovL25zLmFkb2JlLmNvbS90aWZmLzEuMC8iPgogICAgICAgICA8dGlmZjpPcmllbnRhdGlvbj4xPC90aWZmOk9yaWVudGF0aW9uPgogICAgICA8L3JkZjpEZXNjcmlwdGlvbj4KICAgPC9yZGY6UkRGPgo8L3g6eG1wbWV0YT4KTMInWQAAAGZJREFUOBFj/HdD5j8DBYCJAr1grSzzmDRINiNFbQ8jTBPFLoAZNHA04/O8g2THguQke0aKw4ClX5uw97vS7eGhjq6aYhegG0h/PuOfohCyYoGlbw04XCgOA8bwI7PIcgEssCh2AQDqYhG4FWqALwAAAABJRU5ErkJggg==)](https://www.drivendata.org/competitions/298/literacy-screening)
  4 | 
  5 | Welcome to the runtime repository for the [Goodnight Moon, Hello Early Literacy Screening](https://www.drivendata.org/competitions/298/literacy-screening) competition on DrivenData! This repository contains a few things to help you create your code submission for this code execution competition:
  6 | 
  7 | 1. **Runtime environment specification** ([`runtime/`](./runtime/)) — the definition of the environment in which your code will run.
  8 | 2. **Example submissions** ([`examples/`](./examples/)) — simple demonstration solutions that will run successfully in the code execution runtime and output a valid submission.
  9 |     - **Random probabilities** ([`examples/random`](./examples/random/main.py/)): a dummy submission that generates a random prediction for each audio file
 10 |     - **Whisper transcription** ([`examples/transcription`](./examples/transcription/main.py/)): a baseline submission that shows how to load a model asset as part of your submission. This submission uses OpenAI's Whisper model to transcribe each audio clip and compares the transcription to the expected text. It requires that you download the model weights beforehand and include them in the `assets` directory. There's no internet access in the runtime container, so any pretrained model weights must be included as part of the submission.
 11 | 
 12 | You can use this repository to:
 13 | 
 14 | 💡 **Get started**: The example submissions provide a basic functional solution. They probably won't win you the competition, but you can use them as a guide for bringing in your own work and generating a real submission.
 15 | 
 16 | 🔧 **Test your submission**: Test your submission using a locally running version of the competition runtime to discover errors before submitting to the competition website.
 17 | 
 18 | 📦 **Request new packages in the official runtime**: Since your submission will not have general access to the internet, all dependencies must be pre-installed. If you want to use a package that is not in the runtime environment, make a pull request to this repository.
 19 | 
 20 | Changes to the repository are documented in [CHANGELOG.md](./CHANGELOG.md).
 21 | 
 22 | ---
 23 | 
 24 | #### [1. Quickstart](#quickstart)
 25 | 
 26 | - [Prerequisites](#prerequisites)
 27 | - [Setting up the data directory](#setting-up-the-data-directory)
 28 | - [Running `make` commands](#running-make-commands)
 29 | 
 30 | #### [2. Testing a submission locally](#testing-your-submission-locally)
 31 | - [Code submission format](#code-submission-format)
 32 | - [Running your submission locally](#running-your-submission-locally)
 33 | - [Smoke tests](#smoke-tests)
 34 | 
 35 | #### [3. Updating runtime packages](#updating-runtime-packages)
 36 | 
 37 | #### [4. Makefile commands](#make-commands)
 38 | 
 39 | ---
 40 | 
 41 | ## Quickstart
 42 | 
 43 | This quickstart guide will show you how to get the provided example solution running end-to-end. Once you get there, it's off to the races!
 44 | 
 45 | ### Prerequisites
 46 | 
 47 | When you make a submission on the DrivenData competition site, we run your submission inside a Docker container, a virtual operating system that allows for a consistent software environment across machines. **The best way to make sure your submission to the site will run is to first run it successfully in the container on your local machine**. For that, you'll need:
 48 | 
 49 | - A clone of this repository
 50 | - [Docker](https://docs.docker.com/get-docker/)
 51 | - At least 8 GB of free space for the Docker image
 52 | - [GNU make](https://www.gnu.org/software/make/) (optional, but useful for running the commands in the Makefile)
 53 | 
 54 | Additional requirements to run with GPU:
 55 | 
 56 | - [NVIDIA drivers](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#package-manager-installation) with CUDA 12
 57 | - [NVIDIA container toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/index.html)
 58 | 
 59 | ### Setting up the data directory
 60 | 
 61 | In the official code execution platform, `code_execution/data` will contain the test set audio files, `test_metadata.csv`, and `submission_format.csv`.
 62 | 
 63 | To test your submission locally, you should use the smoke test data from the [data download page](https://www.drivendata.org/competitions/298/literacy-screening/data/). Download `smoke.tar.gz` and then run `tar xzvf smoke.tar.gz --strip-components=1 -C data/`. This will extract the files directly into `data/` without nesting them in subdirectories. Your local `data` directory should look like:
 64 | 
 65 | ```
 66 | data
 67 | ├── bfaiol.wav
 68 | ├── czfqjg.wav
 69 | ├── fprljz.wav
 70 | ├── hgxrel.wav
 71 | ├── htfbnp.wav
 72 | ├── idjpne.wav
 73 | ├── ktvyww.wav
 74 | ├── ltbona.wav
 75 | ├── submission_format.csv
 76 | ├── test_labels.csv
 77 | └── test_metadata.csv
 78 | ```
 79 | 
 80 | Now you're ready to run your submission against this data!
 81 | 
 82 | Keep in mind, the smoke test data contains clips from the _training set_. That's why we provide the labels too. Of course, the real test set labels won't be available in the runtime container 😉
 83 | 
 84 | ### Running `make` commands
 85 | 
 86 | To test out the full execution pipeline, make sure Docker is running and then run the following commands in the terminal:
 87 | 
 88 | 1. **`make pull`** pulls the latest official Docker image from the container registry. You'll need an internet connection for this.
 89 | 1. **`make pack-example`** packages a code submission with the `main.py` contained in `examples/random/` and saves it as `submission/submission.zip`. 
 90 | 1. **`make test-submission`** will do a test run of your submission, simulating what happens during actual code execution. This command runs the Docker container with the requisite host directories mounted, and executes `main.py` to produce a submission file containing your predictions.
 91 | 
 92 | ```bash
 93 | make pull
 94 | make pack-example
 95 | make test-submission
 96 | ```
 97 | 
 98 | 🎉 **Congratulations!** You've just completed your first test run for the Goodnight Moon, Hello Early Literacy Screening Challenge. If everything worked as expected, you should see that a new submission file has been generated.
 99 | 
100 | If you were ready to make a real submission to the competition, you would upload the `submission.zip` file from step 2 above to the competition [submission page](https://www.drivendata.org/competitions/298/literacy-screening/submissions/).
101 | 
102 | To run the Whisper transcription example instead, replace the second command with `EXAMPLE=transcription make pack-example`. Just be sure to [download](https://github.com/drivendataorg/literacy-screening-runtime/blob/09cbb05aac0573d635d6d886450f20a619617612/examples/transcription/main.py#L16-L18) the Whisper model first and include it in the `assets` directory. There's no internet access in the runtime container, so any pretrained model weights must be included as part of the submission.
103 | 
104 | ## Testing your submission locally
105 | 
106 | As you develop your own submission, you'll need to know a little bit more about how your submission will be unpacked for running inference. This section contains more complete documentation for developing and testing your own submission.
107 | 
108 | ### Code submission format
109 | 
110 | Your final submission should be a zip archive named with the extension `.zip` (for example, `submission.zip`). The root level of the `submission.zip` file must contain a `main.py` which performs inference on the test audio clips and writes the predictions to a file named `submission.csv` in the same directory as `main.py`. Check out the `main.py` scripts in the [example submissions](./examples/).
111 | 
112 | ### Running your submission locally
113 | 
114 | This section provides instructions on how to run the your submission in the code execution container from your local machine. To simplify the steps, key processes have been defined in the `Makefile`. Commands from the `Makefile` are then run with `make {command_name}`. The basic steps are:
115 | 
116 | ```sh
117 | make pull
118 | make pack-submission
119 | make test-submission
120 | ```
121 | 
122 | Run `make help` for more information about the available commands as well as information on the official and built images that are available locally.
123 | 
124 | Here's the process in a bit more detail:
125 | 
126 | 1. First, make sure you have set up the [prerequisites](#prerequisites).
127 | 2. Download the official competition Docker image:
128 | 
129 |     ```sh
130 |     make pull
131 |     ```
132 | 
133 | > [!NOTE]
134 | > If you have built a local version of the runtime image with `make build`, that image will take precedence over the pulled image when using any make commands that run a container. You can explicitly use the pulled image by setting the `SUBMISSION_IMAGE` shell/environment variable to the pulled image or by deleting all locally built images.
135 | 
136 | 3. Save all of your submission files, including the required `main.py` script, in the `submission_src` folder of the runtime repository. Make sure any needed model weights and other assets are saved in `submission_src` as well.
137 | 
138 | 4. Create a `submission/submission.zip` file containing your code and model assets:
139 | 
140 |     ```sh
141 |     make pack-submission
142 |     #> mkdir -p submission/
143 |     #> cd submission_src; zip -r ../submission/submission.zip ./*
144 |     #>   adding: main.py (deflated 73%)
145 |     ```
146 | 
147 | 5. Launch an instance of the competition Docker image, and run the same inference process that will take place in the official runtime:
148 | 
149 |     ```sh
150 |     make test-submission
151 |     ```
152 | 
153 | This runs the container [entrypoint](./runtime/entrypoint.sh) script. First, it unzips `submission/submission.zip` into `/code_execution/` in the container. Then, it runs your submitted `main.py`. In the local testing setting, the final submission is saved out to the `submission/` folder on your local machine.
154 | 
155 | > [!NOTE]
156 | > Remember that `/code_execution/data` is just a mounted version of what you have saved locally in `data` so you will just be using the training files for local testing. In the official code execution platform, `/code_execution/data` will contain the actual test data.
157 | 
158 | When you run `make test-submission` the logs will be printed to the terminal and written out to `submission/log.txt`. If you run into errors, use the container logs written to `log.txt` to determine what changes you need to make for your code to execute successfully.
159 | 
160 | ### Smoke tests
161 | 
162 | When submitting on the platform, you will have the ability to submit "smoke tests." Smoke tests run on a reduced version of the train set data in order to run and debug issues more quickly. They will not be considered for prize evaluation and are intended to let you test your code for correctness. **You should test your code locally as thorougly as possible before submitting your code for smoke tests or for full evaluation.**
163 | 
164 | ## Updating runtime packages
165 | 
166 | If you want to use a package that is not in the environment, you are welcome to make a pull request to this repository. Remember, your submission will only have access to packages in this runtime repository. If you're new to the GitHub contribution workflow, check out [this guide by GitHub](https://docs.github.com/en/get-started/quickstart/contributing-to-projects).
167 | 
168 | The runtime manages dependencies using [Pixi](https://pixi.sh/latest/). Here is a good [tutorial](https://pixi.sh/latest/tutorials/python/) to get started with Pixi. The official runtime uses **Python 3.12.7**.
169 | 
170 | 1. Fork this repository.
171 | 
172 | 2. Install pixi. See [here](https://pixi.sh/latest/#installation) for installation options.
173 | 
174 | 3. Edit the `runtime/pixi.toml` file to add your new packages. We recommend starting without a specific pinned version, and then pinning to the version in the resolved `pixi.lock` file that is generated.
175 | 
176 |     - Conda-installed packages go in the `dependencies` section. These install from the [conda-forge](https://anaconda.org/conda-forge/) channel. **Installing packages with conda is strongly preferred.** Packages should only be installed using pip if they are not available in a conda channel.
177 |     - Pip-installed packages go in the `pypi-dependencies` section.
178 |     - GPU-specific dependencies go in the `features.cuda.dependencies` section, but these should be uncommon.
179 | 
180 | 4. With Docker open and running, run `make update-lockfile`. This will generate an updated `runtime/pixi.lock` from `runtime/pixi.toml` within a Docker container.
181 | 
182 | 5. Locally test that the Docker image builds successfully for both the CPU and GPU environment:
183 | 
184 |     ```sh
185 |     CPU_OR_GPU=cpu make build
186 |     CPU_OR_GPU=gpu make build
187 |     ```
188 | 
189 | 6. Commit the changes to your forked repository. Ensure that your branch includes updated versions of both `runtime/pixi.toml` and `runtime/pixi.lock`.
190 | 
191 | 7. Open a pull request from your branch to the `main` branch of this repository. Navigate to the [Pull requests](https://github.com/drivendataorg/literacy-screening-runtime/pulls) tab in this repository, and click the "New pull request" button. For more detailed instructions, check out [GitHub's help page](https://help.github.com/en/articles/creating-a-pull-request-from-a-fork).
192 | 
193 | 8. Once you open the pull request, we will use Github Actions to build the Docker images with your changes and run the tests in `runtime/tests`. For security reasons, administrators may need to approve the workflow run before it happens. Once it starts, the process can take up to 30 minutes, and may take longer if your build is queued behind others. You will see a section on the pull request page that shows the status of the tests and links to the logs ("Details"):
194 | 
195 |     ![Example appearance of Github Actions](https://s3.amazonaws.com/drivendata-public-assets/codex_github_actions_build.png)
196 | 
197 | 9. You may be asked to submit revisions to your pull request if the tests fail or if a DrivenData staff member has feedback. Pull requests won't be merged until all tests pass and the team has reviewed and approved the changes.
198 | 
199 | ## Make commands
200 | 
201 | A Makefile with several helpful shell recipes is included in the repository. The runtime documentation above uses it extensively. Running `make` by itself in your shell will list relevant Docker images and provide you the following list of available commands:
202 | 
203 | ```
204 | Available commands:
205 | 
206 | build               Builds the container locally
207 | clean               Delete temporary Python cache and bytecode files
208 | format              Format code with ruff
209 | interact-container  Open an interactive bash shell within the running container (with network access)
210 | pack-example        Creates a submission/submission.zip file from the source code in examples
211 | pack-submission     Creates a submission/submission.zip file from the source code in submission_src
212 | pull                Pulls the official container from Azure Container Registry
213 | test-container      Ensures that your locally built image can import all the Python packages successfully when it runs
214 | test-submission     Runs container using code from `submission/submission.zip` and data from `/code_execution/data/`
215 | update-lockfile     Updates runtime environment lockfile using Docker
216 | ```
217 | 


--------------------------------------------------------------------------------