├── data
│   └── .gitkeep
├── submission_src
│   └── .gitkeep
├── dev-requirements.txt
├── examples
│   ├── transcription
│   │   ├── assets
│   │   │   └── .gitkeep
│   │   └── main.py
│   └── random
│       └── main.py
├── pyproject.toml
├── runtime
│   ├── apt.txt
│   ├── Dockerfile-lock
│   ├── tests
│   │   └── test_packages.py
│   ├── pixi.toml
│   ├── entrypoint.sh
│   └── Dockerfile
├── MAINTAINERS.md
├── .dockerignore
├── CHANGELOG.md
├── LICENSE
├── .github
│   └── workflows
│       └── build.yml
├── .gitignore
├── Makefile
└── README.md
/data/.gitkeep: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /submission_src/.gitkeep: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /dev-requirements.txt: -------------------------------------------------------------------------------- 1 | ruff 2 | -------------------------------------------------------------------------------- /examples/transcription/assets/.gitkeep: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /pyproject.toml: -------------------------------------------------------------------------------- 1 | [tool.ruff] 2 | line-length = 99 3 | -------------------------------------------------------------------------------- /runtime/apt.txt: -------------------------------------------------------------------------------- 1 | curl 2 | ffmpeg 3 | libxml2 4 | tzdata 5 | wget 6 | zip 7 | -------------------------------------------------------------------------------- /MAINTAINERS.md: -------------------------------------------------------------------------------- 1 | # Maintainer notes 2 | 3 | This file documents notes for maintainers of the repository. 4 | -------------------------------------------------------------------------------- /.dockerignore: -------------------------------------------------------------------------------- 1 | # Ignore everything by default 2 | * 3 | 4 | # Whitelist specific files/directories 5 | !runtime/ -------------------------------------------------------------------------------- /runtime/Dockerfile-lock: -------------------------------------------------------------------------------- 1 | FROM --platform=linux/amd64 ghcr.io/prefix-dev/pixi:0.34.0-jammy 2 | 3 | USER root 4 | 5 | RUN mkdir -p /tmp 6 | WORKDIR /tmp 7 | 8 | ENTRYPOINT ["pixi", "ls", "--manifest-path", "pixi.toml", "--platform", "linux-64", "-v"] 9 | -------------------------------------------------------------------------------- /CHANGELOG.md: -------------------------------------------------------------------------------- 1 | # Changelog 2 | 3 | All notable changes to this project will be documented in this file. 4 | 5 | The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), 6 | and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). 
7 | 8 | ## 2024-11-05 9 | 10 | ### Added 11 | 12 | - Initial commit 13 | 14 | -------------------------------------------------------------------------------- /runtime/tests/test_packages.py: -------------------------------------------------------------------------------- 1 | import importlib 2 | import subprocess 3 | 4 | import pytest 5 | 6 | packages = [ 7 | "numpy", 8 | "pandas", 9 | "scipy", 10 | "sklearn", 11 | "torch", 12 | "torchaudio", 13 | "transformers", 14 | "whisper", 15 | "speechbrain", 16 | ] 17 | 18 | 19 | def is_gpu_available(): 20 | try: 21 | return subprocess.check_call(["nvidia-smi"]) == 0 22 | 23 | except (FileNotFoundError, subprocess.CalledProcessError):  # nvidia-smi missing or exited nonzero 24 | return False 25 | 26 | 27 | GPU_AVAILABLE = is_gpu_available() 28 | 29 | 30 | @pytest.mark.parametrize("package_name", packages, ids=packages) 31 | def test_import(package_name): 32 | """Test that certain dependencies are importable.""" 33 | importlib.import_module(package_name) 34 | 35 | 36 | @pytest.mark.skipif(not GPU_AVAILABLE, reason="No GPU available") 37 | def test_allocate_torch(): 38 | import torch 39 | 40 | assert torch.cuda.is_available() 41 | 42 | torch.zeros(1).cuda() 43 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2023 DrivenData 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /runtime/pixi.toml: -------------------------------------------------------------------------------- 1 | [project] 2 | name = "literacy-screening" 3 | channels = ["conda-forge", "pytorch"] 4 | platforms = ["linux-64"] 5 | 6 | # conda package dependencies 7 | [dependencies] 8 | accelerate = "1.0.1" 9 | einops = "0.8.0" 10 | langchain = "0.3.*" 11 | librosa = "0.10.2.post1" 12 | loguru = "0.7.*" 13 | numpy = "1.26.*" 14 | pandas = "2.2.3" 15 | pytest = "8.3.3" 16 | python = "3.12.7" 17 | pytorch = {version = "2.4.1", channel = "pytorch"} 18 | scikit-learn = "1.5.*" 19 | scipy = "1.14.*" 20 | torchaudio = {version = "2.4.1", channel = "pytorch"} 21 | torchvision = {version = "0.19.1", channel = "pytorch"} 22 | transformers = "4.46.*" 23 | tqdm = "4.66.*" 24 | xgboost = "2.1.*" 25 | 26 | [pypi-dependencies] 27 | speechbrain = "==1.0.1" 28 | openai-whisper = "==20240930" 29 | opensmile = "==2.5.0" 30 | 31 | [feature.cuda] 32 | platforms = ["linux-64"] 33 | channels = ["nvidia", {channel = "pytorch", priority = -1}] 34 | system-requirements = {cuda = "12.1"} 35 | 36 | [feature.cuda.dependencies] 37 | pytorch-cuda = {version = "12.1.*", channel = "pytorch"} 38 | 39 | [feature.cpu] 40 | platforms = ["linux-64"] 41 | 42 | [feature.cuda.tasks] 43 | check_cuda = 'python -c "import torch; print(torch.cuda.is_available())"' 44 | 45 | [environments] 46 | cpu = ["cpu"] 47 | gpu = ["cuda"] 48 | -------------------------------------------------------------------------------- /examples/random/main.py: -------------------------------------------------------------------------------- 1 | """This is an example submission that just generates random predictions.""" 2 | 3 | from pathlib import Path 4 | 5 | import librosa 6 | from loguru import logger 7 | import numpy as np 8 | import pandas as pd 9 | 10 | DATA_DIR = Path("data/") 11 | 12 | 13 | def main(): 14 | # load the two csvs in the data directory 15 | df = pd.read_csv(DATA_DIR / "submission_format.csv", index_col="filename") 16 | metadata = pd.read_csv(DATA_DIR / "test_metadata.csv", index_col="filename") 17 | 18 | # set random state for a reproducible submission since we're generating random probabilities 19 | rng = np.random.RandomState(99) 20 | 21 | # iterate over audio files 22 | scores = [] 23 | for file in df.index: 24 | logger.info(f"Loading {file}") 25 | audio, sr = librosa.load(DATA_DIR / file) 26 | 27 | # since this is a dummy submission, just assign a random number between 0 and 1 28 | scores.append(rng.random()) 29 | 30 | # write the scores to score column in the submission format 31 | df["score"] = scores 32 | 33 | # write out predictions to submission.csv in the main directory 34 | logger.info("Writing out submission.csv") 35 | df.to_csv("submission.csv") 36 | 37 | 38 | if __name__ == "__main__": 39 | main() 40 | -------------------------------------------------------------------------------- /runtime/entrypoint.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | set -euxo pipefail 4 | 5 | main () { 6 | expected_filename=main.py 7 | 8 | cd /code_execution 9 | 10 | submission_files=$(zip -sf ./submission/submission.zip) 11 | if ! 
grep -q "${expected_filename}" <<< "${submission_files}"; then 12 | echo "Submission zip archive must include $expected_filename" 13 | return 1 14 | fi 15 | 16 | echo "Unpacking submission" 17 | unzip ./submission/submission.zip -d ./ 18 | 19 | ls -alh 20 | 21 | if $IS_SMOKE_TEST; then 22 | echo "Running smoke test" 23 | pixi run -e $CPU_OR_GPU python main.py 24 | else 25 | echo "Running submission using $CPU_OR_GPU" 26 | pixi run -e $CPU_OR_GPU python main.py &> "/code_execution/submission/private_log.txt" 27 | fi 28 | 29 | echo "Exporting submission.csv result..." 30 | 31 | # Valid scripts must create a "submission.csv" file within the same directory as main 32 | if [ -f "submission.csv" ] 33 | then 34 | echo "Script completed its run." 35 | cp submission.csv ./submission/submission.csv 36 | else 37 | echo "ERROR: Script did not produce a submission.csv file in the main directory." 38 | return 1 39 | fi 40 | } 41 | 42 | main |& tee "/code_execution/submission/log.txt" 43 | exit_code=${PIPESTATUS[0]} 44 | 45 | # Copy for terminationMessagePath 46 | cp /code_execution/submission/log.txt /tmp/log 47 | 48 | exit $exit_code 49 | -------------------------------------------------------------------------------- /runtime/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM --platform=linux/amd64 ghcr.io/prefix-dev/pixi:0.34.0-jammy-cuda-12.1.1 2 | 3 | USER root 4 | 5 | ARG CPU_OR_GPU=gpu 6 | 7 | ENV DEBIAN_FRONTEND=noninteractive \ 8 | LANG=C.UTF-8 \ 9 | LC_ALL=C.UTF-8 \ 10 | PYTHONUNBUFFERED=1 \ 11 | SHELL=/bin/bash 12 | 13 | # Create user and set permissions 14 | ENV RUNTIME_USER=runtimeuser 15 | ENV RUNTIME_UID=1000 16 | ENV RUNTIME_GID=1000 17 | ENV CPU_OR_GPU=$CPU_OR_GPU 18 | ENV CONDA_OVERRIDE_CUDA="12.1" 19 | 20 | RUN echo "Creating ${RUNTIME_USER} user..." \ 21 | && groupadd --gid ${RUNTIME_GID} ${RUNTIME_USER} \ 22 | && useradd --create-home --gid ${RUNTIME_GID} --no-log-init --uid ${RUNTIME_UID} ${RUNTIME_USER} 23 | 24 | COPY apt.txt apt.txt 25 | RUN apt-get update --fix-missing \ 26 | && apt-get install -y apt-utils 2> /dev/null \ 27 | && xargs -a apt.txt apt-get install -y \ 28 | && apt-get clean \ 29 | && rm -rf /var/lib/apt/lists/* /apt.txt 30 | 31 | # Set up code execution working directory 32 | RUN mkdir /code_execution 33 | RUN chown -R ${RUNTIME_USER}:${RUNTIME_USER} /code_execution 34 | WORKDIR /code_execution 35 | 36 | # Switch to runtime user 37 | USER ${RUNTIME_USER} 38 | 39 | COPY pixi.lock ./pixi.lock 40 | COPY pixi.toml ./pixi.toml 41 | 42 | RUN pixi install -e ${CPU_OR_GPU} --frozen \ 43 | && pixi clean cache --yes \ 44 | && pixi info 45 | 46 | COPY entrypoint.sh /entrypoint.sh 47 | COPY --chown=${RUNTIME_USER}:${RUNTIME_USER} tests ./tests 48 | 49 | CMD ["bash", "/entrypoint.sh"] 50 | -------------------------------------------------------------------------------- /examples/transcription/main.py: -------------------------------------------------------------------------------- 1 | """This is an example submission that uses a pretrained model to generate transcriptions. 
2 | Note: for this submission to work, you must download the whisper model to the assets/ dir first.""" 3 | 4 | import string 5 | from pathlib import Path 6 | 7 | from loguru import logger 8 | import numpy as np 9 | import pandas as pd 10 | import torch 11 | import whisper 12 | 13 | DATA_DIR = Path("data/") 14 | 15 | 16 | def download_whisper_model(download_root="assets"): 17 | """Code to download model locally so we can include it in our submission""" 18 | whisper.load_model("turbo", download_root=download_root) 19 | 20 | 21 | def clean_column(col: pd.Series): 22 | return col.str.lower().str.strip().replace(f"[{string.punctuation}]", "", regex=True) 23 | 24 | 25 | def main(): 26 | # load the metadata that has the expected text for each audio file 27 | df = pd.read_csv(DATA_DIR / "test_metadata.csv", index_col="filename") 28 | 29 | # load whisper model and put on GPU if available 30 | device = "cuda" if torch.cuda.is_available() else "cpu" 31 | model = whisper.load_model("assets/large-v3-turbo.pt").to(device) 32 | 33 | # iterate over audio files and get transcribed text 34 | transcribed_texts = [] 35 | for file in df.index: 36 | logger.info(f"Transcribing {file}") 37 | # set temperature at 0 for reproducible results 38 | result = model.transcribe(str(DATA_DIR / file), language="english", temperature=0) 39 | transcribed_texts.append(result["text"]) 40 | 41 | df["transcribed_text"] = transcribed_texts 42 | 43 | # clean columns to avoid false mismatches 44 | df["expected_text"] = clean_column(df.expected_text) 45 | df["transcribed_text"] = clean_column(df.transcribed_text) 46 | 47 | # score = 1 if transcribed text matches expected text 48 | # score = 0.5 if it doesn't match; a middling score avoids a heavy penalty for confident but wrong predictions 49 | df["score"] = np.where(df.transcribed_text == df.expected_text, 1.0, 0.5) 50 | 51 | # ensure index matches submission format 52 | sub_format = pd.read_csv(DATA_DIR / "submission_format.csv", index_col="filename") 53 | preds = df[["score"]].loc[sub_format.index] 54 | 55 | # write out predictions to submission.csv in the main directory 56 | logger.info("Writing out submission.csv") 57 | preds.to_csv("submission.csv") 58 | 59 | 60 | if __name__ == "__main__": 61 | main() 62 | -------------------------------------------------------------------------------- /.github/workflows/build.yml: -------------------------------------------------------------------------------- 1 | 2 | name: build 3 | 4 | on: 5 | push: 6 | branches: [main] 7 | paths: ["runtime/**", ".github/workflows/build.yml"] 8 | pull_request: 9 | paths: ["runtime/**", ".github/workflows/build.yml"] 10 | workflow_dispatch: 11 | inputs: 12 | publishDev: 13 | description: 'Publish dev image as cpu-{sha} and gpu-{sha}' 14 | required: true 15 | default: false 16 | type: boolean 17 | 18 | permissions: 19 | id-token: write 20 | contents: read 21 | 22 | jobs: 23 | build: 24 | name: Build 25 | runs-on: ubuntu-latest 26 | strategy: 27 | matrix: 28 | proc: ["cpu", "gpu"] 29 | env: 30 | SHOULD_PUBLISH_LATEST: ${{ github.ref == 'refs/heads/main' && vars.PUBLISH_LATEST_ON_MAIN != '' }} 31 | SHOULD_PUBLISH: | 32 | ${{ 33 | github.ref == 'refs/heads/main' && vars.PUBLISH_LATEST_ON_MAIN != '' 34 | || github.event_name == 'workflow_dispatch' && inputs.publishDev 35 | }} 36 | 37 | LOGIN_SERVER: literacyscreening.azurecr.io 38 | IMAGE: literacy-screening-competition 39 | 40 | SHA_TAG: ${{ matrix.proc }}-${{ github.sha }} 41 | LATEST_TAG: ${{ matrix.proc }}-latest 42 | 43 | steps: 44 | - name: Remove unwanted software 45 | run: | 46 | 
echo "Available storage before:" 47 | sudo df -h 48 | echo 49 | sudo rm -rf /usr/share/dotnet 50 | sudo rm -rf /usr/local/lib/android 51 | sudo rm -rf /opt/ghc 52 | sudo rm -rf /opt/hostedtoolcache/CodeQL 53 | echo "Available storage after:" 54 | sudo df -h 55 | echo 56 | 57 | - uses: actions/checkout@v4 58 | 59 | - name: Build Image 60 | run: | 61 | docker build runtime \ 62 | --build-arg CPU_OR_GPU=${{ matrix.proc }} \ 63 | --tag $LOGIN_SERVER/$IMAGE:$SHA_TAG \ 64 | ${{ fromJson(env.SHOULD_PUBLISH_LATEST) && '--tag $LOGIN_SERVER/$IMAGE:$LATEST_TAG' || '' }} 65 | 66 | - name: Check image size 67 | run: | 68 | docker image list $LOGIN_SERVER/$IMAGE --format "{{.Tag}}: {{.Size}}" | tee -a $GITHUB_STEP_SUMMARY 69 | 70 | - name: Tests packages in container 71 | run: | 72 | docker run --network none \ 73 | $LOGIN_SERVER/$IMAGE:$SHA_TAG \ 74 | /code_execution/.pixi/envs/${{ matrix.proc }}/bin/python -m pytest tests 75 | 76 | - name: Log into Azure 77 | if: ${{ fromJson(env.SHOULD_PUBLISH) }} 78 | uses: azure/login@v1 79 | with: 80 | client-id: ${{secrets.AZURE_CLIENT_ID}} 81 | tenant-id: ${{secrets.AZURE_TENANT_ID}} 82 | subscription-id: ${{secrets.AZURE_SUBSCRIPTION_ID}} 83 | 84 | - name: Log into ACR with Docker 85 | if: ${{ fromJson(env.SHOULD_PUBLISH) }} 86 | uses: azure/docker-login@v1 87 | with: 88 | login-server: ${{ env.LOGIN_SERVER }} 89 | username: ${{ secrets.REGISTRY_USERNAME }} 90 | password: ${{ secrets.REGISTRY_PASSWORD }} 91 | 92 | - name: Push image to ACR 93 | if: ${{ fromJson(env.SHOULD_PUBLISH) }} 94 | run: | 95 | docker push $LOGIN_SERVER/$IMAGE --all-tags 96 | 97 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | data/* 2 | submission/* 3 | submission_src/* 4 | !.gitkeep 5 | 6 | ## Python.gitignore 7 | ## https://github.com/github/gitignore/blob/4488915eec0b3a45b5c63ead28f286819c0917de/Python.gitignore 8 | 9 | # Byte-compiled / optimized / DLL files 10 | __pycache__/ 11 | *.py[cod] 12 | *$py.class 13 | 14 | # C extensions 15 | *.so 16 | 17 | # Distribution / packaging 18 | .Python 19 | build/ 20 | develop-eggs/ 21 | dist/ 22 | downloads/ 23 | eggs/ 24 | .eggs/ 25 | lib/ 26 | lib64/ 27 | parts/ 28 | sdist/ 29 | var/ 30 | wheels/ 31 | share/python-wheels/ 32 | *.egg-info/ 33 | .installed.cfg 34 | *.egg 35 | MANIFEST 36 | 37 | # PyInstaller 38 | # Usually these files are written by a python script from a template 39 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 
40 | *.manifest 41 | *.spec 42 | 43 | # Installer logs 44 | pip-log.txt 45 | pip-delete-this-directory.txt 46 | 47 | # Unit test / coverage reports 48 | htmlcov/ 49 | .tox/ 50 | .nox/ 51 | .coverage 52 | .coverage.* 53 | .cache 54 | nosetests.xml 55 | coverage.xml 56 | *.cover 57 | *.py,cover 58 | .hypothesis/ 59 | .pytest_cache/ 60 | cover/ 61 | 62 | # Translations 63 | *.mo 64 | *.pot 65 | 66 | # Django stuff: 67 | *.log 68 | local_settings.py 69 | db.sqlite3 70 | db.sqlite3-journal 71 | 72 | # Flask stuff: 73 | instance/ 74 | .webassets-cache 75 | 76 | # Scrapy stuff: 77 | .scrapy 78 | 79 | # Sphinx documentation 80 | docs/_build/ 81 | 82 | # PyBuilder 83 | .pybuilder/ 84 | target/ 85 | 86 | # Jupyter Notebook 87 | .ipynb_checkpoints 88 | 89 | # IPython 90 | profile_default/ 91 | ipython_config.py 92 | 93 | # pyenv 94 | # For a library or package, you might want to ignore these files since the code is 95 | # intended to run in multiple environments; otherwise, check them in: 96 | # .python-version 97 | 98 | # pipenv 99 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 100 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 101 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 102 | # install all needed dependencies. 103 | #Pipfile.lock 104 | 105 | # poetry 106 | # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control. 107 | # This is especially recommended for binary packages to ensure reproducibility, and is more 108 | # commonly ignored for libraries. 109 | # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control 110 | #poetry.lock 111 | 112 | # pdm 113 | # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control. 114 | #pdm.lock 115 | # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it 116 | # in version control. 117 | # https://pdm.fming.dev/#use-with-ide 118 | .pdm.toml 119 | 120 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm 121 | __pypackages__/ 122 | 123 | # Celery stuff 124 | celerybeat-schedule 125 | celerybeat.pid 126 | 127 | # SageMath parsed files 128 | *.sage.py 129 | 130 | # Environments 131 | **/.pixi/envs 132 | .env 133 | .venv 134 | env/ 135 | venv/ 136 | ENV/ 137 | env.bak/ 138 | venv.bak/ 139 | 140 | # Spyder project settings 141 | .spyderproject 142 | .spyproject 143 | 144 | # Rope project settings 145 | .ropeproject 146 | 147 | # mkdocs documentation 148 | /site 149 | 150 | # mypy 151 | .mypy_cache/ 152 | .dmypy.json 153 | dmypy.json 154 | 155 | # Pyre type checker 156 | .pyre/ 157 | 158 | # pytype static type analyzer 159 | .pytype/ 160 | 161 | # Cython debug symbols 162 | cython_debug/ 163 | 164 | # PyCharm 165 | # JetBrains specific template is maintained in a separate JetBrains.gitignore that can 166 | # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore 167 | # and can be added to the global gitignore or merged into this file. For a more nuclear 168 | # option (not recommended) you can uncomment the following to ignore the entire idea folder. 
169 | #.idea/ 170 | 171 | # Ruff formatting 172 | .ruff_cache/ 173 | 174 | # Model weights 175 | **/*.pt 176 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | ################################################################################# 2 | # Settings # 3 | ################################################################################# 4 | 5 | ifeq (, $(shell which nvidia-smi)) 6 | CPU_OR_GPU ?= cpu 7 | else 8 | CPU_OR_GPU ?= gpu 9 | endif 10 | 11 | BLOCK_INTERNET ?= true 12 | PLATFORM_ARGS = --platform linux/amd64 13 | 14 | TAG := ${CPU_OR_GPU}-latest 15 | LOCAL_TAG := ${CPU_OR_GPU}-local 16 | 17 | IMAGE_NAME = literacy-screening-competition 18 | OFFICIAL_IMAGE = literacyscreening.azurecr.io/${IMAGE_NAME} 19 | LOCAL_IMAGE = ${IMAGE_NAME} 20 | 21 | # Resolve which image to use in commands. The priority is: 22 | # 1. User-provided, e.g., SUBMISSION_IMAGE=my-image:gpu-local make test-submission 23 | # 2. Local image, e.g., literacy-screening-competition:gpu-local 24 | # 3. Official competition image, e.g., literacyscreening.azurecr.io/literacy-screening-competition 25 | SUBMISSION_IMAGE ?= ${LOCAL_IMAGE}:${LOCAL_TAG} 26 | ifeq (,$(shell docker images -q ${SUBMISSION_IMAGE})) 27 | SUBMISSION_IMAGE = ${OFFICIAL_IMAGE}:${TAG} 28 | endif 29 | 30 | # Get the image ID 31 | SUBMISSION_IMAGE_ID := $(shell docker images -q ${SUBMISSION_IMAGE}) 32 | 33 | # Name of the running container, i.e., docker run ... --name 34 | CONTAINER_NAME ?= ${IMAGE_NAME} 35 | 36 | # Enable or disable host GPU access 37 | ifeq (${CPU_OR_GPU}, gpu) 38 | GPU_ARGS = --gpus all 39 | endif 40 | 41 | SKIP_GPU ?= false 42 | ifeq (${SKIP_GPU}, true) 43 | GPU_ARGS = 44 | endif 45 | 46 | # If there is no TTY (for example GitHub Actions CI), don't use interactive tty flags for docker 47 | ifneq (true, ${GITHUB_ACTIONS_NO_TTY}) 48 | TTY_ARGS = -it 49 | endif 50 | 51 | # Option to block or allow internet access from the submission Docker container 52 | ifeq (true, ${BLOCK_INTERNET}) 53 | NETWORK_ARGS = --network none 54 | endif 55 | 56 | # Name of the example submission to pack when running `make pack-example` 57 | EXAMPLE ?= random 58 | 59 | .PHONY: _check_image _echo_image _submission_write_perms 60 | 61 | # Give write access to the submission folder to everyone so the Docker user can write when mounted 62 | _submission_write_perms: 63 | mkdir -p submission/ 64 | chmod -R 0777 submission/ 65 | 66 | 67 | _check_image: 68 | # If the image does not exist, error and tell the user to pull or build 69 | ifeq (${SUBMISSION_IMAGE_ID},) 70 | $(error To test your submission, you must first run `make pull` (to get official container) or `make build` \ 71 | (to build a local version if you have changes).) 72 | endif 73 | 74 | _echo_image: 75 | @echo 76 | ifeq (,${SUBMISSION_IMAGE_ID}) 77 | @echo "$$(tput bold)Using image:$$(tput sgr0) ${SUBMISSION_IMAGE} (image does not exist locally)" 78 | @echo 79 | else 80 | @echo "$$(tput bold)Using image:$$(tput sgr0) ${SUBMISSION_IMAGE} (${SUBMISSION_IMAGE_ID})" 81 | @echo "┏" 82 | @echo "┃ NAME(S)" 83 | 84 | @docker inspect $(SUBMISSION_IMAGE_ID) --format='{{join .RepoTags "\n"}}' | awk '{print "┃ "$$0}' 85 | 86 | @echo "┗" 87 | @echo 88 | endif 89 | ifeq (,$(shell docker images ${OFFICIAL_IMAGE} -q)) 90 | @echo "$$(tput bold)No official images available locally$$(tput sgr0)" 91 | @echo "Run 'make pull' to download the official image." 
92 | @echo 93 | else 94 | @echo "$$(tput bold)Available official images:$$(tput sgr0)" 95 | @echo "┏" 96 | @docker images ${OFFICIAL_IMAGE} | awk '{print "┃ "$$0}' 97 | @echo "┗" 98 | @echo 99 | endif 100 | ifeq (,$(shell docker images ${LOCAL_IMAGE} -q)) 101 | @echo "$$(tput bold)No local images available$$(tput sgr0)" 102 | @echo "Run 'make build' to build the image." 103 | @echo 104 | else 105 | @echo "$$(tput bold)Available local images:$$(tput sgr0)" 106 | @echo "┏" 107 | @docker images ${LOCAL_IMAGE} | awk '{print "┃ "$$0}' 108 | @echo "┗" 109 | @echo 110 | endif 111 | 112 | ################################################################################# 113 | # Commands for building the container if you are changing the requirements # 114 | ################################################################################# 115 | .PHONY: build clean interact-container pack-example pack-submission pull test-container test-submission update-lockfile 116 | 117 | ## Builds the container locally 118 | build: 119 | docker build runtime \ 120 | --build-arg CPU_OR_GPU=${CPU_OR_GPU} \ 121 | --tag ${LOCAL_IMAGE}:${LOCAL_TAG} 122 | 123 | ## Updates runtime environment lockfile using Docker 124 | update-lockfile: runtime/pixi.lock 125 | @echo Building Docker image to generate lockfile 126 | docker build runtime \ 127 | --file runtime/Dockerfile-lock \ 128 | --tag pixi-lock:local 129 | @echo Running lock container 130 | docker run \ 131 | ${PLATFORM_ARGS} \ 132 | --mount type=bind,source="$(shell pwd)"/runtime/pixi.toml,target=/tmp/pixi.toml \ 133 | --mount type=bind,source="$(shell pwd)"/runtime/pixi.lock,target=/tmp/pixi.lock \ 134 | --rm \ 135 | pixi-lock:local 136 | 137 | ## Ensures that your locally built image can import all the Python packages successfully when it runs 138 | test-container: _check_image _echo_image _submission_write_perms 139 | docker run \ 140 | ${PLATFORM_ARGS} \ 141 | ${GPU_ARGS} \ 142 | ${NETWORK_ARGS} \ 143 | ${TTY_ARGS} \ 144 | --mount type=bind,source="$(shell pwd)"/runtime/tests,target=/tests,readonly \ 145 | --pid host \ 146 | ${SUBMISSION_IMAGE_ID} \ 147 | pixi run -e ${CPU_OR_GPU} python -m pytest tests 148 | 149 | ## Open an interactive bash shell within the running container (with network access) 150 | interact-container: _check_image _echo_image _submission_write_perms 151 | docker run \ 152 | ${PLATFORM_ARGS} \ 153 | ${GPU_ARGS} \ 154 | ${NETWORK_ARGS} \ 155 | --mount type=bind,source=${shell pwd}/data,target=/code_execution/data,readonly \ 156 | --mount type=bind,source="$(shell pwd)/submission",target=/code_execution/submission \ 157 | --shm-size 8g \ 158 | --pid host \ 159 | -it \ 160 | ${SUBMISSION_IMAGE_ID} \ 161 | bash 162 | 163 | ################################################################################# 164 | # Commands for testing and debugging your submission # 165 | ################################################################################# 166 | 167 | ## Pulls the official container from Azure Container Registry 168 | pull: 169 | docker pull ${OFFICIAL_IMAGE}:${TAG} 170 | 171 | ## Creates a submission/submission.zip file from the source code in examples 172 | pack-example: 173 | # Don't overwrite so no work is lost accidentally 174 | ifneq (,$(wildcard ./submission/submission.zip)) 175 | $(error You already have a submission/submission.zip file. Rename or remove that file (e.g., rm submission/submission.zip). 
176 | endif 177 | mkdir -p submission/ 178 | cd examples/${EXAMPLE}; zip -r ../../submission/submission.zip ./* 179 | 180 | ## Creates a submission/submission.zip file from the source code in submission_src 181 | pack-submission: 182 | # Don't overwrite so no work is lost accidentally 183 | ifneq (,$(wildcard ./submission/submission.zip)) 184 | $(error You already have a submission/submission.zip file. Rename or remove that file (e.g., rm submission/submission.zip).) 185 | endif 186 | # Note that the glob wildcard excludes hidden/dot files 187 | mkdir -p submission/ 188 | cd submission_src; zip -r ../submission/submission.zip ./* 189 | 190 | ## Runs container using code from `submission/submission.zip` and data from `/code_execution/data/` 191 | test-submission: _check_image _echo_image _submission_write_perms 192 | # if submission file does not exist 193 | ifeq (,$(wildcard ./submission/submission.zip)) 194 | $(error To test your submission, you must first put a "submission.zip" file in the "submission" folder. \ 195 | If you want to use an example, you can run `make pack-example` first) 196 | endif 197 | docker run \ 198 | ${PLATFORM_ARGS} \ 199 | ${TTY_ARGS} \ 200 | ${GPU_ARGS} \ 201 | ${NETWORK_ARGS} \ 202 | -e LOGURU_LEVEL=INFO \ 203 | -e IS_SMOKE_TEST=true \ 204 | --mount type=bind,source=${shell pwd}/data,target=/code_execution/data,readonly \ 205 | --mount type=bind,source="$(shell pwd)/submission",target=/code_execution/submission \ 206 | --shm-size 8g \ 207 | --pid host \ 208 | --name ${CONTAINER_NAME} \ 209 | --rm \ 210 | ${SUBMISSION_IMAGE_ID} 211 | 212 | ## Delete temporary Python cache and bytecode files 213 | clean: 214 | find . -type f -name "*.py[co]" -delete 215 | find . -type d -name "__pycache__" -delete 216 | 217 | ## Format code with ruff 218 | format: 219 | ruff format 220 | 221 | ################################################################################# 222 | # Self Documenting Commands # 223 | ################################################################################# 224 | 225 | .DEFAULT_GOAL := help 226 | 227 | # Inspired by 228 | # sed script explained: 229 | # /^##/: 230 | # * save line in hold space 231 | # * purge line 232 | # * Loop: 233 | # * append newline + line to hold space 234 | # * go to next line 235 | # * if line starts with doc comment, strip comment character off and loop 236 | # * remove target prerequisites 237 | # * append hold space (+ newline) to line 238 | # * replace newline plus comments by `---` 239 | # * print line 240 | # Separate expressions are necessary because labels cannot be delimited by 241 | # semicolon; see 242 | .PHONY: help 243 | help: _echo_image 244 | @echo 245 | @echo "$$(tput bold)Available commands:$$(tput sgr0)" 246 | @echo 247 | @sed -n -e "/^## / { \ 248 | h; \ 249 | s/.*//; \ 250 | :doc" \ 251 | -e "H; \ 252 | n; \ 253 | s/^## //; \ 254 | t doc" \ 255 | -e "s/:.*//; \ 256 | G; \ 257 | s/\\n## /---/; \ 258 | s/\\n/ /g; \ 259 | p; \ 260 | }" ${MAKEFILE_LIST} \ 261 | | LC_ALL='C' sort --ignore-case \ 262 | | awk -F '---' \ 263 | -v ncol=$$(tput cols) \ 264 | -v indent=19 \ 265 | -v col_on="$$(tput setaf 6)" \ 266 | -v col_off="$$(tput sgr0)" \ 267 | '{ \ 268 | printf "%s%*s%s ", col_on, -indent, $$1, col_off; \ 269 | n = split($$2, words, " "); \ 270 | line_length = ncol - indent; \ 271 | for (i = 1; i <= n; i++) { \ 272 | line_length -= length(words[i]) + 1; \ 273 | if (line_length <= 0) { \ 274 | line_length = ncol - indent - length(words[i]) - 1; \ 275 | printf "\n%*s ", -indent, " "; \ 276 | } \ 277 | 
printf "%s ", words[i]; \ 278 | } \ 279 | printf "\n"; \ 280 | }' \ 281 | | more $(shell test $(shell uname) = Darwin && echo '--no-init --raw-control-chars') 282 | @echo 283 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Goodnight Moon, Hello Early Literacy Screening 2 | 3 | ![Python 3.12](https://img.shields.io/badge/Python-3.12-blue) [![Goodnight Moon, Hello Early Literacy Screening](https://img.shields.io/badge/DrivenData-Goodnight%20Moon,%20Hello%20Early%20Literacy%20Screening-white?logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAABGdBTUEAALGPC/xhBQAABBlpQ0NQa0NHQ29sb3JTcGFjZUdlbmVyaWNSR0IAADiNjVVdaBxVFD67c2cjJM5TbDSFdKg/DSUNk1Y0obS6f93dNm6WSTbaIuhk9u7OmMnOODO7/aFPRVB8MeqbFMS/t4AgKPUP2z60L5UKJdrUICg+tPiDUOiLpuuZOzOZabqx3mXufPOd75577rln7wXouapYlpEUARaari0XMuJzh4+IPSuQhIegFwahV1EdK12pTAI2Twt3tVvfQ8J7X9nV3f6frbdGHRUgcR9is+aoC4iPAfCnVct2AXr6kR8/6loe9mLotzFAxC96uOFj18NzPn6NaWbkLOLTiAVVU2qIlxCPzMX4Rgz7MbDWX6BNauuq6OWiYpt13aCxcO9h/p9twWiF823Dp8+Znz6E72Fc+ys1JefhUcRLqpKfRvwI4mttfbYc4NuWm5ERPwaQ3N6ar6YR70RcrNsHqr6fpK21iiF+54Q28yziLYjPN+fKU8HYq6qTxZzBdsS3NVry8jsEwIm6W5rxx3L7bVOe8ufl6jWay3t5RPz6vHlI9n1ynznt6Xzo84SWLQf8pZeUgxXEg4h/oUZB9ufi/rHcShADGWoa5Ul/LpKjDlsv411tpujPSwwXN9QfSxbr+oFSoP9Es4tygK9ZBqtRjI1P2i256uv5UcXOF3yffIU2q4F/vg2zCQUomDCHvQpNWAMRZChABt8W2Gipgw4GMhStFBmKX6FmFxvnwDzyOrSZzcG+wpT+yMhfg/m4zrQqZIc+ghayGvyOrBbTZfGrhVxjEz9+LDcCPyYZIBLZg89eMkn2kXEyASJ5ijxN9pMcshNk7/rYSmxFXjw31v28jDNSpptF3Tm0u6Bg/zMqTFxT16wsDraGI8sp+wVdvfzGX7Fc6Sw3UbbiGZ26V875X/nr/DL2K/xqpOB/5Ffxt3LHWsy7skzD7GxYc3dVGm0G4xbw0ZnFicUd83Hx5FcPRn6WyZnnr/RdPFlvLg5GrJcF+mr5VhlOjUSs9IP0h7QsvSd9KP3Gvc19yn3Nfc59wV0CkTvLneO+4S5wH3NfxvZq8xpa33sWeRi3Z+mWa6xKISNsFR4WcsI24VFhMvInDAhjQlHYgZat6/sWny+ePR0OYx/mp/tcvi5WAYn7sQL0Tf5VVVTpcJQpHVZvTTi+QROMJENkjJQ2VPe4V/OhIpVP5VJpEFM7UxOpsdRBD4ezpnagbQL7/B3VqW6yUurSY959AlnTOm7rDc0Vd0vSk2IarzYqlprq6IioGIbITI5oU4fabVobBe/e9I/0mzK7DxNbLkec+wzAvj/x7Psu4o60AJYcgIHHI24Yz8oH3gU484TastvBHZFIfAvg1Pfs9r/6Mnh+/dTp3MRzrOctgLU3O52/3+901j5A/6sAZ41/AaCffFUDXAvvAAAAIGNIUk0AAHomAACAhAAA+gAAAIDoAAB1MAAA6mAAADqYAAAXcJy6UTwAAABEZVhJZk1NACoAAAAIAAIBEgADAAAAAQABAACHaQAEAAAAAQAAACYAAAAAAAKgAgAEAAAAAQAAABCgAwAEAAAAAQAAABAAAAAA/iXkXAAAAVlpVFh0WE1MOmNvbS5hZG9iZS54bXAAAAAAADx4OnhtcG1ldGEgeG1sbnM6eD0iYWRvYmU6bnM6bWV0YS8iIHg6eG1wdGs9IlhNUCBDb3JlIDUuNC4wIj4KICAgPHJkZjpSREYgeG1sbnM6cmRmPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5LzAyLzIyLXJkZi1zeW50YXgtbnMjIj4KICAgICAgPHJkZjpEZXNjcmlwdGlvbiByZGY6YWJvdXQ9IiIKICAgICAgICAgICAgeG1sbnM6dGlmZj0iaHR0cDovL25zLmFkb2JlLmNvbS90aWZmLzEuMC8iPgogICAgICAgICA8dGlmZjpPcmllbnRhdGlvbj4xPC90aWZmOk9yaWVudGF0aW9uPgogICAgICA8L3JkZjpEZXNjcmlwdGlvbj4KICAgPC9yZGY6UkRGPgo8L3g6eG1wbWV0YT4KTMInWQAAAGZJREFUOBFj/HdD5j8DBYCJAr1grSzzmDRINiNFbQ8jTBPFLoAZNHA04/O8g2THguQke0aKw4ClX5uw97vS7eGhjq6aYhegG0h/PuOfohCyYoGlbw04XCgOA8bwI7PIcgEssCh2AQDqYhG4FWqALwAAAABJRU5ErkJggg==)](https://www.drivendata.org/competitions/298/literacy-screening) 4 | 5 | Welcome to the runtime repository for the [Goodnight Moon, Hello Early Literacy Screening](https://www.drivendata.org/competitions/298/literacy-screening) competition on DrivenData! This repository contains a few things to help you create your code submission for this code execution competition: 6 | 7 | 1. **Runtime environment specification** ([`runtime/`](./runtime/)) — the definition of the environment in which your code will run. 8 | 2. 
**Example submissions** ([`examples/`](./examples/)) — simple demonstration solutions that will run successfully in the code execution runtime and output a valid submission. 9 | - **Random probabilities** ([`examples/random`](./examples/random/main.py)): a dummy submission that generates a random prediction for each audio file 10 | - **Whisper transcription** ([`examples/transcription`](./examples/transcription/main.py)): a baseline submission that shows how to load a model asset as part of your submission. This submission uses OpenAI's Whisper model to transcribe each audio clip and compares the transcription to the expected text. It requires that you download the model weights beforehand and include them in the `assets` directory. There's no internet access in the runtime container, so any pretrained model weights must be included as part of the submission. 11 | 12 | You can use this repository to: 13 | 14 | 💡 **Get started**: The example submissions provide a basic functional solution. They probably won't win you the competition, but you can use them as a guide for bringing in your own work and generating a real submission. 15 | 16 | 🔧 **Test your submission**: Test your submission using a locally running version of the competition runtime to discover errors before submitting to the competition website. 17 | 18 | 📦 **Request new packages in the official runtime**: Since your submission will not have general access to the internet, all dependencies must be pre-installed. If you want to use a package that is not in the runtime environment, make a pull request to this repository. 19 | 20 | Changes to the repository are documented in [CHANGELOG.md](./CHANGELOG.md). 21 | 22 | --- 23 | 24 | #### [1. Quickstart](#quickstart) 25 | 26 | - [Prerequisites](#prerequisites) 27 | - [Setting up the data directory](#setting-up-the-data-directory) 28 | - [Running `make` commands](#running-make-commands) 29 | 30 | #### [2. Testing a submission locally](#testing-your-submission-locally) 31 | - [Code submission format](#code-submission-format) 32 | - [Running your submission locally](#running-your-submission-locally) 33 | - [Smoke tests](#smoke-tests) 34 | 35 | #### [3. Updating runtime packages](#updating-runtime-packages) 36 | 37 | #### [4. Makefile commands](#make-commands) 38 | 39 | --- 40 | 41 | ## Quickstart 42 | 43 | This quickstart guide will show you how to get the provided example solution running end-to-end. Once you get there, it's off to the races! 44 | 45 | ### Prerequisites 46 | 47 | When you make a submission on the DrivenData competition site, we run your submission inside a Docker container, a virtual operating system that allows for a consistent software environment across machines. **The best way to make sure your submission to the site will run is to first run it successfully in the container on your local machine**. 
For that, you'll need: 48 | 49 | - A clone of this repository 50 | - [Docker](https://docs.docker.com/get-docker/) 51 | - At least 8 GB of free space for the Docker image 52 | - [GNU make](https://www.gnu.org/software/make/) (optional, but useful for running the commands in the Makefile) 53 | 54 | Additional requirements to run with GPU: 55 | 56 | - [NVIDIA drivers](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#package-manager-installation) with CUDA 12 57 | - [NVIDIA container toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/index.html) 58 | 59 | ### Setting up the data directory 60 | 61 | In the official code execution platform, `code_execution/data` will contain the test set audio files, `test_metadata.csv`, and `submission_format.csv`. 62 | 63 | To test your submission locally, you should use the smoke test data from the [data download page](https://www.drivendata.org/competitions/298/literacy-screening/data/). Download `smoke.tar.gz` and then run `tar xzvf smoke.tar.gz --strip-components=1 -C data/`. This will extract the files directly into `data/` without nesting them in subdirectories. Your local `data` directory should look like: 64 | 65 | ``` 66 | data 67 | ├── bfaiol.wav 68 | ├── czfqjg.wav 69 | ├── fprljz.wav 70 | ├── hgxrel.wav 71 | ├── htfbnp.wav 72 | ├── idjpne.wav 73 | ├── ktvyww.wav 74 | ├── ltbona.wav 75 | ├── submission_format.csv 76 | ├── test_labels.csv 77 | └── test_metadata.csv 78 | ``` 79 | 80 | Now you're ready to run your submission against this data! 81 | 82 | Keep in mind, the smoke test data contains clips from the _training set_. That's why we provide the labels too. Of course, the real test set labels won't be available in the runtime container 😉 83 | 84 | ### Running `make` commands 85 | 86 | To test out the full execution pipeline, make sure Docker is running and then run the following commands in the terminal: 87 | 88 | 1. **`make pull`** pulls the latest official Docker image from the container registry. You'll need an internet connection for this. 89 | 1. **`make pack-example`** packages a code submission with the `main.py` contained in `examples/random/` and saves it as `submission/submission.zip`. 90 | 1. **`make test-submission`** will do a test run of your submission, simulating what happens during actual code execution. This command runs the Docker container with the requisite host directories mounted, and executes `main.py` to produce a submission file containing your predictions. 91 | 92 | ```bash 93 | make pull 94 | make pack-example 95 | make test-submission 96 | ``` 97 | 98 | 🎉 **Congratulations!** You've just completed your first test run for the Goodnight Moon, Hello Early Literacy Screening Challenge. If everything worked as expected, you should see that a new submission file has been generated. 99 | 100 | If you were ready to make a real submission to the competition, you would upload the `submission.zip` file from step 2 above to the competition [submission page](https://www.drivendata.org/competitions/298/literacy-screening/submissions/). 101 | 102 | To run the Whisper transcription example instead, replace the second command with `EXAMPLE=transcription make pack-example`. Just be sure to [download](https://github.com/drivendataorg/literacy-screening-runtime/blob/09cbb05aac0573d635d6d886450f20a619617612/examples/transcription/main.py#L16-L18) the Whisper model first and include it in the `assets` directory. 
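For reference, the snippet below mirrors the `download_whisper_model` helper in `examples/transcription/main.py`; run it from the `examples/transcription/` directory so the weights land in `assets/`:

```python
# Mirrors download_whisper_model() in examples/transcription/main.py:
# fetch the Whisper "turbo" weights into assets/ so they can be packed
# into submission.zip along with your code.
import whisper

whisper.load_model("turbo", download_root="assets")
```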
There's no internet access in the runtime container, so any pretrained model weights must be included as part of the submission. 103 | 104 | ## Testing your submission locally 105 | 106 | As you develop your own submission, you'll need to know a little bit more about how your submission will be unpacked for running inference. This section contains more complete documentation for developing and testing your own submission. 107 | 108 | ### Code submission format 109 | 110 | Your final submission should be a zip archive named with the extension `.zip` (for example, `submission.zip`). The root level of the `submission.zip` file must contain a `main.py` which performs inference on the test audio clips and writes the predictions to a file named `submission.csv` in the same directory as `main.py`. Check out the `main.py` scripts in the [example submissions](./examples/). 111 | 112 | ### Running your submission locally 113 | 114 | This section provides instructions on how to run your submission in the code execution container from your local machine. To simplify the steps, key processes have been defined in the `Makefile`. Commands from the `Makefile` are then run with `make {command_name}`. The basic steps are: 115 | 116 | ```sh 117 | make pull 118 | make pack-submission 119 | make test-submission 120 | ``` 121 | 122 | Run `make help` for more information about the available commands as well as information on the official and built images that are available locally. 123 | 124 | Here's the process in a bit more detail: 125 | 126 | 1. First, make sure you have set up the [prerequisites](#prerequisites). 127 | 2. Download the official competition Docker image: 128 | 129 | ```sh 130 | make pull 131 | ``` 132 | 133 | > [!NOTE] 134 | > If you have built a local version of the runtime image with `make build`, that image will take precedence over the pulled image when using any make commands that run a container. You can explicitly use the pulled image by setting the `SUBMISSION_IMAGE` shell/environment variable to the pulled image or by deleting all locally built images. 135 | 136 | 3. Save all of your submission files, including the required `main.py` script, in the `submission_src` folder of the runtime repository. Make sure any needed model weights and other assets are saved in `submission_src` as well. 137 | 138 | 4. Create a `submission/submission.zip` file containing your code and model assets: 139 | 140 | ```sh 141 | make pack-submission 142 | #> mkdir -p submission/ 143 | #> cd submission_src; zip -r ../submission/submission.zip ./* 144 | #> adding: main.py (deflated 73%) 145 | ``` 146 | 147 | 5. Launch an instance of the competition Docker image, and run the same inference process that will take place in the official runtime: 148 | 149 | ```sh 150 | make test-submission 151 | ``` 152 | 153 | This runs the container [entrypoint](./runtime/entrypoint.sh) script. First, it unzips `submission/submission.zip` into `/code_execution/` in the container. Then, it runs your submitted `main.py`. In the local testing setting, the final submission is saved out to the `submission/` folder on your local machine. 154 | 155 | > [!NOTE] 156 | > Remember that `/code_execution/data` is just a mounted version of what you have saved locally in `data`, so you will just be using the training files for local testing. In the official code execution platform, `/code_execution/data` will contain the actual test data. 
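To make the expected contract concrete, here is a minimal sketch of a valid `main.py`, patterned on `examples/random/main.py`; the constant score is a placeholder for your model's real predictions:

```python
"""Minimal sketch of the main.py contract: read the submission format from
data/, produce a score for every listed file, and write submission.csv
next to main.py."""

from pathlib import Path

import pandas as pd

DATA_DIR = Path("data/")


def main():
    # submission_format.csv lists every audio filename that needs a score
    df = pd.read_csv(DATA_DIR / "submission_format.csv", index_col="filename")

    # placeholder: swap this constant for per-file model inference
    df["score"] = 0.5

    # the entrypoint looks for submission.csv in the same directory as main.py
    df.to_csv("submission.csv")


if __name__ == "__main__":
    main()
```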
157 | 158 | When you run `make test-submission`, the logs will be printed to the terminal and written out to `submission/log.txt`. If you run into errors, use the container logs written to `log.txt` to determine what changes you need to make for your code to execute successfully. 159 | 160 | ### Smoke tests 161 | 162 | When submitting on the platform, you will have the ability to submit "smoke tests." Smoke tests run on a reduced version of the training set so that you can run and debug issues more quickly. They will not be considered for prize evaluation and are intended to let you test your code for correctness. **You should test your code locally as thoroughly as possible before submitting your code for smoke tests or for full evaluation.** 163 | 164 | ## Updating runtime packages 165 | 166 | If you want to use a package that is not in the environment, you are welcome to make a pull request to this repository. Remember, your submission will only have access to packages in this runtime repository. If you're new to the GitHub contribution workflow, check out [this guide by GitHub](https://docs.github.com/en/get-started/quickstart/contributing-to-projects). 167 | 168 | The runtime manages dependencies using [Pixi](https://pixi.sh/latest/). Here is a good [tutorial](https://pixi.sh/latest/tutorials/python/) to get started with Pixi. The official runtime uses **Python 3.12.7**. 169 | 170 | 1. Fork this repository. 171 | 172 | 2. Install pixi. See [here](https://pixi.sh/latest/#installation) for installation options. 173 | 174 | 3. Edit the `runtime/pixi.toml` file to add your new packages. We recommend starting without a specific pinned version, and then pinning to the version in the resolved `pixi.lock` file that is generated. 175 | 176 | - Conda-installed packages go in the `dependencies` section. These install from the [conda-forge](https://anaconda.org/conda-forge/) channel. **Installing packages with conda is strongly preferred.** Packages should only be installed using pip if they are not available in a conda channel. 177 | - Pip-installed packages go in the `pypi-dependencies` section. 178 | - GPU-specific dependencies go in the `feature.cuda.dependencies` section, but these should be uncommon. 179 | 180 | 4. With Docker open and running, run `make update-lockfile`. This will generate an updated `runtime/pixi.lock` from `runtime/pixi.toml` within a Docker container. 181 | 182 | 5. Locally test that the Docker image builds successfully for both the CPU and GPU environments: 183 | 184 | ```sh 185 | CPU_OR_GPU=cpu make build 186 | CPU_OR_GPU=gpu make build 187 | ``` 188 | 189 | 6. Commit the changes to your forked repository. Ensure that your branch includes updated versions of both `runtime/pixi.toml` and `runtime/pixi.lock`. 190 | 191 | 7. Open a pull request from your branch to the `main` branch of this repository. Navigate to the [Pull requests](https://github.com/drivendataorg/literacy-screening-runtime/pulls) tab in this repository, and click the "New pull request" button. For more detailed instructions, check out [GitHub's help page](https://help.github.com/en/articles/creating-a-pull-request-from-a-fork). 192 | 193 | 8. Once you open the pull request, we will use GitHub Actions to build the Docker images with your changes and run the tests in `runtime/tests`. For security reasons, administrators may need to approve the workflow run before it starts. Once it starts, the process can take up to 30 minutes, and may take longer if your build is queued behind others. 
You will see a section on the pull request page that shows the status of the tests and links to the logs ("Details"): 194 | 195 | ![Example appearance of Github Actions](https://s3.amazonaws.com/drivendata-public-assets/codex_github_actions_build.png) 196 | 197 | 9. You may be asked to submit revisions to your pull request if the tests fail or if a DrivenData staff member has feedback. Pull requests won't be merged until all tests pass and the team has reviewed and approved the changes. 198 | 199 | ## Make commands 200 | 201 | A Makefile with several helpful shell recipes is included in the repository. The runtime documentation above uses it extensively. Running `make` by itself in your shell will list relevant Docker images and provide you the following list of available commands: 202 | 203 | ``` 204 | Available commands: 205 | 206 | build Builds the container locally 207 | clean Delete temporary Python cache and bytecode files 208 | format Format code with ruff 209 | interact-container Open an interactive bash shell within the running container (with network access) 210 | pack-example Creates a submission/submission.zip file from the source code in examples 211 | pack-submission Creates a submission/submission.zip file from the source code in submission_src 212 | pull Pulls the official container from Azure Container Registry 213 | test-container Ensures that your locally built image can import all the Python packages successfully when it runs 214 | test-submission Runs container using code from `submission/submission.zip` and data from `/code_execution/data/` 215 | update-lockfile Updates runtime environment lockfile using Docker 216 | ``` 217 | --------------------------------------------------------------------------------