├── .gitattributes ├── .github └── workflows │ ├── create_release.yml │ ├── preview.yml │ ├── publish.yml │ ├── remove_preview.yml │ ├── resolve_manifest.yml │ ├── version_check.jl │ └── version_check.yml ├── .gitignore ├── 404.qmd ├── LICENSE ├── Manifest.toml ├── Project.toml ├── README.md ├── _quarto.yml ├── assets ├── favicon.ico ├── images │ ├── turing-logo-wide.svg │ └── turing-logo.svg └── scripts │ ├── changelog.sh │ └── versions.sh ├── core-functionality └── index.qmd ├── developers ├── compiler │ ├── design-overview │ │ └── index.qmd │ ├── minituring-compiler │ │ └── index.qmd │ ├── minituring-contexts │ │ └── index.qmd │ └── model-manual │ │ └── index.qmd ├── contexts │ └── submodel-condition │ │ └── index.qmd ├── contributing │ └── index.qmd ├── inference │ ├── abstractmcmc-interface │ │ └── index.qmd │ ├── abstractmcmc-turing │ │ └── index.qmd │ ├── implementing-samplers │ │ └── index.qmd │ └── variational-inference │ │ └── index.qmd └── transforms │ ├── bijectors │ └── index.qmd │ ├── distributions │ └── index.qmd │ └── dynamicppl │ ├── dynamicppl_link.png │ ├── dynamicppl_link2.png │ └── index.qmd ├── getting-started └── index.qmd ├── theming ├── styles.css └── theme-dark.scss ├── tutorials ├── bayesian-differential-equations │ └── index.qmd ├── bayesian-linear-regression │ └── index.qmd ├── bayesian-logistic-regression │ └── index.qmd ├── bayesian-neural-networks │ └── index.qmd ├── bayesian-poisson-regression │ └── index.qmd ├── bayesian-time-series-analysis │ └── index.qmd ├── coin-flipping │ └── index.qmd ├── gaussian-mixture-models │ └── index.qmd ├── gaussian-process-latent-variable-models │ └── index.qmd ├── gaussian-processes-introduction │ ├── golf.dat │ └── index.qmd ├── hidden-markov-models │ └── index.qmd ├── infinite-mixture-models │ └── index.qmd ├── multinomial-logistic-regression │ └── index.qmd ├── probabilistic-pca │ └── index.qmd └── variational-inference │ └── index.qmd └── usage ├── automatic-differentiation └── index.qmd ├── custom-distribution └── index.qmd ├── dynamichmc └── index.qmd ├── external-samplers └── index.qmd ├── mode-estimation └── index.qmd ├── modifying-logprob └── index.qmd ├── performance-tips └── index.qmd ├── probability-interface └── index.qmd ├── sampler-visualisation └── index.qmd ├── tracking-extra-quantities └── index.qmd └── troubleshooting └── index.qmd /.gitattributes: -------------------------------------------------------------------------------- 1 | ################ 2 | # Line endings # 3 | ################ 4 | * text=auto 5 | 6 | ################### 7 | # GitHub Linguist # 8 | ################### 9 | *.qmd linguist-detectable 10 | *.qmd linguist-language=Markdown 11 | -------------------------------------------------------------------------------- /.github/workflows/create_release.yml: -------------------------------------------------------------------------------- 1 | name: Create release with _freeze 2 | 3 | on: 4 | workflow_dispatch: 5 | 6 | permissions: 7 | contents: write 8 | 9 | jobs: 10 | build: 11 | runs-on: ubuntu-latest 12 | steps: 13 | - name: Checkout 14 | uses: actions/checkout@v4 15 | 16 | - name: Setup Julia 17 | uses: julia-actions/setup-julia@v2 18 | with: 19 | version: '1.11' 20 | 21 | - name: Load Julia packages from cache 22 | uses: julia-actions/cache@v2 23 | 24 | - name: Set up Quarto 25 | uses: quarto-dev/quarto-actions/setup@v2 26 | with: 27 | # Needs Quarto 1.6 (which is currently a pre-release version) to fix #533 28 | version: pre-release 29 | 30 | - name: Restore cached _freeze folder 31 | id: 
cache-restore 32 | uses: actions/cache/restore@v4 33 | with: 34 | path: | 35 | ./_freeze/ 36 | key: | 37 | ${{ runner.os }}-${{ hashFiles('**/Manifest.toml') }}-${{ hashFiles('**/index.qmd') }} 38 | restore-keys: | 39 | ${{ runner.os }}-${{ hashFiles('**/Manifest.toml') }} 40 | 41 | - name: Render 42 | run: quarto render 43 | 44 | - name: Compress _freeze folder 45 | run: tar -czf _freeze.tar.gz _freeze 46 | 47 | - name: Generate tag name for release 48 | id: tag 49 | run: echo "tag_name=freeze_$(git rev-parse --short HEAD)" >> $GITHUB_OUTPUT 50 | 51 | - name: Create GitHub release 52 | uses: softprops/action-gh-release@v2 53 | with: 54 | tag_name: ${{ steps.tag.outputs.tag_name }} 55 | files: | 56 | _freeze.tar.gz 57 | Manifest.toml 58 | body: | 59 | This release contains the `_freeze` folder generated by Quarto when 60 | rendering the docs. You can use this to speed up the rendering 61 | process on your local machine by downloading and extracting the 62 | `_freeze` folder, then placing it at the root of the project. 63 | 64 | Note that the contents of the `_freeze` folder only hash the 65 | contents of the .qmd files, and do not include information about 66 | the Julia environment. Thus, each `_freeze` folder is only valid 67 | for a given Julia environment, which is specified in the 68 | Manifest.toml file included in this release. To ensure 69 | reproducibility, you should make sure to use the Manifest.toml file 70 | locally as well. 71 | 72 | These releases are not automatically generated. To make an updated 73 | release with the contents of the `_freeze` folder from the main 74 | branch, you can run the `Create release with _freeze` workflow from 75 | https://github.com/TuringLang/docs/actions/workflows/create_release.yml. 76 | -------------------------------------------------------------------------------- /.github/workflows/preview.yml: -------------------------------------------------------------------------------- 1 | name: PR Preview Workflow 2 | 3 | on: 4 | pull_request: 5 | types: 6 | - opened 7 | - synchronize 8 | 9 | concurrency: 10 | group: docs 11 | 12 | permissions: 13 | contents: write 14 | pull-requests: write 15 | 16 | jobs: 17 | build-and-preview: 18 | if: github.event.action == 'opened' || github.event.action == 'synchronize' 19 | runs-on: ubuntu-latest 20 | steps: 21 | - name: Checkout 22 | uses: actions/checkout@v4 23 | with: 24 | ref: ${{ github.event.pull_request.head.sha }} 25 | 26 | - name: Setup Julia 27 | uses: julia-actions/setup-julia@v2 28 | with: 29 | version: '1.11' 30 | 31 | - name: Load Julia packages from cache 32 | id: julia-cache 33 | uses: julia-actions/cache@v2 34 | with: 35 | cache-name: julia-cache;${{ hashFiles('**/Manifest.toml') }} 36 | delete-old-caches: false 37 | 38 | # Note: needs resolve() to fix #518 39 | - name: Instantiate Julia environment 40 | run: julia --project=. 
-e 'using Pkg; Pkg.instantiate(); Pkg.resolve()' 41 | 42 | - name: Set up Quarto 43 | uses: quarto-dev/quarto-actions/setup@v2 44 | 45 | - name: Restore cached _freeze folder 46 | id: cache-restore 47 | uses: actions/cache/restore@v4 48 | with: 49 | path: | 50 | ./_freeze/ 51 | key: | 52 | ${{ runner.os }}-${{ hashFiles('**/Manifest.toml') }}-${{ hashFiles('**/index.qmd') }} 53 | restore-keys: | 54 | ${{ runner.os }}-${{ hashFiles('**/Manifest.toml') }} 55 | 56 | - name: Render Quarto site 57 | run: quarto render 58 | 59 | - name: Save _freeze folder 60 | id: cache-save 61 | if: ${{ !cancelled() }} 62 | uses: actions/cache/save@v4 63 | with: 64 | path: | 65 | ./_freeze/ 66 | key: ${{ runner.os }}-${{ hashFiles('**/Manifest.toml') }}-${{ hashFiles('**/index.qmd') }} 67 | 68 | - name: Save Julia depot cache 69 | id: julia-cache-save 70 | if: ${{ !cancelled() && steps.julia-cache.outputs.cache-hit != 'true' }} 71 | uses: actions/cache/save@v4 72 | with: 73 | path: ${{ steps.julia-cache.outputs.cache-paths }} 74 | key: ${{ steps.julia-cache.outputs.cache-key }} 75 | 76 | - name: Deploy to GitHub Pages 77 | uses: JamesIves/github-pages-deploy-action@v4 78 | with: 79 | branch: gh-pages 80 | folder: _site 81 | target-folder: pr-previews/${{ github.event.pull_request.number }} 82 | clean: false 83 | commit-message: Deploy preview for PR ${{ github.event.pull_request.number }} 84 | token: ${{ secrets.GITHUB_TOKEN }} 85 | 86 | - name: Comment preview URL 87 | uses: thollander/actions-comment-pull-request@v2 88 | with: 89 | message: | 90 | 91 | Preview the changes: https://turinglang.org/docs/pr-previews/${{ github.event.pull_request.number }} 92 | Please avoid using the search feature and navigation bar in PR previews! 93 | comment_tag: preview-url-comment 94 | -------------------------------------------------------------------------------- /.github/workflows/publish.yml: -------------------------------------------------------------------------------- 1 | name: Render Docs Website 2 | 3 | on: 4 | push: 5 | branches: 6 | - main 7 | - backport-v0.* 8 | workflow_dispatch: # manual trigger for testing 9 | 10 | concurrency: 11 | group: docs 12 | cancel-in-progress: true 13 | 14 | permissions: 15 | contents: write 16 | 17 | jobs: 18 | build-and-deploy: 19 | runs-on: ubuntu-latest 20 | 21 | steps: 22 | - name: Checkout 23 | uses: actions/checkout@v4 24 | 25 | - name: Setup Julia 26 | uses: julia-actions/setup-julia@v2 27 | with: 28 | version: '1.11' 29 | 30 | - name: Load Julia packages from cache 31 | id: julia-cache 32 | uses: julia-actions/cache@v2 33 | with: 34 | cache-name: julia-cache;${{ hashFiles('**/Manifest.toml') }} 35 | delete-old-caches: false 36 | 37 | # Note: needs resolve() to fix #518 38 | - name: Instantiate Julia environment 39 | run: julia --project=. 
-e 'using Pkg; Pkg.instantiate(); Pkg.resolve()' 40 | 41 | - name: Set up Quarto 42 | uses: quarto-dev/quarto-actions/setup@v2 43 | 44 | - name: Install jq 45 | run: sudo apt-get install jq 46 | 47 | - name: Restore cached _freeze folder 48 | id: cache-restore 49 | uses: actions/cache/restore@v4 50 | with: 51 | path: | 52 | ./_freeze/ 53 | key: | 54 | ${{ runner.os }}-${{ hashFiles('**/Manifest.toml') }}-${{ hashFiles('**/index.qmd') }} 55 | restore-keys: | 56 | ${{ runner.os }}-${{ hashFiles('**/Manifest.toml') }} 57 | 58 | - name: Extract version from _quarto.yml 59 | id: extract_version 60 | run: | 61 | minor_version=$(grep -oP 'text:\s+"v\K\d+\.\d+' _quarto.yml) 62 | echo "minor_version=$minor_version" >> $GITHUB_ENV 63 | 64 | - name: Fetch latest bugfix version for the extracted minor version 65 | id: fetch_latest_bugfix 66 | run: | 67 | repo_url="https://api.github.com/repos/TuringLang/Turing.jl/tags" 68 | tags=$(curl -s $repo_url | jq -r '.[].name') 69 | stable_tags=$(echo "$tags" | grep -Eo 'v[0-9]+\.[0-9]+\.[0-9]+$') 70 | latest_bugfix=$(echo "$stable_tags" | grep "^v${{ env.minor_version }}" | sort -rV | head -n 1) 71 | echo "version=$latest_bugfix" >> $GITHUB_ENV 72 | 73 | - name: Fetch the actual latest bugfix version 74 | id: fetch_latest_bugfix_actual 75 | run: | 76 | latest=$(curl --silent "https://api.github.com/repos/TuringLang/Turing.jl/releases/latest" | jq -r .tag_name) 77 | echo "LATEST=$latest" >> $GITHUB_ENV 78 | 79 | - name: Run Changelog and Versions Scripts 80 | if: env.version == env.LATEST 81 | run: | 82 | sh assets/scripts/changelog.sh 83 | sh assets/scripts/versions.sh 84 | 85 | - name: Render Quarto site 86 | run: quarto render 87 | 88 | - name: Rename original search index 89 | run: mv _site/search.json _site/search_original.json 90 | 91 | - name: Save _freeze folder 92 | id: cache-save 93 | if: ${{ !cancelled() }} 94 | uses: actions/cache/save@v4 95 | with: 96 | path: | 97 | ./_freeze/ 98 | key: ${{ runner.os }}-${{ hashFiles('**/Manifest.toml') }}-${{ hashFiles('**/index.qmd') }} 99 | 100 | - name: Save Julia depot cache 101 | id: julia-cache-save 102 | if: ${{ !cancelled() && steps.julia-cache.outputs.cache-hit != 'true' }} 103 | uses: actions/cache/save@v4 104 | with: 105 | path: ${{ steps.julia-cache.outputs.cache-paths }} 106 | key: ${{ steps.julia-cache.outputs.cache-key }} 107 | 108 | - name: Fetch search_original.json from main site 109 | run: curl -O https://raw.githubusercontent.com/TuringLang/turinglang.github.io/gh-pages/search_original.json 110 | 111 | - name: Convert main site search index URLs to relative URLs 112 | run: | 113 | jq 'map( 114 | if .href then .href = "../" + .href else . end | 115 | if .objectID then .objectID = "../" + .objectID else . 
end)' search_original.json > fixed_main_search.json 116 | 117 | - name: Merge both search index 118 | run: | 119 | jq -s '.[0] + .[1]' _site/search_original.json fixed_main_search.json > _site/search.json 120 | 121 | - name: Checkout gh-pages branch 122 | uses: actions/checkout@v4 123 | with: 124 | ref: gh-pages 125 | path: gh-pages 126 | 127 | - name: Update gh-pages branch 128 | run: | 129 | # Copy to versions/ subdirectory 130 | mkdir -p gh-pages/versions/${{ env.version }} 131 | cp -r _site/* gh-pages/versions/${{ env.version }} 132 | 133 | # Find the latest version of the docs and copy that to the root 134 | cd gh-pages/versions 135 | LATEST_DOCS=$(ls -d * | sort -V | tail -n 1) 136 | cp -r $LATEST_DOCS/* ../ 137 | 138 | # Commit and push 139 | git config --global user.name "github-actions[bot]" 140 | git config --global user.email "github-actions[bot]@users.noreply.github.com" 141 | git add -A 142 | git commit -m "Publish docs @ ${GITHUB_REPOSITORY}@${GITHUB_SHA}" 143 | git push 144 | -------------------------------------------------------------------------------- /.github/workflows/remove_preview.yml: -------------------------------------------------------------------------------- 1 | name: Remove PR previews 2 | 3 | on: 4 | pull_request_target: 5 | types: 6 | - closed 7 | 8 | permissions: 9 | contents: write 10 | 11 | jobs: 12 | delete-preview-directory: 13 | if: github.event.action == 'closed' || github.event.pull_request.merged == true 14 | runs-on: ubuntu-latest 15 | steps: 16 | - name: Checkout gh-pages branch 17 | uses: actions/checkout@v4 18 | with: 19 | ref: gh-pages 20 | 21 | - name: Remove PR Preview Directory 22 | run: | 23 | PR_NUMBER=${{ github.event.pull_request.number }} 24 | PREVIEW_DIR="pr-previews/${PR_NUMBER}" 25 | git config --global user.name "github-actions[bot]" 26 | git config --global user.email "github-actions[bot]@users.noreply.github.com" 27 | git pull origin gh-pages 28 | rm -rf ${PREVIEW_DIR} 29 | git add . 30 | git commit -m "Remove preview for merged PR #${PR_NUMBER}" 31 | git push 32 | -------------------------------------------------------------------------------- /.github/workflows/resolve_manifest.yml: -------------------------------------------------------------------------------- 1 | # This action runs Pkg.instantiate() and Pkg.resolve() every time the main 2 | # branch is pushed to. If this leads to a change in the Manifest.toml file, it 3 | # will open a PR to update the Manifest.toml file. This ensures that the 4 | # contents of the Manifest in the repository are consistent with the contents 5 | # of the Manifest used by the CI system (i.e. during the actual docs 6 | # generation). 7 | # 8 | # See https://github.com/TuringLang/docs/issues/518 for motivation. 
9 | 10 | name: Resolve Manifest 11 | on: 12 | push: 13 | branches: 14 | - main 15 | workflow_dispatch: 16 | 17 | jobs: 18 | check-version: 19 | runs-on: ubuntu-latest 20 | 21 | permissions: 22 | contents: write 23 | pull-requests: write 24 | 25 | env: 26 | # Disable precompilation as it takes a long time and is not needed for this workflow 27 | JULIA_PKG_PRECOMPILE_AUTO: 0 28 | 29 | steps: 30 | - name: Checkout 31 | uses: actions/checkout@v4 32 | 33 | - name: Setup Julia 34 | uses: julia-actions/setup-julia@v2 35 | with: 36 | version: '1.11' 37 | 38 | - name: Instantiate and resolve 39 | run: | 40 | julia -e 'using Pkg; Pkg.instantiate(); Pkg.resolve()' 41 | 42 | - name: Open PR 43 | id: create_pr 44 | uses: peter-evans/create-pull-request@v6 45 | with: 46 | branch: resolve-manifest 47 | add-paths: Manifest.toml 48 | commit-message: "Update Manifest.toml" 49 | body: "This PR is automatically generated by the `resolve_manifest.yml` GitHub Action." 50 | title: "Update Manifest.toml to match CI environment" 51 | -------------------------------------------------------------------------------- /.github/workflows/version_check.jl: -------------------------------------------------------------------------------- 1 | # Set up a temporary environment just to run this script 2 | using Pkg 3 | Pkg.activate(temp=true) 4 | Pkg.add(["YAML", "TOML", "JSON", "HTTP"]) 5 | import YAML 6 | import TOML 7 | import JSON 8 | import HTTP 9 | 10 | PROJECT_TOML_PATH = "Project.toml" 11 | QUARTO_YML_PATH = "_quarto.yml" 12 | MANIFEST_TOML_PATH = "Manifest.toml" 13 | 14 | function major_minor_match(vs...) 15 | first = vs[1] 16 | all(v.:major == first.:major && v.:minor == first.:minor for v in vs) 17 | end 18 | 19 | function major_minor_patch_match(vs...) 20 | first = vs[1] 21 | all(v.:major == first.:major && v.:minor == first.:minor && v.:patch == first.:patch for v in vs) 22 | end 23 | 24 | """ 25 | Update the version number in Project.toml to match `target_version`. 26 | 27 | This uses a naive regex replacement on lines, i.e. sed-like behaviour. Parsing 28 | the file, editing the TOML and then re-serialising also works and would be more 29 | correct, but the entries in the output file can end up being scrambled, which 30 | would lead to unnecessarily large diffs in the PR. 31 | """ 32 | function update_project_toml(filename, target_version::VersionNumber) 33 | lines = readlines(filename) 34 | open(filename, "w") do io 35 | for line in lines 36 | if occursin(r"^Turing\s*=\s*\"\d+\.\d+\"\s*$", line) 37 | println(io, "Turing = \"$(target_version.:major).$(target_version.:minor)\"") 38 | else 39 | println(io, line) 40 | end 41 | end 42 | end 43 | end 44 | 45 | """ 46 | Update the version number in _quarto.yml to match `target_version`. 47 | 48 | See `update_project_toml` for implementation rationale. 
49 | """ 50 | function update_quarto_yml(filename, target_version::VersionNumber) 51 | # Don't deserialise/serialise as this will scramble lines 52 | lines = readlines(filename) 53 | open(filename, "w") do io 54 | for line in lines 55 | m = match(r"^(\s+)- text:\s*\"v\d+\.\d+\"\s*$", line) 56 | if m !== nothing 57 | println(io, "$(m[1])- text: \"v$(target_version.:major).$(target_version.:minor)\"") 58 | else 59 | println(io, line) 60 | end 61 | end 62 | end 63 | end 64 | 65 | # Retain the original version number string for error messages, as 66 | # VersionNumber() will tack on a patch version of 0 67 | quarto_yaml = YAML.load_file(QUARTO_YML_PATH) 68 | quarto_version_str = quarto_yaml["website"]["navbar"]["right"][1]["text"] 69 | quarto_version = VersionNumber(quarto_version_str) 70 | println("_quarto.yml version: ", quarto_version_str) 71 | 72 | project_toml = TOML.parsefile(PROJECT_TOML_PATH) 73 | project_version_str = project_toml["compat"]["Turing"] 74 | project_version = VersionNumber(project_version_str) 75 | println("Project.toml version: ", project_version_str) 76 | 77 | manifest_toml = TOML.parsefile(MANIFEST_TOML_PATH) 78 | manifest_version = VersionNumber(manifest_toml["deps"]["Turing"][1]["version"]) 79 | println("Manifest.toml version: ", manifest_version) 80 | 81 | errors = [] 82 | 83 | if ENV["TARGET_IS_MAIN"] == "true" 84 | # This environment variable is set by the GitHub Actions workflow. If it is 85 | # true, fetch the latest version from GitHub and update files to match this 86 | # version if necessary. 87 | 88 | resp = HTTP.get("https://api.github.com/repos/TuringLang/Turing.jl/releases/latest") 89 | latest_version = VersionNumber(JSON.parse(String(resp.body))["tag_name"]) 90 | println("Latest Turing.jl version: ", latest_version) 91 | 92 | if !major_minor_match(latest_version, project_version) 93 | push!(errors, "$(PROJECT_TOML_PATH) out of date") 94 | println("$(PROJECT_TOML_PATH) is out of date; updating") 95 | update_project_toml(PROJECT_TOML_PATH, latest_version) 96 | end 97 | 98 | if !major_minor_match(latest_version, quarto_version) 99 | push!(errors, "$(QUARTO_YML_PATH) out of date") 100 | println("$(QUARTO_YML_PATH) is out of date; updating") 101 | update_quarto_yml(QUARTO_YML_PATH, latest_version) 102 | end 103 | 104 | if !major_minor_patch_match(latest_version, manifest_version) 105 | push!(errors, "$(MANIFEST_TOML_PATH) out of date") 106 | # Attempt to automatically update Manifest 107 | println("$(MANIFEST_TOML_PATH) is out of date; updating") 108 | old_env = Pkg.project().path 109 | Pkg.activate(".") 110 | try 111 | Pkg.add(name="Turing", version=latest_version) 112 | catch e 113 | # If the Manifest couldn't be updated, the error will be shown later 114 | println(e) 115 | end 116 | # Check if versions match now, error if not 117 | Pkg.activate(old_env) 118 | manifest_toml = TOML.parsefile(MANIFEST_TOML_PATH) 119 | manifest_version = VersionNumber(manifest_toml["deps"]["Turing"][1]["version"]) 120 | if !major_minor_patch_match(latest_version, manifest_version) 121 | push!(errors, "Failed to update $(MANIFEST_TOML_PATH) to match latest Turing.jl version") 122 | end 123 | end 124 | 125 | if isempty(errors) 126 | println("All good") 127 | else 128 | error("The following errors occurred during version checking: \n", join(errors, "\n")) 129 | end 130 | 131 | else 132 | # If this is not true, then we are running on a backport-v* branch, i.e. docs 133 | # for a non-latest version. 
In this case we don't attempt to fetch the latest 134 | # patch version from GitHub to check the Manifest (we could, but it is more 135 | # work as it would involve paging through the list of releases). Instead, 136 | # we just check that the minor versions match. 137 | if !major_minor_match(quarto_version, project_version, manifest_version) 138 | error("The minor versions of Turing.jl in _quarto.yml, Project.toml, and Manifest.toml are inconsistent: 139 | - _quarto.yml: $quarto_version_str 140 | - Project.toml: $project_version_str 141 | - Manifest.toml: $manifest_version 142 | ") 143 | end 144 | end 145 | -------------------------------------------------------------------------------- /.github/workflows/version_check.yml: -------------------------------------------------------------------------------- 1 | # This action checks that the minor versions of Turing.jl specified in the 2 | # Project.toml, _quarto.yml, and Manifest.toml files are consistent. 3 | # 4 | # For pushes to main or PRs to main, it additionally also checks that the 5 | # version specified in Manifest.toml matches the latest release on GitHub. 6 | # 7 | # If any discrepancies are observed, it will open a PR to fix them. 8 | 9 | name: Check Turing.jl version consistency 10 | on: 11 | push: 12 | branches: 13 | - main 14 | - backport-* 15 | pull_request: 16 | branches: 17 | - main 18 | - backport-* 19 | workflow_dispatch: 20 | 21 | jobs: 22 | check-version: 23 | runs-on: ubuntu-latest 24 | 25 | permissions: 26 | contents: write 27 | pull-requests: write 28 | 29 | env: 30 | # Determine whether the target branch is main (i.e. this is a push to 31 | # main or a PR to main). 32 | TARGET_IS_MAIN: ${{ (github.event_name == 'push' && github.ref_name == 'main') || (github.event_name == 'pull_request' && github.base_ref == 'main') }} 33 | IS_PR_FROM_FORK: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.repo.fork }} 34 | # Disable precompilation as it takes a long time and is not needed for this workflow 35 | JULIA_PKG_PRECOMPILE_AUTO: 0 36 | 37 | steps: 38 | - name: Checkout 39 | uses: actions/checkout@v4 40 | 41 | - name: Setup Julia 42 | uses: julia-actions/setup-julia@v2 43 | 44 | - name: Log GitHub context variables 45 | run: | 46 | echo github.event_name: ${{ github.event_name }} 47 | echo github.ref_name: ${{ github.ref_name }} 48 | echo github.base_ref: ${{ github.base_ref }} 49 | echo TARGET_IS_MAIN: ${{ env.TARGET_IS_MAIN }} 50 | echo IS_PR_FROM_FORK: ${{ env.IS_PR_FROM_FORK }} 51 | 52 | - name: Check version consistency 53 | id: version_check 54 | run: julia --color=yes .github/workflows/version_check.jl 55 | 56 | - name: Create a PR with suggested changes 57 | id: create_pr 58 | if: always() && steps.version_check.outcome == 'failure' && env.TARGET_IS_MAIN && (! env.IS_PR_FROM_FORK) 59 | uses: peter-evans/create-pull-request@v6 60 | with: 61 | base: ${{ github.event_name == 'pull_request' && github.head_ref || github.ref_name }} 62 | branch: update-turing-version/${{ github.event_name == 'pull_request' && github.head_ref || github.ref_name }} 63 | commit-message: "Update Turing.jl version to match latest release" 64 | body: "This PR is automatically generated by the `version_check.yml` GitHub Action." 
65 | title: "Update Turing.jl version to match latest release" 66 | 67 | - name: Comment on PR about suggested changes (if PR was made) 68 | if: always() && github.event_name == 'pull_request' && steps.create_pr.outputs.pull-request-operation == 'created' 69 | uses: thollander/actions-comment-pull-request@v2 70 | with: 71 | message: | 72 | Hello! The versions of Turing.jl in your `Project.toml`, `_quarto.yml`, and/or `Manifest.toml` did not match the latest release version found on GitHub (https://github.com/TuringLang/Turing.jl/releases/latest). 73 | 74 | I've made a PR to update these files to match the latest release: ${{ steps.create_pr.outputs.pull-request-url }} 75 | 76 | Please review the changes and merge the PR if they look good. 77 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .*.jl.cov 2 | *.jl.*.cov 3 | *.jl.mem 4 | .ipynb_checkpoints/ 5 | *.tmp 6 | *.aux 7 | *.log 8 | *.out 9 | *.tex 10 | /tutorials/**/*.html 11 | /tutorials/**/*.pdf 12 | /tutorials/**/.quarto/* 13 | /tutorials/**/index_files/* 14 | Testing/ 15 | /*/*/jl_*/ 16 | .vscode 17 | _freeze 18 | _site 19 | .quarto 20 | /.quarto/ 21 | changelog.qmd 22 | versions.qmd 23 | tmp.gif 24 | .venv 25 | venv 26 | 27 | 404.html 28 | site_libs 29 | .DS_Store 30 | -------------------------------------------------------------------------------- /404.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: Page Not Found 3 | --- 4 | 5 | The page you requested cannot be found (perhaps it was moved or renamed). 6 | 7 | You may want to return to the [documentation home page](https://turinglang.org/docs). 8 | 9 | If you believe this is an error, please do report it by [opening an issue](https://github.com/TuringLang/docs/issues/new). 10 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018-2024, Hong Ge, the Turing language team 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /Project.toml: -------------------------------------------------------------------------------- 1 | [deps] 2 | ADTypes = "47edcb42-4c32-4615-8424-f2b9edc5f35b" 3 | AbstractGPs = "99985d1d-32ba-4be9-9821-2ec096f28918" 4 | AbstractMCMC = "80f14c24-f653-4e6a-9b94-39d6b0f70001" 5 | AbstractPPL = "7a57a42e-76ec-4ea3-a279-07e840d6d9cf" 6 | AdvancedHMC = "0bf59076-c3b1-5ca4-86bd-e02cd72cde3d" 7 | AdvancedMH = "5b7e9947-ddc0-4b3f-9b55-0d8042f74170" 8 | Bijectors = "76274a88-744f-5084-9051-94815aaf08c4" 9 | CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b" 10 | ComponentArrays = "b0b7db55-cfe3-40fc-9ded-d10e2dbeff66" 11 | DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0" 12 | DataStructures = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8" 13 | DifferentialEquations = "0c46a032-eb83-5123-abaf-570d42b7fbaa" 14 | Distributed = "8ba89e20-285c-5b6f-9357-94700520ee1b" 15 | Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f" 16 | DynamicHMC = "bbc10e6e-7c05-544b-b16e-64fede858acb" 17 | DynamicPPL = "366bfd00-2699-11ea-058f-f148b4cae6d8" 18 | FillArrays = "1a297f60-69ca-5386-bcde-b61e274b549b" 19 | Flux = "587475ba-b771-5e3f-ad9e-33799f191a9c" 20 | ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210" 21 | Functors = "d9f16b24-f501-4c13-a1f2-28368ffc5196" 22 | GLM = "38e38edf-8417-5370-95a0-9cbb8c7f171a" 23 | HiddenMarkovModels = "84ca31d5-effc-45e0-bfda-5a68cd981f47" 24 | LaTeXStrings = "b964fa9f-0449-5b57-a5c2-d3ea65f4040f" 25 | LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e" 26 | LogDensityProblems = "6fdf6af0-433a-55f7-b3ed-c6c6e0b8df7c" 27 | LogDensityProblemsAD = "996a588d-648d-4e1f-a8f0-a84b347e47b1" 28 | LogExpFunctions = "2ab3a3ac-af41-5b50-aa03-7779005ae688" 29 | Lux = "b2108857-7c20-44ae-9111-449ecde12c47" 30 | MCMCChains = "c7f686f2-ff18-58e9-bc7b-31028e88f75d" 31 | MLDataUtils = "cc2ba9b6-d476-5e6d-8eaf-a92d5412d41d" 32 | MLUtils = "f1d291b0-491e-4a28-83b9-f70985020b54" 33 | MacroTools = "1914dd2f-81c6-5fcd-8719-6d5c9610ff09" 34 | Measures = "442fdcdd-2543-5da2-b0f3-8c86c306513e" 35 | Memoization = "6fafb56a-5788-4b4e-91ca-c0cea6611c73" 36 | MicroCanonicalHMC = "234d2aa0-2291-45f7-9047-6fa6f316b0a8" 37 | Mooncake = "da2b9cff-9c12-43a0-ae48-6db2b0edb7d6" 38 | NNlib = "872c559c-99b0-510c-b3b7-b6c96a88d5cd" 39 | Optimization = "7f7a1694-90dd-40f0-9382-eb1efda571ba" 40 | OptimizationNLopt = "4e6fcdb7-1186-4e1f-a706-475e75c168bb" 41 | OptimizationOptimJL = "36348300-93cb-4f02-beb5-3c3902f8871e" 42 | PDMats = "90014a1f-27ba-587c-ab20-58faa44d9150" 43 | Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80" 44 | RDatasets = "ce6b1742-4840-55fa-b093-852dadbb1d8b" 45 | Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c" 46 | ReverseDiff = "37e2e3b7-166d-5795-8a7a-e32c996b4267" 47 | SciMLSensitivity = "1ed8b502-d754-442c-8d5d-10ac956f44a1" 48 | Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2" 49 | StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91" 50 | StatsFuns = "4c63d2b9-4356-54db-8cca-17b64c39e42c" 51 | StatsPlots = "f3b207a7-027a-5e70-b257-86293d7955fd" 52 | Turing = "fce5fe82-541a-59a6-adf8-730c64b5f9a0" 53 | UnPack = "3a884ed6-31ef-47d7-9d2a-63182c4928ed" 54 | Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f" 55 | 56 | [compat] 57 | Turing = "0.38" 58 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Turing.jl Documentation and Tutorials 2 | 3 | **https://turinglang.org/docs/** 4 | 5 | ## 
Contributing 6 | 7 | The easiest way to contribute to the documentation is to simply open a pull request. 8 | A preview version of the documentation is built for PRs, so you can see how your changes look without having to build the entire site locally. 9 | (Note that if you are editing a tutorial that takes a long time to run, this feedback may take a while.) 10 | 11 | The `main` branch contains the Quarto source code. 12 | The HTML documentation is automatically built using GitHub Actions, and deployed to the `gh-pages` branch, so you do not have to build and commit the HTML files yourself. 13 | 14 | ## Local development 15 | 16 | If you wish to render the docs website locally, you'll need to have [Quarto](https://quarto.org/docs/download/) installed (at least version 1.6.31) on your computer. 17 | Then: 18 | 19 | 1. Clone this repository: 20 | 21 | ```bash 22 | git clone https://github.com/TuringLang/docs 23 | ``` 24 | 25 | 2. Navigate into the cloned directory: 26 | 27 | ```bash 28 | cd docs 29 | ``` 30 | 31 | 3. Instantiate the project environment: 32 | 33 | ```bash 34 | julia --project=. -e 'using Pkg; Pkg.instantiate()' 35 | ``` 36 | 37 | 4. Preview the website using Quarto. 38 | 39 | > [!WARNING] 40 | > 41 | > This will take a _very_ long time, as it will build every tutorial from scratch. See [below](#faster-rendering) for ways to speed this up. 42 | 43 | ```bash 44 | quarto preview 45 | ``` 46 | 47 | This will launch a local server at http://localhost:4200/, which you can view in your web browser by navigating to the link shown in your terminal. 48 | 49 | 5. Render the website locally: 50 | 51 | ```bash 52 | quarto render 53 | ``` 54 | 55 | This will build the entire documentation and place the output in the `_site` folder. 56 | You can then view the rendered website by launching a HTTP server from that directory, e.g. using Python: 57 | 58 | ```bash 59 | cd _site 60 | python -m http.server 8000 61 | ``` 62 | 63 | Then, navigate to http://localhost:8000/ in your web browser. 64 | 65 | ## Faster rendering 66 | 67 | Note that rendering the entire documentation site can take a long time (usually multiple hours). 68 | If you wish to speed up local rendering, there are two options available: 69 | 70 | 1. Render a single tutorial or `qmd` file without compiling the entire site. 71 | To do this, pass the `qmd` file as an argument to `quarto render`: 72 | 73 | ``` 74 | quarto render path/to/index.qmd 75 | ``` 76 | 77 | (Note that `quarto preview` does not support this single-file rendering.) 78 | 79 | 2. Download the most recent `_freeze` folder from the [GitHub releases of this repo](https://github.com/turinglang/docs/releases), and place it in the root of the project. 80 | The `_freeze` folder stores the cached outputs from a previous build of the documentation. 81 | If it is present, Quarto will reuse the outputs of previous computations for any files for which the source is unchanged. 82 | 83 | Note that the validity of a `_freeze` folder depends on the Julia environment that it was created with, because different package versions may lead to different outputs. 84 | In the GitHub release, the `Manifest.toml` is also provided, and you should also download this and place it in the root directory of the docs. 85 | 86 | If there isn't a suitably up-to-date `_freeze` folder in the releases, you can generate a new one by [triggering a run for the `create_release.yml` workflow](https://github.com/TuringLang/docs/actions/workflows/create_release.yml). 
87 | (You will need to have the appropriate permissions; please create an issue if you need help with this.) 88 | 89 | ## Troubleshooting build issues 90 | 91 | As described in the [Quarto docs](https://quarto.org/docs/computations/julia.html#using-the-julia-engine), Quarto's Julia engine uses a worker process behind the scenes. 92 | Sometimes this can result in issues with old package code not being unloaded (e.g. when package versions are upgraded). 93 | If you find that Quarto's execution is failing with errors that aren't reproducible via a normal REPL, try adding the `--execute-daemon-restart` flag to the `quarto render` command: 94 | 95 | ```bash 96 | quarto render /path/to/index.qmd --execute-daemon-restart 97 | ``` 98 | 99 | And also, kill any stray Quarto processes that are still running (sometimes it keeps running in the background): 100 | 101 | ```bash 102 | pkill -9 -f quarto 103 | ``` 104 | 105 | ## License 106 | 107 | This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details. 108 | -------------------------------------------------------------------------------- /_quarto.yml: -------------------------------------------------------------------------------- 1 | project: 2 | type: website 3 | preview: 4 | # Change port if it's busy in your system or just remove this line so that It will automatically use any free port 5 | port: 4200 6 | browser: true 7 | 8 | 9 | # These cannot be used as variables. They are reserved for the project configuration. 10 | website: 11 | title: "Turing.jl" 12 | site-url: "https://turinglang.org/docs/" 13 | favicon: "assets/favicon.ico" 14 | search: 15 | location: navbar 16 | type: overlay 17 | navbar: 18 | logo: "assets/images/turing-logo.svg" 19 | logo-href: https://turinglang.org/ 20 | background: "#073c44" 21 | foreground: "#ffffff" 22 | left: 23 | - href: getting-started/ 24 | text: Get Started 25 | - href: tutorials/coin-flipping/ 26 | text: Tutorials 27 | - href: https://turinglang.org/library/ 28 | text: Libraries 29 | - href: https://turinglang.org/news/ 30 | text: News 31 | - href: https://turinglang.org/team/ 32 | text: Team 33 | right: 34 | # Current version 35 | - text: "v0.38" 36 | menu: 37 | - text: Changelog 38 | href: https://turinglang.org/docs/changelog.html 39 | - text: All Versions 40 | href: https://turinglang.org/docs/versions.html 41 | tools: 42 | - icon: twitter 43 | href: https://x.com/TuringLang 44 | text: Turing Twitter 45 | - icon: github 46 | href: https://github.com/TuringLang/Turing.jl 47 | text: Turing GitHub 48 | 49 | sidebar: 50 | - text: documentation 51 | collapse-level: 1 52 | contents: 53 | - getting-started/index.qmd 54 | - core-functionality/index.qmd 55 | 56 | - section: "User Guide" 57 | collapse-level: 1 58 | contents: 59 | - usage/automatic-differentiation/index.qmd 60 | - usage/custom-distribution/index.qmd 61 | - usage/probability-interface/index.qmd 62 | - usage/modifying-logprob/index.qmd 63 | - usage/tracking-extra-quantities/index.qmd 64 | - usage/mode-estimation/index.qmd 65 | - usage/performance-tips/index.qmd 66 | - usage/sampler-visualisation/index.qmd 67 | - usage/dynamichmc/index.qmd 68 | - usage/external-samplers/index.qmd 69 | - usage/troubleshooting/index.qmd 70 | 71 | - section: "Tutorials" 72 | contents: 73 | - tutorials/coin-flipping/index.qmd 74 | - tutorials/gaussian-mixture-models/index.qmd 75 | - tutorials/bayesian-logistic-regression/index.qmd 76 | - tutorials/bayesian-neural-networks/index.qmd 77 | - tutorials/hidden-markov-models/index.qmd 78 | - 
tutorials/bayesian-linear-regression/index.qmd 79 | - tutorials/infinite-mixture-models/index.qmd 80 | - tutorials/bayesian-poisson-regression/index.qmd 81 | - tutorials/multinomial-logistic-regression/index.qmd 82 | - tutorials/variational-inference/index.qmd 83 | - tutorials/bayesian-differential-equations/index.qmd 84 | - tutorials/probabilistic-pca/index.qmd 85 | - tutorials/bayesian-time-series-analysis/index.qmd 86 | - tutorials/gaussian-processes-introduction/index.qmd 87 | - tutorials/gaussian-process-latent-variable-models/index.qmd 88 | 89 | - section: "Developers" 90 | contents: 91 | - developers/contributing/index.qmd 92 | 93 | - section: "DynamicPPL's Compiler" 94 | collapse-level: 1 95 | contents: 96 | - developers/compiler/model-manual/index.qmd 97 | - developers/compiler/minituring-compiler/index.qmd 98 | - developers/compiler/minituring-contexts/index.qmd 99 | - developers/compiler/design-overview/index.qmd 100 | 101 | - section: "DynamicPPL Contexts" 102 | collapse-level: 1 103 | contents: 104 | - developers/contexts/submodel-condition/index.qmd 105 | 106 | - section: "Variable Transformations" 107 | collapse-level: 1 108 | contents: 109 | - developers/transforms/distributions/index.qmd 110 | - developers/transforms/bijectors/index.qmd 111 | - developers/transforms/dynamicppl/index.qmd 112 | 113 | - section: "Inference in Detail" 114 | collapse-level: 1 115 | contents: 116 | - developers/inference/variational-inference/index.qmd 117 | - developers/inference/implementing-samplers/index.qmd 118 | 119 | page-footer: 120 | background: "#073c44" 121 | left: | 122 | Turing is created by Hong Ge, and lovingly maintained by the core team of volunteers.
123 | The contents of this website are © 2024 under the terms of the MIT License. 124 | 125 | right: 126 | - icon: twitter 127 | href: https://x.com/TuringLang 128 | aria-label: Turing Twitter 129 | - icon: github 130 | href: https://github.com/TuringLang/Turing.jl 131 | aria-label: Turing GitHub 132 | 133 | back-to-top-navigation: true 134 | repo-url: https://github.com/TuringLang/docs 135 | repo-actions: [edit, issue] 136 | repo-branch: main 137 | repo-link-target: _blank 138 | page-navigation: true 139 | 140 | format: 141 | html: 142 | theme: 143 | light: cosmo 144 | dark: [cosmo, theming/theme-dark.scss] 145 | css: theming/styles.css 146 | smooth-scroll: true 147 | output-block-background: true 148 | toc: true 149 | toc-title: "Table of Contents" 150 | code-fold: false 151 | code-overflow: scroll 152 | execute: 153 | echo: true 154 | output: true 155 | freeze: auto 156 | include-in-header: 157 | - text: | 158 | 166 | 167 | # These variables can be used in any qmd files, e.g. for links: 168 | # the [Getting Started page]({{< meta get-started >}}) 169 | # Note that you don't need to prepend `../../` to the link, Quarto will figure 170 | # it out automatically. 171 | 172 | get-started: tutorials/docs-00-getting-started 173 | tutorials-intro: tutorials/00-introduction 174 | gaussian-mixture-model: tutorials/01-gaussian-mixture-model 175 | logistic-regression: tutorials/02-logistic-regression 176 | bayesian-neural-network: tutorials/03-bayesian-neural-network 177 | hidden-markov-model: tutorials/04-hidden-markov-model 178 | linear-regression: tutorials/05-linear-regression 179 | infinite-mixture-model: tutorials/06-infinite-mixture-model 180 | poisson-regression: tutorials/07-poisson-regression 181 | multinomial-logistic-regression: tutorials/08-multinomial-logistic-regression 182 | variational-inference: tutorials/09-variational-inference 183 | bayesian-differential-equations: tutorials/10-bayesian-differential-equations 184 | probabilistic-pca: tutorials/11-probabilistic-pca 185 | gplvm: tutorials/12-gplvm 186 | seasonal-time-series: tutorials/13-seasonal-time-series 187 | using-turing-advanced: tutorials/docs-09-using-turing-advanced 188 | using-turing: tutorials/docs-12-using-turing-guide 189 | 190 | usage-automatic-differentiation: usage/automatic-differentiation 191 | usage-custom-distribution: usage/custom-distribution 192 | usage-dynamichmc: usage/dynamichmc 193 | usage-external-samplers: usage/external-samplers 194 | usage-mode-estimation: usage/mode-estimation 195 | usage-modifying-logprob: usage/modifying-logprob 196 | usage-performance-tips: usage/performance-tips 197 | usage-probability-interface: usage/probability-interface 198 | usage-sampler-visualisation: usage/sampler-visualisation 199 | usage-tracking-extra-quantities: usage/tracking-extra-quantities 200 | usage-troubleshooting: usage/troubleshooting 201 | 202 | contributing-guide: developers/contributing 203 | dev-model-manual: developers/compiler/model-manual 204 | contexts: developers/compiler/minituring-contexts 205 | minituring: developers/compiler/minituring-compiler 206 | using-turing-compiler: developers/compiler/design-overview 207 | using-turing-variational-inference: developers/inference/variational-inference 208 | using-turing-implementing-samplers: developers/inference/implementing-samplers 209 | dev-transforms-distributions: developers/transforms/distributions 210 | dev-transforms-bijectors: developers/transforms/bijectors 211 | dev-transforms-dynamicppl: developers/transforms/dynamicppl 212 | 
dev-contexts-submodel-condition: developers/contexts/submodel-condition 213 | -------------------------------------------------------------------------------- /assets/favicon.ico: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TuringLang/docs/846abd6730853cacd7202e517eb011c42f89d45b/assets/favicon.ico -------------------------------------------------------------------------------- /assets/scripts/changelog.sh: -------------------------------------------------------------------------------- 1 | url="https://raw.githubusercontent.com/TuringLang/Turing.jl/main/HISTORY.md" 2 | 3 | changelog_content=$(curl -s "$url") 4 | 5 | cat << EOF > changelog.qmd 6 | --- 7 | title: Changelog 8 | repo-actions: false 9 | include-in-header: 10 | - text: | 11 | 19 | --- 20 | 21 | $changelog_content 22 | EOF 23 | -------------------------------------------------------------------------------- /assets/scripts/versions.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | REPO_URL="https://api.github.com/repos/TuringLang/Turing.jl/tags" 4 | 5 | # Fetch the tags 6 | TAGS=$(curl -s $REPO_URL | grep 'name' | sed 's/.*: "\(.*\)",/\1/') 7 | 8 | # Filter out pre-release version tags (e.g., 0.33.0-rc.1) and keep only stable version tags 9 | STABLE_TAGS=$(echo "$TAGS" | grep -Eo 'v[0-9]+\.[0-9]+\.[0-9]+$') 10 | 11 | # Find the latest version (including bug fix versions) 12 | LATEST_VERSION=$(echo "$STABLE_TAGS" | head -n 1) 13 | 14 | # Find the latest minor version (without bug fix) 15 | STABLE_VERSION=$(echo "$STABLE_TAGS" | grep -Eo 'v[0-9]+\.[0-9]+(\.0)?$' | head -n 1) 16 | 17 | # Filter out bug fix version tags from STABLE_TAGS to get only minor version tags 18 | MINOR_TAGS=$(echo "$STABLE_TAGS" | grep -Eo 'v[0-9]+\.[0-9]+(\.0)?$') 19 | 20 | # Set the minimum version to include in the "Previous Versions" section 21 | MIN_VERSION="v0.31.0" 22 | 23 | # Remove bug-fix version number from the display of a version 24 | remove_bugfix() { 25 | echo "$1" | sed -E 's/\.[0-9]$//' 26 | } 27 | 28 | # versions.qmd file will be generated from this content 29 | VERSIONS_CONTENT="--- 30 | pagetitle: Versions 31 | repo-actions: false 32 | include-in-header: 33 | - text: | 34 | 42 | --- 43 | 44 | # Latest Version 45 | | | | | 46 | | --- | --- | --- | 47 | | $(remove_bugfix "$LATEST_VERSION") | [Documentation](versions/${LATEST_VERSION}/) | [Changelog](changelog.qmd) | 48 | 49 | # Previous Versions 50 | | | | 51 | | --- | --- | 52 | " 53 | # Add previous versions, excluding the latest and stable versions 54 | for MINOR_TAG in $MINOR_TAGS; do 55 | if [ "$MINOR_TAG" != "$LATEST_VERSION" ] && [ "$MINOR_TAG" != "$STABLE_VERSION" ] && [ "$MINOR_TAG" \> "$MIN_VERSION" ]; then 56 | # Find the latest bug fix version for the current minor version 57 | LATEST_BUG_FIX=$(echo "$STABLE_TAGS" | grep "^${MINOR_TAG%.*}" | sort -r | head -n 1) 58 | # Remove trailing .0 from display version 59 | DISPLAY_MINOR_TAG=$(remove_bugfix "$MINOR_TAG") 60 | VERSIONS_CONTENT="${VERSIONS_CONTENT}| ${DISPLAY_MINOR_TAG} | [Documentation](versions/${LATEST_BUG_FIX}/) | 61 | " 62 | fi 63 | done 64 | 65 | # Add the Archived Versions section manually 66 | VERSIONS_CONTENT="${VERSIONS_CONTENT} 67 | # Archived Versions 68 | Documentation for archived versions is available on our deprecated documentation site. 
69 | 70 | | | | 71 | | --- | --- | 72 | | v0.31 | [Documentation](../v0.31.4/) | 73 | | v0.30 | [Documentation](../v0.30.9/) | 74 | | v0.29 | [Documentation](../v0.29.3/) | 75 | | v0.28 | [Documentation](../v0.28.3/) | 76 | | v0.27 | [Documentation](../v0.27.0/) | 77 | | v0.26 | [Documentation](../v0.26.6/) | 78 | | v0.25 | [Documentation](../v0.25.3/) | 79 | | v0.24 | [Documentation](../v0.24.4/) | 80 | " 81 | 82 | # Write the content to the versions.qmd file 83 | echo "$VERSIONS_CONTENT" > versions.qmd 84 | -------------------------------------------------------------------------------- /developers/compiler/model-manual/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: Manually Defining a Model 3 | engine: julia 4 | aliases: 5 | - ../../../tutorials/dev-model-manual/index.html 6 | --- 7 | 8 | Traditionally, models in Turing are defined using the `@model` macro: 9 | 10 | ```{julia} 11 | using Turing 12 | 13 | @model function gdemo(x) 14 | # Set priors. 15 | s² ~ InverseGamma(2, 3) 16 | m ~ Normal(0, sqrt(s²)) 17 | 18 | # Observe each value of x. 19 | x .~ Normal(m, sqrt(s²)) 20 | 21 | return nothing 22 | end 23 | 24 | model = gdemo([1.5, 2.0]) 25 | ``` 26 | 27 | The `@model` macro accepts a function definition and rewrites it such that call of the function generates a `Model` struct for use by the sampler. 28 | 29 | However, models can be constructed by hand without the use of a macro. 30 | Taking the `gdemo` model above as an example, the macro-based definition can be implemented also (a bit less generally) with the macro-free version 31 | 32 | ```{julia} 33 | using DynamicPPL 34 | 35 | # Create the model function. 36 | function gdemo2(model, varinfo, context, x) 37 | # Assume s² has an InverseGamma distribution. 38 | s², varinfo = DynamicPPL.tilde_assume!!( 39 | context, InverseGamma(2, 3), Turing.@varname(s²), varinfo 40 | ) 41 | 42 | # Assume m has a Normal distribution. 43 | m, varinfo = DynamicPPL.tilde_assume!!( 44 | context, Normal(0, sqrt(s²)), Turing.@varname(m), varinfo 45 | ) 46 | 47 | # Observe each value of x[i] according to a Normal distribution. 48 | for i in eachindex(x) 49 | _retval, varinfo = DynamicPPL.tilde_observe!!( 50 | context, Normal(m, sqrt(s²)), x[i], Turing.@varname(x[i]), varinfo 51 | ) 52 | end 53 | 54 | # The final return statement should comprise both the original return 55 | # value and the updated varinfo. 56 | return nothing, varinfo 57 | end 58 | gdemo2(x) = Turing.Model(gdemo2, (; x)) 59 | 60 | # Instantiate a Model object with our data variables. 61 | model2 = gdemo2([1.5, 2.0]) 62 | ``` 63 | 64 | We can sample from this model in the same way: 65 | 66 | ```{julia} 67 | chain = sample(model2, NUTS(), 1000; progress=false) 68 | ``` 69 | 70 | The subsequent pages in this section will show how the `@model` macro does this behind-the-scenes. 71 | -------------------------------------------------------------------------------- /developers/contributing/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: Contributing 3 | aliases: 4 | - ../../tutorials/docs-01-contributing-guide/index.html 5 | --- 6 | 7 | Turing is an open-source project and is [hosted on GitHub](https://github.com/TuringLang). 8 | We welcome contributions from the community in all forms large or small: bug reports, feature implementations, code contributions, or improvements to documentation or infrastructure are all extremely valuable. 
9 | We would also very much appreciate examples of models written using Turing. 10 | 11 | ### How to get involved 12 | 13 | Our outstanding issues are tabulated on our [issue tracker](https://github.com/TuringLang/Turing.jl/issues). 14 | Closing one of these may involve implementing new features, fixing bugs, or writing example models. 15 | 16 | You can also join the `#turing` channel on the [Julia Slack](https://julialang.org/slack/) and say hello! 17 | 18 | If you are new to open-source software, please see [GitHub's introduction](https://guides.github.com/introduction/flow/) or [Julia's contribution guide](https://github.com/JuliaLang/julia/blob/master/CONTRIBUTING.md) on using version control for collaboration. 19 | 20 | ### Documentation 21 | 22 | Each of the packages in the Turing ecosystem (see [Libraries](/library)) has its own documentation, which is typically found in the `docs` folder of the corresponding package. 23 | For example, the source code for DynamicPPL's documentation can be found in [its repository](https://github.com/TuringLang/DynamicPPL.jl). 24 | 25 | The documentation for Turing.jl itself consists of the tutorials that you see on this website, and is built from the separate [`docs` repository](https://github.com/TuringLang/docs). 26 | None of the documentation is generated from the [main Turing.jl repository](https://github.com/TuringLang/Turing.jl); in particular, the API that Turing exports does not currently form part of the documentation. 27 | 28 | Other sections of the website (anything that isn't a package, or a tutorial) – for example, the list of libraries – is built from the [`turinglang.github.io` repository](https://github.com/TuringLang/turinglang.github.io). 29 | 30 | ### Tests 31 | 32 | Turing, like most software libraries, has a test suite. You can run the whole suite by running `julia --project=.` from the root of the Turing repository, and then running 33 | 34 | ```julia 35 | import Pkg; Pkg.test("Turing") 36 | ``` 37 | 38 | The test suite subdivides into files in the `test` folder, and you can run only some of them using commands like 39 | 40 | ```julia 41 | import Pkg; Pkg.test("Turing"; test_args=["optim", "hmc", "--skip", "ext"]) 42 | ``` 43 | 44 | This one would run all files with "optim" or "hmc" in their path, such as `test/optimisation/Optimisation.jl`, but not files with "ext" in their path. Alternatively, you can set these arguments as command line arguments when you run Julia 45 | 46 | ```julia 47 | julia --project=. -e 'import Pkg; Pkg.test(; test_args=ARGS)' -- optim hmc --skip ext 48 | ``` 49 | 50 | Or otherwise, set the global `ARGS` variable, and call `include("test/runtests.jl")`. 51 | 52 | ### Style Guide 53 | 54 | Turing has a style guide, described below. 55 | Reviewing it before making a pull request is not strictly necessary, but you may be asked to change portions of your code to conform with the style guide before it is merged. 56 | 57 | Most Turing code follows [Blue: a Style Guide for Julia](https://github.com/JuliaDiff/BlueStyle). 58 | These conventions were created from a variety of sources including Python's [PEP8](http://legacy.python.org/dev/peps/pep-0008/), Julia's [Notes for Contributors](https://github.com/JuliaLang/julia/blob/master/CONTRIBUTING.md), and Julia's [Style Guide](https://docs.julialang.org/en/v1/manual/style-guide/). 59 | 60 | #### Synopsis 61 | 62 | - Use 4 spaces per indentation level, no tabs. 63 | - Try to adhere to a 92 character line length limit. 
64 | - Use upper camel case convention for [modules](https://docs.julialang.org/en/v1/manual/modules/) and [types](https://docs.julialang.org/en/v1/manual/types/). 65 | - Use lower case with underscores for method names (note: Julia code likes to use lower case without underscores). 66 | - Comments are good, try to explain the intentions of the code. 67 | - Use whitespace to make the code more readable. 68 | - No whitespace at the end of a line (trailing whitespace). 69 | - Avoid padding brackets with spaces. ex. `Int64(value)` preferred over `Int64( value )`. 70 | 71 | #### A Word on Consistency 72 | 73 | When adhering to the Blue style, it's important to realize that these are guidelines, not rules. This is [stated best in the PEP8](http://legacy.python.org/dev/peps/pep-0008/#a-foolish-consistency-is-the-hobgoblin-of-little-minds): 74 | 75 | > A style guide is about consistency. Consistency with this style guide is important. Consistency within a project is more important. Consistency within one module or function is most important. 76 | 77 | > But most importantly: know when to be inconsistent – sometimes the style guide just doesn't apply. When in doubt, use your best judgment. Look at other examples and decide what looks best. And don't hesitate to ask! 78 | 79 | -------------------------------------------------------------------------------- /developers/transforms/dynamicppl/dynamicppl_link.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TuringLang/docs/846abd6730853cacd7202e517eb011c42f89d45b/developers/transforms/dynamicppl/dynamicppl_link.png -------------------------------------------------------------------------------- /developers/transforms/dynamicppl/dynamicppl_link2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TuringLang/docs/846abd6730853cacd7202e517eb011c42f89d45b/developers/transforms/dynamicppl/dynamicppl_link2.png -------------------------------------------------------------------------------- /getting-started/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: Getting Started 3 | engine: julia 4 | aliases: 5 | - ../tutorials/docs-00-getting-started/index.html 6 | - ../index.html 7 | --- 8 | 9 | ```{julia} 10 | #| echo: false 11 | #| output: false 12 | using Pkg; 13 | Pkg.instantiate(); 14 | ``` 15 | 16 | ### Installation 17 | 18 | To use Turing, you need to install Julia first and then install Turing. 19 | 20 | You will need to install Julia 1.10 or greater, which you can get from [the official Julia website](http://julialang.org/downloads/). 21 | 22 | Turing is officially registered in the [Julia General package registry](https://github.com/JuliaRegistries/General), which means that you can install a stable version of Turing by running the following in the Julia REPL: 23 | 24 | ```{julia} 25 | #| eval: false 26 | #| output: false 27 | using Pkg 28 | Pkg.add("Turing") 29 | ``` 30 | 31 | ### Supported versions and platforms 32 | 33 | Formally, we only run continuous integration tests on: (1) the minimum supported minor version (typically an LTS release), and (2) the latest minor version of Julia. 34 | We test on Linux (x64), macOS (Apple Silicon), and Windows (x64). 35 | The Turing developer team will prioritise fixing issues on these platforms and versions. 36 | 37 | If you run into a problem on a different version (e.g. older patch releases) or platforms (e.g. 
32-bit), please do feel free to [post an issue](https://github.com/TuringLang/Turing.jl/issues/new?template=01-bug-report.yml)! 38 | If we are able to help, we will try to fix it, but we cannot guarantee support for untested versions. 39 | 40 | ### Example usage 41 | 42 | First, we load the Turing and StatsPlots modules. 43 | The latter is required for visualising the results. 44 | 45 | ```{julia} 46 | using Turing 47 | using StatsPlots 48 | ``` 49 | 50 | We then specify our model, which is a simple Gaussian model with unknown mean and variance. 51 | Models are defined as ordinary Julia functions, prefixed with the `@model` macro. 52 | Each statement inside closely resembles how the model would be defined with mathematical notation. 53 | Here, both `x` and `y` are observed values, and are therefore passed as function parameters. 54 | `m` and `s²` are the parameters to be inferred. 55 | 56 | ```{julia} 57 | @model function gdemo(x, y) 58 | s² ~ InverseGamma(2, 3) 59 | m ~ Normal(0, sqrt(s²)) 60 | x ~ Normal(m, sqrt(s²)) 61 | y ~ Normal(m, sqrt(s²)) 62 | end 63 | ``` 64 | 65 | Suppose we observe `x = 1.5` and `y = 2`, and want to infer the mean and variance. 66 | We can pass these data as arguments to the `gdemo` function, and run a sampler to collect the results. 67 | Here, we collect 1000 samples using the No U-Turn Sampler (NUTS) algorithm. 68 | 69 | ```{julia} 70 | chain = sample(gdemo(1.5, 2), NUTS(), 1000, progress=false) 71 | ``` 72 | 73 | We can plot the results: 74 | 75 | ```{julia} 76 | plot(chain) 77 | ``` 78 | 79 | and obtain summary statistics by indexing the chain: 80 | 81 | ```{julia} 82 | mean(chain[:m]), mean(chain[:s²]) 83 | ``` 84 | 85 | ### Where to go next 86 | 87 | ::: {.callout-note title="Note on prerequisites"} 88 | Familiarity with Julia is assumed throughout the Turing documentation. 89 | If you are new to Julia, [Learning Julia](https://julialang.org/learning/) is a good starting point. 90 | 91 | The underlying theory of Bayesian machine learning is not explained in detail in this documentation. 92 | A thorough introduction to the field is [*Pattern Recognition and Machine Learning*](https://www.springer.com/us/book/9780387310732) (Bishop, 2006); an online version is available [here (PDF, 18.1 MB)](https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf). 93 | ::: 94 | 95 | The next page on [Turing's core functionality]({{}}) explains the basic features of the Turing language. 96 | From there, you can either look at [worked examples of how different models are implemented in Turing]({{}}), or [specific tips and tricks that can help you get the most out of Turing]({{}}). 
97 | -------------------------------------------------------------------------------- /theming/styles.css: -------------------------------------------------------------------------------- 1 | .navbar a:hover { 2 | text-decoration: none; 3 | } 4 | 5 | .cell-output { 6 | border: 1px dashed; 7 | } 8 | 9 | .cell-bg { 10 | background-color: #f1f3f5; 11 | } 12 | 13 | .cell-output-stdout code { 14 | word-break: break-wor !important; 15 | white-space: pre-wrap !important; 16 | } 17 | 18 | .cell-output-display svg { 19 | height: fit-content; 20 | width: fit-content; 21 | 22 | &.mermaid-js { 23 | /* fit-content for mermaid diagrams makes them really small, so we 24 | * default to 100% */ 25 | width: 100%; 26 | } 27 | } 28 | 29 | .cell-output-display img { 30 | max-width: 100%; 31 | max-height: 100%; 32 | object-fit: contain; 33 | } 34 | 35 | .nav-footer-center { 36 | display: flex; 37 | justify-content: center; 38 | } 39 | 40 | .dropdown-menu { 41 | text-align: center; 42 | min-width: 100px !important; 43 | border-radius: 5px; 44 | max-height: 250px; 45 | overflow: scroll; 46 | } 47 | -------------------------------------------------------------------------------- /theming/theme-dark.scss: -------------------------------------------------------------------------------- 1 | /*-- scss:defaults --*/ 2 | // Cosmo 5.3.3 3 | // Bootswatch 4 | 5 | $theme: "cosmo" !default; 6 | 7 | // Manually-added colors 8 | 9 | $background-nav: #192222; 10 | $background-body: #131818; 11 | $foreground: #1bb3ac; 12 | $links:#2aa198; 13 | $links-hover: #31dce6; 14 | $code-background-color: #172424; 15 | $li: #bcbcbc; 16 | 17 | // Quarto default colors 18 | 19 | $white: #ffffff !default; 20 | $gray-100: #f8f9fa !default; 21 | $gray-200: #e9ecef !default; 22 | $gray-300: #dee2e6 !default; 23 | $gray-400: #ced4da !default; 24 | $gray-500: #adb5bd !default; 25 | $gray-600: #868e96 !default; 26 | $gray-700: #495057 !default; 27 | $gray-800: #373a3c !default; 28 | $gray-900: #212529 !default; 29 | $black: #000000 !default; 30 | 31 | $indigo: #6610f2 !default; 32 | $purple: #613d7c !default; 33 | $pink: #e83e8c !default; 34 | $red: #ff0039 !default; 35 | $orange: #f0ad4e !default; 36 | $yellow: #ff7518 !default; 37 | $green: #3fb618 !default; 38 | $teal: #20c997 !default; 39 | $cyan: #9954bb !default; 40 | 41 | $primary: $links-hover !default; 42 | $secondary: $gray-800 !default; 43 | $success: $green !default; 44 | $info: $cyan !default; 45 | $warning: $yellow !default; 46 | $danger: $red !default; 47 | $light: $gray-100 !default; 48 | $dark: $gray-800 !default; 49 | 50 | $min-contrast-ratio: 2.6 !default; 51 | 52 | // Options 53 | 54 | $enable-rounded: false !default; 55 | 56 | // Fonts 57 | 58 | // stylelint-disable-next-line value-keyword-case 59 | $font-family-sans-serif: "Source Sans Pro", -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Helvetica Neue", Arial, sans-serif, "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol" !default; 60 | $headings-font-weight: 400 !default; 61 | 62 | // Tables 63 | 64 | $table-color: initial !default; 65 | 66 | // Alerts 67 | 68 | $alert-border-width: 0 !default; 69 | 70 | // Progress bars 71 | 72 | $progress-height: .5rem !default; 73 | 74 | 75 | // Custom tweaks for Quarto-Cosmo 76 | 77 | $navbar-bg: $background-nav; 78 | $navbar-fg: $foreground; 79 | $footer-bg: $background-nav; 80 | $footer-fg: $foreground; 81 | $body-color: $white; 82 | $body-bg: $background-body; 83 | 84 | a, pre code { 85 | color: $links !important; 86 | } 87 | 88 | pre { 89 | color: $foreground 
!important; 90 | } 91 | a:hover { 92 | color: $links-hover !important; 93 | } 94 | 95 | code, p code, ol code, li code, h1 code { 96 | background-color: $code-background-color !important; 97 | color: $links; 98 | } 99 | 100 | .cell, .anchored code { 101 | background-color: $code-background-color !important; 102 | color: $links; 103 | } 104 | 105 | div.sourceCode { 106 | background-color: $code-background-color !important; 107 | } 108 | 109 | li { 110 | color: $li !important; 111 | } 112 | 113 | .menu-text:hover { 114 | color: $links-hover !important; 115 | } 116 | 117 | p { 118 | color: $li !important; 119 | } 120 | 121 | .quarto-title-breadcrumbs .breadcrumb li:last-of-type a { 122 | color: #6c757d !important; 123 | } 124 | 125 | .ansi-bright-black-fg{ 126 | color: $foreground !important; 127 | } 128 | ::selection { 129 | color: $links-hover; 130 | background: $background-nav; 131 | } 132 | 133 | 134 | .tooltip { 135 | --bs-tooltip-color: $black !important; 136 | --bs-tooltip-bg: $white !important; 137 | } 138 | -------------------------------------------------------------------------------- /tutorials/bayesian-linear-regression/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: Bayesian Linear Regression 3 | engine: julia 4 | aliases: 5 | - ../05-linear-regression/index.html 6 | --- 7 | 8 | ```{julia} 9 | #| echo: false 10 | #| output: false 11 | using Pkg; 12 | Pkg.instantiate(); 13 | ``` 14 | 15 | Turing is powerful when applied to complex hierarchical models, but it can also be put to task at common statistical procedures, like [linear regression](https://en.wikipedia.org/wiki/Linear_regression). 16 | This tutorial covers how to implement a linear regression model in Turing. 17 | 18 | ## Set Up 19 | 20 | We begin by importing all the necessary libraries. 21 | 22 | ```{julia} 23 | # Import Turing. 24 | using Turing 25 | 26 | # Package for loading the data set. 27 | using RDatasets 28 | 29 | # Package for visualization. 30 | using StatsPlots 31 | 32 | # Functionality for splitting the data. 33 | using MLUtils: splitobs 34 | 35 | # Functionality for constructing arrays with identical elements efficiently. 36 | using FillArrays 37 | 38 | # Functionality for normalizing the data and evaluating the model predictions. 39 | using StatsBase 40 | 41 | # Functionality for working with scaled identity matrices. 42 | using LinearAlgebra 43 | 44 | # Set a seed for reproducibility. 45 | using Random 46 | Random.seed!(0); 47 | ``` 48 | 49 | ```{julia} 50 | #| output: false 51 | setprogress!(false) 52 | ``` 53 | 54 | We will use the `mtcars` dataset from the [RDatasets](https://github.com/JuliaStats/RDatasets.jl) package. 55 | `mtcars` contains a variety of statistics on different car models, including their miles per gallon, number of cylinders, and horsepower, among others. 56 | 57 | We want to know if we can construct a Bayesian linear regression model to predict the miles per gallon of a car, given the other statistics it has. 58 | Let us take a look at the data we have. 59 | 60 | ```{julia} 61 | # Load the dataset. 62 | data = RDatasets.dataset("datasets", "mtcars") 63 | 64 | # Show the first six rows of the dataset. 65 | first(data, 6) 66 | ``` 67 | 68 | ```{julia} 69 | size(data) 70 | ``` 71 | 72 | The next step is to get our data ready for testing. We'll split the `mtcars` dataset into two subsets, one for training our model and one for evaluating our model. 
Then, we separate the targets we want to learn (`MPG`, in this case) and standardize the datasets by subtracting each column's means and dividing by the standard deviation of that column. The resulting data is not very familiar looking, but this standardization process helps the sampler converge far easier. 73 | 74 | ```{julia} 75 | # Remove the model column. 76 | select!(data, Not(:Model)) 77 | 78 | # Split our dataset 70%/30% into training/test sets. 79 | trainset, testset = map(DataFrame, splitobs(data; at=0.7, shuffle=true)) 80 | 81 | # Turing requires data in matrix form. 82 | target = :MPG 83 | train = Matrix(select(trainset, Not(target))) 84 | test = Matrix(select(testset, Not(target))) 85 | train_target = trainset[:, target] 86 | test_target = testset[:, target] 87 | 88 | # Standardize the features. 89 | dt_features = fit(ZScoreTransform, train; dims=1) 90 | StatsBase.transform!(dt_features, train) 91 | StatsBase.transform!(dt_features, test) 92 | 93 | # Standardize the targets. 94 | dt_targets = fit(ZScoreTransform, train_target) 95 | StatsBase.transform!(dt_targets, train_target) 96 | StatsBase.transform!(dt_targets, test_target); 97 | ``` 98 | 99 | ## Model Specification 100 | 101 | In a traditional frequentist model using [OLS](https://en.wikipedia.org/wiki/Ordinary_least_squares), our model might look like: 102 | 103 | $$ 104 | \mathrm{MPG}_i = \alpha + \boldsymbol{\beta}^\mathsf{T}\boldsymbol{X_i} 105 | $$ 106 | 107 | where $\boldsymbol{\beta}$ is a vector of coefficients and $\boldsymbol{X}$ is a vector of inputs for observation $i$. The Bayesian model we are more concerned with is the following: 108 | 109 | $$ 110 | \mathrm{MPG}_i \sim \mathcal{N}(\alpha + \boldsymbol{\beta}^\mathsf{T}\boldsymbol{X_i}, \sigma^2) 111 | $$ 112 | 113 | where $\alpha$ is an intercept term common to all observations, $\boldsymbol{\beta}$ is a coefficient vector, $\boldsymbol{X_i}$ is the observed data for car $i$, and $\sigma^2$ is a common variance term. 114 | 115 | For $\sigma^2$, we assign a prior of `truncated(Normal(0, 100); lower=0)`. 116 | This is consistent with [Andrew Gelman's recommendations](http://www.stat.columbia.edu/%7Egelman/research/published/taumain.pdf) on noninformative priors for variance. 117 | The intercept term ($\alpha$) is assumed to be normally distributed with a mean of zero and a variance of three. 118 | This represents our assumptions that miles per gallon can be explained mostly by our assorted variables, but a high variance term indicates our uncertainty about that. 119 | Each coefficient is assumed to be normally distributed with a mean of zero and a variance of 10. 120 | We do not know that our coefficients are different from zero, and we don't know which ones are likely to be the most important, so the variance term is quite high. 121 | Lastly, each observation $y_i$ is distributed according to the calculated `mu` term given by $\alpha + \boldsymbol{\beta}^\mathsf{T}\boldsymbol{X_i}$. 122 | 123 | ```{julia} 124 | # Bayesian linear regression. 125 | @model function linear_regression(x, y) 126 | # Set variance prior. 127 | σ² ~ truncated(Normal(0, 100); lower=0) 128 | 129 | # Set intercept prior. 130 | intercept ~ Normal(0, sqrt(3)) 131 | 132 | # Set the priors on our coefficients. 133 | nfeatures = size(x, 2) 134 | coefficients ~ MvNormal(Zeros(nfeatures), 10.0 * I) 135 | 136 | # Calculate all the mu terms. 
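    # mu[i] is the linear predictor for observation i; the MvNormal below then
    # gives each y[i] an independent Normal(mu[i], sqrt(σ²)) likelihood, written
    # jointly with covariance matrix σ² * I.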
137 | mu = intercept .+ x * coefficients 138 | return y ~ MvNormal(mu, σ² * I) 139 | end 140 | ``` 141 | 142 | With our model specified, we can call the sampler. We will use the No U-Turn Sampler ([NUTS](https://turinglang.org/stable/docs/library/#Turing.Inference.NUTS)) here. 143 | 144 | ```{julia} 145 | model = linear_regression(train, train_target) 146 | chain = sample(model, NUTS(), 5_000) 147 | ``` 148 | 149 | We can also check the densities and traces of the parameters visually using the `plot` functionality. 150 | 151 | ```{julia} 152 | plot(chain) 153 | ``` 154 | 155 | It looks like all parameters have converged. 156 | 157 | ```{julia} 158 | #| echo: false 159 | let 160 | ess_df = ess(chain) 161 | @assert minimum(ess_df[:, :ess]) > 500 "Minimum ESS: $(minimum(ess_df[:, :ess])) - not > 700" 162 | @assert mean(ess_df[:, :ess]) > 2_000 "Mean ESS: $(mean(ess_df[:, :ess])) - not > 2000" 163 | @assert maximum(ess_df[:, :ess]) > 3_500 "Maximum ESS: $(maximum(ess_df[:, :ess])) - not > 3500" 164 | end 165 | ``` 166 | 167 | ## Comparing to OLS 168 | 169 | A satisfactory test of our model is to evaluate how well it predicts. Importantly, we want to compare our model to existing tools like OLS. The code below uses the [GLM.jl](https://juliastats.org/GLM.jl/stable/) package to generate a traditional OLS multiple regression model on the same data as our probabilistic model. 170 | 171 | ```{julia} 172 | # Import the GLM package. 173 | using GLM 174 | 175 | # Perform multiple regression OLS. 176 | train_with_intercept = hcat(ones(size(train, 1)), train) 177 | ols = lm(train_with_intercept, train_target) 178 | 179 | # Compute predictions on the training data set and unstandardize them. 180 | train_prediction_ols = GLM.predict(ols) 181 | StatsBase.reconstruct!(dt_targets, train_prediction_ols) 182 | 183 | # Compute predictions on the test data set and unstandardize them. 184 | test_with_intercept = hcat(ones(size(test, 1)), test) 185 | test_prediction_ols = GLM.predict(ols, test_with_intercept) 186 | StatsBase.reconstruct!(dt_targets, test_prediction_ols); 187 | ``` 188 | 189 | The function below accepts a chain and an input matrix and calculates predictions. We use the samples of the model parameters in the chain starting with sample 200. 190 | 191 | ```{julia} 192 | # Make a prediction given an input vector. 193 | function prediction(chain, x) 194 | p = get_params(chain[200:end, :, :]) 195 | targets = p.intercept' .+ x * reduce(hcat, p.coefficients)' 196 | return vec(mean(targets; dims=2)) 197 | end 198 | ``` 199 | 200 | When we make predictions, we unstandardize them so they are more understandable. 201 | 202 | ```{julia} 203 | # Calculate the predictions for the training and testing sets and unstandardize them. 204 | train_prediction_bayes = prediction(chain, train) 205 | StatsBase.reconstruct!(dt_targets, train_prediction_bayes) 206 | test_prediction_bayes = prediction(chain, test) 207 | StatsBase.reconstruct!(dt_targets, test_prediction_bayes) 208 | 209 | # Show the predictions on the test data set. 210 | DataFrame(; MPG=testset[!, target], Bayes=test_prediction_bayes, OLS=test_prediction_ols) 211 | ``` 212 | 213 | Now let's evaluate the loss for each method, and each prediction set. We will use the mean squared error to evaluate loss, given by 214 | $$ 215 | \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^n {(y_i - \hat{y_i})^2} 216 | $$ 217 | where $y_i$ is the actual value (true MPG) and $\hat{y_i}$ is the predicted value using either OLS or Bayesian linear regression. 
A lower SSE indicates a closer fit to the data. 218 | 219 | ```{julia} 220 | println( 221 | "Training set:", 222 | "\n\tBayes loss: ", 223 | msd(train_prediction_bayes, trainset[!, target]), 224 | "\n\tOLS loss: ", 225 | msd(train_prediction_ols, trainset[!, target]), 226 | ) 227 | 228 | println( 229 | "Test set:", 230 | "\n\tBayes loss: ", 231 | msd(test_prediction_bayes, testset[!, target]), 232 | "\n\tOLS loss: ", 233 | msd(test_prediction_ols, testset[!, target]), 234 | ) 235 | ``` 236 | 237 | ```{julia} 238 | #| echo: false 239 | let 240 | bayes_train_loss = msd(train_prediction_bayes, trainset[!, target]) 241 | bayes_test_loss = msd(test_prediction_bayes, testset[!, target]) 242 | ols_train_loss = msd(train_prediction_ols, trainset[!, target]) 243 | ols_test_loss = msd(test_prediction_ols, testset[!, target]) 244 | @assert bayes_train_loss < bayes_test_loss "Bayesian training loss ($bayes_train_loss) >= Bayesian test loss ($bayes_test_loss)" 245 | @assert ols_train_loss < ols_test_loss "OLS training loss ($ols_train_loss) >= OLS test loss ($ols_test_loss)" 246 | @assert isapprox(bayes_train_loss, ols_train_loss; rtol=0.01) "Difference between Bayesian training loss ($bayes_train_loss) and OLS training loss ($ols_train_loss) unexpectedly large!" 247 | @assert isapprox(bayes_test_loss, ols_test_loss; rtol=0.05) "Difference between Bayesian test loss ($bayes_test_loss) and OLS test loss ($ols_test_loss) unexpectedly large!" 248 | end 249 | ``` 250 | 251 | As we can see above, OLS and our Bayesian model fit our training and test data set about the same. 252 | -------------------------------------------------------------------------------- /tutorials/bayesian-logistic-regression/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: Bayesian Logistic Regression 3 | engine: julia 4 | aliases: 5 | - ../02-logistic-regression/index.html 6 | --- 7 | 8 | ```{julia} 9 | #| echo: false 10 | #| output: false 11 | using Pkg; 12 | Pkg.instantiate(); 13 | ``` 14 | 15 | [Bayesian logistic regression](https://en.wikipedia.org/wiki/Logistic_regression#Bayesian) is the Bayesian counterpart to a common tool in machine learning, logistic regression. 16 | The goal of logistic regression is to predict a one or a zero for a given training item. 17 | An example might be predicting whether someone is sick or ill given their symptoms and personal information. 18 | 19 | In our example, we'll be working to predict whether someone is likely to default with a synthetic dataset found in the `RDatasets` package. This dataset, `Defaults`, comes from R's [ISLR](https://cran.r-project.org/web/packages/ISLR/index.html) package and contains information on borrowers. 20 | 21 | To start, let's import all the libraries we'll need. 22 | 23 | ```{julia} 24 | # Import Turing and Distributions. 25 | using Turing, Distributions 26 | 27 | # Import RDatasets. 28 | using RDatasets 29 | 30 | # Import MCMCChains, Plots, and StatsPlots for visualizations and diagnostics. 31 | using MCMCChains, Plots, StatsPlots 32 | 33 | # We need a logistic function, which is provided by StatsFuns. 34 | using StatsFuns: logistic 35 | 36 | # Functionality for splitting and normalizing the data 37 | using MLDataUtils: shuffleobs, stratifiedobs, rescale! 38 | 39 | # Set a seed for reproducibility. 40 | using Random 41 | Random.seed!(0); 42 | ``` 43 | 44 | ## Data Cleaning & Set Up 45 | 46 | Now we're going to import our dataset. 
The first six rows of the dataset are shown below so you can get a good feel for what kind of data we have. 47 | 48 | ```{julia} 49 | # Import the "Default" dataset. 50 | data = RDatasets.dataset("ISLR", "Default"); 51 | 52 | # Show the first six rows of the dataset. 53 | first(data, 6) 54 | ``` 55 | 56 | Most machine learning processes require some effort to tidy up the data, and this is no different. We need to convert the `Default` and `Student` columns, which say "Yes" or "No" into 1s and 0s. Afterwards, we'll get rid of the old words-based columns. 57 | 58 | ```{julia} 59 | # Convert "Default" and "Student" to numeric values. 60 | data[!, :DefaultNum] = [r.Default == "Yes" ? 1.0 : 0.0 for r in eachrow(data)] 61 | data[!, :StudentNum] = [r.Student == "Yes" ? 1.0 : 0.0 for r in eachrow(data)] 62 | 63 | # Delete the old columns which say "Yes" and "No". 64 | select!(data, Not([:Default, :Student])) 65 | 66 | # Show the first six rows of our edited dataset. 67 | first(data, 6) 68 | ``` 69 | 70 | After we've done that tidying, it's time to split our dataset into training and testing sets, and separate the labels from the data. We separate our data into two halves, `train` and `test`. You can use a higher percentage of splitting (or a lower one) by modifying the `at = 0.05` argument. We have highlighted the use of only a 5% sample to show the power of Bayesian inference with small sample sizes. 71 | 72 | We must rescale our variables so that they are centered around zero by subtracting each column by the mean and dividing it by the standard deviation. Without this step, Turing's sampler will have a hard time finding a place to start searching for parameter estimates. To do this we will leverage `MLDataUtils`, which also lets us effortlessly shuffle our observations and perform a stratified split to get a representative test set. 73 | 74 | ```{julia} 75 | function split_data(df, target; at=0.70) 76 | shuffled = shuffleobs(df) 77 | return trainset, testset = stratifiedobs(row -> row[target], shuffled; p=at) 78 | end 79 | 80 | features = [:StudentNum, :Balance, :Income] 81 | numerics = [:Balance, :Income] 82 | target = :DefaultNum 83 | 84 | trainset, testset = split_data(data, target; at=0.05) 85 | for feature in numerics 86 | μ, σ = rescale!(trainset[!, feature]; obsdim=1) 87 | rescale!(testset[!, feature], μ, σ; obsdim=1) 88 | end 89 | 90 | # Turing requires data in matrix form, not dataframe 91 | train = Matrix(trainset[:, features]) 92 | test = Matrix(testset[:, features]) 93 | train_label = trainset[:, target] 94 | test_label = testset[:, target]; 95 | ``` 96 | 97 | ## Model Declaration 98 | 99 | Finally, we can define our model. 100 | 101 | `logistic_regression` takes four arguments: 102 | 103 | - `x` is our set of independent variables; 104 | - `y` is the element we want to predict; 105 | - `n` is the number of observations we have; and 106 | - `σ` is the standard deviation we want to assume for our priors. 107 | 108 | Within the model, we create four coefficients (`intercept`, `student`, `balance`, and `income`) and assign a prior of normally distributed with means of zero and standard deviations of `σ`. We want to find values of these four coefficients to predict any given `y`. 109 | 110 | The `for` block creates a variable `v` which is the logistic function. We then observe the likelihood of calculating `v` given the actual label, `y[i]`. 
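Written out as equations, the model in the next code block is

$$
\begin{aligned}
\text{intercept}, \text{student}, \text{balance}, \text{income} &\sim \mathcal{N}(0, \sigma^2), \\
v_i &= \operatorname{logistic}\left(\text{intercept} + \text{student} \cdot x_{i,1} + \text{balance} \cdot x_{i,2} + \text{income} \cdot x_{i,3}\right), \\
y_i &\sim \operatorname{Bernoulli}(v_i),
\end{aligned}
$$

where $x_{i,1}$, $x_{i,2}$ and $x_{i,3}$ are the student, balance and income columns for observation $i$, and $\sigma$ is the prior standard deviation passed to the model.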
111 | 112 | ```{julia} 113 | # Bayesian logistic regression (LR) 114 | @model function logistic_regression(x, y, n, σ) 115 | intercept ~ Normal(0, σ) 116 | 117 | student ~ Normal(0, σ) 118 | balance ~ Normal(0, σ) 119 | income ~ Normal(0, σ) 120 | 121 | for i in 1:n 122 | v = logistic(intercept + student * x[i, 1] + balance * x[i, 2] + income * x[i, 3]) 123 | y[i] ~ Bernoulli(v) 124 | end 125 | end; 126 | ``` 127 | 128 | ## Sampling 129 | 130 | Now we can run our sampler. This time we'll use [`NUTS`](https://turinglang.org/stable/docs/library/#Turing.Inference.NUTS) to sample from our posterior. 131 | 132 | ```{julia} 133 | #| output: false 134 | setprogress!(false) 135 | ``` 136 | 137 | ```{julia} 138 | #| output: false 139 | # Retrieve the number of observations. 140 | n, _ = size(train) 141 | 142 | # Sample using NUTS. 143 | m = logistic_regression(train, train_label, n, 1) 144 | chain = sample(m, NUTS(), MCMCThreads(), 1_500, 3) 145 | ``` 146 | 147 | ```{julia} 148 | #| echo: false 149 | chain 150 | ``` 151 | 152 | ::: {.callout-warning collapse="true"} 153 | ## Sampling With Multiple Threads 154 | The `sample()` call above assumes that you have at least `nchains` threads available in your Julia instance. If you do not, the multiple chains 155 | will run sequentially, and you may notice a warning. For more information, see [the Turing documentation on sampling multiple chains.](https://turinglang.org/dev/docs/using-turing/guide/#sampling-multiple-chains) 156 | ::: 157 | 158 | ```{julia} 159 | #| echo: false 160 | let 161 | mean_params = mean(chain) 162 | @assert mean_params[:student, :mean] < 0.1 163 | @assert mean_params[:balance, :mean] > 1 164 | end 165 | ``` 166 | 167 | Since we ran multiple chains, we may as well do a spot check to make sure each chain converges around similar points. 168 | 169 | ```{julia} 170 | plot(chain) 171 | ``` 172 | 173 | ```{julia} 174 | #| echo: false 175 | let 176 | mean_params = mapreduce(hcat, mean(chain; append_chains=false)) do df 177 | return df[:, :mean] 178 | end 179 | for i in (2, 3) 180 | @assert mean_params[:, i] != mean_params[:, 1] 181 | @assert isapprox(mean_params[:, i], mean_params[:, 1]; rtol=5e-2) 182 | end 183 | end 184 | ``` 185 | 186 | Looks good! 187 | 188 | We can also use the `corner` function from MCMCChains to show the distributions of the various parameters of our logistic regression. 189 | 190 | ```{julia} 191 | # The labels to use. 192 | l = [:student, :balance, :income] 193 | 194 | # Use the corner function. Requires StatsPlots and MCMCChains. 195 | corner(chain, l) 196 | ``` 197 | 198 | Fortunately the corner plot appears to demonstrate unimodal distributions for each of our parameters, so it should be straightforward to take the means of each parameter's sampled values to estimate our model to make predictions. 199 | 200 | ## Making Predictions 201 | 202 | How do we test how well the model actually predicts whether someone is likely to default? We need to build a prediction function that takes the `test` object we made earlier and runs it through the average parameter calculated during sampling. 203 | 204 | The `prediction` function below takes a `Matrix` and a `Chain` object. It takes the mean of each parameter's sampled values and re-runs the logistic function using those mean values for every element in the test set. 205 | 206 | ```{julia} 207 | function prediction(x::Matrix, chain, threshold) 208 | # Pull the means from each parameter's sampled values in the chain. 
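    # (This is a "plug-in" prediction: the posterior is summarised by its marginal
    # means. Averaging predictions over the posterior samples, as in the linear
    # regression tutorial, would propagate parameter uncertainty instead.)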
209 | intercept = mean(chain[:intercept]) 210 | student = mean(chain[:student]) 211 | balance = mean(chain[:balance]) 212 | income = mean(chain[:income]) 213 | 214 | # Retrieve the number of rows. 215 | n, _ = size(x) 216 | 217 | # Generate a vector to store our predictions. 218 | v = Vector{Float64}(undef, n) 219 | 220 | # Calculate the logistic function for each element in the test set. 221 | for i in 1:n 222 | num = logistic( 223 | intercept .+ student * x[i, 1] + balance * x[i, 2] + income * x[i, 3] 224 | ) 225 | if num >= threshold 226 | v[i] = 1 227 | else 228 | v[i] = 0 229 | end 230 | end 231 | return v 232 | end; 233 | ``` 234 | 235 | Let's see how we did! We run the test matrix through the prediction function, and compute the [mean squared error](https://en.wikipedia.org/wiki/Mean_squared_error) (MSE) for our prediction. The `threshold` variable sets the sensitivity of the predictions. For example, a threshold of 0.07 will predict a defualt value of 1 for any predicted value greater than 0.07 and no default if it is less than 0.07. 236 | 237 | ```{julia} 238 | # Set the prediction threshold. 239 | threshold = 0.07 240 | 241 | # Make the predictions. 242 | predictions = prediction(test, chain, threshold) 243 | 244 | # Calculate MSE for our test set. 245 | loss = sum((predictions - test_label) .^ 2) / length(test_label) 246 | ``` 247 | 248 | Perhaps more important is to see what percentage of defaults we correctly predicted. The code below simply counts defaults and predictions and presents the results. 249 | 250 | ```{julia} 251 | defaults = sum(test_label) 252 | not_defaults = length(test_label) - defaults 253 | 254 | predicted_defaults = sum(test_label .== predictions .== 1) 255 | predicted_not_defaults = sum(test_label .== predictions .== 0) 256 | 257 | println("Defaults: $defaults 258 | Predictions: $predicted_defaults 259 | Percentage defaults correct $(predicted_defaults/defaults)") 260 | 261 | println("Not defaults: $not_defaults 262 | Predictions: $predicted_not_defaults 263 | Percentage non-defaults correct $(predicted_not_defaults/not_defaults)") 264 | ``` 265 | 266 | ```{julia} 267 | #| echo: false 268 | let 269 | percentage_correct = predicted_defaults / defaults 270 | @assert 0.6 < percentage_correct 271 | end 272 | ``` 273 | 274 | The above shows that with a threshold of 0.07, we correctly predict a respectable portion of the defaults, and correctly identify most non-defaults. This is fairly sensitive to a choice of threshold, and you may wish to experiment with it. 275 | 276 | This tutorial has demonstrated how to use Turing to perform Bayesian logistic regression. 277 | -------------------------------------------------------------------------------- /tutorials/bayesian-neural-networks/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: Bayesian Neural Networks 3 | engine: julia 4 | aliases: 5 | - ../03-bayesian-neural-network/index.html 6 | --- 7 | 8 | ```{julia} 9 | #| echo: false 10 | #| output: false 11 | using Pkg; 12 | Pkg.instantiate(); 13 | ``` 14 | 15 | In this tutorial, we demonstrate how one can implement a Bayesian Neural Network using a combination of Turing and [Lux](https://github.com/LuxDL/Lux.jl), a suite of machine learning tools. We will use Lux to specify the neural network's layers and Turing to implement the probabilistic inference, with the goal of implementing a classification algorithm. 16 | 17 | We will begin with importing the relevant libraries. 
18 | 19 | ```{julia} 20 | using Turing 21 | using FillArrays 22 | using Lux 23 | using Plots 24 | import Mooncake 25 | using Functors 26 | 27 | using LinearAlgebra 28 | using Random 29 | ``` 30 | 31 | Our goal here is to use a Bayesian neural network to classify points in an artificial dataset. 32 | The code below generates data points arranged in a box-like pattern and displays a graph of the dataset we will be working with. 33 | 34 | ```{julia} 35 | # Number of points to generate 36 | N = 80 37 | M = round(Int, N / 4) 38 | rng = Random.default_rng() 39 | Random.seed!(rng, 1234) 40 | 41 | # Generate artificial data 42 | x1s = rand(rng, Float32, M) * 4.5f0; 43 | x2s = rand(rng, Float32, M) * 4.5f0; 44 | xt1s = Array([[x1s[i] + 0.5f0; x2s[i] + 0.5f0] for i in 1:M]) 45 | x1s = rand(rng, Float32, M) * 4.5f0; 46 | x2s = rand(rng, Float32, M) * 4.5f0; 47 | append!(xt1s, Array([[x1s[i] - 5.0f0; x2s[i] - 5.0f0] for i in 1:M])) 48 | 49 | x1s = rand(rng, Float32, M) * 4.5f0; 50 | x2s = rand(rng, Float32, M) * 4.5f0; 51 | xt0s = Array([[x1s[i] + 0.5f0; x2s[i] - 5.0f0] for i in 1:M]) 52 | x1s = rand(rng, Float32, M) * 4.5f0; 53 | x2s = rand(rng, Float32, M) * 4.5f0; 54 | append!(xt0s, Array([[x1s[i] - 5.0f0; x2s[i] + 0.5f0] for i in 1:M])) 55 | 56 | # Store all the data for later 57 | xs = [xt1s; xt0s] 58 | ts = [ones(2 * M); zeros(2 * M)] 59 | 60 | # Plot data points. 61 | function plot_data() 62 | x1 = map(e -> e[1], xt1s) 63 | y1 = map(e -> e[2], xt1s) 64 | x2 = map(e -> e[1], xt0s) 65 | y2 = map(e -> e[2], xt0s) 66 | 67 | Plots.scatter(x1, y1; color="red", clim=(0, 1)) 68 | return Plots.scatter!(x2, y2; color="blue", clim=(0, 1)) 69 | end 70 | 71 | plot_data() 72 | ``` 73 | 74 | ## Building a Neural Network 75 | 76 | The next step is to define a [feedforward neural network](https://en.wikipedia.org/wiki/Feedforward_neural_network) where we express our parameters as distributions, and not single points as with traditional neural networks. 77 | For this we will use `Dense` to define liner layers and compose them via `Chain`, both are neural network primitives from Lux. 78 | The network `nn_initial` we created has two hidden layers with `tanh` activations and one output layer with sigmoid (`σ`) activation, as shown below. 
79 | 80 | ```{dot} 81 | //| echo: false 82 | graph G { 83 | rankdir=LR; 84 | nodesep=0.8; 85 | ranksep=0.8; 86 | node [shape=circle, fixedsize=true, width=0.8, style="filled", color=black, fillcolor="white", fontsize=12]; 87 | 88 | // Input layer 89 | subgraph cluster_input { 90 | node [label=""]; 91 | input1; 92 | input2; 93 | style="rounded" 94 | } 95 | 96 | // Hidden layers 97 | subgraph cluster_hidden1 { 98 | node [label=""]; 99 | hidden11; 100 | hidden12; 101 | hidden13; 102 | style="rounded" 103 | } 104 | 105 | subgraph cluster_hidden2 { 106 | node [label=""]; 107 | hidden21; 108 | hidden22; 109 | style="rounded" 110 | } 111 | 112 | // Output layer 113 | subgraph cluster_output { 114 | output1 [label=""]; 115 | style="rounded" 116 | } 117 | 118 | // Connections from input to hidden layer 1 119 | input1 -- hidden11; 120 | input1 -- hidden12; 121 | input1 -- hidden13; 122 | input2 -- hidden11; 123 | input2 -- hidden12; 124 | input2 -- hidden13; 125 | 126 | // Connections from hidden layer 1 to hidden layer 2 127 | hidden11 -- hidden21; 128 | hidden11 -- hidden22; 129 | hidden12 -- hidden21; 130 | hidden12 -- hidden22; 131 | hidden13 -- hidden21; 132 | hidden13 -- hidden22; 133 | 134 | // Connections from hidden layer 2 to output 135 | hidden21 -- output1; 136 | hidden22 -- output1; 137 | 138 | // Labels 139 | labelloc="b"; 140 | fontsize=17; 141 | label="Input layer Hidden layers Output layer"; 142 | } 143 | ``` 144 | 145 | The `nn_initial` is an instance that acts as a function and can take data as inputs and output predictions. 146 | We will define distributions on the neural network parameters. 147 | 148 | ```{julia} 149 | # Construct a neural network using Lux 150 | nn_initial = Chain(Dense(2 => 3, tanh), Dense(3 => 2, tanh), Dense(2 => 1, σ)) 151 | 152 | # Initialize the model weights and state 153 | ps, st = Lux.setup(rng, nn_initial) 154 | 155 | Lux.parameterlength(nn_initial) # number of parameters in NN 156 | ``` 157 | 158 | The probabilistic model specification below creates a `parameters` variable, which has IID normal variables. The `parameters` vector represents all parameters of our neural net (weights and biases). 159 | 160 | ```{julia} 161 | # Create a regularization term and a Gaussian prior variance term. 162 | alpha = 0.09 163 | sigma = sqrt(1.0 / alpha) 164 | ``` 165 | 166 | We also define a function to construct a named tuple from a vector of sampled parameters. 167 | (We could use [`ComponentArrays`](https://github.com/jonniedie/ComponentArrays.jl) here and broadcast to avoid doing this, but this way avoids introducing an extra dependency.) 168 | 169 | ```{julia} 170 | function vector_to_parameters(ps_new::AbstractVector, ps::NamedTuple) 171 | @assert length(ps_new) == Lux.parameterlength(ps) 172 | i = 1 173 | function get_ps(x) 174 | z = reshape(view(ps_new, i:(i + length(x) - 1)), size(x)) 175 | i += length(x) 176 | return z 177 | end 178 | return fmap(get_ps, ps) 179 | end 180 | ``` 181 | 182 | To interface with external libraries it is often desirable to use the [`StatefulLuxLayer`](https://lux.csail.mit.edu/stable/api/Lux/utilities#Lux.StatefulLuxLayer) to automatically handle the neural network states. 183 | 184 | ```{julia} 185 | const nn = StatefulLuxLayer{true}(nn_initial, nothing, st) 186 | 187 | # Specify the probabilistic model. 
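# (The prior below places an independent zero-mean Gaussian with standard
# deviation `sigma` on every weight and bias, i.e. variance sigma^2 = 1/alpha;
# the diagonal MvNormal is just a vectorised way of writing this.)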
188 | @model function bayes_nn(xs, ts; sigma = sigma, ps = ps, nn = nn) 189 | # Sample the parameters 190 | nparameters = Lux.parameterlength(nn_initial) 191 | parameters ~ MvNormal(zeros(nparameters), Diagonal(abs2.(sigma .* ones(nparameters)))) 192 | 193 | # Forward NN to make predictions 194 | preds = Lux.apply(nn, xs, f32(vector_to_parameters(parameters, ps))) 195 | 196 | # Observe each prediction. 197 | for i in eachindex(ts) 198 | ts[i] ~ Bernoulli(preds[i]) 199 | end 200 | end 201 | ``` 202 | 203 | Inference can now be performed by calling `sample`. We use the `NUTS` Hamiltonian Monte Carlo sampler here. 204 | 205 | ```{julia} 206 | #| output: false 207 | setprogress!(false) 208 | ``` 209 | 210 | ```{julia} 211 | # Perform inference. 212 | n_iters = 2_000 213 | ch = sample(bayes_nn(reduce(hcat, xs), ts), NUTS(; adtype=AutoMooncake(; config=nothing)), n_iters); 214 | ``` 215 | 216 | Now we extract the parameter samples from the sampled chain as `θ` (this is of size `5000 x 20` where `5000` is the number of iterations and `20` is the number of parameters). 217 | We'll use these primarily to determine how good our model's classifier is. 218 | 219 | ```{julia} 220 | # Extract all weight and bias parameters. 221 | θ = MCMCChains.group(ch, :parameters).value; 222 | ``` 223 | 224 | ## Prediction Visualization 225 | 226 | We can use [MAP estimation](https://en.wikipedia.org/wiki/Maximum_a_posteriori_estimation) to classify our population by using the set of weights that provided the highest log posterior. 227 | 228 | ```{julia} 229 | # A helper to run the nn through data `x` using parameters `θ` 230 | nn_forward(x, θ) = nn(x, vector_to_parameters(θ, ps)) 231 | 232 | # Plot the data we have. 233 | fig = plot_data() 234 | 235 | # Find the index that provided the highest log posterior in the chain. 236 | _, i = findmax(ch[:lp]) 237 | 238 | # Extract the max row value from i. 239 | i = i.I[1] 240 | 241 | # Plot the posterior distribution with a contour plot 242 | x1_range = collect(range(-6; stop=6, length=25)) 243 | x2_range = collect(range(-6; stop=6, length=25)) 244 | Z = [nn_forward([x1, x2], θ[i, :])[1] for x1 in x1_range, x2 in x2_range] 245 | contour!(x1_range, x2_range, Z; linewidth=3, colormap=:seaborn_bright) 246 | fig 247 | ``` 248 | 249 | The contour plot above shows that the MAP method is not too bad at classifying our data. 250 | 251 | Now we can visualize our predictions. 252 | 253 | $$ 254 | p(\tilde{x} | X, \alpha) = \int_{\theta} p(\tilde{x} | \theta) p(\theta | X, \alpha) \approx \sum_{\theta \sim p(\theta | X, \alpha)}f_{\theta}(\tilde{x}) 255 | $$ 256 | 257 | The `nn_predict` function takes the average predicted value from a network parameterized by weights drawn from the MCMC chain. 258 | 259 | ```{julia} 260 | # Return the average predicted value across 261 | # multiple weights. 262 | function nn_predict(x, θ, num) 263 | num = min(num, size(θ, 1)) # make sure num does not exceed the number of samples 264 | return mean([first(nn_forward(x, view(θ, i, :))) for i in 1:10:num]) 265 | end 266 | ``` 267 | 268 | Next, we use the `nn_predict` function to predict the value at a sample of points where the `x1` and `x2` coordinates range between -6 and 6. As we can see below, we still have a satisfactory fit to our data, and more importantly, we can also see where the neural network is uncertain about its predictions much easier---those regions between cluster boundaries. 269 | 270 | ```{julia} 271 | # Plot the average prediction. 
272 | fig = plot_data() 273 | 274 | n_end = 1500 275 | x1_range = collect(range(-6; stop=6, length=25)) 276 | x2_range = collect(range(-6; stop=6, length=25)) 277 | Z = [nn_predict([x1, x2], θ, n_end)[1] for x1 in x1_range, x2 in x2_range] 278 | contour!(x1_range, x2_range, Z; linewidth=3, colormap=:seaborn_bright) 279 | fig 280 | ``` 281 | 282 | Suppose we are interested in how the predictive power of our Bayesian neural network evolved between samples. In that case, the following graph displays an animation of the contour plot generated from the network weights in samples 1 to 1,000. 283 | 284 | ```{julia} 285 | # Number of iterations to plot. 286 | n_end = 500 287 | 288 | anim = @gif for i in 1:n_end 289 | plot_data() 290 | Z = [nn_forward([x1, x2], θ[i, :])[1] for x1 in x1_range, x2 in x2_range] 291 | contour!(x1_range, x2_range, Z; title="Iteration $i", clim=(0, 1)) 292 | end every 5 293 | ``` 294 | 295 | This has been an introduction to the applications of Turing and Lux in defining Bayesian neural networks. 296 | -------------------------------------------------------------------------------- /tutorials/bayesian-poisson-regression/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: Bayesian Poisson Regression 3 | engine: julia 4 | aliases: 5 | - ../07-poisson-regression/index.html 6 | --- 7 | 8 | ```{julia} 9 | #| echo: false 10 | #| output: false 11 | using Pkg; 12 | Pkg.instantiate(); 13 | ``` 14 | 15 | This notebook is ported from the [example notebook](https://www.pymc.io/projects/examples/en/latest/generalized_linear_models/GLM-poisson-regression.html) of PyMC3 on Poisson Regression. 16 | 17 | [Poisson Regression](https://en.wikipedia.org/wiki/Poisson_regression) is a technique commonly used to model count data. 18 | Some of the applications include predicting the number of people defaulting on their loans or the number of cars running on a highway on a given day. 19 | This example describes a method to implement the Bayesian version of this technique using Turing. 20 | 21 | We will generate the dataset that we will be working on which describes the relationship between number of times a person sneezes during the day with his alcohol consumption and medicinal intake. 22 | 23 | We start by importing the required libraries. 24 | 25 | ```{julia} 26 | #Import Turing, Distributions and DataFrames 27 | using Turing, Distributions, DataFrames, Distributed 28 | 29 | # Import MCMCChain, Plots, and StatsPlots for visualizations and diagnostics. 30 | using MCMCChains, Plots, StatsPlots 31 | 32 | # Set a seed for reproducibility. 33 | using Random 34 | Random.seed!(12); 35 | ``` 36 | 37 | # Generating data 38 | 39 | We start off by creating a toy dataset. We take the case of a person who takes medicine to prevent excessive sneezing. Alcohol consumption increases the rate of sneezing for that person. Thus, the two factors affecting the number of sneezes in a given day are alcohol consumption and whether the person has taken his medicine. Both these variable are taken as boolean valued while the number of sneezes will be a count valued variable. We also take into consideration that the interaction between the two boolean variables will affect the number of sneezes 40 | 41 | 5 random rows are printed from the generated data to get a gist of the data generated. 
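Concretely, the generative process used in the next code block draws each sneeze count $y$ as

$$
y \sim \operatorname{Poisson}(\theta_{a,m}), \qquad \theta_{0,0} = 1, \quad \theta_{1,0} = 3, \quad \theta_{0,1} = 6, \quad \theta_{1,1} = 36,
$$

where $a$ indicates alcohol consumption, $m$ indicates that no medicine was taken, and $\theta_{a,m}$ is simply shorthand introduced here for the four rate constants defined in the code.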
42 | 43 | ```{julia} 44 | theta_noalcohol_meds = 1 # no alcohol, took medicine 45 | theta_alcohol_meds = 3 # alcohol, took medicine 46 | theta_noalcohol_nomeds = 6 # no alcohol, no medicine 47 | theta_alcohol_nomeds = 36 # alcohol, no medicine 48 | 49 | # no of samples for each of the above cases 50 | q = 100 51 | 52 | #Generate data from different Poisson distributions 53 | noalcohol_meds = Poisson(theta_noalcohol_meds) 54 | alcohol_meds = Poisson(theta_alcohol_meds) 55 | noalcohol_nomeds = Poisson(theta_noalcohol_nomeds) 56 | alcohol_nomeds = Poisson(theta_alcohol_nomeds) 57 | 58 | nsneeze_data = vcat( 59 | rand(noalcohol_meds, q), 60 | rand(alcohol_meds, q), 61 | rand(noalcohol_nomeds, q), 62 | rand(alcohol_nomeds, q), 63 | ) 64 | alcohol_data = vcat(zeros(q), ones(q), zeros(q), ones(q)) 65 | meds_data = vcat(zeros(q), zeros(q), ones(q), ones(q)) 66 | 67 | df = DataFrame(; 68 | nsneeze=nsneeze_data, 69 | alcohol_taken=alcohol_data, 70 | nomeds_taken=meds_data, 71 | product_alcohol_meds=meds_data .* alcohol_data, 72 | ) 73 | df[sample(1:nrow(df), 5; replace=false), :] 74 | ``` 75 | 76 | # Visualisation of the dataset 77 | 78 | We plot the distribution of the number of sneezes for the 4 different cases taken above. As expected, the person sneezes the most when he has taken alcohol and not taken his medicine. He sneezes the least when he doesn't consume alcohol and takes his medicine. 79 | 80 | ```{julia} 81 | # Data Plotting 82 | 83 | p1 = Plots.histogram( 84 | df[(df[:, :alcohol_taken] .== 0) .& (df[:, :nomeds_taken] .== 0), 1]; 85 | title="no_alcohol+meds", 86 | ) 87 | p2 = Plots.histogram( 88 | (df[(df[:, :alcohol_taken] .== 1) .& (df[:, :nomeds_taken] .== 0), 1]); 89 | title="alcohol+meds", 90 | ) 91 | p3 = Plots.histogram( 92 | (df[(df[:, :alcohol_taken] .== 0) .& (df[:, :nomeds_taken] .== 1), 1]); 93 | title="no_alcohol+no_meds", 94 | ) 95 | p4 = Plots.histogram( 96 | (df[(df[:, :alcohol_taken] .== 1) .& (df[:, :nomeds_taken] .== 1), 1]); 97 | title="alcohol+no_meds", 98 | ) 99 | plot(p1, p2, p3, p4; layout=(2, 2), legend=false) 100 | ``` 101 | 102 | We must convert our `DataFrame` data into the `Matrix` form as the manipulations that we are about are designed to work with `Matrix` data. We also separate the features from the labels which will be later used by the Turing sampler to generate samples from the posterior. 103 | 104 | ```{julia} 105 | # Convert the DataFrame object to matrices. 106 | data = Matrix(df[:, [:alcohol_taken, :nomeds_taken, :product_alcohol_meds]]) 107 | data_labels = df[:, :nsneeze] 108 | data 109 | ``` 110 | 111 | We must recenter our data about 0 to help the Turing sampler in initialising the parameter estimates. So, normalising the data in each column by subtracting the mean and dividing by the standard deviation: 112 | 113 | ```{julia} 114 | # Rescale our matrices. 115 | data = (data .- mean(data; dims=1)) ./ std(data; dims=1) 116 | ``` 117 | 118 | # Declaring the Model: Poisson Regression 119 | 120 | Our model, `poisson_regression` takes four arguments: 121 | 122 | - `x` is our set of independent variables; 123 | - `y` is the element we want to predict; 124 | - `n` is the number of observations we have; and 125 | - `σ²` is the standard deviation we want to assume for our priors. 126 | 127 | Within the model, we create four coefficients (`b0`, `b1`, `b2`, and `b3`) and assign a prior of normally distributed with means of zero and standard deviations of `σ²`. We want to find values of these four coefficients to predict any given `y`. 
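In equation form, the likelihood used in the model below is

$$
y_i \sim \operatorname{Poisson}(\lambda_i), \qquad \log \lambda_i = \theta_i = b_0 + b_1 x_{i,1} + b_2 x_{i,2} + b_3 x_{i,3},
$$

where $x_{i,1}$, $x_{i,2}$ and $x_{i,3}$ are the (standardised) alcohol, no-medicine and interaction columns, and $\lambda_i$ is notation introduced here for the expected number of sneezes $\exp(\theta_i)$.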
128 | 129 | Intuitively, we can think of the coefficients as: 130 | 131 | - `b1` is the coefficient which represents the effect of taking alcohol on the number of sneezes; 132 | - `b2` is the coefficient which represents the effect of taking in no medicines on the number of sneezes; 133 | - `b3` is the coefficient which represents the effect of interaction between taking alcohol and no medicine on the number of sneezes; 134 | 135 | The `for` block creates a variable `theta` which is the weighted combination of the input features. We have defined the priors on these weights above. We then observe the likelihood of calculating `theta` given the actual label, `y[i]`. 136 | 137 | ```{julia} 138 | # Bayesian poisson regression (LR) 139 | @model function poisson_regression(x, y, n, σ²) 140 | b0 ~ Normal(0, σ²) 141 | b1 ~ Normal(0, σ²) 142 | b2 ~ Normal(0, σ²) 143 | b3 ~ Normal(0, σ²) 144 | for i in 1:n 145 | theta = b0 + b1 * x[i, 1] + b2 * x[i, 2] + b3 * x[i, 3] 146 | y[i] ~ Poisson(exp(theta)) 147 | end 148 | end; 149 | ``` 150 | 151 | # Sampling from the posterior 152 | 153 | We use the `NUTS` sampler to sample values from the posterior. We run multiple chains using the `MCMCThreads()` function to nullify the effect of a problematic chain. We then use the Gelman, Rubin, and Brooks Diagnostic to check the convergence of these multiple chains. 154 | 155 | ```{julia} 156 | #| output: false 157 | # Retrieve the number of observations. 158 | n, _ = size(data) 159 | 160 | # Sample using NUTS. 161 | 162 | num_chains = 4 163 | m = poisson_regression(data, data_labels, n, 10) 164 | chain = sample(m, NUTS(), MCMCThreads(), 2_500, num_chains; discard_adapt=false, progress=false) 165 | ``` 166 | 167 | ```{julia} 168 | #| echo: false 169 | chain 170 | ``` 171 | 172 | ::: {.callout-warning collapse="true"} 173 | ## Sampling With Multiple Threads 174 | The `sample()` call above assumes that you have at least `nchains` threads available in your Julia instance. If you do not, the multiple chains 175 | will run sequentially, and you may notice a warning. For more information, see [the Turing documentation on sampling multiple chains.](https://turinglang.org/dev/docs/using-turing/guide/#sampling-multiple-chains) 176 | ::: 177 | 178 | # Viewing the Diagnostics 179 | 180 | We use the Gelman, Rubin, and Brooks Diagnostic to check whether our chains have converged. Note that we require multiple chains to use this diagnostic which analyses the difference between these multiple chains. 181 | 182 | We expect the chains to have converged. This is because we have taken sufficient number of iterations (1500) for the NUTS sampler. However, in case the test fails, then we will have to take a larger number of iterations, resulting in longer computation time. 183 | 184 | ```{julia} 185 | gelmandiag(chain) 186 | ``` 187 | 188 | From the above diagnostic, we can conclude that the chains have converged because the PSRF values of the coefficients are close to 1. 189 | 190 | So, we have obtained the posterior distributions of the parameters. We transform the coefficients and recover theta values by taking the exponent of the meaned values of the coefficients `b0`, `b1`, `b2` and `b3`. We take the exponent of the means to get a better comparison of the relative values of the coefficients. We then compare this with the intuitive meaning that was described earlier. 
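Because the model is linear on the log scale, exponentiation turns additive effects into multiplicative ones:

$$
\exp(\theta_i) = \exp(b_0) \, \exp(b_1 x_{i,1}) \, \exp(b_2 x_{i,2}) \, \exp(b_3 x_{i,3}),
$$

so $\exp(b_j)$ can be read as the multiplicative factor by which the expected sneeze count changes per unit increase in the corresponding (standardised) predictor.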
191 | 192 | ```{julia} 193 | # Taking the first chain 194 | c1 = chain[:, :, 1] 195 | 196 | # Calculating the exponentiated means 197 | b0_exp = exp(mean(c1[:b0])) 198 | b1_exp = exp(mean(c1[:b1])) 199 | b2_exp = exp(mean(c1[:b2])) 200 | b3_exp = exp(mean(c1[:b3])) 201 | 202 | print("The exponent of the meaned values of the weights (or coefficients are): \n") 203 | println("b0: ", b0_exp) 204 | println("b1: ", b1_exp) 205 | println("b2: ", b2_exp) 206 | println("b3: ", b3_exp) 207 | print("The posterior distributions obtained after sampling can be visualised as :\n") 208 | ``` 209 | 210 | Visualising the posterior by plotting it: 211 | 212 | ```{julia} 213 | plot(chain) 214 | ``` 215 | 216 | # Interpreting the Obtained Mean Values 217 | 218 | The exponentiated mean of the coefficient `b1` is roughly half of that of `b2`. This makes sense because in the data that we generated, the number of sneezes was more sensitive to the medicinal intake as compared to the alcohol consumption. We also get a weaker dependence on the interaction between the alcohol consumption and the medicinal intake as can be seen from the value of `b3`. 219 | 220 | # Removing the Warmup Samples 221 | 222 | As can be seen from the plots above, the parameters converge to their final distributions after a few iterations. 223 | The initial values during the warmup phase increase the standard deviations of the parameters and are not required after we get the desired distributions. 224 | Thus, we remove these warmup values and once again view the diagnostics. 225 | To remove these warmup values, we take all values except the first 200. 226 | This is because we set the second parameter of the NUTS sampler (which is the number of adaptations) to be equal to 200. 227 | 228 | ```{julia} 229 | chains_new = chain[201:end, :, :] 230 | ``` 231 | 232 | ```{julia} 233 | plot(chains_new) 234 | ``` 235 | 236 | As can be seen from the numeric values and the plots above, the standard deviation values have decreased and all the plotted values are from the estimated posteriors. The exponentiated mean values, with the warmup samples removed, have not changed by much and they are still in accordance with their intuitive meanings as described earlier. 237 | -------------------------------------------------------------------------------- /tutorials/coin-flipping/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Introduction: Coin Flipping" 3 | engine: julia 4 | aliases: 5 | - ../00-introduction/index.html 6 | - ../00-introduction/ 7 | --- 8 | 9 | ```{julia} 10 | #| echo: false 11 | #| output: false 12 | using Pkg; 13 | Pkg.instantiate(); 14 | ``` 15 | 16 | This is the first of a series of guided tutorials on the Turing language. 17 | In this tutorial, we will use Bayesian inference to estimate the probability that a coin flip will result in heads, given a series of observations. 18 | 19 | ### Setup 20 | 21 | First, let us load some packages that we need to simulate a coin flip: 22 | 23 | ```{julia} 24 | using Distributions 25 | 26 | using Random 27 | Random.seed!(12); # Set seed for reproducibility 28 | ``` 29 | 30 | and to visualize our results. 31 | 32 | ```{julia} 33 | using StatsPlots 34 | ``` 35 | 36 | Note that Turing is not loaded here — we do not use it in this example. 37 | Next, we configure the data generating model. 
Let us set the true probability that a coin flip turns up heads 38 | 39 | ```{julia} 40 | p_true = 0.5; 41 | ``` 42 | 43 | and set the number of coin flips we will show our model. 44 | 45 | ```{julia} 46 | N = 100; 47 | ``` 48 | 49 | We simulate `N` coin flips by drawing N random samples from the Bernoulli distribution with success probability `p_true`. The draws are collected in a variable called `data`: 50 | 51 | ```{julia} 52 | data = rand(Bernoulli(p_true), N); 53 | ``` 54 | 55 | Here are the first five coin flips: 56 | 57 | ```{julia} 58 | data[1:5] 59 | ``` 60 | 61 | 62 | ### Coin Flipping Without Turing 63 | 64 | The following example illustrates the effect of updating our beliefs with every piece of new evidence we observe. 65 | 66 | Assume that we are unsure about the probability of heads in a coin flip. To get an intuitive understanding of what "updating our beliefs" is, we will visualize the probability of heads in a coin flip after each observed evidence. 67 | 68 | We begin by specifying a prior belief about the distribution of heads and tails in a coin toss. Here we choose a [Beta](https://en.wikipedia.org/wiki/Beta_distribution) distribution as prior distribution for the probability of heads. Before any coin flip is observed, we assume a uniform distribution $\operatorname{U}(0, 1) = \operatorname{Beta}(1, 1)$ of the probability of heads. I.e., every probability is equally likely initially. 69 | 70 | ```{julia} 71 | prior_belief = Beta(1, 1); 72 | ``` 73 | 74 | With our priors set and our data at hand, we can perform Bayesian inference. 75 | 76 | This is a fairly simple process. We expose one additional coin flip to our model every iteration, such that the first run only sees the first coin flip, while the last iteration sees all the coin flips. In each iteration we update our belief to an updated version of the original Beta distribution that accounts for the new proportion of heads and tails. The update is particularly simple since our prior distribution is a [conjugate prior](https://en.wikipedia.org/wiki/Conjugate_prior). Note that a closed-form expression for the posterior (implemented in the `updated_belief` expression below) is not accessible in general and usually does not exist for more interesting models. 77 | 78 | ```{julia} 79 | function updated_belief(prior_belief::Beta, data::AbstractArray{Bool}) 80 | # Count the number of heads and tails. 81 | heads = sum(data) 82 | tails = length(data) - heads 83 | 84 | # Update our prior belief in closed form (this is possible because we use a conjugate prior). 85 | return Beta(prior_belief.α + heads, prior_belief.β + tails) 86 | end 87 | 88 | # Show updated belief for increasing number of observations 89 | @gif for n in 0:N 90 | plot( 91 | updated_belief(prior_belief, data[1:n]); 92 | size=(500, 250), 93 | title="Updated belief after $n observations", 94 | xlabel="probability of heads", 95 | ylabel="", 96 | legend=nothing, 97 | xlim=(0, 1), 98 | fill=0, 99 | α=0.3, 100 | w=3, 101 | ) 102 | vline!([p_true]) 103 | end 104 | ``` 105 | 106 | The animation above shows that with increasing evidence our belief about the probability of heads in a coin flip slowly adjusts towards the true value. 107 | The orange line in the animation represents the true probability of seeing heads on a single coin flip, while the mode of the distribution shows what the model believes the probability of a heads is given the evidence it has seen. 
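Concretely, the closed-form update implemented in `updated_belief` above is

$$
p \mid y_1, \ldots, y_n \sim \operatorname{Beta}\left(\alpha + \sum_{i=1}^n y_i, \; \beta + n - \sum_{i=1}^n y_i\right),
$$

where $y_i = 1$ denotes heads and $n$ is the number of coin flips observed so far.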
108 | 109 | For the mathematically inclined, the $\operatorname{Beta}$ distribution is updated by adding each coin flip to the parameters $\alpha$ and $\beta$ of the distribution. 110 | Initially, the parameters are defined as $\alpha = 1$ and $\beta = 1$. 111 | Over time, with more and more coin flips, $\alpha$ and $\beta$ will be approximately equal to each other as we are equally likely to flip a heads or a tails. 112 | 113 | The mean of the $\operatorname{Beta}(\alpha, \beta)$ distribution is 114 | 115 | $$\operatorname{E}[X] = \dfrac{\alpha}{\alpha+\beta}.$$ 116 | 117 | This implies that the plot of the distribution will become centered around 0.5 for a large enough number of coin flips, as we expect $\alpha \approx \beta$. 118 | 119 | The variance of the $\operatorname{Beta}(\alpha, \beta)$ distribution is 120 | 121 | $$\operatorname{var}[X] = \dfrac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}.$$ 122 | 123 | Thus the variance of the distribution will approach 0 with more and more samples, as the denominator will grow faster than will the numerator. 124 | More samples means less variance. 125 | This implies that the distribution will reflect less uncertainty about the probability of receiving a heads and the plot will become more tightly centered around 0.5 for a large enough number of coin flips. 126 | 127 | ### Coin Flipping With Turing 128 | 129 | We now move away from the closed-form expression above. 130 | We use **Turing** to specify the same model and to approximate the posterior distribution with samples. 131 | To do so, we first need to load `Turing`. 132 | 133 | ```{julia} 134 | using Turing 135 | ``` 136 | 137 | Additionally, we load `MCMCChains`, a library for analyzing and visualizing the samples with which we approximate the posterior distribution. 138 | 139 | ```{julia} 140 | using MCMCChains 141 | ``` 142 | 143 | First, we define the coin-flip model using Turing. 144 | 145 | ```{julia} 146 | # Unconditioned coinflip model with `N` observations. 147 | @model function coinflip(; N::Int) 148 | # Our prior belief about the probability of heads in a coin toss. 149 | p ~ Beta(1, 1) 150 | 151 | # Heads or tails of a coin are drawn from `N` independent and identically 152 | # distributed Bernoulli distributions with success rate `p`. 153 | y ~ filldist(Bernoulli(p), N) 154 | 155 | return y 156 | end; 157 | ``` 158 | 159 | In the Turing model the prior distribution of the variable `p`, the probability of heads in a coin toss, and the distribution of the observations `y` are specified on the right-hand side of the `~` expressions. 160 | The `@model` macro modifies the body of the Julia function `coinflip` and, e.g., replaces the `~` statements with internal function calls that are used for sampling. 161 | 162 | Here we defined a model that is not conditioned on any specific observations as this allows us to easily obtain samples of both `p` and `y` with 163 | 164 | ```{julia} 165 | rand(coinflip(; N)) 166 | ``` 167 | 168 | The model can be conditioned on some observations with `|`. 169 | See the [documentation of the `condition` syntax](https://turinglang.github.io/DynamicPPL.jl/stable/api/#Condition-and-decondition) in `DynamicPPL.jl` for more details. 170 | In the conditioned `model` the observations `y` are fixed to `data`. 
171 | 172 | ```{julia} 173 | coinflip(y::AbstractVector{<:Real}) = coinflip(; N=length(y)) | (; y) 174 | 175 | model = coinflip(data); 176 | ``` 177 | 178 | After defining the model, we can approximate the posterior distribution by drawing samples from the distribution. 179 | In this example, we use a [Hamiltonian Monte Carlo](https://en.wikipedia.org/wiki/Hamiltonian_Monte_Carlo) sampler to draw these samples. 180 | Other tutorials give more information on the samplers available in Turing and discuss their use for different models. 181 | 182 | ```{julia} 183 | sampler = NUTS(); 184 | ``` 185 | 186 | We approximate the posterior distribution with 1000 samples: 187 | 188 | ```{julia} 189 | chain = sample(model, sampler, 2_000, progress=false); 190 | ``` 191 | 192 | The `sample` function and common keyword arguments are explained more extensively in the documentation of [AbstractMCMC.jl](https://turinglang.github.io/AbstractMCMC.jl/dev/api/). 193 | 194 | After finishing the sampling process, we can visually compare the closed-form posterior distribution with the approximation obtained with Turing. 195 | 196 | ```{julia} 197 | histogram(chain) 198 | ``` 199 | 200 | Now we can build our plot: 201 | 202 | ```{julia} 203 | #| echo: false 204 | @assert isapprox(mean(chain, :p), 0.5; atol=0.1) "Estimated mean of parameter p: $(mean(chain, :p)) - not in [0.4, 0.6]!" 205 | ``` 206 | 207 | ```{julia} 208 | # Visualize a blue density plot of the approximate posterior distribution using HMC (see Chain 1 in the legend). 209 | density(chain; xlim=(0, 1), legend=:best, w=2, c=:blue) 210 | 211 | # Visualize a green density plot of the posterior distribution in closed-form. 212 | plot!( 213 | 0:0.01:1, 214 | pdf.(updated_belief(prior_belief, data), 0:0.01:1); 215 | xlabel="probability of heads", 216 | ylabel="", 217 | title="", 218 | xlim=(0, 1), 219 | label="Closed-form", 220 | fill=0, 221 | α=0.3, 222 | w=3, 223 | c=:lightgreen, 224 | ) 225 | 226 | # Visualize the true probability of heads in red. 227 | vline!([p_true]; label="True probability", c=:red) 228 | ``` 229 | 230 | As we can see, the samples obtained with Turing closely approximate the true posterior distribution. 231 | Hopefully this tutorial has provided an easy-to-follow, yet informative introduction to Turing's simpler applications. 232 | More advanced usage is demonstrated in other tutorials. 233 | -------------------------------------------------------------------------------- /tutorials/gaussian-process-latent-variable-models/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: Gaussian Process Latent Variable Models 3 | engine: julia 4 | aliases: 5 | - ../12-gplvm/index.html 6 | --- 7 | 8 | ```{julia} 9 | #| echo: false 10 | #| output: false 11 | using Pkg; 12 | Pkg.instantiate(); 13 | ``` 14 | 15 | In a previous tutorial, we have discussed latent variable models, in particular probabilistic principal component analysis (pPCA). 16 | Here, we show how we can extend the mapping provided by pPCA to non-linear mappings between input and output. 17 | For more details about the Gaussian Process Latent Variable Model (GPLVM), 18 | we refer the reader to the [original publication](https://jmlr.org/papers/v6/lawrence05a.html) and a [further extension](http://proceedings.mlr.press/v9/titsias10a/titsias10a.pdf). 19 | 20 | In short, the GPVLM is a dimensionality reduction technique that allows us to embed a high-dimensional dataset in a lower-dimensional embedding. 
21 | Importantly, it provides the advantage that the linear mappings from the embedded space can be non-linearised through the use of Gaussian Processes. 22 | 23 | ### Let's start by loading some dependencies. 24 | 25 | ```{julia} 26 | #| eval: false 27 | using Turing 28 | using AbstractGPs 29 | using FillArrays 30 | using LaTeXStrings 31 | using Plots 32 | using RDatasets 33 | using ReverseDiff 34 | using StatsBase 35 | 36 | using LinearAlgebra 37 | using Random 38 | 39 | Random.seed!(1789); 40 | ``` 41 | 42 | We demonstrate the GPLVM with a very small dataset: [Fisher's Iris data set](https://en.wikipedia.org/wiki/Iris_flower_data_set). 43 | This is mostly for reasons of run time, so the tutorial can be run quickly. 44 | As you will see, one of the major drawbacks of using GPs is their speed, 45 | although this is an active area of research. 46 | We will briefly touch on some ways to speed things up at the end of this tutorial. 47 | We transform the original data with non-linear operations in order to demonstrate the power of GPs to work on non-linear relationships, while keeping the problem reasonably small. 48 | 49 | ```{julia} 50 | #| eval: false 51 | data = dataset("datasets", "iris") 52 | species = data[!, "Species"] 53 | index = shuffle(1:150) 54 | # we extract the four measured quantities, 55 | # so the dimension of the data is only d=4 for this toy example 56 | dat = Matrix(data[index, 1:4]) 57 | labels = data[index, "Species"] 58 | 59 | # non-linearize data to demonstrate ability of GPs to deal with non-linearity 60 | dat[:, 1] = 0.5 * dat[:, 1] .^ 2 + 0.1 * dat[:, 1] .^ 3 61 | dat[:, 2] = dat[:, 2] .^ 3 + 0.2 * dat[:, 2] .^ 4 62 | dat[:, 3] = 0.1 * exp.(dat[:, 3]) - 0.2 * dat[:, 3] .^ 2 63 | dat[:, 4] = 0.5 * log.(dat[:, 4]) .^ 2 + 0.01 * dat[:, 3] .^ 5 64 | 65 | # normalize data 66 | dt = fit(ZScoreTransform, dat; dims=1); 67 | StatsBase.transform!(dt, dat); 68 | ``` 69 | 70 | We will start out by demonstrating the basic similarity between pPCA (see the tutorial on this topic) and the GPLVM model. 71 | Indeed, pPCA is basically equivalent to running the GPLVM model with an automatic relevance determination (ARD) linear kernel. 72 | 73 | First, we re-introduce the pPCA model (see the tutorial on pPCA for details) 74 | 75 | ```{julia} 76 | #| eval: false 77 | @model function pPCA(x) 78 | # Dimensionality of the problem. 79 | N, D = size(x) 80 | # latent variable z 81 | z ~ filldist(Normal(), D, N) 82 | # weights/loadings W 83 | w ~ filldist(Normal(), D, D) 84 | mu = (w * z)' 85 | for d in 1:D 86 | x[:, d] ~ MvNormal(mu[:, d], I) 87 | end 88 | return nothing 89 | end; 90 | ``` 91 | 92 | We define two different kernels, a simple linear kernel with an Automatic Relevance Determination transform and a 93 | squared exponential kernel. 94 | 95 | 96 | ```{julia} 97 | #| eval: false 98 | linear_kernel(α) = LinearKernel() ∘ ARDTransform(α) 99 | sekernel(α, σ) = σ * SqExponentialKernel() ∘ ARDTransform(α); 100 | ``` 101 | 102 | And here is the GPLVM model. 103 | We create separate models for the two types of kernel. 104 | 105 | ```{julia} 106 | #| eval: false 107 | @model function GPLVM_linear(Y, K) 108 | # Dimensionality of the problem. 
109 | N, D = size(Y) 110 | # K is the dimension of the latent space 111 | @assert K <= D 112 | noise = 1e-3 113 | 114 | # Priors 115 | α ~ MvLogNormal(MvNormal(Zeros(K), I)) 116 | Z ~ filldist(Normal(), K, N) 117 | mu ~ filldist(Normal(), N) 118 | 119 | gp = GP(linear_kernel(α)) 120 | gpz = gp(ColVecs(Z), noise) 121 | Y ~ filldist(MvNormal(mu, cov(gpz)), D) 122 | 123 | return nothing 124 | end; 125 | 126 | @model function GPLVM(Y, K) 127 | # Dimensionality of the problem. 128 | N, D = size(Y) 129 | # K is the dimension of the latent space 130 | @assert K <= D 131 | noise = 1e-3 132 | 133 | # Priors 134 | α ~ MvLogNormal(MvNormal(Zeros(K), I)) 135 | σ ~ LogNormal(0.0, 1.0) 136 | Z ~ filldist(Normal(), K, N) 137 | mu ~ filldist(Normal(), N) 138 | 139 | gp = GP(sekernel(α, σ)) 140 | gpz = gp(ColVecs(Z), noise) 141 | Y ~ filldist(MvNormal(mu, cov(gpz)), D) 142 | 143 | return nothing 144 | end; 145 | ``` 146 | 147 | ```{julia} 148 | #| eval: false 149 | # Standard GPs don't scale very well in n, so we use a small subsample for the purpose of this tutorial 150 | n_data = 40 151 | # number of features to use from dataset 152 | n_features = 4 153 | # latent dimension for GP case 154 | ndim = 4; 155 | ``` 156 | 157 | ```{julia} 158 | #| eval: false 159 | ppca = pPCA(dat[1:n_data, 1:n_features]) 160 | chain_ppca = sample(ppca, NUTS{Turing.ReverseDiffAD{true}}(), 1000); 161 | ``` 162 | 163 | ```{julia} 164 | #| eval: false 165 | # we extract the posterior mean estimates of the parameters from the chain 166 | z_mean = reshape(mean(group(chain_ppca, :z))[:, 2], (n_features, n_data)) 167 | scatter(z_mean[1, :], z_mean[2, :]; group=labels[1:n_data], xlabel=L"z_1", ylabel=L"z_2") 168 | ``` 169 | 170 | We can see that the pPCA fails to distinguish the groups. 171 | In particular, the `setosa` species is not clearly separated from `versicolor` and `virginica`. 172 | This is due to the non-linearities that we introduced, as without them the two groups can be clearly distinguished 173 | using pPCA (see the pPCA tutorial). 174 | 175 | Let's try the same with our linear kernel GPLVM model. 176 | 177 | ```{julia} 178 | #| eval: false 179 | gplvm_linear = GPLVM_linear(dat[1:n_data, 1:n_features], ndim) 180 | chain_linear = sample(gplvm_linear, NUTS{Turing.ReverseDiffAD{true}}(), 500); 181 | ``` 182 | 183 | ```{julia} 184 | #| eval: false 185 | # we extract the posterior mean estimates of the parameters from the chain 186 | z_mean = reshape(mean(group(chain_linear, :Z))[:, 2], (n_features, n_data)) 187 | alpha_mean = mean(group(chain_linear, :α))[:, 2] 188 | 189 | alpha1, alpha2 = partialsortperm(alpha_mean, 1:2; rev=true) 190 | scatter( 191 | z_mean[alpha1, :], 192 | z_mean[alpha2, :]; 193 | group=labels[1:n_data], 194 | xlabel=L"z_{\mathrm{ard}_1}", 195 | ylabel=L"z_{\mathrm{ard}_2}", 196 | ) 197 | ``` 198 | 199 | We can see that similar to the pPCA case, the linear kernel GPLVM fails to distinguish between the two groups 200 | (`setosa` on the one hand, and `virginica` and `verticolor` on the other). 201 | 202 | Finally, we demonstrate that by changing the kernel to a non-linear function, we are able to separate the data again. 
203 | 204 | ```{julia} 205 | #| eval: false 206 | gplvm = GPLVM(dat[1:n_data, 1:n_features], ndim) 207 | chain_gplvm = sample(gplvm, NUTS{Turing.ReverseDiffAD{true}}(), 500); 208 | ``` 209 | 210 | ```{julia} 211 | #| eval: false 212 | # we extract the posterior mean estimates of the parameters from the chain 213 | z_mean = reshape(mean(group(chain_gplvm, :Z))[:, 2], (ndim, n_data)) 214 | alpha_mean = mean(group(chain_gplvm, :α))[:, 2] 215 | 216 | alpha1, alpha2 = partialsortperm(alpha_mean, 1:2; rev=true) 217 | scatter( 218 | z_mean[alpha1, :], 219 | z_mean[alpha2, :]; 220 | group=labels[1:n_data], 221 | xlabel=L"z_{\mathrm{ard}_1}", 222 | ylabel=L"z_{\mathrm{ard}_2}", 223 | ) 224 | ``` 225 | 226 | ```{julia} 227 | #| eval: false 228 | let 229 | @assert abs( 230 | mean(z_mean[alpha1, labels[1:n_data] .== "setosa"]) - 231 | mean(z_mean[alpha1, labels[1:n_data] .!= "setosa"]), 232 | ) > 1 233 | end 234 | ``` 235 | 236 | Now, the split between the two groups is visible again. 237 | -------------------------------------------------------------------------------- /tutorials/gaussian-processes-introduction/golf.dat: -------------------------------------------------------------------------------- 1 | distance n y 2 | 2 1443 1346 3 | 3 694 577 4 | 4 455 337 5 | 5 353 208 6 | 6 272 149 7 | 7 256 136 8 | 8 240 111 9 | 9 217 69 10 | 10 200 67 11 | 11 237 75 12 | 12 202 52 13 | 13 192 46 14 | 14 174 54 15 | 15 167 28 16 | 16 201 27 17 | 17 195 31 18 | 18 191 33 19 | 19 147 20 20 | 20 152 24 21 | -------------------------------------------------------------------------------- /tutorials/gaussian-processes-introduction/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Gaussian Processes: Introduction" 3 | engine: julia 4 | aliases: 5 | - ../15-gaussian-processes/index.html 6 | --- 7 | 8 | ```{julia} 9 | #| echo: false 10 | #| output: false 11 | using Pkg; 12 | Pkg.instantiate(); 13 | ``` 14 | 15 | [JuliaGPs](https://github.com/JuliaGaussianProcesses/#welcome-to-juliagps) packages integrate well with Turing.jl because they implement the Distributions.jl 16 | interface. 17 | You should be able to understand what is going on in this tutorial if you know what a GP is. 18 | For a more in-depth understanding of the 19 | [JuliaGPs](https://github.com/JuliaGaussianProcesses/#welcome-to-juliagps) functionality 20 | used here, please consult the 21 | [JuliaGPs](https://github.com/JuliaGaussianProcesses/#welcome-to-juliagps) docs. 22 | 23 | In this tutorial, we will model the putting dataset discussed in Chapter 21 of 24 | [Bayesian Data Analysis](http://www.stat.columbia.edu/%7Egelman/book/). 25 | The dataset comprises the result of measuring how often a golfer successfully gets the ball 26 | in the hole, depending on how far away from it they are. 27 | The goal of inference is to estimate the probability of any given shot being successful at a 28 | given distance. 29 | 30 | ### Let's download the data and take a look at it: 31 | 32 | ```{julia} 33 | using CSV, DataFrames 34 | 35 | df = CSV.read("golf.dat", DataFrame; delim=' ', ignorerepeated=true) 36 | df[1:5, :] 37 | ``` 38 | 39 | We've printed the first 5 rows of the dataset (which comprises only 19 rows in total). 40 | Observe it has three columns: 41 | 42 | 1. `distance` -- how far away from the hole. I'll refer to `distance` as `d` throughout the rest of this tutorial 43 | 2. `n` -- how many shots were taken from a given distance 44 | 3. 
`y` -- how many shots were successful from a given distance 45 | 46 | We will use a Binomial model for the data, whose success probability is parametrised by a 47 | transformation of a GP. Something along the lines of: 48 | $$ 49 | \begin{aligned} 50 | f & \sim \operatorname{GP}(0, k) \\ 51 | y_j \mid f(d_j) & \sim \operatorname{Binomial}(n_j, g(f(d_j))) \\ 52 | g(x) & := \frac{1}{1 + e^{-x}} 53 | \end{aligned} 54 | $$ 55 | 56 | To do this, let's define our Turing.jl model: 57 | 58 | ```{julia} 59 | using AbstractGPs, LogExpFunctions, Turing 60 | 61 | @model function putting_model(d, n; jitter=1e-4) 62 | v ~ Gamma(2, 1) 63 | l ~ Gamma(4, 1) 64 | f = GP(v * with_lengthscale(SEKernel(), l)) 65 | f_latent ~ f(d, jitter) 66 | y ~ product_distribution(Binomial.(n, logistic.(f_latent))) 67 | return (fx=f(d, jitter), f_latent=f_latent, y=y) 68 | end 69 | ``` 70 | 71 | We first define an `AbstractGPs.GP`, which represents a distribution over functions, and 72 | is entirely separate from Turing.jl. 73 | We place a prior over its variance `v` and length-scale `l`. 74 | `f(d, jitter)` constructs the multivariate Gaussian comprising the random variables 75 | in `f` whose indices are in `d` (plus a bit of independent Gaussian noise with variance 76 | `jitter` -- see [the docs](https://juliagaussianprocesses.github.io/AbstractGPs.jl/dev/api/#FiniteGP-and-AbstractGP) 77 | for more details). 78 | `f(d, jitter)` has the type `AbstractMvNormal`, and is the bit of AbstractGPs.jl that implements the 79 | Distributions.jl interface, so it's legal to put it on the right-hand side 80 | of a `~`. 81 | From this you should deduce that `f_latent` is distributed according to a multivariate 82 | Gaussian. 83 | The remaining lines comprise standard Turing.jl code that is encountered in other tutorials 84 | and Turing documentation. 85 | 86 | Before performing inference, we might want to inspect the prior that our model places over 87 | the data, to see whether there is anything obviously wrong. 88 | These kinds of prior predictive checks are straightforward to perform using Turing.jl, since 89 | it is possible to sample from the prior easily by just calling the model: 90 | 91 | ```{julia} 92 | m = putting_model(Float64.(df.distance), df.n) 93 | m().y 94 | ``` 95 | 96 | We make use of this to see what kinds of datasets we simulate from the prior: 97 | 98 | ```{julia} 99 | using Plots 100 | 101 | function plot_data(d, n, y, xticks, yticks) 102 | ylims = (0, round(maximum(n), RoundUp; sigdigits=2)) 103 | margin = -0.5 * Plots.mm 104 | plt = plot(; xticks=xticks, yticks=yticks, ylims=ylims, margin=margin, grid=false) 105 | bar!(plt, d, n; color=:red, label="", alpha=0.5) 106 | bar!(plt, d, y; label="", color=:blue, alpha=0.7) 107 | return plt 108 | end 109 | 110 | # Construct model and run some prior predictive checks. 111 | m = putting_model(Float64.(df.distance), df.n) 112 | hists = map(1:20) do j 113 | xticks = j > 15 ? :auto : nothing 114 | yticks = rem(j, 5) == 1 ? :auto : nothing 115 | return plot_data(df.distance, df.n, m().y, xticks, yticks) 116 | end 117 | plot(hists...; layout=(4, 5)) 118 | ``` 119 | 120 | In this case, the only prior knowledge I have is that the proportion of successful shots 121 | ought to decrease monotonically as the distance from the hole increases, which should show 122 | up in the data as the blue lines generally go down as we move from left to right on each 123 | graph. 
124 | Unfortunately, there is not a simple way to enforce monotonicity in the samples from a GP, 125 | and we can see this in some of the plots above, so we must hope that we have enough data to 126 | ensure that this relationship holds approximately under the posterior. 127 | In any case, you can judge for yourself whether you think this is the most useful 128 | visualisation that we can perform -- if you think there is something better to look at, 129 | please let us know! 130 | 131 | Moving on, we generate samples from the posterior using the default `NUTS` sampler. 132 | We'll make use of [ReverseDiff.jl](https://github.com/JuliaDiff/ReverseDiff.jl), as it has 133 | better performance than [ForwardDiff.jl](https://github.com/JuliaDiff/ForwardDiff.jl/) on 134 | this example. See Turing.jl's docs on Automatic Differentiation for more info. 135 | 136 | 137 | ```{julia} 138 | using Random, ReverseDiff 139 | 140 | m_post = m | (y=df.y,) 141 | chn = sample(Xoshiro(123456), m_post, NUTS(; adtype=AutoReverseDiff()), 1_000, progress=false) 142 | ``` 143 | 144 | We can use these samples and the `posterior` function from `AbstractGPs` to sample from the 145 | posterior probability of success at any distance we choose: 146 | 147 | ```{julia} 148 | d_pred = 1:0.2:21 149 | samples = map(returned(m_post, chn)[1:10:end]) do x 150 | return logistic.(rand(posterior(x.fx, x.f_latent)(d_pred, 1e-4))) 151 | end 152 | p = plot() 153 | plot!(d_pred, reduce(hcat, samples); label="", color=:blue, alpha=0.2) 154 | scatter!(df.distance, df.y ./ df.n; label="", color=:red) 155 | ``` 156 | 157 | We can see that the general trend is indeed down as the distance from the hole increases, 158 | and that if we move away from the data, the posterior uncertainty quickly inflates. 159 | This suggests that the model is probably going to do a reasonable job of interpolating 160 | between observed data, but less good a job at extrapolating to larger distances. 161 | -------------------------------------------------------------------------------- /tutorials/hidden-markov-models/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: Hidden Markov Models 3 | engine: julia 4 | aliases: 5 | - ../04-hidden-markov-model/index.html 6 | --- 7 | 8 | ```{julia} 9 | #| echo: false 10 | #| output: false 11 | using Pkg; 12 | Pkg.instantiate(); 13 | ``` 14 | 15 | This tutorial illustrates training Bayesian [hidden Markov models](https://en.wikipedia.org/wiki/Hidden_Markov_model) (HMMs) using Turing. 16 | The main goals are learning the transition matrix, emission parameter, and hidden states. 17 | For a more rigorous academic overview of hidden Markov models, see [An Introduction to Hidden Markov Models and Bayesian Networks](https://mlg.eng.cam.ac.uk/zoubin/papers/ijprai.pdf) (Ghahramani, 2001). 18 | 19 | In this tutorial, we assume there are $k$ discrete hidden states; the observations are continuous and normally distributed - centered around the hidden states. This assumption reduces the number of parameters to be estimated in the emission matrix. 20 | 21 | Let's load the libraries we'll need, and set a random seed for reproducibility. 22 | 23 | ```{julia} 24 | # Load libraries. 25 | using Turing, StatsPlots, Random, Bijectors 26 | 27 | # Set a random seed 28 | Random.seed!(12345678); 29 | ``` 30 | 31 | ## Simple State Detection 32 | 33 | In this example, we'll use something where the states and emission parameters are straightforward. 
34 | 35 | ```{julia} 36 | # Define the emission parameter. 37 | y = [fill(1.0, 6)..., fill(2.0, 6)..., fill(3.0, 7)..., 38 | fill(2.0, 4)..., fill(1.0, 7)...] 39 | N = length(y); 40 | K = 3; 41 | 42 | # Plot the data we just made. 43 | plot(y; xlim=(0, 30), ylim=(-1, 5), size=(500, 250), legend = false) 44 | scatter!(y, color = :blue; xlim=(0, 30), ylim=(-1, 5), size=(500, 250), legend = false) 45 | ``` 46 | 47 | We can see that we have three states, one for each height of the plot (1, 2, 3). This height is also our emission parameter, so state one produces a value of one, state two produces a value of two, and so on. 48 | 49 | Ultimately, we would like to understand three major parameters: 50 | 51 | 1. The transition matrix. This is a matrix that assigns a probability of switching from one state to any other state, including the state that we are already in. 52 | 2. The emission parameters, which describes a typical value emitted by some state. In the plot above, the emission parameter for state one is simply one. 53 | 3. The state sequence is our understanding of what state we were actually in when we observed some data. This is very important in more sophisticated HMMs, where the emission value does not equal our state. 54 | 55 | With this in mind, let's set up our model. We are going to use some of our knowledge as modelers to provide additional information about our system. This takes the form of the prior on our emission parameter. 56 | 57 | $$ 58 | m_i \sim \mathrm{Normal}(i, 0.5) \quad \text{where} \quad m = \{1,2,3\} 59 | $$ 60 | 61 | Simply put, this says that we expect state one to emit values in a Normally distributed manner, where the mean of each state's emissions is that state's value. The variance of 0.5 helps the model converge more quickly — consider the case where we have a variance of 1 or 2. In this case, the likelihood of observing a 2 when we are in state 1 is actually quite high, as it is within a standard deviation of the true emission value. Applying the prior that we are likely to be tightly centered around the mean prevents our model from being too confused about the state that is generating our observations. 62 | 63 | The priors on our transition matrix are noninformative, using `T[i] ~ Dirichlet(ones(K)/K)`. The Dirichlet prior used in this way assumes that the state is likely to change to any other state with equal probability. As we'll see, this transition matrix prior will be overwritten as we observe data. 64 | 65 | ```{julia} 66 | # Turing model definition. 67 | @model function BayesHmm(y, K) 68 | # Get observation length. 69 | N = length(y) 70 | 71 | # State sequence. 72 | s = zeros(Int, N) 73 | 74 | # Emission matrix. 75 | m = Vector(undef, K) 76 | 77 | # Transition matrix. 78 | T = Vector{Vector}(undef, K) 79 | 80 | # Assign distributions to each element 81 | # of the transition matrix and the 82 | # emission matrix. 83 | for i in 1:K 84 | T[i] ~ Dirichlet(ones(K) / K) 85 | m[i] ~ Normal(i, 0.5) 86 | end 87 | 88 | # Observe each point of the input. 89 | s[1] ~ Categorical(K) 90 | y[1] ~ Normal(m[s[1]], 0.1) 91 | 92 | for i in 2:N 93 | s[i] ~ Categorical(vec(T[s[i - 1]])) 94 | y[i] ~ Normal(m[s[i]], 0.1) 95 | end 96 | end; 97 | ``` 98 | 99 | We will use a combination of two samplers (HMC and Particle Gibbs) by passing them to the Gibbs sampler. The Gibbs sampler allows for compositional inference, where we can utilize different samplers on different parameters. 
(For API details of these samplers, please see [Turing.jl's API documentation](https://turinglang.org/Turing.jl/stable/api/Inference/).) 100 | 101 | In this case, we use HMC for `m` and `T`, representing the emission and transition matrices respectively. We use the Particle Gibbs sampler for `s`, the state sequence. You may wonder why it is that we are not assigning `s` to the HMC sampler, and why it is that we need compositional Gibbs sampling at all. 102 | 103 | The parameter `s` is not a continuous variable. 104 | It is a vector of **integers**, and thus Hamiltonian methods like HMC and NUTS won't work correctly. 105 | Gibbs allows us to apply the right tools to the best effect. 106 | If you are a particularly advanced user interested in higher performance, you may benefit from setting up your Gibbs sampler to use [different automatic differentiation]({{}}#compositional-sampling-with-differing-ad-modes) backends for each parameter space. 107 | 108 | Time to run our sampler. 109 | 110 | ```{julia} 111 | #| output: false 112 | #| echo: false 113 | setprogress!(false) 114 | ``` 115 | 116 | ```{julia} 117 | g = Gibbs((:m, :T) => HMC(0.01, 50), :s => PG(120)) 118 | chn = sample(BayesHmm(y, 3), g, 1000) 119 | ``` 120 | 121 | Let's see how well our chain performed. 122 | Ordinarily, using `display(chn)` would be a good first step, but we have generated a lot of parameters here (`s[1]`, `s[2]`, `m[1]`, and so on). 123 | It's a bit easier to show how our model performed graphically. 124 | 125 | The code below generates an animation showing the graph of the data above, and the data our model generates in each sample. 126 | 127 | ```{julia} 128 | # Extract our m and s parameters from the chain. 129 | m_set = MCMCChains.group(chn, :m).value 130 | s_set = MCMCChains.group(chn, :s).value 131 | 132 | # Iterate through the MCMC samples. 133 | Ns = 1:length(chn) 134 | 135 | # Make an animation. 136 | animation = @gif for i in Ns 137 | m = m_set[i, :] 138 | s = Int.(s_set[i, :]) 139 | emissions = m[s] 140 | 141 | p = plot( 142 | y; 143 | chn=:red, 144 | size=(500, 250), 145 | xlabel="Time", 146 | ylabel="State", 147 | legend=:topright, 148 | label="True data", 149 | xlim=(0, 30), 150 | ylim=(-1, 5), 151 | ) 152 | plot!(emissions; color=:blue, label="Sample $i") 153 | end every 3 154 | ``` 155 | 156 | Looks like our model did a pretty good job, but we should also check to make sure our chain converges. A quick check is to examine whether the diagonal (representing the probability of remaining in the current state) of the transition matrix appears to be stationary. The code below extracts the diagonal and shows a traceplot of each persistence probability. 157 | 158 | ```{julia} 159 | # Index the chain with the persistence probabilities. 160 | subchain = chn[["T[1][1]", "T[2][2]", "T[3][3]"]] 161 | 162 | plot(subchain; seriestype=:traceplot, title="Persistence Probability", legend=false) 163 | ``` 164 | 165 | A cursory examination of the traceplot above indicates that all three chains converged to something resembling 166 | stationary. 
We can use the diagnostic functions provided by [MCMCChains](https://github.com/TuringLang/MCMCChains.jl) to engage in some more formal tests, like the Heidelberg and Welch diagnostic: 167 | 168 | ```{julia} 169 | heideldiag(MCMCChains.group(chn, :T))[1] 170 | ``` 171 | 172 | The p-values on the test suggest that we cannot reject the hypothesis that the observed sequence comes from a stationary distribution, so we can be reasonably confident that our transition matrix has converged to something reasonable. 173 | 174 | ## Efficient Inference With The Forward Algorithm 175 | 176 | While the above method works well for the simple example in this tutorial, some users may desire a more efficient method, especially when their model is more complicated. 177 | One simple way to improve inference is to marginalize out the hidden states of the model with an appropriate algorithm, calculating only the posterior over the continuous random variables. 178 | Not only does this allow more efficient inference via Rao-Blackwellization, but now we can sample our model with `NUTS()` alone, which is usually a much more performant MCMC kernel. 179 | 180 | Thankfully, [HiddenMarkovModels.jl](https://github.com/gdalle/HiddenMarkovModels.jl) provides an extremely efficient implementation of many algorithms related to hidden Markov models. This allows us to rewrite our model as: 181 | 182 | ```{julia} 183 | using HiddenMarkovModels 184 | using FillArrays 185 | using LinearAlgebra 186 | using LogExpFunctions 187 | 188 | 189 | @model function BayesHmm2(y, K) 190 | m ~ Bijectors.ordered(MvNormal([1.0, 2.0, 3.0], 0.5I)) 191 | T ~ filldist(Dirichlet(fill(1/K, K)), K) 192 | 193 | hmm = HMM(softmax(ones(K)), copy(T'), [Normal(m[i], 0.1) for i in 1:K]) 194 | Turing.@addlogprob! logdensityof(hmm, y) 195 | end 196 | 197 | chn2 = sample(BayesHmm2(y, 3), NUTS(), 1000) 198 | ``` 199 | 200 | 201 | We can compare the chains of these two models, confirming the posterior estimate is similar (modulo label switching concerns with the Gibbs model): 202 | ```{julia} 203 | #| code-fold: true 204 | #| code-summary: "Plotting Chains" 205 | 206 | plot(chn["m[1]"], label = "m[1], Model 1, Gibbs", color = :lightblue) 207 | plot!(chn2["m[1]"], label = "m[1], Model 2, NUTS", color = :blue) 208 | plot!(chn["m[2]"], label = "m[2], Model 1, Gibbs", color = :pink) 209 | plot!(chn2["m[2]"], label = "m[2], Model 2, NUTS", color = :red) 210 | plot!(chn["m[3]"], label = "m[3], Model 1, Gibbs", color = :yellow) 211 | plot!(chn2["m[3]"], label = "m[3], Model 2, NUTS", color = :orange) 212 | ``` 213 | 214 | 215 | ### Recovering Marginalized Trajectories 216 | 217 | We can use the `viterbi()` algorithm, also from the `HiddenMarkovModels` package, to recover the most probable state for each parameter set in our posterior sample: 218 | ```{julia} 219 | @model function BayesHmmRecover(y, K, IncludeGenerated = false) 220 | m ~ Bijectors.ordered(MvNormal([1.0, 2.0, 3.0], 0.5I)) 221 | T ~ filldist(Dirichlet(fill(1/K, K)), K) 222 | 223 | hmm = HMM(softmax(ones(K)), copy(T'), [Normal(m[i], 0.1) for i in 1:K]) 224 | Turing.@addlogprob! logdensityof(hmm, y) 225 | 226 | # Conditional generation of the hidden states. 
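# Note: `viterbi` (from HiddenMarkovModels.jl) returns the most probable
# state sequence together with its log-likelihood (discarded here), and
# `:=` stores the corresponding emission means in the chain as generated
# quantities.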
227 | if IncludeGenerated 228 | seq, _ = viterbi(hmm, y) 229 | s := [m[s] for s in seq] 230 | end 231 | end 232 | 233 | chn_recover = sample(BayesHmmRecover(y, 3, true), NUTS(), 1000) 234 | ``` 235 | 236 | Plotting the estimated states, we can see that the results align well with our expectations: 237 | 238 | ```{julia} 239 | p = plot(xlim=(0, 30), ylim=(-1, 5), size=(500, 250)) 240 | for i in 1:100 241 | ind = rand(DiscreteUniform(1, 1000)) 242 | plot!(MCMCChains.group(chn_recover, :s).value[ind,:], color = :grey, opacity = 0.1, legend = :false) 243 | end 244 | scatter!(y, color = :blue) 245 | 246 | p 247 | ``` 248 | -------------------------------------------------------------------------------- /tutorials/infinite-mixture-models/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: Infinite Mixture Models 3 | engine: julia 4 | aliases: 5 | - ../06-infinite-mixture-model/index.html 6 | --- 7 | 8 | ```{julia} 9 | #| echo: false 10 | #| output: false 11 | using Pkg; 12 | Pkg.instantiate(); 13 | ``` 14 | 15 | In many applications it is desirable to allow the model to adjust its complexity to the amount of data. Consider for example the task of assigning objects into clusters or groups. This task often involves the specification of the number of groups. However, often times it is not known beforehand how many groups exist. Moreover, in some applictions, e.g. modelling topics in text documents or grouping species, the number of examples per group is heavy tailed. This makes it impossible to predefine the number of groups and requiring the model to form new groups when data points from previously unseen groups are observed. 16 | 17 | A natural approach for such applications is the use of non-parametric models. This tutorial will introduce how to use the Dirichlet process in a mixture of infinitely many Gaussians using Turing. For further information on Bayesian nonparametrics and the Dirichlet process we refer to the [introduction by Zoubin Ghahramani](http://mlg.eng.cam.ac.uk/pub/pdf/Gha12.pdf) and the book "Fundamentals of Nonparametric Bayesian Inference" by Subhashis Ghosal and Aad van der Vaart. 18 | 19 | ```{julia} 20 | using Turing 21 | ``` 22 | 23 | ## Mixture Model 24 | 25 | Before introducing infinite mixture models in Turing, we will briefly review the construction of finite mixture models. Subsequently, we will define how to use the [Chinese restaurant process](https://en.wikipedia.org/wiki/Chinese_restaurant_process) construction of a Dirichlet process for non-parametric clustering. 26 | 27 | #### Two-Component Model 28 | 29 | First, consider the simple case of a mixture model with two Gaussian components with fixed covariance. 30 | The generative process of such a model can be written as: 31 | 32 | \begin{equation*} 33 | \begin{aligned} 34 | \pi_1 &\sim \mathrm{Beta}(a, b) \\ 35 | \pi_2 &= 1-\pi_1 \\ 36 | \mu_1 &\sim \mathrm{Normal}(\mu_0, \Sigma_0) \\ 37 | \mu_2 &\sim \mathrm{Normal}(\mu_0, \Sigma_0) \\ 38 | z_i &\sim \mathrm{Categorical}(\pi_1, \pi_2) \\ 39 | x_i &\sim \mathrm{Normal}(\mu_{z_i}, \Sigma) 40 | \end{aligned} 41 | \end{equation*} 42 | 43 | where $\pi_1, \pi_2$ are the mixing weights of the mixture model, i.e. $\pi_1 + \pi_2 = 1$, and $z_i$ is a latent assignment of the observation $x_i$ to a component (Gaussian). 
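To make this generative story concrete, here is a small forward simulation of the two-component process in plain Julia. It is an illustrative sketch only; it simply mirrors the equations above and is independent of the Turing model that follows.

```{julia}
#| eval: false
using Distributions

# Forward-simulate the two-component mixture described above.
let
    π1 = rand(Beta(1, 1))                     # mixing weight of component 1
    μ1, μ2 = rand(Normal(0, 1)), rand(Normal(0, 1))  # component means
    z = rand(Categorical([π1, 1 - π1]), 100)  # latent assignments
    x = [rand(Normal(zi == 1 ? μ1 : μ2, 1.0)) for zi in z]  # observations
    (π1, μ1, μ2, first(x, 5))
end
```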
44 | 45 | We can implement this model in Turing for 1D data as follows: 46 | 47 | ```{julia} 48 | @model function two_model(x) 49 | # Hyper-parameters 50 | μ0 = 0.0 51 | σ0 = 1.0 52 | 53 | # Draw weights. 54 | π1 ~ Beta(1, 1) 55 | π2 = 1 - π1 56 | 57 | # Draw locations of the components. 58 | μ1 ~ Normal(μ0, σ0) 59 | μ2 ~ Normal(μ0, σ0) 60 | 61 | # Draw latent assignment. 62 | z ~ Categorical([π1, π2]) 63 | 64 | # Draw observation from selected component. 65 | if z == 1 66 | x ~ Normal(μ1, 1.0) 67 | else 68 | x ~ Normal(μ2, 1.0) 69 | end 70 | end 71 | ``` 72 | 73 | #### Finite Mixture Model 74 | 75 | If we have more than two components, this model can elegantly be extended using a Dirichlet distribution as prior for the mixing weights $\pi_1, \dots, \pi_K$. Note that the Dirichlet distribution is the multivariate generalization of the beta distribution. The resulting model can be written as: 76 | 77 | $$ 78 | \begin{align} 79 | (\pi_1, \dots, \pi_K) &\sim Dirichlet(K, \alpha) \\ 80 | \mu_k &\sim \mathrm{Normal}(\mu_0, \Sigma_0), \;\; \forall k \\ 81 | z &\sim Categorical(\pi_1, \dots, \pi_K) \\ 82 | x &\sim \mathrm{Normal}(\mu_z, \Sigma) 83 | \end{align} 84 | $$ 85 | 86 | which resembles the model in the [Gaussian mixture model tutorial]({{}}) with a slightly different notation. 87 | 88 | ## Infinite Mixture Model 89 | 90 | The question now arises, is there a generalization of a Dirichlet distribution for which the dimensionality $K$ is infinite, i.e. $K = \infty$? 91 | 92 | But first, to implement an infinite Gaussian mixture model in Turing, we first need to load the `Turing.RandomMeasures` module. `RandomMeasures` contains a variety of tools useful in nonparametrics. 93 | 94 | ```{julia} 95 | using Turing.RandomMeasures 96 | ``` 97 | 98 | We now will utilize the fact that one can integrate out the mixing weights in a Gaussian mixture model allowing us to arrive at the Chinese restaurant process construction. See Carl E. Rasmussen: [The Infinite Gaussian Mixture Model](https://www.seas.harvard.edu/courses/cs281/papers/rasmussen-1999a.pdf), NIPS (2000) for details. 99 | 100 | In fact, if the mixing weights are integrated out, the conditional prior for the latent variable $z$ is given by: 101 | 102 | $$ 103 | p(z_i = k \mid z_{\not i}, \alpha) = \frac{n_k + \alpha K}{N - 1 + \alpha} 104 | $$ 105 | 106 | where $z_{\not i}$ are the latent assignments of all observations except observation $i$. Note that we use $n_k$ to denote the number of observations at component $k$ excluding observation $i$. The parameter $\alpha$ is the concentration parameter of the Dirichlet distribution used as prior over the mixing weights. 107 | 108 | #### Chinese Restaurant Process 109 | 110 | To obtain the Chinese restaurant process construction, we can now derive the conditional prior if $K \rightarrow \infty$. 111 | 112 | For $n_k > 0$ we obtain: 113 | 114 | $$ 115 | p(z_i = k \mid z_{\not i}, \alpha) = \frac{n_k}{N - 1 + \alpha} 116 | $$ 117 | 118 | and for all infinitely many clusters that are empty (combined) we get: 119 | 120 | $$ 121 | p(z_i = k \mid z_{\not i}, \alpha) = \frac{\alpha}{N - 1 + \alpha} 122 | $$ 123 | 124 | Those equations show that the conditional prior for component assignments is proportional to the number of such observations, meaning that the Chinese restaurant process has a rich get richer property. 125 | 126 | To get a better understanding of this property, we can plot the cluster choosen by for each new observation drawn from the conditional prior. 
127 | 128 | ```{julia} 129 | # Concentration parameter. 130 | α = 10.0 131 | 132 | # Random measure, e.g. Dirichlet process. 133 | rpm = DirichletProcess(α) 134 | 135 | # Cluster assignments for each observation. 136 | z = Vector{Int}() 137 | 138 | # Maximum number of observations we observe. 139 | Nmax = 500 140 | 141 | for i in 1:Nmax 142 | # Number of observations per cluster. 143 | K = isempty(z) ? 0 : maximum(z) 144 | nk = Vector{Int}(map(k -> sum(z .== k), 1:K)) 145 | 146 | # Draw new assignment. 147 | push!(z, rand(ChineseRestaurantProcess(rpm, nk))) 148 | end 149 | ``` 150 | 151 | ```{julia} 152 | using Plots 153 | 154 | # Plot the cluster assignments over time 155 | @gif for i in 1:Nmax 156 | scatter( 157 | collect(1:i), 158 | z[1:i]; 159 | markersize=2, 160 | xlabel="observation (i)", 161 | ylabel="cluster (k)", 162 | legend=false, 163 | ) 164 | end 165 | ``` 166 | 167 | Further, we can see that the number of clusters is logarithmic in the number of observations and data points. This is a side-effect of the "rich-get-richer" phenomenon, i.e. we expect large clusters and thus the number of clusters has to be smaller than the number of observations. 168 | 169 | $$ 170 | \mathbb{E}[K \mid N] \approx \alpha \cdot log \big(1 + \frac{N}{\alpha}\big) 171 | $$ 172 | 173 | We can see from the equation that the concentration parameter $\alpha$ allows us to control the number of clusters formed *a priori*. 174 | 175 | In Turing we can implement an infinite Gaussian mixture model using the Chinese restaurant process construction of a Dirichlet process as follows: 176 | 177 | ```{julia} 178 | @model function infiniteGMM(x) 179 | # Hyper-parameters, i.e. concentration parameter and parameters of H. 180 | α = 1.0 181 | μ0 = 0.0 182 | σ0 = 1.0 183 | 184 | # Define random measure, e.g. Dirichlet process. 185 | rpm = DirichletProcess(α) 186 | 187 | # Define the base distribution, i.e. expected value of the Dirichlet process. 188 | H = Normal(μ0, σ0) 189 | 190 | # Latent assignment. 191 | z = zeros(Int, length(x)) 192 | 193 | # Locations of the infinitely many clusters. 194 | μ = zeros(Float64, 0) 195 | 196 | for i in 1:length(x) 197 | 198 | # Number of clusters. 199 | K = maximum(z) 200 | nk = Vector{Int}(map(k -> sum(z .== k), 1:K)) 201 | 202 | # Draw the latent assignment. 203 | z[i] ~ ChineseRestaurantProcess(rpm, nk) 204 | 205 | # Create a new cluster? 206 | if z[i] > K 207 | push!(μ, 0.0) 208 | 209 | # Draw location of new cluster. 210 | μ[z[i]] ~ H 211 | end 212 | 213 | # Draw observation. 214 | x[i] ~ Normal(μ[z[i]], 1.0) 215 | end 216 | end 217 | ``` 218 | 219 | We can now use Turing to infer the assignments of some data points. First, we will create some random data that comes from three clusters, with means of 0, -5, and 10. 220 | 221 | ```{julia} 222 | using Plots, Random 223 | 224 | # Generate some test data. 225 | Random.seed!(1) 226 | data = vcat(randn(10), randn(10) .- 5, randn(10) .+ 10) 227 | data .-= mean(data) 228 | data /= std(data); 229 | ``` 230 | 231 | Next, we'll sample from our posterior using SMC. 232 | 233 | ```{julia} 234 | #| output: false 235 | setprogress!(false) 236 | ``` 237 | 238 | ```{julia} 239 | # MCMC sampling 240 | Random.seed!(2) 241 | iterations = 1000 242 | model_fun = infiniteGMM(data); 243 | chain = sample(model_fun, SMC(), iterations); 244 | ``` 245 | 246 | Finally, we can plot the number of clusters in each sample. 247 | 248 | ```{julia} 249 | # Extract the number of clusters for each sample of the Markov chain. 
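# The assignments `z[i]` are stored in the chain under the `z` group; counting
# the unique assignment values in each iteration gives the number of occupied
# clusters in that posterior sample.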
250 | k = map( 251 | t -> length(unique(vec(chain[t, MCMCChains.namesingroup(chain, :z), :].value))), 252 | 1:iterations, 253 | ); 254 | 255 | # Visualize the number of clusters. 256 | plot(k; xlabel="Iteration", ylabel="Number of clusters", label="Chain 1") 257 | ``` 258 | 259 | If we visualize the histogram of the number of clusters sampled from our posterior, we observe that the model seems to prefer 3 clusters, which is the true number of clusters. Note that the number of clusters in a Dirichlet process mixture model is not limited a priori and will grow to infinity with probability one. However, if conditioned on data the posterior will concentrate on a finite number of clusters enforcing the resulting model to have a finite amount of clusters. It is, however, not given that the posterior of a Dirichlet process Gaussian mixture model converges to the true number of clusters, given that data comes from a finite mixture model. See Jeffrey Miller and Matthew Harrison: [A simple example of Dirichlet process mixture inconsitency for the number of components](https://arxiv.org/pdf/1301.2708.pdf) for details. 260 | 261 | ```{julia} 262 | histogram(k; xlabel="Number of clusters", legend=false) 263 | ``` 264 | 265 | One issue with the Chinese restaurant process construction is that the number of latent parameters we need to sample scales with the number of observations. It may be desirable to use alternative constructions in certain cases. Alternative methods of constructing a Dirichlet process can be employed via the following representations: 266 | 267 | Size-Biased Sampling Process 268 | 269 | $$ 270 | j_k \sim \mathrm{Beta}(1, \alpha) \cdot \mathrm{surplus} 271 | $$ 272 | 273 | Stick-Breaking Process 274 | $$ 275 | v_k \sim \mathrm{Beta}(1, \alpha) 276 | $$ 277 | 278 | Chinese Restaurant Process 279 | $$ 280 | p(z_n = k | z_{1:n-1}) \propto \begin{cases} 281 | \frac{m_k}{n-1+\alpha}, \text{ if } m_k > 0\\\ 282 | \frac{\alpha}{n-1+\alpha} 283 | \end{cases} 284 | $$ 285 | 286 | For more details see [this article](https://www.stats.ox.ac.uk/%7Eteh/research/npbayes/Teh2010a.pdf). 287 | -------------------------------------------------------------------------------- /tutorials/multinomial-logistic-regression/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: Multinomial Logistic Regression 3 | engine: julia 4 | aliases: 5 | - ../08-multinomial-logistic-regression/index.html 6 | --- 7 | 8 | ```{julia} 9 | #| echo: false 10 | #| output: false 11 | using Pkg; 12 | Pkg.instantiate(); 13 | ``` 14 | 15 | [Multinomial logistic regression](https://en.wikipedia.org/wiki/Multinomial_logistic_regression) is an extension of logistic regression. Logistic regression is used to model problems in which there are exactly two possible discrete outcomes. Multinomial logistic regression is used to model problems in which there are two or more possible discrete outcomes. 16 | 17 | In our example, we'll be using the iris dataset. The iris multiclass problem aims to predict the species of a flower given measurements (in centimeters) of sepal length and width and petal length and width. There are three possible species: Iris setosa, Iris versicolor, and Iris virginica. 18 | 19 | To start, let's import all the libraries we'll need. 20 | 21 | ```{julia} 22 | # Load Turing. 23 | using Turing 24 | 25 | # Load RDatasets. 26 | using RDatasets 27 | 28 | # Load StatsPlots for visualizations and diagnostics. 
29 | using StatsPlots 30 | 31 | # Functionality for splitting and normalizing the data. 32 | using MLDataUtils: shuffleobs, splitobs, rescale! 33 | 34 | # We need a softmax function which is provided by NNlib. 35 | using NNlib: softmax 36 | 37 | # Functionality for constructing arrays with identical elements efficiently. 38 | using FillArrays 39 | 40 | # Functionality for working with scaled identity matrices. 41 | using LinearAlgebra 42 | 43 | # Set a seed for reproducibility. 44 | using Random 45 | Random.seed!(0); 46 | ``` 47 | 48 | ## Data Cleaning & Set Up 49 | 50 | Now we're going to import our dataset. Twenty rows of the dataset are shown below so you can get a good feel for what kind of data we have. 51 | 52 | ```{julia} 53 | # Import the "iris" dataset. 54 | data = RDatasets.dataset("datasets", "iris"); 55 | 56 | # Show twenty random rows. 57 | data[rand(1:size(data, 1), 20), :] 58 | ``` 59 | 60 | In this data set, the outcome `Species` is currently coded as a string. We convert it to a numerical value by using indices `1`, `2`, and `3` to indicate species `setosa`, `versicolor`, and `virginica`, respectively. 61 | 62 | ```{julia} 63 | # Recode the `Species` column. 64 | species = ["setosa", "versicolor", "virginica"] 65 | data[!, :Species_index] = indexin(data[!, :Species], species) 66 | 67 | # Show twenty random rows of the new species columns 68 | data[rand(1:size(data, 1), 20), [:Species, :Species_index]] 69 | ``` 70 | 71 | After we've done that tidying, it's time to split our dataset into training and testing sets, and separate the features and target from the data. Additionally, we must rescale our feature variables so that they are centered around zero by subtracting each column by the mean and dividing it by the standard deviation. Without this step, Turing's sampler will have a hard time finding a place to start searching for parameter estimates. 72 | 73 | ```{julia} 74 | # Split our dataset 50%/50% into training/test sets. 75 | trainset, testset = splitobs(shuffleobs(data), 0.5) 76 | 77 | # Define features and target. 78 | features = [:SepalLength, :SepalWidth, :PetalLength, :PetalWidth] 79 | target = :Species_index 80 | 81 | # Turing requires data in matrix and vector form. 82 | train_features = Matrix(trainset[!, features]) 83 | test_features = Matrix(testset[!, features]) 84 | train_target = trainset[!, target] 85 | test_target = testset[!, target] 86 | 87 | # Standardize the features. 88 | μ, σ = rescale!(train_features; obsdim=1) 89 | rescale!(test_features, μ, σ; obsdim=1); 90 | ``` 91 | 92 | ## Model Declaration 93 | 94 | Finally, we can define our model `logistic_regression`. It is a function that takes three arguments where 95 | 96 | - `x` is our set of independent variables; 97 | - `y` is the element we want to predict; 98 | - `σ` is the standard deviation we want to assume for our priors. 99 | 100 | We select the `setosa` species as the baseline class (the choice does not matter). Then we create the intercepts and vectors of coefficients for the other classes against that baseline. More concretely, we create scalar intercepts `intercept_versicolor` and `intersept_virginica` and coefficient vectors `coefficients_versicolor` and `coefficients_virginica` with four coefficients each for the features `SepalLength`, `SepalWidth`, `PetalLength` and `PetalWidth`. We assume a normal distribution with mean zero and standard deviation `σ` as prior for each scalar parameter. 
We want to find the posterior distribution of these, in total ten, parameters to be able to predict the species for any given set of features. 101 | 102 | ```{julia} 103 | # Bayesian multinomial logistic regression 104 | @model function logistic_regression(x, y, σ) 105 | n = size(x, 1) 106 | length(y) == n || 107 | throw(DimensionMismatch("number of observations in `x` and `y` is not equal")) 108 | 109 | # Priors of intercepts and coefficients. 110 | intercept_versicolor ~ Normal(0, σ) 111 | intercept_virginica ~ Normal(0, σ) 112 | coefficients_versicolor ~ MvNormal(Zeros(4), σ^2 * I) 113 | coefficients_virginica ~ MvNormal(Zeros(4), σ^2 * I) 114 | 115 | # Compute the likelihood of the observations. 116 | values_versicolor = intercept_versicolor .+ x * coefficients_versicolor 117 | values_virginica = intercept_virginica .+ x * coefficients_virginica 118 | for i in 1:n 119 | # the 0 corresponds to the base category `setosa` 120 | v = softmax([0, values_versicolor[i], values_virginica[i]]) 121 | y[i] ~ Categorical(v) 122 | end 123 | end; 124 | ``` 125 | 126 | ## Sampling 127 | 128 | Now we can run our sampler. This time we'll use [`NUTS`](https://turinglang.org/stable/docs/library/#Turing.Inference.NUTS) to sample from our posterior. 129 | 130 | ```{julia} 131 | #| output: false 132 | setprogress!(false) 133 | ``` 134 | 135 | ```{julia} 136 | #| output: false 137 | m = logistic_regression(train_features, train_target, 1) 138 | chain = sample(m, NUTS(), MCMCThreads(), 1_500, 3) 139 | ``` 140 | 141 | 142 | ```{julia} 143 | #| echo: false 144 | chain 145 | ``` 146 | 147 | ::: {.callout-warning collapse="true"} 148 | ## Sampling With Multiple Threads 149 | The `sample()` call above assumes that you have at least `nchains` threads available in your Julia instance. If you do not, the multiple chains 150 | will run sequentially, and you may notice a warning. For more information, see [the Turing documentation on sampling multiple chains.]({{}}#sampling-multiple-chains) 151 | ::: 152 | 153 | Since we ran multiple chains, we may as well do a spot check to make sure each chain converges around similar points. 154 | 155 | ```{julia} 156 | plot(chain) 157 | ``` 158 | 159 | Looks good! 160 | 161 | We can also use the `corner` function from MCMCChains to show the distributions of the various parameters of our multinomial logistic regression. The corner function requires MCMCChains and StatsPlots. 162 | 163 | ```{julia} 164 | # Only plotting the first 3 coefficients due to a bug in Plots.jl 165 | corner( 166 | chain, 167 | MCMCChains.namesingroup(chain, :coefficients_versicolor)[1:3]; 168 | ) 169 | ``` 170 | 171 | ```{julia} 172 | # Only plotting the first 3 coefficients due to a bug in Plots.jl 173 | corner( 174 | chain, 175 | MCMCChains.namesingroup(chain, :coefficients_virginica)[1:3]; 176 | ) 177 | ``` 178 | 179 | Fortunately the corner plots appear to demonstrate unimodal distributions for each of our parameters, so it should be straightforward to take the means of each parameter's sampled values to estimate our model to make predictions. 180 | 181 | ## Making Predictions 182 | 183 | How do we test how well the model actually predicts which of the three classes an iris flower belongs to? We need to build a `prediction` function that takes the test dataset and runs it through the average parameter calculated during sampling. 184 | 185 | The `prediction` function below takes a `Matrix` and a `Chains` object. 
It computes the mean of the sampled parameters and calculates the species with the highest probability for each observation. Note that we do not have to evaluate the `softmax` function since it does not affect the order of its inputs. 186 | 187 | ```{julia} 188 | function prediction(x::Matrix, chain) 189 | # Pull the means from each parameter's sampled values in the chain. 190 | intercept_versicolor = mean(chain, :intercept_versicolor) 191 | intercept_virginica = mean(chain, :intercept_virginica) 192 | coefficients_versicolor = [ 193 | mean(chain, k) for k in MCMCChains.namesingroup(chain, :coefficients_versicolor) 194 | ] 195 | coefficients_virginica = [ 196 | mean(chain, k) for k in MCMCChains.namesingroup(chain, :coefficients_virginica) 197 | ] 198 | 199 | # Compute the index of the species with the highest probability for each observation. 200 | values_versicolor = intercept_versicolor .+ x * coefficients_versicolor 201 | values_virginica = intercept_virginica .+ x * coefficients_virginica 202 | species_indices = [ 203 | argmax((0, x, y)) for (x, y) in zip(values_versicolor, values_virginica) 204 | ] 205 | 206 | return species_indices 207 | end; 208 | ``` 209 | 210 | Let's see how we did! We run the test matrix through the prediction function, and compute the accuracy for our prediction. 211 | 212 | ```{julia} 213 | # Make the predictions. 214 | predictions = prediction(test_features, chain) 215 | 216 | # Calculate accuracy for our test set. 217 | mean(predictions .== testset[!, :Species_index]) 218 | ``` 219 | 220 | Perhaps more important is to see the accuracy per class. 221 | 222 | ```{julia} 223 | for s in 1:3 224 | rows = testset[!, :Species_index] .== s 225 | println("Number of `", species[s], "`: ", count(rows)) 226 | println( 227 | "Percentage of `", 228 | species[s], 229 | "` predicted correctly: ", 230 | mean(predictions[rows] .== testset[rows, :Species_index]), 231 | ) 232 | end 233 | ``` 234 | 235 | This tutorial has demonstrated how to use Turing to perform Bayesian multinomial logistic regression. 236 | -------------------------------------------------------------------------------- /usage/automatic-differentiation/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: Automatic Differentiation 3 | engine: julia 4 | aliases: 5 | - ../../tutorials/docs-10-using-turing-autodiff/index.html 6 | --- 7 | 8 | ```{julia} 9 | #| echo: false 10 | #| output: false 11 | using Pkg; 12 | Pkg.instantiate(); 13 | ``` 14 | 15 | ## Switching AD Modes 16 | 17 | Turing currently supports four automatic differentiation (AD) backends for sampling: [ForwardDiff](https://github.com/JuliaDiff/ForwardDiff.jl) for forward-mode AD; and [Mooncake](https://github.com/compintell/Mooncake.jl) and [ReverseDiff](https://github.com/JuliaDiff/ReverseDiff.jl) for reverse-mode AD. 18 | `ForwardDiff` is automatically imported by Turing. To utilize `Mooncake` or `ReverseDiff` for AD, users must explicitly import them with `import Mooncake` or `import ReverseDiff`, alongside the usual `using Turing`. 19 | 20 | As of Turing version v0.30, the global configuration flag for the AD backend has been removed in favour of [`AdTypes.jl`](https://github.com/SciML/ADTypes.jl), allowing users to specify the AD backend for individual samplers independently. 21 | Users can pass the `adtype` keyword argument to the sampler constructor to select the desired AD backend, with the default being `AutoForwardDiff(; chunksize=0)`. 
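For example, the following sketch shows the general pattern; the model here is a made-up toy, used purely to illustrate where the `adtype` keyword goes:

```{julia}
#| eval: false
using Turing
import ReverseDiff

@model function toy_ad_demo(x)
    m ~ Normal(0, 1)
    x ~ Normal(m, 1)
end

# The AD backend is chosen per sampler via the `adtype` keyword argument.
chain = sample(toy_ad_demo(1.0), NUTS(; adtype=AutoReverseDiff()), 1000)
```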
22 | 23 | For `ForwardDiff`, pass `adtype=AutoForwardDiff(; chunksize)` to the sampler constructor. A `chunksize` of `nothing` permits the chunk size to be automatically determined. For more information regarding the selection of `chunksize`, please refer to [related section of `ForwardDiff`'s documentation](https://juliadiff.org/ForwardDiff.jl/dev/user/advanced/#Configuring-Chunk-Size). 24 | 25 | For `ReverseDiff`, pass `adtype=AutoReverseDiff()` to the sampler constructor. An additional keyword argument called `compile` can be provided to `AutoReverseDiff`. It specifies whether to pre-record the tape only once and reuse it later (`compile` is set to `false` by default, which means no pre-recording). This can substantially improve performance, but risks silently incorrect results if not used with care. 26 | 27 | Pre-recorded tapes should only be used if you are absolutely certain that the sequence of operations performed in your code does not change between different executions of your model. 28 | 29 | Thus, e.g., in the model definition and all implicitly and explicitly called functions in the model, all loops should be of fixed size, and `if`-statements should consistently execute the same branches. 30 | For instance, `if`-statements with conditions that can be determined at compile time or conditions that depend only on fixed properties of the model, e.g. fixed data. 31 | However, `if`-statements that depend on the model parameters can take different branches during sampling; hence, the compiled tape might be incorrect. 32 | Thus you must not use compiled tapes when your model makes decisions based on the model parameters, and you should be careful if you compute functions of parameters that those functions do not have branching which might cause them to execute different code for different values of the parameter. 33 | 34 | The previously used interface functions including `ADBackend`, `setadbackend`, `setsafe`, `setchunksize`, and `setrdcache` have been removed. 35 | 36 | For `Mooncake`, pass `adtype=AutoMooncake(; config=nothing)` to the sampler constructor. 37 | 38 | ## Compositional Sampling with Differing AD Modes 39 | 40 | Turing supports intermixed automatic differentiation methods for different variable spaces. The snippet below shows using `ForwardDiff` to sample the mean (`m`) parameter, and using `ReverseDiff` for the variance (`s`) parameter: 41 | 42 | ```{julia} 43 | using Turing 44 | using ReverseDiff 45 | 46 | # Define a simple Normal model with unknown mean and variance. 47 | @model function gdemo(x, y) 48 | s² ~ InverseGamma(2, 3) 49 | m ~ Normal(0, sqrt(s²)) 50 | x ~ Normal(m, sqrt(s²)) 51 | return y ~ Normal(m, sqrt(s²)) 52 | end 53 | 54 | # Sample using Gibbs and varying autodiff backends. 55 | c = sample( 56 | gdemo(1.5, 2), 57 | Gibbs( 58 | :m => HMC(0.1, 5; adtype=AutoForwardDiff(; chunksize=0)), 59 | :s² => HMC(0.1, 5; adtype=AutoReverseDiff(false)), 60 | ), 61 | 1000, 62 | progress=false, 63 | ) 64 | ``` 65 | 66 | Generally, reverse-mode AD, for instance `ReverseDiff`, is faster when sampling from variables of high dimensionality (greater than 20), while forward-mode AD, for instance `ForwardDiff`, is more efficient for lower-dimension variables. This functionality allows those who are performance sensitive to fine tune their automatic differentiation for their specific models. 67 | 68 | If the differentiation method is not specified in this way, Turing will default to using whatever the global AD backend is. 69 | Currently, this defaults to `ForwardDiff`. 
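As a concrete sketch of selecting Mooncake (mentioned above), reusing the `gdemo` model defined earlier and assuming `Mooncake` has been imported:

```{julia}
#| eval: false
import Mooncake

chain_mooncake = sample(
    gdemo(1.5, 2),
    NUTS(; adtype=AutoMooncake(; config=nothing)),
    1000,
    progress=false,
)
```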
70 | 71 | The most reliable way to ensure you are using the fastest AD that works for your problem is to benchmark them using the functionality in DynamicPPL (see [the API documentation](https://turinglang.org/DynamicPPL.jl/stable/api/#AD-testing-and-benchmarking-utilities)): 72 | 73 | ```{julia} 74 | using DynamicPPL.TestUtils.AD: run_ad, ADResult 75 | using ForwardDiff, ReverseDiff 76 | 77 | model = gdemo(1.5, 2) 78 | 79 | for adtype in [AutoForwardDiff(), AutoReverseDiff()] 80 | result = run_ad(model, adtype; benchmark=true) 81 | @show result.time_vs_primal 82 | end 83 | ``` 84 | 85 | In this specific instance, ForwardDiff is clearly faster (due to the small size of the model). 86 | 87 | We also have a table of benchmarks for various models and AD backends in [the ADTests website](https://turinglang.org/ADTests/). 88 | These models aim to capture a variety of different Turing.jl features. 89 | If you have suggestions for things to include, please do let us know by [creating an issue on GitHub](https://github.com/TuringLang/ADTests/issues/new)! 90 | -------------------------------------------------------------------------------- /usage/custom-distribution/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: Custom Distributions 3 | engine: julia 4 | aliases: 5 | - ../../tutorials/usage-custom-distribution/index.html 6 | - ../../tutorials/docs-09-using-turing-advanced/index.html 7 | --- 8 | 9 | ```{julia} 10 | #| echo: false 11 | #| output: false 12 | using Pkg; 13 | Pkg.instantiate(); 14 | ``` 15 | 16 | `Turing.jl` supports the use of distributions from the Distributions.jl package. 17 | By extension, it also supports the use of customized distributions by defining them as subtypes of `Distribution` type of the Distributions.jl package, as well as corresponding functions. 18 | 19 | This page shows a workflow of how to define a customized distribution, using our own implementation of a simple `Uniform` distribution as a simple example. 20 | 21 | ```{julia} 22 | #| output: false 23 | using Distributions, Turing, Random, Bijectors 24 | ``` 25 | 26 | ## Define the Distribution Type 27 | 28 | First, define a type of the distribution, as a subtype of a corresponding distribution type in the Distributions.jl package. 29 | 30 | ```{julia} 31 | struct CustomUniform <: ContinuousUnivariateDistribution end 32 | ``` 33 | 34 | ## Implement Sampling and Evaluation of the log-pdf 35 | 36 | Second, implement the `rand` and `logpdf` functions for your new distribution, which will be used to run the model. 37 | 38 | ```{julia} 39 | # sample in [0, 1] 40 | Distributions.rand(rng::AbstractRNG, d::CustomUniform) = rand(rng) 41 | 42 | # p(x) = 1 → log[p(x)] = 0 43 | Distributions.logpdf(d::CustomUniform, x::Real) = zero(x) 44 | ``` 45 | 46 | ## Define Helper Functions 47 | 48 | In most cases, it may be required to define some helper functions. 49 | 50 | ### Domain Transformation 51 | 52 | Certain samplers, such as `HMC`, require the domain of the priors to be unbounded. 53 | Therefore, to use our `CustomUniform` as a prior in a model we also need to define how to transform samples from `[0, 1]` to `ℝ`. 54 | To do this, we need to define the corresponding `Bijector` from `Bijectors.jl`, which is what `Turing.jl` uses internally to deal with constrained distributions. 
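As a brief reminder of why this matters: if $b$ maps the constrained support to $\mathbb{R}$ and $y = b(x)$, then the density in the unconstrained space picks up a Jacobian correction,

$$
\log p_Y(y) = \log p_X\big(b^{-1}(y)\big) + \log \left| \det J_{b^{-1}}(y) \right|,
$$

which Bijectors.jl computes for us; this is the "correction for logpdf" referred to in the `transformed` example further down.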
55 | 56 | To transform from `[0, 1]` to `ℝ` we can use the `Logit` bijector: 57 | 58 | ```{julia} 59 | Bijectors.bijector(d::CustomUniform) = Logit(0.0, 1.0) 60 | ``` 61 | 62 | In the present example, `CustomUniform` is a subtype of `ContinuousUnivariateDistribution`. 63 | The procedure for subtypes of `ContinuousMultivariateDistribution` and `ContinuousMatrixDistribution` is exactly the same. 64 | For example, `Wishart` defines a distribution over positive-definite matrices and so `bijector` returns a `PDBijector` when called with a `Wishart` distribution as an argument. 65 | For discrete distributions, there is no need to define a bijector; the `Identity` bijector is used by default. 66 | 67 | As an alternative to the above, for `UnivariateDistribution` we could define the `minimum` and `maximum` of the distribution: 68 | 69 | ```{julia} 70 | Distributions.minimum(d::CustomUniform) = 0.0 71 | Distributions.maximum(d::CustomUniform) = 1.0 72 | ``` 73 | 74 | and `Bijectors.jl` will return a default `Bijector` called `TruncatedBijector` which makes use of `minimum` and `maximum` derive the correct transformation. 75 | 76 | Internally, Turing basically does the following when it needs to convert a constrained distribution to an unconstrained distribution, e.g. when sampling using `HMC`: 77 | 78 | ```{julia} 79 | dist = Gamma(2,3) 80 | b = bijector(dist) 81 | transformed_dist = transformed(dist, b) # results in distribution with transformed support + correction for logpdf 82 | ``` 83 | 84 | and then we can call `rand` and `logpdf` as usual, where 85 | 86 | - `rand(transformed_dist)` returns a sample in the unconstrained space, and 87 | - `logpdf(transformed_dist, y)` returns the log density of the original distribution, but with `y` living in the unconstrained space. 88 | 89 | To read more about Bijectors.jl, check out [its documentation](https://turinglang.org/Bijectors.jl/stable/). 90 | -------------------------------------------------------------------------------- /usage/dynamichmc/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: Using DynamicHMC 3 | engine: julia 4 | aliases: 5 | - ../../tutorials/docs-11-using-turing-dynamichmc/index.html 6 | --- 7 | 8 | ```{julia} 9 | #| echo: false 10 | #| output: false 11 | using Pkg; 12 | Pkg.instantiate(); 13 | ``` 14 | 15 | Turing supports the use of [DynamicHMC](https://github.com/tpapp/DynamicHMC.jl) as a sampler through the `DynamicNUTS` function. 16 | 17 | To use the `DynamicNUTS` function, you must import the `DynamicHMC` package as well as Turing. Turing does not formally require `DynamicHMC` but will include additional functionality if both packages are present. 18 | 19 | Here is a brief example: 20 | 21 | ### How to apply `DynamicNUTS`: 22 | 23 | ```{julia} 24 | # Import Turing and DynamicHMC. 25 | using DynamicHMC, Turing 26 | 27 | # Model definition. 28 | @model function gdemo(x, y) 29 | s² ~ InverseGamma(2, 3) 30 | m ~ Normal(0, sqrt(s²)) 31 | x ~ Normal(m, sqrt(s²)) 32 | return y ~ Normal(m, sqrt(s²)) 33 | end 34 | 35 | # Pull 2,000 samples using DynamicNUTS. 
36 | dynamic_nuts = externalsampler(DynamicHMC.NUTS()) 37 | chn = sample(gdemo(1.5, 2.0), dynamic_nuts, 2000, progress=false) 38 | ``` 39 | -------------------------------------------------------------------------------- /usage/external-samplers/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: Using External Samplers 3 | engine: julia 4 | aliases: 5 | - ../../tutorials/docs-16-using-turing-external-samplers/index.html 6 | --- 7 | 8 | ```{julia} 9 | #| echo: false 10 | #| output: false 11 | using Pkg; 12 | Pkg.instantiate(); 13 | ``` 14 | 15 | ## Using External Samplers on Turing Models 16 | 17 | `Turing` provides several wrapped samplers from external sampling libraries, e.g., HMC samplers from `AdvancedHMC`. 18 | These wrappers allow new users to seamlessly sample statistical models without leaving `Turing` 19 | However, these wrappers might only sometimes be complete, missing some functionality from the wrapped sampling library. 20 | Moreover, users might want to use samplers currently not wrapped within `Turing`. 21 | 22 | For these reasons, `Turing` also makes running external samplers on Turing models easy without any necessary modifications or wrapping! 23 | Throughout, we will use a 10-dimensional Neal's funnel as a running example:: 24 | 25 | ```{julia} 26 | # Import libraries. 27 | using Turing, Random, LinearAlgebra 28 | 29 | d = 10 30 | @model function funnel() 31 | θ ~ Truncated(Normal(0, 3), -3, 3) 32 | z ~ MvNormal(zeros(d - 1), exp(θ) * I) 33 | return x ~ MvNormal(z, I) 34 | end 35 | ``` 36 | 37 | Now we sample the model to generate some observations, which we can then condition on. 38 | 39 | ```{julia} 40 | (; x) = rand(funnel() | (θ=0,)) 41 | model = funnel() | (; x); 42 | ``` 43 | 44 | Users can use any sampler algorithm to sample this model if it follows the `AbstractMCMC` API. 45 | Before discussing how this is done in practice, giving a high-level description of the process is interesting. 46 | Imagine that we created an instance of an external sampler that we will call `spl` such that `typeof(spl)<:AbstractMCMC.AbstractSampler`. 47 | In order to avoid type ambiguity within Turing, at the moment it is necessary to declare `spl` as an external sampler to Turing `espl = externalsampler(spl)`, where `externalsampler(s::AbstractMCMC.AbstractSampler)` is a Turing function that types our external sampler adequately. 48 | 49 | An excellent point to start to show how this is done in practice is by looking at the sampling library `AdvancedMH` ([`AdvancedMH`'s GitHub](https://github.com/TuringLang/AdvancedMH.jl)) for Metropolis-Hastings (MH) methods. 50 | Let's say we want to use a random walk Metropolis-Hastings sampler without specifying the proposal distributions. 51 | The code below constructs an MH sampler using a multivariate Gaussian distribution with zero mean and unit variance in `d` dimensions as a random walk proposal. 52 | 53 | ```{julia} 54 | # Importing the sampling library 55 | using AdvancedMH 56 | rwmh = AdvancedMH.RWMH(d) 57 | ``` 58 | 59 | ```{julia} 60 | #| output: false 61 | setprogress!(false) 62 | ``` 63 | 64 | Sampling is then as easy as: 65 | 66 | 67 | ```{julia} 68 | chain = sample(model, externalsampler(rwmh), 10_000) 69 | ``` 70 | 71 | ## Going beyond the Turing API 72 | 73 | As previously mentioned, the Turing wrappers can often limit the capabilities of the sampling libraries they wrap. 
74 | `AdvancedHMC`[^1] ([`AdvancedHMC`'s GitHub](https://github.com/TuringLang/AdvancedHMC.jl)) is a clear example of this. A common practice when performing HMC is to provide an initial guess for the mass matrix. 75 | However, the native HMC sampler within Turing only allows the user to specify the type of the mass matrix, even though `AdvancedHMC` supports both specifying the type and providing an initial estimate. 76 | Thankfully, we can use Turing's support for external samplers to define an HMC sampler with a custom mass matrix in `AdvancedHMC` and then use it to sample our Turing model. 77 | 78 | We can use the library `Pathfinder`[^2] ([`Pathfinder`'s GitHub](https://github.com/mlcolab/Pathfinder.jl)) to construct our estimate of the mass matrix. 79 | `Pathfinder` is a variational inference algorithm that first finds the maximum a posteriori (MAP) estimate of a target posterior distribution and then uses the trace of the optimization to construct a sequence of multivariate normal approximations to the target distribution. 80 | In this process, `Pathfinder` computes an estimate of the mass matrix the user can access. 81 | You can see an example of how to use `Pathfinder` with Turing in [`Pathfinder`'s docs](https://mlcolab.github.io/Pathfinder.jl/stable/examples/turing/). 82 | 83 | ## Using new inference methods 84 | 85 | So far we have used Turing's support for external samplers to go beyond the capabilities of the wrappers. 86 | Now we want to use this support to employ a sampler that is not yet available within Turing's ecosystem. 87 | We will use the recently developed Micro-Canonical Hamiltonian Monte Carlo (MCHMC) sampler to showcase this. 88 | MCHMC[^3][^4] ([MCHMC's GitHub](https://github.com/JaimeRZP/MicroCanonicalHMC.jl)) is an HMC sampler that uses a single Hamiltonian energy level to explore the whole parameter space. 89 | This is achieved by simulating the dynamics of a microcanonical Hamiltonian with an additional noise term to ensure ergodicity. 90 | 91 | Using this as well as other inference methods outside the Turing ecosystem is as simple as executing the code shown below: 92 | 93 | ```{julia} 94 | using MicroCanonicalHMC 95 | # Create MCHMC sampler 96 | n_adapts = 1_000 # adaptation steps 97 | tev = 0.01 # target energy variance 98 | mchmc = MCHMC(n_adapts, tev; adaptive=true) 99 | 100 | # Sample 101 | chain = sample(model, externalsampler(mchmc), 10_000) 102 | ``` 103 | 104 | The only requirement to work with `externalsampler` is that the provided `sampler` must implement the AbstractMCMC.jl interface [INSERT LINK] for a `model` of type `AbstractMCMC.LogDensityModel` [INSERT LINK]. 105 | 106 | As previously stated, in order to use external sampling libraries within `Turing`, they must follow the `AbstractMCMC` API. 107 | In this section, we will briefly dwell on what this entails. 108 | First and foremost, the sampler should be a subtype of `AbstractMCMC.AbstractSampler`. 109 | Second, the stepping function of the MCMC algorithm must be defined using `AbstractMCMC.step` and follow the structure below: 110 | 111 | ```{julia} 112 | #| eval: false 113 | # First step 114 | function AbstractMCMC.step( 115 | rng::Random.AbstractRNG, 116 | model::AbstractMCMC.LogDensityModel, 117 | spl::T; 118 | kwargs..., 119 | ) where {T<:AbstractMCMC.AbstractSampler} 120 | [...]
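    # (Typically this first call would draw or accept initial parameter values,
    # evaluate the log density via `LogDensityProblems.logdensity(model.logdensity, θ)`,
    # and wrap the results in the sampler's own transition and state types; the
    # exact contents are sampler-specific, so this is only a sketch.)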
121 | return transition, sample 122 | end 123 | 124 | # N+1 step 125 | function AbstractMCMC.step( 126 | rng::Random.AbstractRNG, 127 | model::AbstractMCMC.LogDensityModel, 128 | sampler::T, 129 | state; 130 | kwargs..., 131 | ) where {T<:AbstractMCMC.AbstractSampler} 132 | [...] 133 | return transition, sample 134 | end 135 | ``` 136 | 137 | There are several characteristics to note in these functions: 138 | 139 | - There must be two `step` functions: 140 | 141 | + A function that performs the first step and initializes the sampler. 142 | + A function that performs the following steps and takes an extra input, `state`, which carries the initialization information. 143 | 144 | - The functions must follow the displayed signatures. 145 | - The output of the functions must be a transition (the current state of the sampler) and a sample (what is saved to the MCMC chain). 146 | 147 | The last requirement is that the transition must be structured with a field `θ`, which contains the values of the parameters of the model for said transition. 148 | This allows `Turing` to seamlessly extract the parameter values at each step of the chain when bundling the chains. 149 | Note that if the external sampler produces transitions that Turing cannot parse, the bundling of the samples will be different or fail. 150 | 151 | For practical examples of how to adapt a sampling library to the `AbstractMCMC` interface, readers can consult the following libraries: 152 | 153 | - [AdvancedMH](https://github.com/TuringLang/AdvancedMH.jl/blob/458a602ac32a8514a117d4c671396a9ba8acbdab/src/mh-core.jl#L73-L115) 154 | - [AdvancedHMC](https://github.com/TuringLang/AdvancedHMC.jl/blob/762e55f894d142495a41a6eba0eed9201da0a600/src/abstractmcmc.jl#L102-L170) 155 | - [MicroCanonicalHMC](https://github.com/JaimeRZP/MicroCanonicalHMC.jl/blob/master/src/abstractmcmc.jl) 156 | 157 | 158 | [^1]: Xu et al., [AdvancedHMC.jl: A robust, modular and efficient implementation of advanced HMC algorithms](http://proceedings.mlr.press/v118/xu20a/xu20a.pdf), 2019 159 | [^2]: Zhang et al., [Pathfinder: Parallel quasi-Newton variational inference](https://arxiv.org/abs/2108.03782), 2021 160 | [^3]: Robnik et al., [Microcanonical Hamiltonian Monte Carlo](https://arxiv.org/abs/2212.08549), 2022 161 | [^4]: Robnik and Seljak, [Langevin Hamiltonian Monte Carlo](https://arxiv.org/abs/2303.18221), 2023 162 | -------------------------------------------------------------------------------- /usage/mode-estimation/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: Mode Estimation 3 | engine: julia 4 | aliases: 5 | - ../../tutorials/docs-17-mode-estimation/index.html 6 | --- 7 | 8 | ```{julia} 9 | #| echo: false 10 | #| output: false 11 | using Pkg; 12 | Pkg.instantiate(); 13 | ``` 14 | 15 | After defining a statistical model, in addition to sampling from its distributions, one may be interested in finding the parameter values that maximise, for instance, the posterior probability density or the likelihood. This is called mode estimation. Turing provides support for two mode estimation techniques, [maximum likelihood estimation](https://en.wikipedia.org/wiki/Maximum_likelihood_estimation) (MLE) and [maximum a posteriori](https://en.wikipedia.org/wiki/Maximum_a_posteriori_estimation) (MAP) estimation.
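In symbols, for a model with parameters $\theta$ and data $x$, the two estimates are

$$
\hat{\theta}_{\text{MLE}} = \operatorname*{arg\,max}_{\theta}\; p(x \mid \theta),
\qquad
\hat{\theta}_{\text{MAP}} = \operatorname*{arg\,max}_{\theta}\; p(x \mid \theta)\, p(\theta),
$$

that is, MAP estimation additionally weights the likelihood by the prior.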
16 | 17 | To demonstrate mode estimation, let us load Turing and declare a model: 18 | 19 | ```{julia} 20 | using Turing 21 | 22 | @model function gdemo(x) 23 | s² ~ InverseGamma(2, 3) 24 | m ~ Normal(0, sqrt(s²)) 25 | 26 | for i in eachindex(x) 27 | x[i] ~ Normal(m, sqrt(s²)) 28 | end 29 | end 30 | ``` 31 | 32 | Once the model is defined, we can construct a model instance as we normally would: 33 | 34 | ```{julia} 35 | # Instantiate the gdemo model with our data. 36 | data = [1.5, 2.0] 37 | model = gdemo(data) 38 | ``` 39 | 40 | Finding the maximum a posteriori or maximum likelihood parameters is as simple as: 41 | 42 | ```{julia} 43 | # Generate an MLE estimate. 44 | mle_estimate = maximum_likelihood(model) 45 | 46 | # Generate a MAP estimate. 47 | map_estimate = maximum_a_posteriori(model) 48 | ``` 49 | 50 | The estimates are returned as instances of the `ModeResult` type. It has the fields `values` for the parameter values found and `lp` for the log probability at the optimum, as well as `f` for the objective function and `optim_result` for more detailed results of the optimisation procedure. 51 | 52 | ```{julia} 53 | @show mle_estimate.values 54 | @show mle_estimate.lp; 55 | ``` 56 | 57 | ## Controlling the optimisation process 58 | 59 | Under the hood `maximum_likelihood` and `maximum_a_posteriori` use the [Optimization.jl](https://github.com/SciML/Optimization.jl) package, which provides a unified interface to many other optimisation packages. By default Turing typically uses the [LBFGS](https://en.wikipedia.org/wiki/Limited-memory_BFGS) method from [Optim.jl](https://github.com/JuliaNLSolvers/Optim.jl) to find the mode estimate, but we can easily change that: 60 | 61 | ```{julia} 62 | using OptimizationOptimJL: NelderMead 63 | @show maximum_likelihood(model, NelderMead()) 64 | 65 | using OptimizationNLopt: NLopt.LD_TNEWTON_PRECOND_RESTART 66 | @show maximum_likelihood(model, LD_TNEWTON_PRECOND_RESTART()); 67 | ``` 68 | 69 | The above are just two examples; Optimization.jl supports [many more](https://docs.sciml.ai/Optimization/stable/). 70 | 71 | We can also help the optimisation by giving it a starting point we know is close to the final solution, or by specifying an automatic differentiation method: 72 | 73 | ```{julia} 74 | using ADTypes: AutoReverseDiff 75 | import ReverseDiff 76 | maximum_likelihood( 77 | model, NelderMead(); initial_params=[0.1, 2], adtype=AutoReverseDiff() 78 | ) 79 | ``` 80 | 81 | When providing values to arguments like `initial_params`, the parameters are typically specified in the order in which they appear in the code of the model, so in this case first `s²` then `m`. More precisely it's the order returned by `Turing.Inference.getparams(model, Turing.VarInfo(model))`. 82 | 83 | We can also do constrained optimisation, by providing either intervals within which the parameters must stay, or constraint functions that they need to respect. For instance, here's how one can find the MLE with the constraint that the variance must be less than 0.01 and the mean must be between -1 and 1: 84 | 85 | ```{julia} 86 | maximum_likelihood(model; lb=[0.0, -1.0], ub=[0.01, 1.0]) 87 | ``` 88 | 89 | The arguments for lower (`lb`) and upper (`ub`) bounds follow the arguments of `Optimization.OptimizationProblem`, as do other parameters for providing [constraints](https://docs.sciml.ai/Optimization/stable/tutorials/constraints/), such as `cons`. Any extraneous keyword arguments given to `maximum_likelihood` or `maximum_a_posteriori` are passed to `Optimization.solve`.
Some often useful ones are `maxiters` for controlling the maximum number of iterations and `abstol` and `reltol` for the absolute and relative convergence tolerances: 90 | 91 | ```{julia} 92 | badly_converged_mle = maximum_likelihood( 93 | model, NelderMead(); maxiters=10, reltol=1e-9 94 | ) 95 | ``` 96 | 97 | We can check whether the optimisation converged using the `optim_result` field of the result: 98 | 99 | ```{julia} 100 | @show badly_converged_mle.optim_result; 101 | ``` 102 | 103 | For more details, such as a full list of possible arguments, we encourage the reader to read the docstring of the function `Turing.Optimisation.estimate_mode`, which is what `maximum_likelihood` and `maximum_a_posteriori` call, and the documentation of [Optimization.jl](https://docs.sciml.ai/Optimization/stable/). 104 | 105 | ## Analyzing your mode estimate 106 | 107 | Turing extends several methods from `StatsBase` that can be used to analyze your mode estimation results. Methods implemented include `vcov`, `informationmatrix`, `coeftable`, `params`, and `coef`, among others. 108 | 109 | For example, let's examine our ML estimate from above using `coeftable`: 110 | 111 | ```{julia} 112 | using StatsBase: coeftable 113 | coeftable(mle_estimate) 114 | ``` 115 | 116 | Standard errors are calculated from the Fisher information matrix (inverse Hessian of the log likelihood or log joint). Note that standard errors calculated in this way may not always be appropriate for MAP estimates, so please be cautious in interpreting them. 117 | 118 | ## Sampling with the MAP/MLE as initial states 119 | 120 | You can begin sampling your chain from an MLE/MAP estimate by extracting the vector of parameter values and providing it to the `sample` function with the keyword `initial_params`. For example, here is how to sample from the full posterior using the MAP estimate as the starting point: 121 | 122 | ```{julia} 123 | #| eval: false 124 | map_estimate = maximum_a_posteriori(model) 125 | chain = sample(model, NUTS(), 1_000; initial_params=map_estimate.values.array) 126 | ``` 127 | -------------------------------------------------------------------------------- /usage/modifying-logprob/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: Modifying the Log Probability 3 | engine: julia 4 | aliases: 5 | - ../../tutorials/usage-modifying-logprob/index.html 6 | --- 7 | 8 | ```{julia} 9 | #| echo: false 10 | #| output: false 11 | using Pkg; 12 | Pkg.instantiate(); 13 | ``` 14 | 15 | Turing accumulates log probabilities internally in an internal data structure that is accessible through the internal variable `__varinfo__` inside of the model definition. 16 | To avoid users having to deal with internal data structures, Turing provides the `Turing.@addlogprob!` macro which increases the accumulated log probability. 17 | For instance, this allows you to 18 | [include arbitrary terms in the likelihood](https://github.com/TuringLang/Turing.jl/issues/1332) 19 | 20 | ```{julia} 21 | using Turing 22 | 23 | myloglikelihood(x, μ) = loglikelihood(Normal(μ, 1), x) 24 | 25 | @model function demo(x) 26 | μ ~ Normal() 27 | Turing.@addlogprob! myloglikelihood(x, μ) 28 | end 29 | ``` 30 | 31 | and to force a sampler to [reject a sample](https://github.com/TuringLang/Turing.jl/issues/1328): 32 | 33 | ```{julia} 34 | using Turing 35 | using LinearAlgebra 36 | 37 | @model function demo(x) 38 | m ~ MvNormal(zero(x), I) 39 | if dot(m, x) < 0 40 | Turing.@addlogprob! 
-Inf 41 | # Exit the model evaluation early 42 | return nothing 43 | end 44 | 45 | x ~ MvNormal(m, I) 46 | return nothing 47 | end 48 | ``` 49 | 50 | Note that `@addlogprob!` always increases the accumulated log probability, regardless of the provided 51 | sampling context. 52 | For instance, if you do not want to apply `Turing.@addlogprob!` when evaluating the prior of your model but only when computing the log likelihood and the log joint probability, then you should [check the type of the internal variable `__context__`](https://github.com/TuringLang/DynamicPPL.jl/issues/154), as in the following example: 53 | 54 | ```{julia} 55 | #| eval: false 56 | if DynamicPPL.leafcontext(__context__) !== Turing.PriorContext() 57 | Turing.@addlogprob! myloglikelihood(x, μ) 58 | end 59 | ``` 60 | -------------------------------------------------------------------------------- /usage/performance-tips/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: Performance Tips 3 | engine: julia 4 | aliases: 5 | - ../../tutorials/docs-13-using-turing-performance-tips/index.html 6 | --- 7 | 8 | ```{julia} 9 | #| echo: false 10 | #| output: false 11 | using Pkg; 12 | Pkg.instantiate(); 13 | ``` 14 | 15 | This section briefly summarises a few common techniques to ensure good performance when using Turing. 16 | We refer to [the Julia documentation](https://docs.julialang.org/en/v1/manual/performance-tips/index.html) for general techniques to ensure good performance of Julia programs. 17 | 18 | ## Use multivariate distributions 19 | 20 | It is generally preferable to use multivariate distributions if possible. 21 | 22 | The following example: 23 | 24 | ```{julia} 25 | using Turing 26 | @model function gmodel(x) 27 | m ~ Normal() 28 | for i in 1:length(x) 29 | x[i] ~ Normal(m, 0.2) 30 | end 31 | end 32 | ``` 33 | 34 | can be directly expressed more efficiently using a simple transformation: 35 | 36 | ```{julia} 37 | using FillArrays 38 | 39 | @model function gmodel(x) 40 | m ~ Normal() 41 | return x ~ MvNormal(Fill(m, length(x)), 0.04 * I) 42 | end 43 | ``` 44 | 45 | ## Choose your AD backend 46 | 47 | Automatic differentiation (AD) makes it possible to use modern, efficient gradient-based samplers like NUTS and HMC, and that means a good AD system is incredibly important. Turing currently 48 | supports several AD backends, including [ForwardDiff](https://github.com/JuliaDiff/ForwardDiff.jl) (the default), 49 | [Mooncake](https://github.com/compintell/Mooncake.jl), 50 | [Zygote](https://github.com/FluxML/Zygote.jl), and 51 | [ReverseDiff](https://github.com/JuliaDiff/ReverseDiff.jl). 52 | 53 | For many common types of models, the default ForwardDiff backend performs well, and there is no need to worry about changing it. However, if you need more speed, you can try 54 | different backends via the standard [ADTypes](https://github.com/SciML/ADTypes.jl) interface by passing an `AbstractADType` to the sampler with the optional `adtype` argument, e.g. 55 | `NUTS(adtype = AutoZygote())`. See [Automatic Differentiation]({{}}) for details. Generally, `adtype = AutoForwardDiff()` is likely to be the fastest and most reliable for models with 56 | few parameters (say, less than 20 or so), while reverse-mode backends such as `AutoZygote()` or `AutoReverseDiff()` will perform better for models with many parameters or linear algebra 57 | operations. If in doubt, it's easy to try a few different backends to see how they compare.
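If you want to compare backends on your own model, a quick (if rough) check is to time a short sampling run with each one. This is a minimal sketch; the data, chain length, and sampler settings are illustrative only, and a single short run also includes compilation time (the benchmarking utilities described on the Automatic Differentiation page give more careful measurements):

```{julia}
#| eval: false
import ReverseDiff

model = gmodel(randn(100))

for adtype in (AutoForwardDiff(), AutoReverseDiff())
    # A short chain is usually enough to reveal large performance differences.
    t = @elapsed sample(model, NUTS(; adtype=adtype), 200; progress=false)
    println(adtype, ": ", round(t; digits=2), " s")
end
```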
58 | 59 | ### Special care for Zygote 60 | 61 | Note that Zygote will not perform well if your model contains `for`-loops, due to the way reverse-mode AD is implemented in these packages. Zygote also cannot differentiate code 62 | that contains mutating operations. If you can't implement your model without `for`-loops or mutation, `ReverseDiff` will be a better, more performant option. In general, though, 63 | vectorized operations are still likely to perform best. 64 | 65 | Avoiding loops can be done using `filldist(dist, N)` and `arraydist(dists)`. `filldist(dist, N)` creates a multivariate distribution that is composed of `N` identical and independent 66 | copies of the univariate distribution `dist` if `dist` is univariate, or it creates a matrix-variate distribution composed of `N` identical and independent copies of the multivariate 67 | distribution `dist` if `dist` is multivariate. `filldist(dist, N, M)` can also be used to create a matrix-variate distribution from a univariate distribution `dist`. `arraydist(dists)` 68 | is similar to `filldist` but it takes an array of distributions `dists` as input. Writing a [custom distribution](advanced) with a custom adjoint is another option to avoid loops. 69 | 70 | ### Special care for ReverseDiff with a compiled tape 71 | 72 | For large models, the fastest option is often ReverseDiff with a compiled tape, specified as `adtype=AutoReverseDiff(true)`. However, it is important to note that if your model contains any 73 | branching code, such as `if`-`else` statements, **the gradients from a compiled tape may be inaccurate, leading to erroneous results**. If you use this option for the (considerable) speedup it 74 | can provide, make sure to check your code. It's also a good idea to verify your gradients with another backend. 75 | 76 | ## Ensure that types in your model can be inferred 77 | 78 | For efficient gradient-based inference, e.g. using HMC, NUTS or ADVI, it is important to ensure the types in your model can be inferred. 79 | 80 | The following example with abstract types 81 | 82 | ```{julia} 83 | @model function tmodel(x, y) 84 | p, n = size(x) 85 | params = Vector{Real}(undef, n) 86 | for i in 1:n 87 | params[i] ~ truncated(Normal(); lower=0) 88 | end 89 | 90 | a = x * params 91 | return y ~ MvNormal(a, I) 92 | end 93 | ``` 94 | 95 | can be transformed into the following representation with concrete types: 96 | 97 | ```{julia} 98 | @model function tmodel(x, y, ::Type{T}=Float64) where {T} 99 | p, n = size(x) 100 | params = Vector{T}(undef, n) 101 | for i in 1:n 102 | params[i] ~ truncated(Normal(); lower=0) 103 | end 104 | 105 | a = x * params 106 | return y ~ MvNormal(a, I) 107 | end 108 | ``` 109 | 110 | Alternatively, you could use `filldist` in this example: 111 | 112 | ```{julia} 113 | @model function tmodel(x, y) 114 | params ~ filldist(truncated(Normal(); lower=0), size(x, 2)) 115 | a = x * params 116 | return y ~ MvNormal(a, I) 117 | end 118 | ``` 119 | 120 | Note that you can use `@code_warntype` to find types in your model definition that the compiler cannot infer. 121 | They are marked in red in the Julia REPL. 
122 | 123 | For example, consider the following simple program: 124 | 125 | ```{julia} 126 | @model function tmodel(x) 127 | p = Vector{Real}(undef, 1) 128 | p[1] ~ Normal() 129 | p = p .+ 1 130 | return x ~ Normal(p[1]) 131 | end 132 | ``` 133 | 134 | We can use 135 | 136 | ```{julia} 137 | #| eval: false 138 | using Random 139 | 140 | model = tmodel(1.0) 141 | 142 | @code_warntype model.f( 143 | model, 144 | Turing.VarInfo(model), 145 | Turing.SamplingContext( 146 | Random.default_rng(), Turing.SampleFromPrior(), Turing.DefaultContext() 147 | ), 148 | model.args..., 149 | ) 150 | ``` 151 | 152 | to inspect type inference in the model. 153 | -------------------------------------------------------------------------------- /usage/probability-interface/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: Querying Model Probabilities 3 | engine: julia 4 | aliases: 5 | - ../../tutorials/usage-probability-interface/index.html 6 | --- 7 | 8 | ```{julia} 9 | #| echo: false 10 | #| output: false 11 | using Pkg; 12 | Pkg.instantiate(); 13 | ``` 14 | 15 | The easiest way to manipulate and query Turing models is via the DynamicPPL probability interface. 16 | 17 | Let's use a simple model of normally-distributed data as an example. 18 | 19 | ```{julia} 20 | using Turing 21 | using DynamicPPL 22 | using Random 23 | 24 | @model function gdemo(n) 25 | μ ~ Normal(0, 1) 26 | x ~ MvNormal(fill(μ, n), I) 27 | end 28 | ``` 29 | 30 | We generate some data using `μ = 0`: 31 | 32 | ```{julia} 33 | Random.seed!(1776) 34 | dataset = randn(100) 35 | dataset[1:5] 36 | ``` 37 | 38 | ## Conditioning and Deconditioning 39 | 40 | Bayesian models can be transformed with two main operations, conditioning and deconditioning (also known as marginalization). 41 | Conditioning takes a variable and fixes its value as known. 42 | We do this by passing a model and a collection of conditioned variables to `|`, or its alias, `condition`: 43 | 44 | ```{julia} 45 | # (equivalently) 46 | # conditioned_model = condition(gdemo(length(dataset)), (x=dataset, μ=0)) 47 | conditioned_model = gdemo(length(dataset)) | (x=dataset, μ=0) 48 | ``` 49 | 50 | This operation can be reversed by applying `decondition`: 51 | 52 | ```{julia} 53 | original_model = decondition(conditioned_model) 54 | ``` 55 | 56 | We can also decondition only some of the variables: 57 | 58 | ```{julia} 59 | partially_conditioned = decondition(conditioned_model, :μ) 60 | ``` 61 | 62 | We can see which of the variables in a model have been conditioned with `DynamicPPL.conditioned`: 63 | 64 | ```{julia} 65 | DynamicPPL.conditioned(partially_conditioned) 66 | ``` 67 | 68 | ::: {.callout-note} 69 | Sometimes it is helpful to define convenience functions for conditioning on some variable(s). 70 | For instance, in this example we might want to define a version of `gdemo` that conditions on some observations of `x`: 71 | 72 | ```julia 73 | gdemo(x::AbstractVector{<:Real}) = gdemo(length(x)) | (; x) 74 | ``` 75 | 76 | For illustrative purposes, however, we do not use this function in the examples below. 77 | ::: 78 | 79 | ## Probabilities and Densities 80 | 81 | We often want to calculate the (unnormalized) probability density for an event. 82 | This probability might be a prior, a likelihood, or a posterior (joint) density. 83 | DynamicPPL provides convenient functions for this. 84 | To begin, let's define a model `gdemo`, condition it on a dataset, and draw a sample. 
85 | The returned sample only contains `μ`, since the value of `x` has already been fixed: 86 | 87 | ```{julia} 88 | model = gdemo(length(dataset)) | (x=dataset,) 89 | 90 | Random.seed!(124) 91 | sample = rand(model) 92 | ``` 93 | 94 | We can then calculate the joint probability of a set of samples (here drawn from the prior) with `logjoint`. 95 | 96 | ```{julia} 97 | logjoint(model, sample) 98 | ``` 99 | 100 | For models with many variables `rand(model)` can be prohibitively slow since it returns a `NamedTuple` of samples from the prior distribution of the unconditioned variables. 101 | We recommend working with samples of type `DataStructures.OrderedDict` in this case (which Turing re-exports, so can be used directly): 102 | 103 | ```{julia} 104 | Random.seed!(124) 105 | sample_dict = rand(OrderedDict, model) 106 | ``` 107 | 108 | `logjoint` can also be used on this sample: 109 | 110 | ```{julia} 111 | logjoint(model, sample_dict) 112 | ``` 113 | 114 | The prior probability and the likelihood of a set of samples can be calculated with the functions `logprior` and `loglikelihood` respectively. 115 | The log joint probability is the sum of these two quantities: 116 | 117 | ```{julia} 118 | logjoint(model, sample) ≈ loglikelihood(model, sample) + logprior(model, sample) 119 | ``` 120 | 121 | ```{julia} 122 | logjoint(model, sample_dict) ≈ loglikelihood(model, sample_dict) + logprior(model, sample_dict) 123 | ``` 124 | 125 | ## Example: Cross-validation 126 | 127 | To give an example of the probability interface in use, we can use it to estimate the performance of our model using cross-validation. 128 | In cross-validation, we split the dataset into several equal parts. 129 | Then, we choose one of these sets to serve as the validation set. 130 | Here, we measure fit using the cross entropy (Bayes loss).[^1] 131 | (For the sake of simplicity, in the following code, we enforce that `nfolds` must divide the number of data points. 132 | For a more competent implementation, see [MLUtils.jl](https://juliaml.github.io/MLUtils.jl/dev/api/#MLUtils.kfolds).) 133 | 134 | ```{julia} 135 | # Calculate the train/validation splits across `nfolds` partitions, assume `length(dataset)` divides `nfolds` 136 | function kfolds(dataset::Array{<:Real}, nfolds::Int) 137 | fold_size, remaining = divrem(length(dataset), nfolds) 138 | if remaining != 0 139 | error("The number of folds must divide the number of data points.") 140 | end 141 | first_idx = firstindex(dataset) 142 | last_idx = lastindex(dataset) 143 | splits = map(0:(nfolds - 1)) do i 144 | start_idx = first_idx + i * fold_size 145 | end_idx = start_idx + fold_size 146 | train_set_indices = [first_idx:(start_idx - 1); end_idx:last_idx] 147 | return (view(dataset, train_set_indices), view(dataset, start_idx:(end_idx - 1))) 148 | end 149 | return splits 150 | end 151 | 152 | function cross_val( 153 | dataset::Vector{<:Real}; 154 | nfolds::Int=5, 155 | nsamples::Int=1_000, 156 | rng::Random.AbstractRNG=Random.default_rng(), 157 | ) 158 | # Initialize `loss` in a way such that the loop below does not change its type 159 | model = gdemo(1) | (x=[first(dataset)],) 160 | loss = zero(logjoint(model, rand(rng, model))) 161 | 162 | for (train, validation) in kfolds(dataset, nfolds) 163 | # First, we train the model on the training set, i.e., we obtain samples from the posterior. 164 | # For normally-distributed data, the posterior can be computed in closed form. 165 | # For general models, however, typically samples will be generated using MCMC with Turing. 
166 | posterior = Normal(mean(train), 1) 167 | samples = rand(rng, posterior, nsamples) 168 | 169 | # Evaluation on the validation set. 170 | validation_model = gdemo(length(validation)) | (x=validation,) 171 | loss += sum(samples) do sample 172 | logjoint(validation_model, (μ=sample,)) 173 | end 174 | end 175 | 176 | return loss 177 | end 178 | 179 | cross_val(dataset) 180 | ``` 181 | 182 | [^1]: See [ParetoSmooth.jl](https://github.com/TuringLang/ParetoSmooth.jl) for a faster and more accurate implementation of cross-validation than the one provided here. 183 | -------------------------------------------------------------------------------- /usage/sampler-visualisation/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: Sampler Visualization 3 | engine: julia 4 | aliases: 5 | - ../../tutorials/docs-15-using-turing-sampler-viz/index.html 6 | --- 7 | 8 | ```{julia} 9 | #| echo: false 10 | #| output: false 11 | using Pkg; 12 | Pkg.instantiate(); 13 | ``` 14 | 15 | ## Introduction 16 | 17 | ### The Code 18 | 19 | For each sampler, we will use the same code to plot sampler paths. The block below loads the relevant libraries and defines a function for plotting the sampler's trajectory across the posterior. 20 | 21 | The Turing model definition used here is not especially practical, but it is designed in such a way as to produce visually interesting posterior surfaces to show how different samplers move along the distribution. 22 | 23 | ```{julia} 24 | ENV["GKS_ENCODING"] = "utf-8" # Allows the use of unicode characters in Plots.jl 25 | using Plots 26 | using StatsPlots 27 | using Turing 28 | using Random 29 | using Bijectors 30 | 31 | # Set a seed. 32 | Random.seed!(0) 33 | 34 | # Define a strange model. 35 | @model function gdemo(x) 36 | s² ~ InverseGamma(2, 3) 37 | m ~ Normal(0, sqrt(s²)) 38 | bumps = sin(m) + cos(m) 39 | m = m + 5 * bumps 40 | for i in eachindex(x) 41 | x[i] ~ Normal(m, sqrt(s²)) 42 | end 43 | return s², m 44 | end 45 | 46 | # Define our data points. 47 | x = [1.5, 2.0, 13.0, 2.1, 0.0] 48 | 49 | # Set up the model call, sample from the prior. 50 | model = gdemo(x) 51 | 52 | # Evaluate surface at coordinates. 53 | evaluate(m1, m2) = logjoint(model, (m=m2, s²=invlink.(Ref(InverseGamma(2, 3)), m1))) 54 | 55 | function plot_sampler(chain; label="") 56 | # Extract values from chain. 57 | val = get(chain, [:s², :m, :lp]) 58 | ss = link.(Ref(InverseGamma(2, 3)), val.s²) 59 | ms = val.m 60 | lps = val.lp 61 | 62 | # How many surface points to sample. 63 | granularity = 100 64 | 65 | # Range start/stop points. 66 | spread = 0.5 67 | σ_start = minimum(ss) - spread * std(ss) 68 | σ_stop = maximum(ss) + spread * std(ss) 69 | μ_start = minimum(ms) - spread * std(ms) 70 | μ_stop = maximum(ms) + spread * std(ms) 71 | σ_rng = collect(range(σ_start; stop=σ_stop, length=granularity)) 72 | μ_rng = collect(range(μ_start; stop=μ_stop, length=granularity)) 73 | 74 | # Make surface plot. 
75 | p = surface( 76 | σ_rng, 77 | μ_rng, 78 | evaluate; 79 | camera=(30, 65), 80 | # ticks=nothing, 81 | colorbar=false, 82 | color=:inferno, 83 | title=label, 84 | ) 85 | 86 | line_range = 1:length(ms) 87 | 88 | scatter3d!( 89 | ss[line_range], 90 | ms[line_range], 91 | lps[line_range]; 92 | mc=:viridis, 93 | marker_z=collect(line_range), 94 | msw=0, 95 | legend=false, 96 | colorbar=false, 97 | alpha=0.5, 98 | xlabel="σ", 99 | ylabel="μ", 100 | zlabel="Log probability", 101 | title=label, 102 | ) 103 | 104 | return p 105 | end; 106 | ``` 107 | 108 | ```{julia} 109 | #| output: false 110 | setprogress!(false) 111 | ``` 112 | 113 | ## Samplers 114 | 115 | ### Gibbs 116 | 117 | Gibbs sampling tends to exhibit a "jittery" trajectory. The example below combines `HMC` and `PG` sampling to traverse the posterior. 118 | 119 | ```{julia} 120 | c = sample(model, Gibbs(:s² => HMC(0.01, 5), :m => PG(20)), 1000) 121 | plot_sampler(c) 122 | ``` 123 | 124 | ### HMC 125 | 126 | Hamiltonian Monte Carlo (HMC) sampling is a typical sampler to use, as it tends to be fairly good at converging in a efficient manner. It can often be tricky to set the correct parameters for this sampler however, and the `NUTS` sampler is often easier to run if you don't want to spend too much time fiddling with step size and and the number of steps to take. Note however that `HMC` does not explore the positive values μ very well, likely due to the leapfrog and step size parameter settings. 127 | 128 | ```{julia} 129 | c = sample(model, HMC(0.01, 10), 1000) 130 | plot_sampler(c) 131 | ``` 132 | 133 | ### HMCDA 134 | 135 | The HMCDA sampler is an implementation of the Hamiltonian Monte Carlo with Dual Averaging algorithm found in the paper "The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo" by Hoffman and Gelman (2011). The paper can be found on [arXiv](https://arxiv.org/abs/1111.4246) for the interested reader. 136 | 137 | ```{julia} 138 | c = sample(model, HMCDA(200, 0.65, 0.3), 1000) 139 | plot_sampler(c) 140 | ``` 141 | 142 | ### MH 143 | 144 | Metropolis-Hastings (MH) sampling is one of the earliest Markov Chain Monte Carlo methods. MH sampling does not "move" a lot, unlike many of the other samplers implemented in Turing. Typically a much longer chain is required to converge to an appropriate parameter estimate. 145 | 146 | The plot below only uses 1,000 iterations of Metropolis-Hastings. 147 | 148 | ```{julia} 149 | c = sample(model, MH(), 1000) 150 | plot_sampler(c) 151 | ``` 152 | 153 | As you can see, the MH sampler doesn't move parameter estimates very often. 154 | 155 | ### NUTS 156 | 157 | The No U-Turn Sampler (NUTS) is an implementation of the algorithm found in the paper "The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo" by Hoffman and Gelman (2011). The paper can be found on [arXiv](https://arxiv.org/abs/1111.4246) for the interested reader. 158 | 159 | NUTS tends to be very good at traversing complex posteriors quickly. 160 | 161 | 162 | ```{julia} 163 | c = sample(model, NUTS(0.65), 1000) 164 | plot_sampler(c) 165 | ``` 166 | 167 | The only parameter that needs to be set other than the number of iterations to run is the target acceptance rate. In the Hoffman and Gelman paper, they note that a target acceptance rate of 0.65 is typical. 168 | 169 | Here is a plot showing a very high acceptance rate. Note that it appears to "stick" to a mode and is not particularly good at exploring the posterior as compared to the 0.65 target acceptance ratio case. 
170 | 171 | ```{julia} 172 | c = sample(model, NUTS(0.95), 1000) 173 | plot_sampler(c) 174 | ``` 175 | 176 | An exceptionally low acceptance rate will show very few moves on the posterior: 177 | 178 | ```{julia} 179 | c = sample(model, NUTS(0.2), 1000) 180 | plot_sampler(c) 181 | ``` 182 | 183 | ### PG 184 | 185 | The Particle Gibbs (PG) sampler is an implementation of an algorithm from the paper "Particle Markov chain Monte Carlo methods" by Andrieu, Doucet, and Holenstein (2010). The interested reader can learn more [here](https://rss.onlinelibrary.wiley.com/doi/full/10.1111/j.1467-9868.2009.00736.x). 186 | 187 | The two parameters are the number of particles, and the number of iterations. The plot below shows the use of 20 particles. 188 | 189 | ```{julia} 190 | c = sample(model, PG(20), 1000) 191 | plot_sampler(c) 192 | ``` 193 | 194 | Next, we plot using 50 particles. 195 | 196 | ```{julia} 197 | c = sample(model, PG(50), 1000) 198 | plot_sampler(c) 199 | ``` 200 | -------------------------------------------------------------------------------- /usage/tracking-extra-quantities/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: Tracking Extra Quantities 3 | engine: julia 4 | aliases: 5 | - ../../tutorials/usage-generated-quantities/index.html 6 | - ../generated-quantities/index.html 7 | --- 8 | 9 | ```{julia} 10 | #| echo: false 11 | #| output: false 12 | using Pkg; 13 | Pkg.instantiate(); 14 | ``` 15 | 16 | Often, there are quantities in models that we might be interested in viewing the values of, but which are not random variables in the model that are explicitly drawn from a distribution. 17 | 18 | As a motivating example, the most natural parameterization for a model might not be the most computationally feasible. 19 | Consider the following (efficiently reparametrized) implementation of Neal's funnel [(Neal, 2003)](https://arxiv.org/abs/physics/0009028): 20 | 21 | ```{julia} 22 | using Turing 23 | setprogress!(false) 24 | 25 | @model function Neal() 26 | # Raw draws 27 | y_raw ~ Normal(0, 1) 28 | x_raw ~ arraydist([Normal(0, 1) for i in 1:9]) 29 | 30 | # Transform: 31 | y = 3 * y_raw 32 | x = exp.(y ./ 2) .* x_raw 33 | return nothing 34 | end 35 | ``` 36 | 37 | In this case, the random variables exposed in the chain (`x_raw`, `y_raw`) are not in a helpful form — what we're after are the deterministically transformed variables `x` and `y`. 38 | 39 | There are two ways to track these extra quantities in Turing.jl. 40 | 41 | ## Using `:=` (during inference) 42 | 43 | The first way is to use the `:=` operator, which behaves exactly like `=` except that the values of the variables on its left-hand side are automatically added to the chain returned by the sampler. 
44 | For example: 45 | 46 | ```{julia} 47 | @model function Neal_coloneq() 48 | # Raw draws 49 | y_raw ~ Normal(0, 1) 50 | x_raw ~ arraydist([Normal(0, 1) for i in 1:9]) 51 | 52 | # Transform: 53 | y := 3 * y_raw 54 | x := exp.(y ./ 2) .* x_raw 55 | end 56 | 57 | sample(Neal_coloneq(), NUTS(), 1000) 58 | ``` 59 | 60 | ## Using `returned` (post-inference) 61 | 62 | Alternatively, one can specify the extra quantities as part of the model function's return statement: 63 | 64 | ```{julia} 65 | @model function Neal_return() 66 | # Raw draws 67 | y_raw ~ Normal(0, 1) 68 | x_raw ~ arraydist([Normal(0, 1) for i in 1:9]) 69 | 70 | # Transform and return as a NamedTuple 71 | y = 3 * y_raw 72 | x = exp.(y ./ 2) .* x_raw 73 | return (x=x, y=y) 74 | end 75 | 76 | chain = sample(Neal_return(), NUTS(), 1000) 77 | ``` 78 | 79 | The sampled chain does not contain `x` and `y`, but we can extract the values using the `returned` function. 80 | Calling this function outputs an array: 81 | 82 | ```{julia} 83 | nts = returned(Neal_return(), chain) 84 | ``` 85 | 86 | where each element of which is a NamedTuple, as specified in the return statement of the model. 87 | 88 | ```{julia} 89 | nts[1] 90 | ``` 91 | 92 | ## Which to use? 93 | 94 | There are some pros and cons of using `returned`, as opposed to `:=`. 95 | 96 | Firstly, `returned` is more flexible, as it allows you to track any type of object; `:=` only works with variables that can be inserted into an `MCMCChains.Chains` object. 97 | (Notice that `x` is a vector, and in the first case where we used `:=`, reconstructing the vector value of `x` can also be rather annoying as the chain stores each individual element of `x` separately.) 98 | 99 | A drawback is that naively using `returned` can lead to unnecessary computation during inference. 100 | This is because during the sampling process, the return values are also calculated (since they are part of the model function), but then thrown away. 101 | So, if the extra quantities are expensive to compute, this can be a problem. 102 | 103 | To avoid this, you will essentially have to create two different models, one for inference and one for post-inference. 
104 | The simplest way of doing this is to add a parameter to the model argument: 105 | 106 | ```{julia} 107 | @model function Neal_coloneq_optional(track::Bool) 108 | # Raw draws 109 | y_raw ~ Normal(0, 1) 110 | x_raw ~ arraydist([Normal(0, 1) for i in 1:9]) 111 | 112 | if track 113 | y = 3 * y_raw 114 | x = exp.(y ./ 2) .* x_raw 115 | return (x=x, y=y) 116 | else 117 | return nothing 118 | end 119 | end 120 | 121 | chain = sample(Neal_coloneq_optional(false), NUTS(), 1000) 122 | ``` 123 | 124 | The above ensures that `x` and `y` are not calculated during inference, but allows us to still use `returned` to extract them: 125 | 126 | ```{julia} 127 | returned(Neal_coloneq_optional(true), chain) 128 | ``` 129 | 130 | Another equivalent option is to use a submodel: 131 | 132 | ```{julia} 133 | @model function Neal() 134 | y_raw ~ Normal(0, 1) 135 | x_raw ~ arraydist([Normal(0, 1) for i in 1:9]) 136 | return (x_raw=x_raw, y_raw=y_raw) 137 | end 138 | 139 | chain = sample(Neal(), NUTS(), 1000) 140 | 141 | @model function Neal_with_extras() 142 | neal ~ to_submodel(Neal(), false) 143 | y = 3 * neal.y_raw 144 | x = exp.(y ./ 2) .* neal.x_raw 145 | return (x=x, y=y) 146 | end 147 | 148 | returned(Neal_with_extras(), chain) 149 | ``` 150 | 151 | Note that for the `returned` call to work, the `Neal_with_extras()` model must have the same variable names as stored in `chain`. 152 | This means the submodel `Neal()` must not be prefixed, i.e. `to_submodel()` must be passed a second parameter `false`. 153 | -------------------------------------------------------------------------------- /usage/troubleshooting/index.qmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: Troubleshooting 3 | engine: julia 4 | --- 5 | 6 | ```{julia} 7 | #| echo: false 8 | #| output: false 9 | using Pkg; 10 | Pkg.instantiate(); 11 | ``` 12 | 13 | This page collects a number of common error messages observed when using Turing, along with suggestions on how to fix them. 14 | 15 | If the suggestions here do not resolve your problem, please do feel free to [open an issue](https://github.com/TuringLang/Turing.jl/issues). 16 | 17 | ```{julia} 18 | using Turing 19 | Turing.setprogress!(false) 20 | ``` 21 | 22 | ## Initial parameters 23 | 24 | > failed to find valid initial parameters in {N} tries. This may indicate an error with the model or AD backend... 25 | 26 | This error is seen when a Hamiltonian Monte Carlo sampler is unable to determine a valid set of initial parameters for the sampling. 27 | Here, 'valid' means that the log probability density of the model, as well as its gradient with respect to each parameter, is finite and not `NaN`. 28 | 29 | ### `NaN` gradient 30 | 31 | One of the most common causes of this error is having a `NaN` gradient. 32 | To find out whether this is happening, you can evaluate the gradient manually. 33 | Here is an example with a model that is known to be problematic: 34 | 35 | ```{julia} 36 | using Turing 37 | using DynamicPPL.TestUtils.AD: run_ad 38 | 39 | @model function initial_bad() 40 | a ~ Normal() 41 | x ~ truncated(Normal(a), 0, Inf) 42 | end 43 | 44 | model = initial_bad() 45 | adtype = AutoForwardDiff() 46 | result = run_ad(model, adtype; test=false, benchmark=false) 47 | result.grad_actual 48 | ``` 49 | 50 | (See [the DynamicPPL docs](https://turinglang.org/DynamicPPL.jl/stable/api/#AD-testing-and-benchmarking-utilities) for more details on the `run_ad` function and its return type.) 
51 | 52 | In this case, the `NaN` gradient is caused by the `Inf` argument to `truncated`. 53 | (See, e.g., [this issue on Distributions.jl](https://github.com/JuliaStats/Distributions.jl/issues/1910).) 54 | Here, the upper bound of `Inf` is not needed, so it can be removed: 55 | 56 | ```{julia} 57 | @model function initial_good() 58 | a ~ Normal() 59 | x ~ truncated(Normal(a); lower=0) 60 | end 61 | 62 | model = initial_good() 63 | adtype = AutoForwardDiff() 64 | run_ad(model, adtype; test=false, benchmark=false).grad_actual 65 | ``` 66 | 67 | More generally, you could try using a different AD backend; if you don't know why a model is returning `NaN` gradients, feel free to open an issue. 68 | 69 | ### `-Inf` log density 70 | 71 | Another cause of this error is having models with very extreme parameters. 72 | This example is taken from [this Turing.jl issue](https://github.com/TuringLang/Turing.jl/issues/2476): 73 | 74 | ```{julia} 75 | @model function initial_bad2() 76 | x ~ Exponential(100) 77 | y ~ Uniform(0, x) 78 | end 79 | model = initial_bad2() | (y = 50.0,) 80 | ``` 81 | 82 | The problem here is that HMC attempts to find initial values for parameters inside the region of `[-2, 2]`, _after_ the parameters have been transformed to unconstrained space. 83 | For a distribution of `Exponential(100)`, the appropriate transformation is `log(x)` (see the [variable transformation docs]({{< meta dev-transforms-distributions >}}) for more info). 84 | 85 | Thus, HMC attempts to find initial values of `log(x)` in the region of `[-2, 2]`, which corresponds to `x` in the region of `[exp(-2), exp(2)]` = `[0.135, 7.39]`. 86 | However, all of these values of `x` will give rise to a zero probability density for `y` because the value of `y = 50.0` is outside the support of `Uniform(0, x)`. 87 | Thus, the log density of the model is `-Inf`, as can be seen with `logjoint`: 88 | 89 | ```{julia} 90 | logjoint(model, (x = exp(-2),)) 91 | ``` 92 | 93 | ```{julia} 94 | logjoint(model, (x = exp(2),)) 95 | ``` 96 | 97 | The most direct way of fixing this is to manually provide a set of initial parameters that are valid. 98 | For example, you can obtain a set of initial parameters with `rand(Vector, model)`, and then pass this as the `initial_params` keyword argument to `sample`: 99 | 100 | ```{julia} 101 | sample(model, NUTS(), 1000; initial_params=rand(Vector, model)) 102 | ``` 103 | 104 | More generally, you may also consider reparameterising the model to avoid such issues. 105 | --------------------------------------------------------------------------------