├── .gitattributes
├── .github
│   └── workflows
│       ├── create_release.yml
│       ├── preview.yml
│       ├── publish.yml
│       ├── remove_preview.yml
│       ├── resolve_manifest.yml
│       ├── version_check.jl
│       └── version_check.yml
├── .gitignore
├── 404.qmd
├── LICENSE
├── Manifest.toml
├── Project.toml
├── README.md
├── _quarto.yml
├── assets
│   ├── favicon.ico
│   ├── images
│   │   ├── turing-logo-wide.svg
│   │   └── turing-logo.svg
│   └── scripts
│       ├── changelog.sh
│       └── versions.sh
├── core-functionality
│   └── index.qmd
├── developers
│   ├── compiler
│   │   ├── design-overview
│   │   │   └── index.qmd
│   │   ├── minituring-compiler
│   │   │   └── index.qmd
│   │   ├── minituring-contexts
│   │   │   └── index.qmd
│   │   └── model-manual
│   │       └── index.qmd
│   ├── contexts
│   │   └── submodel-condition
│   │       └── index.qmd
│   ├── contributing
│   │   └── index.qmd
│   ├── inference
│   │   ├── abstractmcmc-interface
│   │   │   └── index.qmd
│   │   ├── abstractmcmc-turing
│   │   │   └── index.qmd
│   │   ├── implementing-samplers
│   │   │   └── index.qmd
│   │   └── variational-inference
│   │       └── index.qmd
│   └── transforms
│       ├── bijectors
│       │   └── index.qmd
│       ├── distributions
│       │   └── index.qmd
│       └── dynamicppl
│           ├── dynamicppl_link.png
│           ├── dynamicppl_link2.png
│           └── index.qmd
├── getting-started
│   └── index.qmd
├── theming
│   ├── styles.css
│   └── theme-dark.scss
├── tutorials
│   ├── bayesian-differential-equations
│   │   └── index.qmd
│   ├── bayesian-linear-regression
│   │   └── index.qmd
│   ├── bayesian-logistic-regression
│   │   └── index.qmd
│   ├── bayesian-neural-networks
│   │   └── index.qmd
│   ├── bayesian-poisson-regression
│   │   └── index.qmd
│   ├── bayesian-time-series-analysis
│   │   └── index.qmd
│   ├── coin-flipping
│   │   └── index.qmd
│   ├── gaussian-mixture-models
│   │   └── index.qmd
│   ├── gaussian-process-latent-variable-models
│   │   └── index.qmd
│   ├── gaussian-processes-introduction
│   │   ├── golf.dat
│   │   └── index.qmd
│   ├── hidden-markov-models
│   │   └── index.qmd
│   ├── infinite-mixture-models
│   │   └── index.qmd
│   ├── multinomial-logistic-regression
│   │   └── index.qmd
│   ├── probabilistic-pca
│   │   └── index.qmd
│   └── variational-inference
│       └── index.qmd
└── usage
    ├── automatic-differentiation
    │   └── index.qmd
    ├── custom-distribution
    │   └── index.qmd
    ├── dynamichmc
    │   └── index.qmd
    ├── external-samplers
    │   └── index.qmd
    ├── mode-estimation
    │   └── index.qmd
    ├── modifying-logprob
    │   └── index.qmd
    ├── performance-tips
    │   └── index.qmd
    ├── probability-interface
    │   └── index.qmd
    ├── sampler-visualisation
    │   └── index.qmd
    ├── tracking-extra-quantities
    │   └── index.qmd
    └── troubleshooting
        └── index.qmd
/.gitattributes:
--------------------------------------------------------------------------------
1 | ################
2 | # Line endings #
3 | ################
4 | * text=auto
5 |
6 | ###################
7 | # GitHub Linguist #
8 | ###################
9 | *.qmd linguist-detectable
10 | *.qmd linguist-language=Markdown
11 |
--------------------------------------------------------------------------------
/.github/workflows/create_release.yml:
--------------------------------------------------------------------------------
1 | name: Create release with _freeze
2 |
3 | on:
4 | workflow_dispatch:
5 |
6 | permissions:
7 | contents: write
8 |
9 | jobs:
10 | build:
11 | runs-on: ubuntu-latest
12 | steps:
13 | - name: Checkout
14 | uses: actions/checkout@v4
15 |
16 | - name: Setup Julia
17 | uses: julia-actions/setup-julia@v2
18 | with:
19 | version: '1.11'
20 |
21 | - name: Load Julia packages from cache
22 | uses: julia-actions/cache@v2
23 |
24 | - name: Set up Quarto
25 | uses: quarto-dev/quarto-actions/setup@v2
26 | with:
27 | # Needs Quarto 1.6 (which is currently a pre-release version) to fix #533
28 | version: pre-release
29 |
30 | - name: Restore cached _freeze folder
31 | id: cache-restore
32 | uses: actions/cache/restore@v4
33 | with:
34 | path: |
35 | ./_freeze/
36 | key: |
37 | ${{ runner.os }}-${{ hashFiles('**/Manifest.toml') }}-${{ hashFiles('**/index.qmd') }}
38 | restore-keys: |
39 | ${{ runner.os }}-${{ hashFiles('**/Manifest.toml') }}
40 |
41 | - name: Render
42 | run: quarto render
43 |
44 | - name: Compress _freeze folder
45 | run: tar -czf _freeze.tar.gz _freeze
46 |
47 | - name: Generate tag name for release
48 | id: tag
49 | run: echo "tag_name=freeze_$(git rev-parse --short HEAD)" >> $GITHUB_OUTPUT
50 |
51 | - name: Create GitHub release
52 | uses: softprops/action-gh-release@v2
53 | with:
54 | tag_name: ${{ steps.tag.outputs.tag_name }}
55 | files: |
56 | _freeze.tar.gz
57 | Manifest.toml
58 | body: |
59 | This release contains the `_freeze` folder generated by Quarto when
60 | rendering the docs. You can use this to speed up the rendering
61 | process on your local machine by downloading and extracting the
62 | `_freeze` folder, then placing it at the root of the project.
63 |
 64 |           Note that the `_freeze` cache is keyed only on the
 65 |           contents of the .qmd files, and does not include information about
66 | the Julia environment. Thus, each `_freeze` folder is only valid
67 | for a given Julia environment, which is specified in the
68 | Manifest.toml file included in this release. To ensure
69 | reproducibility, you should make sure to use the Manifest.toml file
70 | locally as well.
71 |
72 | These releases are not automatically generated. To make an updated
73 | release with the contents of the `_freeze` folder from the main
74 | branch, you can run the `Create release with _freeze` workflow from
75 | https://github.com/TuringLang/docs/actions/workflows/create_release.yml.
76 |
--------------------------------------------------------------------------------
/.github/workflows/preview.yml:
--------------------------------------------------------------------------------
1 | name: PR Preview Workflow
2 |
3 | on:
4 | pull_request:
5 | types:
6 | - opened
7 | - synchronize
8 |
9 | concurrency:
10 | group: docs
11 |
12 | permissions:
13 | contents: write
14 | pull-requests: write
15 |
16 | jobs:
17 | build-and-preview:
18 | if: github.event.action == 'opened' || github.event.action == 'synchronize'
19 | runs-on: ubuntu-latest
20 | steps:
21 | - name: Checkout
22 | uses: actions/checkout@v4
23 | with:
24 | ref: ${{ github.event.pull_request.head.sha }}
25 |
26 | - name: Setup Julia
27 | uses: julia-actions/setup-julia@v2
28 | with:
29 | version: '1.11'
30 |
31 | - name: Load Julia packages from cache
32 | id: julia-cache
33 | uses: julia-actions/cache@v2
34 | with:
35 | cache-name: julia-cache;${{ hashFiles('**/Manifest.toml') }}
36 | delete-old-caches: false
37 |
38 | # Note: needs resolve() to fix #518
39 | - name: Instantiate Julia environment
40 | run: julia --project=. -e 'using Pkg; Pkg.instantiate(); Pkg.resolve()'
41 |
42 | - name: Set up Quarto
43 | uses: quarto-dev/quarto-actions/setup@v2
44 |
45 | - name: Restore cached _freeze folder
46 | id: cache-restore
47 | uses: actions/cache/restore@v4
48 | with:
49 | path: |
50 | ./_freeze/
51 | key: |
52 | ${{ runner.os }}-${{ hashFiles('**/Manifest.toml') }}-${{ hashFiles('**/index.qmd') }}
53 | restore-keys: |
54 | ${{ runner.os }}-${{ hashFiles('**/Manifest.toml') }}
55 |
56 | - name: Render Quarto site
57 | run: quarto render
58 |
59 | - name: Save _freeze folder
60 | id: cache-save
61 | if: ${{ !cancelled() }}
62 | uses: actions/cache/save@v4
63 | with:
64 | path: |
65 | ./_freeze/
66 | key: ${{ runner.os }}-${{ hashFiles('**/Manifest.toml') }}-${{ hashFiles('**/index.qmd') }}
67 |
68 | - name: Save Julia depot cache
69 | id: julia-cache-save
70 | if: ${{ !cancelled() && steps.julia-cache.outputs.cache-hit != 'true' }}
71 | uses: actions/cache/save@v4
72 | with:
73 | path: ${{ steps.julia-cache.outputs.cache-paths }}
74 | key: ${{ steps.julia-cache.outputs.cache-key }}
75 |
76 | - name: Deploy to GitHub Pages
77 | uses: JamesIves/github-pages-deploy-action@v4
78 | with:
79 | branch: gh-pages
80 | folder: _site
81 | target-folder: pr-previews/${{ github.event.pull_request.number }}
82 | clean: false
83 | commit-message: Deploy preview for PR ${{ github.event.pull_request.number }}
84 | token: ${{ secrets.GITHUB_TOKEN }}
85 |
86 | - name: Comment preview URL
87 | uses: thollander/actions-comment-pull-request@v2
88 | with:
89 | message: |
90 |
91 | Preview the changes: https://turinglang.org/docs/pr-previews/${{ github.event.pull_request.number }}
92 | Please avoid using the search feature and navigation bar in PR previews!
93 | comment_tag: preview-url-comment
94 |
--------------------------------------------------------------------------------
/.github/workflows/publish.yml:
--------------------------------------------------------------------------------
1 | name: Render Docs Website
2 |
3 | on:
4 | push:
5 | branches:
6 | - main
7 | - backport-v0.*
8 | workflow_dispatch: # manual trigger for testing
9 |
10 | concurrency:
11 | group: docs
12 | cancel-in-progress: true
13 |
14 | permissions:
15 | contents: write
16 |
17 | jobs:
18 | build-and-deploy:
19 | runs-on: ubuntu-latest
20 |
21 | steps:
22 | - name: Checkout
23 | uses: actions/checkout@v4
24 |
25 | - name: Setup Julia
26 | uses: julia-actions/setup-julia@v2
27 | with:
28 | version: '1.11'
29 |
30 | - name: Load Julia packages from cache
31 | id: julia-cache
32 | uses: julia-actions/cache@v2
33 | with:
34 | cache-name: julia-cache;${{ hashFiles('**/Manifest.toml') }}
35 | delete-old-caches: false
36 |
37 | # Note: needs resolve() to fix #518
38 | - name: Instantiate Julia environment
39 | run: julia --project=. -e 'using Pkg; Pkg.instantiate(); Pkg.resolve()'
40 |
41 | - name: Set up Quarto
42 | uses: quarto-dev/quarto-actions/setup@v2
43 |
44 | - name: Install jq
45 | run: sudo apt-get install jq
46 |
47 | - name: Restore cached _freeze folder
48 | id: cache-restore
49 | uses: actions/cache/restore@v4
50 | with:
51 | path: |
52 | ./_freeze/
53 | key: |
54 | ${{ runner.os }}-${{ hashFiles('**/Manifest.toml') }}-${{ hashFiles('**/index.qmd') }}
55 | restore-keys: |
56 | ${{ runner.os }}-${{ hashFiles('**/Manifest.toml') }}
57 |
58 | - name: Extract version from _quarto.yml
59 | id: extract_version
60 | run: |
61 | minor_version=$(grep -oP 'text:\s+"v\K\d+\.\d+' _quarto.yml)
62 | echo "minor_version=$minor_version" >> $GITHUB_ENV
63 |
64 | - name: Fetch latest bugfix version for the extracted minor version
65 | id: fetch_latest_bugfix
66 | run: |
67 | repo_url="https://api.github.com/repos/TuringLang/Turing.jl/tags"
68 | tags=$(curl -s $repo_url | jq -r '.[].name')
69 | stable_tags=$(echo "$tags" | grep -Eo 'v[0-9]+\.[0-9]+\.[0-9]+$')
70 | latest_bugfix=$(echo "$stable_tags" | grep "^v${{ env.minor_version }}" | sort -rV | head -n 1)
71 | echo "version=$latest_bugfix" >> $GITHUB_ENV
72 |
73 | - name: Fetch the actual latest bugfix version
74 | id: fetch_latest_bugfix_actual
75 | run: |
76 | latest=$(curl --silent "https://api.github.com/repos/TuringLang/Turing.jl/releases/latest" | jq -r .tag_name)
77 | echo "LATEST=$latest" >> $GITHUB_ENV
78 |
79 | - name: Run Changelog and Versions Scripts
80 | if: env.version == env.LATEST
81 | run: |
82 | sh assets/scripts/changelog.sh
83 | sh assets/scripts/versions.sh
84 |
85 | - name: Render Quarto site
86 | run: quarto render
87 |
88 | - name: Rename original search index
89 | run: mv _site/search.json _site/search_original.json
90 |
91 | - name: Save _freeze folder
92 | id: cache-save
93 | if: ${{ !cancelled() }}
94 | uses: actions/cache/save@v4
95 | with:
96 | path: |
97 | ./_freeze/
98 | key: ${{ runner.os }}-${{ hashFiles('**/Manifest.toml') }}-${{ hashFiles('**/index.qmd') }}
99 |
100 | - name: Save Julia depot cache
101 | id: julia-cache-save
102 | if: ${{ !cancelled() && steps.julia-cache.outputs.cache-hit != 'true' }}
103 | uses: actions/cache/save@v4
104 | with:
105 | path: ${{ steps.julia-cache.outputs.cache-paths }}
106 | key: ${{ steps.julia-cache.outputs.cache-key }}
107 |
108 | - name: Fetch search_original.json from main site
109 | run: curl -O https://raw.githubusercontent.com/TuringLang/turinglang.github.io/gh-pages/search_original.json
110 |
111 | - name: Convert main site search index URLs to relative URLs
112 | run: |
113 | jq 'map(
114 | if .href then .href = "../" + .href else . end |
115 | if .objectID then .objectID = "../" + .objectID else . end)' search_original.json > fixed_main_search.json
116 |
117 | - name: Merge both search index
118 | run: |
119 | jq -s '.[0] + .[1]' _site/search_original.json fixed_main_search.json > _site/search.json
120 |
121 | - name: Checkout gh-pages branch
122 | uses: actions/checkout@v4
123 | with:
124 | ref: gh-pages
125 | path: gh-pages
126 |
127 | - name: Update gh-pages branch
128 | run: |
129 | # Copy to versions/ subdirectory
130 | mkdir -p gh-pages/versions/${{ env.version }}
131 | cp -r _site/* gh-pages/versions/${{ env.version }}
132 |
133 | # Find the latest version of the docs and copy that to the root
134 | cd gh-pages/versions
135 | LATEST_DOCS=$(ls -d * | sort -V | tail -n 1)
136 | cp -r $LATEST_DOCS/* ../
137 |
138 | # Commit and push
139 | git config --global user.name "github-actions[bot]"
140 | git config --global user.email "github-actions[bot]@users.noreply.github.com"
141 | git add -A
142 | git commit -m "Publish docs @ ${GITHUB_REPOSITORY}@${GITHUB_SHA}"
143 | git push
144 |
--------------------------------------------------------------------------------
/.github/workflows/remove_preview.yml:
--------------------------------------------------------------------------------
1 | name: Remove PR previews
2 |
3 | on:
4 | pull_request_target:
5 | types:
6 | - closed
7 |
8 | permissions:
9 | contents: write
10 |
11 | jobs:
12 | delete-preview-directory:
13 | if: github.event.action == 'closed' || github.event.pull_request.merged == true
14 | runs-on: ubuntu-latest
15 | steps:
16 | - name: Checkout gh-pages branch
17 | uses: actions/checkout@v4
18 | with:
19 | ref: gh-pages
20 |
21 | - name: Remove PR Preview Directory
22 | run: |
23 | PR_NUMBER=${{ github.event.pull_request.number }}
24 | PREVIEW_DIR="pr-previews/${PR_NUMBER}"
25 | git config --global user.name "github-actions[bot]"
26 | git config --global user.email "github-actions[bot]@users.noreply.github.com"
27 | git pull origin gh-pages
28 | rm -rf ${PREVIEW_DIR}
29 | git add .
30 | git commit -m "Remove preview for merged PR #${PR_NUMBER}"
31 | git push
32 |
--------------------------------------------------------------------------------
/.github/workflows/resolve_manifest.yml:
--------------------------------------------------------------------------------
1 | # This action runs Pkg.instantiate() and Pkg.resolve() every time the main
2 | # branch is pushed to. If this leads to a change in the Manifest.toml file, it
3 | # will open a PR to update the Manifest.toml file. This ensures that the
4 | # contents of the Manifest in the repository are consistent with the contents
5 | # of the Manifest used by the CI system (i.e. during the actual docs
6 | # generation).
7 | #
8 | # See https://github.com/TuringLang/docs/issues/518 for motivation.
9 |
10 | name: Resolve Manifest
11 | on:
12 | push:
13 | branches:
14 | - main
15 | workflow_dispatch:
16 |
17 | jobs:
18 | check-version:
19 | runs-on: ubuntu-latest
20 |
21 | permissions:
22 | contents: write
23 | pull-requests: write
24 |
25 | env:
26 | # Disable precompilation as it takes a long time and is not needed for this workflow
27 | JULIA_PKG_PRECOMPILE_AUTO: 0
28 |
29 | steps:
30 | - name: Checkout
31 | uses: actions/checkout@v4
32 |
33 | - name: Setup Julia
34 | uses: julia-actions/setup-julia@v2
35 | with:
36 | version: '1.11'
37 |
38 | - name: Instantiate and resolve
39 | run: |
40 | julia -e 'using Pkg; Pkg.instantiate(); Pkg.resolve()'
41 |
42 | - name: Open PR
43 | id: create_pr
44 | uses: peter-evans/create-pull-request@v6
45 | with:
46 | branch: resolve-manifest
47 | add-paths: Manifest.toml
48 | commit-message: "Update Manifest.toml"
49 | body: "This PR is automatically generated by the `resolve_manifest.yml` GitHub Action."
50 | title: "Update Manifest.toml to match CI environment"
51 |
--------------------------------------------------------------------------------
/.github/workflows/version_check.jl:
--------------------------------------------------------------------------------
1 | # Set up a temporary environment just to run this script
2 | using Pkg
3 | Pkg.activate(temp=true)
4 | Pkg.add(["YAML", "TOML", "JSON", "HTTP"])
5 | import YAML
6 | import TOML
7 | import JSON
8 | import HTTP
9 |
10 | PROJECT_TOML_PATH = "Project.toml"
11 | QUARTO_YML_PATH = "_quarto.yml"
12 | MANIFEST_TOML_PATH = "Manifest.toml"
13 |
14 | function major_minor_match(vs...)
15 | first = vs[1]
16 | all(v.:major == first.:major && v.:minor == first.:minor for v in vs)
17 | end
18 |
19 | function major_minor_patch_match(vs...)
20 | first = vs[1]
21 | all(v.:major == first.:major && v.:minor == first.:minor && v.:patch == first.:patch for v in vs)
22 | end
23 |
24 | """
25 | Update the version number in Project.toml to match `target_version`.
26 |
27 | This uses a naive regex replacement on lines, i.e. sed-like behaviour. Parsing
28 | the file, editing the TOML and then re-serialising also works and would be more
29 | correct, but the entries in the output file can end up being scrambled, which
30 | would lead to unnecessarily large diffs in the PR.
31 | """
32 | function update_project_toml(filename, target_version::VersionNumber)
33 | lines = readlines(filename)
34 | open(filename, "w") do io
35 | for line in lines
36 | if occursin(r"^Turing\s*=\s*\"\d+\.\d+\"\s*$", line)
37 | println(io, "Turing = \"$(target_version.:major).$(target_version.:minor)\"")
38 | else
39 | println(io, line)
40 | end
41 | end
42 | end
43 | end
44 |
45 | """
46 | Update the version number in _quarto.yml to match `target_version`.
47 |
48 | See `update_project_toml` for implementation rationale.
49 | """
50 | function update_quarto_yml(filename, target_version::VersionNumber)
51 | # Don't deserialise/serialise as this will scramble lines
52 | lines = readlines(filename)
53 | open(filename, "w") do io
54 | for line in lines
55 | m = match(r"^(\s+)- text:\s*\"v\d+\.\d+\"\s*$", line)
56 | if m !== nothing
57 | println(io, "$(m[1])- text: \"v$(target_version.:major).$(target_version.:minor)\"")
58 | else
59 | println(io, line)
60 | end
61 | end
62 | end
63 | end
64 |
65 | # Retain the original version number string for error messages, as
66 | # VersionNumber() will tack on a patch version of 0
67 | quarto_yaml = YAML.load_file(QUARTO_YML_PATH)
68 | quarto_version_str = quarto_yaml["website"]["navbar"]["right"][1]["text"]
69 | quarto_version = VersionNumber(quarto_version_str)
70 | println("_quarto.yml version: ", quarto_version_str)
71 |
72 | project_toml = TOML.parsefile(PROJECT_TOML_PATH)
73 | project_version_str = project_toml["compat"]["Turing"]
74 | project_version = VersionNumber(project_version_str)
75 | println("Project.toml version: ", project_version_str)
76 |
77 | manifest_toml = TOML.parsefile(MANIFEST_TOML_PATH)
78 | manifest_version = VersionNumber(manifest_toml["deps"]["Turing"][1]["version"])
79 | println("Manifest.toml version: ", manifest_version)
80 |
81 | errors = []
82 |
83 | if ENV["TARGET_IS_MAIN"] == "true"
84 | # This environment variable is set by the GitHub Actions workflow. If it is
85 | # true, fetch the latest version from GitHub and update files to match this
86 | # version if necessary.
87 |
88 | resp = HTTP.get("https://api.github.com/repos/TuringLang/Turing.jl/releases/latest")
89 | latest_version = VersionNumber(JSON.parse(String(resp.body))["tag_name"])
90 | println("Latest Turing.jl version: ", latest_version)
91 |
92 | if !major_minor_match(latest_version, project_version)
93 | push!(errors, "$(PROJECT_TOML_PATH) out of date")
94 | println("$(PROJECT_TOML_PATH) is out of date; updating")
95 | update_project_toml(PROJECT_TOML_PATH, latest_version)
96 | end
97 |
98 | if !major_minor_match(latest_version, quarto_version)
99 | push!(errors, "$(QUARTO_YML_PATH) out of date")
100 | println("$(QUARTO_YML_PATH) is out of date; updating")
101 | update_quarto_yml(QUARTO_YML_PATH, latest_version)
102 | end
103 |
104 | if !major_minor_patch_match(latest_version, manifest_version)
105 | push!(errors, "$(MANIFEST_TOML_PATH) out of date")
106 | # Attempt to automatically update Manifest
107 | println("$(MANIFEST_TOML_PATH) is out of date; updating")
108 | old_env = Pkg.project().path
109 | Pkg.activate(".")
110 | try
111 | Pkg.add(name="Turing", version=latest_version)
112 | catch e
113 | # If the Manifest couldn't be updated, the error will be shown later
114 | println(e)
115 | end
116 | # Check if versions match now, error if not
117 | Pkg.activate(old_env)
118 | manifest_toml = TOML.parsefile(MANIFEST_TOML_PATH)
119 | manifest_version = VersionNumber(manifest_toml["deps"]["Turing"][1]["version"])
120 | if !major_minor_patch_match(latest_version, manifest_version)
121 | push!(errors, "Failed to update $(MANIFEST_TOML_PATH) to match latest Turing.jl version")
122 | end
123 | end
124 |
125 | if isempty(errors)
126 | println("All good")
127 | else
128 | error("The following errors occurred during version checking: \n", join(errors, "\n"))
129 | end
130 |
131 | else
132 | # If this is not true, then we are running on a backport-v* branch, i.e. docs
133 | # for a non-latest version. In this case we don't attempt to fetch the latest
134 | # patch version from GitHub to check the Manifest (we could, but it is more
135 | # work as it would involve paging through the list of releases). Instead,
136 | # we just check that the minor versions match.
137 | if !major_minor_match(quarto_version, project_version, manifest_version)
138 | error("The minor versions of Turing.jl in _quarto.yml, Project.toml, and Manifest.toml are inconsistent:
139 | - _quarto.yml: $quarto_version_str
140 | - Project.toml: $project_version_str
141 | - Manifest.toml: $manifest_version
142 | ")
143 | end
144 | end
145 |
--------------------------------------------------------------------------------
/.github/workflows/version_check.yml:
--------------------------------------------------------------------------------
1 | # This action checks that the minor versions of Turing.jl specified in the
2 | # Project.toml, _quarto.yml, and Manifest.toml files are consistent.
3 | #
4 | # For pushes to main or PRs to main, it additionally also checks that the
5 | # version specified in Manifest.toml matches the latest release on GitHub.
6 | #
7 | # If any discrepancies are observed, it will open a PR to fix them.
8 |
9 | name: Check Turing.jl version consistency
10 | on:
11 | push:
12 | branches:
13 | - main
14 | - backport-*
15 | pull_request:
16 | branches:
17 | - main
18 | - backport-*
19 | workflow_dispatch:
20 |
21 | jobs:
22 | check-version:
23 | runs-on: ubuntu-latest
24 |
25 | permissions:
26 | contents: write
27 | pull-requests: write
28 |
29 | env:
30 | # Determine whether the target branch is main (i.e. this is a push to
31 | # main or a PR to main).
32 | TARGET_IS_MAIN: ${{ (github.event_name == 'push' && github.ref_name == 'main') || (github.event_name == 'pull_request' && github.base_ref == 'main') }}
33 | IS_PR_FROM_FORK: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.repo.fork }}
34 | # Disable precompilation as it takes a long time and is not needed for this workflow
35 | JULIA_PKG_PRECOMPILE_AUTO: 0
36 |
37 | steps:
38 | - name: Checkout
39 | uses: actions/checkout@v4
40 |
41 | - name: Setup Julia
42 | uses: julia-actions/setup-julia@v2
43 |
44 | - name: Log GitHub context variables
45 | run: |
46 | echo github.event_name: ${{ github.event_name }}
47 | echo github.ref_name: ${{ github.ref_name }}
48 | echo github.base_ref: ${{ github.base_ref }}
49 | echo TARGET_IS_MAIN: ${{ env.TARGET_IS_MAIN }}
50 | echo IS_PR_FROM_FORK: ${{ env.IS_PR_FROM_FORK }}
51 |
52 | - name: Check version consistency
53 | id: version_check
54 | run: julia --color=yes .github/workflows/version_check.jl
55 |
56 | - name: Create a PR with suggested changes
57 | id: create_pr
58 | if: always() && steps.version_check.outcome == 'failure' && env.TARGET_IS_MAIN && (! env.IS_PR_FROM_FORK)
59 | uses: peter-evans/create-pull-request@v6
60 | with:
61 | base: ${{ github.event_name == 'pull_request' && github.head_ref || github.ref_name }}
62 | branch: update-turing-version/${{ github.event_name == 'pull_request' && github.head_ref || github.ref_name }}
63 | commit-message: "Update Turing.jl version to match latest release"
64 | body: "This PR is automatically generated by the `version_check.yml` GitHub Action."
65 | title: "Update Turing.jl version to match latest release"
66 |
67 | - name: Comment on PR about suggested changes (if PR was made)
68 | if: always() && github.event_name == 'pull_request' && steps.create_pr.outputs.pull-request-operation == 'created'
69 | uses: thollander/actions-comment-pull-request@v2
70 | with:
71 | message: |
72 | Hello! The versions of Turing.jl in your `Project.toml`, `_quarto.yml`, and/or `Manifest.toml` did not match the latest release version found on GitHub (https://github.com/TuringLang/Turing.jl/releases/latest).
73 |
74 | I've made a PR to update these files to match the latest release: ${{ steps.create_pr.outputs.pull-request-url }}
75 |
76 | Please review the changes and merge the PR if they look good.
77 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | .*.jl.cov
2 | *.jl.*.cov
3 | *.jl.mem
4 | .ipynb_checkpoints/
5 | *.tmp
6 | *.aux
7 | *.log
8 | *.out
9 | *.tex
10 | /tutorials/**/*.html
11 | /tutorials/**/*.pdf
12 | /tutorials/**/.quarto/*
13 | /tutorials/**/index_files/*
14 | Testing/
15 | /*/*/jl_*/
16 | .vscode
17 | _freeze
18 | _site
19 | .quarto
20 | /.quarto/
21 | changelog.qmd
22 | versions.qmd
23 | tmp.gif
24 | .venv
25 | venv
26 |
27 | 404.html
28 | site_libs
29 | .DS_Store
30 |
--------------------------------------------------------------------------------
/404.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: Page Not Found
3 | ---
4 |
5 | The page you requested cannot be found (perhaps it was moved or renamed).
6 |
7 | You may want to return to the [documentation home page](https://turinglang.org/docs).
8 |
9 | If you believe this is an error, please do report it by [opening an issue](https://github.com/TuringLang/docs/issues/new).
10 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2018-2024, Hong Ge, the Turing language team
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/Project.toml:
--------------------------------------------------------------------------------
1 | [deps]
2 | ADTypes = "47edcb42-4c32-4615-8424-f2b9edc5f35b"
3 | AbstractGPs = "99985d1d-32ba-4be9-9821-2ec096f28918"
4 | AbstractMCMC = "80f14c24-f653-4e6a-9b94-39d6b0f70001"
5 | AbstractPPL = "7a57a42e-76ec-4ea3-a279-07e840d6d9cf"
6 | AdvancedHMC = "0bf59076-c3b1-5ca4-86bd-e02cd72cde3d"
7 | AdvancedMH = "5b7e9947-ddc0-4b3f-9b55-0d8042f74170"
8 | Bijectors = "76274a88-744f-5084-9051-94815aaf08c4"
9 | CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
10 | ComponentArrays = "b0b7db55-cfe3-40fc-9ded-d10e2dbeff66"
11 | DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
12 | DataStructures = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8"
13 | DifferentialEquations = "0c46a032-eb83-5123-abaf-570d42b7fbaa"
14 | Distributed = "8ba89e20-285c-5b6f-9357-94700520ee1b"
15 | Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
16 | DynamicHMC = "bbc10e6e-7c05-544b-b16e-64fede858acb"
17 | DynamicPPL = "366bfd00-2699-11ea-058f-f148b4cae6d8"
18 | FillArrays = "1a297f60-69ca-5386-bcde-b61e274b549b"
19 | Flux = "587475ba-b771-5e3f-ad9e-33799f191a9c"
20 | ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"
21 | Functors = "d9f16b24-f501-4c13-a1f2-28368ffc5196"
22 | GLM = "38e38edf-8417-5370-95a0-9cbb8c7f171a"
23 | HiddenMarkovModels = "84ca31d5-effc-45e0-bfda-5a68cd981f47"
24 | LaTeXStrings = "b964fa9f-0449-5b57-a5c2-d3ea65f4040f"
25 | LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
26 | LogDensityProblems = "6fdf6af0-433a-55f7-b3ed-c6c6e0b8df7c"
27 | LogDensityProblemsAD = "996a588d-648d-4e1f-a8f0-a84b347e47b1"
28 | LogExpFunctions = "2ab3a3ac-af41-5b50-aa03-7779005ae688"
29 | Lux = "b2108857-7c20-44ae-9111-449ecde12c47"
30 | MCMCChains = "c7f686f2-ff18-58e9-bc7b-31028e88f75d"
31 | MLDataUtils = "cc2ba9b6-d476-5e6d-8eaf-a92d5412d41d"
32 | MLUtils = "f1d291b0-491e-4a28-83b9-f70985020b54"
33 | MacroTools = "1914dd2f-81c6-5fcd-8719-6d5c9610ff09"
34 | Measures = "442fdcdd-2543-5da2-b0f3-8c86c306513e"
35 | Memoization = "6fafb56a-5788-4b4e-91ca-c0cea6611c73"
36 | MicroCanonicalHMC = "234d2aa0-2291-45f7-9047-6fa6f316b0a8"
37 | Mooncake = "da2b9cff-9c12-43a0-ae48-6db2b0edb7d6"
38 | NNlib = "872c559c-99b0-510c-b3b7-b6c96a88d5cd"
39 | Optimization = "7f7a1694-90dd-40f0-9382-eb1efda571ba"
40 | OptimizationNLopt = "4e6fcdb7-1186-4e1f-a706-475e75c168bb"
41 | OptimizationOptimJL = "36348300-93cb-4f02-beb5-3c3902f8871e"
42 | PDMats = "90014a1f-27ba-587c-ab20-58faa44d9150"
43 | Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
44 | RDatasets = "ce6b1742-4840-55fa-b093-852dadbb1d8b"
45 | Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
46 | ReverseDiff = "37e2e3b7-166d-5795-8a7a-e32c996b4267"
47 | SciMLSensitivity = "1ed8b502-d754-442c-8d5d-10ac956f44a1"
48 | Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
49 | StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
50 | StatsFuns = "4c63d2b9-4356-54db-8cca-17b64c39e42c"
51 | StatsPlots = "f3b207a7-027a-5e70-b257-86293d7955fd"
52 | Turing = "fce5fe82-541a-59a6-adf8-730c64b5f9a0"
53 | UnPack = "3a884ed6-31ef-47d7-9d2a-63182c4928ed"
54 | Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"
55 |
56 | [compat]
57 | Turing = "0.38"
58 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Turing.jl Documentation and Tutorials
2 |
3 | **https://turinglang.org/docs/**
4 |
5 | ## Contributing
6 |
7 | The easiest way to contribute to the documentation is to simply open a pull request.
8 | A preview version of the documentation is built for PRs, so you can see how your changes look without having to build the entire site locally.
9 | (Note that if you are editing a tutorial that takes a long time to run, this feedback may take a while.)
10 |
11 | The `main` branch contains the Quarto source code.
12 | The HTML documentation is automatically built using GitHub Actions, and deployed to the `gh-pages` branch, so you do not have to build and commit the HTML files yourself.
13 |
14 | ## Local development
15 |
16 | If you wish to render the docs website locally, you'll need to have [Quarto](https://quarto.org/docs/download/) installed (at least version 1.6.31) on your computer.
17 | Then:
18 |
19 | 1. Clone this repository:
20 |
21 | ```bash
22 | git clone https://github.com/TuringLang/docs
23 | ```
24 |
25 | 2. Navigate into the cloned directory:
26 |
27 | ```bash
28 | cd docs
29 | ```
30 |
31 | 3. Instantiate the project environment:
32 |
33 | ```bash
34 | julia --project=. -e 'using Pkg; Pkg.instantiate()'
35 | ```
36 |
37 | 4. Preview the website using Quarto.
38 |
39 | > [!WARNING]
40 | >
41 | > This will take a _very_ long time, as it will build every tutorial from scratch. See [below](#faster-rendering) for ways to speed this up.
42 |
43 | ```bash
44 | quarto preview
45 | ```
46 |
 47 | This will launch a local server at http://localhost:4200/; open the link shown in your terminal in your web browser to view the site.
48 |
49 | 5. Render the website locally:
50 |
51 | ```bash
52 | quarto render
53 | ```
54 |
55 | This will build the entire documentation and place the output in the `_site` folder.
 56 | You can then view the rendered website by launching an HTTP server from that directory, e.g. using Python:
57 |
58 | ```bash
59 | cd _site
60 | python -m http.server 8000
61 | ```
62 |
63 | Then, navigate to http://localhost:8000/ in your web browser.
64 |
65 | ## Faster rendering
66 |
67 | Note that rendering the entire documentation site can take a long time (usually multiple hours).
68 | If you wish to speed up local rendering, there are two options available:
69 |
70 | 1. Render a single tutorial or `qmd` file without compiling the entire site.
71 | To do this, pass the `qmd` file as an argument to `quarto render`:
72 |
73 | ```
74 | quarto render path/to/index.qmd
75 | ```
76 |
77 | (Note that `quarto preview` does not support this single-file rendering.)
78 |
79 | 2. Download the most recent `_freeze` folder from the [GitHub releases of this repo](https://github.com/turinglang/docs/releases), and place it in the root of the project.
80 | The `_freeze` folder stores the cached outputs from a previous build of the documentation.
81 | If it is present, Quarto will reuse the outputs of previous computations for any files for which the source is unchanged.
82 |
 83 |    Note that the validity of a `_freeze` folder depends on the Julia environment it was created with, because different package versions may lead to different outputs.
 84 |    Each GitHub release also includes the corresponding `Manifest.toml`; download this as well and place it in the root directory of the docs (a sketch of both steps is shown below).
85 |
86 | If there isn't a suitably up-to-date `_freeze` folder in the releases, you can generate a new one by [triggering a run for the `create_release.yml` workflow](https://github.com/TuringLang/docs/actions/workflows/create_release.yml).
87 | (You will need to have the appropriate permissions; please create an issue if you need help with this.)
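
As a rough sketch, downloading and unpacking a `_freeze` release might look like the following (the release tag `freeze_abc1234` is a placeholder; substitute the tag of the actual release you want to use):

```bash
# Run from the root of the docs repository; the tag below is a placeholder.
TAG=freeze_abc1234
curl -LO "https://github.com/TuringLang/docs/releases/download/${TAG}/_freeze.tar.gz"
curl -LO "https://github.com/TuringLang/docs/releases/download/${TAG}/Manifest.toml"
# Extract the cached outputs into ./_freeze/
tar -xzf _freeze.tar.gz
```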
88 |
89 | ## Troubleshooting build issues
90 |
91 | As described in the [Quarto docs](https://quarto.org/docs/computations/julia.html#using-the-julia-engine), Quarto's Julia engine uses a worker process behind the scenes.
92 | Sometimes this can result in issues with old package code not being unloaded (e.g. when package versions are upgraded).
93 | If you find that Quarto's execution is failing with errors that aren't reproducible via a normal REPL, try adding the `--execute-daemon-restart` flag to the `quarto render` command:
94 |
95 | ```bash
96 | quarto render /path/to/index.qmd --execute-daemon-restart
97 | ```
98 |
 99 | You may also need to kill any stray Quarto processes that are still running (sometimes they keep running in the background):
100 |
101 | ```bash
102 | pkill -9 -f quarto
103 | ```
104 |
105 | ## License
106 |
107 | This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
108 |
--------------------------------------------------------------------------------
/_quarto.yml:
--------------------------------------------------------------------------------
1 | project:
2 | type: website
3 | preview:
  4 |     # Change port if it's busy on your system, or just remove this line so that it will automatically use any free port
5 | port: 4200
6 | browser: true
7 |
8 |
9 | # These cannot be used as variables. They are reserved for the project configuration.
10 | website:
11 | title: "Turing.jl"
12 | site-url: "https://turinglang.org/docs/"
13 | favicon: "assets/favicon.ico"
14 | search:
15 | location: navbar
16 | type: overlay
17 | navbar:
18 | logo: "assets/images/turing-logo.svg"
19 | logo-href: https://turinglang.org/
20 | background: "#073c44"
21 | foreground: "#ffffff"
22 | left:
23 | - href: getting-started/
24 | text: Get Started
25 | - href: tutorials/coin-flipping/
26 | text: Tutorials
27 | - href: https://turinglang.org/library/
28 | text: Libraries
29 | - href: https://turinglang.org/news/
30 | text: News
31 | - href: https://turinglang.org/team/
32 | text: Team
33 | right:
34 | # Current version
35 | - text: "v0.38"
36 | menu:
37 | - text: Changelog
38 | href: https://turinglang.org/docs/changelog.html
39 | - text: All Versions
40 | href: https://turinglang.org/docs/versions.html
41 | tools:
42 | - icon: twitter
43 | href: https://x.com/TuringLang
44 | text: Turing Twitter
45 | - icon: github
46 | href: https://github.com/TuringLang/Turing.jl
47 | text: Turing GitHub
48 |
49 | sidebar:
50 | - text: documentation
51 | collapse-level: 1
52 | contents:
53 | - getting-started/index.qmd
54 | - core-functionality/index.qmd
55 |
56 | - section: "User Guide"
57 | collapse-level: 1
58 | contents:
59 | - usage/automatic-differentiation/index.qmd
60 | - usage/custom-distribution/index.qmd
61 | - usage/probability-interface/index.qmd
62 | - usage/modifying-logprob/index.qmd
63 | - usage/tracking-extra-quantities/index.qmd
64 | - usage/mode-estimation/index.qmd
65 | - usage/performance-tips/index.qmd
66 | - usage/sampler-visualisation/index.qmd
67 | - usage/dynamichmc/index.qmd
68 | - usage/external-samplers/index.qmd
69 | - usage/troubleshooting/index.qmd
70 |
71 | - section: "Tutorials"
72 | contents:
73 | - tutorials/coin-flipping/index.qmd
74 | - tutorials/gaussian-mixture-models/index.qmd
75 | - tutorials/bayesian-logistic-regression/index.qmd
76 | - tutorials/bayesian-neural-networks/index.qmd
77 | - tutorials/hidden-markov-models/index.qmd
78 | - tutorials/bayesian-linear-regression/index.qmd
79 | - tutorials/infinite-mixture-models/index.qmd
80 | - tutorials/bayesian-poisson-regression/index.qmd
81 | - tutorials/multinomial-logistic-regression/index.qmd
82 | - tutorials/variational-inference/index.qmd
83 | - tutorials/bayesian-differential-equations/index.qmd
84 | - tutorials/probabilistic-pca/index.qmd
85 | - tutorials/bayesian-time-series-analysis/index.qmd
86 | - tutorials/gaussian-processes-introduction/index.qmd
87 | - tutorials/gaussian-process-latent-variable-models/index.qmd
88 |
89 | - section: "Developers"
90 | contents:
91 | - developers/contributing/index.qmd
92 |
93 | - section: "DynamicPPL's Compiler"
94 | collapse-level: 1
95 | contents:
96 | - developers/compiler/model-manual/index.qmd
97 | - developers/compiler/minituring-compiler/index.qmd
98 | - developers/compiler/minituring-contexts/index.qmd
99 | - developers/compiler/design-overview/index.qmd
100 |
101 | - section: "DynamicPPL Contexts"
102 | collapse-level: 1
103 | contents:
104 | - developers/contexts/submodel-condition/index.qmd
105 |
106 | - section: "Variable Transformations"
107 | collapse-level: 1
108 | contents:
109 | - developers/transforms/distributions/index.qmd
110 | - developers/transforms/bijectors/index.qmd
111 | - developers/transforms/dynamicppl/index.qmd
112 |
113 | - section: "Inference in Detail"
114 | collapse-level: 1
115 | contents:
116 | - developers/inference/variational-inference/index.qmd
117 | - developers/inference/implementing-samplers/index.qmd
118 |
119 | page-footer:
120 | background: "#073c44"
121 | left: |
122 | Turing is created by Hong Ge, and lovingly maintained by the core team of volunteers.
123 | The contents of this website are © 2024 under the terms of the MIT License.
124 |
125 | right:
126 | - icon: twitter
127 | href: https://x.com/TuringLang
128 | aria-label: Turing Twitter
129 | - icon: github
130 | href: https://github.com/TuringLang/Turing.jl
131 | aria-label: Turing GitHub
132 |
133 | back-to-top-navigation: true
134 | repo-url: https://github.com/TuringLang/docs
135 | repo-actions: [edit, issue]
136 | repo-branch: main
137 | repo-link-target: _blank
138 | page-navigation: true
139 |
140 | format:
141 | html:
142 | theme:
143 | light: cosmo
144 | dark: [cosmo, theming/theme-dark.scss]
145 | css: theming/styles.css
146 | smooth-scroll: true
147 | output-block-background: true
148 | toc: true
149 | toc-title: "Table of Contents"
150 | code-fold: false
151 | code-overflow: scroll
152 | execute:
153 | echo: true
154 | output: true
155 | freeze: auto
156 | include-in-header:
157 | - text: |
158 |
166 |
167 | # These variables can be used in any qmd files, e.g. for links:
168 | # the [Getting Started page]({{< meta get-started >}})
169 | # Note that you don't need to prepend `../../` to the link; Quarto will figure
170 | # it out automatically.
171 |
172 | get-started: tutorials/docs-00-getting-started
173 | tutorials-intro: tutorials/00-introduction
174 | gaussian-mixture-model: tutorials/01-gaussian-mixture-model
175 | logistic-regression: tutorials/02-logistic-regression
176 | bayesian-neural-network: tutorials/03-bayesian-neural-network
177 | hidden-markov-model: tutorials/04-hidden-markov-model
178 | linear-regression: tutorials/05-linear-regression
179 | infinite-mixture-model: tutorials/06-infinite-mixture-model
180 | poisson-regression: tutorials/07-poisson-regression
181 | multinomial-logistic-regression: tutorials/08-multinomial-logistic-regression
182 | variational-inference: tutorials/09-variational-inference
183 | bayesian-differential-equations: tutorials/10-bayesian-differential-equations
184 | probabilistic-pca: tutorials/11-probabilistic-pca
185 | gplvm: tutorials/12-gplvm
186 | seasonal-time-series: tutorials/13-seasonal-time-series
187 | using-turing-advanced: tutorials/docs-09-using-turing-advanced
188 | using-turing: tutorials/docs-12-using-turing-guide
189 |
190 | usage-automatic-differentiation: usage/automatic-differentiation
191 | usage-custom-distribution: usage/custom-distribution
192 | usage-dynamichmc: usage/dynamichmc
193 | usage-external-samplers: usage/external-samplers
194 | usage-mode-estimation: usage/mode-estimation
195 | usage-modifying-logprob: usage/modifying-logprob
196 | usage-performance-tips: usage/performance-tips
197 | usage-probability-interface: usage/probability-interface
198 | usage-sampler-visualisation: usage/sampler-visualisation
199 | usage-tracking-extra-quantities: usage/tracking-extra-quantities
200 | usage-troubleshooting: usage/troubleshooting
201 |
202 | contributing-guide: developers/contributing
203 | dev-model-manual: developers/compiler/model-manual
204 | contexts: developers/compiler/minituring-contexts
205 | minituring: developers/compiler/minituring-compiler
206 | using-turing-compiler: developers/compiler/design-overview
207 | using-turing-variational-inference: developers/inference/variational-inference
208 | using-turing-implementing-samplers: developers/inference/implementing-samplers
209 | dev-transforms-distributions: developers/transforms/distributions
210 | dev-transforms-bijectors: developers/transforms/bijectors
211 | dev-transforms-dynamicppl: developers/transforms/dynamicppl
212 | dev-contexts-submodel-condition: developers/contexts/submodel-condition
213 |
--------------------------------------------------------------------------------
/assets/favicon.ico:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TuringLang/docs/846abd6730853cacd7202e517eb011c42f89d45b/assets/favicon.ico
--------------------------------------------------------------------------------
/assets/scripts/changelog.sh:
--------------------------------------------------------------------------------
1 | url="https://raw.githubusercontent.com/TuringLang/Turing.jl/main/HISTORY.md"
2 |
3 | changelog_content=$(curl -s "$url")
4 |
5 | cat << EOF > changelog.qmd
6 | ---
7 | title: Changelog
8 | repo-actions: false
9 | include-in-header:
10 | - text: |
11 |
19 | ---
20 |
21 | $changelog_content
22 | EOF
23 |
--------------------------------------------------------------------------------
/assets/scripts/versions.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | REPO_URL="https://api.github.com/repos/TuringLang/Turing.jl/tags"
4 |
5 | # Fetch the tags
6 | TAGS=$(curl -s $REPO_URL | grep 'name' | sed 's/.*: "\(.*\)",/\1/')
7 |
8 | # Filter out pre-release version tags (e.g., 0.33.0-rc.1) and keep only stable version tags
9 | STABLE_TAGS=$(echo "$TAGS" | grep -Eo 'v[0-9]+\.[0-9]+\.[0-9]+$')
10 |
11 | # Find the latest version (including bug fix versions)
12 | LATEST_VERSION=$(echo "$STABLE_TAGS" | head -n 1)
13 |
14 | # Find the latest minor version (without bug fix)
15 | STABLE_VERSION=$(echo "$STABLE_TAGS" | grep -Eo 'v[0-9]+\.[0-9]+(\.0)?$' | head -n 1)
16 |
17 | # Filter out bug fix version tags from STABLE_TAGS to get only minor version tags
18 | MINOR_TAGS=$(echo "$STABLE_TAGS" | grep -Eo 'v[0-9]+\.[0-9]+(\.0)?$')
19 |
20 | # Set the minimum version to include in the "Previous Versions" section
21 | MIN_VERSION="v0.31.0"
22 |
23 | # Remove bug-fix version number from the display of a version
24 | remove_bugfix() {
25 | echo "$1" | sed -E 's/\.[0-9]$//'
26 | }
27 |
28 | # versions.qmd file will be generated from this content
29 | VERSIONS_CONTENT="---
30 | pagetitle: Versions
31 | repo-actions: false
32 | include-in-header:
33 | - text: |
34 |
42 | ---
43 |
44 | # Latest Version
45 | | | | |
46 | | --- | --- | --- |
47 | | $(remove_bugfix "$LATEST_VERSION") | [Documentation](versions/${LATEST_VERSION}/) | [Changelog](changelog.qmd) |
48 |
49 | # Previous Versions
50 | | | |
51 | | --- | --- |
52 | "
53 | # Add previous versions, excluding the latest and stable versions
54 | for MINOR_TAG in $MINOR_TAGS; do
55 | if [ "$MINOR_TAG" != "$LATEST_VERSION" ] && [ "$MINOR_TAG" != "$STABLE_VERSION" ] && [ "$MINOR_TAG" \> "$MIN_VERSION" ]; then
56 | # Find the latest bug fix version for the current minor version
57 | LATEST_BUG_FIX=$(echo "$STABLE_TAGS" | grep "^${MINOR_TAG%.*}" | sort -r | head -n 1)
58 | # Remove trailing .0 from display version
59 | DISPLAY_MINOR_TAG=$(remove_bugfix "$MINOR_TAG")
60 | VERSIONS_CONTENT="${VERSIONS_CONTENT}| ${DISPLAY_MINOR_TAG} | [Documentation](versions/${LATEST_BUG_FIX}/) |
61 | "
62 | fi
63 | done
64 |
65 | # Add the Archived Versions section manually
66 | VERSIONS_CONTENT="${VERSIONS_CONTENT}
67 | # Archived Versions
68 | Documentation for archived versions is available on our deprecated documentation site.
69 |
70 | | | |
71 | | --- | --- |
72 | | v0.31 | [Documentation](../v0.31.4/) |
73 | | v0.30 | [Documentation](../v0.30.9/) |
74 | | v0.29 | [Documentation](../v0.29.3/) |
75 | | v0.28 | [Documentation](../v0.28.3/) |
76 | | v0.27 | [Documentation](../v0.27.0/) |
77 | | v0.26 | [Documentation](../v0.26.6/) |
78 | | v0.25 | [Documentation](../v0.25.3/) |
79 | | v0.24 | [Documentation](../v0.24.4/) |
80 | "
81 |
82 | # Write the content to the versions.qmd file
83 | echo "$VERSIONS_CONTENT" > versions.qmd
84 |
--------------------------------------------------------------------------------
/developers/compiler/model-manual/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: Manually Defining a Model
3 | engine: julia
4 | aliases:
5 | - ../../../tutorials/dev-model-manual/index.html
6 | ---
7 |
8 | Traditionally, models in Turing are defined using the `@model` macro:
9 |
10 | ```{julia}
11 | using Turing
12 |
13 | @model function gdemo(x)
14 | # Set priors.
15 | s² ~ InverseGamma(2, 3)
16 | m ~ Normal(0, sqrt(s²))
17 |
18 | # Observe each value of x.
19 | x .~ Normal(m, sqrt(s²))
20 |
21 | return nothing
22 | end
23 |
24 | model = gdemo([1.5, 2.0])
25 | ```
26 |
 27 | The `@model` macro accepts a function definition and rewrites it such that a call to the function generates a `Model` struct for use by the sampler.
28 |
29 | However, models can be constructed by hand without the use of a macro.
 30 | Taking the `gdemo` model above as an example, the macro-based definition can also be implemented (albeit a bit less generally) with the following macro-free version:
31 |
32 | ```{julia}
33 | using DynamicPPL
34 |
35 | # Create the model function.
36 | function gdemo2(model, varinfo, context, x)
37 | # Assume s² has an InverseGamma distribution.
38 | s², varinfo = DynamicPPL.tilde_assume!!(
39 | context, InverseGamma(2, 3), Turing.@varname(s²), varinfo
40 | )
41 |
42 | # Assume m has a Normal distribution.
43 | m, varinfo = DynamicPPL.tilde_assume!!(
44 | context, Normal(0, sqrt(s²)), Turing.@varname(m), varinfo
45 | )
46 |
47 | # Observe each value of x[i] according to a Normal distribution.
48 | for i in eachindex(x)
49 | _retval, varinfo = DynamicPPL.tilde_observe!!(
50 | context, Normal(m, sqrt(s²)), x[i], Turing.@varname(x[i]), varinfo
51 | )
52 | end
53 |
54 | # The final return statement should comprise both the original return
55 | # value and the updated varinfo.
56 | return nothing, varinfo
57 | end
58 | gdemo2(x) = Turing.Model(gdemo2, (; x))
59 |
60 | # Instantiate a Model object with our data variables.
61 | model2 = gdemo2([1.5, 2.0])
62 | ```
63 |
64 | We can sample from this model in the same way:
65 |
66 | ```{julia}
67 | chain = sample(model2, NUTS(), 1000; progress=false)
68 | ```
69 |
 70 | The subsequent pages in this section will show how the `@model` macro does this behind the scenes.
71 |
--------------------------------------------------------------------------------
/developers/contributing/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: Contributing
3 | aliases:
4 | - ../../tutorials/docs-01-contributing-guide/index.html
5 | ---
6 |
7 | Turing is an open-source project and is [hosted on GitHub](https://github.com/TuringLang).
  8 | We welcome contributions from the community in all forms, large or small: bug reports, feature implementations, code contributions, or improvements to documentation or infrastructure are all extremely valuable.
9 | We would also very much appreciate examples of models written using Turing.
10 |
11 | ### How to get involved
12 |
13 | Our outstanding issues are tabulated on our [issue tracker](https://github.com/TuringLang/Turing.jl/issues).
14 | Closing one of these may involve implementing new features, fixing bugs, or writing example models.
15 |
16 | You can also join the `#turing` channel on the [Julia Slack](https://julialang.org/slack/) and say hello!
17 |
18 | If you are new to open-source software, please see [GitHub's introduction](https://guides.github.com/introduction/flow/) or [Julia's contribution guide](https://github.com/JuliaLang/julia/blob/master/CONTRIBUTING.md) on using version control for collaboration.
19 |
20 | ### Documentation
21 |
22 | Each of the packages in the Turing ecosystem (see [Libraries](/library)) has its own documentation, which is typically found in the `docs` folder of the corresponding package.
23 | For example, the source code for DynamicPPL's documentation can be found in [its repository](https://github.com/TuringLang/DynamicPPL.jl).
24 |
25 | The documentation for Turing.jl itself consists of the tutorials that you see on this website, and is built from the separate [`docs` repository](https://github.com/TuringLang/docs).
26 | None of the documentation is generated from the [main Turing.jl repository](https://github.com/TuringLang/Turing.jl); in particular, the API that Turing exports does not currently form part of the documentation.
27 |
 28 | Other sections of the website (anything that isn't a package or a tutorial) – for example, the list of libraries – are built from the [`turinglang.github.io` repository](https://github.com/TuringLang/turinglang.github.io).
29 |
30 | ### Tests
31 |
32 | Turing, like most software libraries, has a test suite. You can run the whole suite by running `julia --project=.` from the root of the Turing repository, and then running
33 |
34 | ```julia
35 | import Pkg; Pkg.test("Turing")
36 | ```
37 |
38 | The test suite subdivides into files in the `test` folder, and you can run only some of them using commands like
39 |
40 | ```julia
41 | import Pkg; Pkg.test("Turing"; test_args=["optim", "hmc", "--skip", "ext"])
42 | ```
43 |
 44 | This one would run all files with "optim" or "hmc" in their path, such as `test/optimisation/Optimisation.jl`, but not files with "ext" in their path. Alternatively, you can pass these as command-line arguments when you run Julia:
45 |
 46 | ```bash
47 | julia --project=. -e 'import Pkg; Pkg.test(; test_args=ARGS)' -- optim hmc --skip ext
48 | ```
49 |
 50 | Alternatively, you can set the global `ARGS` variable directly and call `include("test/runtests.jl")`, as in the sketch below.
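
For example, a minimal sketch of this approach (run from a Julia REPL started with `julia --project=.` in the repository root; the filter strings here are just placeholders):

```julia
# Populate the global ARGS vector that test/runtests.jl reads its filters from,
# then include the test runner directly.
empty!(ARGS)
append!(ARGS, ["optim", "hmc", "--skip", "ext"])
include("test/runtests.jl")
```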
51 |
52 | ### Style Guide
53 |
54 | Turing has a style guide, described below.
55 | Reviewing it before making a pull request is not strictly necessary, but you may be asked to change portions of your code to conform with the style guide before it is merged.
56 |
57 | Most Turing code follows [Blue: a Style Guide for Julia](https://github.com/JuliaDiff/BlueStyle).
58 | These conventions were created from a variety of sources including Python's [PEP8](http://legacy.python.org/dev/peps/pep-0008/), Julia's [Notes for Contributors](https://github.com/JuliaLang/julia/blob/master/CONTRIBUTING.md), and Julia's [Style Guide](https://docs.julialang.org/en/v1/manual/style-guide/).
59 |
60 | #### Synopsis
61 |
62 | - Use 4 spaces per indentation level, no tabs.
63 | - Try to adhere to a 92 character line length limit.
64 | - Use upper camel case convention for [modules](https://docs.julialang.org/en/v1/manual/modules/) and [types](https://docs.julialang.org/en/v1/manual/types/).
 65 | - Use lower case with underscores for method names (note: base Julia code tends to use lower case without underscores).
66 | - Comments are good, try to explain the intentions of the code.
67 | - Use whitespace to make the code more readable.
68 | - No whitespace at the end of a line (trailing whitespace).
 69 | - Avoid padding brackets with spaces, e.g. `Int64(value)` is preferred over `Int64( value )` (see the example below).
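
For illustration, here is a short, hypothetical function (not part of the Turing codebase) written to follow these conventions:

```julia
"""
    sample_mean(values)

Compute the arithmetic mean of `values`.
"""
function sample_mean(values::AbstractVector{<:Real})
    # Four-space indentation, lower case with underscores, no padded brackets.
    total = sum(values)
    return total / length(values)
end
```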
70 |
71 | #### A Word on Consistency
72 |
 73 | When adhering to the Blue style, it's important to realize that these are guidelines, not rules. This is [stated best in PEP8](http://legacy.python.org/dev/peps/pep-0008/#a-foolish-consistency-is-the-hobgoblin-of-little-minds):
74 |
75 | > A style guide is about consistency. Consistency with this style guide is important. Consistency within a project is more important. Consistency within one module or function is most important.
76 |
77 | > But most importantly: know when to be inconsistent – sometimes the style guide just doesn't apply. When in doubt, use your best judgment. Look at other examples and decide what looks best. And don't hesitate to ask!
78 |
79 |
--------------------------------------------------------------------------------
/developers/transforms/dynamicppl/dynamicppl_link.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TuringLang/docs/846abd6730853cacd7202e517eb011c42f89d45b/developers/transforms/dynamicppl/dynamicppl_link.png
--------------------------------------------------------------------------------
/developers/transforms/dynamicppl/dynamicppl_link2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TuringLang/docs/846abd6730853cacd7202e517eb011c42f89d45b/developers/transforms/dynamicppl/dynamicppl_link2.png
--------------------------------------------------------------------------------
/getting-started/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: Getting Started
3 | engine: julia
4 | aliases:
5 | - ../tutorials/docs-00-getting-started/index.html
6 | - ../index.html
7 | ---
8 |
9 | ```{julia}
10 | #| echo: false
11 | #| output: false
12 | using Pkg;
13 | Pkg.instantiate();
14 | ```
15 |
16 | ### Installation
17 |
18 | To use Turing, you need to install Julia first and then install Turing.
19 |
20 | You will need to install Julia 1.10 or greater, which you can get from [the official Julia website](http://julialang.org/downloads/).
21 |
22 | Turing is officially registered in the [Julia General package registry](https://github.com/JuliaRegistries/General), which means that you can install a stable version of Turing by running the following in the Julia REPL:
23 |
24 | ```{julia}
25 | #| eval: false
26 | #| output: false
27 | using Pkg
28 | Pkg.add("Turing")
29 | ```
30 |
31 | ### Supported versions and platforms
32 |
33 | Formally, we only run continuous integration tests on: (1) the minimum supported minor version (typically an LTS release), and (2) the latest minor version of Julia.
34 | We test on Linux (x64), macOS (Apple Silicon), and Windows (x64).
35 | The Turing developer team will prioritise fixing issues on these platforms and versions.
36 |
37 | If you run into a problem on a different version (e.g. an older patch release) or platform (e.g. 32-bit), please do feel free to [post an issue](https://github.com/TuringLang/Turing.jl/issues/new?template=01-bug-report.yml)!
38 | If we are able to help, we will try to fix it, but we cannot guarantee support for untested versions.
39 |
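If you do open an issue, it helps to include the output of `versioninfo()`, which reports your Julia version and platform:

```julia
# Print the Julia version and platform details (useful to include in bug reports).
using InteractiveUtils
versioninfo()
```
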
40 | ### Example usage
41 |
42 | First, we load the Turing and StatsPlots modules.
43 | The latter is required for visualising the results.
44 |
45 | ```{julia}
46 | using Turing
47 | using StatsPlots
48 | ```
49 |
50 | We then specify our model, which is a simple Gaussian model with unknown mean and variance.
51 | Models are defined as ordinary Julia functions, prefixed with the `@model` macro.
52 | Each statement inside closely resembles how the model would be defined with mathematical notation.
53 | Here, both `x` and `y` are observed values, and are therefore passed as function parameters.
54 | `m` and `s²` are the parameters to be inferred.
55 |
56 | ```{julia}
57 | @model function gdemo(x, y)
58 | s² ~ InverseGamma(2, 3)
59 | m ~ Normal(0, sqrt(s²))
60 | x ~ Normal(m, sqrt(s²))
61 | y ~ Normal(m, sqrt(s²))
62 | end
63 | ```
64 |
65 | Suppose we observe `x = 1.5` and `y = 2`, and want to infer the mean and variance.
66 | We can pass these data as arguments to the `gdemo` function, and run a sampler to collect the results.
67 | Here, we collect 1000 samples using the No U-Turn Sampler (NUTS) algorithm.
68 |
69 | ```{julia}
70 | chain = sample(gdemo(1.5, 2), NUTS(), 1000, progress=false)
71 | ```
72 |
73 | We can plot the results:
74 |
75 | ```{julia}
76 | plot(chain)
77 | ```
78 |
79 | and obtain summary statistics by indexing the chain:
80 |
81 | ```{julia}
82 | mean(chain[:m]), mean(chain[:s²])
83 | ```
84 |
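Because the chain is simply a collection of posterior samples, other summaries are easy to compute as well. For example, a 95% credible interval for `m` can be obtained from the sample quantiles (a small sketch using the `chain` from above):

```julia
# A 95% posterior credible interval for the mean parameter `m`.
using Statistics
quantile(vec(chain[:m]), [0.025, 0.975])
```
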
85 | ### Where to go next
86 |
87 | ::: {.callout-note title="Note on prerequisites"}
88 | Familiarity with Julia is assumed throughout the Turing documentation.
89 | If you are new to Julia, [Learning Julia](https://julialang.org/learning/) is a good starting point.
90 |
91 | The underlying theory of Bayesian machine learning is not explained in detail in this documentation.
92 | A thorough introduction to the field is [*Pattern Recognition and Machine Learning*](https://www.springer.com/us/book/9780387310732) (Bishop, 2006); an online version is available [here (PDF, 18.1 MB)](https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf).
93 | :::
94 |
95 | The next page on [Turing's core functionality]({{}}) explains the basic features of the Turing language.
96 | From there, you can either look at [worked examples of how different models are implemented in Turing]({{}}), or [specific tips and tricks that can help you get the most out of Turing]({{}}).
97 |
--------------------------------------------------------------------------------
/theming/styles.css:
--------------------------------------------------------------------------------
1 | .navbar a:hover {
2 | text-decoration: none;
3 | }
4 |
5 | .cell-output {
6 | border: 1px dashed;
7 | }
8 |
9 | .cell-bg {
10 | background-color: #f1f3f5;
11 | }
12 |
13 | .cell-output-stdout code {
14 |     word-break: break-word !important;
15 | white-space: pre-wrap !important;
16 | }
17 |
18 | .cell-output-display svg {
19 | height: fit-content;
20 | width: fit-content;
21 |
22 | &.mermaid-js {
23 | /* fit-content for mermaid diagrams makes them really small, so we
24 | * default to 100% */
25 | width: 100%;
26 | }
27 | }
28 |
29 | .cell-output-display img {
30 | max-width: 100%;
31 | max-height: 100%;
32 | object-fit: contain;
33 | }
34 |
35 | .nav-footer-center {
36 | display: flex;
37 | justify-content: center;
38 | }
39 |
40 | .dropdown-menu {
41 | text-align: center;
42 | min-width: 100px !important;
43 | border-radius: 5px;
44 | max-height: 250px;
45 | overflow: scroll;
46 | }
47 |
--------------------------------------------------------------------------------
/theming/theme-dark.scss:
--------------------------------------------------------------------------------
1 | /*-- scss:defaults --*/
2 | // Cosmo 5.3.3
3 | // Bootswatch
4 |
5 | $theme: "cosmo" !default;
6 |
7 | // Manually-added colors
8 |
9 | $background-nav: #192222;
10 | $background-body: #131818;
11 | $foreground: #1bb3ac;
12 | $links: #2aa198;
13 | $links-hover: #31dce6;
14 | $code-background-color: #172424;
15 | $li: #bcbcbc;
16 |
17 | // Quarto default colors
18 |
19 | $white: #ffffff !default;
20 | $gray-100: #f8f9fa !default;
21 | $gray-200: #e9ecef !default;
22 | $gray-300: #dee2e6 !default;
23 | $gray-400: #ced4da !default;
24 | $gray-500: #adb5bd !default;
25 | $gray-600: #868e96 !default;
26 | $gray-700: #495057 !default;
27 | $gray-800: #373a3c !default;
28 | $gray-900: #212529 !default;
29 | $black: #000000 !default;
30 |
31 | $indigo: #6610f2 !default;
32 | $purple: #613d7c !default;
33 | $pink: #e83e8c !default;
34 | $red: #ff0039 !default;
35 | $orange: #f0ad4e !default;
36 | $yellow: #ff7518 !default;
37 | $green: #3fb618 !default;
38 | $teal: #20c997 !default;
39 | $cyan: #9954bb !default;
40 |
41 | $primary: $links-hover !default;
42 | $secondary: $gray-800 !default;
43 | $success: $green !default;
44 | $info: $cyan !default;
45 | $warning: $yellow !default;
46 | $danger: $red !default;
47 | $light: $gray-100 !default;
48 | $dark: $gray-800 !default;
49 |
50 | $min-contrast-ratio: 2.6 !default;
51 |
52 | // Options
53 |
54 | $enable-rounded: false !default;
55 |
56 | // Fonts
57 |
58 | // stylelint-disable-next-line value-keyword-case
59 | $font-family-sans-serif: "Source Sans Pro", -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Helvetica Neue", Arial, sans-serif, "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol" !default;
60 | $headings-font-weight: 400 !default;
61 |
62 | // Tables
63 |
64 | $table-color: initial !default;
65 |
66 | // Alerts
67 |
68 | $alert-border-width: 0 !default;
69 |
70 | // Progress bars
71 |
72 | $progress-height: .5rem !default;
73 |
74 |
75 | // Custom tweaks for Quarto-Cosmo
76 |
77 | $navbar-bg: $background-nav;
78 | $navbar-fg: $foreground;
79 | $footer-bg: $background-nav;
80 | $footer-fg: $foreground;
81 | $body-color: $white;
82 | $body-bg: $background-body;
83 |
84 | a, pre code {
85 | color: $links !important;
86 | }
87 |
88 | pre {
89 | color: $foreground !important;
90 | }
91 | a:hover {
92 | color: $links-hover !important;
93 | }
94 |
95 | code, p code, ol code, li code, h1 code {
96 | background-color: $code-background-color !important;
97 | color: $links;
98 | }
99 |
100 | .cell, .anchored code {
101 | background-color: $code-background-color !important;
102 | color: $links;
103 | }
104 |
105 | div.sourceCode {
106 | background-color: $code-background-color !important;
107 | }
108 |
109 | li {
110 | color: $li !important;
111 | }
112 |
113 | .menu-text:hover {
114 | color: $links-hover !important;
115 | }
116 |
117 | p {
118 | color: $li !important;
119 | }
120 |
121 | .quarto-title-breadcrumbs .breadcrumb li:last-of-type a {
122 | color: #6c757d !important;
123 | }
124 |
125 | .ansi-bright-black-fg{
126 | color: $foreground !important;
127 | }
128 | ::selection {
129 | color: $links-hover;
130 | background: $background-nav;
131 | }
132 |
133 |
134 | .tooltip {
135 |   --bs-tooltip-color: #{$black} !important;
136 |   --bs-tooltip-bg: #{$white} !important;
137 | }
138 |
--------------------------------------------------------------------------------
/tutorials/bayesian-linear-regression/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: Bayesian Linear Regression
3 | engine: julia
4 | aliases:
5 | - ../05-linear-regression/index.html
6 | ---
7 |
8 | ```{julia}
9 | #| echo: false
10 | #| output: false
11 | using Pkg;
12 | Pkg.instantiate();
13 | ```
14 |
15 | Turing is powerful when applied to complex hierarchical models, but it can also be applied to common statistical procedures, like [linear regression](https://en.wikipedia.org/wiki/Linear_regression).
16 | This tutorial covers how to implement a linear regression model in Turing.
17 |
18 | ## Set Up
19 |
20 | We begin by importing all the necessary libraries.
21 |
22 | ```{julia}
23 | # Import Turing.
24 | using Turing
25 |
26 | # Package for loading the data set.
27 | using RDatasets
28 |
29 | # Package for visualization.
30 | using StatsPlots
31 |
32 | # Functionality for splitting the data.
33 | using MLUtils: splitobs
34 |
35 | # Functionality for constructing arrays with identical elements efficiently.
36 | using FillArrays
37 |
38 | # Functionality for normalizing the data and evaluating the model predictions.
39 | using StatsBase
40 |
41 | # Functionality for working with scaled identity matrices.
42 | using LinearAlgebra
43 |
44 | # Set a seed for reproducibility.
45 | using Random
46 | Random.seed!(0);
47 | ```
48 |
49 | ```{julia}
50 | #| output: false
51 | setprogress!(false)
52 | ```
53 |
54 | We will use the `mtcars` dataset from the [RDatasets](https://github.com/JuliaStats/RDatasets.jl) package.
55 | `mtcars` contains a variety of statistics on different car models, including their miles per gallon, number of cylinders, and horsepower, among others.
56 |
57 | We want to know if we can construct a Bayesian linear regression model to predict the miles per gallon of a car, given the other statistics it has.
58 | Let us take a look at the data we have.
59 |
60 | ```{julia}
61 | # Load the dataset.
62 | data = RDatasets.dataset("datasets", "mtcars")
63 |
64 | # Show the first six rows of the dataset.
65 | first(data, 6)
66 | ```
67 |
68 | ```{julia}
69 | size(data)
70 | ```
71 |
72 | The next step is to get our data ready for testing. We'll split the `mtcars` dataset into two subsets, one for training our model and one for evaluating our model. Then, we separate the targets we want to learn (`MPG`, in this case) and standardize the datasets by subtracting each column's mean and dividing by that column's standard deviation. The resulting data is not very familiar looking, but this standardization process helps the sampler converge far more easily.
73 |
74 | ```{julia}
75 | # Remove the model column.
76 | select!(data, Not(:Model))
77 |
78 | # Split our dataset 70%/30% into training/test sets.
79 | trainset, testset = map(DataFrame, splitobs(data; at=0.7, shuffle=true))
80 |
81 | # Turing requires data in matrix form.
82 | target = :MPG
83 | train = Matrix(select(trainset, Not(target)))
84 | test = Matrix(select(testset, Not(target)))
85 | train_target = trainset[:, target]
86 | test_target = testset[:, target]
87 |
88 | # Standardize the features.
89 | dt_features = fit(ZScoreTransform, train; dims=1)
90 | StatsBase.transform!(dt_features, train)
91 | StatsBase.transform!(dt_features, test)
92 |
93 | # Standardize the targets.
94 | dt_targets = fit(ZScoreTransform, train_target)
95 | StatsBase.transform!(dt_targets, train_target)
96 | StatsBase.transform!(dt_targets, test_target);
97 | ```
98 |
99 | ## Model Specification
100 |
101 | In a traditional frequentist model using [OLS](https://en.wikipedia.org/wiki/Ordinary_least_squares), our model might look like:
102 |
103 | $$
104 | \mathrm{MPG}_i = \alpha + \boldsymbol{\beta}^\mathsf{T}\boldsymbol{X_i}
105 | $$
106 |
107 | where $\boldsymbol{\beta}$ is a vector of coefficients and $\boldsymbol{X}$ is a vector of inputs for observation $i$. The Bayesian model we are more concerned with is the following:
108 |
109 | $$
110 | \mathrm{MPG}_i \sim \mathcal{N}(\alpha + \boldsymbol{\beta}^\mathsf{T}\boldsymbol{X_i}, \sigma^2)
111 | $$
112 |
113 | where $\alpha$ is an intercept term common to all observations, $\boldsymbol{\beta}$ is a coefficient vector, $\boldsymbol{X_i}$ is the observed data for car $i$, and $\sigma^2$ is a common variance term.
114 |
115 | For $\sigma^2$, we assign a prior of `truncated(Normal(0, 100); lower=0)`.
116 | This is consistent with [Andrew Gelman's recommendations](http://www.stat.columbia.edu/%7Egelman/research/published/taumain.pdf) on noninformative priors for variance.
117 | The intercept term ($\alpha$) is assumed to be normally distributed with a mean of zero and a variance of three.
118 | This represents our assumptions that miles per gallon can be explained mostly by our assorted variables, but a high variance term indicates our uncertainty about that.
119 | Each coefficient is assumed to be normally distributed with a mean of zero and a variance of 10.
120 | We do not know that our coefficients are different from zero, and we don't know which ones are likely to be the most important, so the variance term is quite high.
121 | Lastly, each observation $y_i$ is normally distributed with mean `mu`, given by $\alpha + \boldsymbol{\beta}^\mathsf{T}\boldsymbol{X_i}$, and variance $\sigma^2$.
122 |
123 | ```{julia}
124 | # Bayesian linear regression.
125 | @model function linear_regression(x, y)
126 | # Set variance prior.
127 | σ² ~ truncated(Normal(0, 100); lower=0)
128 |
129 | # Set intercept prior.
130 | intercept ~ Normal(0, sqrt(3))
131 |
132 | # Set the priors on our coefficients.
133 | nfeatures = size(x, 2)
134 | coefficients ~ MvNormal(Zeros(nfeatures), 10.0 * I)
135 |
136 | # Calculate all the mu terms.
137 | mu = intercept .+ x * coefficients
138 | return y ~ MvNormal(mu, σ² * I)
139 | end
140 | ```
141 |
142 | With our model specified, we can call the sampler. We will use the No U-Turn Sampler ([NUTS](https://turinglang.org/stable/docs/library/#Turing.Inference.NUTS)) here.
143 |
144 | ```{julia}
145 | model = linear_regression(train, train_target)
146 | chain = sample(model, NUTS(), 5_000)
147 | ```
148 |
149 | We can also check the densities and traces of the parameters visually using the `plot` functionality.
150 |
151 | ```{julia}
152 | plot(chain)
153 | ```
154 |
155 | It looks like all parameters have converged.
156 |
157 | ```{julia}
158 | #| echo: false
159 | let
160 | ess_df = ess(chain)
161 |     @assert minimum(ess_df[:, :ess]) > 500 "Minimum ESS: $(minimum(ess_df[:, :ess])) - not > 500"
162 | @assert mean(ess_df[:, :ess]) > 2_000 "Mean ESS: $(mean(ess_df[:, :ess])) - not > 2000"
163 | @assert maximum(ess_df[:, :ess]) > 3_500 "Maximum ESS: $(maximum(ess_df[:, :ess])) - not > 3500"
164 | end
165 | ```
166 |
167 | ## Comparing to OLS
168 |
169 | A satisfactory test of our model is to evaluate how well it predicts. Importantly, we want to compare our model to existing tools like OLS. The code below uses the [GLM.jl](https://juliastats.org/GLM.jl/stable/) package to generate a traditional OLS multiple regression model on the same data as our probabilistic model.
170 |
171 | ```{julia}
172 | # Import the GLM package.
173 | using GLM
174 |
175 | # Perform multiple regression OLS.
176 | train_with_intercept = hcat(ones(size(train, 1)), train)
177 | ols = lm(train_with_intercept, train_target)
178 |
179 | # Compute predictions on the training data set and unstandardize them.
180 | train_prediction_ols = GLM.predict(ols)
181 | StatsBase.reconstruct!(dt_targets, train_prediction_ols)
182 |
183 | # Compute predictions on the test data set and unstandardize them.
184 | test_with_intercept = hcat(ones(size(test, 1)), test)
185 | test_prediction_ols = GLM.predict(ols, test_with_intercept)
186 | StatsBase.reconstruct!(dt_targets, test_prediction_ols);
187 | ```
188 |
189 | The function below accepts a chain and an input matrix and calculates predictions. We use the samples of the model parameters in the chain starting with sample 200.
190 |
191 | ```{julia}
192 | # Make a prediction given an input vector.
193 | function prediction(chain, x)
194 | p = get_params(chain[200:end, :, :])
195 | targets = p.intercept' .+ x * reduce(hcat, p.coefficients)'
196 | return vec(mean(targets; dims=2))
197 | end
198 | ```
199 |
200 | When we make predictions, we unstandardize them so they are more understandable.
201 |
202 | ```{julia}
203 | # Calculate the predictions for the training and testing sets and unstandardize them.
204 | train_prediction_bayes = prediction(chain, train)
205 | StatsBase.reconstruct!(dt_targets, train_prediction_bayes)
206 | test_prediction_bayes = prediction(chain, test)
207 | StatsBase.reconstruct!(dt_targets, test_prediction_bayes)
208 |
209 | # Show the predictions on the test data set.
210 | DataFrame(; MPG=testset[!, target], Bayes=test_prediction_bayes, OLS=test_prediction_ols)
211 | ```
212 |
213 | Now let's evaluate the loss for each method, and each prediction set. We will use the mean squared error to evaluate loss, given by
214 | $$
215 | \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^n {(y_i - \hat{y_i})^2}
216 | $$
217 | where $y_i$ is the actual value (true MPG) and $\hat{y_i}$ is the predicted value using either OLS or Bayesian linear regression. A lower MSE indicates a closer fit to the data.
218 |
219 | ```{julia}
220 | println(
221 | "Training set:",
222 | "\n\tBayes loss: ",
223 | msd(train_prediction_bayes, trainset[!, target]),
224 | "\n\tOLS loss: ",
225 | msd(train_prediction_ols, trainset[!, target]),
226 | )
227 |
228 | println(
229 | "Test set:",
230 | "\n\tBayes loss: ",
231 | msd(test_prediction_bayes, testset[!, target]),
232 | "\n\tOLS loss: ",
233 | msd(test_prediction_ols, testset[!, target]),
234 | )
235 | ```
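
The `msd` function from StatsBase computes exactly the mean squared deviation in the formula above; as a quick sanity check, the same quantity can be computed by hand (a sketch using the variables defined above):

```julia
# Test-set MSE for the Bayesian predictions, computed directly from the formula.
mean((test_prediction_bayes .- testset[!, target]) .^ 2)
```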
236 |
237 | ```{julia}
238 | #| echo: false
239 | let
240 | bayes_train_loss = msd(train_prediction_bayes, trainset[!, target])
241 | bayes_test_loss = msd(test_prediction_bayes, testset[!, target])
242 | ols_train_loss = msd(train_prediction_ols, trainset[!, target])
243 | ols_test_loss = msd(test_prediction_ols, testset[!, target])
244 | @assert bayes_train_loss < bayes_test_loss "Bayesian training loss ($bayes_train_loss) >= Bayesian test loss ($bayes_test_loss)"
245 | @assert ols_train_loss < ols_test_loss "OLS training loss ($ols_train_loss) >= OLS test loss ($ols_test_loss)"
246 | @assert isapprox(bayes_train_loss, ols_train_loss; rtol=0.01) "Difference between Bayesian training loss ($bayes_train_loss) and OLS training loss ($ols_train_loss) unexpectedly large!"
247 | @assert isapprox(bayes_test_loss, ols_test_loss; rtol=0.05) "Difference between Bayesian test loss ($bayes_test_loss) and OLS test loss ($ols_test_loss) unexpectedly large!"
248 | end
249 | ```
250 |
251 | As we can see above, OLS and our Bayesian model fit our training and test data set about the same.
252 |
--------------------------------------------------------------------------------
/tutorials/bayesian-logistic-regression/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: Bayesian Logistic Regression
3 | engine: julia
4 | aliases:
5 | - ../02-logistic-regression/index.html
6 | ---
7 |
8 | ```{julia}
9 | #| echo: false
10 | #| output: false
11 | using Pkg;
12 | Pkg.instantiate();
13 | ```
14 |
15 | [Bayesian logistic regression](https://en.wikipedia.org/wiki/Logistic_regression#Bayesian) is the Bayesian counterpart to a common tool in machine learning, logistic regression.
16 | The goal of logistic regression is to predict a one or a zero for a given training item.
17 | An example might be predicting whether someone is sick or healthy given their symptoms and personal information.
18 |
19 | In our example, we'll be working to predict whether someone is likely to default on their debt, using a simulated dataset available via the `RDatasets` package. This dataset, `Default`, comes from R's [ISLR](https://cran.r-project.org/web/packages/ISLR/index.html) package and contains information on borrowers.
20 |
21 | To start, let's import all the libraries we'll need.
22 |
23 | ```{julia}
24 | # Import Turing and Distributions.
25 | using Turing, Distributions
26 |
27 | # Import RDatasets.
28 | using RDatasets
29 |
30 | # Import MCMCChains, Plots, and StatsPlots for visualizations and diagnostics.
31 | using MCMCChains, Plots, StatsPlots
32 |
33 | # We need a logistic function, which is provided by StatsFuns.
34 | using StatsFuns: logistic
35 |
36 | # Functionality for splitting and normalizing the data
37 | using MLDataUtils: shuffleobs, stratifiedobs, rescale!
38 |
39 | # Set a seed for reproducibility.
40 | using Random
41 | Random.seed!(0);
42 | ```
43 |
44 | ## Data Cleaning & Set Up
45 |
46 | Now we're going to import our dataset. The first six rows of the dataset are shown below so you can get a good feel for what kind of data we have.
47 |
48 | ```{julia}
49 | # Import the "Default" dataset.
50 | data = RDatasets.dataset("ISLR", "Default");
51 |
52 | # Show the first six rows of the dataset.
53 | first(data, 6)
54 | ```
55 |
56 | Most machine learning processes require some effort to tidy up the data, and this is no different. We need to convert the `Default` and `Student` columns, which contain "Yes" or "No", into 1s and 0s. Afterwards, we'll get rid of the old text-based columns.
57 |
58 | ```{julia}
59 | # Convert "Default" and "Student" to numeric values.
60 | data[!, :DefaultNum] = [r.Default == "Yes" ? 1.0 : 0.0 for r in eachrow(data)]
61 | data[!, :StudentNum] = [r.Student == "Yes" ? 1.0 : 0.0 for r in eachrow(data)]
62 |
63 | # Delete the old columns which say "Yes" and "No".
64 | select!(data, Not([:Default, :Student]))
65 |
66 | # Show the first six rows of our edited dataset.
67 | first(data, 6)
68 | ```
69 |
70 | After we've done that tidying, it's time to split our dataset into training and testing sets, and separate the labels from the data. We separate our data into two subsets, `train` and `test`. You can use a larger (or smaller) training fraction by modifying the `at = 0.05` argument. We have deliberately used only a 5% training sample to show the power of Bayesian inference with small sample sizes.
71 |
72 | We must rescale our variables so that they are centered around zero by subtracting each column's mean and dividing by its standard deviation. Without this step, Turing's sampler will have a hard time finding a place to start searching for parameter estimates. To do this we will leverage `MLDataUtils`, which also lets us effortlessly shuffle our observations and perform a stratified split to get a representative test set.
73 |
74 | ```{julia}
75 | function split_data(df, target; at=0.70)
76 | shuffled = shuffleobs(df)
77 | return trainset, testset = stratifiedobs(row -> row[target], shuffled; p=at)
78 | end
79 |
80 | features = [:StudentNum, :Balance, :Income]
81 | numerics = [:Balance, :Income]
82 | target = :DefaultNum
83 |
84 | trainset, testset = split_data(data, target; at=0.05)
85 | for feature in numerics
86 | μ, σ = rescale!(trainset[!, feature]; obsdim=1)
87 | rescale!(testset[!, feature], μ, σ; obsdim=1)
88 | end
89 |
90 | # Turing requires data in matrix form, not dataframe
91 | train = Matrix(trainset[:, features])
92 | test = Matrix(testset[:, features])
93 | train_label = trainset[:, target]
94 | test_label = testset[:, target];
95 | ```
96 |
97 | ## Model Declaration
98 |
99 | Finally, we can define our model.
100 |
101 | `logistic_regression` takes four arguments:
102 |
103 | - `x` is our set of independent variables;
104 | - `y` is the element we want to predict;
105 | - `n` is the number of observations we have; and
106 | - `σ` is the standard deviation we want to assume for our priors.
107 |
108 | Within the model, we create four coefficients (`intercept`, `student`, `balance`, and `income`) and assign each a normal prior with mean zero and standard deviation `σ`. We want to find values of these four coefficients that predict any given `y`.
109 |
110 | The `for` block creates a variable `v`, the output of the logistic function applied to the linear combination of the predictors. Each label `y[i]` is then modelled as a Bernoulli observation with success probability `v`.
111 |
112 | ```{julia}
113 | # Bayesian logistic regression (LR)
114 | @model function logistic_regression(x, y, n, σ)
115 | intercept ~ Normal(0, σ)
116 |
117 | student ~ Normal(0, σ)
118 | balance ~ Normal(0, σ)
119 | income ~ Normal(0, σ)
120 |
121 | for i in 1:n
122 | v = logistic(intercept + student * x[i, 1] + balance * x[i, 2] + income * x[i, 3])
123 | y[i] ~ Bernoulli(v)
124 | end
125 | end;
126 | ```
127 |
128 | ## Sampling
129 |
130 | Now we can run our sampler. This time we'll use [`NUTS`](https://turinglang.org/stable/docs/library/#Turing.Inference.NUTS) to sample from our posterior.
131 |
132 | ```{julia}
133 | #| output: false
134 | setprogress!(false)
135 | ```
136 |
137 | ```{julia}
138 | #| output: false
139 | # Retrieve the number of observations.
140 | n, _ = size(train)
141 |
142 | # Sample using NUTS.
143 | m = logistic_regression(train, train_label, n, 1)
144 | chain = sample(m, NUTS(), MCMCThreads(), 1_500, 3)
145 | ```
146 |
147 | ```{julia}
148 | #| echo: false
149 | chain
150 | ```
151 |
152 | ::: {.callout-warning collapse="true"}
153 | ## Sampling With Multiple Threads
154 | The `sample()` call above assumes that you have at least `nchains` threads available in your Julia instance. If you do not, the multiple chains
155 | will run sequentially, and you may notice a warning. For more information, see [the Turing documentation on sampling multiple chains.](https://turinglang.org/dev/docs/using-turing/guide/#sampling-multiple-chains)
156 | :::
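
You can check how many threads are available in your current session as follows:

```julia
# Number of threads available to this Julia session; start Julia with e.g.
# `julia --threads=4` (or set the JULIA_NUM_THREADS environment variable) to use more.
Threads.nthreads()
```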
157 |
158 | ```{julia}
159 | #| echo: false
160 | let
161 | mean_params = mean(chain)
162 | @assert mean_params[:student, :mean] < 0.1
163 | @assert mean_params[:balance, :mean] > 1
164 | end
165 | ```
166 |
167 | Since we ran multiple chains, we may as well do a spot check to make sure each chain converges around similar points.
168 |
169 | ```{julia}
170 | plot(chain)
171 | ```
172 |
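We can also compare the chains numerically; for example, the posterior means computed separately for each chain should agree closely:

```julia
# Posterior means computed per chain rather than pooled across chains.
mean(chain; append_chains=false)
```
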
173 | ```{julia}
174 | #| echo: false
175 | let
176 | mean_params = mapreduce(hcat, mean(chain; append_chains=false)) do df
177 | return df[:, :mean]
178 | end
179 | for i in (2, 3)
180 | @assert mean_params[:, i] != mean_params[:, 1]
181 | @assert isapprox(mean_params[:, i], mean_params[:, 1]; rtol=5e-2)
182 | end
183 | end
184 | ```
185 |
186 | Looks good!
187 |
188 | We can also use the `corner` function from MCMCChains to show the distributions of the various parameters of our logistic regression.
189 |
190 | ```{julia}
191 | # The labels to use.
192 | l = [:student, :balance, :income]
193 |
194 | # Use the corner function. Requires StatsPlots and MCMCChains.
195 | corner(chain, l)
196 | ```
197 |
198 | Fortunately, the corner plot appears to demonstrate unimodal distributions for each of our parameters, so it should be reasonable to use the mean of each parameter's sampled values when making predictions.
199 |
200 | ## Making Predictions
201 |
202 | How do we test how well the model actually predicts whether someone is likely to default? We need to build a prediction function that takes the `test` object we made earlier and runs it through the average parameter calculated during sampling.
203 |
204 | The `prediction` function below takes a `Matrix` and a `Chain` object. It takes the mean of each parameter's sampled values and re-runs the logistic function using those mean values for every element in the test set.
205 |
206 | ```{julia}
207 | function prediction(x::Matrix, chain, threshold)
208 | # Pull the means from each parameter's sampled values in the chain.
209 | intercept = mean(chain[:intercept])
210 | student = mean(chain[:student])
211 | balance = mean(chain[:balance])
212 | income = mean(chain[:income])
213 |
214 | # Retrieve the number of rows.
215 | n, _ = size(x)
216 |
217 | # Generate a vector to store our predictions.
218 | v = Vector{Float64}(undef, n)
219 |
220 | # Calculate the logistic function for each element in the test set.
221 | for i in 1:n
222 | num = logistic(
223 | intercept .+ student * x[i, 1] + balance * x[i, 2] + income * x[i, 3]
224 | )
225 | if num >= threshold
226 | v[i] = 1
227 | else
228 | v[i] = 0
229 | end
230 | end
231 | return v
232 | end;
233 | ```
234 |
235 | Let's see how we did! We run the test matrix through the prediction function, and compute the [mean squared error](https://en.wikipedia.org/wiki/Mean_squared_error) (MSE) for our prediction. The `threshold` variable sets the sensitivity of the predictions. For example, a threshold of 0.07 will predict a default (a value of 1) for any predicted probability greater than 0.07, and no default otherwise.
236 |
237 | ```{julia}
238 | # Set the prediction threshold.
239 | threshold = 0.07
240 |
241 | # Make the predictions.
242 | predictions = prediction(test, chain, threshold)
243 |
244 | # Calculate MSE for our test set.
245 | loss = sum((predictions - test_label) .^ 2) / length(test_label)
246 | ```
247 |
248 | Perhaps more important is to see what percentage of defaults we correctly predicted. The code below simply counts defaults and predictions and presents the results.
249 |
250 | ```{julia}
251 | defaults = sum(test_label)
252 | not_defaults = length(test_label) - defaults
253 |
254 | predicted_defaults = sum(test_label .== predictions .== 1)
255 | predicted_not_defaults = sum(test_label .== predictions .== 0)
256 |
257 | println("Defaults: $defaults
258 | Predictions: $predicted_defaults
259 | Percentage defaults correct $(predicted_defaults/defaults)")
260 |
261 | println("Not defaults: $not_defaults
262 | Predictions: $predicted_not_defaults
263 | Percentage non-defaults correct $(predicted_not_defaults/not_defaults)")
264 | ```
265 |
266 | ```{julia}
267 | #| echo: false
268 | let
269 | percentage_correct = predicted_defaults / defaults
270 | @assert 0.6 < percentage_correct
271 | end
272 | ```
273 |
274 | The above shows that with a threshold of 0.07, we correctly predict a respectable portion of the defaults, and correctly identify most non-defaults. This result is fairly sensitive to the choice of threshold, and you may wish to experiment with it.
275 |
276 | This tutorial has demonstrated how to use Turing to perform Bayesian logistic regression.
277 |
--------------------------------------------------------------------------------
/tutorials/bayesian-neural-networks/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: Bayesian Neural Networks
3 | engine: julia
4 | aliases:
5 | - ../03-bayesian-neural-network/index.html
6 | ---
7 |
8 | ```{julia}
9 | #| echo: false
10 | #| output: false
11 | using Pkg;
12 | Pkg.instantiate();
13 | ```
14 |
15 | In this tutorial, we demonstrate how one can implement a Bayesian Neural Network using a combination of Turing and [Lux](https://github.com/LuxDL/Lux.jl), a suite of machine learning tools. We will use Lux to specify the neural network's layers and Turing to implement the probabilistic inference, with the goal of implementing a classification algorithm.
16 |
17 | We will begin with importing the relevant libraries.
18 |
19 | ```{julia}
20 | using Turing
21 | using FillArrays
22 | using Lux
23 | using Plots
24 | import Mooncake
25 | using Functors
26 |
27 | using LinearAlgebra
28 | using Random
29 | ```
30 |
31 | Our goal here is to use a Bayesian neural network to classify points in an artificial dataset.
32 | The code below generates data points arranged in a box-like pattern and displays a graph of the dataset we will be working with.
33 |
34 | ```{julia}
35 | # Number of points to generate
36 | N = 80
37 | M = round(Int, N / 4)
38 | rng = Random.default_rng()
39 | Random.seed!(rng, 1234)
40 |
41 | # Generate artificial data
42 | x1s = rand(rng, Float32, M) * 4.5f0;
43 | x2s = rand(rng, Float32, M) * 4.5f0;
44 | xt1s = Array([[x1s[i] + 0.5f0; x2s[i] + 0.5f0] for i in 1:M])
45 | x1s = rand(rng, Float32, M) * 4.5f0;
46 | x2s = rand(rng, Float32, M) * 4.5f0;
47 | append!(xt1s, Array([[x1s[i] - 5.0f0; x2s[i] - 5.0f0] for i in 1:M]))
48 |
49 | x1s = rand(rng, Float32, M) * 4.5f0;
50 | x2s = rand(rng, Float32, M) * 4.5f0;
51 | xt0s = Array([[x1s[i] + 0.5f0; x2s[i] - 5.0f0] for i in 1:M])
52 | x1s = rand(rng, Float32, M) * 4.5f0;
53 | x2s = rand(rng, Float32, M) * 4.5f0;
54 | append!(xt0s, Array([[x1s[i] - 5.0f0; x2s[i] + 0.5f0] for i in 1:M]))
55 |
56 | # Store all the data for later
57 | xs = [xt1s; xt0s]
58 | ts = [ones(2 * M); zeros(2 * M)]
59 |
60 | # Plot data points.
61 | function plot_data()
62 | x1 = map(e -> e[1], xt1s)
63 | y1 = map(e -> e[2], xt1s)
64 | x2 = map(e -> e[1], xt0s)
65 | y2 = map(e -> e[2], xt0s)
66 |
67 | Plots.scatter(x1, y1; color="red", clim=(0, 1))
68 | return Plots.scatter!(x2, y2; color="blue", clim=(0, 1))
69 | end
70 |
71 | plot_data()
72 | ```
73 |
74 | ## Building a Neural Network
75 |
76 | The next step is to define a [feedforward neural network](https://en.wikipedia.org/wiki/Feedforward_neural_network) where we express our parameters as distributions, and not single points as with traditional neural networks.
77 | For this we will use `Dense` to define linear layers and compose them via `Chain`; both are neural network primitives from Lux.
78 | The network `nn_initial` we created has two hidden layers with `tanh` activations and one output layer with sigmoid (`σ`) activation, as shown below.
79 |
80 | ```{dot}
81 | //| echo: false
82 | graph G {
83 | rankdir=LR;
84 | nodesep=0.8;
85 | ranksep=0.8;
86 | node [shape=circle, fixedsize=true, width=0.8, style="filled", color=black, fillcolor="white", fontsize=12];
87 |
88 | // Input layer
89 | subgraph cluster_input {
90 | node [label=""];
91 | input1;
92 | input2;
93 | style="rounded"
94 | }
95 |
96 | // Hidden layers
97 | subgraph cluster_hidden1 {
98 | node [label=""];
99 | hidden11;
100 | hidden12;
101 | hidden13;
102 | style="rounded"
103 | }
104 |
105 | subgraph cluster_hidden2 {
106 | node [label=""];
107 | hidden21;
108 | hidden22;
109 | style="rounded"
110 | }
111 |
112 | // Output layer
113 | subgraph cluster_output {
114 | output1 [label=""];
115 | style="rounded"
116 | }
117 |
118 | // Connections from input to hidden layer 1
119 | input1 -- hidden11;
120 | input1 -- hidden12;
121 | input1 -- hidden13;
122 | input2 -- hidden11;
123 | input2 -- hidden12;
124 | input2 -- hidden13;
125 |
126 | // Connections from hidden layer 1 to hidden layer 2
127 | hidden11 -- hidden21;
128 | hidden11 -- hidden22;
129 | hidden12 -- hidden21;
130 | hidden12 -- hidden22;
131 | hidden13 -- hidden21;
132 | hidden13 -- hidden22;
133 |
134 | // Connections from hidden layer 2 to output
135 | hidden21 -- output1;
136 | hidden22 -- output1;
137 |
138 | // Labels
139 | labelloc="b";
140 | fontsize=17;
141 | label="Input layer Hidden layers Output layer";
142 | }
143 | ```
144 |
145 | `nn_initial` is a callable object: it takes data as input and outputs predictions.
146 | We will define distributions on the neural network parameters.
147 |
148 | ```{julia}
149 | # Construct a neural network using Lux
150 | nn_initial = Chain(Dense(2 => 3, tanh), Dense(3 => 2, tanh), Dense(2 => 1, σ))
151 |
152 | # Initialize the model weights and state
153 | ps, st = Lux.setup(rng, nn_initial)
154 |
155 | Lux.parameterlength(nn_initial) # number of parameters in NN
156 | ```
157 |
158 | The probabilistic model specification below creates a `parameters` variable whose entries are IID normal random variables. The `parameters` vector represents all parameters of our neural net (weights and biases).
159 |
160 | ```{julia}
161 | # Create a regularization term and a Gaussian prior variance term.
162 | alpha = 0.09
163 | sigma = sqrt(1.0 / alpha)
164 | ```
165 |
166 | We also define a function to construct a named tuple from a vector of sampled parameters.
167 | (We could use [`ComponentArrays`](https://github.com/jonniedie/ComponentArrays.jl) here and broadcast to avoid doing this, but this way avoids introducing an extra dependency.)
168 |
169 | ```{julia}
170 | function vector_to_parameters(ps_new::AbstractVector, ps::NamedTuple)
171 | @assert length(ps_new) == Lux.parameterlength(ps)
172 | i = 1
173 | function get_ps(x)
174 | z = reshape(view(ps_new, i:(i + length(x) - 1)), size(x))
175 | i += length(x)
176 | return z
177 | end
178 | return fmap(get_ps, ps)
179 | end
180 | ```
181 |
182 | To interface with external libraries it is often desirable to use the [`StatefulLuxLayer`](https://lux.csail.mit.edu/stable/api/Lux/utilities#Lux.StatefulLuxLayer) to automatically handle the neural network states.
183 |
184 | ```{julia}
185 | const nn = StatefulLuxLayer{true}(nn_initial, nothing, st)
186 |
187 | # Specify the probabilistic model.
188 | @model function bayes_nn(xs, ts; sigma = sigma, ps = ps, nn = nn)
189 | # Sample the parameters
190 | nparameters = Lux.parameterlength(nn_initial)
191 | parameters ~ MvNormal(zeros(nparameters), Diagonal(abs2.(sigma .* ones(nparameters))))
192 |
193 | # Forward NN to make predictions
194 | preds = Lux.apply(nn, xs, f32(vector_to_parameters(parameters, ps)))
195 |
196 | # Observe each prediction.
197 | for i in eachindex(ts)
198 | ts[i] ~ Bernoulli(preds[i])
199 | end
200 | end
201 | ```
202 |
203 | Inference can now be performed by calling `sample`. We use the `NUTS` Hamiltonian Monte Carlo sampler here.
204 |
205 | ```{julia}
206 | #| output: false
207 | setprogress!(false)
208 | ```
209 |
210 | ```{julia}
211 | # Perform inference.
212 | n_iters = 2_000
213 | ch = sample(bayes_nn(reduce(hcat, xs), ts), NUTS(; adtype=AutoMooncake(; config=nothing)), n_iters);
214 | ```
215 |
216 | Now we extract the parameter samples from the sampled chain as `θ` (this has one row per MCMC sample and one column for each of the `20` network parameters).
217 | We'll use these primarily to determine how good our model's classifier is.
218 |
219 | ```{julia}
220 | # Extract all weight and bias parameters.
221 | θ = MCMCChains.group(ch, :parameters).value;
222 | ```
223 |
224 | ## Prediction Visualization
225 |
226 | We can use [MAP estimation](https://en.wikipedia.org/wiki/Maximum_a_posteriori_estimation) to classify our population by using the set of weights that provided the highest log posterior.
227 |
228 | ```{julia}
229 | # A helper to run the nn through data `x` using parameters `θ`
230 | nn_forward(x, θ) = nn(x, vector_to_parameters(θ, ps))
231 |
232 | # Plot the data we have.
233 | fig = plot_data()
234 |
235 | # Find the index that provided the highest log posterior in the chain.
236 | _, i = findmax(ch[:lp])
237 |
238 | # Extract the max row value from i.
239 | i = i.I[1]
240 |
241 | # Plot the posterior distribution with a contour plot
242 | x1_range = collect(range(-6; stop=6, length=25))
243 | x2_range = collect(range(-6; stop=6, length=25))
244 | Z = [nn_forward([x1, x2], θ[i, :])[1] for x1 in x1_range, x2 in x2_range]
245 | contour!(x1_range, x2_range, Z; linewidth=3, colormap=:seaborn_bright)
246 | fig
247 | ```
248 |
249 | The contour plot above shows that the MAP method is not too bad at classifying our data.
250 |
251 | Now we can visualize our predictions.
252 |
253 | $$
254 | p(\tilde{x} | X, \alpha) = \int_{\theta} p(\tilde{x} | \theta) p(\theta | X, \alpha) \approx \sum_{\theta \sim p(\theta | X, \alpha)}f_{\theta}(\tilde{x})
255 | $$
256 |
257 | The `nn_predict` function takes the average predicted value from a network parameterized by weights drawn from the MCMC chain.
258 |
259 | ```{julia}
260 | # Return the average predicted value across
261 | # multiple weights.
262 | function nn_predict(x, θ, num)
263 | num = min(num, size(θ, 1)) # make sure num does not exceed the number of samples
264 | return mean([first(nn_forward(x, view(θ, i, :))) for i in 1:10:num])
265 | end
266 | ```
267 |
268 | Next, we use the `nn_predict` function to predict the value at a sample of points where the `x1` and `x2` coordinates range between -6 and 6. As we can see below, we still have a satisfactory fit to our data, and, more importantly, we can also see far more easily where the neural network is uncertain about its predictions: the regions between cluster boundaries.
269 |
270 | ```{julia}
271 | # Plot the average prediction.
272 | fig = plot_data()
273 |
274 | n_end = 1500
275 | x1_range = collect(range(-6; stop=6, length=25))
276 | x2_range = collect(range(-6; stop=6, length=25))
277 | Z = [nn_predict([x1, x2], θ, n_end)[1] for x1 in x1_range, x2 in x2_range]
278 | contour!(x1_range, x2_range, Z; linewidth=3, colormap=:seaborn_bright)
279 | fig
280 | ```
281 |
282 | Suppose we are interested in how the predictive power of our Bayesian neural network evolved between samples. In that case, the following animation shows the contour plot generated from the network weights in samples 1 to 500.
283 |
284 | ```{julia}
285 | # Number of iterations to plot.
286 | n_end = 500
287 |
288 | anim = @gif for i in 1:n_end
289 | plot_data()
290 | Z = [nn_forward([x1, x2], θ[i, :])[1] for x1 in x1_range, x2 in x2_range]
291 | contour!(x1_range, x2_range, Z; title="Iteration $i", clim=(0, 1))
292 | end every 5
293 | ```
294 |
295 | This has been an introduction to the applications of Turing and Lux in defining Bayesian neural networks.
296 |
--------------------------------------------------------------------------------
/tutorials/bayesian-poisson-regression/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: Bayesian Poisson Regression
3 | engine: julia
4 | aliases:
5 | - ../07-poisson-regression/index.html
6 | ---
7 |
8 | ```{julia}
9 | #| echo: false
10 | #| output: false
11 | using Pkg;
12 | Pkg.instantiate();
13 | ```
14 |
15 | This notebook is ported from the [example notebook](https://www.pymc.io/projects/examples/en/latest/generalized_linear_models/GLM-poisson-regression.html) of PyMC3 on Poisson Regression.
16 |
17 | [Poisson Regression](https://en.wikipedia.org/wiki/Poisson_regression) is a technique commonly used to model count data.
18 | Some of the applications include predicting the number of people defaulting on their loans or the number of cars running on a highway on a given day.
19 | This example describes a method to implement the Bayesian version of this technique using Turing.
20 |
21 | We will generate a dataset that describes the relationship between the number of times a person sneezes during the day and their alcohol consumption and medicinal intake.
22 |
23 | We start by importing the required libraries.
24 |
25 | ```{julia}
26 | #Import Turing, Distributions and DataFrames
27 | using Turing, Distributions, DataFrames, Distributed
28 |
29 | # Import MCMCChain, Plots, and StatsPlots for visualizations and diagnostics.
30 | using MCMCChains, Plots, StatsPlots
31 |
32 | # Set a seed for reproducibility.
33 | using Random
34 | Random.seed!(12);
35 | ```
36 |
37 | # Generating data
38 |
39 | We start off by creating a toy dataset. We take the case of a person who takes medicine to prevent excessive sneezing. Alcohol consumption increases the rate of sneezing for that person. Thus, the two factors affecting the number of sneezes in a given day are alcohol consumption and whether the person has taken his medicine. Both of these variables are boolean valued, while the number of sneezes is a count-valued variable. We also take into consideration that the interaction between the two boolean variables will affect the number of sneezes.
40 |
41 | Five random rows are printed from the generated data to give a sense of what it looks like.
42 |
43 | ```{julia}
44 | theta_noalcohol_meds = 1 # no alcohol, took medicine
45 | theta_alcohol_meds = 3 # alcohol, took medicine
46 | theta_noalcohol_nomeds = 6 # no alcohol, no medicine
47 | theta_alcohol_nomeds = 36 # alcohol, no medicine
48 |
49 | # no of samples for each of the above cases
50 | q = 100
51 |
52 | #Generate data from different Poisson distributions
53 | noalcohol_meds = Poisson(theta_noalcohol_meds)
54 | alcohol_meds = Poisson(theta_alcohol_meds)
55 | noalcohol_nomeds = Poisson(theta_noalcohol_nomeds)
56 | alcohol_nomeds = Poisson(theta_alcohol_nomeds)
57 |
58 | nsneeze_data = vcat(
59 | rand(noalcohol_meds, q),
60 | rand(alcohol_meds, q),
61 | rand(noalcohol_nomeds, q),
62 | rand(alcohol_nomeds, q),
63 | )
64 | alcohol_data = vcat(zeros(q), ones(q), zeros(q), ones(q))
65 | meds_data = vcat(zeros(q), zeros(q), ones(q), ones(q))
66 |
67 | df = DataFrame(;
68 | nsneeze=nsneeze_data,
69 | alcohol_taken=alcohol_data,
70 | nomeds_taken=meds_data,
71 | product_alcohol_meds=meds_data .* alcohol_data,
72 | )
73 | df[sample(1:nrow(df), 5; replace=false), :]
74 | ```
75 |
76 | # Visualisation of the dataset
77 |
78 | We plot the distribution of the number of sneezes for the 4 different cases taken above. As expected, the person sneezes the most when he has taken alcohol and not taken his medicine. He sneezes the least when he doesn't consume alcohol and takes his medicine.
79 |
80 | ```{julia}
81 | # Data Plotting
82 |
83 | p1 = Plots.histogram(
84 | df[(df[:, :alcohol_taken] .== 0) .& (df[:, :nomeds_taken] .== 0), 1];
85 | title="no_alcohol+meds",
86 | )
87 | p2 = Plots.histogram(
88 | (df[(df[:, :alcohol_taken] .== 1) .& (df[:, :nomeds_taken] .== 0), 1]);
89 | title="alcohol+meds",
90 | )
91 | p3 = Plots.histogram(
92 | (df[(df[:, :alcohol_taken] .== 0) .& (df[:, :nomeds_taken] .== 1), 1]);
93 | title="no_alcohol+no_meds",
94 | )
95 | p4 = Plots.histogram(
96 | (df[(df[:, :alcohol_taken] .== 1) .& (df[:, :nomeds_taken] .== 1), 1]);
97 | title="alcohol+no_meds",
98 | )
99 | plot(p1, p2, p3, p4; layout=(2, 2), legend=false)
100 | ```
101 |
102 | We must convert our `DataFrame` data into `Matrix` form, as the manipulations that we are about to perform are designed to work with `Matrix` data. We also separate the features from the labels, which will later be used by the Turing sampler to generate samples from the posterior.
103 |
104 | ```{julia}
105 | # Convert the DataFrame object to matrices.
106 | data = Matrix(df[:, [:alcohol_taken, :nomeds_taken, :product_alcohol_meds]])
107 | data_labels = df[:, :nsneeze]
108 | data
109 | ```
110 |
111 | We must recenter our data about 0 to help the Turing sampler initialise the parameter estimates. So we normalise each column of the data by subtracting its mean and dividing by its standard deviation:
112 |
113 | ```{julia}
114 | # Rescale our matrices.
115 | data = (data .- mean(data; dims=1)) ./ std(data; dims=1)
116 | ```
117 |
118 | # Declaring the Model: Poisson Regression
119 |
120 | Our model, `poisson_regression` takes four arguments:
121 |
122 | - `x` is our set of independent variables;
123 | - `y` is the element we want to predict;
124 | - `n` is the number of observations we have; and
125 | - `σ²` is the standard deviation we want to assume for our priors.
126 |
127 | Within the model, we create four coefficients (`b0`, `b1`, `b2`, and `b3`) and assign each a normal prior with mean zero and standard deviation `σ²`. We want to find values of these four coefficients to predict any given `y`.
128 |
129 | Intuitively, we can think of the coefficients as:
130 |
131 | - `b1` is the coefficient which represents the effect of taking alcohol on the number of sneezes;
132 | - `b2` is the coefficient which represents the effect of taking in no medicines on the number of sneezes;
133 | - `b3` is the coefficient which represents the effect of interaction between taking alcohol and no medicine on the number of sneezes;
134 |
135 | The `for` block creates a variable `theta`, which is the weighted combination of the input features. We have defined the priors on these weights above. Each count `y[i]` is then modelled as a Poisson observation with rate `exp(theta)`.
136 |
137 | ```{julia}
138 | # Bayesian Poisson regression
139 | @model function poisson_regression(x, y, n, σ²)
140 | b0 ~ Normal(0, σ²)
141 | b1 ~ Normal(0, σ²)
142 | b2 ~ Normal(0, σ²)
143 | b3 ~ Normal(0, σ²)
144 | for i in 1:n
145 | theta = b0 + b1 * x[i, 1] + b2 * x[i, 2] + b3 * x[i, 3]
146 | y[i] ~ Poisson(exp(theta))
147 | end
148 | end;
149 | ```
150 |
151 | # Sampling from the posterior
152 |
153 | We use the `NUTS` sampler to sample values from the posterior. We run multiple chains using the `MCMCThreads()` function so that a single problematic chain does not go unnoticed. We then use the Gelman, Rubin, and Brooks diagnostic to check the convergence of these multiple chains.
154 |
155 | ```{julia}
156 | #| output: false
157 | # Retrieve the number of observations.
158 | n, _ = size(data)
159 |
160 | # Sample using NUTS.
161 |
162 | num_chains = 4
163 | m = poisson_regression(data, data_labels, n, 10)
164 | chain = sample(m, NUTS(), MCMCThreads(), 2_500, num_chains; discard_adapt=false, progress=false)
165 | ```
166 |
167 | ```{julia}
168 | #| echo: false
169 | chain
170 | ```
171 |
172 | ::: {.callout-warning collapse="true"}
173 | ## Sampling With Multiple Threads
174 | The `sample()` call above assumes that you have at least `nchains` threads available in your Julia instance. If you do not, the multiple chains
175 | will run sequentially, and you may notice a warning. For more information, see [the Turing documentation on sampling multiple chains.](https://turinglang.org/dev/docs/using-turing/guide/#sampling-multiple-chains)
176 | :::
177 |
178 | # Viewing the Diagnostics
179 |
180 | We use the Gelman, Rubin, and Brooks diagnostic to check whether our chains have converged. Note that this diagnostic requires multiple chains, as it analyses the differences between them.
181 |
182 | We expect the chains to have converged, because we have run a sufficient number of iterations (2,500) with the NUTS sampler. If the diagnostic were to indicate otherwise, we would have to run more iterations, resulting in longer computation time.
183 |
184 | ```{julia}
185 | gelmandiag(chain)
186 | ```
187 |
188 | From the above diagnostic, we can conclude that the chains have converged because the PSRF values of the coefficients are close to 1.
189 |
190 | So, we have obtained the posterior distributions of the parameters. We transform the coefficients and recover the theta values by exponentiating the mean values of the coefficients `b0`, `b1`, `b2` and `b3`. We exponentiate the means to make the relative magnitudes of the coefficients easier to compare. We then compare this with the intuitive meaning that was described earlier.
191 |
192 | ```{julia}
193 | # Taking the first chain
194 | c1 = chain[:, :, 1]
195 |
196 | # Calculating the exponentiated means
197 | b0_exp = exp(mean(c1[:b0]))
198 | b1_exp = exp(mean(c1[:b1]))
199 | b2_exp = exp(mean(c1[:b2]))
200 | b3_exp = exp(mean(c1[:b3]))
201 |
202 | print("The exponentiated mean values of the weights (or coefficients) are: \n")
203 | println("b0: ", b0_exp)
204 | println("b1: ", b1_exp)
205 | println("b2: ", b2_exp)
206 | println("b3: ", b3_exp)
207 | print("The posterior distributions obtained after sampling can be visualised as follows:\n")
208 | ```
209 |
210 | Visualising the posterior by plotting it:
211 |
212 | ```{julia}
213 | plot(chain)
214 | ```
215 |
216 | # Interpreting the Obtained Mean Values
217 |
218 | The exponentiated mean of the coefficient `b1` is roughly half of that of `b2`. This makes sense because in the data that we generated, the number of sneezes was more sensitive to the medicinal intake as compared to the alcohol consumption. We also get a weaker dependence on the interaction between the alcohol consumption and the medicinal intake as can be seen from the value of `b3`.
219 |
220 | # Removing the Warmup Samples
221 |
222 | As can be seen from the plots above, the parameters converge to their final distributions after a few iterations.
223 | The initial values during the warmup phase increase the standard deviations of the parameters and are not required after we get the desired distributions.
224 | Thus, we remove these warmup values and once again view the diagnostics.
225 | To remove these warmup values, we simply discard the first 200 samples from each chain.
226 | (The plots above suggest that the first couple of hundred iterations are enough to cover this warmup phase.)
227 |
228 | ```{julia}
229 | chains_new = chain[201:end, :, :]
230 | ```
231 |
232 | ```{julia}
233 | plot(chains_new)
234 | ```
235 |
236 | As can be seen from the numeric values and the plots above, the standard deviation values have decreased and all the plotted values are from the estimated posteriors. The exponentiated mean values, with the warmup samples removed, have not changed by much and they are still in accordance with their intuitive meanings as described earlier.
237 |
--------------------------------------------------------------------------------
/tutorials/coin-flipping/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Introduction: Coin Flipping"
3 | engine: julia
4 | aliases:
5 | - ../00-introduction/index.html
6 | - ../00-introduction/
7 | ---
8 |
9 | ```{julia}
10 | #| echo: false
11 | #| output: false
12 | using Pkg;
13 | Pkg.instantiate();
14 | ```
15 |
16 | This is the first of a series of guided tutorials on the Turing language.
17 | In this tutorial, we will use Bayesian inference to estimate the probability that a coin flip will result in heads, given a series of observations.
18 |
19 | ### Setup
20 |
21 | First, let us load some packages that we need to simulate a coin flip:
22 |
23 | ```{julia}
24 | using Distributions
25 |
26 | using Random
27 | Random.seed!(12); # Set seed for reproducibility
28 | ```
29 |
30 | and to visualize our results.
31 |
32 | ```{julia}
33 | using StatsPlots
34 | ```
35 |
36 | Note that Turing is not loaded here, as this first part of the tutorial does not use it.
37 | Next, we configure the data generating model. Let us set the true probability that a coin flip turns up heads
38 |
39 | ```{julia}
40 | p_true = 0.5;
41 | ```
42 |
43 | and set the number of coin flips we will show our model.
44 |
45 | ```{julia}
46 | N = 100;
47 | ```
48 |
49 | We simulate `N` coin flips by drawing N random samples from the Bernoulli distribution with success probability `p_true`. The draws are collected in a variable called `data`:
50 |
51 | ```{julia}
52 | data = rand(Bernoulli(p_true), N);
53 | ```
54 |
55 | Here are the first five coin flips:
56 |
57 | ```{julia}
58 | data[1:5]
59 | ```
60 |
61 |
62 | ### Coin Flipping Without Turing
63 |
64 | The following example illustrates the effect of updating our beliefs with every piece of new evidence we observe.
65 |
66 | Assume that we are unsure about the probability of heads in a coin flip. To get an intuitive understanding of what "updating our beliefs" means, we will visualize the probability of heads in a coin flip after each piece of observed evidence.
67 |
68 | We begin by specifying a prior belief about the distribution of heads and tails in a coin toss. Here we choose a [Beta](https://en.wikipedia.org/wiki/Beta_distribution) distribution as prior distribution for the probability of heads. Before any coin flip is observed, we assume a uniform distribution $\operatorname{U}(0, 1) = \operatorname{Beta}(1, 1)$ of the probability of heads. I.e., every probability is equally likely initially.
69 |
70 | ```{julia}
71 | prior_belief = Beta(1, 1);
72 | ```
73 |
74 | With our priors set and our data at hand, we can perform Bayesian inference.
75 |
76 | This is a fairly simple process. We expose one additional coin flip to our model every iteration, such that the first run only sees the first coin flip, while the last iteration sees all the coin flips. In each iteration we update our belief to an updated version of the original Beta distribution that accounts for the new proportion of heads and tails. The update is particularly simple since our prior distribution is a [conjugate prior](https://en.wikipedia.org/wiki/Conjugate_prior). Note that a closed-form expression for the posterior (implemented in the `updated_belief` expression below) is not accessible in general and usually does not exist for more interesting models.
77 |
78 | ```{julia}
79 | function updated_belief(prior_belief::Beta, data::AbstractArray{Bool})
80 | # Count the number of heads and tails.
81 | heads = sum(data)
82 | tails = length(data) - heads
83 |
84 | # Update our prior belief in closed form (this is possible because we use a conjugate prior).
85 | return Beta(prior_belief.α + heads, prior_belief.β + tails)
86 | end
87 |
88 | # Show updated belief for increasing number of observations
89 | @gif for n in 0:N
90 | plot(
91 | updated_belief(prior_belief, data[1:n]);
92 | size=(500, 250),
93 | title="Updated belief after $n observations",
94 | xlabel="probability of heads",
95 | ylabel="",
96 | legend=nothing,
97 | xlim=(0, 1),
98 | fill=0,
99 | α=0.3,
100 | w=3,
101 | )
102 | vline!([p_true])
103 | end
104 | ```
105 |
106 | The animation above shows that with increasing evidence our belief about the probability of heads in a coin flip slowly adjusts towards the true value.
107 | The orange line in the animation represents the true probability of seeing heads on a single coin flip, while the mode of the distribution shows what the model believes the probability of a heads is given the evidence it has seen.
108 |
109 | For the mathematically inclined, the $\operatorname{Beta}$ distribution is updated by adding each observed heads to its parameter $\alpha$ and each observed tails to its parameter $\beta$.
110 | Initially, the parameters are defined as $\alpha = 1$ and $\beta = 1$.
111 | Over time, with more and more coin flips, $\alpha$ and $\beta$ will be approximately equal to each other as we are equally likely to flip a heads or a tails.
112 |
113 | The mean of the $\operatorname{Beta}(\alpha, \beta)$ distribution is
114 |
115 | $$\operatorname{E}[X] = \dfrac{\alpha}{\alpha+\beta}.$$
116 |
117 | This implies that the plot of the distribution will become centered around 0.5 for a large enough number of coin flips, as we expect $\alpha \approx \beta$.
118 |
119 | The variance of the $\operatorname{Beta}(\alpha, \beta)$ distribution is
120 |
121 | $$\operatorname{var}[X] = \dfrac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}.$$
122 |
123 | Thus the variance of the distribution will approach 0 as more and more samples arrive, since the denominator grows faster than the numerator.
124 | In other words, more samples means less variance.
125 | This implies that the distribution will reflect less and less uncertainty about the probability of heads, and the plot will become more tightly centered around 0.5 for a large enough number of coin flips.
126 |
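As a quick numerical check (a small sketch reusing the `updated_belief` function and the simulated `data` from above), we can compare the mean and variance of the prior with those of the fully updated belief:

```{julia}
# Mean and variance of the prior and of the belief after all N coin flips.
posterior_belief = updated_belief(prior_belief, data)
(mean(prior_belief), var(prior_belief)), (mean(posterior_belief), var(posterior_belief))
```

The posterior mean should be close to `p_true`, and its variance should be far smaller than that of the prior.
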
127 | ### Coin Flipping With Turing
128 |
129 | We now move away from the closed-form expression above.
130 | We use **Turing** to specify the same model and to approximate the posterior distribution with samples.
131 | To do so, we first need to load `Turing`.
132 |
133 | ```{julia}
134 | using Turing
135 | ```
136 |
137 | Additionally, we load `MCMCChains`, a library for analyzing and visualizing the samples with which we approximate the posterior distribution.
138 |
139 | ```{julia}
140 | using MCMCChains
141 | ```
142 |
143 | First, we define the coin-flip model using Turing.
144 |
145 | ```{julia}
146 | # Unconditioned coinflip model with `N` observations.
147 | @model function coinflip(; N::Int)
148 | # Our prior belief about the probability of heads in a coin toss.
149 | p ~ Beta(1, 1)
150 |
151 | # Heads or tails of a coin are drawn from `N` independent and identically
152 | # distributed Bernoulli distributions with success rate `p`.
153 | y ~ filldist(Bernoulli(p), N)
154 |
155 | return y
156 | end;
157 | ```
158 |
159 | In the Turing model the prior distribution of the variable `p`, the probability of heads in a coin toss, and the distribution of the observations `y` are specified on the right-hand side of the `~` expressions.
160 | The `@model` macro modifies the body of the Julia function `coinflip` and, e.g., replaces the `~` statements with internal function calls that are used for sampling.
161 |
162 | Here we defined a model that is not conditioned on any specific observations as this allows us to easily obtain samples of both `p` and `y` with
163 |
164 | ```{julia}
165 | rand(coinflip(; N))
166 | ```
167 |
168 | The model can be conditioned on some observations with `|`.
169 | See the [documentation of the `condition` syntax](https://turinglang.github.io/DynamicPPL.jl/stable/api/#Condition-and-decondition) in `DynamicPPL.jl` for more details.
170 | In the conditioned `model` the observations `y` are fixed to `data`.
171 |
172 | ```{julia}
173 | coinflip(y::AbstractVector{<:Real}) = coinflip(; N=length(y)) | (; y)
174 |
175 | model = coinflip(data);
176 | ```
177 |
178 | After defining the model, we can approximate the posterior distribution by drawing samples from the distribution.
179 | In this example, we use a [Hamiltonian Monte Carlo](https://en.wikipedia.org/wiki/Hamiltonian_Monte_Carlo) sampler to draw these samples.
180 | Other tutorials give more information on the samplers available in Turing and discuss their use for different models.
181 |
182 | ```{julia}
183 | sampler = NUTS();
184 | ```
185 |
186 | We approximate the posterior distribution with 2000 samples:
187 |
188 | ```{julia}
189 | chain = sample(model, sampler, 2_000, progress=false);
190 | ```
191 |
192 | The `sample` function and common keyword arguments are explained more extensively in the documentation of [AbstractMCMC.jl](https://turinglang.github.io/AbstractMCMC.jl/dev/api/).
193 |
194 | After finishing the sampling process, we can visually compare the closed-form posterior distribution with the approximation obtained with Turing.
195 |
196 | ```{julia}
197 | histogram(chain)
198 | ```
199 |
200 | Now we can build our plot:
201 |
202 | ```{julia}
203 | #| echo: false
204 | @assert isapprox(mean(chain, :p), 0.5; atol=0.1) "Estimated mean of parameter p: $(mean(chain, :p)) - not in [0.4, 0.6]!"
205 | ```
206 |
207 | ```{julia}
208 | # Visualize a blue density plot of the approximate posterior distribution using HMC (see Chain 1 in the legend).
209 | density(chain; xlim=(0, 1), legend=:best, w=2, c=:blue)
210 |
211 | # Visualize a green density plot of the posterior distribution in closed-form.
212 | plot!(
213 | 0:0.01:1,
214 | pdf.(updated_belief(prior_belief, data), 0:0.01:1);
215 | xlabel="probability of heads",
216 | ylabel="",
217 | title="",
218 | xlim=(0, 1),
219 | label="Closed-form",
220 | fill=0,
221 | α=0.3,
222 | w=3,
223 | c=:lightgreen,
224 | )
225 |
226 | # Visualize the true probability of heads in red.
227 | vline!([p_true]; label="True probability", c=:red)
228 | ```
229 |
230 | As we can see, the samples obtained with Turing closely approximate the true posterior distribution.
231 | Hopefully this tutorial has provided an easy-to-follow, yet informative introduction to Turing's simpler applications.
232 | More advanced usage is demonstrated in other tutorials.
233 |
--------------------------------------------------------------------------------
/tutorials/gaussian-process-latent-variable-models/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: Gaussian Process Latent Variable Models
3 | engine: julia
4 | aliases:
5 | - ../12-gplvm/index.html
6 | ---
7 |
8 | ```{julia}
9 | #| echo: false
10 | #| output: false
11 | using Pkg;
12 | Pkg.instantiate();
13 | ```
14 |
15 | In a previous tutorial, we have discussed latent variable models, in particular probabilistic principal component analysis (pPCA).
16 | Here, we show how we can extend the mapping provided by pPCA to non-linear mappings between input and output.
17 | For more details about the Gaussian Process Latent Variable Model (GPLVM),
18 | we refer the reader to the [original publication](https://jmlr.org/papers/v6/lawrence05a.html) and a [further extension](http://proceedings.mlr.press/v9/titsias10a/titsias10a.pdf).
19 |
20 | In short, the GPLVM is a dimensionality reduction technique that allows us to embed a high-dimensional dataset in a lower-dimensional space.
21 | Importantly, it has the advantage that the mapping from the latent (embedded) space to the data space, which is linear in pPCA, can be made non-linear through the use of Gaussian Processes.
22 |
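Schematically, and following the references above (a sketch only; the models below add a few extra details such as ARD kernels and a mean parameter), the GPLVM assumes a latent point $z_n$ for each observation and an independent GP-distributed mapping for each observed dimension $d$:

$$
\begin{aligned}
z_n &\sim \mathrm{Normal}(0, I) \\
f_d &\sim \operatorname{GP}(0, k) \\
y_{n,d} &= f_d(z_n) + \epsilon_{n,d}, \qquad \epsilon_{n,d} \sim \mathrm{Normal}(0, \sigma^2_{\text{noise}})
\end{aligned}
$$

Choosing a linear kernel $k$ essentially recovers pPCA, while a non-linear kernel such as the squared exponential yields a non-linear embedding.
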
23 | ### Let's start by loading some dependencies.
24 |
25 | ```{julia}
26 | #| eval: false
27 | using Turing
28 | using AbstractGPs
29 | using FillArrays
30 | using LaTeXStrings
31 | using Plots
32 | using RDatasets
33 | using ReverseDiff
34 | using StatsBase
35 |
36 | using LinearAlgebra
37 | using Random
38 |
39 | Random.seed!(1789);
40 | ```
41 |
42 | We demonstrate the GPLVM with a very small dataset: [Fisher's Iris data set](https://en.wikipedia.org/wiki/Iris_flower_data_set).
43 | This is mostly for reasons of run time, so the tutorial can be run quickly.
44 | As you will see, one of the major drawbacks of using GPs is their speed,
45 | although this is an active area of research.
46 | We will briefly touch on some ways to speed things up at the end of this tutorial.
47 | We transform the original data with non-linear operations in order to demonstrate the power of GPs to work on non-linear relationships, while keeping the problem reasonably small.
48 |
49 | ```{julia}
50 | #| eval: false
51 | data = dataset("datasets", "iris")
52 | species = data[!, "Species"]
53 | index = shuffle(1:150)
54 | # we extract the four measured quantities,
55 | # so the dimension of the data is only d=4 for this toy example
56 | dat = Matrix(data[index, 1:4])
57 | labels = data[index, "Species"]
58 |
59 | # non-linearize data to demonstrate ability of GPs to deal with non-linearity
60 | dat[:, 1] = 0.5 * dat[:, 1] .^ 2 + 0.1 * dat[:, 1] .^ 3
61 | dat[:, 2] = dat[:, 2] .^ 3 + 0.2 * dat[:, 2] .^ 4
62 | dat[:, 3] = 0.1 * exp.(dat[:, 3]) - 0.2 * dat[:, 3] .^ 2
63 | dat[:, 4] = 0.5 * log.(dat[:, 4]) .^ 2 + 0.01 * dat[:, 3] .^ 5
64 |
65 | # normalize data
66 | dt = fit(ZScoreTransform, dat; dims=1);
67 | StatsBase.transform!(dt, dat);
68 | ```
69 |
70 | We will start out by demonstrating the basic similarity between pPCA (see the tutorial on this topic) and the GPLVM model.
71 | Indeed, pPCA is basically equivalent to running the GPLVM model with an automatic relevance determination (ARD) linear kernel.
72 |
73 | First, we re-introduce the pPCA model (see the tutorial on pPCA for details)
74 |
75 | ```{julia}
76 | #| eval: false
77 | @model function pPCA(x)
78 | # Dimensionality of the problem.
79 | N, D = size(x)
80 | # latent variable z
81 | z ~ filldist(Normal(), D, N)
82 | # weights/loadings W
83 | w ~ filldist(Normal(), D, D)
84 | mu = (w * z)'
85 | for d in 1:D
86 | x[:, d] ~ MvNormal(mu[:, d], I)
87 | end
88 | return nothing
89 | end;
90 | ```
91 |
92 | We define two different kernels, a simple linear kernel with an Automatic Relevance Determination transform and a
93 | squared exponential kernel.
94 |
95 |
96 | ```{julia}
97 | #| eval: false
98 | linear_kernel(α) = LinearKernel() ∘ ARDTransform(α)
99 | sekernel(α, σ) = σ * SqExponentialKernel() ∘ ARDTransform(α);
100 | ```
101 |
102 | And here is the GPLVM model.
103 | We create separate models for the two types of kernel.
104 |
105 | ```{julia}
106 | #| eval: false
107 | @model function GPLVM_linear(Y, K)
108 | # Dimensionality of the problem.
109 | N, D = size(Y)
110 | # K is the dimension of the latent space
111 | @assert K <= D
112 | noise = 1e-3
113 |
114 | # Priors
115 | α ~ MvLogNormal(MvNormal(Zeros(K), I))
116 | Z ~ filldist(Normal(), K, N)
117 | mu ~ filldist(Normal(), N)
118 |
119 | gp = GP(linear_kernel(α))
120 | gpz = gp(ColVecs(Z), noise)
121 | Y ~ filldist(MvNormal(mu, cov(gpz)), D)
122 |
123 | return nothing
124 | end;
125 |
126 | @model function GPLVM(Y, K)
127 | # Dimensionality of the problem.
128 | N, D = size(Y)
129 | # K is the dimension of the latent space
130 | @assert K <= D
131 | noise = 1e-3
132 |
133 | # Priors
134 | α ~ MvLogNormal(MvNormal(Zeros(K), I))
135 | σ ~ LogNormal(0.0, 1.0)
136 | Z ~ filldist(Normal(), K, N)
137 | mu ~ filldist(Normal(), N)
138 |
139 | gp = GP(sekernel(α, σ))
140 | gpz = gp(ColVecs(Z), noise)
141 | Y ~ filldist(MvNormal(mu, cov(gpz)), D)
142 |
143 | return nothing
144 | end;
145 | ```
146 |
147 | ```{julia}
148 | #| eval: false
149 | # Standard GPs don't scale very well in n, so we use a small subsample for the purpose of this tutorial
150 | n_data = 40
151 | # number of features to use from dataset
152 | n_features = 4
153 | # latent dimension for GP case
154 | ndim = 4;
155 | ```
156 |
157 | ```{julia}
158 | #| eval: false
159 | ppca = pPCA(dat[1:n_data, 1:n_features])
160 | chain_ppca = sample(ppca, NUTS(; adtype=AutoReverseDiff(; compile=true)), 1000);
161 | ```
162 |
163 | ```{julia}
164 | #| eval: false
165 | # we extract the posterior mean estimates of the parameters from the chain
166 | z_mean = reshape(mean(group(chain_ppca, :z))[:, 2], (n_features, n_data))
167 | scatter(z_mean[1, :], z_mean[2, :]; group=labels[1:n_data], xlabel=L"z_1", ylabel=L"z_2")
168 | ```
169 |
170 | We can see that the pPCA fails to distinguish the groups.
171 | In particular, the `setosa` species is not clearly separated from `versicolor` and `virginica`.
172 | This is due to the non-linearities that we introduced, as without them the two groups can be clearly distinguished
173 | using pPCA (see the pPCA tutorial).
174 |
175 | Let's try the same with our linear kernel GPLVM model.
176 |
177 | ```{julia}
178 | #| eval: false
179 | gplvm_linear = GPLVM_linear(dat[1:n_data, 1:n_features], ndim)
180 | chain_linear = sample(gplvm_linear, NUTS(; adtype=AutoReverseDiff(; compile=true)), 500);
181 | ```
182 |
183 | ```{julia}
184 | #| eval: false
185 | # we extract the posterior mean estimates of the parameters from the chain
186 | z_mean = reshape(mean(group(chain_linear, :Z))[:, 2], (n_features, n_data))
187 | alpha_mean = mean(group(chain_linear, :α))[:, 2]
188 |
189 | alpha1, alpha2 = partialsortperm(alpha_mean, 1:2; rev=true)
190 | scatter(
191 | z_mean[alpha1, :],
192 | z_mean[alpha2, :];
193 | group=labels[1:n_data],
194 | xlabel=L"z_{\mathrm{ard}_1}",
195 | ylabel=L"z_{\mathrm{ard}_2}",
196 | )
197 | ```
198 |
199 | We can see that, similar to the pPCA case, the linear kernel GPLVM fails to distinguish between the two groups
200 | (`setosa` on the one hand, and `virginica` and `versicolor` on the other).
201 |
202 | Finally, we demonstrate that by changing the kernel to a non-linear function, we are able to separate the data again.
203 |
204 | ```{julia}
205 | #| eval: false
206 | gplvm = GPLVM(dat[1:n_data, 1:n_features], ndim)
207 | chain_gplvm = sample(gplvm, NUTS(; adtype=AutoReverseDiff(; compile=true)), 500);
208 | ```
209 |
210 | ```{julia}
211 | #| eval: false
212 | # we extract the posterior mean estimates of the parameters from the chain
213 | z_mean = reshape(mean(group(chain_gplvm, :Z))[:, 2], (ndim, n_data))
214 | alpha_mean = mean(group(chain_gplvm, :α))[:, 2]
215 |
216 | alpha1, alpha2 = partialsortperm(alpha_mean, 1:2; rev=true)
217 | scatter(
218 | z_mean[alpha1, :],
219 | z_mean[alpha2, :];
220 | group=labels[1:n_data],
221 | xlabel=L"z_{\mathrm{ard}_1}",
222 | ylabel=L"z_{\mathrm{ard}_2}",
223 | )
224 | ```
225 |
226 | ```{julia}
227 | #| eval: false
228 | let
229 | @assert abs(
230 | mean(z_mean[alpha1, labels[1:n_data] .== "setosa"]) -
231 | mean(z_mean[alpha1, labels[1:n_data] .!= "setosa"]),
232 | ) > 1
233 | end
234 | ```
235 |
236 | Now, the split between the two groups is visible again.
237 |
--------------------------------------------------------------------------------
/tutorials/gaussian-processes-introduction/golf.dat:
--------------------------------------------------------------------------------
1 | distance n y
2 | 2 1443 1346
3 | 3 694 577
4 | 4 455 337
5 | 5 353 208
6 | 6 272 149
7 | 7 256 136
8 | 8 240 111
9 | 9 217 69
10 | 10 200 67
11 | 11 237 75
12 | 12 202 52
13 | 13 192 46
14 | 14 174 54
15 | 15 167 28
16 | 16 201 27
17 | 17 195 31
18 | 18 191 33
19 | 19 147 20
20 | 20 152 24
21 |
--------------------------------------------------------------------------------
/tutorials/gaussian-processes-introduction/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Gaussian Processes: Introduction"
3 | engine: julia
4 | aliases:
5 | - ../15-gaussian-processes/index.html
6 | ---
7 |
8 | ```{julia}
9 | #| echo: false
10 | #| output: false
11 | using Pkg;
12 | Pkg.instantiate();
13 | ```
14 |
15 | [JuliaGPs](https://github.com/JuliaGaussianProcesses/#welcome-to-juliagps) packages integrate well with Turing.jl because they implement the Distributions.jl
16 | interface.
17 | You should be able to understand what is going on in this tutorial if you know what a GP is.
18 | For a more in-depth understanding of the
19 | [JuliaGPs](https://github.com/JuliaGaussianProcesses/#welcome-to-juliagps) functionality
20 | used here, please consult the
21 | [JuliaGPs](https://github.com/JuliaGaussianProcesses/#welcome-to-juliagps) docs.
22 |
23 | In this tutorial, we will model the putting dataset discussed in Chapter 21 of
24 | [Bayesian Data Analysis](http://www.stat.columbia.edu/%7Egelman/book/).
25 | The dataset comprises the result of measuring how often a golfer successfully gets the ball
26 | in the hole, depending on how far away from it they are.
27 | The goal of inference is to estimate the probability of any given shot being successful at a
28 | given distance.
29 |
30 | ### Let's download the data and take a look at it:
31 |
32 | ```{julia}
33 | using CSV, DataFrames
34 |
35 | df = CSV.read("golf.dat", DataFrame; delim=' ', ignorerepeated=true)
36 | df[1:5, :]
37 | ```
38 |
39 | We've printed the first 5 rows of the dataset (which comprises only 19 rows in total).
40 | Observe it has three columns:
41 |
42 | 1. `distance` -- how far away from the hole. I'll refer to `distance` as `d` throughout the rest of this tutorial
43 | 2. `n` -- how many shots were taken from a given distance
44 | 3. `y` -- how many shots were successful from a given distance
45 |
46 | We will use a Binomial model for the data, whose success probability is parametrised by a
47 | transformation of a GP. Something along the lines of:
48 | $$
49 | \begin{aligned}
50 | f & \sim \operatorname{GP}(0, k) \\
51 | y_j \mid f(d_j) & \sim \operatorname{Binomial}(n_j, g(f(d_j))) \\
52 | g(x) & := \frac{1}{1 + e^{-x}}
53 | \end{aligned}
54 | $$
55 |
56 | To do this, let's define our Turing.jl model:
57 |
58 | ```{julia}
59 | using AbstractGPs, LogExpFunctions, Turing
60 |
61 | @model function putting_model(d, n; jitter=1e-4)
62 | v ~ Gamma(2, 1)
63 | l ~ Gamma(4, 1)
64 | f = GP(v * with_lengthscale(SEKernel(), l))
65 | f_latent ~ f(d, jitter)
66 | y ~ product_distribution(Binomial.(n, logistic.(f_latent)))
67 | return (fx=f(d, jitter), f_latent=f_latent, y=y)
68 | end
69 | ```
70 |
71 | We first define an `AbstractGPs.GP`, which represents a distribution over functions, and
72 | is entirely separate from Turing.jl.
73 | We place a prior over its variance `v` and length-scale `l`.
74 | `f(d, jitter)` constructs the multivariate Gaussian comprising the random variables
75 | in `f` whose indices are in `d` (plus a bit of independent Gaussian noise with variance
76 | `jitter` -- see [the docs](https://juliagaussianprocesses.github.io/AbstractGPs.jl/dev/api/#FiniteGP-and-AbstractGP)
77 | for more details).
78 | `f(d, jitter)` has the type `AbstractMvNormal`, and is the bit of AbstractGPs.jl that implements the
79 | Distributions.jl interface, so it's legal to put it on the right-hand side
80 | of a `~`.
81 | From this you should deduce that `f_latent` is distributed according to a multivariate
82 | Gaussian.
83 | The remaining lines comprise standard Turing.jl code that is encountered in other tutorials
84 | and Turing documentation.
85 |
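As a quick illustration outside of the model (with made-up values for the variance and length-scale, so purely a sketch), we can construct such a finite-dimensional GP ourselves and treat it like any other multivariate distribution:

```{julia}
# Hypothetical kernel hyperparameters, chosen only to inspect the object.
f_check = GP(2.0 * with_lengthscale(SEKernel(), 4.0))
fx_check = f_check(Float64.(df.distance), 1e-4)

rand(fx_check)                               # one sample of f at the observed distances
logpdf(fx_check, zeros(length(df.distance))) # log density, as for any multivariate normal
```
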
86 | Before performing inference, we might want to inspect the prior that our model places over
87 | the data, to see whether there is anything obviously wrong.
88 | These kinds of prior predictive checks are straightforward to perform using Turing.jl, since
89 | it is possible to sample from the prior easily by just calling the model:
90 |
91 | ```{julia}
92 | m = putting_model(Float64.(df.distance), df.n)
93 | m().y
94 | ```
95 |
96 | We make use of this to see what kinds of datasets we simulate from the prior:
97 |
98 | ```{julia}
99 | using Plots
100 |
101 | function plot_data(d, n, y, xticks, yticks)
102 | ylims = (0, round(maximum(n), RoundUp; sigdigits=2))
103 | margin = -0.5 * Plots.mm
104 | plt = plot(; xticks=xticks, yticks=yticks, ylims=ylims, margin=margin, grid=false)
105 | bar!(plt, d, n; color=:red, label="", alpha=0.5)
106 | bar!(plt, d, y; label="", color=:blue, alpha=0.7)
107 | return plt
108 | end
109 |
110 | # Construct model and run some prior predictive checks.
111 | m = putting_model(Float64.(df.distance), df.n)
112 | hists = map(1:20) do j
113 | xticks = j > 15 ? :auto : nothing
114 | yticks = rem(j, 5) == 1 ? :auto : nothing
115 | return plot_data(df.distance, df.n, m().y, xticks, yticks)
116 | end
117 | plot(hists...; layout=(4, 5))
118 | ```
119 |
120 | In this case, the only prior knowledge I have is that the proportion of successful shots
121 | ought to decrease monotonically as the distance from the hole increases, which should show
122 | up in the data as the blue lines generally go down as we move from left to right on each
123 | graph.
124 | Unfortunately, there is not a simple way to enforce monotonicity in the samples from a GP,
125 | and we can see this in some of the plots above, so we must hope that we have enough data to
126 | ensure that this relationship holds approximately under the posterior.
127 | In any case, you can judge for yourself whether you think this is the most useful
128 | visualisation that we can perform -- if you think there is something better to look at,
129 | please let us know!
130 |
131 | Moving on, we generate samples from the posterior using the default `NUTS` sampler.
132 | We'll make use of [ReverseDiff.jl](https://github.com/JuliaDiff/ReverseDiff.jl), as it has
133 | better performance than [ForwardDiff.jl](https://github.com/JuliaDiff/ForwardDiff.jl/) on
134 | this example. See Turing.jl's docs on Automatic Differentiation for more info.
135 |
136 |
137 | ```{julia}
138 | using Random, ReverseDiff
139 |
140 | m_post = m | (y=df.y,)
141 | chn = sample(Xoshiro(123456), m_post, NUTS(; adtype=AutoReverseDiff()), 1_000, progress=false)
142 | ```
143 |
144 | We can use these samples and the `posterior` function from `AbstractGPs` to sample from the
145 | posterior probability of success at any distance we choose:
146 |
147 | ```{julia}
148 | d_pred = 1:0.2:21
149 | samples = map(returned(m_post, chn)[1:10:end]) do x
150 | return logistic.(rand(posterior(x.fx, x.f_latent)(d_pred, 1e-4)))
151 | end
152 | p = plot()
153 | plot!(d_pred, reduce(hcat, samples); label="", color=:blue, alpha=0.2)
154 | scatter!(df.distance, df.y ./ df.n; label="", color=:red)
155 | ```
156 |
157 | We can see that the general trend is indeed down as the distance from the hole increases,
158 | and that if we move away from the data, the posterior uncertainty quickly inflates.
159 | This suggests that the model is probably going to do a reasonable job of interpolating
160 | between observed data, but a less good job of extrapolating to larger distances.
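
For example, a rough point estimate of the success probability at each prediction distance can be obtained by averaging the sampled curves computed above (a small sketch reusing `samples` and `d_pred`):

```{julia}
using Statistics

# Average the sampled success-probability curves at each prediction distance.
p_mean = vec(mean(reduce(hcat, samples); dims=2))
plot!(p, d_pred, p_mean; label="Posterior mean", color=:black, lw=2)
```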
161 |
--------------------------------------------------------------------------------
/tutorials/hidden-markov-models/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: Hidden Markov Models
3 | engine: julia
4 | aliases:
5 | - ../04-hidden-markov-model/index.html
6 | ---
7 |
8 | ```{julia}
9 | #| echo: false
10 | #| output: false
11 | using Pkg;
12 | Pkg.instantiate();
13 | ```
14 |
15 | This tutorial illustrates training Bayesian [hidden Markov models](https://en.wikipedia.org/wiki/Hidden_Markov_model) (HMMs) using Turing.
16 | The main goals are learning the transition matrix, emission parameter, and hidden states.
17 | For a more rigorous academic overview of hidden Markov models, see [An Introduction to Hidden Markov Models and Bayesian Networks](https://mlg.eng.cam.ac.uk/zoubin/papers/ijprai.pdf) (Ghahramani, 2001).
18 |
19 | In this tutorial, we assume there are $k$ discrete hidden states; the observations are continuous and normally distributed, centered around the hidden states. This assumption reduces the number of parameters to be estimated in the emission matrix.
20 |
21 | Let's load the libraries we'll need, and set a random seed for reproducibility.
22 |
23 | ```{julia}
24 | # Load libraries.
25 | using Turing, StatsPlots, Random, Bijectors
26 |
27 | # Set a random seed
28 | Random.seed!(12345678);
29 | ```
30 |
31 | ## Simple State Detection
32 |
33 | In this example, we'll use a simple synthetic dataset in which the states and emission parameters are straightforward.
34 |
35 | ```{julia}
36 | # Define the observation sequence (the emission value at each time step).
37 | y = [fill(1.0, 6)..., fill(2.0, 6)..., fill(3.0, 7)...,
38 | fill(2.0, 4)..., fill(1.0, 7)...]
39 | N = length(y);
40 | K = 3;
41 |
42 | # Plot the data we just made.
43 | plot(y; xlim=(0, 30), ylim=(-1, 5), size=(500, 250), legend=false)
44 | scatter!(y; color=:blue, legend=false)
45 | ```
46 |
47 | We can see that we have three states, one for each height of the plot (1, 2, 3). This height is also our emission parameter, so state one produces a value of one, state two produces a value of two, and so on.
48 |
49 | Ultimately, we would like to understand three major parameters:
50 |
51 | 1. The transition matrix. This is a matrix that assigns a probability of switching from one state to any other state, including the state that we are already in.
52 | 2. The emission parameters, which describe the typical value emitted by each state. In the plot above, the emission parameter for state one is simply one.
53 | 3. The state sequence is our understanding of what state we were actually in when we observed some data. This is very important in more sophisticated HMMs, where the emission value does not equal our state.
54 |
55 | With this in mind, let's set up our model. We are going to use some of our knowledge as modelers to provide additional information about our system. This takes the form of the prior on our emission parameter.
56 |
57 | $$
58 | m_i \sim \mathrm{Normal}(i, 0.5) \quad \text{where} \quad i \in \{1,2,3\}
59 | $$
60 |
61 | Simply put, this says that we expect each state to emit values in a Normally distributed manner, where the mean of each state's emissions is that state's value. The standard deviation of 0.5 helps the model converge more quickly; consider instead a standard deviation of 1 or 2. In that case, the likelihood of observing a 2 while in state 1 is actually quite high, as it is within one standard deviation of the true emission value. Applying the prior that we are likely to be tightly centered around the mean prevents our model from being too confused about which state is generating our observations.
62 |
63 | The priors on our transition matrix are noninformative, using `T[i] ~ Dirichlet(ones(K)/K)`. The Dirichlet prior used in this way assumes that the state is likely to change to any other state with equal probability. As we'll see, this transition matrix prior will be overwritten as we observe data.
64 |
65 | ```{julia}
66 | # Turing model definition.
67 | @model function BayesHmm(y, K)
68 | # Get observation length.
69 | N = length(y)
70 |
71 | # State sequence.
72 | s = zeros(Int, N)
73 |
74 | # Emission matrix.
75 | m = Vector(undef, K)
76 |
77 | # Transition matrix.
78 | T = Vector{Vector}(undef, K)
79 |
80 | # Assign distributions to each element
81 | # of the transition matrix and the
82 | # emission matrix.
83 | for i in 1:K
84 | T[i] ~ Dirichlet(ones(K) / K)
85 | m[i] ~ Normal(i, 0.5)
86 | end
87 |
88 | # Observe each point of the input.
89 | s[1] ~ Categorical(K)
90 | y[1] ~ Normal(m[s[1]], 0.1)
91 |
92 | for i in 2:N
93 | s[i] ~ Categorical(vec(T[s[i - 1]]))
94 | y[i] ~ Normal(m[s[i]], 0.1)
95 | end
96 | end;
97 | ```
98 |
99 | We will use a combination of two samplers (HMC and Particle Gibbs) by passing them to the Gibbs sampler. The Gibbs sampler allows for compositional inference, where we can utilize different samplers on different parameters. (For API details of these samplers, please see [Turing.jl's API documentation](https://turinglang.org/Turing.jl/stable/api/Inference/).)
100 |
101 | In this case, we use HMC for `m` and `T`, representing the emission and transition matrices respectively. We use the Particle Gibbs sampler for `s`, the state sequence. You may wonder why it is that we are not assigning `s` to the HMC sampler, and why it is that we need compositional Gibbs sampling at all.
102 |
103 | The parameter `s` is not a continuous variable.
104 | It is a vector of **integers**, and thus Hamiltonian methods like HMC and NUTS won't work correctly.
105 | Gibbs allows us to apply the right tools to the best effect.
106 | If you are a particularly advanced user interested in higher performance, you may benefit from setting up your Gibbs sampler to use [different automatic differentiation]({{}}#compositional-sampling-with-differing-ad-modes) backends for each parameter space.
107 |
108 | Time to run our sampler.
109 |
110 | ```{julia}
111 | #| output: false
112 | #| echo: false
113 | setprogress!(false)
114 | ```
115 |
116 | ```{julia}
117 | g = Gibbs((:m, :T) => HMC(0.01, 50), :s => PG(120))
118 | chn = sample(BayesHmm(y, 3), g, 1000)
119 | ```
120 |
121 | Let's see how well our chain performed.
122 | Ordinarily, using `display(chn)` would be a good first step, but we have generated a lot of parameters here (`s[1]`, `s[2]`, `m[1]`, and so on).
123 | It's a bit easier to show how our model performed graphically.
124 |
125 | The code below generates an animation showing the graph of the data above, and the data our model generates in each sample.
126 |
127 | ```{julia}
128 | # Extract our m and s parameters from the chain.
129 | m_set = MCMCChains.group(chn, :m).value
130 | s_set = MCMCChains.group(chn, :s).value
131 |
132 | # Iterate through the MCMC samples.
133 | Ns = 1:length(chn)
134 |
135 | # Make an animation.
136 | animation = @gif for i in Ns
137 | m = m_set[i, :]
138 | s = Int.(s_set[i, :])
139 | emissions = m[s]
140 |
141 | p = plot(
142 | y;
143 | color=:red,
144 | size=(500, 250),
145 | xlabel="Time",
146 | ylabel="State",
147 | legend=:topright,
148 | label="True data",
149 | xlim=(0, 30),
150 | ylim=(-1, 5),
151 | )
152 | plot!(emissions; color=:blue, label="Sample $i")
153 | end every 3
154 | ```
155 |
156 | Looks like our model did a pretty good job, but we should also check to make sure our chain converges. A quick check is to examine whether the diagonal (representing the probability of remaining in the current state) of the transition matrix appears to be stationary. The code below extracts the diagonal and shows a traceplot of each persistence probability.
157 |
158 | ```{julia}
159 | # Index the chain with the persistence probabilities.
160 | subchain = chn[["T[1][1]", "T[2][2]", "T[3][3]"]]
161 |
162 | plot(subchain; seriestype=:traceplot, title="Persistence Probability", legend=false)
163 | ```
164 |
165 | A cursory examination of the traceplots above indicates that all three persistence probabilities converged to something resembling
166 | stationarity. We can use the diagnostic functions provided by [MCMCChains](https://github.com/TuringLang/MCMCChains.jl) to run some more formal tests, like the Heidelberger and Welch diagnostic:
167 |
168 | ```{julia}
169 | heideldiag(MCMCChains.group(chn, :T))[1]
170 | ```
171 |
172 | The p-values on the test suggest that we cannot reject the hypothesis that the observed sequence comes from a stationary distribution, so we can be reasonably confident that our transition matrix has converged to something reasonable.
173 |
174 | ## Efficient Inference With The Forward Algorithm
175 |
176 | While the above method works well for the simple example in this tutorial, some users may desire a more efficient method, especially when their model is more complicated.
177 | One simple way to improve inference is to marginalize out the hidden states of the model with an appropriate algorithm, calculating only the posterior over the continuous random variables.
178 | Not only does this allow more efficient inference via Rao-Blackwellization, but now we can sample our model with `NUTS()` alone, which is usually a much more performant MCMC kernel.
179 |
180 | Thankfully, [HiddenMarkovModels.jl](https://github.com/gdalle/HiddenMarkovModels.jl) provides an extremely efficient implementation of many algorithms related to hidden Markov models. This allows us to rewrite our model as:
181 |
182 | ```{julia}
183 | using HiddenMarkovModels
184 | using FillArrays
185 | using LinearAlgebra
186 | using LogExpFunctions
187 |
188 |
189 | @model function BayesHmm2(y, K)
190 | m ~ Bijectors.ordered(MvNormal([1.0, 2.0, 3.0], 0.5I))
191 | T ~ filldist(Dirichlet(fill(1/K, K)), K)
192 |
193 | hmm = HMM(softmax(ones(K)), copy(T'), [Normal(m[i], 0.1) for i in 1:K])
194 | Turing.@addlogprob! logdensityof(hmm, y)
195 | end
196 |
197 | chn2 = sample(BayesHmm2(y, 3), NUTS(), 1000)
198 | ```
199 |
200 |
201 | We can compare the chains of these two models, confirming the posterior estimate is similar (modulo label switching concerns with the Gibbs model):
202 | ```{julia}
203 | #| code-fold: true
204 | #| code-summary: "Plotting Chains"
205 |
206 | plot(chn["m[1]"], label = "m[1], Model 1, Gibbs", color = :lightblue)
207 | plot!(chn2["m[1]"], label = "m[1], Model 2, NUTS", color = :blue)
208 | plot!(chn["m[2]"], label = "m[2], Model 1, Gibbs", color = :pink)
209 | plot!(chn2["m[2]"], label = "m[2], Model 2, NUTS", color = :red)
210 | plot!(chn["m[3]"], label = "m[3], Model 1, Gibbs", color = :yellow)
211 | plot!(chn2["m[3]"], label = "m[3], Model 2, NUTS", color = :orange)
212 | ```
213 |
214 |
215 | ### Recovering Marginalized Trajectories
216 |
217 | We can use the `viterbi()` algorithm, also from the `HiddenMarkovModels` package, to recover the most probable state for each parameter set in our posterior sample:
218 | ```{julia}
219 | @model function BayesHmmRecover(y, K, IncludeGenerated = false)
220 | m ~ Bijectors.ordered(MvNormal([1.0, 2.0, 3.0], 0.5I))
221 | T ~ filldist(Dirichlet(fill(1/K, K)), K)
222 |
223 | hmm = HMM(softmax(ones(K)), copy(T'), [Normal(m[i], 0.1) for i in 1:K])
224 | Turing.@addlogprob! logdensityof(hmm, y)
225 |
226 | # Conditional generation of the hidden states.
227 | if IncludeGenerated
228 | seq, _ = viterbi(hmm, y)
229 | s := [m[s] for s in seq]
230 | end
231 | end
232 |
233 | chn_recover = sample(BayesHmmRecover(y, 3, true), NUTS(), 1000)
234 | ```
235 |
236 | Plotting the estimated states, we can see that the results align well with our expectations:
237 |
238 | ```{julia}
239 | p = plot(xlim=(0, 30), ylim=(-1, 5), size=(500, 250))
240 | for i in 1:100
241 | ind = rand(DiscreteUniform(1, 1000))
242 | plot!(MCMCChains.group(chn_recover, :s).value[ind,:], color = :grey, opacity = 0.1, legend = :false)
243 | end
244 | scatter!(y, color = :blue)
245 |
246 | p
247 | ```
248 |
--------------------------------------------------------------------------------
/tutorials/infinite-mixture-models/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: Infinite Mixture Models
3 | engine: julia
4 | aliases:
5 | - ../06-infinite-mixture-model/index.html
6 | ---
7 |
8 | ```{julia}
9 | #| echo: false
10 | #| output: false
11 | using Pkg;
12 | Pkg.instantiate();
13 | ```
14 |
15 | In many applications it is desirable to allow the model to adjust its complexity to the amount of data. Consider for example the task of assigning objects into clusters or groups. This task often involves the specification of the number of groups. However, it is often not known beforehand how many groups exist. Moreover, in some applications, e.g. modelling topics in text documents or grouping species, the number of examples per group is heavy tailed. This makes it impossible to predefine the number of groups and requires the model to form new groups when data points from previously unseen groups are observed.
16 |
17 | A natural approach for such applications is the use of non-parametric models. This tutorial will introduce how to use the Dirichlet process in a mixture of infinitely many Gaussians using Turing. For further information on Bayesian nonparametrics and the Dirichlet process we refer to the [introduction by Zoubin Ghahramani](http://mlg.eng.cam.ac.uk/pub/pdf/Gha12.pdf) and the book "Fundamentals of Nonparametric Bayesian Inference" by Subhashis Ghosal and Aad van der Vaart.
18 |
19 | ```{julia}
20 | using Turing
21 | ```
22 |
23 | ## Mixture Model
24 |
25 | Before introducing infinite mixture models in Turing, we will briefly review the construction of finite mixture models. Subsequently, we will define how to use the [Chinese restaurant process](https://en.wikipedia.org/wiki/Chinese_restaurant_process) construction of a Dirichlet process for non-parametric clustering.
26 |
27 | #### Two-Component Model
28 |
29 | First, consider the simple case of a mixture model with two Gaussian components with fixed covariance.
30 | The generative process of such a model can be written as:
31 |
32 | \begin{equation*}
33 | \begin{aligned}
34 | \pi_1 &\sim \mathrm{Beta}(a, b) \\
35 | \pi_2 &= 1-\pi_1 \\
36 | \mu_1 &\sim \mathrm{Normal}(\mu_0, \Sigma_0) \\
37 | \mu_2 &\sim \mathrm{Normal}(\mu_0, \Sigma_0) \\
38 | z_i &\sim \mathrm{Categorical}(\pi_1, \pi_2) \\
39 | x_i &\sim \mathrm{Normal}(\mu_{z_i}, \Sigma)
40 | \end{aligned}
41 | \end{equation*}
42 |
43 | where $\pi_1, \pi_2$ are the mixing weights of the mixture model, i.e. $\pi_1 + \pi_2 = 1$, and $z_i$ is a latent assignment of the observation $x_i$ to a component (Gaussian).
44 |
45 | We can implement this model in Turing for 1D data as follows:
46 |
47 | ```{julia}
48 | @model function two_model(x)
49 | # Hyper-parameters
50 | μ0 = 0.0
51 | σ0 = 1.0
52 |
53 | # Draw weights.
54 | π1 ~ Beta(1, 1)
55 | π2 = 1 - π1
56 |
57 | # Draw locations of the components.
58 | μ1 ~ Normal(μ0, σ0)
59 | μ2 ~ Normal(μ0, σ0)
60 |
61 | # Draw latent assignment.
62 | z ~ Categorical([π1, π2])
63 |
64 | # Draw observation from selected component.
65 | if z == 1
66 | x ~ Normal(μ1, 1.0)
67 | else
68 | x ~ Normal(μ2, 1.0)
69 | end
70 | end
71 | ```
72 |
73 | #### Finite Mixture Model
74 |
75 | If we have more than two components, this model can elegantly be extended using a Dirichlet distribution as prior for the mixing weights $\pi_1, \dots, \pi_K$. Note that the Dirichlet distribution is the multivariate generalization of the beta distribution. The resulting model can be written as:
76 |
77 | $$
78 | \begin{align}
79 | (\pi_1, \dots, \pi_K) &\sim \mathrm{Dirichlet}(K, \alpha) \\
80 | \mu_k &\sim \mathrm{Normal}(\mu_0, \Sigma_0), \;\; \forall k \\
81 | z &\sim \mathrm{Categorical}(\pi_1, \dots, \pi_K) \\
82 | x &\sim \mathrm{Normal}(\mu_z, \Sigma)
83 | \end{align}
84 | $$
85 |
86 | which resembles the model in the [Gaussian mixture model tutorial]({{}}) with a slightly different notation.
87 |
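For concreteness, here is one way such a finite mixture could be written in Turing for 1D data (a sketch only; it is not used in the remainder of this tutorial, and the fixed unit observation noise mirrors the two-component model above):

```{julia}
@model function finite_gmm(x, K)
    # Hyper-parameters.
    μ0 = 0.0
    σ0 = 1.0

    # Mixing weights with a symmetric Dirichlet prior.
    w ~ Dirichlet(K, 1.0)

    # Locations of the K components.
    μ ~ filldist(Normal(μ0, σ0), K)

    # Latent assignments and observations.
    z = zeros(Int, length(x))
    for i in eachindex(x)
        z[i] ~ Categorical(w)
        x[i] ~ Normal(μ[z[i]], 1.0)
    end
end;
```
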
88 | ## Infinite Mixture Model
89 |
90 | The question now arises, is there a generalization of a Dirichlet distribution for which the dimensionality $K$ is infinite, i.e. $K = \infty$?
91 |
92 | But first, to implement an infinite Gaussian mixture model in Turing, we need to load the `Turing.RandomMeasures` module. `RandomMeasures` contains a variety of tools useful in nonparametrics.
93 |
94 | ```{julia}
95 | using Turing.RandomMeasures
96 | ```
97 |
98 | We now will utilize the fact that one can integrate out the mixing weights in a Gaussian mixture model allowing us to arrive at the Chinese restaurant process construction. See Carl E. Rasmussen: [The Infinite Gaussian Mixture Model](https://www.seas.harvard.edu/courses/cs281/papers/rasmussen-1999a.pdf), NIPS (2000) for details.
99 |
100 | In fact, if the mixing weights are integrated out, the conditional prior for the latent variable $z$ is given by:
101 |
102 | $$
103 | p(z_i = k \mid z_{\not i}, \alpha) = \frac{n_k + \alpha / K}{N - 1 + \alpha}
104 | $$
105 |
106 | where $z_{\not i}$ are the latent assignments of all observations except observation $i$. Note that we use $n_k$ to denote the number of observations at component $k$ excluding observation $i$. The parameter $\alpha$ is the concentration parameter of the Dirichlet distribution used as prior over the mixing weights.
107 |
108 | #### Chinese Restaurant Process
109 |
110 | To obtain the Chinese restaurant process construction, we can now derive the conditional prior if $K \rightarrow \infty$.
111 |
112 | For $n_k > 0$ we obtain:
113 |
114 | $$
115 | p(z_i = k \mid z_{\not i}, \alpha) = \frac{n_k}{N - 1 + \alpha}
116 | $$
117 |
118 | and for all infinitely many clusters that are empty (combined) we get:
119 |
120 | $$
121 | p(z_i = k \mid z_{\not i}, \alpha) = \frac{\alpha}{N - 1 + \alpha}
122 | $$
123 |
124 | Those equations show that the conditional prior for component assignments is proportional to the number of observations already assigned to that component, meaning that the Chinese restaurant process has a rich-get-richer property.
125 |
126 | To get a better understanding of this property, we can plot the cluster chosen for each new observation drawn from the conditional prior.
127 |
128 | ```{julia}
129 | # Concentration parameter.
130 | α = 10.0
131 |
132 | # Random measure, e.g. Dirichlet process.
133 | rpm = DirichletProcess(α)
134 |
135 | # Cluster assignments for each observation.
136 | z = Vector{Int}()
137 |
138 | # Maximum number of observations we observe.
139 | Nmax = 500
140 |
141 | for i in 1:Nmax
142 | # Number of observations per cluster.
143 | K = isempty(z) ? 0 : maximum(z)
144 | nk = Vector{Int}(map(k -> sum(z .== k), 1:K))
145 |
146 | # Draw new assignment.
147 | push!(z, rand(ChineseRestaurantProcess(rpm, nk)))
148 | end
149 | ```
150 |
151 | ```{julia}
152 | using Plots
153 |
154 | # Plot the cluster assignments over time
155 | @gif for i in 1:Nmax
156 | scatter(
157 | collect(1:i),
158 | z[1:i];
159 | markersize=2,
160 | xlabel="observation (i)",
161 | ylabel="cluster (k)",
162 | legend=false,
163 | )
164 | end
165 | ```
166 |
167 | Further, we can see that the number of clusters grows logarithmically with the number of observations. This is a side-effect of the "rich-get-richer" phenomenon: we expect large clusters, and thus the number of clusters has to be smaller than the number of observations.
168 |
169 | $$
170 | \mathbb{E}[K \mid N] \approx \alpha \cdot \log \big(1 + \frac{N}{\alpha}\big)
171 | $$
172 |
173 | We can see from the equation that the concentration parameter $\alpha$ allows us to control the number of clusters formed *a priori*.
174 |
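As a quick illustration of this formula (a small sketch, not part of the model below), we can evaluate the expected number of clusters for a few choices of $\alpha$ at $N = 500$, the number of observations used in the simulation above:

```{julia}
expected_clusters(α, N) = α * log(1 + N / α)

[expected_clusters(α, 500) for α in (0.5, 1.0, 10.0)]
```

For $\alpha = 10$, as used in the simulation above, this gives roughly 40 expected clusters after 500 observations.
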
175 | In Turing we can implement an infinite Gaussian mixture model using the Chinese restaurant process construction of a Dirichlet process as follows:
176 |
177 | ```{julia}
178 | @model function infiniteGMM(x)
179 | # Hyper-parameters, i.e. concentration parameter and parameters of H.
180 | α = 1.0
181 | μ0 = 0.0
182 | σ0 = 1.0
183 |
184 | # Define random measure, e.g. Dirichlet process.
185 | rpm = DirichletProcess(α)
186 |
187 | # Define the base distribution, i.e. expected value of the Dirichlet process.
188 | H = Normal(μ0, σ0)
189 |
190 | # Latent assignment.
191 | z = zeros(Int, length(x))
192 |
193 | # Locations of the infinitely many clusters.
194 | μ = zeros(Float64, 0)
195 |
196 | for i in 1:length(x)
197 |
198 | # Number of clusters.
199 | K = maximum(z)
200 | nk = Vector{Int}(map(k -> sum(z .== k), 1:K))
201 |
202 | # Draw the latent assignment.
203 | z[i] ~ ChineseRestaurantProcess(rpm, nk)
204 |
205 | # Create a new cluster?
206 | if z[i] > K
207 | push!(μ, 0.0)
208 |
209 | # Draw location of new cluster.
210 | μ[z[i]] ~ H
211 | end
212 |
213 | # Draw observation.
214 | x[i] ~ Normal(μ[z[i]], 1.0)
215 | end
216 | end
217 | ```
218 |
219 | We can now use Turing to infer the assignments of some data points. First, we will create some random data that comes from three clusters, with means of 0, -5, and 10.
220 |
221 | ```{julia}
222 | using Plots, Random
223 |
224 | # Generate some test data.
225 | Random.seed!(1)
226 | data = vcat(randn(10), randn(10) .- 5, randn(10) .+ 10)
227 | data .-= mean(data)
228 | data /= std(data);
229 | ```
230 |
231 | Next, we'll sample from our posterior using SMC.
232 |
233 | ```{julia}
234 | #| output: false
235 | setprogress!(false)
236 | ```
237 |
238 | ```{julia}
239 | # MCMC sampling
240 | Random.seed!(2)
241 | iterations = 1000
242 | model_fun = infiniteGMM(data);
243 | chain = sample(model_fun, SMC(), iterations);
244 | ```
245 |
246 | Finally, we can plot the number of clusters in each sample.
247 |
248 | ```{julia}
249 | # Extract the number of clusters for each sample of the Markov chain.
250 | k = map(
251 | t -> length(unique(vec(chain[t, MCMCChains.namesingroup(chain, :z), :].value))),
252 | 1:iterations,
253 | );
254 |
255 | # Visualize the number of clusters.
256 | plot(k; xlabel="Iteration", ylabel="Number of clusters", label="Chain 1")
257 | ```
258 |
259 | If we visualize the histogram of the number of clusters sampled from our posterior, we observe that the model seems to prefer 3 clusters, which is the true number of clusters. Note that the number of clusters in a Dirichlet process mixture model is not limited a priori and will grow to infinity with probability one. However, when conditioned on data, the posterior concentrates on a finite number of clusters, forcing the resulting model to have a finite number of clusters. It is, however, not guaranteed that the posterior of a Dirichlet process Gaussian mixture model converges to the true number of clusters, even if the data comes from a finite mixture model. See Jeffrey Miller and Matthew Harrison: [A simple example of Dirichlet process mixture inconsistency for the number of components](https://arxiv.org/pdf/1301.2708.pdf) for details.
260 |
261 | ```{julia}
262 | histogram(k; xlabel="Number of clusters", legend=false)
263 | ```
264 |
265 | One issue with the Chinese restaurant process construction is that the number of latent parameters we need to sample scales with the number of observations. It may be desirable to use alternative constructions in certain cases. Alternative methods of constructing a Dirichlet process can be employed via the following representations:
266 |
267 | Size-Biased Sampling Process
268 |
269 | $$
270 | j_k \sim \mathrm{Beta}(1, \alpha) \cdot \mathrm{surplus}
271 | $$
272 |
273 | Stick-Breaking Process
274 | $$
275 | v_k \sim \mathrm{Beta}(1, \alpha)
276 | $$
277 |
278 | Chinese Restaurant Process
279 | $$
280 | p(z_n = k | z_{1:n-1}) \propto \begin{cases}
281 | \frac{m_k}{n-1+\alpha}, \text{ if } m_k > 0\\\
282 | \frac{\alpha}{n-1+\alpha}
283 | \end{cases}
284 | $$
285 |
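To make the stick-breaking representation above more concrete, here is a plain-Julia sketch of how the mixing weights arise from the $v_k$ (truncated at a finite number of sticks purely for illustration; it is not used elsewhere in this tutorial):

```{julia}
# Stick-breaking construction of mixture weights, truncated at `n_sticks`.
function stick_breaking_weights(α, n_sticks)
    v = rand(Beta(1, α), n_sticks)  # v_k ~ Beta(1, α)
    w = similar(v)
    remaining = 1.0                 # length of the stick still unbroken
    for k in 1:n_sticks
        w[k] = v[k] * remaining     # break off a piece proportional to v_k
        remaining *= 1 - v[k]       # surplus stick left for the remaining pieces
    end
    return w
end

stick_breaking_weights(10.0, 20)
```
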
286 | For more details see [this article](https://www.stats.ox.ac.uk/%7Eteh/research/npbayes/Teh2010a.pdf).
287 |
--------------------------------------------------------------------------------
/tutorials/multinomial-logistic-regression/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: Multinomial Logistic Regression
3 | engine: julia
4 | aliases:
5 | - ../08-multinomial-logistic-regression/index.html
6 | ---
7 |
8 | ```{julia}
9 | #| echo: false
10 | #| output: false
11 | using Pkg;
12 | Pkg.instantiate();
13 | ```
14 |
15 | [Multinomial logistic regression](https://en.wikipedia.org/wiki/Multinomial_logistic_regression) is an extension of logistic regression. Logistic regression is used to model problems in which there are exactly two possible discrete outcomes. Multinomial logistic regression is used to model problems in which there are two or more possible discrete outcomes.
16 |
17 | In our example, we'll be using the iris dataset. The iris multiclass problem aims to predict the species of a flower given measurements (in centimeters) of sepal length and width and petal length and width. There are three possible species: Iris setosa, Iris versicolor, and Iris virginica.
18 |
19 | To start, let's import all the libraries we'll need.
20 |
21 | ```{julia}
22 | # Load Turing.
23 | using Turing
24 |
25 | # Load RDatasets.
26 | using RDatasets
27 |
28 | # Load StatsPlots for visualizations and diagnostics.
29 | using StatsPlots
30 |
31 | # Functionality for splitting and normalizing the data.
32 | using MLDataUtils: shuffleobs, splitobs, rescale!
33 |
34 | # We need a softmax function which is provided by NNlib.
35 | using NNlib: softmax
36 |
37 | # Functionality for constructing arrays with identical elements efficiently.
38 | using FillArrays
39 |
40 | # Functionality for working with scaled identity matrices.
41 | using LinearAlgebra
42 |
43 | # Set a seed for reproducibility.
44 | using Random
45 | Random.seed!(0);
46 | ```
47 |
48 | ## Data Cleaning & Set Up
49 |
50 | Now we're going to import our dataset. Twenty rows of the dataset are shown below so you can get a good feel for what kind of data we have.
51 |
52 | ```{julia}
53 | # Import the "iris" dataset.
54 | data = RDatasets.dataset("datasets", "iris");
55 |
56 | # Show twenty random rows.
57 | data[rand(1:size(data, 1), 20), :]
58 | ```
59 |
60 | In this data set, the outcome `Species` is currently coded as a string. We convert it to a numerical value by using indices `1`, `2`, and `3` to indicate species `setosa`, `versicolor`, and `virginica`, respectively.
61 |
62 | ```{julia}
63 | # Recode the `Species` column.
64 | species = ["setosa", "versicolor", "virginica"]
65 | data[!, :Species_index] = indexin(data[!, :Species], species)
66 |
67 | # Show twenty random rows of the new species columns
68 | data[rand(1:size(data, 1), 20), [:Species, :Species_index]]
69 | ```
70 |
71 | After we've done that tidying, it's time to split our dataset into training and testing sets, and separate the features and target from the data. Additionally, we must rescale our feature variables so that they are centered around zero by subtracting the mean from each column and dividing by the standard deviation. Without this step, Turing's sampler will have a hard time finding a place to start searching for parameter estimates.
72 |
73 | ```{julia}
74 | # Split our dataset 50%/50% into training/test sets.
75 | trainset, testset = splitobs(shuffleobs(data), 0.5)
76 |
77 | # Define features and target.
78 | features = [:SepalLength, :SepalWidth, :PetalLength, :PetalWidth]
79 | target = :Species_index
80 |
81 | # Turing requires data in matrix and vector form.
82 | train_features = Matrix(trainset[!, features])
83 | test_features = Matrix(testset[!, features])
84 | train_target = trainset[!, target]
85 | test_target = testset[!, target]
86 |
87 | # Standardize the features.
88 | μ, σ = rescale!(train_features; obsdim=1)
89 | rescale!(test_features, μ, σ; obsdim=1);
90 | ```
91 |
92 | ## Model Declaration
93 |
94 | Finally, we can define our model `logistic_regression`. It is a function that takes three arguments where
95 |
96 | - `x` is our set of independent variables;
97 | - `y` is the element we want to predict;
98 | - `σ` is the standard deviation we want to assume for our priors.
99 |
100 | We select the `setosa` species as the baseline class (the choice does not matter). Then we create the intercepts and vectors of coefficients for the other classes against that baseline. More concretely, we create scalar intercepts `intercept_versicolor` and `intercept_virginica` and coefficient vectors `coefficients_versicolor` and `coefficients_virginica` with four coefficients each for the features `SepalLength`, `SepalWidth`, `PetalLength` and `PetalWidth`. We assume a normal distribution with mean zero and standard deviation `σ` as prior for each scalar parameter. We want to find the posterior distribution of these ten parameters in total, so that we can predict the species for any given set of features.
101 |
102 | ```{julia}
103 | # Bayesian multinomial logistic regression
104 | @model function logistic_regression(x, y, σ)
105 | n = size(x, 1)
106 | length(y) == n ||
107 | throw(DimensionMismatch("number of observations in `x` and `y` is not equal"))
108 |
109 | # Priors of intercepts and coefficients.
110 | intercept_versicolor ~ Normal(0, σ)
111 | intercept_virginica ~ Normal(0, σ)
112 | coefficients_versicolor ~ MvNormal(Zeros(4), σ^2 * I)
113 | coefficients_virginica ~ MvNormal(Zeros(4), σ^2 * I)
114 |
115 | # Compute the likelihood of the observations.
116 | values_versicolor = intercept_versicolor .+ x * coefficients_versicolor
117 | values_virginica = intercept_virginica .+ x * coefficients_virginica
118 | for i in 1:n
119 | # the 0 corresponds to the base category `setosa`
120 | v = softmax([0, values_versicolor[i], values_virginica[i]])
121 | y[i] ~ Categorical(v)
122 | end
123 | end;
124 | ```
125 |
126 | ## Sampling
127 |
128 | Now we can run our sampler. This time we'll use [`NUTS`](https://turinglang.org/stable/docs/library/#Turing.Inference.NUTS) to sample from our posterior.
129 |
130 | ```{julia}
131 | #| output: false
132 | setprogress!(false)
133 | ```
134 |
135 | ```{julia}
136 | #| output: false
137 | m = logistic_regression(train_features, train_target, 1)
138 | chain = sample(m, NUTS(), MCMCThreads(), 1_500, 3)
139 | ```
140 |
141 |
142 | ```{julia}
143 | #| echo: false
144 | chain
145 | ```
146 |
147 | ::: {.callout-warning collapse="true"}
148 | ## Sampling With Multiple Threads
149 | The `sample()` call above assumes that you have at least `nchains` threads available in your Julia instance. If you do not, the multiple chains
150 | will run sequentially, and you may notice a warning. For more information, see [the Turing documentation on sampling multiple chains.]({{}}#sampling-multiple-chains)
151 | :::
152 |
153 | Since we ran multiple chains, we may as well do a spot check to make sure each chain converges around similar points.
154 |
155 | ```{julia}
156 | plot(chain)
157 | ```
158 |
159 | Looks good!
160 |
161 | We can also use the `corner` function from MCMCChains to show the distributions of the various parameters of our multinomial logistic regression. The corner function requires MCMCChains and StatsPlots.
162 |
163 | ```{julia}
164 | # Only plotting the first 3 coefficients due to a bug in Plots.jl
165 | corner(
166 | chain,
167 | MCMCChains.namesingroup(chain, :coefficients_versicolor)[1:3];
168 | )
169 | ```
170 |
171 | ```{julia}
172 | # Only plotting the first 3 coefficients due to a bug in Plots.jl
173 | corner(
174 | chain,
175 | MCMCChains.namesingroup(chain, :coefficients_virginica)[1:3];
176 | )
177 | ```
178 |
179 | Fortunately, the corner plots appear to demonstrate unimodal distributions for each of our parameters, so it should be straightforward to take the mean of each parameter's sampled values and use those means to make predictions.
180 |
181 | ## Making Predictions
182 |
183 | How do we test how well the model actually predicts which of the three classes an iris flower belongs to? We need to build a `prediction` function that takes the test dataset and runs it through the average parameter calculated during sampling.
184 |
185 | The `prediction` function below takes a `Matrix` and a `Chains` object. It computes the mean of the sampled parameters and calculates the species with the highest probability for each observation. Note that we do not have to evaluate the `softmax` function since it does not affect the order of its inputs.
186 |
187 | ```{julia}
188 | function prediction(x::Matrix, chain)
189 | # Pull the means from each parameter's sampled values in the chain.
190 | intercept_versicolor = mean(chain, :intercept_versicolor)
191 | intercept_virginica = mean(chain, :intercept_virginica)
192 | coefficients_versicolor = [
193 | mean(chain, k) for k in MCMCChains.namesingroup(chain, :coefficients_versicolor)
194 | ]
195 | coefficients_virginica = [
196 | mean(chain, k) for k in MCMCChains.namesingroup(chain, :coefficients_virginica)
197 | ]
198 |
199 | # Compute the index of the species with the highest probability for each observation.
200 | values_versicolor = intercept_versicolor .+ x * coefficients_versicolor
201 | values_virginica = intercept_virginica .+ x * coefficients_virginica
202 | species_indices = [
203 | argmax((0, x, y)) for (x, y) in zip(values_versicolor, values_virginica)
204 | ]
205 |
206 | return species_indices
207 | end;
208 | ```
209 |
210 | Let's see how we did! We run the test matrix through the prediction function, and compute the accuracy for our prediction.
211 |
212 | ```{julia}
213 | # Make the predictions.
214 | predictions = prediction(test_features, chain)
215 |
216 | # Calculate accuracy for our test set.
217 | mean(predictions .== testset[!, :Species_index])
218 | ```
219 |
220 | Perhaps more important is to see the accuracy per class.
221 |
222 | ```{julia}
223 | for s in 1:3
224 | rows = testset[!, :Species_index] .== s
225 | println("Number of `", species[s], "`: ", count(rows))
226 | println(
227 | "Percentage of `",
228 | species[s],
229 | "` predicted correctly: ",
230 | mean(predictions[rows] .== testset[rows, :Species_index]),
231 | )
232 | end
233 | ```
234 |
235 | This tutorial has demonstrated how to use Turing to perform Bayesian multinomial logistic regression.
236 |
--------------------------------------------------------------------------------
/usage/automatic-differentiation/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: Automatic Differentiation
3 | engine: julia
4 | aliases:
5 | - ../../tutorials/docs-10-using-turing-autodiff/index.html
6 | ---
7 |
8 | ```{julia}
9 | #| echo: false
10 | #| output: false
11 | using Pkg;
12 | Pkg.instantiate();
13 | ```
14 |
15 | ## Switching AD Modes
16 |
17 | Turing currently supports three automatic differentiation (AD) backends for sampling: [ForwardDiff](https://github.com/JuliaDiff/ForwardDiff.jl) for forward-mode AD; and [Mooncake](https://github.com/compintell/Mooncake.jl) and [ReverseDiff](https://github.com/JuliaDiff/ReverseDiff.jl) for reverse-mode AD.
18 | `ForwardDiff` is automatically imported by Turing. To utilize `Mooncake` or `ReverseDiff` for AD, users must explicitly import them with `import Mooncake` or `import ReverseDiff`, alongside the usual `using Turing`.
19 |
20 | As of Turing version v0.30, the global configuration flag for the AD backend has been removed in favour of [`AdTypes.jl`](https://github.com/SciML/ADTypes.jl), allowing users to specify the AD backend for individual samplers independently.
21 | Users can pass the `adtype` keyword argument to the sampler constructor to select the desired AD backend, with the default being `AutoForwardDiff(; chunksize=0)`.
22 |
23 | For `ForwardDiff`, pass `adtype=AutoForwardDiff(; chunksize)` to the sampler constructor. A `chunksize` of `nothing` permits the chunk size to be automatically determined. For more information regarding the selection of `chunksize`, please refer to [related section of `ForwardDiff`'s documentation](https://juliadiff.org/ForwardDiff.jl/dev/user/advanced/#Configuring-Chunk-Size).
24 |
25 | For `ReverseDiff`, pass `adtype=AutoReverseDiff()` to the sampler constructor. An additional keyword argument called `compile` can be provided to `AutoReverseDiff`. It specifies whether to pre-record the tape only once and reuse it later (`compile` is set to `false` by default, which means no pre-recording). This can substantially improve performance, but risks silently incorrect results if not used with care.
26 |
27 | Pre-recorded tapes should only be used if you are absolutely certain that the sequence of operations performed in your code does not change between different executions of your model.
28 |
29 | Thus, in the model definition and in all functions called by the model (implicitly or explicitly), all loops should be of fixed size, and `if`-statements should consistently execute the same branches.
30 | This is the case, for instance, when the conditions of `if`-statements can be determined at compile time or depend only on fixed properties of the model, such as fixed data.
31 | However, `if`-statements whose conditions depend on the model parameters can take different branches during sampling; in that case, the compiled tape will be incorrect.
32 | Thus you must not use compiled tapes when your model makes decisions based on the model parameters, and if you compute functions of the parameters, you should make sure that those functions do not contain branches which execute different code for different values of the parameters.
33 |
34 | The previously used interface functions including `ADBackend`, `setadbackend`, `setsafe`, `setchunksize`, and `setrdcache` have been removed.
35 |
36 | For `Mooncake`, pass `adtype=AutoMooncake(; config=nothing)` to the sampler constructor.
37 |
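   | As a concrete illustration, here is a minimal sketch of selecting each backend through the `adtype` keyword (this assumes `using Turing`, the relevant AD packages, and some previously defined model `model`; the sampler settings are illustrative only):
   | 
   | ```{julia}
   | #| eval: false
   | import Mooncake, ReverseDiff
   | 
   | # ForwardDiff with an automatically determined chunk size.
   | sample(model, NUTS(; adtype=AutoForwardDiff(; chunksize=nothing)), 1000)
   | 
   | # ReverseDiff with a compiled tape (only safe if the model has no
   | # parameter-dependent control flow; see the caveats above).
   | sample(model, NUTS(; adtype=AutoReverseDiff(; compile=true)), 1000)
   | 
   | # Mooncake with its default configuration.
   | sample(model, NUTS(; adtype=AutoMooncake(; config=nothing)), 1000)
   | ```
   | 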
38 | ## Compositional Sampling with Differing AD Modes
39 |
40 | Turing supports intermixed automatic differentiation methods for different variable spaces. The snippet below shows using `ForwardDiff` to sample the mean (`m`) parameter, and using `ReverseDiff` for the variance (`s²`) parameter:
41 |
42 | ```{julia}
43 | using Turing
44 | using ReverseDiff
45 |
46 | # Define a simple Normal model with unknown mean and variance.
47 | @model function gdemo(x, y)
48 | s² ~ InverseGamma(2, 3)
49 | m ~ Normal(0, sqrt(s²))
50 | x ~ Normal(m, sqrt(s²))
51 | return y ~ Normal(m, sqrt(s²))
52 | end
53 |
54 | # Sample using Gibbs and varying autodiff backends.
55 | c = sample(
56 | gdemo(1.5, 2),
57 | Gibbs(
58 | :m => HMC(0.1, 5; adtype=AutoForwardDiff(; chunksize=0)),
59 | :s² => HMC(0.1, 5; adtype=AutoReverseDiff(false)),
60 | ),
61 | 1000,
62 | progress=false,
63 | )
64 | ```
65 |
66 | Generally, reverse-mode AD, for instance `ReverseDiff`, is faster when sampling from variables of high dimensionality (greater than 20), while forward-mode AD, for instance `ForwardDiff`, is more efficient for lower-dimensional variables. This functionality allows those who are performance-sensitive to fine-tune their automatic differentiation for their specific models.
67 |
68 | If the differentiation method is not specified in this way, Turing will default to using whatever the global AD backend is.
69 | Currently, this defaults to `ForwardDiff`.
70 |
71 | The most reliable way to ensure you are using the fastest AD that works for your problem is to benchmark them using the functionality in DynamicPPL (see [the API documentation](https://turinglang.org/DynamicPPL.jl/stable/api/#AD-testing-and-benchmarking-utilities)):
72 |
73 | ```{julia}
74 | using DynamicPPL.TestUtils.AD: run_ad, ADResult
75 | using ForwardDiff, ReverseDiff
76 |
77 | model = gdemo(1.5, 2)
78 |
79 | for adtype in [AutoForwardDiff(), AutoReverseDiff()]
80 | result = run_ad(model, adtype; benchmark=true)
81 | @show result.time_vs_primal
82 | end
83 | ```
84 |
85 | In this specific instance, ForwardDiff is clearly faster (due to the small size of the model).
86 |
87 | We also have a table of benchmarks for various models and AD backends in [the ADTests website](https://turinglang.org/ADTests/).
88 | These models aim to capture a variety of different Turing.jl features.
89 | If you have suggestions for things to include, please do let us know by [creating an issue on GitHub](https://github.com/TuringLang/ADTests/issues/new)!
90 |
--------------------------------------------------------------------------------
/usage/custom-distribution/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: Custom Distributions
3 | engine: julia
4 | aliases:
5 | - ../../tutorials/usage-custom-distribution/index.html
6 | - ../../tutorials/docs-09-using-turing-advanced/index.html
7 | ---
8 |
9 | ```{julia}
10 | #| echo: false
11 | #| output: false
12 | using Pkg;
13 | Pkg.instantiate();
14 | ```
15 |
16 | `Turing.jl` supports the use of distributions from the Distributions.jl package.
17 | By extension, it also supports the use of customized distributions, which can be defined as subtypes of the `Distribution` type from the Distributions.jl package, together with the corresponding functions.
18 | 
19 | This page shows the workflow for defining a customized distribution, using our own implementation of a simple `Uniform` distribution as an example.
20 |
21 | ```{julia}
22 | #| output: false
23 | using Distributions, Turing, Random, Bijectors
24 | ```
25 |
26 | ## Define the Distribution Type
27 |
28 | First, define a type of the distribution, as a subtype of a corresponding distribution type in the Distributions.jl package.
29 |
30 | ```{julia}
31 | struct CustomUniform <: ContinuousUnivariateDistribution end
32 | ```
33 |
34 | ## Implement Sampling and Evaluation of the log-pdf
35 |
36 | Second, implement the `rand` and `logpdf` functions for your new distribution, which will be used to run the model.
37 |
38 | ```{julia}
39 | # sample in [0, 1]
40 | Distributions.rand(rng::AbstractRNG, d::CustomUniform) = rand(rng)
41 |
42 | # p(x) = 1 → log[p(x)] = 0
43 | Distributions.logpdf(d::CustomUniform, x::Real) = zero(x)
44 | ```
45 |
46 | ## Define Helper Functions
47 |
48 | In many cases, you will also need to define some helper functions.
49 |
50 | ### Domain Transformation
51 |
52 | Certain samplers, such as `HMC`, require the domain of the priors to be unbounded.
53 | Therefore, to use our `CustomUniform` as a prior in a model we also need to define how to transform samples from `[0, 1]` to `ℝ`.
54 | To do this, we need to define the corresponding `Bijector` from `Bijectors.jl`, which is what `Turing.jl` uses internally to deal with constrained distributions.
55 |
56 | To transform from `[0, 1]` to `ℝ` we can use the `Logit` bijector:
57 |
58 | ```{julia}
59 | Bijectors.bijector(d::CustomUniform) = Logit(0.0, 1.0)
60 | ```
61 |
62 | In the present example, `CustomUniform` is a subtype of `ContinuousUnivariateDistribution`.
63 | The procedure for subtypes of `ContinuousMultivariateDistribution` and `ContinuousMatrixDistribution` is exactly the same.
64 | For example, `Wishart` defines a distribution over positive-definite matrices and so `bijector` returns a `PDBijector` when called with a `Wishart` distribution as an argument.
65 | For discrete distributions, there is no need to define a bijector; the `Identity` bijector is used by default.
66 |
67 | As an alternative to the above, for `UnivariateDistribution` we could define the `minimum` and `maximum` of the distribution:
68 |
69 | ```{julia}
70 | Distributions.minimum(d::CustomUniform) = 0.0
71 | Distributions.maximum(d::CustomUniform) = 1.0
72 | ```
73 |
74 | and `Bijectors.jl` will return a default `Bijector` called `TruncatedBijector`, which makes use of `minimum` and `maximum` to derive the correct transformation.
75 |
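   | Either way, once the transformation is available, `CustomUniform` can be used as a prior and sampled with a gradient-based sampler such as `HMC`. A minimal sketch (the model and the single observation here are hypothetical, purely for illustration):
   | 
   | ```{julia}
   | #| eval: false
   | @model function coin(y)
   |     # Our custom distribution as the prior for the success probability.
   |     p ~ CustomUniform()
   |     return y ~ Bernoulli(p)
   | end
   | 
   | chain = sample(coin(1), HMC(0.05, 10), 1000)
   | ```
   | 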
76 | Internally, Turing basically does the following when it needs to convert a constrained distribution to an unconstrained distribution, e.g. when sampling using `HMC`:
77 |
78 | ```{julia}
79 | dist = Gamma(2,3)
80 | b = bijector(dist)
81 | transformed_dist = transformed(dist, b) # results in distribution with transformed support + correction for logpdf
82 | ```
83 |
84 | and then we can call `rand` and `logpdf` as usual, where
85 |
86 | - `rand(transformed_dist)` returns a sample in the unconstrained space, and
87 | - `logpdf(transformed_dist, y)` returns the log density of the original distribution, but with `y` living in the unconstrained space.
88 |
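   | For example, continuing with the `transformed_dist` defined above:
   | 
   | ```{julia}
   | y = rand(transformed_dist)   # a draw in the unconstrained space ℝ
   | logpdf(transformed_dist, y)  # log density of the original distribution, including the Jacobian correction
   | ```
   | 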
89 | To read more about Bijectors.jl, check out [its documentation](https://turinglang.org/Bijectors.jl/stable/).
90 |
--------------------------------------------------------------------------------
/usage/dynamichmc/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: Using DynamicHMC
3 | engine: julia
4 | aliases:
5 | - ../../tutorials/docs-11-using-turing-dynamichmc/index.html
6 | ---
7 |
8 | ```{julia}
9 | #| echo: false
10 | #| output: false
11 | using Pkg;
12 | Pkg.instantiate();
13 | ```
14 |
15 | Turing supports the use of [DynamicHMC](https://github.com/tpapp/DynamicHMC.jl) as a sampler through the `DynamicNUTS` function.
16 |
17 | To use the `DynamicNUTS` function, you must import the `DynamicHMC` package as well as Turing. Turing does not formally require `DynamicHMC` but will include additional functionality if both packages are present.
18 |
19 | Here is a brief example:
20 |
21 | ### How to apply `DynamicNUTS`:
22 |
23 | ```{julia}
24 | # Import Turing and DynamicHMC.
25 | using DynamicHMC, Turing
26 |
27 | # Model definition.
28 | @model function gdemo(x, y)
29 | s² ~ InverseGamma(2, 3)
30 | m ~ Normal(0, sqrt(s²))
31 | x ~ Normal(m, sqrt(s²))
32 | return y ~ Normal(m, sqrt(s²))
33 | end
34 |
35 | # Pull 2,000 samples using DynamicNUTS.
36 | dynamic_nuts = externalsampler(DynamicHMC.NUTS())
37 | chn = sample(gdemo(1.5, 2.0), dynamic_nuts, 2000, progress=false)
38 | ```
39 |
--------------------------------------------------------------------------------
/usage/external-samplers/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: Using External Samplers
3 | engine: julia
4 | aliases:
5 | - ../../tutorials/docs-16-using-turing-external-samplers/index.html
6 | ---
7 |
8 | ```{julia}
9 | #| echo: false
10 | #| output: false
11 | using Pkg;
12 | Pkg.instantiate();
13 | ```
14 |
15 | ## Using External Samplers on Turing Models
16 |
17 | `Turing` provides several wrapped samplers from external sampling libraries, e.g., HMC samplers from `AdvancedHMC`.
18 | These wrappers allow new users to seamlessly sample statistical models without leaving `Turing`.
19 | However, these wrappers are not always complete and may be missing some functionality from the wrapped sampling library.
20 | Moreover, users might want to use samplers currently not wrapped within `Turing`.
21 |
22 | For these reasons, `Turing` also makes running external samplers on Turing models easy without any necessary modifications or wrapping!
23 | Throughout, we will use a 10-dimensional Neal's funnel as a running example:
24 |
25 | ```{julia}
26 | # Import libraries.
27 | using Turing, Random, LinearAlgebra
28 |
29 | d = 10
30 | @model function funnel()
31 | θ ~ Truncated(Normal(0, 3), -3, 3)
32 | z ~ MvNormal(zeros(d - 1), exp(θ) * I)
33 | return x ~ MvNormal(z, I)
34 | end
35 | ```
36 |
37 | Now we sample the model to generate some observations, which we can then condition on.
38 |
39 | ```{julia}
40 | (; x) = rand(funnel() | (θ=0,))
41 | model = funnel() | (; x);
42 | ```
43 |
44 | Users can use any sampling algorithm to sample this model if it follows the `AbstractMCMC` API.
45 | Before discussing how this is done in practice, it is worth giving a high-level description of the process.
46 | Imagine that we created an instance of an external sampler that we will call `spl` such that `typeof(spl)<:AbstractMCMC.AbstractSampler`.
47 | In order to avoid type ambiguity within Turing, it is currently necessary to declare `spl` as an external sampler by calling `espl = externalsampler(spl)`, where `externalsampler(s::AbstractMCMC.AbstractSampler)` is a Turing function that wraps our external sampler in a type that Turing can dispatch on.
48 |
49 | An excellent place to start showing how this is done in practice is the sampling library `AdvancedMH` ([`AdvancedMH`'s GitHub](https://github.com/TuringLang/AdvancedMH.jl)), which provides Metropolis-Hastings (MH) methods.
50 | Let's say we want to use a random walk Metropolis-Hastings sampler without specifying the proposal distributions.
51 | The code below constructs an MH sampler using a multivariate Gaussian distribution with zero mean and unit variance in `d` dimensions as a random walk proposal.
52 |
53 | ```{julia}
54 | # Importing the sampling library
55 | using AdvancedMH
56 | rwmh = AdvancedMH.RWMH(d)
57 | ```
58 |
59 | ```{julia}
60 | #| output: false
61 | setprogress!(false)
62 | ```
63 |
64 | Sampling is then as easy as:
65 |
66 |
67 | ```{julia}
68 | chain = sample(model, externalsampler(rwmh), 10_000)
69 | ```
70 |
71 | ## Going beyond the Turing API
72 |
73 | As previously mentioned, the Turing wrappers can often limit the capabilities of the sampling libraries they wrap.
74 | `AdvancedHMC`[^1] ([`AdvancedHMC`'s GitHub](https://github.com/TuringLang/AdvancedHMC.jl)) is a clear example of this. A common practice when performing HMC is to provide an initial guess for the mass matrix.
75 | However, the native HMC sampler within Turing only allows the user to specify the type of the mass matrix, even though `AdvancedHMC` supports both specifying the type and providing an initial estimate.
76 | Thankfully, we can use Turing's support for external samplers to define an HMC sampler with a custom mass matrix in `AdvancedHMC` and then use it to sample our Turing model.
77 |
78 | We can use the library `Pathfinder`[^2] ([`Pathfinder`'s GitHub](https://github.com/mlcolab/Pathfinder.jl)) to construct our estimate of mass matrix.
79 | `Pathfinder` is a variational inference algorithm that first finds the maximum a posteriori (MAP) estimate of a target posterior distribution and then uses the trace of the optimization to construct a sequence of multivariate normal approximations to the target distribution.
80 | In this process, `Pathfinder` computes an estimate of the mass matrix the user can access.
81 | You can see an example of how to use `Pathfinder` with Turing in [`Pathfinder`'s docs](https://mlcolab.github.io/Pathfinder.jl/stable/examples/turing/).
82 |
83 | ## Using new inference methods
84 |
85 | So far we have used Turing's support for external samplers to go beyond the capabilities of the wrappers.
86 | Now, we want to use this support to employ a sampler that is not yet supported within Turing's ecosystem.
87 | We will use the recently developed Microcanonical Hamiltonian Monte Carlo (MCHMC) sampler to showcase this.
88 | MCHMC[[^3],[^4]] ([MCHMC's GitHub](https://github.com/JaimeRZP/MicroCanonicalHMC.jl)) is an HMC sampler that uses a single Hamiltonian energy level to explore the whole parameter space.
89 | This is achieved by simulating the dynamics of a microcanonical Hamiltonian with an additional noise term to ensure ergodicity.
90 |
91 | Using this as well as other inference methods outside the Turing ecosystem is as simple as executing the code shown below:
92 |
93 | ```{julia}
94 | using MicroCanonicalHMC
95 | # Create MCHMC sampler
96 | n_adapts = 1_000 # adaptation steps
97 | tev = 0.01 # target energy variance
98 | mchmc = MCHMC(n_adapts, tev; adaptive=true)
99 |
100 | # Sample
101 | chain = sample(model, externalsampler(mchmc), 10_000)
102 | ```
103 |
104 | The only requirement to work with `externalsampler` is that the provided `sampler` must implement the AbstractMCMC.jl-interface [INSERT LINK] for a `model` of type `AbstractMCMC.LogDensityModel` [INSERT LINK].
105 |
106 | As previously stated, in order to use external sampling libraries within `Turing` they must follow the `AbstractMCMC` API.
107 | In this section, we will briefly describe what this entails.
108 | First and foremost, the sampler should be a subtype of `AbstractMCMC.AbstractSampler`.
109 | Second, the stepping function of the MCMC algorithm must be defined by extending `AbstractMCMC.step`, following the structure below:
110 |
111 | ```{julia}
112 | #| eval: false
113 | # First step
114 | function AbstractMCMC.step(
115 |     rng::Random.AbstractRNG,
116 |     model::AbstractMCMC.LogDensityModel,
117 |     spl::T;
118 |     kwargs...,
119 | ) where {T<:AbstractMCMC.AbstractSampler}
120 | [...]
121 | return transition, sample
122 | end
123 |
124 | # N+1 step
125 | function AbstractMCMC.step(
126 |     rng::Random.AbstractRNG,
127 |     model::AbstractMCMC.LogDensityModel,
128 |     sampler::T,
129 |     state;
130 |     kwargs...,
131 | ) where {T<:AbstractMCMC.AbstractSampler}
132 | [...]
133 | return transition, sample
134 | end
135 | ```
136 |
137 | There are several characteristics to note in these functions:
138 |
139 | - There must be two `step` functions:
140 |
141 | + A function that performs the first step and initializes the sampler.
142 | + A function that performs the following steps and takes an extra input, `state`, which carries the initialization information.
143 |
144 | - The functions must follow the displayed signatures.
145 | - The functions must return two objects: a transition, which represents the current state of the sampler, and a sample, which is what is saved to the MCMC chain.
146 |
147 | The last requirement is that the transition must be structured with a field `θ`, which contains the values of the parameters of the model for said transition.
148 | This allows `Turing` to seamlessly extract the parameter values at each step of the chain when bundling the chains.
149 | Note that if the external sampler produces transitions that Turing cannot parse, the bundling of the samples will be different or fail.
150 |
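   | For illustration, a transition for a custom sampler could be as simple as the following sketch (the type name and the extra `lp` field are hypothetical; only the `θ` field is required by Turing):
   | 
   | ```{julia}
   | #| eval: false
   | # Hypothetical transition type for a custom sampler.
   | struct MyTransition{V<:AbstractVector{<:Real}}
   |     θ::V         # parameter values for this step (what Turing extracts)
   |     lp::Float64  # log density at θ (optional extra information)
   | end
   | ```
   | 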
151 | For practical examples of how to adapt a sampling library to the `AbstractMCMC` interface, the readers can consult the following libraries:
152 |
153 | - [AdvancedMH](https://github.com/TuringLang/AdvancedMH.jl/blob/458a602ac32a8514a117d4c671396a9ba8acbdab/src/mh-core.jl#L73-L115)
154 | - [AdvancedHMC](https://github.com/TuringLang/AdvancedHMC.jl/blob/762e55f894d142495a41a6eba0eed9201da0a600/src/abstractmcmc.jl#L102-L170)
155 | - [MicroCanonicalHMC](https://github.com/JaimeRZP/MicroCanonicalHMC.jl/blob/master/src/abstractmcmc.jl)
156 |
157 |
158 | [^1]: Xu et al., [AdvancedHMC.jl: A robust, modular and efficient implementation of advanced HMC algorithms](http://proceedings.mlr.press/v118/xu20a/xu20a.pdf), 2019
159 | [^2]: Zhang et al., [Pathfinder: Parallel quasi-Newton variational inference](https://arxiv.org/abs/2108.03782), 2021
160 | [^3]: Robnik et al, [Microcanonical Hamiltonian Monte Carlo](https://arxiv.org/abs/2212.08549), 2022
161 | [^4]: Robnik and Seljak, [Langevin Hamiltonian Monte Carlo](https://arxiv.org/abs/2303.18221), 2023
162 |
--------------------------------------------------------------------------------
/usage/mode-estimation/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: Mode Estimation
3 | engine: julia
4 | aliases:
5 | - ../../tutorials/docs-17-mode-estimation/index.html
6 | ---
7 |
8 | ```{julia}
9 | #| echo: false
10 | #| output: false
11 | using Pkg;
12 | Pkg.instantiate();
13 | ```
14 |
15 | After defining a statistical model, in addition to sampling from its distributions, one may be interested in finding the parameter values that maximise, for instance, the posterior density or the likelihood. This is called mode estimation. Turing provides support for two mode estimation techniques, [maximum likelihood estimation](https://en.wikipedia.org/wiki/Maximum_likelihood_estimation) (MLE) and [maximum a posteriori](https://en.wikipedia.org/wiki/Maximum_a_posteriori_estimation) (MAP) estimation.
16 |
17 | To demonstrate mode estimation, let us load Turing and declare a model:
18 |
19 | ```{julia}
20 | using Turing
21 |
22 | @model function gdemo(x)
23 | s² ~ InverseGamma(2, 3)
24 | m ~ Normal(0, sqrt(s²))
25 |
26 | for i in eachindex(x)
27 | x[i] ~ Normal(m, sqrt(s²))
28 | end
29 | end
30 | ```
31 |
32 | Once the model is defined, we can construct a model instance as we normally would:
33 |
34 | ```{julia}
35 | # Instantiate the gdemo model with our data.
36 | data = [1.5, 2.0]
37 | model = gdemo(data)
38 | ```
39 |
40 | Finding the maximum a posteriori or maximum likelihood parameters is as simple as
41 |
42 | ```{julia}
43 | # Generate a MLE estimate.
44 | mle_estimate = maximum_likelihood(model)
45 |
46 | # Generate a MAP estimate.
47 | map_estimate = maximum_a_posteriori(model)
48 | ```
49 |
50 | The estimates are returned as instances of the `ModeResult` type. It has the fields `values` for the parameter values found and `lp` for the log probability at the optimum, as well as `f` for the objective function and `optim_result` for more detailed results of the optimisation procedure.
51 |
52 | ```{julia}
53 | @show mle_estimate.values
54 | @show mle_estimate.lp;
55 | ```
56 |
57 | ## Controlling the optimisation process
58 |
59 | Under the hood `maximum_likelihood` and `maximum_a_posteriori` use the [Optimization.jl](https://github.com/SciML/Optimization.jl) package, which provides a unified interface to many other optimisation packages. By default Turing typically uses the [LBFGS](https://en.wikipedia.org/wiki/Limited-memory_BFGS) method from [Optim.jl](https://github.com/JuliaNLSolvers/Optim.jl) to find the mode estimate, but we can easily change that:
60 |
61 | ```{julia}
62 | using OptimizationOptimJL: NelderMead
63 | @show maximum_likelihood(model, NelderMead())
64 |
65 | using OptimizationNLopt: NLopt.LD_TNEWTON_PRECOND_RESTART
66 | @show maximum_likelihood(model, LD_TNEWTON_PRECOND_RESTART());
67 | ```
68 |
69 | The above are just two examples; Optimization.jl supports [many more](https://docs.sciml.ai/Optimization/stable/).
70 |
71 | We can also help the optimisation by giving it a starting point we know is close to the final solution, or by specifying an automatic differentiation method:
72 |
73 | ```{julia}
74 | using ADTypes: AutoReverseDiff
75 | import ReverseDiff
76 | maximum_likelihood(
77 | model, NelderMead(); initial_params=[0.1, 2], adtype=AutoReverseDiff()
78 | )
79 | ```
80 |
81 | When providing values to arguments like `initial_params` the parameters are typically specified in the order in which they appear in the code of the model, so in this case first `s²` then `m`. More precisely it's the order returned by `Turing.Inference.getparams(model, Turing.VarInfo(model))`.
82 |
83 | We can also do constrained optimisation by providing either intervals within which the parameters must stay, or constraint functions that they need to respect. For instance, here's how one can find the MLE with the constraint that the variance must be less than 0.01 and the mean must be between -1 and 1:
84 |
85 | ```{julia}
86 | maximum_likelihood(model; lb=[0.0, -1.0], ub=[0.01, 1.0])
87 | ```
88 |
89 | The arguments for lower (`lb`) and upper (`ub`) bounds follow the arguments of `Optimization.OptimizationProblem`, as do other parameters for providing [constraints](https://docs.sciml.ai/Optimization/stable/tutorials/constraints/), such as `cons`. Any extraneous keyword arguments given to `maximum_likelihood` or `maximum_a_posteriori` are passed to `Optimization.solve`. Some often useful ones are `maxiters` for controlling the maximum number of iterations and `abstol` and `reltol` for the absolute and relative convergence tolerances:
90 |
91 | ```{julia}
92 | badly_converged_mle = maximum_likelihood(
93 | model, NelderMead(); maxiters=10, reltol=1e-9
94 | )
95 | ```
96 |
97 | We can check whether the optimisation converged using the `optim_result` field of the result:
98 |
99 | ```{julia}
100 | @show badly_converged_mle.optim_result;
101 | ```
102 |
103 | For more details, such as a full list of possible arguments, we encourage the reader to read the docstring of the function `Turing.Optimisation.estimate_mode`, which is what `maximum_likelihood` and `maximum_a_posteriori` call, and the documentation of [Optimization.jl](https://docs.sciml.ai/Optimization/stable/).
104 |
105 | ## Analyzing your mode estimate
106 |
107 | Turing extends several methods from `StatsBase` that can be used to analyze your mode estimation results. Methods implemented include `vcov`, `informationmatrix`, `coeftable`, `params`, and `coef`, among others.
108 |
109 | For example, let's examine our ML estimate from above using `coeftable`:
110 |
111 | ```{julia}
112 | using StatsBase: coeftable
113 | coeftable(mle_estimate)
114 | ```
115 |
116 | Standard errors are calculated from the Fisher information matrix (inverse Hessian of the log likelihood or log joint). Note that standard errors calculated in this way may not always be appropriate for MAP estimates, so please be cautious in interpreting them.
117 |
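   | The other extended methods are used in the same way; for instance (a small sketch):
   | 
   | ```{julia}
   | #| eval: false
   | using StatsBase: coef, vcov
   | 
   | coef(mle_estimate)  # parameter estimates as a vector
   | vcov(mle_estimate)  # estimated covariance matrix of the parameters
   | ```
   | 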
118 | ## Sampling with the MAP/MLE as initial states
119 |
120 | You can begin sampling your chain from an MLE/MAP estimate by extracting the vector of parameter values and providing it to the `sample` function with the keyword `initial_params`. For example, here is how to sample from the full posterior using the MAP estimate as the starting point:
121 |
122 | ```{julia}
123 | #| eval: false
124 | map_estimate = maximum_a_posteriori(model)
125 | chain = sample(model, NUTS(), 1_000; initial_params=map_estimate.values.array)
126 | ```
127 |
--------------------------------------------------------------------------------
/usage/modifying-logprob/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: Modifying the Log Probability
3 | engine: julia
4 | aliases:
5 | - ../../tutorials/usage-modifying-logprob/index.html
6 | ---
7 |
8 | ```{julia}
9 | #| echo: false
10 | #| output: false
11 | using Pkg;
12 | Pkg.instantiate();
13 | ```
14 |
15 | Turing accumulates log probabilities in an internal data structure that is accessible through the variable `__varinfo__` inside the model definition.
16 | To avoid users having to deal with internal data structures, Turing provides the `Turing.@addlogprob!` macro which increases the accumulated log probability.
17 | For instance, this allows you to
18 | [include arbitrary terms in the likelihood](https://github.com/TuringLang/Turing.jl/issues/1332):
19 |
20 | ```{julia}
21 | using Turing
22 |
23 | myloglikelihood(x, μ) = loglikelihood(Normal(μ, 1), x)
24 |
25 | @model function demo(x)
26 | μ ~ Normal()
27 | Turing.@addlogprob! myloglikelihood(x, μ)
28 | end
29 | ```
30 |
31 | and to force a sampler to [reject a sample](https://github.com/TuringLang/Turing.jl/issues/1328):
32 |
33 | ```{julia}
34 | using Turing
35 | using LinearAlgebra
36 |
37 | @model function demo(x)
38 | m ~ MvNormal(zero(x), I)
39 | if dot(m, x) < 0
40 | Turing.@addlogprob! -Inf
41 | # Exit the model evaluation early
42 | return nothing
43 | end
44 |
45 | x ~ MvNormal(m, I)
46 | return nothing
47 | end
48 | ```
49 |
50 | Note that `@addlogprob!` always increases the accumulated log probability, regardless of the provided
51 | sampling context.
52 | For instance, if you do not want to apply `Turing.@addlogprob!` when evaluating the prior of your model but only when computing the log likelihood and the log joint probability, then you should [check the type of the internal variable `__context__`](https://github.com/TuringLang/DynamicPPL.jl/issues/154), as in the following example:
53 |
54 | ```{julia}
55 | #| eval: false
56 | if DynamicPPL.leafcontext(__context__) !== Turing.PriorContext()
57 | Turing.@addlogprob! myloglikelihood(x, μ)
58 | end
59 | ```
60 |
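   | In a full model definition, this check might look like the following sketch (reusing the hypothetical `myloglikelihood` from the first example):
   | 
   | ```{julia}
   | #| eval: false
   | @model function demo_likelihood_only(x)
   |     μ ~ Normal()
   |     # Only add the extra term when not just evaluating the prior.
   |     if DynamicPPL.leafcontext(__context__) !== Turing.PriorContext()
   |         Turing.@addlogprob! myloglikelihood(x, μ)
   |     end
   | end
   | ```
   | 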
--------------------------------------------------------------------------------
/usage/performance-tips/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: Performance Tips
3 | engine: julia
4 | aliases:
5 | - ../../tutorials/docs-13-using-turing-performance-tips/index.html
6 | ---
7 |
8 | ```{julia}
9 | #| echo: false
10 | #| output: false
11 | using Pkg;
12 | Pkg.instantiate();
13 | ```
14 |
15 | This section briefly summarises a few common techniques to ensure good performance when using Turing.
16 | We refer to [the Julia documentation](https://docs.julialang.org/en/v1/manual/performance-tips/index.html) for general techniques to ensure good performance of Julia programs.
17 |
18 | ## Use multivariate distributions
19 |
20 | It is generally preferable to use multivariate distributions if possible.
21 |
22 | The following example:
23 |
24 | ```{julia}
25 | using Turing
26 | @model function gmodel(x)
27 | m ~ Normal()
28 | for i in 1:length(x)
29 | x[i] ~ Normal(m, 0.2)
30 | end
31 | end
32 | ```
33 |
34 | can be directly expressed more efficiently using a simple transformation:
35 |
36 | ```{julia}
37 | using FillArrays
38 |
39 | @model function gmodel(x)
40 | m ~ Normal()
41 | return x ~ MvNormal(Fill(m, length(x)), 0.04 * I)
42 | end
43 | ```
44 |
45 | ## Choose your AD backend
46 |
47 | Automatic differentiation (AD) makes it possible to use modern, efficient gradient-based samplers like NUTS and HMC, and that means a good AD system is incredibly important. Turing currently
48 | supports several AD backends, including [ForwardDiff](https://github.com/JuliaDiff/ForwardDiff.jl) (the default),
49 | [Mooncake](https://github.com/compintell/Mooncake.jl),
50 | [Zygote](https://github.com/FluxML/Zygote.jl), and
51 | [ReverseDiff](https://github.com/JuliaDiff/ReverseDiff.jl).
52 |
53 | For many common types of models, the default ForwardDiff backend performs great, and there is no need to worry about changing it. However, if you need more speed, you can try
54 | different backends via the standard [ADTypes](https://github.com/SciML/ADTypes.jl) interface by passing an `AbstractADType` to the sampler with the optional `adtype` argument, e.g.
55 | `NUTS(adtype = AutoZygote())`. See the [Automatic Differentiation]({{< meta usage-automatic-differentiation >}}) page for details. Generally, `adtype = AutoForwardDiff()` is likely to be the fastest and most reliable for models with
56 | few parameters (say, less than 20 or so), while reverse-mode backends such as `AutoZygote()` or `AutoReverseDiff()` will perform better for models with many parameters or linear algebra
57 | operations. If in doubt, it's easy to try a few different backends to see how they compare.
58 |
59 | ### Special care for Zygote
60 |
61 | Note that Zygote will not perform well if your model contains `for`-loops, due to the way reverse-mode AD is implemented in this package. Zygote also cannot differentiate code
62 | that contains mutating operations. If you can't implement your model without `for`-loops or mutation, `ReverseDiff` will be a better, more performant option. In general, though,
63 | vectorized operations are still likely to perform best.
64 |
65 | Avoiding loops can be done using `filldist(dist, N)` and `arraydist(dists)`. `filldist(dist, N)` creates a multivariate distribution that is composed of `N` identical and independent
66 | copies of the univariate distribution `dist` if `dist` is univariate, or it creates a matrix-variate distribution composed of `N` identical and independent copies of the multivariate
67 | distribution `dist` if `dist` is multivariate. `filldist(dist, N, M)` can also be used to create a matrix-variate distribution from a univariate distribution `dist`. `arraydist(dists)`
68 | is similar to `filldist` but it takes an array of distributions `dists` as input. Writing a [custom distribution](advanced) with a custom adjoint is another option to avoid loops.
69 |
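   | As a small sketch of both helpers (the specific distributions here are arbitrary):
   | 
   | ```{julia}
   | @model function loopfree()
   |     # 10 iid draws from Normal(0, 1), treated as one multivariate node.
   |     z ~ filldist(Normal(0, 1), 10)
   |     # Independent but non-identically distributed components.
   |     return w ~ arraydist([Normal(i, 1.0) for i in 1:10])
   | end
   | ```
   | 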
70 | ### Special care for ReverseDiff with a compiled tape
71 |
72 | For large models, the fastest option is often ReverseDiff with a compiled tape, specified as `adtype=AutoReverseDiff(true)`. However, it is important to note that if your model contains any
73 | branching code, such as `if`-`else` statements, **the gradients from a compiled tape may be inaccurate, leading to erroneous results**. If you use this option for the (considerable) speedup it
74 | can provide, make sure to check your code. It's also a good idea to verify your gradients with another backend.
75 |
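   | One way to do such a check is with the AD testing utilities in DynamicPPL (a sketch, assuming a previously defined `model`; see the Automatic Differentiation page for more on `run_ad`):
   | 
   | ```{julia}
   | #| eval: false
   | using DynamicPPL.TestUtils.AD: run_ad
   | 
   | # With `test=true`, `run_ad` checks that the gradient computed with the given
   | # AD backend is correct; see the DynamicPPL documentation for details.
   | run_ad(model, AutoReverseDiff(; compile=true); test=true)
   | ```
   | 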
76 | ## Ensure that types in your model can be inferred
77 |
78 | For efficient gradient-based inference, e.g. using HMC, NUTS or ADVI, it is important to ensure the types in your model can be inferred.
79 |
80 | The following example with abstract types
81 |
82 | ```{julia}
83 | @model function tmodel(x, y)
84 | p, n = size(x)
85 | params = Vector{Real}(undef, n)
86 | for i in 1:n
87 | params[i] ~ truncated(Normal(); lower=0)
88 | end
89 |
90 | a = x * params
91 | return y ~ MvNormal(a, I)
92 | end
93 | ```
94 |
95 | can be transformed into the following representation with concrete types:
96 |
97 | ```{julia}
98 | @model function tmodel(x, y, ::Type{T}=Float64) where {T}
99 | p, n = size(x)
100 | params = Vector{T}(undef, n)
101 | for i in 1:n
102 | params[i] ~ truncated(Normal(); lower=0)
103 | end
104 |
105 | a = x * params
106 | return y ~ MvNormal(a, I)
107 | end
108 | ```
109 |
110 | Alternatively, you could use `filldist` in this example:
111 |
112 | ```{julia}
113 | @model function tmodel(x, y)
114 | params ~ filldist(truncated(Normal(); lower=0), size(x, 2))
115 | a = x * params
116 | return y ~ MvNormal(a, I)
117 | end
118 | ```
119 |
120 | Note that you can use `@code_warntype` to find types in your model definition that the compiler cannot infer.
121 | They are marked in red in the Julia REPL.
122 |
123 | For example, consider the following simple program:
124 |
125 | ```{julia}
126 | @model function tmodel(x)
127 | p = Vector{Real}(undef, 1)
128 | p[1] ~ Normal()
129 | p = p .+ 1
130 | return x ~ Normal(p[1])
131 | end
132 | ```
133 |
134 | We can use
135 |
136 | ```{julia}
137 | #| eval: false
138 | using Random
139 |
140 | model = tmodel(1.0)
141 |
142 | @code_warntype model.f(
143 | model,
144 | Turing.VarInfo(model),
145 | Turing.SamplingContext(
146 | Random.default_rng(), Turing.SampleFromPrior(), Turing.DefaultContext()
147 | ),
148 | model.args...,
149 | )
150 | ```
151 |
152 | to inspect type inference in the model.
153 |
--------------------------------------------------------------------------------
/usage/probability-interface/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: Querying Model Probabilities
3 | engine: julia
4 | aliases:
5 | - ../../tutorials/usage-probability-interface/index.html
6 | ---
7 |
8 | ```{julia}
9 | #| echo: false
10 | #| output: false
11 | using Pkg;
12 | Pkg.instantiate();
13 | ```
14 |
15 | The easiest way to manipulate and query Turing models is via the DynamicPPL probability interface.
16 |
17 | Let's use a simple model of normally-distributed data as an example.
18 |
19 | ```{julia}
20 | using Turing
21 | using DynamicPPL
22 | using Random
23 |
24 | @model function gdemo(n)
25 | μ ~ Normal(0, 1)
26 | x ~ MvNormal(fill(μ, n), I)
27 | end
28 | ```
29 |
30 | We generate some data using `μ = 0`:
31 |
32 | ```{julia}
33 | Random.seed!(1776)
34 | dataset = randn(100)
35 | dataset[1:5]
36 | ```
37 |
38 | ## Conditioning and Deconditioning
39 |
40 | Bayesian models can be transformed with two main operations, conditioning and deconditioning (also known as marginalization).
41 | Conditioning takes a variable and fixes its value as known.
42 | We do this by passing a model and a collection of conditioned variables to `|`, or its alias, `condition`:
43 |
44 | ```{julia}
45 | # (equivalently)
46 | # conditioned_model = condition(gdemo(length(dataset)), (x=dataset, μ=0))
47 | conditioned_model = gdemo(length(dataset)) | (x=dataset, μ=0)
48 | ```
49 |
50 | This operation can be reversed by applying `decondition`:
51 |
52 | ```{julia}
53 | original_model = decondition(conditioned_model)
54 | ```
55 |
56 | We can also decondition only some of the variables:
57 |
58 | ```{julia}
59 | partially_conditioned = decondition(conditioned_model, :μ)
60 | ```
61 |
62 | We can see which of the variables in a model have been conditioned with `DynamicPPL.conditioned`:
63 |
64 | ```{julia}
65 | DynamicPPL.conditioned(partially_conditioned)
66 | ```
67 |
68 | ::: {.callout-note}
69 | Sometimes it is helpful to define convenience functions for conditioning on some variable(s).
70 | For instance, in this example we might want to define a version of `gdemo` that conditions on some observations of `x`:
71 |
72 | ```julia
73 | gdemo(x::AbstractVector{<:Real}) = gdemo(length(x)) | (; x)
74 | ```
75 |
76 | For illustrative purposes, however, we do not use this function in the examples below.
77 | :::
78 |
79 | ## Probabilities and Densities
80 |
81 | We often want to calculate the (unnormalized) probability density for an event.
82 | This probability might be a prior, a likelihood, or a posterior (joint) density.
83 | DynamicPPL provides convenient functions for this.
84 | To begin, let's define a model `gdemo`, condition it on a dataset, and draw a sample.
85 | The returned sample only contains `μ`, since the value of `x` has already been fixed:
86 |
87 | ```{julia}
88 | model = gdemo(length(dataset)) | (x=dataset,)
89 |
90 | Random.seed!(124)
91 | sample = rand(model)
92 | ```
93 |
94 | We can then calculate the joint probability of a set of samples (here drawn from the prior) with `logjoint`.
95 |
96 | ```{julia}
97 | logjoint(model, sample)
98 | ```
99 |
100 | For models with many variables `rand(model)` can be prohibitively slow since it returns a `NamedTuple` of samples from the prior distribution of the unconditioned variables.
101 | We recommend working with samples of type `DataStructures.OrderedDict` in this case (which Turing re-exports, so can be used directly):
102 |
103 | ```{julia}
104 | Random.seed!(124)
105 | sample_dict = rand(OrderedDict, model)
106 | ```
107 |
108 | `logjoint` can also be used on this sample:
109 |
110 | ```{julia}
111 | logjoint(model, sample_dict)
112 | ```
113 |
114 | The prior probability and the likelihood of a set of samples can be calculated with the functions `logprior` and `loglikelihood` respectively.
115 | The log joint probability is the sum of these two quantities:
116 |
117 | ```{julia}
118 | logjoint(model, sample) ≈ loglikelihood(model, sample) + logprior(model, sample)
119 | ```
120 |
121 | ```{julia}
122 | logjoint(model, sample_dict) ≈ loglikelihood(model, sample_dict) + logprior(model, sample_dict)
123 | ```
124 |
125 | ## Example: Cross-validation
126 |
127 | To give an example of the probability interface in use, we can use it to estimate the performance of our model using cross-validation.
128 | In cross-validation, we split the dataset into several equal parts.
129 | Then, we choose one of these sets to serve as the validation set.
130 | Here, we measure fit using the cross entropy (Bayes loss).[^1]
131 | (For the sake of simplicity, in the following code, we enforce that `nfolds` must divide the number of data points.
132 | For a more competent implementation, see [MLUtils.jl](https://juliaml.github.io/MLUtils.jl/dev/api/#MLUtils.kfolds).)
133 |
134 | ```{julia}
135 | # Calculate the train/validation splits across `nfolds` partitions, assuming `nfolds` divides `length(dataset)`
136 | function kfolds(dataset::Array{<:Real}, nfolds::Int)
137 | fold_size, remaining = divrem(length(dataset), nfolds)
138 | if remaining != 0
139 | error("The number of folds must divide the number of data points.")
140 | end
141 | first_idx = firstindex(dataset)
142 | last_idx = lastindex(dataset)
143 | splits = map(0:(nfolds - 1)) do i
144 | start_idx = first_idx + i * fold_size
145 | end_idx = start_idx + fold_size
146 | train_set_indices = [first_idx:(start_idx - 1); end_idx:last_idx]
147 | return (view(dataset, train_set_indices), view(dataset, start_idx:(end_idx - 1)))
148 | end
149 | return splits
150 | end
151 |
152 | function cross_val(
153 | dataset::Vector{<:Real};
154 | nfolds::Int=5,
155 | nsamples::Int=1_000,
156 | rng::Random.AbstractRNG=Random.default_rng(),
157 | )
158 | # Initialize `loss` in a way such that the loop below does not change its type
159 | model = gdemo(1) | (x=[first(dataset)],)
160 | loss = zero(logjoint(model, rand(rng, model)))
161 |
162 | for (train, validation) in kfolds(dataset, nfolds)
163 | # First, we train the model on the training set, i.e., we obtain samples from the posterior.
164 | # For normally-distributed data, the posterior can be computed in closed form.
165 | # For general models, however, typically samples will be generated using MCMC with Turing.
166 | posterior = Normal(mean(train), 1)
167 | samples = rand(rng, posterior, nsamples)
168 |
169 | # Evaluation on the validation set.
170 | validation_model = gdemo(length(validation)) | (x=validation,)
171 | loss += sum(samples) do sample
172 | logjoint(validation_model, (μ=sample,))
173 | end
174 | end
175 |
176 | return loss
177 | end
178 |
179 | cross_val(dataset)
180 | ```
181 |
182 | [^1]: See [ParetoSmooth.jl](https://github.com/TuringLang/ParetoSmooth.jl) for a faster and more accurate implementation of cross-validation than the one provided here.
183 |
--------------------------------------------------------------------------------
/usage/sampler-visualisation/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: Sampler Visualization
3 | engine: julia
4 | aliases:
5 | - ../../tutorials/docs-15-using-turing-sampler-viz/index.html
6 | ---
7 |
8 | ```{julia}
9 | #| echo: false
10 | #| output: false
11 | using Pkg;
12 | Pkg.instantiate();
13 | ```
14 |
15 | ## Introduction
16 |
17 | ### The Code
18 |
19 | For each sampler, we will use the same code to plot sampler paths. The block below loads the relevant libraries and defines a function for plotting the sampler's trajectory across the posterior.
20 |
21 | The Turing model definition used here is not especially practical, but it is designed in such a way as to produce visually interesting posterior surfaces to show how different samplers move along the distribution.
22 |
23 | ```{julia}
24 | ENV["GKS_ENCODING"] = "utf-8" # Allows the use of unicode characters in Plots.jl
25 | using Plots
26 | using StatsPlots
27 | using Turing
28 | using Random
29 | using Bijectors
30 |
31 | # Set a seed.
32 | Random.seed!(0)
33 |
34 | # Define a strange model.
35 | @model function gdemo(x)
36 | s² ~ InverseGamma(2, 3)
37 | m ~ Normal(0, sqrt(s²))
38 | bumps = sin(m) + cos(m)
39 | m = m + 5 * bumps
40 | for i in eachindex(x)
41 | x[i] ~ Normal(m, sqrt(s²))
42 | end
43 | return s², m
44 | end
45 |
46 | # Define our data points.
47 | x = [1.5, 2.0, 13.0, 2.1, 0.0]
48 |
49 | # Set up the model call, sample from the prior.
50 | model = gdemo(x)
51 |
52 | # Evaluate surface at coordinates.
53 | evaluate(m1, m2) = logjoint(model, (m=m2, s²=invlink.(Ref(InverseGamma(2, 3)), m1)))
54 |
55 | function plot_sampler(chain; label="")
56 | # Extract values from chain.
57 | val = get(chain, [:s², :m, :lp])
58 | ss = link.(Ref(InverseGamma(2, 3)), val.s²)
59 | ms = val.m
60 | lps = val.lp
61 |
62 | # How many surface points to sample.
63 | granularity = 100
64 |
65 | # Range start/stop points.
66 | spread = 0.5
67 | σ_start = minimum(ss) - spread * std(ss)
68 | σ_stop = maximum(ss) + spread * std(ss)
69 | μ_start = minimum(ms) - spread * std(ms)
70 | μ_stop = maximum(ms) + spread * std(ms)
71 | σ_rng = collect(range(σ_start; stop=σ_stop, length=granularity))
72 | μ_rng = collect(range(μ_start; stop=μ_stop, length=granularity))
73 |
74 | # Make surface plot.
75 | p = surface(
76 | σ_rng,
77 | μ_rng,
78 | evaluate;
79 | camera=(30, 65),
80 | # ticks=nothing,
81 | colorbar=false,
82 | color=:inferno,
83 | title=label,
84 | )
85 |
86 | line_range = 1:length(ms)
87 |
88 | scatter3d!(
89 | ss[line_range],
90 | ms[line_range],
91 | lps[line_range];
92 | mc=:viridis,
93 | marker_z=collect(line_range),
94 | msw=0,
95 | legend=false,
96 | colorbar=false,
97 | alpha=0.5,
98 | xlabel="σ",
99 | ylabel="μ",
100 | zlabel="Log probability",
101 | title=label,
102 | )
103 |
104 | return p
105 | end;
106 | ```
107 |
108 | ```{julia}
109 | #| output: false
110 | setprogress!(false)
111 | ```
112 |
113 | ## Samplers
114 |
115 | ### Gibbs
116 |
117 | Gibbs sampling tends to exhibit a "jittery" trajectory. The example below combines `HMC` and `PG` sampling to traverse the posterior.
118 |
119 | ```{julia}
120 | c = sample(model, Gibbs(:s² => HMC(0.01, 5), :m => PG(20)), 1000)
121 | plot_sampler(c)
122 | ```
123 |
124 | ### HMC
125 |
126 | Hamiltonian Monte Carlo (HMC) sampling is a typical sampler to use, as it tends to be fairly good at converging in an efficient manner. However, it can often be tricky to set the correct parameters for this sampler, and the `NUTS` sampler is often easier to run if you don't want to spend too much time fiddling with step size and the number of steps to take. Note, however, that `HMC` does not explore the positive values of μ very well, likely due to the leapfrog and step size parameter settings.
127 |
128 | ```{julia}
129 | c = sample(model, HMC(0.01, 10), 1000)
130 | plot_sampler(c)
131 | ```
132 |
133 | ### HMCDA
134 |
135 | The HMCDA sampler is an implementation of the Hamiltonian Monte Carlo with Dual Averaging algorithm found in the paper "The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo" by Hoffman and Gelman (2011). The paper can be found on [arXiv](https://arxiv.org/abs/1111.4246) for the interested reader.
136 |
137 | ```{julia}
138 | c = sample(model, HMCDA(200, 0.65, 0.3), 1000)
139 | plot_sampler(c)
140 | ```
141 |
142 | ### MH
143 |
144 | Metropolis-Hastings (MH) sampling is one of the earliest Markov Chain Monte Carlo methods. MH sampling does not "move" a lot, unlike many of the other samplers implemented in Turing. Typically a much longer chain is required to converge to an appropriate parameter estimate.
145 |
146 | The plot below only uses 1,000 iterations of Metropolis-Hastings.
147 |
148 | ```{julia}
149 | c = sample(model, MH(), 1000)
150 | plot_sampler(c)
151 | ```
152 |
153 | As you can see, the MH sampler doesn't move parameter estimates very often.
154 |
155 | ### NUTS
156 |
157 | The No U-Turn Sampler (NUTS) is an implementation of the algorithm found in the paper "The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo" by Hoffman and Gelman (2011). The paper can be found on [arXiv](https://arxiv.org/abs/1111.4246) for the interested reader.
158 |
159 | NUTS tends to be very good at traversing complex posteriors quickly.
160 |
161 |
162 | ```{julia}
163 | c = sample(model, NUTS(0.65), 1000)
164 | plot_sampler(c)
165 | ```
166 |
167 | The only parameter that needs to be set other than the number of iterations to run is the target acceptance rate. In the Hoffman and Gelman paper, they note that a target acceptance rate of 0.65 is typical.
168 |
169 | Here is a plot showing a very high acceptance rate. Note that it appears to "stick" to a mode and is not particularly good at exploring the posterior as compared to the 0.65 target acceptance ratio case.
170 |
171 | ```{julia}
172 | c = sample(model, NUTS(0.95), 1000)
173 | plot_sampler(c)
174 | ```
175 |
176 | An exceptionally low acceptance rate will show very few moves on the posterior:
177 |
178 | ```{julia}
179 | c = sample(model, NUTS(0.2), 1000)
180 | plot_sampler(c)
181 | ```
182 |
183 | ### PG
184 |
185 | The Particle Gibbs (PG) sampler is an implementation of an algorithm from the paper "Particle Markov chain Monte Carlo methods" by Andrieu, Doucet, and Holenstein (2010). The interested reader can learn more [here](https://rss.onlinelibrary.wiley.com/doi/full/10.1111/j.1467-9868.2009.00736.x).
186 |
187 | The two parameters are the number of particles, and the number of iterations. The plot below shows the use of 20 particles.
188 |
189 | ```{julia}
190 | c = sample(model, PG(20), 1000)
191 | plot_sampler(c)
192 | ```
193 |
194 | Next, we plot using 50 particles.
195 |
196 | ```{julia}
197 | c = sample(model, PG(50), 1000)
198 | plot_sampler(c)
199 | ```
200 |
--------------------------------------------------------------------------------
/usage/tracking-extra-quantities/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: Tracking Extra Quantities
3 | engine: julia
4 | aliases:
5 | - ../../tutorials/usage-generated-quantities/index.html
6 | - ../generated-quantities/index.html
7 | ---
8 |
9 | ```{julia}
10 | #| echo: false
11 | #| output: false
12 | using Pkg;
13 | Pkg.instantiate();
14 | ```
15 |
16 | Often, there are quantities in a model whose values we are interested in viewing, but which are not random variables that are explicitly drawn from a distribution.
17 |
18 | As a motivating example, the most natural parameterization for a model might not be the most computationally feasible.
19 | Consider the following (efficiently reparametrized) implementation of Neal's funnel [(Neal, 2003)](https://arxiv.org/abs/physics/0009028):
20 |
21 | ```{julia}
22 | using Turing
23 | setprogress!(false)
24 |
25 | @model function Neal()
26 | # Raw draws
27 | y_raw ~ Normal(0, 1)
28 | x_raw ~ arraydist([Normal(0, 1) for i in 1:9])
29 |
30 | # Transform:
31 | y = 3 * y_raw
32 | x = exp.(y ./ 2) .* x_raw
33 | return nothing
34 | end
35 | ```
36 |
37 | In this case, the random variables exposed in the chain (`x_raw`, `y_raw`) are not in a helpful form — what we're after are the deterministically transformed variables `x` and `y`.
38 |
39 | There are two ways to track these extra quantities in Turing.jl.
40 |
41 | ## Using `:=` (during inference)
42 |
43 | The first way is to use the `:=` operator, which behaves exactly like `=` except that the values of the variables on its left-hand side are automatically added to the chain returned by the sampler.
44 | For example:
45 |
46 | ```{julia}
47 | @model function Neal_coloneq()
48 | # Raw draws
49 | y_raw ~ Normal(0, 1)
50 | x_raw ~ arraydist([Normal(0, 1) for i in 1:9])
51 |
52 | # Transform:
53 | y := 3 * y_raw
54 | x := exp.(y ./ 2) .* x_raw
55 | end
56 |
57 | sample(Neal_coloneq(), NUTS(), 1000)
58 | ```
59 |
60 | ## Using `returned` (post-inference)
61 |
62 | Alternatively, one can specify the extra quantities as part of the model function's return statement:
63 |
64 | ```{julia}
65 | @model function Neal_return()
66 | # Raw draws
67 | y_raw ~ Normal(0, 1)
68 | x_raw ~ arraydist([Normal(0, 1) for i in 1:9])
69 |
70 | # Transform and return as a NamedTuple
71 | y = 3 * y_raw
72 | x = exp.(y ./ 2) .* x_raw
73 | return (x=x, y=y)
74 | end
75 |
76 | chain = sample(Neal_return(), NUTS(), 1000)
77 | ```
78 |
79 | The sampled chain does not contain `x` and `y`, but we can extract the values using the `returned` function.
80 | Calling this function outputs an array:
81 |
82 | ```{julia}
83 | nts = returned(Neal_return(), chain)
84 | ```
85 |
86 | Each element of this array is a NamedTuple, as specified in the return statement of the model.
87 |
88 | ```{julia}
89 | nts[1]
90 | ```
91 |
92 | ## Which to use?
93 |
94 | There are some pros and cons of using `returned`, as opposed to `:=`.
95 |
96 | Firstly, `returned` is more flexible, as it allows you to track any type of object; `:=` only works with variables that can be inserted into an `MCMCChains.Chains` object.
97 | (Notice that `x` is a vector; in the first case, where we used `:=`, reconstructing its vector value can also be rather annoying because the chain stores each individual element of `x` separately.)
98 |
99 | A drawback is that naively using `returned` can lead to unnecessary computation during inference.
100 | This is because during the sampling process, the return values are also calculated (since they are part of the model function), but then thrown away.
101 | So, if the extra quantities are expensive to compute, this can be a problem.
102 |
103 | To avoid this, you will essentially have to create two different models, one for inference and one for post-inference.
104 | The simplest way of doing this is to add a parameter to the model argument:
105 |
106 | ```{julia}
107 | @model function Neal_coloneq_optional(track::Bool)
108 | # Raw draws
109 | y_raw ~ Normal(0, 1)
110 | x_raw ~ arraydist([Normal(0, 1) for i in 1:9])
111 |
112 | if track
113 | y = 3 * y_raw
114 | x = exp.(y ./ 2) .* x_raw
115 | return (x=x, y=y)
116 | else
117 | return nothing
118 | end
119 | end
120 |
121 | chain = sample(Neal_coloneq_optional(false), NUTS(), 1000)
122 | ```
123 |
124 | The above ensures that `x` and `y` are not calculated during inference, but allows us to still use `returned` to extract them:
125 |
126 | ```{julia}
127 | returned(Neal_coloneq_optional(true), chain)
128 | ```
129 |
130 | Another equivalent option is to use a submodel:
131 |
132 | ```{julia}
133 | @model function Neal()
134 | y_raw ~ Normal(0, 1)
135 | x_raw ~ arraydist([Normal(0, 1) for i in 1:9])
136 | return (x_raw=x_raw, y_raw=y_raw)
137 | end
138 |
139 | chain = sample(Neal(), NUTS(), 1000)
140 |
141 | @model function Neal_with_extras()
142 | neal ~ to_submodel(Neal(), false)
143 | y = 3 * neal.y_raw
144 | x = exp.(y ./ 2) .* neal.x_raw
145 | return (x=x, y=y)
146 | end
147 |
148 | returned(Neal_with_extras(), chain)
149 | ```
150 |
151 | Note that for the `returned` call to work, the `Neal_with_extras()` model must have the same variable names as stored in `chain`.
152 | This means the submodel `Neal()` must not be prefixed, i.e. `to_submodel()` must be passed a second parameter `false`.
153 |
--------------------------------------------------------------------------------
/usage/troubleshooting/index.qmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: Troubleshooting
3 | engine: julia
4 | ---
5 |
6 | ```{julia}
7 | #| echo: false
8 | #| output: false
9 | using Pkg;
10 | Pkg.instantiate();
11 | ```
12 |
13 | This page collects a number of common error messages observed when using Turing, along with suggestions on how to fix them.
14 |
15 | If the suggestions here do not resolve your problem, please do feel free to [open an issue](https://github.com/TuringLang/Turing.jl/issues).
16 |
17 | ```{julia}
18 | using Turing
19 | Turing.setprogress!(false)
20 | ```
21 |
22 | ## Initial parameters
23 |
24 | > failed to find valid initial parameters in {N} tries. This may indicate an error with the model or AD backend...
25 |
26 | This error is seen when a Hamiltonian Monte Carlo sampler is unable to determine a valid set of initial parameters for the sampling.
27 | Here, 'valid' means that the log probability density of the model, as well as its gradient with respect to each parameter, is finite and not `NaN`.
28 |
29 | ### `NaN` gradient
30 |
31 | One of the most common causes of this error is having a `NaN` gradient.
32 | To find out whether this is happening, you can evaluate the gradient manually.
33 | Here is an example with a model that is known to be problematic:
34 |
35 | ```{julia}
36 | using Turing
37 | using DynamicPPL.TestUtils.AD: run_ad
38 |
39 | @model function initial_bad()
40 | a ~ Normal()
41 | x ~ truncated(Normal(a), 0, Inf)
42 | end
43 |
44 | model = initial_bad()
45 | adtype = AutoForwardDiff()
46 | result = run_ad(model, adtype; test=false, benchmark=false)
47 | result.grad_actual
48 | ```
49 |
50 | (See [the DynamicPPL docs](https://turinglang.org/DynamicPPL.jl/stable/api/#AD-testing-and-benchmarking-utilities) for more details on the `run_ad` function and its return type.)
51 |
52 | In this case, the `NaN` gradient is caused by the `Inf` argument to `truncated`.
53 | (See, e.g., [this issue on Distributions.jl](https://github.com/JuliaStats/Distributions.jl/issues/1910).)
54 | Here, the upper bound of `Inf` is not needed, so it can be removed:
55 |
56 | ```{julia}
57 | @model function initial_good()
58 | a ~ Normal()
59 | x ~ truncated(Normal(a); lower=0)
60 | end
61 |
62 | model = initial_good()
63 | adtype = AutoForwardDiff()
64 | run_ad(model, adtype; test=false, benchmark=false).grad_actual
65 | ```
66 |
67 | More generally, you could try using a different AD backend; if you don't know why a model is returning `NaN` gradients, feel free to open an issue.
68 |
69 | ### `-Inf` log density
70 |
71 | Another cause of this error is having models with very extreme parameters.
72 | This example is taken from [this Turing.jl issue](https://github.com/TuringLang/Turing.jl/issues/2476):
73 |
74 | ```{julia}
75 | @model function initial_bad2()
76 | x ~ Exponential(100)
77 | y ~ Uniform(0, x)
78 | end
79 | model = initial_bad2() | (y = 50.0,)
80 | ```
81 |
82 | The problem here is that HMC attempts to find initial values for parameters inside the region of `[-2, 2]`, _after_ the parameters have been transformed to unconstrained space.
83 | For a distribution of `Exponential(100)`, the appropriate transformation is `log(x)` (see the [variable transformation docs]({{< meta dev-transforms-distributions >}}) for more info).
84 |
85 | Thus, HMC attempts to find initial values of `log(x)` in the region of `[-2, 2]`, which corresponds to `x` in the region of `[exp(-2), exp(2)]` = `[0.135, 7.39]`.
86 | However, all of these values of `x` will give rise to a zero probability density for `y` because the value of `y = 50.0` is outside the support of `Uniform(0, x)`.
87 | Thus, the log density of the model is `-Inf`, as can be seen with `logjoint`:
88 |
89 | ```{julia}
90 | logjoint(model, (x = exp(-2),))
91 | ```
92 |
93 | ```{julia}
94 | logjoint(model, (x = exp(2),))
95 | ```
96 |
97 | The most direct way of fixing this is to manually provide a set of initial parameters that are valid.
98 | For example, you can obtain a set of initial parameters with `rand(Vector, model)`, and then pass this as the `initial_params` keyword argument to `sample`:
99 |
100 | ```{julia}
101 | sample(model, NUTS(), 1000; initial_params=rand(Vector, model))
102 | ```
103 |
104 | More generally, you may also consider reparameterising the model to avoid such issues.
105 |
--------------------------------------------------------------------------------