├── .github └── workflows │ ├── README.md │ ├── pr-close-signal.yaml │ ├── pr-comment.yaml │ ├── pr-post-remove-branch.yaml │ ├── pr-preflight.yaml │ ├── pr-receive.yaml │ ├── sandpaper-main.yaml │ ├── sandpaper-version.txt │ ├── update-cache.yaml │ └── update-workflows.yaml ├── .gitignore ├── .travis.yml ├── AUTHORS ├── CITATION ├── CITATION.cff ├── CODEOWNERS ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── Gemfile ├── LICENSE.md ├── Makefile ├── README.md ├── config.yaml ├── episodes ├── .gitkeep ├── 01-intro.md ├── 02-variables.md ├── 03-ranges-arrays.md ├── 04-conditionals.md ├── 05-loops.md ├── 06-procedures.md ├── 07-commandargs.md ├── 08-timing.md ├── 11-parallel-intro.md ├── 12-fire-forget-tasks.md ├── 13-synchronization.md ├── 14-parallel-case-study.md ├── 21-locales.md └── 22-domains.md ├── hpc-chapel.Rproj ├── index.md ├── instructors └── instructor-notes.md ├── learners ├── reference.md └── setup.md ├── links.md ├── profiles └── learner-profiles.md └── site └── README.md /.github/workflows/README.md: -------------------------------------------------------------------------------- 1 | # Carpentries Workflows 2 | 3 | This directory contains workflows to be used for Lessons using the {sandpaper} 4 | lesson infrastructure. Two of these workflows require R (`sandpaper-main.yaml` 5 | and `pr-receive.yaml`) and the rest are bots to handle pull request management. 6 | 7 | These workflows will likely change as {sandpaper} evolves, so it is important to 8 | keep them up-to-date. To do this in your lesson you can do the following in your 9 | R console: 10 | 11 | ```r 12 | # Install/Update sandpaper 13 | options(repos = c(carpentries = "https://carpentries.r-universe.dev/", 14 | CRAN = "https://cloud.r-project.org")) 15 | install.packages("sandpaper") 16 | 17 | # update the workflows in your lesson 18 | library("sandpaper") 19 | update_github_workflows() 20 | ``` 21 | 22 | Inside this folder, you will find a file called `sandpaper-version.txt`, which 23 | will contain a version number for sandpaper. This will be used in the future to 24 | alert you if a workflow update is needed. 25 | 26 | What follows are the descriptions of the workflow files: 27 | 28 | ## Deployment 29 | 30 | ### 01 Build and Deploy (sandpaper-main.yaml) 31 | 32 | This is the main driver that will only act on the main branch of the repository. 33 | This workflow does the following: 34 | 35 | 1. checks out the lesson 36 | 2. provisions the following resources 37 | - R 38 | - pandoc 39 | - lesson infrastructure (stored in a cache) 40 | - lesson dependencies if needed (stored in a cache) 41 | 3. builds the lesson via `sandpaper:::ci_deploy()` 42 | 43 | #### Caching 44 | 45 | This workflow has two caches; one cache is for the lesson infrastructure and 46 | the other is for the lesson dependencies if the lesson contains rendered 47 | content. These caches are invalidated by new versions of the infrastructure and 48 | the `renv.lock` file, respectively. If there is a problem with the cache, 49 | manual invalidation is necessary. You will need maintainer access to the repository, 50 | and you can either go to the actions tab and [click on the caches button to find 51 | and invalidate the failing cache](https://github.blog/changelog/2022-10-20-manage-caches-in-your-actions-workflows-from-web-interface/) 52 | or set the `CACHE_VERSION` secret to the current date (which will 53 | invalidate all of the caches).
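For example, if you have the [GitHub CLI](https://cli.github.com/) installed and authenticated, the secret can be set from the command line. This is only a sketch; the repository slug is a placeholder you would replace with your own:

```bash
# Invalidate every cache by bumping the CACHE_VERSION secret to today's date.
# Replace <owner>/<lesson-repo> with your repository.
gh secret set CACHE_VERSION --repo <owner>/<lesson-repo> --body "$(date +%F)"
```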
54 | 55 | ## Updates 56 | 57 | ### Setup Information 58 | 59 | These workflows run on a schedule and at the maintainer's request. Because they 60 | create pull requests that update workflows/require the downstream actions to run, 61 | they need a special repository/organization secret token called 62 | `SANDPAPER_WORKFLOW` and it must have the `public_repo` and `workflow` scope. 63 | 64 | This can be an individual user token, OR it can be a trusted bot account. If you 65 | have a repository in one of the official Carpentries accounts, then you do not 66 | need to worry about this token being present because the Carpentries Core Team 67 | will take care of supplying this token. 68 | 69 | If you want to use your personal account: you can go to 70 | 71 | to create a token. Once you have created your token, you should copy it to your 72 | clipboard and then go to your repository's settings > secrets > actions and 73 | create or edit the `SANDPAPER_WORKFLOW` secret, pasting in the generated token. 74 | 75 | If you do not specify your token correctly, the runs will not fail and they will 76 | give you instructions to provide the token for your repository. 77 | 78 | ### 02 Maintain: Update Workflow Files (update-workflows.yaml) 79 | 80 | The {sandpaper} repository was designed to do as much as possible to separate 81 | the tools from the content. For local builds, this is absolutely true, but 82 | there is a minor issue when it comes to workflow files: they must live inside 83 | the repository. 84 | 85 | This workflow ensures that the workflow files are up-to-date. The way it works is 86 | to download the update-workflows.sh script from GitHub and run it. The script 87 | will do the following: 88 | 89 | 1. check the recorded version of sandpaper against the current version on GitHub 90 | 2. update the files if there is a difference in versions 91 | 92 | After the files are updated, if there are any changes, they are pushed to a 93 | branch called `update/workflows` and a pull request is created. Maintainers are 94 | encouraged to review the changes and accept the pull request if the outputs 95 | are okay. 96 | 97 | This update is run weekly or on demand. 98 | 99 | ### 03 Maintain: Update Package Cache (update-cache.yaml) 100 | 101 | For lessons that have generated content, we use {renv} to ensure that the output 102 | is stable. This is controlled by a single lockfile which documents the packages 103 | needed for the lesson and the version numbers. This workflow is skipped in 104 | lessons that do not have generated content. 105 | 106 | Because the lessons need to remain current with the package ecosystem, it's a 107 | good idea to make sure these packages can be updated periodically. The 108 | update cache workflow will do this by checking for updates, applying them in a 109 | branch called `update/packages` and creating a pull request with _only the 110 | lockfile changed_. 111 | 112 | From here, the markdown documents will be rebuilt and you can inspect what has 113 | changed based on how the packages have updated. 114 | 115 | ## Pull Request and Review Management 116 | 117 | Because our lessons execute code, pull requests are a security risk for any 118 | lesson and thus have security measures associated with them.
**Do not merge any 119 | pull requests that do not pass checks and do not have bots commented on them.** 120 | 121 | These workflows all go together and are described in the following 122 | diagram and the sections below: 123 | 124 | ![Graph representation of a pull request](https://carpentries.github.io/sandpaper/articles/img/pr-flow.dot.svg) 125 | 126 | ### Pre Flight Pull Request Validation (pr-preflight.yaml) 127 | 128 | This workflow runs every time a pull request is created and its purpose is to 129 | validate that the pull request is okay to run. This means the following things: 130 | 131 | 1. The pull request does not contain modified workflow files 132 | 2. If the pull request contains modified workflow files, it does not contain 133 | modified content files (such as a situation where @carpentries-bot will 134 | make an automated pull request) 135 | 3. The pull request does not contain an invalid commit hash (e.g. from a fork 136 | that was made before a lesson was transitioned from styles to use the 137 | workbench). 138 | 139 | Once the checks are finished, a comment is issued to the pull request, which 140 | will allow maintainers to determine if it is safe to run the 141 | "Receive Pull Request" workflow from new contributors. 142 | 143 | ### Receive Pull Request (pr-receive.yaml) 144 | 145 | **Note of caution:** This workflow runs arbitrary code by anyone who creates a 146 | pull request. GitHub has safeguarded the token used in this workflow to have no 147 | privileges in the repository, but we have taken precautions to protect against 148 | spoofing. 149 | 150 | This workflow is triggered with every push to a pull request. If this workflow 151 | is already running and a new push is sent to the pull request, the workflow 152 | running from the previous push will be cancelled and a new workflow run will be 153 | started. 154 | 155 | The first step of this workflow is to check if it is valid (e.g. that no 156 | workflow files have been modified). If there are workflow files that have been 157 | modified, a comment is made that indicates that the workflow is not run. If 158 | both a workflow file and lesson content are modified, an error will occur. 159 | 160 | The second step (if valid) is to build the generated content from the pull 161 | request. This builds the content and uploads three artifacts: 162 | 163 | 1. The pull request number (pr) 164 | 2. A summary of changes after the rendering process (diff) 165 | 3. The rendered files (build) 166 | 167 | Because this workflow builds generated content, it follows the same general 168 | process as the `sandpaper-main` workflow with the same caching mechanisms. 169 | 170 | The artifacts produced are used by the next workflow. 171 | 172 | ### Comment on Pull Request (pr-comment.yaml) 173 | 174 | This workflow is triggered if the `pr-receive.yaml` workflow is successful. 175 | The steps in this workflow are: 176 | 177 | 1. Test if the workflow is valid and comment the validity of the workflow to the 178 | pull request. 179 | 2. If it is valid: create an orphan branch with two commits: the current state 180 | of the repository and the proposed changes. 181 | 3. If it is valid: update the pull request comment with the summary of changes. 182 | 183 | Importantly: if the pull request is invalid, the branch is not created so any 184 | malicious code is not published. 185 | 186 | From here, the maintainer can request changes from the author and eventually 187 | either merge or reject the PR.
When this happens, if the PR was valid, the 188 | preview branch needs to be deleted. 189 | 190 | ### Send Close PR Signal (pr-close-signal.yaml) 191 | 192 | Triggered any time a pull request is closed. This emits an artifact that is the 193 | pull request number for the next action. 194 | 195 | ### Remove Pull Request Branch (pr-post-remove-branch.yaml) 196 | 197 | Triggered by `pr-close-signal.yaml`. This removes the temporary branch associated with 198 | the pull request (if it was created). 199 | -------------------------------------------------------------------------------- /.github/workflows/pr-close-signal.yaml: -------------------------------------------------------------------------------- 1 | name: "Bot: Send Close Pull Request Signal" 2 | 3 | on: 4 | pull_request: 5 | types: 6 | [closed] 7 | 8 | jobs: 9 | send-close-signal: 10 | name: "Send closing signal" 11 | runs-on: ubuntu-latest 12 | if: ${{ github.event.action == 'closed' }} 13 | steps: 14 | - name: "Create PRtifact" 15 | run: | 16 | mkdir -p ./pr 17 | printf ${{ github.event.number }} > ./pr/NUM 18 | - name: Upload Diff 19 | uses: actions/upload-artifact@v4 20 | with: 21 | name: pr 22 | path: ./pr 23 | 24 | -------------------------------------------------------------------------------- /.github/workflows/pr-comment.yaml: -------------------------------------------------------------------------------- 1 | name: "Bot: Comment on the Pull Request" 2 | 3 | # read-write repo token 4 | # access to secrets 5 | on: 6 | workflow_run: 7 | workflows: ["Receive Pull Request"] 8 | types: 9 | - completed 10 | 11 | concurrency: 12 | group: pr-${{ github.event.workflow_run.pull_requests[0].number }} 13 | cancel-in-progress: true 14 | 15 | 16 | jobs: 17 | # Pull requests are valid if: 18 | # - they match the sha of the workflow run head commit 19 | # - they are open 20 | # - no .github files were committed 21 | test-pr: 22 | name: "Test if pull request is valid" 23 | runs-on: ubuntu-latest 24 | if: > 25 | github.event.workflow_run.event == 'pull_request' && 26 | github.event.workflow_run.conclusion == 'success' 27 | outputs: 28 | is_valid: ${{ steps.check-pr.outputs.VALID }} 29 | payload: ${{ steps.check-pr.outputs.payload }} 30 | number: ${{ steps.get-pr.outputs.NUM }} 31 | msg: ${{ steps.check-pr.outputs.MSG }} 32 | steps: 33 | - name: 'Download PR artifact' 34 | id: dl 35 | uses: carpentries/actions/download-workflow-artifact@main 36 | with: 37 | run: ${{ github.event.workflow_run.id }} 38 | name: 'pr' 39 | 40 | - name: "Get PR Number" 41 | if: ${{ steps.dl.outputs.success == 'true' }} 42 | id: get-pr 43 | run: | 44 | unzip pr.zip 45 | echo "NUM=$(<./NR)" >> $GITHUB_OUTPUT 46 | 47 | - name: "Fail if PR number was not present" 48 | id: bad-pr 49 | if: ${{ steps.dl.outputs.success != 'true' }} 50 | run: | 51 | echo '::error::A pull request number was not recorded. The pull request that triggered this workflow is likely malicious.'
52 | exit 1 53 | - name: "Get Invalid Hashes File" 54 | id: hash 55 | run: | 56 | echo "json<> $GITHUB_OUTPUT 59 | - name: "Check PR" 60 | id: check-pr 61 | if: ${{ steps.dl.outputs.success == 'true' }} 62 | uses: carpentries/actions/check-valid-pr@main 63 | with: 64 | pr: ${{ steps.get-pr.outputs.NUM }} 65 | sha: ${{ github.event.workflow_run.head_sha }} 66 | headroom: 3 # if it's within the last three commits, we can keep going, because it's likely rapid-fire 67 | invalid: ${{ fromJSON(steps.hash.outputs.json)[github.repository] }} 68 | fail_on_error: true 69 | 70 | # Create an orphan branch on this repository with two commits 71 | # - the current HEAD of the md-outputs branch 72 | # - the output from running the current HEAD of the pull request through 73 | # the md generator 74 | create-branch: 75 | name: "Create Git Branch" 76 | needs: test-pr 77 | runs-on: ubuntu-latest 78 | if: ${{ needs.test-pr.outputs.is_valid == 'true' }} 79 | env: 80 | NR: ${{ needs.test-pr.outputs.number }} 81 | permissions: 82 | contents: write 83 | steps: 84 | - name: 'Checkout md outputs' 85 | uses: actions/checkout@v4 86 | with: 87 | ref: md-outputs 88 | path: built 89 | fetch-depth: 1 90 | 91 | - name: 'Download built markdown' 92 | id: dl 93 | uses: carpentries/actions/download-workflow-artifact@main 94 | with: 95 | run: ${{ github.event.workflow_run.id }} 96 | name: 'built' 97 | 98 | - if: ${{ steps.dl.outputs.success == 'true' }} 99 | run: unzip built.zip 100 | 101 | - name: "Create orphan and push" 102 | if: ${{ steps.dl.outputs.success == 'true' }} 103 | run: | 104 | cd built/ 105 | git config --local user.email "actions@github.com" 106 | git config --local user.name "GitHub Actions" 107 | CURR_HEAD=$(git rev-parse HEAD) 108 | git checkout --orphan md-outputs-PR-${NR} 109 | git add -A 110 | git commit -m "source commit: ${CURR_HEAD}" 111 | ls -A | grep -v '^.git$' | xargs -I _ rm -r '_' 112 | cd .. 
113 | unzip -o -d built built.zip 114 | cd built 115 | git add -A 116 | git commit --allow-empty -m "differences for PR #${NR}" 117 | git push -u --force --set-upstream origin md-outputs-PR-${NR} 118 | 119 | # Comment on the Pull Request with a link to the branch and the diff 120 | comment-pr: 121 | name: "Comment on Pull Request" 122 | needs: [test-pr, create-branch] 123 | runs-on: ubuntu-latest 124 | if: ${{ needs.test-pr.outputs.is_valid == 'true' }} 125 | env: 126 | NR: ${{ needs.test-pr.outputs.number }} 127 | permissions: 128 | pull-requests: write 129 | steps: 130 | - name: 'Download comment artifact' 131 | id: dl 132 | uses: carpentries/actions/download-workflow-artifact@main 133 | with: 134 | run: ${{ github.event.workflow_run.id }} 135 | name: 'diff' 136 | 137 | - if: ${{ steps.dl.outputs.success == 'true' }} 138 | run: unzip ${{ github.workspace }}/diff.zip 139 | 140 | - name: "Comment on PR" 141 | id: comment-diff 142 | if: ${{ steps.dl.outputs.success == 'true' }} 143 | uses: carpentries/actions/comment-diff@main 144 | with: 145 | pr: ${{ env.NR }} 146 | path: ${{ github.workspace }}/diff.md 147 | 148 | # Comment if the PR is open and matches the SHA, but the workflow files have 149 | # changed 150 | comment-changed-workflow: 151 | name: "Comment if workflow files have changed" 152 | needs: test-pr 153 | runs-on: ubuntu-latest 154 | if: ${{ always() && needs.test-pr.outputs.is_valid == 'false' }} 155 | env: 156 | NR: ${{ github.event.workflow_run.pull_requests[0].number }} 157 | body: ${{ needs.test-pr.outputs.msg }} 158 | permissions: 159 | pull-requests: write 160 | steps: 161 | - name: 'Check for spoofing' 162 | id: dl 163 | uses: carpentries/actions/download-workflow-artifact@main 164 | with: 165 | run: ${{ github.event.workflow_run.id }} 166 | name: 'built' 167 | 168 | - name: 'Alert if spoofed' 169 | id: spoof 170 | if: ${{ steps.dl.outputs.success == 'true' }} 171 | run: | 172 | echo 'body<> $GITHUB_ENV 173 | echo '' >> $GITHUB_ENV 174 | echo '## :x: DANGER :x:' >> $GITHUB_ENV 175 | echo 'This pull request has modified workflows that created output. Close this now.' 
>> $GITHUB_ENV 176 | echo '' >> $GITHUB_ENV 177 | echo 'EOF' >> $GITHUB_ENV 178 | 179 | - name: "Comment on PR" 180 | id: comment-diff 181 | uses: carpentries/actions/comment-diff@main 182 | with: 183 | pr: ${{ env.NR }} 184 | body: ${{ env.body }} 185 | 186 | -------------------------------------------------------------------------------- /.github/workflows/pr-post-remove-branch.yaml: -------------------------------------------------------------------------------- 1 | name: "Bot: Remove Temporary PR Branch" 2 | 3 | on: 4 | workflow_run: 5 | workflows: ["Bot: Send Close Pull Request Signal"] 6 | types: 7 | - completed 8 | 9 | jobs: 10 | delete: 11 | name: "Delete branch from Pull Request" 12 | runs-on: ubuntu-latest 13 | if: > 14 | github.event.workflow_run.event == 'pull_request' && 15 | github.event.workflow_run.conclusion == 'success' 16 | permissions: 17 | contents: write 18 | steps: 19 | - name: 'Download artifact' 20 | uses: carpentries/actions/download-workflow-artifact@main 21 | with: 22 | run: ${{ github.event.workflow_run.id }} 23 | name: pr 24 | - name: "Get PR Number" 25 | id: get-pr 26 | run: | 27 | unzip pr.zip 28 | echo "NUM=$(<./NUM)" >> $GITHUB_OUTPUT 29 | - name: 'Remove branch' 30 | uses: carpentries/actions/remove-branch@main 31 | with: 32 | pr: ${{ steps.get-pr.outputs.NUM }} 33 | -------------------------------------------------------------------------------- /.github/workflows/pr-preflight.yaml: -------------------------------------------------------------------------------- 1 | name: "Pull Request Preflight Check" 2 | 3 | on: 4 | pull_request_target: 5 | branches: 6 | ["main"] 7 | types: 8 | ["opened", "synchronize", "reopened"] 9 | 10 | jobs: 11 | test-pr: 12 | name: "Test if pull request is valid" 13 | if: ${{ github.event.action != 'closed' }} 14 | runs-on: ubuntu-latest 15 | outputs: 16 | is_valid: ${{ steps.check-pr.outputs.VALID }} 17 | permissions: 18 | pull-requests: write 19 | steps: 20 | - name: "Get Invalid Hashes File" 21 | id: hash 22 | run: | 23 | echo "json<> $GITHUB_OUTPUT 26 | - name: "Check PR" 27 | id: check-pr 28 | uses: carpentries/actions/check-valid-pr@main 29 | with: 30 | pr: ${{ github.event.number }} 31 | invalid: ${{ fromJSON(steps.hash.outputs.json)[github.repository] }} 32 | fail_on_error: true 33 | - name: "Comment result of validation" 34 | id: comment-diff 35 | if: ${{ always() }} 36 | uses: carpentries/actions/comment-diff@main 37 | with: 38 | pr: ${{ github.event.number }} 39 | body: ${{ steps.check-pr.outputs.MSG }} 40 | -------------------------------------------------------------------------------- /.github/workflows/pr-receive.yaml: -------------------------------------------------------------------------------- 1 | name: "Receive Pull Request" 2 | 3 | on: 4 | pull_request: 5 | types: 6 | [opened, synchronize, reopened] 7 | 8 | concurrency: 9 | group: ${{ github.ref }} 10 | cancel-in-progress: true 11 | 12 | jobs: 13 | test-pr: 14 | name: "Record PR number" 15 | if: ${{ github.event.action != 'closed' }} 16 | runs-on: ubuntu-latest 17 | outputs: 18 | is_valid: ${{ steps.check-pr.outputs.VALID }} 19 | steps: 20 | - name: "Record PR number" 21 | id: record 22 | if: ${{ always() }} 23 | run: | 24 | echo ${{ github.event.number }} > ${{ github.workspace }}/NR # 2022-03-02: artifact name fixed to be NR 25 | - name: "Upload PR number" 26 | id: upload 27 | if: ${{ always() }} 28 | uses: actions/upload-artifact@v4 29 | with: 30 | name: pr 31 | path: ${{ github.workspace }}/NR 32 | - name: "Get Invalid Hashes File" 33 | id: hash 34 | 
run: | 35 | echo "json<> $GITHUB_OUTPUT 38 | - name: "echo output" 39 | run: | 40 | echo "${{ steps.hash.outputs.json }}" 41 | - name: "Check PR" 42 | id: check-pr 43 | uses: carpentries/actions/check-valid-pr@main 44 | with: 45 | pr: ${{ github.event.number }} 46 | invalid: ${{ fromJSON(steps.hash.outputs.json)[github.repository] }} 47 | 48 | build-md-source: 49 | name: "Build markdown source files if valid" 50 | needs: test-pr 51 | runs-on: ubuntu-latest 52 | if: ${{ needs.test-pr.outputs.is_valid == 'true' }} 53 | env: 54 | GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} 55 | RENV_PATHS_ROOT: ~/.local/share/renv/ 56 | CHIVE: ${{ github.workspace }}/site/chive 57 | PR: ${{ github.workspace }}/site/pr 58 | MD: ${{ github.workspace }}/site/built 59 | steps: 60 | - name: "Check Out Main Branch" 61 | uses: actions/checkout@v4 62 | 63 | - name: "Check Out Staging Branch" 64 | uses: actions/checkout@v4 65 | with: 66 | ref: md-outputs 67 | path: ${{ env.MD }} 68 | 69 | - name: "Set up R" 70 | uses: r-lib/actions/setup-r@v2 71 | with: 72 | use-public-rspm: true 73 | install-r: false 74 | 75 | - name: "Set up Pandoc" 76 | uses: r-lib/actions/setup-pandoc@v2 77 | 78 | - name: "Setup Lesson Engine" 79 | uses: carpentries/actions/setup-sandpaper@main 80 | with: 81 | cache-version: ${{ secrets.CACHE_VERSION }} 82 | 83 | - name: "Setup Package Cache" 84 | uses: carpentries/actions/setup-lesson-deps@main 85 | with: 86 | cache-version: ${{ secrets.CACHE_VERSION }} 87 | 88 | - name: "Validate and Build Markdown" 89 | id: build-site 90 | run: | 91 | sandpaper::package_cache_trigger(TRUE) 92 | sandpaper::validate_lesson(path = '${{ github.workspace }}') 93 | sandpaper:::build_markdown(path = '${{ github.workspace }}', quiet = FALSE) 94 | shell: Rscript {0} 95 | 96 | - name: "Generate Artifacts" 97 | id: generate-artifacts 98 | run: | 99 | sandpaper:::ci_bundle_pr_artifacts( 100 | repo = '${{ github.repository }}', 101 | pr_number = '${{ github.event.number }}', 102 | path_md = '${{ env.MD }}', 103 | path_pr = '${{ env.PR }}', 104 | path_archive = '${{ env.CHIVE }}', 105 | branch = 'md-outputs' 106 | ) 107 | shell: Rscript {0} 108 | 109 | - name: "Upload PR" 110 | uses: actions/upload-artifact@v4 111 | with: 112 | name: pr 113 | path: ${{ env.PR }} 114 | 115 | - name: "Upload Diff" 116 | uses: actions/upload-artifact@v4 117 | with: 118 | name: diff 119 | path: ${{ env.CHIVE }} 120 | retention-days: 1 121 | 122 | - name: "Upload Build" 123 | uses: actions/upload-artifact@v4 124 | with: 125 | name: built 126 | path: ${{ env.MD }} 127 | retention-days: 1 128 | 129 | - name: "Teardown" 130 | run: sandpaper::reset_site() 131 | shell: Rscript {0} 132 | -------------------------------------------------------------------------------- /.github/workflows/sandpaper-main.yaml: -------------------------------------------------------------------------------- 1 | name: "01 Build and Deploy Site" 2 | 3 | on: 4 | push: 5 | branches: 6 | - main 7 | - master 8 | schedule: 9 | - cron: '0 0 * * 2' 10 | workflow_dispatch: 11 | inputs: 12 | name: 13 | description: 'Who triggered this build?' 
14 | required: true 15 | default: 'Maintainer (via GitHub)' 16 | reset: 17 | description: 'Reset cached markdown files' 18 | required: false 19 | default: false 20 | type: boolean 21 | jobs: 22 | full-build: 23 | name: "Build Full Site" 24 | runs-on: ubuntu-latest 25 | permissions: 26 | checks: write 27 | contents: write 28 | pages: write 29 | env: 30 | GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} 31 | RENV_PATHS_ROOT: ~/.local/share/renv/ 32 | steps: 33 | 34 | - name: "Checkout Lesson" 35 | uses: actions/checkout@v4 36 | 37 | - name: "Set up R" 38 | uses: r-lib/actions/setup-r@v2 39 | with: 40 | use-public-rspm: true 41 | install-r: false 42 | 43 | - name: "Set up Pandoc" 44 | uses: r-lib/actions/setup-pandoc@v2 45 | 46 | - name: "Setup Lesson Engine" 47 | uses: carpentries/actions/setup-sandpaper@main 48 | with: 49 | cache-version: ${{ secrets.CACHE_VERSION }} 50 | 51 | - name: "Setup Package Cache" 52 | uses: carpentries/actions/setup-lesson-deps@main 53 | with: 54 | cache-version: ${{ secrets.CACHE_VERSION }} 55 | 56 | - name: "Deploy Site" 57 | run: | 58 | reset <- "${{ github.event.inputs.reset }}" == "true" 59 | sandpaper::package_cache_trigger(TRUE) 60 | sandpaper:::ci_deploy(reset = reset) 61 | shell: Rscript {0} 62 | -------------------------------------------------------------------------------- /.github/workflows/sandpaper-version.txt: -------------------------------------------------------------------------------- 1 | 0.16.6 2 | -------------------------------------------------------------------------------- /.github/workflows/update-cache.yaml: -------------------------------------------------------------------------------- 1 | name: "03 Maintain: Update Package Cache" 2 | 3 | on: 4 | workflow_dispatch: 5 | inputs: 6 | name: 7 | description: 'Who triggered this build (enter github username to tag yourself)?' 
8 | required: true 9 | default: 'monthly run' 10 | schedule: 11 | # Run every tuesday 12 | - cron: '0 0 * * 2' 13 | 14 | jobs: 15 | preflight: 16 | name: "Preflight Check" 17 | runs-on: ubuntu-latest 18 | outputs: 19 | ok: ${{ steps.check.outputs.ok }} 20 | steps: 21 | - id: check 22 | run: | 23 | if [[ ${{ github.event_name }} == 'workflow_dispatch' ]]; then 24 | echo "ok=true" >> $GITHUB_OUTPUT 25 | echo "Running on request" 26 | # using single brackets here to avoid 08 being interpreted as octal 27 | # https://github.com/carpentries/sandpaper/issues/250 28 | elif [ `date +%d` -le 7 ]; then 29 | # If the Tuesday lands in the first week of the month, run it 30 | echo "ok=true" >> $GITHUB_OUTPUT 31 | echo "Running on schedule" 32 | else 33 | echo "ok=false" >> $GITHUB_OUTPUT 34 | echo "Not Running Today" 35 | fi 36 | 37 | check_renv: 38 | name: "Check if We Need {renv}" 39 | runs-on: ubuntu-latest 40 | needs: preflight 41 | if: ${{ needs.preflight.outputs.ok == 'true'}} 42 | outputs: 43 | needed: ${{ steps.renv.outputs.exists }} 44 | steps: 45 | - name: "Checkout Lesson" 46 | uses: actions/checkout@v4 47 | - id: renv 48 | run: | 49 | if [[ -d renv ]]; then 50 | echo "exists=true" >> $GITHUB_OUTPUT 51 | fi 52 | 53 | check_token: 54 | name: "Check SANDPAPER_WORKFLOW token" 55 | runs-on: ubuntu-latest 56 | needs: check_renv 57 | if: ${{ needs.check_renv.outputs.needed == 'true' }} 58 | outputs: 59 | workflow: ${{ steps.validate.outputs.wf }} 60 | repo: ${{ steps.validate.outputs.repo }} 61 | steps: 62 | - name: "validate token" 63 | id: validate 64 | uses: carpentries/actions/check-valid-credentials@main 65 | with: 66 | token: ${{ secrets.SANDPAPER_WORKFLOW }} 67 | 68 | update_cache: 69 | name: "Update Package Cache" 70 | needs: check_token 71 | if: ${{ needs.check_token.outputs.repo== 'true' }} 72 | runs-on: ubuntu-latest 73 | env: 74 | GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} 75 | RENV_PATHS_ROOT: ~/.local/share/renv/ 76 | steps: 77 | 78 | - name: "Checkout Lesson" 79 | uses: actions/checkout@v4 80 | 81 | - name: "Set up R" 82 | uses: r-lib/actions/setup-r@v2 83 | with: 84 | use-public-rspm: true 85 | install-r: false 86 | 87 | - name: "Update {renv} deps and determine if a PR is needed" 88 | id: update 89 | uses: carpentries/actions/update-lockfile@main 90 | with: 91 | cache-version: ${{ secrets.CACHE_VERSION }} 92 | 93 | - name: Create Pull Request 94 | id: cpr 95 | if: ${{ steps.update.outputs.n > 0 }} 96 | uses: carpentries/create-pull-request@main 97 | with: 98 | token: ${{ secrets.SANDPAPER_WORKFLOW }} 99 | delete-branch: true 100 | branch: "update/packages" 101 | commit-message: "[actions] update ${{ steps.update.outputs.n }} packages" 102 | title: "Update ${{ steps.update.outputs.n }} packages" 103 | body: | 104 | :robot: This is an automated build 105 | 106 | This will update ${{ steps.update.outputs.n }} packages in your lesson with the following versions: 107 | 108 | ``` 109 | ${{ steps.update.outputs.report }} 110 | ``` 111 | 112 | :stopwatch: In a few minutes, a comment will appear that will show you how the output has changed based on these updates. 
113 | 114 | If you want to inspect these changes locally, you can use the following code to check out a new branch: 115 | 116 | ```bash 117 | git fetch origin update/packages 118 | git checkout update/packages 119 | ``` 120 | 121 | - Auto-generated by [create-pull-request][1] on ${{ steps.update.outputs.date }} 122 | 123 | [1]: https://github.com/carpentries/create-pull-request/tree/main 124 | labels: "type: package cache" 125 | draft: false 126 | -------------------------------------------------------------------------------- /.github/workflows/update-workflows.yaml: -------------------------------------------------------------------------------- 1 | name: "02 Maintain: Update Workflow Files" 2 | 3 | on: 4 | workflow_dispatch: 5 | inputs: 6 | name: 7 | description: 'Who triggered this build (enter github username to tag yourself)?' 8 | required: true 9 | default: 'weekly run' 10 | clean: 11 | description: 'Workflow files/file extensions to clean (no wildcards, enter "" for none)' 12 | required: false 13 | default: '.yaml' 14 | schedule: 15 | # Run every Tuesday 16 | - cron: '0 0 * * 2' 17 | 18 | jobs: 19 | check_token: 20 | name: "Check SANDPAPER_WORKFLOW token" 21 | runs-on: ubuntu-latest 22 | outputs: 23 | workflow: ${{ steps.validate.outputs.wf }} 24 | repo: ${{ steps.validate.outputs.repo }} 25 | steps: 26 | - name: "validate token" 27 | id: validate 28 | uses: carpentries/actions/check-valid-credentials@main 29 | with: 30 | token: ${{ secrets.SANDPAPER_WORKFLOW }} 31 | 32 | update_workflow: 33 | name: "Update Workflow" 34 | runs-on: ubuntu-latest 35 | needs: check_token 36 | if: ${{ needs.check_token.outputs.workflow == 'true' }} 37 | steps: 38 | - name: "Checkout Repository" 39 | uses: actions/checkout@v4 40 | 41 | - name: Update Workflows 42 | id: update 43 | uses: carpentries/actions/update-workflows@main 44 | with: 45 | clean: ${{ github.event.inputs.clean }} 46 | 47 | - name: Create Pull Request 48 | id: cpr 49 | if: "${{ steps.update.outputs.new }}" 50 | uses: carpentries/create-pull-request@main 51 | with: 52 | token: ${{ secrets.SANDPAPER_WORKFLOW }} 53 | delete-branch: true 54 | branch: "update/workflows" 55 | commit-message: "[actions] update sandpaper workflow to version ${{ steps.update.outputs.new }}" 56 | title: "Update Workflows to Version ${{ steps.update.outputs.new }}" 57 | body: | 58 | :robot: This is an automated build 59 | 60 | Update Workflows from sandpaper version ${{ steps.update.outputs.old }} -> ${{ steps.update.outputs.new }} 61 | 62 | - Auto-generated by [create-pull-request][1] on ${{ steps.update.outputs.date }} 63 | 64 | [1]: https://github.com/carpentries/create-pull-request/tree/main 65 | labels: "type: template and tools" 66 | draft: false 67 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # sandpaper files 2 | episodes/*html 3 | site/* 4 | !site/README.md 5 | 6 | # History files 7 | .Rhistory 8 | .Rapp.history 9 | 10 | # Session Data files 11 | .RData 12 | 13 | # User-specific files 14 | .Ruserdata 15 | 16 | # Example code in package build process 17 | *-Ex.R 18 | 19 | # Output files from R CMD build 20 | /*.tar.gz 21 | 22 | # Output files from R CMD check 23 | /*.Rcheck/ 24 | 25 | # RStudio files 26 | .Rproj.user/ 27 | 28 | # produced vignettes 29 | vignettes/*.html 30 | vignettes/*.pdf 31 | 32 | # OAuth2 token, see https://github.com/hadley/httr/releases/tag/v0.3 33 | .httr-oauth 34 | 35 | # knitr and R markdown 
default cache directories 36 | *_cache/ 37 | /cache/ 38 | 39 | # Temporary files created by R markdown 40 | *.utf8.md 41 | *.knit.md 42 | 43 | # R Environment Variables 44 | .Renviron 45 | 46 | # pkgdown site 47 | docs/ 48 | 49 | # translation temp files 50 | po/*~ 51 | 52 | -------------------------------------------------------------------------------- /.travis.yml: -------------------------------------------------------------------------------- 1 | # Travis-CI config for https://github.com/hpc-carpentry/hpc-chapel 2 | # Results at https://travis-ci.org/github/hpc-carpentry/hpc-chapel 3 | 4 | dist: xenial 5 | language: python 6 | python: 3.7 7 | 8 | branches: 9 | only: 10 | - gh-pages 11 | - /.*/ 12 | 13 | before_install: 14 | install: 15 | script: 16 | 17 | jobs: 18 | include: 19 | - stage: "Check for typos and spelling mistakes" 20 | before_install: # Don't need everything to build the site 21 | install: 22 | pip install codespell 23 | script: 24 | codespell --skip="assets,*.svg,bin" --quiet-level=2 -L "rouge,dropse,namd,hist" 25 | - stage: "Build the site" 26 | before_install: 27 | - sudo apt-get update -y 28 | - rvm default 29 | - gem install bundler jekyll json kramdown 30 | - bundle config build.nokogiri --use-system-libraries 31 | - bundle install 32 | install: 33 | pip install pyyaml 34 | script: 35 | - make lesson-check-all 36 | - make --always-make site 37 | 38 | -------------------------------------------------------------------------------- /AUTHORS: -------------------------------------------------------------------------------- 1 | HPC Chapel is maintained by 2 | 3 | - [Alex Razoumov](mailto:alex.razoumov@westgrid.ca) 4 | 5 | It was written and edited by 6 | 7 | - [@razoumov](https://github.com/razoumov) 8 | - [@jcarzu](https://github.com/jcarzu) 9 | -------------------------------------------------------------------------------- /CITATION: -------------------------------------------------------------------------------- 1 | To reference this lesson, please cite: 2 | 3 | Razoumov A., Zuniga J. (2024). Introduction to High-Performance Computing in Chapel. https://www.hpc-carpentry.org/hpc-chapel 4 | -------------------------------------------------------------------------------- /CITATION.cff: -------------------------------------------------------------------------------- 1 | # This template CITATION.cff file was generated with cffinit. 2 | # Visit https://bit.ly/cffinit to replace its contents 3 | # with information about your lesson. 4 | # Remember to update this file periodically, 5 | # ensuring that the author list and other fields remain accurate. 6 | 7 | cff-version: 1.2.0 8 | title: Introduction to High-Performance Computing in Chapel 9 | message: >- 10 | Please cite this lesson using the information in this file 11 | when you refer to it in publications, and/or if you 12 | re-use, adapt, or expand on the content in your own 13 | training material. 14 | type: dataset 15 | authors: 16 | - given-names: Alex 17 | family-names: Razoumov 18 | email: alex.razoumov@westdri.ca 19 | affiliation: SFU 20 | - given-names: Juan 21 | family-names: Zuniga 22 | repository-code: 'https://github.com/hpc-carpentry/hpc-chapel' 23 | url: 'https://www.hpc-carpentry.org/hpc-chapel' 24 | abstract: >- 25 | This lesson is an introduction to high-performance 26 | computing using Chapel parallel language. 
27 | keywords: 28 | - 'Chapel, HPC, parallel' 29 | license: CC-BY-4.0 30 | -------------------------------------------------------------------------------- /CODEOWNERS: -------------------------------------------------------------------------------- 1 | # This file lists the contributors responsible for the 2 | # repository content. They will also be automatically 3 | # asked to review any pull request made in this repository. 4 | 5 | # Each line is a file pattern followed by one or more owners. 6 | # The sequence matters: later patterns take precedence. 7 | 8 | # FILES OWNERS 9 | * @hpc-carpentry/hpc-chapel-maintainers 10 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Contributor Code of Conduct" 3 | --- 4 | 5 | As contributors and maintainers of this project, 6 | we pledge to follow the [The Carpentries Code of Conduct][coc]. 7 | 8 | Instances of abusive, harassing, or otherwise unacceptable behavior 9 | may be reported by following our [reporting guidelines][coc-reporting]. 10 | 11 | 12 | [coc-reporting]: https://docs.carpentries.org/topic_folders/policies/incident-reporting.html 13 | [coc]: https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html 14 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | ## Contributing 2 | 3 | [The Carpentries][cp-site] ([Software Carpentry][swc-site], [Data 4 | Carpentry][dc-site], and [Library Carpentry][lc-site]) are open source 5 | projects, and we welcome contributions of all kinds: new lessons, fixes to 6 | existing material, bug reports, and reviews of proposed changes are all 7 | welcome. 8 | 9 | ### Contributor Agreement 10 | 11 | By contributing, you agree that we may redistribute your work under [our 12 | license](LICENSE.md). In exchange, we will address your issues and/or assess 13 | your change proposal as promptly as we can, and help you become a member of our 14 | community. Everyone involved in [The Carpentries][cp-site] agrees to abide by 15 | our [code of conduct](CODE_OF_CONDUCT.md). 16 | 17 | ### How to Contribute 18 | 19 | The easiest way to get started is to file an issue to tell us about a spelling 20 | mistake, some awkward wording, or a factual error. This is a good way to 21 | introduce yourself and to meet some of our community members. 22 | 23 | 1. If you do not have a [GitHub][github] account, you can [send us comments by 24 | email][contact]. However, we will be able to respond more quickly if you use 25 | one of the other methods described below. 26 | 27 | 2. If you have a [GitHub][github] account, or are willing to [create 28 | one][github-join], but do not know how to use Git, you can report problems 29 | or suggest improvements by [creating an issue][repo-issues]. This allows us 30 | to assign the item to someone and to respond to it in a threaded discussion. 31 | 32 | 3. If you are comfortable with Git, and would like to add or change material, 33 | you can submit a pull request (PR). Instructions for doing this are 34 | [included below](#using-github). For inspiration about changes that need to 35 | be made, check out the [list of open issues][issues] across the Carpentries. 36 | 37 | Note: if you want to build the website locally, please refer to [The Workbench 38 | documentation][template-doc]. 
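As a rough sketch (assuming R, pandoc, and the workbench packages described in that documentation are already installed), a local preview can be started from the lesson root with:

```bash
# build the lesson and serve it locally with live reload
Rscript -e 'sandpaper::serve()'
```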
39 | 40 | ### Where to Contribute 41 | 42 | 1. If you wish to change this lesson, add issues and pull requests here. 43 | 2. If you wish to change the template used for workshop websites, please refer 44 | to [The Workbench documentation][template-doc]. 45 | 46 | 47 | ### What to Contribute 48 | 49 | There are many ways to contribute, from writing new exercises and improving 50 | existing ones to updating or filling in the documentation and submitting [bug 51 | reports][issues] about things that do not work, are not clear, or are missing. 52 | If you are looking for ideas, please see [the list of issues for this 53 | repository][repo-issues], or the issues for [Data Carpentry][dc-issues], 54 | [Library Carpentry][lc-issues], and [Software Carpentry][swc-issues] projects. 55 | 56 | Comments on issues and reviews of pull requests are just as welcome: we are 57 | smarter together than we are on our own. **Reviews from novices and newcomers 58 | are particularly valuable**: it's easy for people who have been using these 59 | lessons for a while to forget how impenetrable some of this material can be, so 60 | fresh eyes are always welcome. 61 | 62 | ### What *Not* to Contribute 63 | 64 | Our lessons already contain more material than we can cover in a typical 65 | workshop, so we are usually *not* looking for more concepts or tools to add to 66 | them. As a rule, if you want to introduce a new idea, you must (a) estimate how 67 | long it will take to teach and (b) explain what you would take out to make room 68 | for it. The first encourages contributors to be honest about requirements; the 69 | second, to think hard about priorities. 70 | 71 | We are also not looking for exercises or other material that only run on one 72 | platform. Our workshops typically contain a mixture of Windows, macOS, and 73 | Linux users; in order to be usable, our lessons must run equally well on all 74 | three. 75 | 76 | ### Using GitHub 77 | 78 | If you choose to contribute via GitHub, you may want to look at [How to 79 | Contribute to an Open Source Project on GitHub][how-contribute]. In brief, we 80 | use [GitHub flow][github-flow] to manage changes: 81 | 82 | 1. Create a new branch in your desktop copy of this repository for each 83 | significant change. 84 | 2. Commit the change in that branch. 85 | 3. Push that branch to your fork of this repository on GitHub. 86 | 4. Submit a pull request from that branch to the [upstream repository][repo]. 87 | 5. If you receive feedback, make changes on your desktop and push to your 88 | branch on GitHub: the pull request will update automatically. 89 | 90 | NB: The published copy of the lesson is usually in the `main` branch. 91 | 92 | Each lesson has a team of maintainers who review issues and pull requests or 93 | encourage others to do so. The maintainers are community volunteers, and have 94 | final say over what gets merged into the lesson. 95 | 96 | ### Other Resources 97 | 98 | The Carpentries is a global organisation with volunteers and learners all over 99 | the world. We share values of inclusivity and a passion for sharing knowledge, 100 | teaching and learning. There are several ways to connect with The Carpentries 101 | community listed at including via social 102 | media, slack, newsletters, and email lists. You can also [reach us by 103 | email][contact]. 
104 | 105 | [repo]: https://github.com/hpc-carpentry/hpc-chapel 106 | [repo-issues]: https://github.com/hpc-carpentry/hpc-chapel/issues 107 | [contact]: mailto:maintainers-hpc@lists.carpentries.org 108 | [cp-site]: https://carpentries.org/ 109 | [dc-issues]: https://github.com/issues?q=user%3Adatacarpentry 110 | [dc-lessons]: https://datacarpentry.org/lessons/ 111 | [dc-site]: https://datacarpentry.org/ 112 | [discuss-list]: https://carpentries.topicbox.com/groups/discuss 113 | [github]: https://github.com 114 | [github-flow]: https://guides.github.com/introduction/flow/ 115 | [github-join]: https://github.com/join 116 | [how-contribute]: https://egghead.io/courses/how-to-contribute-to-an-open-source-project-on-github 117 | [issues]: https://carpentries.org/help-wanted-issues/ 118 | [lc-issues]: https://github.com/issues?q=user%3ALibraryCarpentry 119 | [swc-issues]: https://github.com/issues?q=user%3Aswcarpentry 120 | [swc-lessons]: https://software-carpentry.org/lessons/ 121 | [swc-site]: https://software-carpentry.org/ 122 | [lc-site]: https://librarycarpentry.org/ 123 | [template-doc]: https://carpentries.github.io/workbench/ 124 | -------------------------------------------------------------------------------- /Gemfile: -------------------------------------------------------------------------------- 1 | source "https://rubygems.org" 2 | gem "github-pages", group: :jekyll_plugins 3 | gem "kramdown-parser-gfm" 4 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Licenses" 3 | --- 4 | 5 | ## Instructional Material 6 | 7 | All High Performance Computing Carpentry instructional material is 8 | made available under the [Creative Commons Attribution 9 | license][cc-by-human]. The following is a human-readable summary of 10 | (and not a substitute for) the full legal text of the [CC BY 4.0 11 | license][cc-by-legal]. 12 | 13 | ### You are free to 14 | 15 | **Share**---copy and redistribute the material in any medium or format for any 16 | purpose, even commercially. 17 | 18 | **Adapt**---remix, transform, and build upon the material for any purpose, even 19 | commercially. 20 | 21 | The licensor cannot revoke these freedoms as long as you follow the license 22 | terms. 23 | 24 | ### Under the following terms 25 | 26 | **Attribution**---You must give appropriate credit, provide a [link to the 27 | license][cc-by-human], and indicate if changes were made. You may do so in any 28 | reasonable manner, but not in any way that suggests the licensor endorses you 29 | or your use. 30 | 31 | **No additional restrictions**---You may not apply legal terms or technological 32 | measures that legally restrict others from doing anything the license permits. 33 | 34 | ### Notices 35 | 36 | You do not have to comply with the license for elements of the material in the 37 | public domain or where your use is permitted by an applicable exception or 38 | limitation. 39 | 40 | No warranties are given. The license may not give you all of the permissions 41 | necessary for your intended use. For example, other rights such as publicity, 42 | privacy, or moral rights may limit how you use the material. 43 | 44 | ## Software 45 | 46 | Except where otherwise noted, the example programs and other software provided 47 | by HPC Carpentry are made available under the [OSI][osi]-approved [MIT 48 | license][mit-license]. 
49 | 50 | ### MIT License 51 | 52 | Copyright © 2024 HPC Carpentry 53 | 54 | Permission is hereby granted, free of charge, to any person obtaining a copy of 55 | this software and associated documentation files (the "Software"), to deal in 56 | the Software without restriction, including without limitation the rights to 57 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies 58 | of the Software, and to permit persons to whom the Software is furnished to do 59 | so, subject to the following conditions: 60 | 61 | The above copyright notice and this permission notice shall be included in all 62 | copies or substantial portions of the Software. 63 | 64 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 65 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 66 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 67 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 68 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 69 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 70 | SOFTWARE. 71 | 72 | ## Trademark 73 | 74 | "The Carpentries", "Software Carpentry", "Data Carpentry", and "Library 75 | Carpentry" and their respective logos are registered trademarks of [Community 76 | Initiatives][ci]. 77 | 78 | [cc-by-human]: https://creativecommons.org/licenses/by/4.0/ 79 | [cc-by-legal]: https://creativecommons.org/licenses/by/4.0/legalcode 80 | [mit-license]: https://opensource.org/licenses/mit-license.html 81 | [ci]: https://communityin.org/ 82 | [osi]: https://opensource.org 83 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | # Makefile to build HPC Chapel lesson locally 2 | # Docs: 3 | 4 | # Disable the browser, if none is set. 5 | export R_BROWSER := $(or $(R_BROWSER),"false") 6 | 7 | all: serve 8 | .PHONY: all build check clean serve 9 | 10 | serve: build 11 | Rscript -e "sandpaper::serve()" 12 | 13 | build: 14 | Rscript -e "sandpaper::build_lesson()" 15 | 16 | check: 17 | Rscript -e "sandpaper::check_lesson()" 18 | 19 | clean: 20 | Rscript -e "sandpaper::reset_site()" 21 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # HPC Chapel 2 | 3 | This lesson is focused on teaching the basics of high-performance computing 4 | (HPC). There are 4 primary components to this lesson. Each component is 5 | budgeted half a day's worth of teaching-time, resulting in a two day workshop. 6 | 7 | 1. UNIX fundamentals 8 | 2. Working on a cluster 9 | 3. Programming language introduction/review 10 | 4. Introduction to parallel programming 11 | 12 | Sections 3 and 4 (programming) will feature two programming languages: 13 | [Python](https://www.python.org/) and [Chapel](https://chapel-lang.org). There 14 | are strong arguments for both languages, and instructors will be able to choose 15 | which language they wish to teach in. 16 | 17 | ## Topic breakdown and todo list 18 | 19 | The lesson outline and rough breakdown of topics by lesson writer is in 20 | [lesson-outline.md](lesson-outline.md). The topics there will be initially 21 | generated by the lesson writer, and then reviewed by the rest of the group once 22 | complete. 
23 | 24 | ## Lesson writing instructions 25 | 26 | This is a fast overview of the Software Carpentry lesson template. This won't 27 | cover lesson style or formatting (address that during review?). 28 | 29 | For a full guide to the lesson template, see the [Software Carpentry example 30 | lesson](http://swcarpentry.github.io/lesson-example/). 31 | 32 | ### Lesson structure 33 | 34 | Software Carpentry lessons are generally episodic, with one clear concept for 35 | each episode ([example](http://swcarpentry.github.io/r-novice-gapminder/)). 36 | We've got 4 major sections, each section should be broken up into several 37 | episodes (perhaps the higher-level bullet points from the lesson outline?). 38 | 39 | An episode is just a markdown file that lives under the `_episodes` folder. 40 | Here is a link to a [markdown 41 | cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet) 42 | with most markdown syntax. Additionally, the Software Carpentry lesson template 43 | uses several extra bits of formatting- see here for a [full 44 | guide](http://swcarpentry.github.io/lesson-example/04-formatting/). The most 45 | significant change is the addition of a YAML header that adds metadata (key 46 | questions, lesson teaching times, etc.) and special syntax for code blocks, 47 | exercises, and the like. 48 | 49 | Episode names should be prefixed with a number of their section plus the number 50 | of their episode within that section. This is important because the Software 51 | Carpentry lesson template will auto-post our lessons in the order that they 52 | would sort in. As long as your lesson sorts into the correct order, it will 53 | appear in the correct order on the website. 54 | 55 | ### Publishing changes to GitHub + the GitHub pages website 56 | 57 | The lesson website is viewable at 58 | [hpc-carpentry.github.io/hpc-novice](hpc-carpentry.github.io/hpc-novice). 59 | 60 | The lesson website itself is auto-generated from the `gh-pages` branch of this 61 | repository. GitHub pages will rebuild the website as soon as you push to the 62 | GitHub `gh-pages` branch. Because of this `gh-pages` is considered the "master" 63 | branch. 64 | 65 | ### Previewing changes locally 66 | 67 | Obviously having to push to GitHub every time you want to view your changes to 68 | the website isn't very convenient. To preview the lesson locally, run `make 69 | serve`. You can then view the website at `localhost:4321` in your browser. 70 | Pages will be automatically regenerated every time you write to them. 71 | 72 | This process requires the R language and three R packages -- 73 | [sandpaper](https://carpentries.github.io/sandpaper), [pegboard](https://carpentries.github.io/pegboard), and 74 | [varnish](https://carpentries.github.io/varnish) -- that work together with R and [pandoc](https://pandoc.org) 75 | to manage and deploy Carpentries Lesson websites written in Markdown or R Markdown. 76 | 77 | You can find the setup instructions [here](https://carpentries.github.io/workbench). 
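As a minimal sketch (the linked setup instructions are the authoritative source), the three packages can be installed from the Carpentries R-universe with something like:

```bash
# install the Carpentries workbench packages used by `make serve`
Rscript -e 'install.packages(c("sandpaper", "pegboard", "varnish"),
  repos = c("https://carpentries.r-universe.dev/", "https://cloud.r-project.org"))'
```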
78 | 79 | ## Example lessons 80 | 81 | A couple links to example SWC workshop lessons for reference: 82 | 83 | * [Example Bash lesson](https://github.com/swcarpentry/shell-novice) 84 | * [Example Python lesson](https://github.com/swcarpentry/python-novice-inflammation) 85 | * [Example R lesson](https://github.com/swcarpentry/r-novice-gapminder) (uses R 86 | markdown files instead of markdown) 87 | 88 | 89 | -------------------------------------------------------------------------------- /config.yaml: -------------------------------------------------------------------------------- 1 | #------------------------------------------------------------ 2 | # Values for this lesson. 3 | #------------------------------------------------------------ 4 | 5 | # Which carpentry is this (swc, dc, lc, or cp)? 6 | # swc: Software Carpentry 7 | # dc: Data Carpentry 8 | # lc: Library Carpentry 9 | # cp: Carpentries (to use for instructor training for instance) 10 | # incubator: The Carpentries Incubator 11 | # 12 | # This option supports custom types so lessons can be branded 13 | # and themed with your own logo and alt-text (see `carpentry_description`) 14 | # See https://carpentries.github.io/sandpaper-docs/editing.html#adding-a-custom-logo 15 | carpentry: 'incubator' 16 | 17 | # Alt-text description of the lesson. 18 | carpentry_description: 'Introduction to parallel programming in Chapel' 19 | 20 | # Overall title for pages. 21 | title: 'Introduction to High-Performance Computing in Chapel' 22 | 23 | # Date the lesson was created (YYYY-MM-DD, this is empty by default) 24 | created: 2017-09-14 25 | 26 | # Comma-separated list of keywords for the lesson 27 | keywords: 'software, data, lesson, The Carpentries, HPC, Chapel' 28 | 29 | # Life cycle stage of the lesson 30 | # possible values: pre-alpha, alpha, beta, stable 31 | life_cycle: 'alpha' 32 | 33 | # License of the lesson 34 | license: 'CC-BY 4.0' 35 | 36 | # Link to the source repository for this lesson 37 | source: 'https://github.com/hpc-carpentry/hpc-chapel' 38 | 39 | # Default branch of your lesson 40 | branch: 'main' 41 | 42 | # Who to contact if there are any issues 43 | contact: 'maintainers-hpc@lists.carpentries.org' 44 | 45 | # Navigation ------------------------------------------------ 46 | # 47 | # Use the following menu items to specify the order of 48 | # individual pages in each dropdown section. Leave blank to 49 | # include all pages in the folder. 50 | # 51 | # Example ------------- 52 | # 53 | # episodes: 54 | # - introduction.md 55 | # - first-steps.md 56 | # 57 | # learners: 58 | # - setup.md 59 | # 60 | # instructors: 61 | # - instructor-notes.md 62 | # 63 | # profiles: 64 | # - one-learner.md 65 | # - another-learner.md 66 | 67 | # Order of episodes in your lesson 68 | episodes: 69 | - 01-intro.md 70 | - 02-variables.md 71 | - 03-ranges-arrays.md 72 | - 04-conditionals.md 73 | - 05-loops.md 74 | - 06-procedures.md 75 | - 07-commandargs.md 76 | - 08-timing.md 77 | - 11-parallel-intro.md 78 | - 12-fire-forget-tasks.md 79 | - 13-synchronization.md 80 | - 14-parallel-case-study.md 81 | - 21-locales.md 82 | - 22-domains.md 83 | # - introduction.md 84 | 85 | # Information for Learners 86 | learners: 87 | 88 | # Information for Instructors 89 | instructors: 90 | 91 | # Learner Profiles 92 | profiles: 93 | 94 | # Customisation --------------------------------------------- 95 | # 96 | # This space below is where custom yaml items (e.g. 
pinning 97 | # sandpaper and varnish versions) should live 98 | -------------------------------------------------------------------------------- /episodes/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hpc-carpentry/hpc-chapel/0f86bde7d5f3f4a9fe5c25f2ce016ac444e4f434/episodes/.gitkeep -------------------------------------------------------------------------------- /episodes/01-intro.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Introduction to Chapel" 3 | teaching: 15 4 | exercises: 15 5 | --- 6 | 7 | :::::::::::::::::::::::::::::::::::::: questions 8 | - "What is Chapel and why is it useful?" 9 | :::::::::::::::::::::::::::::::::::::::::::::::: 10 | 11 | ::::::::::::::::::::::::::::::::::::: objectives 12 | - "Write and execute our first Chapel program." 13 | :::::::::::::::::::::::::::::::::::::::::::::::: 14 | 15 | **_Chapel_** is a modern, open-source programming language that supports HPC via high-level 16 | abstractions for data parallelism and task parallelism. These abstractions allow users to express parallel 17 | code in a natural, almost intuitive manner. In contrast with other high-level parallel languages, however, 18 | Chapel was designed around a _multi-resolution_ philosophy. This means that users can incrementally add more 19 | detail to their original code prototype, to optimise it to a particular computer as closely as required. 20 | 21 | In a nutshell, with Chapel we can write parallel code with the simplicity and readability of scripting 22 | languages such as Python or MATLAB, while achieving performance comparable to compiled languages like C or 23 | Fortran (+ traditional parallel libraries such as MPI or OpenMP). 24 | 25 | In this lesson we will learn the basic elements and syntax of the language; then we will study **_task 26 | parallelism_**, the first level of parallelism in Chapel, and finally we will use parallel data structures and 27 | **_data parallelism_**, which is the highest level of abstraction in parallel programming offered by Chapel. 28 | 29 | ## Getting started 30 | 31 | Chapel is a compiled language, which means that we must **_compile_** our **_source code_** to generate a 32 | **_binary_** or **_executable_** that we can then run on the computer. 33 | 34 | Chapel source code must be written in text files with the extension **_.chpl_**. Let's write a simple "hello 35 | world"-type program to demonstrate how we write Chapel code! Using your favourite text editor, create the file 36 | `hello.chpl` with the following content: 37 | 38 | ```chpl 39 | writeln('If we can see this, everything works!'); 40 | ``` 41 | 42 | This program can then be compiled with the following bash command: 43 | 44 | ```bash 45 | chpl --fast hello.chpl 46 | ``` 47 | 48 | The flag `--fast` tells the compiler to optimise the binary to run as fast as possible on the given 49 | architecture. By default, the compiler will produce a program with the same name 50 | as the source file. In our case, the program will be called `hello`. The `-o` 51 | option can be used to change the name of the generated binary. 52 | 53 | To run the code, you execute it as you would any other program: 54 | 55 | ```bash 56 | ./hello 57 | ``` 58 | ```output 59 | If we can see this, everything works! 60 | ``` 61 | 62 | ## Running on a cluster 63 | 64 | Depending on the code, it might utilise several or even all cores on the current node.
The command above 65 | implies that you are allowed to utilise all cores. This might not be the case on an HPC cluster, where a login 66 | node is shared by many people at the same time, and where it might not be a good idea to occupy all cores on a 67 | login node with CPU-intensive tasks. Instead, you will need to submit your Chapel run as a job to the 68 | scheduler asking for a specific number of CPU cores. 69 | 70 | Use `module avail chapel` to list Chapel packages on your HPC cluster, and select the best fit for Chapel, 71 | e.g. the single-locale Chapel module: 72 | 73 | ```bash 74 | module load chapel-multicore 75 | ``` 76 | 77 | Then, for running a test code on a cluster you would submit an interactive job to the queue 78 | 79 | ```bash 80 | salloc --time=0:30:0 --ntasks=1 --cpus-per-task=3 --mem-per-cpu=1000 --account=def-guest 81 | ``` 82 | 83 | and then inside that job compile and run the test code 84 | 85 | ```bash 86 | chpl --fast hello.chpl 87 | ./hello 88 | ``` 89 | 90 | For production jobs, you would compile the code and then submit a batch script to the queue: 91 | 92 | ```bash 93 | chpl --fast hello.chpl 94 | sbatch script.sh 95 | ``` 96 | 97 | where the script `script.sh` would set all Slurm variables and call the executable `mybinary`. 98 | 99 | ## Case study 100 | 101 | Along all the Chapel lessons we will be using the following _case study_ as the leading thread of the 102 | discussion. Essentially, we will be building, step by step, a Chapel code to solve the **_Heat transfer_** 103 | problem described below. Then we will parallelize the code to improve its performance. 104 | 105 | Suppose that we have a square metallic plate with some initial heat distribution or **_initial 106 | conditions_**. We want to simulate the evolution of the temperature across the plate when its border is in 107 | contact with a different heat distribution that we call the **_boundary conditions_**. 108 | 109 | The Laplace equation is the mathematical model for the evolution of the temperature in the plate. To solve 110 | this equation numerically, we need to **_discretise_** it, i.e. to consider the plate as a grid, or matrix of 111 | points, and to evaluate the temperature on each point at each iteration, according to the following 112 | **_difference equation_**: 113 | 114 | ```chpl 115 | temp_new[i,j] = 0.25 * (temp[i-1,j] + temp[i+1,j] + temp[i,j-1] + temp[i,j+1]) 116 | ``` 117 | 118 | Here `temp_new` stands for the new temperature at the current iteration, while `temp` contains the temperature calculated 119 | at the past iteration (or the initial conditions in case we are at the first iteration). The indices `i` and 120 | `j` indicate that we are working on the point of the grid located at the *i*th row and the *j*th column. 121 | 122 | So, our objective is to: 123 | 124 | > ## Goals 125 | > 1. Write a code to implement the difference equation above. The code should 126 | > have the following requirements: 127 | > 128 | > - It should work for any given number of rows and columns in the grid. 129 | > - It should run for a given number of iterations, or until the difference 130 | > between `temp_new` and `temp` is smaller than a given tolerance value. 131 | > - It should output the temperature at a desired position on the grid every 132 | > given number of iterations. 133 | > 134 | > 2. Use task parallelism to improve the performance of the code and run it in 135 | > the cluster 136 | > 3. Use data parallelism to improve the performance of the code and run it in 137 | > the cluster. 
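For reference, here is a minimal sketch of what the batch script `script.sh` mentioned earlier in this episode might look like. The specific Slurm directives are placeholders modelled on the interactive `salloc` example above (scheduler options and the account name will depend on your cluster), and the script simply runs the binary we compiled (`hello` in our example):

```bash
#!/bin/bash
#SBATCH --time=0:30:0          # maximum runtime
#SBATCH --ntasks=1             # a single task is enough for single-locale Chapel
#SBATCH --cpus-per-task=3      # CPU cores available to the program
#SBATCH --mem-per-cpu=1000     # memory per core, in MB
#SBATCH --account=def-guest    # replace with your own allocation
./hello                        # run the compiled executable
```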
138 | 139 | ::::::::::::::::::::::::::::::::::::: keypoints 140 | - "Chapel is a compiled language - any programs we make must be compiled with `chpl`." 141 | - "The `--fast` flag instructs the Chapel compiler to optimise our code." 142 | - "The `-o` flag tells the compiler what to name our output (otherwise it gets named after the source file)" 143 | :::::::::::::::::::::::::::::::::::::::::::::::: 144 | -------------------------------------------------------------------------------- /episodes/02-variables.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Basic syntax and variables" 3 | teaching: 15 4 | exercises: 15 5 | --- 6 | 7 | :::::::::::::::::::::::::::::::::::::: questions 8 | - "How do I write basic Chapel code?" 9 | :::::::::::::::::::::::::::::::::::::::::::::::: 10 | 11 | ::::::::::::::::::::::::::::::::::::: objectives 12 | - "Perform basic maths in Chapel." 13 | - "Understand Chapel's basic data types." 14 | - "Understand how to read and fix errors." 15 | - "Know how to define and use data stored as variables." 16 | :::::::::::::::::::::::::::::::::::::::::::::::: 17 | 18 | Using basic maths in Chapel is fairly intuitive. Try compiling the following code to see 19 | how the different mathematical operators work. 20 | 21 | ```chpl 22 | writeln(4 + 5); 23 | writeln(4 - 5); 24 | writeln(4 * 5); 25 | writeln(4 / 5); // integer division 26 | writeln(4.0 / 5.0); // floating-point division 27 | writeln(4 ** 5); // exponentiation 28 | ``` 29 | 30 | In this example, our code is called `operators.chpl`. You can compile it with the following commands: 31 | 32 | ```bash 33 | chpl operators.chpl --fast 34 | ./operators 35 | ``` 36 | 37 | You should see output that looks something like the following: 38 | 39 | ```output 40 | 9 41 | -1 42 | 20 43 | 0 44 | 0.8 45 | 1024 46 | ``` 47 | 48 | Code beginning with `//` is interpreted as a comment — it does not get run. Comments are very valuable 49 | when writing code, because they allow us to write notes to ourselves about what each piece of code does. You 50 | can also create block comments with `/*` and `*/`: 51 | 52 | ```chpl 53 | /* This is a block comment. 54 | It can span as many lines as you want! 55 | (like this) */ 56 | ``` 57 | 58 | ## Variables 59 | 60 | Granted, we probably want to do more than basic maths with Chapel. We will need to store the results of 61 | complex operations using variables. Variables in programming are not the same as the mathematical concept. In 62 | programming, a variable represents (or references) a location in the memory of the computer where we can store information or 63 | data while executing a program. A variable has three elements: 64 | 65 | 1. a **_name_** or label, to identify the variable 66 | 2. a **_type_**, that indicates the kind of data that we can store in it, and 67 | 3. a **_value_**, the actual information or data stored in the variable. 68 | 69 | Variables in Chapel are declared with the `var` or `const` keywords. When a variable declared as `const` is 70 | initialised, its value cannot be modified anymore during the execution of the program. What happens if we try to 71 | modify a constant variable like `test` below? 
72 | 73 | ```chpl 74 | const test = 100; 75 | test = 200; 76 | writeln('The value of test is: ', test); 77 | writeln(test / 4); 78 | ``` 79 | ```bash 80 | chpl variables.chpl 81 | ``` 82 | ```error 83 | variables.chpl:2: error: cannot assign to const variable 84 | ``` 85 | 86 | The compiler threw an error, and did not compile our program. This is a feature of compiled languages - if 87 | there is something wrong, we will typically see an error at compile-time, instead of while running 88 | it. Although we already kind of know why the error was caused (we tried to reassign the value of a `const` 89 | variable, which by definition cannot be changed), let's walk through the error as an example of how to 90 | troubleshoot our programs. 91 | 92 | - `variables.chpl:2:` indicates that the error was caused on line 2 of our `variables.chpl` file. 93 | 94 | - `error:` indicates that the issue was an error, and blocks compilation. Sometimes the compiler will just 95 | give us warning or information, not necessarily errors. When we see something that is not an error, we 96 | should carefully read the output and consider if it necessitates changing our code. Errors must be fixed, 97 | as they will block the code from compiling. 98 | 99 | - `cannot assign to const variable` indicates that we were trying to reassign a `const` variable, which is 100 | explicitly not allowed in Chapel. 101 | 102 | To fix this error, we can change `const` to `var` when declaring our `test` variable. `var` indicates a 103 | variable that can be reassigned. 104 | 105 | ```chpl 106 | var test = 100; 107 | test = 200; 108 | writeln('The value of test is: ', test); 109 | writeln(test / 4); 110 | ``` 111 | ```bash 112 | chpl variables.chpl 113 | ``` 114 | ```output 115 | The value of test is: 200 116 | 50 117 | ``` 118 | 119 | 120 | 121 | 122 | 123 | In Chapel, to initialize a variable we must specify the type of the variable, or initialise it in place with 124 | some value. The common variable types in Chapel are: 125 | 126 | - integer `int` (positive or negative whole numbers) 127 | - floating-point number `real` (decimal values) 128 | - Boolean `bool` (true or false) 129 | - string `string` (any type of text) 130 | 131 | These two variables below are initialized with the type. If no initial value is given, Chapel will initialise 132 | a variable with a default value depending on the declared type, for example 0 for integers and 0.0 for real 133 | variables. 
134 | 135 | ```chpl 136 | var counter: int; 137 | var delta: real; 138 | writeln("counter is ", counter, " and delta is ", delta); 139 | ``` 140 | ```bash 141 | chpl variables.chpl 142 | ./variables 143 | ``` 144 | ```output 145 | counter is 0 and delta is 0.0 146 | ``` 147 | 148 | If a variable is initialised with a value but without a type, Chapel will infer its type from the given 149 | initial value: 150 | 151 | ```chpl 152 | const test = 100; 153 | writeln('The value of test is ', test, ' and its type is ', test.type:string); 154 | ``` 155 | ```bash 156 | chpl variables.chpl 157 | ./variables 158 | ``` 159 | ```output 160 | The value of test is 100 and its type is int(64) 161 | ``` 162 | 163 | When initialising a variable, we can also assign its type in addition to its value: 164 | 165 | ```chpl 166 | const tolerance: real = 0.0001; 167 | const outputFrequency: int = 20; 168 | ``` 169 | 170 | ::::::::::::::::::::::::::::::::::::: callout 171 | 172 | Note that these two notations below are different, but produce the same result in the end: 173 | 174 | ```chpl 175 | var a: real = 10.0; // we specify both the type and the value 176 | var a = 10: real; // we specify only the value (10 converted to real) 177 | ``` 178 | 179 | :::::::::::::::::::::::::::::::::::::::::::::::: 180 | 181 | 182 | ::::::::::::::::::::::::::::::::::::: callout 183 | 184 | In the following code (saved as `variables.chpl`) we have not initialised the variable `test` before trying to 185 | use it in line 2: 186 | 187 | ```chpl 188 | const test; // declare 'test' variable 189 | writeln('The value of test is: ', test); 190 | ``` 191 | ```error 192 | variables.chpl:1: error: 'test' is not initialized and has no type 193 | variables.chpl:1: note: cannot find initialization point to split-init this variable 194 | variables.chpl:2: note: 'test' is used here before it is initialized 195 | ``` 196 | 197 | :::::::::::::::::::::::::::::::::::::::::::::::: 198 | 199 | Now we know how to set, use, and change a variable, as well as the implications of using `var` and `const`. We 200 | also know how to read and interpret errors. 201 | 202 | Let's practice defining variables and use this as the starting point of our simulation code. The code will be 203 | stored in the file `base_solution.chpl`. We will be solving the heat transfer problem introduced in the 204 | previous section, starting with some initial temperature and computing a new temperature at each iteration. We 205 | will then compute the greatest difference between the old and the new temperature and will check if it is 206 | smaller than a preset `tolerance`. If no, we will continue iterating. If yes, we will stop iterations and will 207 | print the final temperature. We will also stop iterations if we reach the maximum number of iterations 208 | `niter`. 209 | 210 | Our grid will be of size `rows` by `cols`, and every `outputFrequency`th iteration we will print temperature 211 | at coordinates `x` and `y`. 212 | 213 | The variable `delta` will store the greatest difference in temperature from one iteration to another. The 214 | variable `tmp` will store some temporary results when computing the temperatures. 
215 | 216 | Let's define our variables: 217 | 218 | ```chpl 219 | const rows = 100; // number of rows in the grid 220 | const cols = 100; // number of columns in the grid 221 | const niter = 500; // maximum number of iterations 222 | const x = 50; // row number for a printout 223 | const y = 50; // column number for a printout 224 | var delta: real; // greatest difference in temperature from one iteration to another 225 | var tmp: real; // for temporary results 226 | const tolerance: real = 0.0001; // smallest difference in temperature that would be accepted before stopping 227 | const outputFrequency: int = 20; // the temperature will be printed every outputFrequency iterations 228 | ``` 229 | 230 | ::::::::::::::::::::::::::::::::::::: keypoints 231 | - "A comment is preceded with `//` or surrounded by `/* and `*/`" 232 | - "All variables in Chapel have a type, whether assigned explicitly by the user, or chosen by the Chapel 233 | compiler based on its value." 234 | - "Reassigning a new value to a `const` variable will produce an error during compilation. If you want to assign a new value to a variable, declare that variable with the `var` keyword." 235 | :::::::::::::::::::::::::::::::::::::::::::::::: 236 | -------------------------------------------------------------------------------- /episodes/03-ranges-arrays.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Ranges and arrays" 3 | teaching: 60 4 | exercises: 30 5 | --- 6 | 7 | :::::::::::::::::::::::::::::::::::::: questions 8 | - "What is Chapel and why is it useful?" 9 | :::::::::::::::::::::::::::::::::::::::::::::::: 10 | 11 | ::::::::::::::::::::::::::::::::::::: objectives 12 | - "Learn to define and use ranges and arrays." 13 | :::::::::::::::::::::::::::::::::::::::::::::::: 14 | 15 | ## Ranges and Arrays 16 | 17 | A series of integers (1,2,3,4,5, for example), is called a **_range_**. Ranges are generated with the `..` 18 | operator. Let's examine what a range looks like; we store the following code as `ranges.chpl`. Here we 19 | introduce a very simple loop, cycling through all elements of the range and printing their values (we will 20 | study `for` loops in a separate section): 21 | 22 | ```chpl 23 | var example_range = 0..10; 24 | writeln('Our example range was set to: ', example_range); 25 | for x in example_range do writeln(x); 26 | ``` 27 | 28 | ```bash 29 | chpl ranges.chpl 30 | ./ranges 31 | ``` 32 | 33 | ```output 34 | Our example range was set to: 0..10 35 | 0 36 | 1 37 | ... 38 | 9 39 | 10 40 | ``` 41 | 42 | Among other uses, ranges can be used to declare **_arrays_** of variables. An array is a multidimensional 43 | collection of values of the same type. Arrays can be of any size. Let's define a 1-dimensional array of the 44 | size `example_range` and see what it looks like. Notice how the size of an array is included with its type. 45 | 46 | ```chpl 47 | var example_range = 0..10; 48 | writeln('Our example range was set to: ', example_range); 49 | var example_array: [example_range] real; 50 | writeln('Our example array is now: ', example_array); 51 | ``` 52 | 53 | We can reassign the values in our example array the same way we would reassign a variable. An array can either 54 | be set all to a single value, or to a sequence of values. 
55 | 56 | ```chpl 57 | var example_range = 0..10; 58 | writeln('Our example range was set to: ', example_range); 59 | var example_array: [example_range] real; 60 | writeln('Our example array is now: ', example_array); 61 | example_array = 5; 62 | writeln('When set to 5: ', example_array); 63 | example_array = 1..11; 64 | writeln('When set to a range: ', example_array); 65 | ``` 66 | 67 | ```bash 68 | chpl ranges.chpl 69 | ./ranges 70 | ``` 71 | 72 | ```output 73 | Our example range was set to: 0..10 74 | Our example array is now: 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 75 | When set to 5: 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 76 | When set to a range: 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 77 | ``` 78 | 79 | Notice how ranges are "right inclusive", the last number of a range is included in the range. This is 80 | different from languages like Python where this does not happen. 81 | 82 | ## Indexing elements 83 | 84 | We can retrieve and reset specific values of an array using `[]` notation. Note that we use the same square 85 | bracket notation in two different contexts: (1) to declare an array, with the square brackets containing the 86 | array's full index range `[example_range]`, and (2) to access specific array elements, as we will see 87 | below. Let's try retrieving and setting a specific value in our example so far: 88 | 89 | ```chpl 90 | var example_range = 0..10; 91 | writeln('Our example range was set to: ', example_range); 92 | var example_array: [example_range] real; 93 | writeln('Our example array is now: ', example_array); 94 | example_array = 5; 95 | writeln('When set to 5: ', example_array); 96 | example_array = 1..11; 97 | writeln('When set to a range: ', example_array); 98 | // retrieve the 5th index 99 | writeln(example_array[5]); 100 | // set index 5 to a new value 101 | example_array[5] = 99999; 102 | writeln(example_array); 103 | ``` 104 | 105 | ```bash 106 | chpl ranges.chpl 107 | ./ranges 108 | ``` 109 | 110 | ```output 111 | Our example range was set to: 0..10 112 | Our example array is now: 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 113 | When set to 5: 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 114 | When set to a range: 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 115 | 6.0 116 | 1.0 2.0 3.0 4.0 5.0 99999.0 7.0 8.0 9.0 10.0 11.0 117 | ``` 118 | 119 | One very important thing to note - in this case, index 5 was actually the 6th element. This was caused by how 120 | we set up our array. When we defined our array using a range starting at 0, element 5 corresponds to the 6th 121 | element. Unlike most other programming languages, arrays in Chapel do not start at a fixed value - they can 122 | start at any number depending on how we define them! 
For instance, let's redefine example_range to start at 5: 123 | 124 | ```chpl 125 | var example_range = 5..15; 126 | writeln('Our example range was set to: ', example_range); 127 | var example_array: [example_range] real; 128 | writeln('Our example array is now: ', example_array); 129 | example_array = 5; 130 | writeln('When set to 5: ', example_array); 131 | example_array = 1..11; 132 | writeln('When set to a range: ', example_array); 133 | // retrieve the 5th index 134 | writeln(example_array[5]); 135 | // set index 5 to a new value 136 | example_array[5] = 99999; 137 | writeln(example_array); 138 | ``` 139 | 140 | ```bash 141 | chpl ranges.chpl 142 | ./ranges 143 | ``` 144 | 145 | ```output 146 | Our example range was set to: 5..15 147 | Our example array is now: 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 148 | When set to 5: 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 149 | When set to a range: 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 150 | 1.0 151 | 99999.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 152 | ``` 153 | 154 | ## Back to our simulation 155 | 156 | Let's define a two-dimensional array for use in our simulation and set its initial values: 157 | 158 | ```chpl 159 | // this is our "plate" 160 | var temp: [0..rows+1, 0..cols+1] real; 161 | temp[1..rows,1..cols] = 25; // set the initial temperature on the internal grid 162 | ``` 163 | 164 | This is a matrix (2D array) with (`rows + 2`) rows and (`cols + 2`) columns of real numbers. The ranges 165 | `0..rows+1` and `0..cols+1` used here not only define the size and shape of the array, they also stand for the 166 | indices with which we can access particular elements of the array using the `[ , ]` notation. For example, 167 | `temp[0,0]` is the real variable located at the first row and first column of the array `temp`, while 168 | `temp[3,7]` is the one at the 4th row and 8th column; `temp[2,3..15]` accesses the 4th through 16th columns of the 3rd 169 | row of `temp`, and `temp[0..3,4]` corresponds to the first 4 rows on the 5th column of `temp`. 170 | 171 | We divide our "plate" into two parts: (1) the internal grid `1..rows,1..cols` on which we set the initial 172 | temperature at 25.0, and (2) the surrounding layer of *ghost points* with row indices equal to `0` or `rows+1` 173 | and column indices equal to `0` or `cols+1`. The temperature in the ghost layer is equal to 0.0 by default, as 174 | we do not assign a value there. 175 | 176 | We should now be ready to start coding our simulation. Let's print some information about the initial 177 | configuration, compile the code, and execute it to see if everything is working as expected. 
178 | 179 | ```chpl 180 | const rows = 100; 181 | const cols = 100; 182 | const niter = 500; 183 | const x = 50; // row number of the desired position 184 | const y = 50; // column number of the desired position 185 | const tolerance = 0.0001; // smallest difference in temperature that would be accepted before stopping 186 | const outputFrequency: int = 20; // the temperature will be printed every outputFrequency iterations 187 | 188 | // this is our "plate" 189 | var temp: [0..rows+1, 0..cols+1] real; 190 | temp[1..rows,1..cols] = 25; // set the initial temperature on the internal grid 191 | 192 | writeln('This simulation will consider a matrix of ', rows, ' by ', cols, ' elements.'); 193 | writeln('Temperature at start is: ', temp[x, y]); 194 | ``` 195 | 196 | ```bash 197 | chpl base_solution.chpl 198 | ./base_solution 199 | ``` 200 | 201 | ```output 202 | This simulation will consider a matrix of 100 by 100 elements. 203 | Temperature at start is: 25.0 204 | ``` 205 | 206 | ::::::::::::::::::::::::::::::::::::: keypoints 207 | - "A range is a sequence of integers." 208 | - "An array holds a non-negative number of values of the same type." 209 | - "Chapel arrays can start at any index, not just 0 or 1." 210 | - "You can index arrays with the `[]` brackets." 211 | :::::::::::::::::::::::::::::::::::::::::::::::: 212 | -------------------------------------------------------------------------------- /episodes/04-conditionals.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Conditional statements" 3 | teaching: 60 4 | exercises: 30 5 | --- 6 | 7 | :::::::::::::::::::::::::::::::::::::: questions 8 | - "How do I add conditional logic to my code?" 9 | :::::::::::::::::::::::::::::::::::::::::::::::: 10 | 11 | ::::::::::::::::::::::::::::::::::::: objectives 12 | - "You can use the `==`, `>`, `>=`, etc. operators to make a comparison that returns true or false." 13 | :::::::::::::::::::::::::::::::::::::::::::::::: 14 | 15 | Chapel, as most *high level programming languages*, has different statements to control the flow of the 16 | program or code. The conditional statements are: the **_if statement_**, and the **_while statement_**. These 17 | statements both rely on comparisons between values. Let's try a few comparisons to see how they work 18 | (`conditionals.chpl`): 19 | 20 | ```chpl 21 | writeln(1 == 2); 22 | writeln(1 != 2); 23 | writeln(1 > 2); 24 | writeln(1 >= 2); 25 | writeln(1 < 2); 26 | writeln(1 <= 2); 27 | ``` 28 | 29 | ```bash 30 | chpl conditionals.chpl 31 | ./conditionals 32 | ``` 33 | 34 | ```output 35 | false 36 | true 37 | false 38 | false 39 | true 40 | true 41 | ``` 42 | 43 | You can combine comparisons with the `&&` (AND) and `||` (OR) operators. `&&` only returns `true` if both 44 | conditions are true, while `||` returns `true` if either condition is true. 
45 | 46 | ```chpl 47 | writeln(1 == 2); 48 | writeln(1 != 2); 49 | writeln(1 > 2); 50 | writeln(1 >= 2); 51 | writeln(1 < 2); 52 | writeln(1 <= 2); 53 | writeln(true && true); 54 | writeln(true && false); 55 | writeln(true || false); 56 | ``` 57 | 58 | ```bash 59 | chpl conditionals.chpl 60 | ./conditionals 61 | ``` 62 | 63 | ```output 64 | false 65 | true 66 | false 67 | false 68 | true 69 | true 70 | true 71 | false 72 | true 73 | ``` 74 | 75 | ## Control flow 76 | 77 | The general syntax of a while statement is: 78 | 79 | ```chpl 80 | // single-statement form 81 | while condition do 82 | instruction 83 | 84 | // multi-statement form 85 | while condition 86 | { 87 | instructions 88 | } 89 | ``` 90 | 91 | The code flows as follows: first, the condition is evaluated, and then, if it is satisfied, all the 92 | instructions within the curly brackets (or after `do`) are executed one by one. This will be repeated over and over again 93 | until the condition does not hold anymore. 94 | 95 | The main loop in our simulation can be programmed using a while statement like this 96 | 97 | ```chpl 98 | //this is the main loop of the simulation 99 | var c = 0; 100 | delta = tolerance; 101 | while (c < niter && delta >= tolerance) 102 | { 103 | c += 1; 104 | // actual simulation calculations will go here 105 | } 106 | ``` 107 | 108 | Essentially, what we want is to repeat all the code inside the curly brackets until the number of iterations 109 | is greater than or equal to `niter`, or the difference in temperature between iterations is less than 110 | `tolerance`. (Note that in our case, as `delta` was not initialised when declared -- and thus Chapel assigned it 111 | the default real value 0.0 -- we need to assign it a value greater than or equal to `tolerance` (0.0001), or otherwise the 112 | condition of the while statement will never be satisfied. A good starting point is simply to set `delta` 113 | equal to `tolerance`). 114 | 115 | To count iterations we just need to keep adding 1 to the counter variable `c`. We could do this with `c=c+1`, 116 | or with the compound assignment, `+=`, as in the code above. To program the rest of the logic inside the curly 117 | brackets, on the other hand, we will need more elaborate instructions. 118 | 119 | Let's focus, first, on printing the temperature every `outputFrequency = 20` iterations. To achieve this, we 120 | only need to check whether `c` is a multiple of `outputFrequency`, and in that case, to print the temperature 121 | at the desired position. This is the type of control that an **_if statement_** gives us. The general syntax 122 | is: 123 | 124 | ```chpl 125 | // single-statement form 126 | if condition then 127 | instruction A 128 | else 129 | instruction B 130 | 131 | // multi-statement form 132 | if condition 133 | {instructions A} 134 | else 135 | {instructions B} 136 | ``` 137 | 138 | The set of instructions A is executed once if the condition is satisfied; the set of instructions B is 139 | executed otherwise (the else part of the if statement is optional). 140 | 141 | So, in our case this would do the trick: 142 | 143 | ```chpl 144 | if (c % outputFrequency == 0) 145 | { 146 | writeln('Temperature at iteration ', c, ': ', temp[x, y]); 147 | } 148 | ``` 149 | 150 | Note that when only one instruction will be executed, there is no need to use the curly brackets. `%` is the 151 | modulo operator; it returns the remainder after division (i.e. it returns zero when `c` is a multiple of 152 | `outputFrequency`). 
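As a quick standalone illustration (not part of our simulation code), the single-statement `then`/`else` form and the `%` operator can be combined like this; the values chosen here are just for the demo:

```chpl
const c = 40;
const outputFrequency = 20;
if (c % outputFrequency == 0) then
  writeln(c, ' is a multiple of ', outputFrequency);
else
  writeln(c, ' is not a multiple of ', outputFrequency);
```

Compiled and run on its own, this prints `40 is a multiple of 20`.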
153 | 154 | Let's compile and execute our code to see what we get until now 155 | 156 | ```chpl 157 | const rows = 100; 158 | const cols = 100; 159 | const niter = 500; 160 | const x = 50; // row number of the desired position 161 | const y = 50; // column number of the desired position 162 | const tolerance = 0.0001; // smallest difference in temperature that 163 | // would be accepted before stopping 164 | const outputFrequency: int = 20; // the temperature will be printed every outputFrequency iterations 165 | var delta: real; // greatest difference in temperature from one iteration to another 166 | var tmp: real; // for temporary results 167 | 168 | // this is our "plate" 169 | var temp: [0..rows+1, 0..cols+1] real = 25; 170 | 171 | writeln('This simulation will consider a matrix of ', rows, ' by ', cols, ' elements.'); 172 | writeln('Temperature at start is: ', temp[x, y]); 173 | 174 | //this is the main loop of the simulation 175 | var c = 0; 176 | delta = tolerance; 177 | while (c < niter && delta >= tolerance) 178 | { 179 | c += 1; 180 | if (c % outputFrequency == 0) 181 | { 182 | writeln('Temperature at iteration ', c, ': ', temp[x, y]); 183 | } 184 | } 185 | ``` 186 | 187 | ```bash 188 | chpl base_solution.chpl 189 | ./base_solution 190 | ``` 191 | 192 | ```output 193 | This simulation will consider a matrix of 100 by 100 elements. 194 | Temperature at start is: 25.0 195 | Temperature at iteration 20: 25.0 196 | Temperature at iteration 40: 25.0 197 | Temperature at iteration 60: 25.0 198 | Temperature at iteration 80: 25.0 199 | Temperature at iteration 100: 25.0 200 | Temperature at iteration 120: 25.0 201 | Temperature at iteration 140: 25.0 202 | Temperature at iteration 160: 25.0 203 | Temperature at iteration 180: 25.0 204 | Temperature at iteration 200: 25.0 205 | Temperature at iteration 220: 25.0 206 | Temperature at iteration 240: 25.0 207 | Temperature at iteration 260: 25.0 208 | Temperature at iteration 280: 25.0 209 | Temperature at iteration 300: 25.0 210 | Temperature at iteration 320: 25.0 211 | Temperature at iteration 340: 25.0 212 | Temperature at iteration 360: 25.0 213 | Temperature at iteration 380: 25.0 214 | Temperature at iteration 400: 25.0 215 | Temperature at iteration 420: 25.0 216 | Temperature at iteration 440: 25.0 217 | Temperature at iteration 460: 25.0 218 | Temperature at iteration 480: 25.0 219 | Temperature at iteration 500: 25.0 220 | ``` 221 | 222 | Of course the temperature is always 25.0 at any iteration other than the initial one, as we haven't done any 223 | computation yet. 224 | 225 | ::::::::::::::::::::::::::::::::::::: keypoints 226 | - "Use `if {instructions A} else {instructions B}` syntax to execute one set of instructions 227 | if the condition is satisfied, and the other set of instructions if the condition is not satisfied." 228 | - This syntax can be simplified to `if {instructions}` if we only want to execute the 229 | instructions within the curly brackets if the condition is satisfied. 230 | - "Use `while {instructions}` to repeatedly execute the instructions within the curly brackets 231 | while the condition is satisfied. The instructions will be executed over and over again until the condition 232 | does not hold anymore." 
233 | :::::::::::::::::::::::::::::::::::::::::::::::: 234 | -------------------------------------------------------------------------------- /episodes/05-loops.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Getting started with loops" 3 | teaching: 60 4 | exercises: 30 5 | --- 6 | 7 | :::::::::::::::::::::::::::::::::::::: questions 8 | - "How do I run the same piece of code repeatedly?" 9 | :::::::::::::::::::::::::::::::::::::::::::::::: 10 | 11 | ::::::::::::::::::::::::::::::::::::: objectives 12 | - "Learn to use `for` loops to run over every element of an iterand." 13 | - "Learn the difference between using `for` loops and using a `while` statement to repeatedly execute a code block. 14 | :::::::::::::::::::::::::::::::::::::::::::::::: 15 | 16 | To compute the new temperature, i.e. each element of `temp_new`, we need to add all the surrounding elements in 17 | `temp` and divide the result by 4. And, essentially, we need to repeat this process for all the elements 18 | of `temp_new`, or, in other words, we need to *iterate* over the elements of `temp_new`. When it comes to iterating over 19 | a given number of elements, the **_for-loop_** is what we want to use. The for-loop has the following general 20 | syntax: 21 | 22 | ```chpl 23 | // single-statement version 24 | for index in iterand do 25 | instruction; 26 | 27 | // multi-statement version 28 | for index in iterand 29 | {instructions} 30 | ``` 31 | 32 | The *iterand* is a function or statement that expresses an iteration; it could be the range 1..15, for 33 | example. *index* is a variable that exists only in the context of the for-loop, and that will be taking the 34 | different values yielded by the iterand. The code flows as follows: index takes the first value yielded by the 35 | iterand, and keeps it until all the instructions inside the curly brackets are executed one by one; then, 36 | index takes the second value yielded by the iterand, and keeps it until all the instructions are executed 37 | again. This pattern is repeated until index takes all the different values expressed by the iterand. 38 | 39 | This `for` loop, for example 40 | 41 | ```chpl 42 | // calculate the new temperatures (temp_new) using the past temperatures (temp) 43 | for i in 1..rows 44 | { 45 | // do this for every row 46 | } 47 | ``` 48 | 49 | will allow us to iterate over the rows of `temp_new`. Now, for each row we also need to iterate over all the 50 | columns in order to access every single element of `temp_new`. 
This can be done with nested `for` loops like 51 | this: 52 | 53 | ```chpl 54 | // calculate the new temperatures (temp_new) using the past temperatures (temp) 55 | for i in 1..rows 56 | { 57 | // do this for every row 58 | for j in 1..cols 59 | { 60 | // and this for every column in the row i 61 | } 62 | } 63 | ``` 64 | 65 | Now, inside the inner loop, we can use the indices `i` and `j` to perform the required computations as 66 | follows: 67 | 68 | ```chpl 69 | // calculate the new temperatures (temp_new) using the past temperatures (temp) 70 | for i in 1..rows 71 | { 72 | // do this for every row 73 | for j in 1..cols 74 | { 75 | // and this for every column in the row i 76 | temp_new[i,j] = (temp[i-1,j] + temp[i+1,j] + temp[i,j-1] + temp[i,j+1]) / 4; 77 | } 78 | } 79 | temp=temp_new; 80 | ``` 81 | 82 | Note that at the end of the outer `for` loop, when all the elements in `temp_new` are already calculated, we update 83 | `temp` with the values of `temp_new`; this way everything is set up for the next iteration of the main `while` 84 | statement. 85 | 86 | We're ready to execute our code, but the conditions we have initially set up 87 | will not produce interesting output, because the plate has a temperature 88 | value of `25` everywhere. We can change the boundaries to have temperature `0` 89 | so that the middle will start cooling down. To do this, we should change the 90 | declaration of `temp` to: 91 | 92 | ```chpl 93 | var temp: [0..rows+1, 0..cols+1] real = 0; // the whole plate starts at 0 94 | temp[1..rows,1..cols] = 25; // set the non-boundary coordinates to 25 95 | ``` 96 | 97 | Now let's compile and execute our code again: 98 | 99 | ```bash 100 | chpl base_solution.chpl 101 | ./base_solution 102 | ``` 103 | 104 | ```output 105 | The simulation will consider a matrix of 100 by 100 elements, 106 | it will run up to 500 iterations, or until the largest difference 107 | in temperature between iterations is less than 0.0001. 108 | You are interested in the evolution of the temperature at the 109 | position (50,50) of the matrix... 110 | 111 | and here we go... 112 | Temperature at iteration 0: 25.0 113 | Temperature at iteration 20: 25.0 114 | Temperature at iteration 40: 25.0 115 | Temperature at iteration 60: 25.0 116 | Temperature at iteration 80: 25.0 117 | Temperature at iteration 100: 25.0 118 | Temperature at iteration 120: 25.0 119 | Temperature at iteration 140: 25.0 120 | Temperature at iteration 160: 25.0 121 | Temperature at iteration 180: 25.0 122 | Temperature at iteration 200: 25.0 123 | Temperature at iteration 220: 24.9999 124 | Temperature at iteration 240: 24.9996 125 | Temperature at iteration 260: 24.9991 126 | Temperature at iteration 280: 24.9981 127 | Temperature at iteration 300: 24.9963 128 | Temperature at iteration 320: 24.9935 129 | Temperature at iteration 340: 24.9893 130 | Temperature at iteration 360: 24.9833 131 | Temperature at iteration 380: 24.9752 132 | Temperature at iteration 400: 24.9644 133 | Temperature at iteration 420: 24.9507 134 | Temperature at iteration 440: 24.9337 135 | Temperature at iteration 460: 24.913 136 | Temperature at iteration 480: 24.8883 137 | Temperature at iteration 500: 24.8595 138 | ``` 139 | 140 | As we can see, the temperature in the middle of the plate (position 50,50) is slowly decreasing as the plate 141 | is cooling down. 142 | 143 | ::::::::::::::::::::::::::::::::::::: challenge 144 | 145 | ## Challenge 1: Can you do it? 146 | 147 | What would be the temperature at the top right corner of the plate? 
In our current setup we have a layer of 148 | ghost points around the internal grid. While the temperature on the internal grid was initially set to 25.0, 149 | the temperature at the ghost points was set to 0.0. Note that during our iterations we do not compute the 150 | temperature at the ghost points -- it is permanently set to 0.0. Consequently, any point close to the ghost 151 | layer will be influenced by this zero temperature, so we expect the temperature near the border of the plate 152 | to decrease faster. Modify the code to see the temperature at the top right corner. 153 | 154 | :::::::::::::::::::::::: solution 155 | 156 | To see the evolution of the temperature at the top right corner of the plate, we just need to modify `x` and 157 | `y`. This corner corresponds to the first row (`x=1`) and the last column (`y=cols`) of the plate. 158 | 159 | ```bash 160 | chpl base_solution.chpl 161 | ./base_solution 162 | ``` 163 | 164 | ```output 165 | The simulation will consider a matrix of 100 by 100 elements, 166 | it will run up to 500 iterations, or until the largest difference 167 | in temperature between iterations is less than 0.0001. 168 | You are interested in the evolution of the temperature at the position (1,100) of the matrix... 169 | 170 | and here we go... 171 | Temperature at iteration 0: 25.0 172 | Temperature at iteration 20: 1.48171 173 | Temperature at iteration 40: 0.767179 174 | ... 175 | Temperature at iteration 460: 0.068973 176 | Temperature at iteration 480: 0.0661081 177 | Temperature at iteration 500: 0.0634717 178 | ``` 179 | 180 | ::::::::::::::::::::::::::::::::: 181 | :::::::::::::::::::::::::::::::::::::::::::::::: 182 | 183 | ::::::::::::::::::::::::::::::::::::: challenge 184 | 185 | ## Challenge 2: Can you do it? 186 | 187 | Now let's have some more interesting boundary conditions. Suppose that the plate is heated by a source of 80 188 | degrees located at the bottom right corner, and that the temperature on the rest of the border decreases 189 | linearly as one gets farther from the corner (see the image below). Utilise for loops to set up the described 190 | boundary conditions. Compile and run your code to see how the temperature is changing now. 191 | 192 | :::::::::::::::::::::::: solution 193 | 194 | To get the linear distribution, the 80 degrees must be divided by the number of rows or columns in our 195 | plate. So, the following pair of `for` loops at the start of the time iteration will give us what we want: 196 | 197 | ```chpl 198 | // set the boundary conditions 199 | for i in 1..rows do 200 | temp[i,cols+1] = i*80.0/rows; // right side 201 | for j in 1..cols do 202 | temp[rows+1,j] = j*80.0/cols; // bottom side 203 | ``` 204 | 205 | Note that 80 degrees is written as the real number 80.0. Integer division in Chapel returns an integer, 206 | so, as `rows` and `cols` are integers, we must write 80 as a real number so that the result is not truncated. 207 | 208 | ```bash 209 | chpl base_solution.chpl 210 | ./base_solution 211 | ``` 212 | 213 | ```output 214 | The simulation will consider a matrix of 100 by 100 elements, it will run 215 | up to 500 iterations, or until the largest difference in temperature 216 | between iterations is less than 0.0001. You are interested in the evolution 217 | of the temperature at the position (1,100) of the matrix... 218 | 219 | and here we go... 220 | Temperature at iteration 0: 25.0 221 | Temperature at iteration 20: 2.0859 222 | Temperature at iteration 40: 1.42663 223 | ... 
224 | Temperature at iteration 460: 0.826941 225 | Temperature at iteration 480: 0.824959 226 | Temperature at iteration 500: 0.823152 227 | ``` 228 | 229 | ::::::::::::::::::::::::::::::::: 230 | :::::::::::::::::::::::::::::::::::::::::::::::: 231 | 232 | ::::::::::::::::::::::::::::::::::::: challenge 233 | 234 | ## Challenge 3: Can you do it? 235 | 236 | Let us increase the maximum number of iterations to `niter = 10_000`. The code now does 10_000 iterations: 237 | 238 | ```output 239 | ... 240 | Temperature at iteration 9960: 0.79214 241 | Temperature at iteration 9980: 0.792139 242 | Temperature at iteration 10000: 0.792139 243 | ``` 244 | 245 | So far, `delta` has been always equal to `tolerance`, which means that our main `while` loop will always run 246 | `niter` iterations. So let's update `delta` after each iteration. Use what we have studied so far to write the 247 | required piece of code. 248 | 249 | :::::::::::::::::::::::: solution 250 | 251 | The idea is simple, after each iteration of the while loop, we must compare all elements of `temp_new` and 252 | `temp`, find the greatest difference, and update `delta` with that value. The next nested for loops do 253 | the job: 254 | 255 | ```chpl 256 | // update delta, the greatest difference between temp_new and temp 257 | delta=0; 258 | for i in 1..rows 259 | { 260 | for j in 1..cols 261 | { 262 | tmp = abs(temp_new[i,j]-temp[i,j]); 263 | if tmp > delta then delta = tmp; 264 | } 265 | } 266 | ``` 267 | 268 | Clearly there is no need to keep the difference at every single position in the array, we just need to update 269 | `delta` if we find a greater one. 270 | 271 | ```bash 272 | chpl base_solution.chpl 273 | ./base_solution 274 | ``` 275 | 276 | ```output 277 | The simulation will consider a matrix of 100 by 100 elements, 278 | it will run up to 10000 iterations, or until the largest difference 279 | in temperature between iterations is less than 0.0001. 280 | You are interested in the evolution of the temperature at the 281 | position (1,100) of the matrix... 282 | 283 | and here we go... 284 | Temperature at iteration 0: 25.0 285 | Temperature at iteration 20: 2.0859 286 | Temperature at iteration 40: 1.42663 287 | ... 288 | Temperature at iteration 7460: 0.792283 289 | Temperature at iteration 7480: 0.792281 290 | Temperature at iteration 7500: 0.792279 291 | 292 | Final temperature at the desired position after 7505 iterations is: 0.792279 293 | The difference in temperatures between the last two iterations was: 9.99834e-05 294 | ``` 295 | 296 | ::::::::::::::::::::::::::::::::: 297 | :::::::::::::::::::::::::::::::::::::::::::::::: 298 | 299 | Now, after Exercise 3 we should have a working program to simulate our heat 300 | transfer equation. Let's just print some additional useful information, 301 | 302 | ```chpl 303 | // print final information 304 | writeln('\nFinal temperature at the desired position after ',c,' iterations is: ',temp[x,y]); 305 | writeln('The difference in temperatures between the last two iterations was: ',delta,'\n'); 306 | ``` 307 | 308 | and compile and execute our final code, 309 | 310 | ```bash 311 | chpl base_solution.chpl 312 | ./base_solution 313 | ``` 314 | 315 | ```output 316 | The simulation will consider a matrix of 100 by 100 elements, 317 | it will run up to 500 iterations, or until the largest difference 318 | in temperature between iterations is less than 0.0001. 319 | You are interested in the evolution of the temperature at the 320 | position (1,100) of the matrix... 
321 | 322 | and here we go... 323 | Temperature at iteration 0: 25.0 324 | Temperature at iteration 20: 2.0859 325 | Temperature at iteration 40: 1.42663 326 | Temperature at iteration 60: 1.20229 327 | Temperature at iteration 80: 1.09044 328 | Temperature at iteration 100: 1.02391 329 | Temperature at iteration 120: 0.980011 330 | Temperature at iteration 140: 0.949004 331 | Temperature at iteration 160: 0.926011 332 | Temperature at iteration 180: 0.908328 333 | Temperature at iteration 200: 0.894339 334 | Temperature at iteration 220: 0.88302 335 | Temperature at iteration 240: 0.873688 336 | Temperature at iteration 260: 0.865876 337 | Temperature at iteration 280: 0.85925 338 | Temperature at iteration 300: 0.853567 339 | Temperature at iteration 320: 0.848644 340 | Temperature at iteration 340: 0.844343 341 | Temperature at iteration 360: 0.840559 342 | Temperature at iteration 380: 0.837205 343 | Temperature at iteration 400: 0.834216 344 | Temperature at iteration 420: 0.831537 345 | Temperature at iteration 440: 0.829124 346 | Temperature at iteration 460: 0.826941 347 | Temperature at iteration 480: 0.824959 348 | Temperature at iteration 500: 0.823152 349 | 350 | Final temperature at the desired position after 500 iterations is: 0.823152 351 | The greatest difference in temperatures between the last two iterations was: 0.0258874 352 | ``` 353 | 354 | ::::::::::::::::::::::::::::::::::::: keypoints 355 | - "You can organize loops with `for` and `while` statements. Use a `for` loop to run over every element of the 356 | iterand, e.g. `for i in 1..rows { ...}` will run over all integers from 1 to `rows`. Use a `while` 357 | statement to repeatedly execute a code block until the condition does not hold anymore, e.g. `while (c < 358 | niter && delta >= tolerance) {...}` will repeatedly execute the commands in curly braces until one of the 359 | two conditions turns false." 360 | :::::::::::::::::::::::::::::::::::::::::::::::: 361 | -------------------------------------------------------------------------------- /episodes/06-procedures.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Procedures" 3 | teaching: 15 4 | exercises: 0 5 | --- 6 | 7 | :::::::::::::::::::::::::::::::::::::: questions 8 | - "How do I write functions?" 9 | :::::::::::::::::::::::::::::::::::::::::::::::: 10 | 11 | ::::::::::::::::::::::::::::::::::::: objectives 12 | - "Be able to write our own procedures." 13 | :::::::::::::::::::::::::::::::::::::::::::::::: 14 | 15 | Similar to other programming languages, Chapel lets you define your own functions. These are called 16 | 'procedures' in Chapel and have an easy-to-understand syntax: 17 | 18 | ```chpl 19 | proc addOne(n) { // n is an input parameter 20 | return n + 1; 21 | } 22 | ``` 23 | 24 | To call this procedure, you would use its name: 25 | 26 | ```chpl 27 | writeln(addOne(10)); 28 | ``` 29 | 30 | Procedures can be recursive, as demonstrated below. In this example the procedure takes an integer number as a 31 | parameter and returns an integer number -- more on this below. If the input parameter is 1 or 0, `fibonacci` 32 | will return the same input parameter. If the input parameter is 2 or larger, `fibonacci` will call itself 33 | recursively. 
34 | 35 | ```chpl 36 | proc fibonacci(n: int): int { // input parameter type and procedure return type, respectively 37 | if n <= 1 then return n; 38 | return fibonacci(n-1) + fibonacci(n-2); 39 | } 40 | ``` 41 | ```chpl 42 | writeln(fibonacci(10)); 43 | ``` 44 | 45 | The input parameter type `n: int` is enforced at compilation time. For example, if you try to pass a real-type 46 | number to the procedure with `fibonacci(10.2)`, you will get an error "error: unresolved call". Similarly, the 47 | return variable type is also enforced at compilation time. For example, replacing `return n` with `return 1.0` 48 | in line 2 will result in "error: cannot initialize return value of type 'int(64)'". While specifying these 49 | types might be optional (see the call out below), we highly recommend doing so in your code, as it will add 50 | additional checks for your program. 51 | 52 | ::::::::::::::::::::::::::::::::::::: callout 53 | 54 | If not specified, the procedure return type is inferred from the return variable type. This might not be 55 | possible with a recursive procedure as the return type is the procedure type, and it is not known to the 56 | compiler, so in this case (and in the `fibonacci` example above) we need to specify the procedure return type 57 | explicitly. 58 | 59 | :::::::::::::::::::::::::::::::::::::::::::::::: 60 | 61 | Procedures can take a varying number of parameters. In this example the procedure `maxOf` takes two or more 62 | parameters of the same type. This group of parameters is referred to as a *tuple* and is named `x` inside the 63 | procedure. The number of elements `k` in this tuple is inferred from the number of parameters passed to the 64 | procedure and is used to organize the calculations inside the procedure: 65 | 66 | ```chpl 67 | proc maxOf(x ...?k) { // take a tuple of one type with k elements 68 | var maximum = x[0]; 69 | for i in 1..=tolerance) do 71 | { 72 | ... 73 | } 74 | 75 | watch.stop(); 76 | 77 | //print final information 78 | writeln('\nThe simulation took ',watch.elapsed(),' seconds'); 79 | writeln('Final temperature at the desired position after ',c,' iterations is: ',temp[x,y]); 80 | writeln('The greatest difference in temperatures between the last two iterations was: ',delta,'\n'); 81 | ``` 82 | 83 | ```bash 84 | chpl base_solution.chpl 85 | ./base_solution --rows=650 --cols=650 --x=200 --y=300 --tolerance=0.002 --outputFrequency=1000 86 | ``` 87 | 88 | ```output 89 | The simulation will consider a matrix of 650 by 650 elements, 90 | it will run up to 10000 iterations, or until the largest difference 91 | in temperature between iterations is less than 0.002. 92 | You are interested in the evolution of the temperature at the 93 | position (200,300) of the matrix... 94 | 95 | and here we go... 96 | Temperature at iteration 0: 25.0 97 | Temperature at iteration 1000: 25.0 98 | Temperature at iteration 2000: 25.0 99 | Temperature at iteration 3000: 25.0 100 | Temperature at iteration 4000: 24.9998 101 | Temperature at iteration 5000: 24.9984 102 | Temperature at iteration 6000: 24.9935 103 | Temperature at iteration 7000: 24.9819 104 | 105 | The simulation took 20.1621 seconds 106 | Final temperature at the desired position after 7750 iterations is: 24.9671 107 | The greatest difference in temperatures between the last two iterations was: 0.00199985 108 | ``` 109 | 110 | ::::::::::::::::::::::::::::::::::::: keypoints 111 | - "To measure performance, instrument your Chapel code using a stopwatch from the `Time` module." 
112 | :::::::::::::::::::::::::::::::::::::::::::::::: 113 | -------------------------------------------------------------------------------- /episodes/11-parallel-intro.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Intro to parallel computing" 3 | teaching: 60 4 | exercises: 30 5 | --- 6 | 7 | :::::::::::::::::::::::::::::::::::::: questions 8 | - "How does parallel processing work?" 9 | :::::::::::::::::::::::::::::::::::::::::::::::: 10 | 11 | ::::::::::::::::::::::::::::::::::::: objectives 12 | - "Discuss some common concepts in parallel computing." 13 | :::::::::::::::::::::::::::::::::::::::::::::::: 14 | 15 | The basic concept of parallel computing is simple to understand: we divide our job into tasks that can be 16 | executed at the same time, so that we finish the job in a fraction of the time that it would have taken if the 17 | tasks were executed one by one. Implementing parallel computations, however, is not always easy, nor 18 | possible... 19 | 20 | Consider the following analogy: 21 | 22 | Suppose that we want to paint the four walls in a room. We'll call this the *problem*. We can divide our 23 | problem into 4 different tasks: paint each of the walls. In principle, our 4 tasks are independent from each 24 | other in the sense that we don't need to finish one to start one another. We say that we have 4 **_concurrent 25 | tasks_**; the tasks can be executed within the same time frame. However, this does not mean that the tasks 26 | can be executed simultaneously or in parallel. It all depends on the amount of resources that we have for the 27 | tasks. If there is only one painter, this guy could work for a while in one wall, then start painting another 28 | one, then work for a little bit on the third one, and so on. **_The tasks are being executed concurrently but 29 | not in parallel_**. If we have two painters for the job, then more parallelism can be introduced. Four 30 | painters could execute the tasks **_truly in parallel_**. 31 | 32 | ::::::::::::::::::::::::::::::::::::: callout 33 | 34 | Think of the CPU cores as the painters or workers that will execute your concurrent tasks. 35 | 36 | :::::::::::::::::::::::::::::::::::::::::::::::: 37 | 38 | Now imagine that all workers have to obtain their paint from a central dispenser located at the middle of the 39 | room. If each worker is using a different colour, then they can work **_asynchronously_**, however, if they 40 | use the same colour, and two of them run out of paint at the same time, then they have to **_synchronise_** to 41 | use the dispenser: One must wait while the other is being serviced. 42 | 43 | ::::::::::::::::::::::::::::::::::::: callout 44 | 45 | Think of the shared memory in your computer as the central dispenser for all your workers. 46 | 47 | :::::::::::::::::::::::::::::::::::::::::::::::: 48 | 49 | Finally, imagine that we have 4 paint dispensers, one for each worker. In this scenario, each worker can 50 | complete their task totally on their own. They don't even have to be in the same room, they could be painting 51 | walls of different rooms in the house, in different houses in the city, and different cities in the 52 | country. We need, however, a communication system in place. Suppose that worker A, for some reason, needs a 53 | colour that is only available in the dispenser of worker B, they must then synchronise: worker A must request 54 | the paint of worker B and worker B must respond by sending the required colour. 
55 | 56 | ::::::::::::::::::::::::::::::::::::: callout 57 | 58 | Think of the memory on each node of a cluster as a separate dispenser for your workers. 59 | 60 | :::::::::::::::::::::::::::::::::::::::::::::::: 61 | 62 | A **_fine-grained_** parallel code needs lots of communication or synchronisation between tasks, in contrast 63 | with a **_coarse-grained_** one. An **_embarrassingly parallel_** problem is one where all tasks can be 64 | executed completely independently of each other (no communication required). 65 | 66 | ## Parallel programming in Chapel 67 | 68 | Chapel provides high-level abstractions for parallel programming no matter the grain size of your tasks, 69 | whether they run in shared memory on one node or use memory distributed across multiple compute nodes, 70 | or whether they are executed 71 | concurrently or truly in parallel. As a programmer you can focus on the algorithm: how to divide the problem 72 | into tasks that make sense in the context of the problem, and be sure that the high-level implementation will 73 | run on any hardware configuration. Then you can consider the details of the specific system you are going to 74 | use (whether it is shared or distributed, the number of cores, etc.) and tune your code/algorithm to obtain 75 | better performance. 76 | 77 | ::::::::::::::::::::::::::::::::::::: callout 78 | 79 | To this effect, **_concurrency_** (the creation and execution of multiple tasks) and **_locality_** (in 80 | which set of resources these tasks are executed) are orthogonal concepts in Chapel. 81 | 82 | :::::::::::::::::::::::::::::::::::::::::::::::: 83 | 84 | In summary, we can have a set of several tasks; these tasks could be running: 85 | 86 | 1. concurrently by the same processor in a single compute node, 87 | 2. in parallel by several processors in a single compute node, 88 | 3. in parallel by several processors distributed in different compute nodes, or 89 | 4. serially (one by one) by several processors distributed in different compute nodes. 90 | 91 | Similarly, each of these tasks could be using variables 92 | 93 | 1. located in the local memory on the compute node where it is running, or 94 | 2. stored on other compute nodes. 95 | 96 | And again, Chapel can take care of all the details required to run our algorithm in most of these scenarios, but 97 | we can always add more specific detail to gain performance when targeting a particular scenario. 98 | 99 | ::::::::::::::::::::::::::::::::::::: keypoints 100 | - "Concurrency and locality are orthogonal concepts in Chapel: where the tasks are running may not be 101 | indicative of when they run, and you can control both in Chapel." 102 | - "Problems with a lot of communication between tasks, or so-called **_fine-grained_** parallel problems, are 103 | typically more difficult to parallelize. As we will see later in these lessons, Chapel simplifies writing 104 | **_fine-grained_** parallel codes by hiding a lot of communication complexity under the hood." 105 | :::::::::::::::::::::::::::::::::::::::::::::::: 106 | -------------------------------------------------------------------------------- /episodes/12-fire-forget-tasks.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Fire-and-forget tasks" 3 | teaching: 60 4 | exercises: 30 5 | --- 6 | 7 | :::::::::::::::::::::::::::::::::::::: questions 8 | - "How do we execute work in parallel?" 
9 | :::::::::::::::::::::::::::::::::::::::::::::::: 10 | 11 | ::::::::::::::::::::::::::::::::::::: objectives 12 | - "Launching multiple threads to execute tasks in parallel." 13 | - "Learn how to use `begin`, `cobegin`, and `coforall` to spawn new tasks." 14 | :::::::::::::::::::::::::::::::::::::::::::::::: 15 | 16 | ::::::::::::::::::::::::::::::::::::: callout 17 | 18 | In the very first chapter we showed how to run single-node Chapel codes. As a refresher, let's go over 19 | this again. If you are running Chapel on your own computer, then you are all set, and you can simply compile 20 | and run Chapel codes. If you are on a cluster, you will need to run Chapel codes inside interactive jobs. So 21 | far we are covering only single-locale Chapel, so -- from the login node -- you can submit an interactive 22 | job to the scheduler with a command like this one: 23 | 24 | ```sh 25 | salloc --time=2:0:0 --ntasks=1 --cpus-per-task=3 --mem-per-cpu=1000 26 | ``` 27 | 28 | The details may vary depending on your cluster, e.g. a different scheduler, the requirement to specify an account or 29 | reservation, etc., but the general idea remains the same: on a cluster you need to ask for resources before you 30 | can run calculations. In this case we are asking for 2 hours maximum runtime, a single MPI task (sufficient for 31 | our parallelism in this chapter), 3 CPU cores inside that task, and 1000M maximum memory per core. The core 32 | count means that we can run 3 threads in parallel, each on its own CPU core. Once your interactive job starts, 33 | you can compile and run the Chapel codes below. Inside your Chapel code, when new threads start, they will be 34 | able to utilize our 3 allocated CPU cores. 35 | 36 | :::::::::::::::::::::::::::::::::::::::::::::::: 37 | 38 | A Chapel program always starts as a single main thread. You can then start concurrent tasks with the `begin` 39 | statement. A task spawned by the `begin` statement will run in a different thread while the main thread 40 | continues its normal execution. Consider the following example: 41 | 42 | ```chpl 43 | var x = 0; 44 | 45 | writeln("This is the main thread starting first task"); 46 | begin 47 | { 48 | var c = 0; 49 | while c < 10 50 | { 51 | c += 1; 52 | writeln('thread 1: ', x+c); 53 | } 54 | } 55 | 56 | writeln("This is the main thread starting second task"); 57 | begin 58 | { 59 | var c = 0; 60 | while c < 10 61 | { 62 | c += 1; 63 | writeln('thread 2: ', x+c); 64 | } 65 | } 66 | 67 | writeln('this is main thread, I am done...'); 68 | ``` 69 | 70 | ```bash 71 | chpl begin_example.chpl 72 | ./begin_example 73 | ``` 74 | 75 | ```output 76 | This is the main thread starting first task 77 | This is the main thread starting second task 78 | this is main thread, I am done... 79 | thread 1: 1 80 | thread 1: 2 81 | thread 1: 3 82 | thread 1: 4 83 | thread 1: 5 84 | thread 1: 6 85 | thread 1: 7 86 | thread 1: 8 87 | thread 1: 9 88 | thread 1: 10 89 | thread 2: 1 90 | thread 2: 2 91 | thread 2: 3 92 | thread 2: 4 93 | thread 2: 5 94 | thread 2: 6 95 | thread 2: 7 96 | thread 2: 8 97 | thread 2: 9 98 | thread 2: 10 99 | ``` 100 | 101 | As you can see, the order of the output is not what we would expect, and in fact it is completely 102 | unpredictable. This is a well-known effect of concurrent tasks accessing the same shared resource at the same 103 | time (in this case the screen); the system decides in which order the tasks get to write to the screen.
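Because the runtime decides the scheduling, the interleaving can change from one run to the next. A quick way to see this for yourself is to run the same binary a few times in a row (a small bash loop; how much the output actually varies depends on your machine and on how many cores your job has):

```bash
# run the same binary several times and compare how the two threads interleave
for i in 1 2 3 4 5; do
  ./begin_example
  echo "--- end of run $i ---"
done
```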
104 | 105 | 106 | 107 | 108 | 109 | 110 | ::::::::::::::::::::::::::::::::::::: challenge 111 | 112 | ## Challenge 1: what if `c` is defined globally? 113 | 114 | What would happen if in the last code we *move* the definition of `c` into the main thread, but try to assign 115 | it from threads 1 and 2? Select one answer from these: 116 | 117 | 1. The code will fail to compile. 118 | 1. The code will compile and run, but `c` will be updated by both threads at the same time (a *race 119 | condition*), so that its final value will vary from one run to another. 120 | 1. The code will compile and run, and the two threads will be taking turns updating `c`, so that its final 121 | value will always be the same. 122 | 123 | :::::::::::::::::::::::: solution 124 | 125 | We'll get an error at compilation ("cannot assign to const variable"), since then `c` would be defined within 126 | the scope of the main thread, and we could modify its value only in the main thread. Any attempt to modify its 127 | value inside threads 1 or 2 will produce a compilation error. 128 | 129 | ::::::::::::::::::::::::::::::::: 130 | :::::::::::::::::::::::::::::::::::::::::::::::: 131 | 132 | 133 | 134 | 135 | 136 | 137 | 138 | ::::::::::::::::::::::::::::::::::::: challenge 139 | 140 | ## Challenge 2: what if we have a second, local definition of `x`? 141 | 142 | What would happen if we try to insert a second definition `var x = 10;` inside the first `begin` statement? 143 | Select one answer from these: 144 | 145 | 1. The code will fail to compile. 146 | 1. The code will compile and run, and the inside the first `begin` statement the value `x = 10` will be used, 147 | whereas inside the second `begin` statement the value `x = 0` will be used. 148 | 1. The new value `x = 10` will overwrite the global value `x = 0` in both threads 1 and 2. 149 | 150 | :::::::::::::::::::::::: solution 151 | 152 | The code will compile and run, and you will see the following output: 153 | 154 | ```output 155 | This is the main thread starting first task 156 | This is the main thread starting second task 157 | this is main thread, I am done... 158 | thread 1: 11 159 | thread 1: 12 160 | thread 1: 13 161 | thread 1: 14 162 | thread 1: 15 163 | thread 1: 16 164 | thread 1: 17 165 | thread 1: 18 166 | thread 1: 19 167 | thread 1: 20 168 | thread 2: 1 169 | thread 2: 2 170 | thread 2: 3 171 | thread 2: 4 172 | thread 2: 5 173 | thread 2: 6 174 | thread 2: 7 175 | thread 2: 8 176 | thread 2: 9 177 | thread 2: 10 178 | ``` 179 | 180 | ::::::::::::::::::::::::::::::::: 181 | :::::::::::::::::::::::::::::::::::::::::::::::: 182 | 183 | ::::::::::::::::::::::::::::::::::::: callout 184 | 185 | All variables have a **_scope_** in which they can be used. The variables declared inside a concurrent task 186 | are accessible only by that task. The variables declared in the main task can be read everywhere, but Chapel 187 | won't allow other concurrent tasks to modify them. 188 | 189 | :::::::::::::::::::::::::::::::::::::::::::::::: 190 | 191 | 192 | 193 | 194 | 195 | 196 | 197 | ::::::::::::::::::::::::::::::::::::::: discussion 198 | 199 | ## Try this ... 200 | 201 | Are the concurrent tasks, spawned by the last code, running truly in parallel? 202 | 203 | The answer is: it depends on the number of cores available to your Chapel code. 
To verify this, let's modify the code 204 | to get both threads 1 and 2 into an infinite loop: 205 | 206 | ```chpl 207 | begin 208 | { 209 | var c=0; 210 | while c > -1 211 | { 212 | c += 1; 213 | // the rest of the code in the thread 214 | } 215 | } 216 | ``` 217 | 218 | Compile and run the code: 219 | 220 | ```sh 221 | chpl begin_example.chpl 222 | ./begin_example 223 | ``` 224 | 225 | If you are running this on your own computer, you can run the `top`, `htop`, or `ps` commands in another terminal 226 | to check Chapel's CPU usage. If you are running inside an interactive job on a cluster, you can open a 227 | different terminal, log in to the cluster, and open a bash shell on the node that is running your job (if your 228 | cluster setup allows this): 229 | 230 | ```sh 231 | squeue -u $USER # check the jobID number 232 | srun --jobid=<jobID> --pty bash # put your jobID here 233 | htop -u $USER -s PERCENT_CPU # display CPU usage and other information 234 | ``` 235 | 236 | In the output of `htop` you will see a table with the list of your processes, and in the "CPU%" column you 237 | will see the percentage consumed by each process. Find the Chapel process; if it shows CPU usage 238 | close to 300%, you are using 3 CPU cores. What do you see? 239 | 240 | Now exit `htop` by pressing *Q*. Also exit your interactive run by pressing *Ctrl-C*. 241 | 242 | ::::::::::::::::::::::::::::::::::::::::::::::::::: 243 | 244 | ::::::::::::::::::::::::::::::::::::: callout 245 | 246 | To maximise performance, start as many tasks as there are cores available. 247 | 248 | :::::::::::::::::::::::::::::::::::::::::::::::: 249 | 250 | A slightly more structured way to start concurrent tasks in Chapel is the `cobegin` statement. Here 251 | you can start a block of concurrent tasks, one for each statement inside the curly brackets. The main 252 | difference between the `begin` and `cobegin` statements is that with `cobegin`, all the spawned tasks are 253 | synchronised at the end of the statement, i.e. the main thread won't continue its execution until all tasks 254 | are done. 255 | 256 | ```chpl 257 | var x=0; 258 | writeln("This is the main thread, my value of x is ",x); 259 | 260 | cobegin 261 | { 262 | { 263 | var x=5; 264 | writeln("this is task 1, my value of x is ",x); 265 | } 266 | writeln("this is task 2, my value of x is ",x); 267 | } 268 | 269 | writeln("this message won't appear until all tasks are done..."); 270 | ``` 271 | 272 | ```bash 273 | chpl cobegin_example.chpl 274 | ./cobegin_example 275 | ``` 276 | 277 | ```output 278 | This is the main thread, my value of x is 0 279 | this is task 2, my value of x is 0 280 | this is task 1, my value of x is 5 281 | this message won't appear until all tasks are done... 282 | ``` 283 | 284 | As you may have concluded from the Discussion exercise above, the variables declared inside a task are 285 | accessible only by that task, while the variables declared in the main task are accessible to all tasks. 286 | 287 | The last, and most useful, way to start concurrent/parallel tasks in Chapel is the `coforall` loop. This is a 288 | combination of the for-loop and the `cobegin` statement. The general syntax is: 289 | 290 | ```chpl 291 | coforall index in iterand 292 | {instructions} 293 | ``` 294 | 295 | This will start a new task for each iteration. Each task will then perform all the instructions inside the 296 | curly brackets. Each task will have a copy of the variable **_index_** with the corresponding value yielded by 297 | the iterand.
This index allows us to _customise_ the set of instructions for each particular task. 298 | 299 | ```chpl 300 | var x=1; 301 | config var numoftasks=2; 302 | 303 | writeln("This is the main task: x = ",x); 304 | 305 | coforall taskid in 1..numoftasks 306 | { 307 | var c=taskid+1; 308 | writeln("this is task ",taskid,": x + ",taskid," = ",x+taskid,". My value of c is: ",c); 309 | } 310 | 311 | writeln("this message won't appear until all tasks are done..."); 312 | ``` 313 | 314 | ```bash 315 | chpl coforall_example.chpl 316 | ./coforall_example --numoftasks=5 317 | ``` 318 | 319 | ```output 320 | This is the main task: x = 1 321 | this is task 5: x + 5 = 6. My value of c is: 6 322 | this is task 2: x + 2 = 3. My value of c is: 3 323 | this is task 4: x + 4 = 5. My value of c is: 5 324 | this is task 3: x + 3 = 4. My value of c is: 4 325 | this is task 1: x + 1 = 2. My value of c is: 2 326 | this message won't appear until all tasks are done... 327 | ``` 328 | 329 | Notice how we are able to customise the instructions inside the coforall to give different results depending 330 | on the task that is executing them. Also notice how, once again, the variables declared outside the coforall 331 | can be read by all tasks, while the variables declared inside are available only to the particular task. 332 | 333 | ::::::::::::::::::::::::::::::::::::: challenge 334 | 335 | ## Challenge 3: Can you do it? 336 | 337 | Would it be possible to print all the messages in the right order? Modify the code in the last example as 338 | required. 339 | 340 | Hint: you can use an array of strings declared in the main task, where all the concurrent tasks could write 341 | their messages in the corresponding position. Then, at the end, have the main task print all elements of 342 | the array in order. 343 | 344 | :::::::::::::::::::::::: solution 345 | 346 | The following code is a possible solution: 347 | 348 | ```chpl 349 | var x = 1; 350 | config var numoftasks = 2; 351 | var messages: [1..numoftasks] string; 352 | 353 | writeln("This is the main task: x = ", x); 354 | 355 | coforall taskid in 1..numoftasks { 356 | var c = taskid + 1; 357 | messages[taskid] = 'this is task ' + taskid:string + 358 | ': my value of c is ' + c:string + ' and x is ' + x:string; 359 | } 360 | 361 | for i in 1..numoftasks do writeln(messages[i]); 362 | writeln("this message won't appear until all tasks are done..."); 363 | ``` 364 | 365 | ```bash 366 | chpl exercise_coforall.chpl 367 | ./exercise_coforall --numoftasks=5 368 | ``` 369 | 370 | ```output 371 | This is the main task: x = 1 372 | this is task 1: my value of c is 2 and x is 1 373 | this is task 2: my value of c is 3 and x is 1 374 | this is task 3: my value of c is 4 and x is 1 375 | this is task 4: my value of c is 5 and x is 1 376 | this is task 5: my value of c is 6 and x is 1 377 | this message won't appear until all tasks are done... 378 | ``` 379 | 380 | Note that we need to convert integers to strings first (`taskid:string` converts the `taskid` integer variable to 381 | a string) before we can add them to other strings to form the message stored inside each `messages` element. 382 | 383 | ::::::::::::::::::::::::::::::::: 384 | :::::::::::::::::::::::::::::::::::::::::::::::: 385 | 386 | ::::::::::::::::::::::::::::::::::::: challenge 387 | 388 | ## Challenge 4: Can you do it?
389 | 390 | Consider the following code: 391 | 392 | ```chpl 393 | use Random; 394 | config const nelem = 100_000_000; 395 | var x: [1..nelem] int; 396 | fillRandom(x); //fill array with random numbers 397 | var mymax = 0; 398 | 399 | // here put your code to find mymax 400 | 401 | writeln("the maximum value in x is: ", mymax); 402 | ``` 403 | 404 | Write a parallel code to find the maximum value in the array x. 405 | 406 | :::::::::::::::::::::::: solution 407 | 408 | ```chpl 409 | config const numtasks = 12; 410 | const n = nelem/numtasks; // number of elements per thread 411 | const r = nelem - n*numtasks; // these elements did not fit into the last thread 412 | 413 | var d: [1..numtasks] int; // local maxima for each thread 414 | 415 | coforall taskid in 1..numtasks { 416 | var i, f: int; 417 | i = (taskid-1)*n + 1; 418 | f = (taskid-1)*n + n; 419 | if taskid == numtasks then f += r; // add r elements to the last thread 420 | for j in i..f do 421 | if x[j] > d[taskid] then d[taskid] = x[j]; 422 | } 423 | for i in 1..numtasks do 424 | if d[i] > mymax then mymax = d[i]; 425 | ``` 426 | 427 | ```bash 428 | chpl --fast exercise_coforall_2.chpl 429 | ./exercise_coforall_2 430 | ``` 431 | 432 | ```output 433 | the maximum value in x is: 9223372034161572255 # large random integer 434 | ``` 435 | 436 | We use the `coforall` loop to spawn tasks that work concurrently in a fraction of the array. The trick here is to 437 | determine, based on the _taskid_, the initial and final indices that the task will use. Each task obtains the 438 | maximum in its fraction of the array, and finally, after the coforall is done, the main task obtains the 439 | maximum of the array from the maximums of all tasks. 440 | 441 | ::::::::::::::::::::::::::::::::: 442 | :::::::::::::::::::::::::::::::::::::::::::::::: 443 | 444 | ::::::::::::::::::::::::::::::::::::::: discussion 445 | 446 | ## Try this ... 447 | 448 | Substitute the code to find _mymax_ in the last exercise with: 449 | 450 | ```chpl 451 | mymax=max reduce x; 452 | ``` 453 | 454 | Time the execution of the original code and this new one. How do they compare? 455 | 456 | ::::::::::::::::::::::::::::::::::::::::::::::::::: 457 | 458 | 459 | ::::::::::::::::::::::::::::::::::::: callout 460 | 461 | It is always a good idea to check whether there is _built-in_ functions or methods in the used language, that 462 | can do what we want to do as efficiently (or better) than our house-made code. In this case, the _reduce_ 463 | statement reduces the given array to a single number using the given operation (in this case max), and it is 464 | parallelized and optimised to have a very good performance. 465 | 466 | :::::::::::::::::::::::::::::::::::::::::::::::: 467 | 468 | 469 | The code in these last Exercises somehow _synchronise_ the tasks to obtain the desired result. In addition, 470 | Chapel has specific mechanisms task synchronisation, that could help us to achieve fine-grained 471 | parallelization. 472 | 473 | ::::::::::::::::::::::::::::::::::::: keypoints 474 | - "Use `begin` or `cobegin` or `coforall` to spawn new tasks." 475 | - "You can run more than one task per core, as the number of cores on a node is limited." 
476 | :::::::::::::::::::::::::::::::::::::::::::::::: 477 | -------------------------------------------------------------------------------- /episodes/13-synchronization.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Synchronising tasks" 3 | teaching: 60 4 | exercises: 30 5 | --- 6 | 7 | :::::::::::::::::::::::::::::::::::::: questions 8 | - "How should I access my data in parallel?" 9 | :::::::::::::::::::::::::::::::::::::::::::::::: 10 | 11 | ::::::::::::::::::::::::::::::::::::: objectives 12 | - "Learn how to synchronize multiple threads using one of three mechanisms: `sync` statements, sync variables, 13 | and atomic variables." 14 | - "Learn that with shared memory access from multiple threads you can run into race conditions and deadlocks, 15 | and learn how to recognize and solve these problems." 16 | :::::::::::::::::::::::::::::::::::::::::::::::: 17 | 18 | In Chapel the keyword `sync` can be either a statement or a type qualifier, providing two different 19 | synchronization mechanisms for threads. Let's start with using `sync` as a statement. 20 | 21 | As we saw in the previous section, the `begin` statement will start a concurrent (or *child*) task that will 22 | run in a different thread while the main (or *parent*) thread continues its normal execution. In this sense 23 | the `begin` statement is non-blocking. If you want to pause the execution of the main thread and wait until 24 | the child thread ends, you can prepend the `begin` statement with the `sync` statement. Consider the following 25 | code; running this code, after the initial output line, you will first see all output from thread 1 and only 26 | then the line "The first task is done..." and the rest of the output: 27 | 28 | ```chpl 29 | var x=0; 30 | writeln("This is the main thread starting a synchronous task"); 31 | 32 | sync 33 | { 34 | begin 35 | { 36 | var c=0; 37 | while c<10 38 | { 39 | c+=1; 40 | writeln('thread 1: ',x+c); 41 | } 42 | } 43 | } 44 | writeln("The first task is done..."); 45 | 46 | writeln("This is the main thread starting an asynchronous task"); 47 | begin 48 | { 49 | var c=0; 50 | while c<10 51 | { 52 | c+=1; 53 | writeln('thread 2: ',x+c); 54 | } 55 | } 56 | 57 | writeln('this is main thread, I am done...'); 58 | ``` 59 | 60 | ```bash 61 | chpl sync_example_1.chpl 62 | ./sync_example_1 63 | ``` 64 | 65 | ```output 66 | This is the main thread starting a synchronous task 67 | thread 1: 1 68 | thread 1: 2 69 | thread 1: 3 70 | thread 1: 4 71 | thread 1: 5 72 | thread 1: 6 73 | thread 1: 7 74 | thread 1: 8 75 | thread 1: 9 76 | thread 1: 10 77 | The first task is done... 78 | This is the main thread starting an asynchronous task 79 | this is main thread, I am done... 80 | thread 2: 1 81 | thread 2: 2 82 | thread 2: 3 83 | thread 2: 4 84 | thread 2: 5 85 | thread 2: 6 86 | thread 2: 7 87 | thread 2: 8 88 | thread 2: 9 89 | thread 2: 10 90 | ``` 91 | 92 | ::::::::::::::::::::::::::::::::::::::: discussion 93 | 94 | ## Discussion 95 | 96 | What would happen if we write instead 97 | 98 | ```chpl 99 | begin 100 | { 101 | sync 102 | { 103 | var c=0; 104 | while c<10 105 | { 106 | c+=1; 107 | writeln('thread 1: ',x+c); 108 | } 109 | } 110 | } 111 | writeln("The first task is done..."); 112 | ``` 113 | 114 | ::::::::::::::::::::::::::::::::::::::::::::::::::: 115 | 116 | ::::::::::::::::::::::::::::::::::::: challenge 117 | 118 | ## Challenge 3: Can you do it? 
119 | 120 | Use `begin` and `sync` statements to reproduce the functionality of `cobegin` in `cobegin_example.chpl`. 121 | 122 | :::::::::::::::::::::::: solution 123 | 124 | ```chpl 125 | var x=0; 126 | writeln("This is the main thread, my value of x is ",x); 127 | 128 | sync 129 | { 130 | begin 131 | { 132 | var x=5; 133 | writeln("this is task 1, my value of x is ",x); 134 | } 135 | begin writeln("this is task 2, my value of x is ",x); 136 | } 137 | 138 | writeln("this message won't appear until all tasks are done..."); 139 | ``` 140 | 141 | ::::::::::::::::::::::::::::::::: 142 | :::::::::::::::::::::::::::::::::::::::::::::::: 143 | 144 | A more elaborated and powerful use of `sync` is as a type qualifier for variables. When a variable is declared 145 | as _sync_, a state that can be **_full_** or **_empty_** is associated to it. 146 | 147 | To assign a new value to a _sync_ variable, its state must be _empty_ (after the assignment operation is 148 | completed, the state will be set as _full_). On the contrary, to read a value from a _sync_ variable, its 149 | state must be _full_ (after the read operation is completed, the state will be set as _empty_ again). 150 | 151 | Starting from Chapel 2.x, you must use functions `writeEF` and `readFF` to perform blocking write and read 152 | with sync variables. Below is an example to demonstrate the use of sync variables. Here we launch a new task 153 | that is busy for a short time executing the loop. While this loop is running, the main task continues printing 154 | the message "this is main task after launching new task... I will wait until it is done". As it takes time to 155 | spawn a new thread, it is very likely that you will see this message before the output from the loop. Next, 156 | the main task will attempt to read `x` and assign it to `a` which it can only do when `x` is full. We write 157 | into `x` after the loop, so you will see the final message "and now it is done" only after the message "New 158 | task finished". In other words, reading `x`, we pause the execution of the main thread. 159 | 160 | ```chpl 161 | var x: sync int, a: int; 162 | writeln("this is main task launching a new task"); 163 | begin { 164 | for i in 1..10 do writeln("this is new task working: ",i); 165 | x.writeEF(2); // assign 2 to x 166 | writeln("New task finished"); 167 | } 168 | 169 | writeln("this is main task after launching new task... I will wait until it is done"); 170 | a = x.readFE(); // don't run this line until the variable x is written in the other task 171 | writeln("and now it is done"); 172 | ``` 173 | 174 | ```bash 175 | chpl sync_example_2.chpl 176 | ./sync_example_2 177 | ``` 178 | 179 | ```output 180 | this is main task launching a new task 181 | this is main task after launching new task... I will wait until it is done 182 | this is new task working: 1 183 | this is new task working: 2 184 | this is new task working: 3 185 | this is new task working: 4 186 | this is new task working: 5 187 | this is new task working: 6 188 | this is new task working: 7 189 | this is new task working: 8 190 | this is new task working: 9 191 | this is new task working: 10 192 | New task finished 193 | and now it is done 194 | ``` 195 | 196 | ::::::::::::::::::::::::::::::::::::::: discussion 197 | 198 | ## Discussion 199 | 200 | What would happen if we try to read `x` inside the new task as well, i.e. 
we have the following `begin` 201 | statement, without changing the rest of the code: 202 | 203 | ```chpl 204 | begin { 205 | for i in 1..10 do writeln("this is new task working: ",i); 206 | x.writeEF(2); 207 | writeln("New task finished"); 208 | x.readFE(); 209 | } 210 | ``` 211 | :::::::::::::::::::::::: solution 212 | 213 | The code will block (run forever), and you would need to press *Ctrl-C* to halt its execution. In this example 214 | we try to read `x` in two places: the main task and the new task. When we read a sync variable with `readFE`, 215 | the state of the sync variable is set to empty when this method completes. In other words, one of the two 216 | `readFE` calls will succeed (which one -- depends on the runtime) and will mark the variable as empty. The 217 | other `readFE` will then attempt to read it but it will block waiting for `x` to become full again (which will 218 | never happen). In the end, the execution of either the main thread or the child thread will block, hanging the 219 | entire code. 220 | 221 | ::::::::::::::::::::::::::::::::: 222 | ::::::::::::::::::::::::::::::::::::::::::::::::::: 223 | 224 | There are a number of methods defined for _sync_ variables. If `x` is a sync variable of a given type, you can 225 | use the following functions: 226 | 227 | ```chpl 228 | // non-blocking methods 229 | x.reset() //will set the state as empty and the value as the default of x's type 230 | x.isFull //will return true is the state of x is full, false if it is empty 231 | 232 | //blocking read and write methods 233 | x.writeEF(value) //will block until the state of x is empty, 234 | //then will assign the value, and set the state to full 235 | x.writeFF(value) //will block until the state of x is full, 236 | //then will assign the value, and leave the state as full 237 | x.readFE() //will block until the state of x is full, 238 | //then will return x's value, and set the state to empty 239 | x.readFF() //will block until the state of x is full, 240 | //then will return x's value, and leave the state as full 241 | 242 | //non-blocking read and write methods 243 | x.writeXF(value) //will assign the value no matter the state of x, and then set the state as full 244 | x.readXX() //will return the value of x regardless its state. The state will remain unchanged 245 | ``` 246 | 247 | Chapel also implements **_atomic_** operations with variables declared as `atomic`, and this provides another 248 | option to synchronise tasks. Atomic operations run completely independently of any other thread or 249 | process. This means that when several tasks try to write an atomic variable, only one will succeed at a given 250 | moment, providing implicit synchronisation between them. There is a number of methods defined for atomic 251 | variables, among them `sub()`, `add()`, `write()`, `read()`, and `waitfor()` are very useful to establish 252 | explicit synchronisation between tasks, as showed in the next code: 253 | 254 | ```chpl 255 | var lock: atomic int; 256 | const numtasks=5; 257 | 258 | lock.write(0); //the main task set lock to zero 259 | 260 | coforall id in 1..numtasks 261 | { 262 | writeln("greetings from task ",id,"... 
I am waiting for all tasks to say hello"); 263 | lock.add(1); //task id says hello and atomically adds 1 to lock 264 | lock.waitFor(numtasks); //then it waits for lock to be equal numtasks (which will happen when all tasks say hello) 265 | writeln("task ",id," is done..."); 266 | } 267 | ``` 268 | 269 | ```bash 270 | chpl atomic_example.chpl 271 | ./atomic_example 272 | ``` 273 | 274 | ```output 275 | greetings from task 4... I am waiting for all tasks to say hello 276 | greetings from task 5... I am waiting for all tasks to say hello 277 | greetings from task 2... I am waiting for all tasks to say hello 278 | greetings from task 3... I am waiting for all tasks to say hello 279 | greetings from task 1... I am waiting for all tasks to say hello 280 | task 1 is done... 281 | task 5 is done... 282 | task 2 is done... 283 | task 3 is done... 284 | task 4 is done... 285 | ``` 286 | 287 | > ## Try this... 288 | > 289 | > Comment out the line `lock.waitfor(numtasks)` in the code above to clearly observe the effect of the task 290 | > synchronisation. 291 | 292 | Finally, with all the material studied so far, we should be ready to parallelize our code for the simulation 293 | of the heat transfer equation. 294 | 295 | ::::::::::::::::::::::::::::::::::::: keypoints 296 | - "You can explicitly synchronise tasks with `sync` statement." 297 | - "You can also use sync and atomic variables to synchronise tasks." 298 | :::::::::::::::::::::::::::::::::::::::::::::::: 299 | -------------------------------------------------------------------------------- /episodes/14-parallel-case-study.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Task parallelism with Chapel" 3 | teaching: 60 4 | exercises: 30 5 | --- 6 | 7 | :::::::::::::::::::::::::::::::::::::: questions 8 | - "How do I write parallel code for a real use case?" 9 | :::::::::::::::::::::::::::::::::::::::::::::::: 10 | 11 | ::::::::::::::::::::::::::::::::::::: objectives 12 | - "First objective." 13 | :::::::::::::::::::::::::::::::::::::::::::::::: 14 | 15 | Here is our plan to task-parallelize the heat transfer equation: 16 | 17 | 1. divide the entire grid of points into blocks and assign blocks to individual tasks, 18 | 1. each task should compute the new temperature of its assigned points, 19 | 1. perform a **_reduction_** over the whole grid, to update the greatest temperature difference between 20 | `temp_new` and `temp`. 21 | 22 | For the reduction of the grid we can simply use the `max reduce` statement, which is already 23 | parallelized. Now, let's divide the grid into `rowtasks` x `coltasks` sub-grids, and assign each sub-grid to a 24 | task using the `coforall` loop (we will have `rowtasks*coltasks` tasks in total). 
25 | 26 | ```chpl 27 | config const rowtasks = 2; 28 | config const coltasks = 2; 29 | 30 | // this is the main loop of the simulation 31 | delta = tolerance; 32 | while (c=tolerance) { 33 | c += 1; 34 | 35 | coforall taskid in 0..coltasks*rowtasks-1 { 36 | for i in rowi..rowf { 37 | for j in coli..colf { 38 | temp_new[i,j] = (temp[i-1,j] + temp[i+1,j] + temp[i,j-1] + temp[i,j+1]) / 4; 39 | } 40 | } 41 | } 42 | 43 | delta = max reduce (temp_new-temp); 44 | temp = temp_new; 45 | 46 | if c%outputFrequency == 0 then writeln('Temperature at iteration ',c,': ',temp[x,y]); 47 | } 48 | ``` 49 | 50 | Note that now the nested `for` loops run from `rowi` to `rowf` and from `coli` to `colf` which are, 51 | respectively, the initial and final row and column of the sub-grid associated to the task `taskid`. To compute 52 | these limits, based on `taskid`, we need to compute the number of rows and columns per task (`nr` and `nc`, 53 | respectively) and account for possible non-zero remainders (`rr` and `rc`) that we should add to the last row 54 | and column: 55 | 56 | ```chpl 57 | config const rowtasks = 2; 58 | config const coltasks = 2; 59 | 60 | const nr = rows/rowtasks; 61 | const rr = rows-nr*rowtasks; 62 | const nc = cols/coltasks; 63 | const rc = cols-nc*coltasks; 64 | 65 | // this is the main loop of the simulation 66 | delta = tolerance; 67 | while (c=tolerance) { 68 | c+=1; 69 | 70 | coforall taskid in 0..coltasks*rowtasks-1 { 71 | var rowi, coli, rowf, colf: int; 72 | var taskr, taskc: int; 73 | 74 | taskr = taskid/coltasks; 75 | taskc = taskid%coltasks; 76 | 77 | if taskr=tolerance) { 182 | c = c+1; 183 | 184 | for i in rowi..rowf { 185 | for j in coli..colf { 186 | temp_new[i,j] = (temp[i-1,j] + temp[i+1,j] + temp[i,j-1] + temp[i,j+1]) / 4; 187 | } 188 | } 189 | 190 | //update delta 191 | //update temp 192 | //print temperature in desired position 193 | } 194 | } 195 | ``` 196 | 197 | The problem with this approach is that now we have to explicitly synchronise the tasks. Before, `delta` and 198 | `temp` were updated only by the main task at each iteration; similarly, only the main task was printing 199 | results. Now, all these operations must be carried inside the coforall loop, which imposes the need of 200 | synchronisation between tasks. 201 | 202 | The synchronisation must happen at two points: 203 | 204 | 1. We need to be sure that all tasks have finished with the computations of their part of the grid `temp`, 205 | before updating `delta` and `temp` safely. 206 | 2. We need to be sure that all tasks use the updated value of `delta` to evaluate the condition of the while 207 | loop for the next iteration. 208 | 209 | To update `delta` we could have each task computing the greatest difference in temperature in its associated 210 | sub-grid, and then, after the synchronisation, have only one task reducing all the sub-grids' maximums. 211 | 212 | ```chpl 213 | var delta: atomic real; 214 | var myd: [0..coltasks*rowtasks-1] real; 215 | ... 216 | //this is the main loop of the simulation 217 | delta.write(tolerance); 218 | coforall taskid in 0..coltasks*rowtasks-1 219 | { 220 | var myd2: real; 221 | ... 222 | 223 | while (c= tolerance) { 224 | c = c+1; 225 | ... 
226 | 227 | for i in rowi..rowf { 228 | for j in coli..colf { 229 | temp_new[i,j] = (temp[i-1,j] + temp[i+1,j] + temp[i,j-1] + temp[i,j+1]) / 4; 230 | myd2 = max(abs(temp_new[i,j]-temp[i,j]),myd2); 231 | } 232 | } 233 | myd[taskid] = myd2; 234 | 235 | // here comes the synchronisation of tasks 236 | 237 | temp[rowi..rowf,coli..colf] = temp_new[rowi..rowf,coli..colf]; 238 | if taskid==0 { 239 | delta.write(max reduce myd); 240 | if c%outputFrequency==0 then writeln('Temperature at iteration ',c,': ',temp[x,y]); 241 | } 242 | 243 | // here comes the synchronisation of tasks again 244 | } 245 | } 246 | ``` 247 | 248 | ::::::::::::::::::::::::::::::::::::: challenge 249 | 250 | ## Challenge 4: Can you do it? 251 | 252 | Use `sync` or `atomic` variables to implement the synchronisation required in the code above. 253 | 254 | :::::::::::::::::::::::: solution 255 | 256 | One possible solution is to use an atomic variable as a _lock_ that opens (using the `waitFor` method) when 257 | all the tasks complete the required instructions: 258 | 259 | ```chpl 260 | var lock: atomic int; 261 | lock.write(0); 262 | ... 263 | //this is the main loop of the simulation 264 | delta.write(tolerance); 265 | coforall taskid in 0..coltasks*rowtasks-1 266 | { 267 | ... 268 | while (c < niter && delta.read() >= tolerance) 269 | { 270 | ... 271 | myd[taskid] = myd2; 272 | 273 | //here comes the synchronisation of tasks 274 | lock.add(1); 275 | lock.waitFor(coltasks*rowtasks); 276 | 277 | temp[rowi..rowf,coli..colf] = temp_new[rowi..rowf,coli..colf]; 278 | ... 279 | 280 | //here comes the synchronisation of tasks again 281 | lock.sub(1); 282 | lock.waitFor(0); 283 | } 284 | } 285 | ``` 286 | 287 | ::::::::::::::::::::::::::::::::: 288 | :::::::::::::::::::::::::::::::::::::::::::::::: 289 | 290 | Using the solution from Exercise 4, we can now compare the performance with the benchmark solution: 291 | 292 | ```bash 293 | chpl --fast parallel2.chpl 294 | ./parallel2 --rows=650 --cols=650 --x=200 --y=300 --niter=10000 --tolerance=0.002 --outputFrequency=1000 295 | ``` 296 | 297 | ```output 298 | The simulation will consider a matrix of 650 by 650 elements, 299 | it will run up to 10000 iterations, or until the largest difference 300 | in temperature between iterations is less than 0.002. 301 | You are interested in the evolution of the temperature at the position (200,300) of the matrix... 302 | 303 | and here we go... 304 | Temperature at iteration 0: 25.0 305 | Temperature at iteration 1000: 25.0 306 | Temperature at iteration 2000: 25.0 307 | Temperature at iteration 3000: 25.0 308 | Temperature at iteration 4000: 24.9998 309 | Temperature at iteration 5000: 24.9984 310 | Temperature at iteration 6000: 24.9935 311 | Temperature at iteration 7000: 24.9819 312 | 313 | The simulation took 4.2733 seconds 314 | Final temperature at the desired position after 7750 iterations is: 24.9671 315 | The greatest difference in temperatures between the last two iterations was: 0.00199985 316 | ``` 317 | 318 | to see that we now have a code that performs 5x faster.
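If your copy of the code does not already print its runtime, a minimal sketch of how the main loop can be timed with Chapel's `Time` module is shown below (this assumes a recent Chapel release where the module provides the `stopwatch` type; the serial benchmark from the earlier timing episode can be instrumented the same way):

```chpl
use Time;
var watch: stopwatch;

watch.start();
// ... the main simulation loop goes here ...
watch.stop();

writeln('The simulation took ', watch.elapsed(), ' seconds');
```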
319 | 320 | We finish this section by providing another, elegant version of the 2D heat transfer solver (without time 321 | stepping) using data parallelism on a single locale: 322 | 323 | ```chpl 324 | use Math; /* for exp() */ 325 | 326 | const n = 100, stride = 20; 327 | var temp: [0..n+1, 0..n+1] real; 328 | var temp_new: [1..n,1..n] real; 329 | var x, y: real; 330 | for (i,j) in {1..n,1..n} { // serial iteration 331 | x = ((i:real)-0.5)/n; 332 | y = ((j:real)-0.5)/n; 333 | temp[i,j] = exp(-((x-0.5)**2 + (y-0.5)**2)/0.01); // narrow Gaussian peak 334 | } 335 | coforall (i,j) in {1..n,1..n} by (stride,stride) { // 5x5 decomposition into 20x20 blocks => 25 tasks 336 | for k in i..i+stride-1 { // serial loop inside each block 337 | for l in j..j+stride-1 { 338 | temp_new[k,l] = (temp[k-1,l] + temp[k+1,l] + temp[k,l-1] + temp[k,l+1]) / 4; 339 | } 340 | } 341 | } 342 | ``` 343 | 344 | We will study data parallelism in more detail in the next section. 345 | 346 | ::::::::::::::::::::::::::::::::::::: keypoints 347 | - "To parallelize the diffusion solver with tasks, you divide the 2D domain into blocks and assign each block 348 | to a task." 349 | - "To get the maximum performance, you need to launch the parallel tasks only once, and run the temporal loop 350 | of the simulation with the same set of tasks, resuming the main task only to print the final results." 351 | - "Parallelizing with tasks is more laborious than parallelizing with data (covered in the next section)." 352 | :::::::::::::::::::::::::::::::::::::::::::::::: 353 | -------------------------------------------------------------------------------- /episodes/21-locales.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Running code on multiple machines" 3 | teaching: 120 4 | exercises: 60 5 | --- 6 | 7 | :::::::::::::::::::::::::::::::::::::: questions 8 | - "What is a locale?" 9 | :::::::::::::::::::::::::::::::::::::::::::::::: 10 | 11 | ::::::::::::::::::::::::::::::::::::: objectives 12 | - "First objective." 13 | :::::::::::::::::::::::::::::::::::::::::::::::: 14 | 15 | So far we have been working with single-locale Chapel codes that may run on one or many cores on a single 16 | compute node, making use of the shared memory space and accelerating computations by launching concurrent 17 | tasks on individual cores in parallel. Chapel codes can also run on multiple nodes on a compute cluster. In 18 | Chapel this is referred to as *multi-locale* execution. 19 | 20 | If you work inside a Chapel Docker container, e.g., chapel/chapel-gasnet, the container environment simulates 21 | a multi-locale cluster, so you would compile and launch multi-locale Chapel codes directly by specifying the 22 | number of locales with `-nl` flag: 23 | 24 | ```bash 25 | chpl --fast mycode.chpl -o mybinary 26 | ./mybinary -nl 4 27 | ``` 28 | 29 | Inside the Docker container on multiple locales your code will not run any faster than on a single locale, 30 | since you are emulating a virtual cluster, and all tasks run on the same physical node. To achieve actual 31 | speedup, you need to run your parallel multi-locale Chapel code on a real physical cluster which we hope you 32 | have access to for this session. 33 | 34 | On a real HPC cluster you would need to submit either an interactive or a batch job asking for several nodes 35 | and then run a multi-locale Chapel code inside that job. In practice, the exact commands depend on how the 36 | multi-locale Chapel was built on the cluster. 
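For reference, on a Slurm cluster a batch submission might look like the minimal sketch below; the module name, the resource requests, and the launch line are assumptions that vary between clusters, and we discuss the compile and launch details next:

```bash
#!/bin/bash
#SBATCH --time=0:30:0
#SBATCH --nodes=4
#SBATCH --cpus-per-task=3
#SBATCH --mem-per-cpu=1000
module load gcc chapel-ucx          # or chapel-ofi, depending on the cluster's interconnect
chpl --fast mycode.chpl -o mybinary
srun ./mybinary_real -nl 4          # run on four locales
```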
37 | 38 | When you compile a Chapel code with the multi-locale Chapel compiler, two binaries will be produced. One is 39 | called `mybinary` and is a launcher binary used to submit the real executable `mybinary_real`. If the Chapel 40 | environment is configured properly with the launcher for the cluster's physical interconnect (which might not 41 | be always possible due to a number of factors), then you would simply compile the code and use the launcher 42 | binary `mybinary` to submit the job to the queue: 43 | 44 | ```bash 45 | chpl --fast mycode.chpl -o mybinary 46 | ./mybinary -nl 2 47 | ``` 48 | 49 | The exact parameters of the job such as the maximum runtime and the requested memory can be specified with 50 | Chapel environment variables. One possible drawback of this launching method is that, depending on your 51 | cluster setup, Chapel might have access to all physical cores on each node participating in the run -- this 52 | will present problems if you are scheduling jobs by-core and not by-node, since part of a node should be 53 | allocated to someone else's job. 54 | 55 | Note that on Compute Canada clusters this launching method works without problem. On these clusters 56 | multi-locale Chapel is provided by `chapel-ofi` (for the OmniPath interconnect on Cedar) and `chapel-ucx` (for 57 | the InfiniBand interconnect on Graham, Béluga, Narval) modules, so -- depending on the cluster -- you will 58 | load Chapel using one of the two lines below: 59 | 60 | ```bash 61 | module load gcc chapel-ofi # for the OmniPath interconnect on Cedar cluster 62 | module load gcc chapel-ucx # for the InfiniBand interconnect on Graham, Béluga, Narval clusters 63 | ``` 64 | 65 | 66 | 67 | We can also launch multi-locale Chapel codes using the real executable `mybinary_real`. For example, for an 68 | interactive job you would type: 69 | 70 | ```bash 71 | salloc --time=0:30:0 --nodes=4 --cpus-per-task=3 --mem-per-cpu=1000 --account=def-guest 72 | chpl --fast mycode.chpl -o mybinary 73 | srun ./mybinary_real -nl 4 # will run on four locales with max 3 cores per locale 74 | ``` 75 | 76 | Production jobs would be launched with `sbatch` command and a Slurm launch script as usual. 77 | 78 | For the rest of this class we assume that you have a working multi-locale Chapel environment, whether provided 79 | by a Docker container or by multi-locale Chapel on a physical HPC cluster. We will run all examples on four 80 | nodes with three cores per node. 81 | 82 | # Intro to multi-locale code 83 | 84 | Let us test our multi-locale Chapel environment by launching the following code: 85 | 86 | ```chpl 87 | writeln(Locales); 88 | ``` 89 | 90 | This code will print the built-in global array `Locales`. Running it on four 91 | locales will produce 92 | 93 | ```output 94 | LOCALE0 LOCALE1 LOCALE2 LOCALE3 95 | ``` 96 | 97 | We want to run some code on each locale (node). For that, we can cycle through locales: 98 | 99 | ```chpl 100 | for loc in Locales do // this is still a serial program 101 | on loc do // run the next line on locale `loc` 102 | writeln("this locale is named ", here.name); 103 | ``` 104 | 105 | This will produce 106 | 107 | ```output 108 | this locale is named cdr544 109 | this locale is named cdr552 110 | this locale is named cdr556 111 | this locale is named cdr692 112 | ``` 113 | 114 | Here the built-in variable class `here` refers to the locale on which the code is running, and `here.name` is 115 | its hostname. 
We started a serial `for` loop cycling through all locales, and on each locale we printed its 116 | name, i.e., the hostname of each node. This program ran in serial starting a task on each locale only after 117 | completing the same task on the previous locale. Note the order in which locales were listed. 118 | 119 | To run this code in parallel, starting four simultaneous tasks, one per locale, we simply need to replace 120 | `for` with `forall`: 121 | 122 | ```chpl 123 | forall loc in Locales do // now this is a parallel loop 124 | on loc do 125 | writeln("this locale is named ", here.name); 126 | ``` 127 | 128 | This starts four tasks in parallel, and the order in which the print statement is executed depends on the 129 | runtime conditions and can change from run to run: 130 | 131 | ```output 132 | this locale is named cdr544 133 | this locale is named cdr692 134 | this locale is named cdr556 135 | this locale is named cdr552 136 | ``` 137 | 138 | We can print few other attributes of each locale. Here it is actually useful to revert to the serial loop 139 | `for` so that the print statements appear in order: 140 | 141 | ```chpl 142 | use MemDiagnostics; 143 | for loc in Locales do 144 | on loc { 145 | writeln("locale #", here.id, "..."); 146 | writeln(" ...is named: ", here.name); 147 | writeln(" ...has ", here.numPUs(), " processor cores"); 148 | writeln(" ...has ", here.physicalMemory(unit=MemUnits.GB, retType=real), " GB of memory"); 149 | writeln(" ...has ", here.maxTaskPar, " maximum parallelism"); 150 | } 151 | ``` 152 | 153 | ```output 154 | locale #0... 155 | ...is named: cdr544 156 | ...has 3 processor cores 157 | ...has 125.804 GB of memory 158 | ...has 3 maximum parallelism 159 | locale #1... 160 | ...is named: cdr552 161 | ...has 3 processor cores 162 | ...has 125.804 GB of memory 163 | ...has 3 maximum parallelism 164 | locale #2... 165 | ...is named: cdr556 166 | ...has 3 processor cores 167 | ...has 125.804 GB of memory 168 | ...has 3 maximum parallelism 169 | locale #3... 170 | ...is named: cdr692 171 | ...has 3 processor cores 172 | ...has 125.804 GB of memory 173 | ...has 3 maximum parallelism 174 | ``` 175 | 176 | Note that while Chapel correctly determines the number of cores available inside our job on each node, and the 177 | maximum parallelism (which is the same as the number of cores available!), it lists the total physical memory 178 | on each node available to all running jobs which is not the same as the total memory per node allocated to our 179 | job. 180 | 181 | ::::::::::::::::::::::::::::::::::::: keypoints 182 | - "Locale in Chapel is a shared-memory node on a cluster." 183 | - "We can cycle in serial or parallel through all locales." 184 | :::::::::::::::::::::::::::::::::::::::::::::::: 185 | -------------------------------------------------------------------------------- /episodes/22-domains.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Domains and data parallelism" 3 | teaching: 120 4 | exercises: 60 5 | --- 6 | 7 | :::::::::::::::::::::::::::::::::::::: questions 8 | - "How do I store and manipulate data across multiple locales?" 9 | :::::::::::::::::::::::::::::::::::::::::::::::: 10 | 11 | ::::::::::::::::::::::::::::::::::::: objectives 12 | - "First objective." 13 | :::::::::::::::::::::::::::::::::::::::::::::::: 14 | 15 | # Domains and single-locale data parallelism 16 | 17 | We start this section by recalling the definition of a range in Chapel. 
A range is a 1D set of integer indices 18 | that can be bounded or infinite: 19 | 20 | ```chpl 21 | var oneToTen: range = 1..10; // 1, 2, 3, ..., 10 22 | var a = 1234, b = 5678; 23 | var aToB: range = a..b; // using variables 24 | var twoToTenByTwo: range(strides=strideKind.positive) = 2..10 by 2; // 2, 4, 6, 8, 10 25 | var oneToInf = 1.. ; // unbounded range 26 | ``` 27 | 28 | On the other hand, domains are multi-dimensional (including 1D) sets of integer indices that are always 29 | bounded. To stress the difference between domain ranges and domains, domain definitions always enclose their 30 | indices in curly brackets. Ranges can be used to define a specific dimension of a domain: 31 | 32 | ```chpl 33 | var domain1to10: domain(1) = {1..10}; // 1D domain from 1 to 10 defined using the range 1..10 34 | var twoDimensions: domain(2) = {-2..2,0..2}; // 2D domain over a product of two ranges 35 | var thirdDim: range = 1..16; // a range 36 | var threeDims: domain(3) = {thirdDim, 1..10, 5..10}; // 3D domain over a product of three ranges 37 | for idx in twoDimensions do // cycle through all points in a 2D domain 38 | write(idx, ", "); 39 | writeln(); 40 | for (x,y) in twoDimensions { // can also cycle using explicit tuples (x,y) 41 | write("(", x, ", ", y, ")", ", "); 42 | } 43 | ``` 44 | 45 | Let us define an n^2 domain called `mesh`. It is defined by the single task in our code and is therefore 46 | defined in memory on the same node (locale 0) where this task is running. For each of n^2 mesh points, let us 47 | print out 48 | 49 | 1. `m.locale.id`, the ID of the locale holding that mesh point (should be 0) 50 | 2. `here.id`, the ID of the locale on which the code is running (should be 0) 51 | 3. `here.maxTaskPar`, the number of cores (max parallelism with 1 task/core) (should be 3) 52 | 53 | **Note**: We already saw some of these variables/functions: numLocales, Locales, here.id, here.name, 54 | here.numPUs(), here.physicalMemory(), here.maxTaskPar. 55 | 56 | ```chpl 57 | config const n = 8; 58 | const mesh: domain(2) = {1..n, 1..n}; // a 2D domain defined in shared memory on a single locale 59 | forall m in mesh { // go in parallel through all n^2 mesh points 60 | writeln((m, m.locale.id, here.id, here.maxTaskPar)); 61 | } 62 | ``` 63 | 64 | ```output 65 | ((7, 1), 0, 0, 3) 66 | ((1, 1), 0, 0, 3) 67 | ((7, 2), 0, 0, 3) 68 | ((1, 2), 0, 0, 3) 69 | ... 70 | ((6, 6), 0, 0, 3) 71 | ((6, 7), 0, 0, 3) 72 | ((6, 8), 0, 0, 3) 73 | ``` 74 | 75 | Now we are going to learn two very important properties of Chapel domains. First, domains can be used to 76 | define arrays of variables of any type on top of them. For example, let us define an n^2 array of real numbers 77 | on top of `mesh`: 78 | 79 | ```chpl 80 | config const n = 8; 81 | const mesh: domain(2) = {1..n, 1..n}; // a 2D domain defined in shared memory on a single locale 82 | var T: [mesh] real; // a 2D array of reals defined in shared memory on a single locale (mapped onto this domain) 83 | forall t in T { // go in parallel through all n^2 elements of T 84 | writeln((t, t.locale.id)); 85 | } 86 | ``` 87 | 88 | ```output 89 | (0.0, 0) 90 | (0.0, 0) 91 | (0.0, 0) 92 | (0.0, 0) 93 | ... 94 | (0.0, 0) 95 | (0.0, 0) 96 | (0.0, 0) 97 | ``` 98 | 99 | By default, all n^2 array elements are set to zero, and all of them are defined on the same locale as the 100 | underlying mesh. 
We can also cycle through all indices of T by accessing its domain: 101 | 102 | ```chpl 103 | forall idx in T.domain { 104 | writeln(idx, ' ', T(idx)); // idx is a tuple (i,j); also print the corresponding array element 105 | } 106 | ``` 107 | 108 | ```output 109 | (7, 1) 0.0 110 | (1, 1) 0.0 111 | (7, 2) 0.0 112 | (1, 2) 0.0 113 | ... 114 | (6, 6) 0.0 115 | (6, 7) 0.0 116 | (6, 8) 0.0 117 | ``` 118 | 119 | Since we use a parallel `forall` loop, the print statements appear in a random runtime order. 120 | 121 | We can also define multiple arrays on the same domain: 122 | 123 | ```chpl 124 | const grid = {1..100}; // 1D domain 125 | const alpha = 5; // some number 126 | var A, B, C: [grid] real; // local real-type arrays on this 1D domain 127 | B = 2; C = 3; 128 | forall (a,b,c) in zip(A,B,C) do // parallel loop 129 | a = b + alpha*c; // simple example of data parallelism on a single locale 130 | writeln(A); 131 | ``` 132 | 133 | The second important property of Chapel domains is that they can span multiple locales (nodes). 134 | 135 | ## Distributed domains 136 | 137 | Domains are fundamental Chapel concept for distributed-memory data parallelism. 138 | 139 | Let us now define an n^2 distributed (over several locales) domain `distributedMesh` mapped to locales in 140 | blocks. On top of this domain we define a 2D block-distributed array A of strings mapped to locales in exactly 141 | the same pattern as the underlying domain. Let us print out 142 | 143 | 1. `a.locale.id`, the ID of the locale holding the element a of A 144 | 2. `here.name`, the name of the locale on which the code is running 145 | 3. `here.maxTaskPar`, the number of cores on the locale on which the code is 146 | running 147 | 148 | Instead of printing these values to the screen, we will store this output inside each element of A as a string 149 | `a.locale.id:string + '-' + here.name + '-' + here.maxTaskPar:string`, adding a separator `' '` at the end of 150 | each element. 151 | 152 | ```chpl 153 | use BlockDist; // use standard block distribution module to partition the domain into blocks 154 | config const n = 8; 155 | const mesh: domain(2) = {1..n, 1..n}; 156 | const distributedMesh: domain(2) dmapped new blockDist(boundingBox=mesh) = mesh; 157 | var A: [distributedMesh] string; // block-distributed array mapped to locales 158 | forall a in A { // go in parallel through all n^2 elements in A 159 | // assign each array element on the locale that stores that index/element 160 | a = a.locale.id:string + '-' + here.name + '-' + here.maxTaskPar:string + ' '; 161 | } 162 | writeln(A); 163 | ``` 164 | 165 | The syntax `boundingBox=mesh` tells the compiler that the outer edge of our decomposition coincides exactly 166 | with the outer edge of our domain. Alternatively, the outer decomposition layer could include an additional 167 | perimeter of *ghost points* if we specify 168 | 169 | ```chpl 170 | const mesh: domain(2) = {1..n, 1..n}; 171 | const largerMesh: domain(2) dmapped new blockDist(boundingBox=mesh) = {0..n+1,0..n+1}; 172 | ``` 173 | 174 | but let us not worry about this for now. 
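If you are curious where such *ghost points* would live, here is a minimal sketch you could run (with the Block distribution, indices outside the bounding box should be mapped to the locale that owns the nearest index inside the box; the exact layout depends on how many locales you have):

```chpl
use BlockDist;
config const n = 8;
const mesh: domain(2) = {1..n, 1..n};
const largerMesh: domain(2) dmapped new blockDist(boundingBox=mesh) = {0..n+1, 0..n+1};
var B: [largerMesh] int;
// query the locale that stores a given element of the distributed array
writeln("element (0,0) is stored on locale ", B[0,0].locale.id);
writeln("element (", n+1, ",", n+1, ") is stored on locale ", B[n+1,n+1].locale.id);
```

With that aside, let us return to the block-distributed array `A` defined above.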
175 | 176 | Running our code on four locales with three cores per locale produces the following output: 177 | 178 | ```output 179 | 0-cdr544-3 0-cdr544-3 0-cdr544-3 0-cdr544-3 1-cdr552-3 1-cdr552-3 1-cdr552-3 1-cdr552-3 180 | 0-cdr544-3 0-cdr544-3 0-cdr544-3 0-cdr544-3 1-cdr552-3 1-cdr552-3 1-cdr552-3 1-cdr552-3 181 | 0-cdr544-3 0-cdr544-3 0-cdr544-3 0-cdr544-3 1-cdr552-3 1-cdr552-3 1-cdr552-3 1-cdr552-3 182 | 0-cdr544-3 0-cdr544-3 0-cdr544-3 0-cdr544-3 1-cdr552-3 1-cdr552-3 1-cdr552-3 1-cdr552-3 183 | 2-cdr556-3 2-cdr556-3 2-cdr556-3 2-cdr556-3 3-cdr692-3 3-cdr692-3 3-cdr692-3 3-cdr692-3 184 | 2-cdr556-3 2-cdr556-3 2-cdr556-3 2-cdr556-3 3-cdr692-3 3-cdr692-3 3-cdr692-3 3-cdr692-3 185 | 2-cdr556-3 2-cdr556-3 2-cdr556-3 2-cdr556-3 3-cdr692-3 3-cdr692-3 3-cdr692-3 3-cdr692-3 186 | 2-cdr556-3 2-cdr556-3 2-cdr556-3 2-cdr556-3 3-cdr692-3 3-cdr692-3 3-cdr692-3 3-cdr692-3 187 | ``` 188 | 189 | As we see, the domain `distributedMesh` (along with the string array `A` on top of it) was decomposed into 2x2 190 | blocks stored on the four nodes, respectively. Equally important, for each element `a` of the array, the line 191 | of code filling in that element ran on the same locale where that element was stored. In other words, this 192 | code ran in parallel (`forall` loop) on four nodes, using up to three cores on each node to fill in the 193 | corresponding array elements. Once the parallel loop is finished, the `writeln` command runs on locale 0 194 | gathering remote elements from other locales and printing them to standard output. 195 | 196 | Now we can print the range of indices for each sub-domain by adding the following to our code: 197 | 198 | ```chpl 199 | for loc in Locales { 200 | on loc { 201 | writeln(A.localSubdomain()); 202 | } 203 | } 204 | ``` 205 | 206 | On 4 locales we should get: 207 | 208 | ```output 209 | {1..4, 1..4} 210 | {1..4, 5..8} 211 | {5..8, 1..4} 212 | {5..8, 5..8} 213 | ``` 214 | 215 | Let us count the number of threads by adding the following to our code: 216 | 217 | ```chpl 218 | var counter = 0; 219 | forall a in A with (+ reduce counter) { // go in parallel through all n^2 elements 220 | counter = 1; 221 | } 222 | writeln("actual number of threads = ", counter); 223 | ``` 224 | 225 | If `n=8` in our code is sufficiently large, there are enough array elements per node (8*8/4 = 16 in our case) 226 | to fully utilise all three available cores on each node, so our output should be 227 | 228 | ```output 229 | actual number of threads = 12 230 | ``` 231 | 232 | Try reducing the array size `n` to see if that changes the output (fewer tasks per locale), e.g., setting 233 | n=3. Also try increasing the array size to n=20 and study the output. Does the output make sense? 234 | 235 | So far we looked at the block distribution `BlockDist`. It will distribute a 2D domain among nodes either 236 | using 1D or 2D decomposition (in our example it was 2D decomposition 2x2), depending on the domain size and 237 | the number of nodes. 238 | 239 | Let us take a look at another standard module for domain partitioning onto locales, called CyclicDist. For 240 | each element of the array we will print out again 241 | 242 | 1. `a.locale.id`, the ID of the locale holding the element a of A 243 | 2. `here.name`, the name of the locale on which the code is running 244 | 3. 
`here.maxTaskPar`, the number of cores on the locale on which the code is running 245 | 246 | ```chpl 247 | use CyclicDist; // elements are sent to locales in a round-robin pattern 248 | config const n = 8; 249 | const mesh: domain(2) = {1..n, 1..n}; // a 2D domain defined in shared memory on a single locale 250 | const m2: domain(2) dmapped new cyclicDist(startIdx=mesh.low) = mesh; // mesh.low is the first index (1,1) 251 | var A2: [m2] string; 252 | forall a in A2 { 253 | a = a.locale.id:string + '-' + here.name + '-' + here.maxTaskPar:string + ' '; 254 | } 255 | writeln(A2); 256 | ``` 257 | 258 | ```output 259 | 0-cdr544-3 1-cdr552-3 0-cdr544-3 1-cdr552-3 0-cdr544-3 1-cdr552-3 0-cdr544-3 1-cdr552-3 260 | 2-cdr556-3 3-cdr692-3 2-cdr556-3 3-cdr692-3 2-cdr556-3 3-cdr692-3 2-cdr556-3 3-cdr692-3 261 | 0-cdr544-3 1-cdr552-3 0-cdr544-3 1-cdr552-3 0-cdr544-3 1-cdr552-3 0-cdr544-3 1-cdr552-3 262 | 2-cdr556-3 3-cdr692-3 2-cdr556-3 3-cdr692-3 2-cdr556-3 3-cdr692-3 2-cdr556-3 3-cdr692-3 263 | 0-cdr544-3 1-cdr552-3 0-cdr544-3 1-cdr552-3 0-cdr544-3 1-cdr552-3 0-cdr544-3 1-cdr552-3 264 | 2-cdr556-3 3-cdr692-3 2-cdr556-3 3-cdr692-3 2-cdr556-3 3-cdr692-3 2-cdr556-3 3-cdr692-3 265 | 0-cdr544-3 1-cdr552-3 0-cdr544-3 1-cdr552-3 0-cdr544-3 1-cdr552-3 0-cdr544-3 1-cdr552-3 266 | 2-cdr556-3 3-cdr692-3 2-cdr556-3 3-cdr692-3 2-cdr556-3 3-cdr692-3 2-cdr556-3 3-cdr692-3 267 | ``` 268 | 269 | As the name `CyclicDist` suggests, the domain was mapped to locales in a cyclic, round-robin pattern. We can 270 | also print the range of indices for each sub-domain by adding the following to our code: 271 | 272 | ```chpl 273 | for loc in Locales { 274 | on loc { 275 | writeln(A2.localSubdomain()); 276 | } 277 | } 278 | ``` 279 | 280 | ```output 281 | {1..7 by 2, 1..7 by 2} 282 | {1..7 by 2, 2..8 by 2} 283 | {2..8 by 2, 1..7 by 2} 284 | {2..8 by 2, 2..8 by 2} 285 | ``` 286 | 287 | In addition to BlockDist and CyclicDist, Chapel has several other predefined distributions: BlockCycDist, 288 | ReplicatedDist, DimensionalDist2D, ReplicatedDim, BlockCycDim — for details please see 289 | https://chapel-lang.org/docs/primers/distributions.html. 290 | 291 | ## Diffusion solver on distributed domains 292 | 293 | Now let us use distributed domains to write a parallel version of our original diffusion solver code: 294 | 295 | ```chpl 296 | use BlockDist; 297 | use Math; 298 | config const n = 8; 299 | const mesh: domain(2) = {1..n, 1..n}; // local 2D n^2 domain 300 | ``` 301 | 302 | We will add a larger (n+2)^2 block-distributed domain `largerMesh` with a layer of *ghost points* on 303 | *perimeter locales*, and define a temperature array `temp` on top of it, by adding the following to our code: 304 | 305 | ```chpl 306 | const largerMesh: domain(2) dmapped new blockDist(boundingBox=mesh) = {0..n+1, 0..n+1}; 307 | var temp: [largerMesh] real; // a block-distributed array of temperatures 308 | forall (i,j) in temp.domain[1..n,1..n] { 309 | var x = ((i:real)-0.5)/(n:real); // x, y are local to each task 310 | var y = ((j:real)-0.5)/(n:real); 311 | temp[i,j] = exp(-((x-0.5)**2 + (y-0.5)**2) / 0.01); // narrow Gaussian peak 312 | } 313 | writeln(temp); 314 | ``` 315 | 316 | Here we initialised an initial Gaussian temperature peak in the middle of the mesh. As we evolve our solution 317 | in time, this peak should diffuse slowly over the rest of the domain. 318 | 319 | > ## Question 320 | > 321 | > Why do we have `forall (i,j) in temp.domain[1..n,1..n]` 322 | > and not `forall (i,j) in mesh`? 
323 | >
324 | > > ## Answer
325 | > > The first one will run on multiple locales in parallel, whereas the
326 | > > second will run in parallel via multiple threads on locale 0 only, since
327 | > > `mesh` is defined on locale 0.
328 | 
329 | The code above will print the initial temperature distribution:
330 | 
331 | ```output
332 | 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
333 | 0.0 2.36954e-17 2.79367e-13 1.44716e-10 3.29371e-09 3.29371e-09 1.44716e-10 2.79367e-13 2.36954e-17 0.0
334 | 0.0 2.79367e-13 3.29371e-09 1.70619e-06 3.88326e-05 3.88326e-05 1.70619e-06 3.29371e-09 2.79367e-13 0.0
335 | 0.0 1.44716e-10 1.70619e-06 0.000883826 0.0201158 0.0201158 0.000883826 1.70619e-06 1.44716e-10 0.0
336 | 0.0 3.29371e-09 3.88326e-05 0.0201158 0.457833 0.457833 0.0201158 3.88326e-05 3.29371e-09 0.0
337 | 0.0 3.29371e-09 3.88326e-05 0.0201158 0.457833 0.457833 0.0201158 3.88326e-05 3.29371e-09 0.0
338 | 0.0 1.44716e-10 1.70619e-06 0.000883826 0.0201158 0.0201158 0.000883826 1.70619e-06 1.44716e-10 0.0
339 | 0.0 2.79367e-13 3.29371e-09 1.70619e-06 3.88326e-05 3.88326e-05 1.70619e-06 3.29371e-09 2.79367e-13 0.0
340 | 0.0 2.36954e-17 2.79367e-13 1.44716e-10 3.29371e-09 3.29371e-09 1.44716e-10 2.79367e-13 2.36954e-17 0.0
341 | 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
342 | ```
343 | 
344 | Let us define an array of strings `nodeID` with the same distribution over locales as `temp`, by adding the
345 | following to our code:
346 | 
347 | ```chpl
348 | var nodeID: [largerMesh] string;
349 | forall m in nodeID do
350 |   m = here.id:string;
351 | writeln(nodeID);
352 | ```
353 | 
354 | The outer perimeter in the partition below corresponds to the *ghost points*:
355 | 
356 | ```output
357 | 0 0 0 0 0 1 1 1 1 1
358 | 0 0 0 0 0 1 1 1 1 1
359 | 0 0 0 0 0 1 1 1 1 1
360 | 0 0 0 0 0 1 1 1 1 1
361 | 0 0 0 0 0 1 1 1 1 1
362 | 2 2 2 2 2 3 3 3 3 3
363 | 2 2 2 2 2 3 3 3 3 3
364 | 2 2 2 2 2 3 3 3 3 3
365 | 2 2 2 2 2 3 3 3 3 3
366 | 2 2 2 2 2 3 3 3 3 3
367 | ```
368 | 
369 | ::::::::::::::::::::::::::::::::::::: challenge
370 | 
371 | ## Challenge 3: Can you do it?
372 | 
373 | In addition to `here.id`, also print the ID of the locale holding that value. Is it the same as or different from `here.id`?
374 | 
375 | :::::::::::::::::::::::: solution
376 | 
377 | Something along these lines: `m = here.id:string + '-' + m.locale.id:string`
378 | 
379 | :::::::::::::::::::::::::::::::::
380 | ::::::::::::::::::::::::::::::::::::::::::::::::
381 | 
382 | Now we implement the parallel solver, by adding the following to our code (*contains a mistake on purpose!*):
383 | 
384 | ```chpl
385 | var temp_new: [largerMesh] real;
386 | for step in 1..5 { // time-stepping
387 |   forall (i,j) in mesh do
388 |     temp_new[i,j] = (temp[i-1,j] + temp[i+1,j] + temp[i,j-1] + temp[i,j+1]) / 4;
389 |   temp[mesh] = temp_new[mesh]; // uses parallel forall underneath
390 | }
391 | ```
392 | 
393 | ::::::::::::::::::::::::::::::::::::: challenge
394 | 
395 | ## Challenge 4: Can you do it?
396 | 
397 | Can you spot the mistake in this code?
398 | 
399 | :::::::::::::::::::::::: solution
400 | 
401 | It should be
402 | 
403 | `forall (i,j) in temp_new.domain[1..n,1..n] do`
404 | 
405 | instead of
406 | 
407 | `forall (i,j) in mesh do`
408 | 
409 | as the latter will likely run in parallel via threads only on locale 0, whereas the former will run on
410 | multiple locales in parallel.
411 | 
412 | :::::::::::::::::::::::::::::::::
413 | ::::::::::::::::::::::::::::::::::::::::::::::::
414 | 
415 | Here is the final version of the entire code:
416 | 
417 | ```chpl
418 | use BlockDist;
419 | use Math;
420 | config const n = 8;
421 | const mesh: domain(2) = {1..n,1..n};
422 | const largerMesh: domain(2) dmapped new blockDist(boundingBox=mesh) = {0..n+1,0..n+1};
423 | var temp, temp_new: [largerMesh] real;
424 | forall (i,j) in temp.domain[1..n,1..n] {
425 |   var x = ((i:real)-0.5)/(n:real);
426 |   var y = ((j:real)-0.5)/(n:real);
427 |   temp[i,j] = exp(-((x-0.5)**2 + (y-0.5)**2) / 0.01);
428 | }
429 | for step in 1..5 {
430 |   forall (i,j) in temp_new.domain[1..n,1..n] {
431 |     temp_new[i,j] = (temp[i-1,j] + temp[i+1,j] + temp[i,j-1] + temp[i,j+1]) / 4.0;
432 |   }
433 |   temp = temp_new;
434 |   writeln((step, " ", temp[n/2,n/2], " ", temp[1,1]));
435 | }
436 | ```
437 | 
438 | This is the entire parallel solver! Note that we implemented an open boundary: `temp` on the *ghost points* is
439 | always 0. Let us add some printout and also compute the total energy on the mesh, by adding the following
440 | inside the time loop:
441 | 
442 | ```chpl
443 | writeln((step, " ", temp[n/2,n/2], " ", temp[2,2]));
444 | var total: real = 0;
445 | forall (i,j) in mesh with (+ reduce total) do
446 |   total += temp[i,j];
447 | writeln("total = ", total);
448 | ```
449 | 
450 | Notice how the total energy decreases in time with the open boundary conditions, as the energy is leaving the
451 | system.
452 | 
453 | 
454 | ::::::::::::::::::::::::::::::::::::: challenge
455 | 
456 | ## Challenge 5: Can you do it?
457 | 
458 | Write a code that prints how the finite-difference stencil `[i,j]`, `[i-1,j]`, `[i+1,j]`, `[i,j-1]`, `[i,j+1]` is
459 | distributed among nodes, and compare that to the ID of the node where `temp[i,j]` is computed.
460 | 
461 | :::::::::::::::::::::::: solution
462 | 
463 | Here is one possible solution examining the locality of the finite-difference stencil:
464 | 
465 | ```chpl
466 | var nodeID: [largerMesh] string = 'empty';
467 | forall (i,j) in nodeID.domain[1..n,1..n] do
468 |   nodeID[i,j] = here.id:string + nodeID[i,j].locale.id:string + nodeID[i-1,j].locale.id:string +
469 |     nodeID[i+1,j].locale.id:string + nodeID[i,j-1].locale.id:string + nodeID[i,j+1].locale.id:string + ' ';
470 | writeln(nodeID);
471 | ```
472 | 
473 | :::::::::::::::::::::::::::::::::
474 | ::::::::::::::::::::::::::::::::::::::::::::::::
475 | 
476 | This produces the following output, clearly showing the *ghost points* and the stencil distribution for each
477 | mesh point:
478 | 
479 | ```output
480 | empty empty empty empty empty empty empty empty empty empty
481 | empty 000000 000000 000000 000001 111101 111111 111111 111111 empty
482 | empty 000000 000000 000000 000001 111101 111111 111111 111111 empty
483 | empty 000000 000000 000000 000001 111101 111111 111111 111111 empty
484 | empty 000200 000200 000200 000201 111301 111311 111311 111311 empty
485 | empty 220222 220222 220222 220223 331323 331333 331333 331333 empty
486 | empty 222222 222222 222222 222223 333323 333333 333333 333333 empty
487 | empty 222222 222222 222222 222223 333323 333333 333333 333333 empty
488 | empty 222222 222222 222222 222223 333323 333333 333333 333333 empty
489 | empty empty empty empty empty empty empty empty empty empty
490 | ```
491 | 
492 | Note that `temp[i,j]` is always computed on the same node where that element is stored, which makes sense.
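
If you would like to run the full solver yourself, a typical compile-and-run sequence is sketched below. The file name `diffusion.chpl` is only an assumption (use whatever you named your file), and the exact launch command depends on how multi-locale Chapel is configured on your cluster:

```bash
# compile with optimisations; assumes the code above was saved as diffusion.chpl
chpl --fast diffusion.chpl -o diffusion

# launch on 4 locales (nodes); --n=8 overrides the config const n from the command line
./diffusion -nl 4 --n=8
```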
493 | 
494 | ## Periodic boundary conditions
495 | 
496 | Now let us modify the previous parallel solver to include periodic BCs. At the beginning of each time step we
497 | need to set elements on the *ghost points* to their respective values on the *opposite ends*, by adding the
498 | following to our code:
499 | 
500 | ```chpl
501 | temp[0,1..n] = temp[n,1..n]; // periodic boundaries on all four sides; these will run via parallel forall
502 | temp[n+1,1..n] = temp[1,1..n];
503 | temp[1..n,0] = temp[1..n,n];
504 | temp[1..n,n+1] = temp[1..n,1];
505 | ```
506 | 
507 | Now the total energy should be conserved, as nothing leaves the domain.
508 | 
509 | ## I/O
510 | 
511 | Let us write the final solution to disk. There are several caveats:
512 | 
513 | - it works only with ASCII
514 | - Chapel can also write binary data, but nothing else can read it (we checked: it is not an
515 |   endianness problem!)
516 | - we would love to write NetCDF and HDF5; this can probably be done by calling C/C++
517 |   functions from Chapel
518 | 
519 | We'll add the following to our code to write ASCII:
520 | 
521 | ```chpl
522 | use IO;
523 | var myFile = open("output.dat", ioMode.cw); // open the file for writing
524 | var myWritingChannel = myFile.writer(); // create a writing channel starting at file offset 0
525 | myWritingChannel.write(temp); // write the array
526 | myWritingChannel.close(); // close the channel
527 | ```
528 | 
529 | Run the code and check the file *output.dat*: it should contain the array `temp` after 5 steps in ASCII.
530 | 
531 | 
532 | 
533 | 
534 | 
535 | 
536 | 
537 | 
538 | ::::::::::::::::::::::::::::::::::::: keypoints
539 | - "Domains are multi-dimensional sets of integer indices."
540 | - "A domain can be defined on a single locale or distributed across many locales."
541 | - "There are many predefined distribution methods: block, cyclic, etc."
542 | - "Arrays are defined on top of domains and inherit their distribution model."
543 | ::::::::::::::::::::::::::::::::::::::::::::::::
544 | 
--------------------------------------------------------------------------------
/hpc-chapel.Rproj:
--------------------------------------------------------------------------------
1 | Version: 1.0
2 | 
3 | RestoreWorkspace: No
4 | SaveWorkspace: No
5 | AlwaysSaveHistory: Default
6 | 
7 | EnableCodeIndexing: Yes
8 | UseSpacesForTab: Yes
9 | NumSpacesForTab: 2
10 | Encoding: UTF-8
11 | 
12 | RnwWeave: Sweave
13 | LaTeX: pdfLaTeX
14 | 
15 | AutoAppendNewline: Yes
16 | StripTrailingWhitespace: Yes
17 | LineEndingConversion: Posix
18 | 
19 | BuildType: Website
20 | 
--------------------------------------------------------------------------------
/index.md:
--------------------------------------------------------------------------------
1 | ---
2 | site: sandpaper::sandpaper_site
3 | ---
4 | 
5 | 
6 | 
7 | 
8 | This workshop is an introduction to parallel programming in Chapel. This material is designed for Day 2 of HPC Carpentry.
9 | 
10 | By the end of this workshop, students will know:
11 | 
12 | - the basic syntax of Chapel codes,
13 | - how to run single-locale Chapel codes,
14 | - how to write task-parallel codes for a shared-memory compute node,
15 | - how to run multi-locale Chapel codes,
16 | - how to write domain-parallel codes for a distributed-memory cluster.
17 | 
18 | **NOTE**: This is the draft HPC Carpentry release. Comments and feedback are welcome.
19 | -------------------------------------------------------------------------------- /instructors/instructor-notes.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: 'Instructor Notes' 3 | --- 4 | 5 | This is a placeholder file. Please add content here. 6 | -------------------------------------------------------------------------------- /learners/reference.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: 'Reference' 3 | --- 4 | 5 | ## Glossary 6 | 7 | This is a placeholder file. Please add content here. 8 | 9 | -------------------------------------------------------------------------------- /learners/setup.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Setup 3 | --- 4 | 5 | We highly recommend running Chapel on an HPC cluster. Alternatively, you can run Chapel on your computer, but 6 | don't expect a multi-node speedup since you have only one node. 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | ## Software Setup 18 | 19 | ::::::::::::::::::::::::::::::::::::::: discussion 20 | 21 | ### Details 22 | 23 | This section describes installing Chapel on your own computer. Before proceeding, please double-check that 24 | your workshop instructors do not already provide Chapel on an HPC cluster. 25 | 26 | 27 | 28 | 29 | 30 | 31 | ::::::::::::::::::::::::::::::::::::::::::::::::::: 32 | 33 | :::::::::::::::: spoiler 34 | 35 | ### Windows 36 | 37 | Go to the website https://docs.docker.com/docker-for-windows/install/ and download the Docker Desktop 38 | installation file. Double-click on the `Docker_Desktop_Installer.exe` to run the installer. During the 39 | installation process, enable Hyper-V Windows Feature on the Configuration page, and wait for the installation 40 | to complete. At this point you might need to restart your computer. 41 | 42 | Eventually you want to run https://hub.docker.com/r/chapel/chapel Docker image. 43 | 44 | :::::::::::::::::::::::: 45 | 46 | :::::::::::::::: spoiler 47 | 48 | ### MacOS 49 | 50 | The quickest way to get started with Chapel on MacOS is to install it via Homebrew. If you don't have Homebrew 51 | installed (skip this step if you do), open Terminal.app and type 52 | 53 | ```bash 54 | /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)" 55 | ``` 56 | 57 | Next, proceed to installing Chapel: 58 | 59 | ```bash 60 | brew update 61 | brew install chapel 62 | ``` 63 | 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71 | :::::::::::::::::::::::: 72 | 73 | 74 | :::::::::::::::: spoiler 75 | 76 | ### Linux 77 | 78 | At https://github.com/chapel-lang/chapel/releases scroll to the first "Assets" section (you might need to 79 | click on "Show all assets") and pick the latest precompiled Chapel package for your Linux distribution. 
For
80 | example, with Ubuntu 22.04 you can do:
81 | 
82 | ```bash
83 | wget https://github.com/chapel-lang/chapel/releases/download/2.1.0/chapel-2.1.0-1.ubuntu22.amd64.deb
84 | sudo apt install ./chapel-2.1.0-1.ubuntu22.amd64.deb
85 | ```
86 | 
87 | ::::::::::::::::::::::::
88 | 
--------------------------------------------------------------------------------
/links.md:
--------------------------------------------------------------------------------
1 | 
5 | 
6 | [pandoc]: https://pandoc.org/MANUAL.html
7 | [r-markdown]: https://rmarkdown.rstudio.com/
8 | [rstudio]: https://www.rstudio.com/
9 | [carpentries-workbench]: https://carpentries.github.io/sandpaper-docs/
10 | 
11 | 
--------------------------------------------------------------------------------
/profiles/learner-profiles.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: FIXME
3 | ---
4 | 
5 | This is a placeholder file. Please add content here.
6 | 
--------------------------------------------------------------------------------
/site/README.md:
--------------------------------------------------------------------------------
1 | # {sandpaper}-Generated Content
2 | 
3 | This directory contains rendered lesson materials.
4 | Please do not edit files here.
5 | 
--------------------------------------------------------------------------------