├── .github
│   └── workflows
│       └── spelling.yml
├── .gitignore
├── BUILD.md
├── CONTRIBUTING.md
├── Dockerfile
├── LICENCE
├── PRINT.md
├── README.md
├── VERSION
├── configs
│   └── print.yaml
├── convert.sh
├── filters
│   └── print.py
├── generate_patch.sh
├── images
│   ├── README.md
│   ├── cexa.png
│   ├── code.svg
│   ├── code_2.png
│   ├── code_txt.svg
│   ├── doc_txt.svg
│   ├── documentation.png
│   ├── kokkos_wire.pdf
│   ├── kokkos_wire.svg
│   ├── maison_de_la_simulation.png
│   ├── training.png
│   ├── tutorial_txt.svg
│   ├── warning.png
│   ├── warning.svg
│   └── warning_txt.svg
├── install.md
├── patches
│   └── print
│       ├── install.tex.diff
│       ├── terminology.tex.diff
│       └── utilization.tex.diff
├── personna.md
├── requirements.txt
├── styles
│   ├── after_body
│   │   └── print.tex
│   ├── before_body
│   │   └── print.tex
│   └── header
│       └── print.tex
├── terminology.md
├── typos.toml
└── utilization.md

/.github/workflows/spelling.yml:
--------------------------------------------------------------------------------
name: Spelling checks

on:
  pull_request:

jobs:
  spelling:
    name: Spell Check with Typos
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Actions Repository
        uses: actions/checkout@v4
      - name: Spell Check Repo
        uses: crate-ci/typos@v1.29.7

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
**.DS_Store
*.aux
*.log
*.synctex.gz
/*.pdf
/*.tex

--------------------------------------------------------------------------------
/BUILD.md:
--------------------------------------------------------------------------------
# Build

This document explains how to build the cheat sheets for paper distribution.
The Markdown documents use commented pre-processor instructions to remove unprintable parts.
The documents are then converted to LaTeX sources, which can be compiled into PDF documents.

## Setup

### Requirements

- GPP (Generic Preprocessor);
- Pandoc 2;
- pdfLaTeX (from at least `texlive-latex-extra`); and
- Python 3.

Note that Pandoc 2 is no longer available on Ubuntu 24.04!
In that case, please use the provided Docker image.

### Install dependencies

It is worth creating a [virtual environment](https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/) for the project:

```sh
python3 -m venv .venv
```

which you can activate with:

```sh
source .venv/bin/activate
```

Install Python dependencies with Pip:

```sh
pip3 install -r requirements.txt
```

Or, without using a virtual environment:

```sh
pip3 install --user -r requirements.txt
```

### Docker image

A Docker image containing all dependencies (except LaTeX) can be built with:

```sh
sudo docker build -t kokkos_cheat_sheet .
```

## Generate LaTeX files

Call the `convert.sh` script, which pre-processes the input Markdown file and converts it to standalone LaTeX sources:

```sh
./convert.sh <file.md>
# or
sudo docker run --rm -v "$PWD":/work --user "$(id -u):$(id -g)" kokkos_cheat_sheet ./convert.sh <file.md>
```

Note that the `--user` option is required on some systems so that the generated file is owned by you. It must be placed before the image name, otherwise Docker treats it as the command to run.

## Build PDF document

Build the document with:

```sh
pdflatex <file.tex>
```

Note that calling `convert.sh` with `-b` automatically builds the generated file, but this option cannot be used when the script is executed with Docker, since the image does not include LaTeX.
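
The three build stages described above (pre-process with GPP, convert with Pandoc, compile with pdfLaTeX) can be summarised as command lines. The following Python sketch only assembles those command lines and does not run them. The function name `build_commands` is hypothetical, the first two GPP delimiters are assumed values that are not legible in this listing, and the output name (same base name with a `.tex` extension) is an assumption based on the patch file names. The remaining arguments are copied from `convert.sh`.

```python
from pathlib import Path

# GPP "user mode" delimiters in the order expected by `gpp -U`.
# Assumption: the first two values ("<!--#" and "-->") are inferred,
# the other seven are copied from convert.sh.
GPP_USER_MODE = ["<!--#", "-->", r"\B", " ", "-->", "(", ")", "#", ""]


def build_commands(markdown_file):
    """Return the gpp, pandoc and pdflatex command lines for one cheat sheet.

    Hypothetical helper: it mirrors the options used in convert.sh but is
    not part of the repository.
    """
    # Assumption: install.md is converted to install.tex, and so on.
    tex_file = str(Path(markdown_file).with_suffix(".tex"))

    gpp = ["gpp", "-U", *GPP_USER_MODE, "-DPRINT", markdown_file]
    pandoc = [
        "pandoc",
        "--defaults", "configs/print.yaml",
        "--include-in-header", "styles/header/print.tex",
        "--include-before-body", "styles/before_body/print.tex",
        "--include-after-body", "styles/after_body/print.tex",
        "--filter", "filters/print.py",
        "--output", tex_file,
    ]
    pdflatex = ["pdflatex", tex_file]
    return gpp, pandoc, pdflatex


if __name__ == "__main__":
    for command in build_commands("install.md"):
        print(" ".join(repr(arg) if arg in ("", " ") else arg for arg in command))
```

In the real script the output of `gpp` is piped into `pandoc`; the sketch keeps the stages separate only to make each argument list easy to read.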

--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
# Contributing

## Versioning

The cheat sheet version combines the Kokkos version with the date of the cheat sheet release:

```
4.2.0.20240226
^ ^ ^ ^
| | | + Cheat sheet generation date in the form YYYYMMDD
| | +-- Kokkos patch version
| +---- Kokkos minor version
+------ Kokkos major version
```

## Pre-processor

Since Markdown does not have branching controls, we use pre-processor instructions that are parsed by [GPP (Generic Preprocessor)](https://logological.org/gpp).
These instructions have a syntax similar to that of the C pre-processor, but are wrapped in Markdown comments, so that the un-pre-processed file remains a valid Markdown file:

```md
<!--#instruction-->
```

Note that there are *no spaces* between the Markdown comment symbol and the pre-processor instruction.
For instance, to display text only if the `PRINT` macro is defined:

```md
<!--#ifdef PRINT-->
Only visible in printed version!
<!--#endif-->
```

Conversely, to hide text if the `PRINT` macro is defined (which is the most common case):

```md
<!--#ifndef PRINT-->
Not visible in printed version!
<!--#endif-->
```

Passing macros to GPP is similar to passing macros to a C compiler:

```sh
gpp -DPRINT
```

For now, the `PRINT` macro is used for print mode.

## Patching

Patches hold specific adjustments to the final document that cannot be present in the source Markdown file.
Such changes include manual page breaks, abbreviations, etc.
Large modifications, however, such as the removal of an entire section, are better handled by the pre-processor discussed above.
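
To make the pre-processor route concrete, here is a minimal Python sketch that imitates how conditional blocks are kept or dropped depending on whether `PRINT` is defined. It is not GPP and covers only a tiny subset of its behaviour. The directive spelling (`<!--#ifdef NAME-->`, `<!--#ifndef NAME-->`, `<!--#else-->`, `<!--#endif-->`) is an assumption inferred from the comment delimiters configured in `convert.sh`, and the function name `preprocess` is hypothetical.

```python
import re

# Assumed directive syntax: a C-like instruction wrapped in a Markdown
# comment, with no space between "<!--" and "#".
DIRECTIVE = re.compile(r"^<!--#(ifdef|ifndef|else|endif)(?:\s+(\w+))?-->\s*$")


def preprocess(text, defined=frozenset()):
    """Keep or drop lines according to a small subset of GPP conditionals."""
    stack = []   # one boolean per open conditional: is its branch active?
    output = []
    for line in text.splitlines():
        match = DIRECTIVE.match(line)
        if match is None:
            # Ordinary line: emit it only if every enclosing branch is active.
            if all(stack):
                output.append(line)
            continue
        keyword, name = match.groups()
        if keyword == "ifdef":
            stack.append(name in defined)
        elif keyword == "ifndef":
            stack.append(name not in defined)
        elif keyword == "else":
            stack[-1] = not stack[-1]
        else:  # endif closes the innermost conditional
            stack.pop()
    return "\n".join(output)


SAMPLE = """\
# Title
<!--#ifndef PRINT-->
Not visible in printed version!
<!--#endif-->
<!--#ifdef PRINT-->
Only visible in printed version!
<!--#endif-->
Always visible."""

if __name__ == "__main__":
    print(preprocess(SAMPLE, defined={"PRINT"}))
```

With `PRINT` defined, the sketch keeps the title, the print-only sentence and the last line; without it, the print-only sentence is replaced by the web-only one. Fine adjustments that cannot be expressed this way are the ones that belong in a patch.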

A patch is associated with a converted file in the form `name.ext.diff`, and is stored under `patches/<mode>/name.ext.diff` (with `<mode>` being `print` for print mode).
If a patch file with the matching name exists, it is automatically applied when calling `convert.sh`.

The creation of the patch uses the following workflow:

```sh
./generate_patch.sh <file.md> start
# edit and perform specific adjustments
./generate_patch.sh <file.md> end
```

Any subsequent run of this workflow accumulates the modifications on top of the existing patch.

--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
FROM ubuntu:22.04

# system dependencies
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        gpp \
        pandoc \
        patch \
        python3 \
        python-is-python3 \
        python3-pip \
        && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# python dependencies
COPY ./requirements.txt /
RUN pip3 install --no-cache-dir -r /requirements.txt

WORKDIR /work

--------------------------------------------------------------------------------
/LICENCE:
--------------------------------------------------------------------------------
Creative Commons Attribution-ShareAlike 4.0 International Public
License

CExA Project 2024.

By exercising the Licensed Rights (defined below), You accept and agree
to be bound by the terms and conditions of this Creative Commons
Attribution-ShareAlike 4.0 International Public License ("Public
License").
To the extent this Public License may be interpreted as a 10 | contract, You are granted the Licensed Rights in consideration of Your 11 | acceptance of these terms and conditions, and the Licensor grants You 12 | such rights in consideration of benefits the Licensor receives from 13 | making the Licensed Material available under these terms and 14 | conditions. 15 | 16 | Latest version presently available at 17 | https://creativecommons.org/licenses/by-sa/4.0/ 18 | 19 | Section 1 -- Definitions. 20 | 21 | a. Adapted Material means material subject to Copyright and Similar 22 | Rights that is derived from or based upon the Licensed Material 23 | and in which the Licensed Material is translated, altered, 24 | arranged, transformed, or otherwise modified in a manner requiring 25 | permission under the Copyright and Similar Rights held by the 26 | Licensor. For purposes of this Public License, where the Licensed 27 | Material is a musical work, performance, or sound recording, 28 | Adapted Material is always produced where the Licensed Material is 29 | synched in timed relation with a moving image. 30 | 31 | b. Adapter's License means the license You apply to Your Copyright 32 | and Similar Rights in Your contributions to Adapted Material in 33 | accordance with the terms and conditions of this Public License. 34 | 35 | c. BY-SA Compatible License means a license listed at 36 | creativecommons.org/compatiblelicenses, approved by Creative 37 | Commons as essentially the equivalent of this Public License. 38 | 39 | d. Copyright and Similar Rights means copyright and/or similar rights 40 | closely related to copyright including, without limitation, 41 | performance, broadcast, sound recording, and Sui Generis Database 42 | Rights, without regard to how the rights are labeled or 43 | categorized. For purposes of this Public License, the rights 44 | specified in Section 2(b)(1)-(2) are not Copyright and Similar 45 | Rights. 46 | 47 | e. 
Effective Technological Measures means those measures that, in the 48 | absence of proper authority, may not be circumvented under laws 49 | fulfilling obligations under Article 11 of the WIPO Copyright 50 | Treaty adopted on December 20, 1996, and/or similar international 51 | agreements. 52 | 53 | f. Exceptions and Limitations means fair use, fair dealing, and/or 54 | any other exception or limitation to Copyright and Similar Rights 55 | that applies to Your use of the Licensed Material. 56 | 57 | g. License Elements means the license attributes listed in the name 58 | of a Creative Commons Public License. The License Elements of this 59 | Public License are Attribution and ShareAlike. 60 | 61 | h. Licensed Material means the artistic or literary work, database, 62 | or other material to which the Licensor applied this Public 63 | License. 64 | 65 | i. Licensed Rights means the rights granted to You subject to the 66 | terms and conditions of this Public License, which are limited to 67 | all Copyright and Similar Rights that apply to Your use of the 68 | Licensed Material and that the Licensor has authority to license. 69 | 70 | j. Licensor means the individual(s) or entity(ies) granting rights 71 | under this Public License. 72 | 73 | k. Share means to provide material to the public by any means or 74 | process that requires permission under the Licensed Rights, such 75 | as reproduction, public display, public performance, distribution, 76 | dissemination, communication, or importation, and to make material 77 | available to the public including in ways that members of the 78 | public may access the material from a place and at a time 79 | individually chosen by them. 80 | 81 | l. 
Sui Generis Database Rights means rights other than copyright 82 | resulting from Directive 96/9/EC of the European Parliament and of 83 | the Council of 11 March 1996 on the legal protection of databases, 84 | as amended and/or succeeded, as well as other essentially 85 | equivalent rights anywhere in the world. 86 | 87 | m. You means the individual or entity exercising the Licensed Rights 88 | under this Public License. Your has a corresponding meaning. 89 | 90 | 91 | Section 2 -- Scope. 92 | 93 | a. License grant. 94 | 95 | 1. Subject to the terms and conditions of this Public License, 96 | the Licensor hereby grants You a worldwide, royalty-free, 97 | non-sublicensable, non-exclusive, irrevocable license to 98 | exercise the Licensed Rights in the Licensed Material to: 99 | 100 | a. reproduce and Share the Licensed Material, in whole or 101 | in part; and 102 | 103 | b. produce, reproduce, and Share Adapted Material. 104 | 105 | 2. Exceptions and Limitations. For the avoidance of doubt, where 106 | Exceptions and Limitations apply to Your use, this Public 107 | License does not apply, and You do not need to comply with 108 | its terms and conditions. 109 | 110 | 3. Term. The term of this Public License is specified in Section 111 | 6(a). 112 | 113 | 4. Media and formats; technical modifications allowed. The 114 | Licensor authorizes You to exercise the Licensed Rights in 115 | all media and formats whether now known or hereafter created, 116 | and to make technical modifications necessary to do so. The 117 | Licensor waives and/or agrees not to assert any right or 118 | authority to forbid You from making technical modifications 119 | necessary to exercise the Licensed Rights, including 120 | technical modifications necessary to circumvent Effective 121 | Technological Measures. For purposes of this Public License, 122 | simply making modifications authorized by this Section 2(a) 123 | (4) never produces Adapted Material. 124 | 125 | 5. Downstream recipients. 
126 | 127 | a. Offer from the Licensor -- Licensed Material. Every 128 | recipient of the Licensed Material automatically 129 | receives an offer from the Licensor to exercise the 130 | Licensed Rights under the terms and conditions of this 131 | Public License. 132 | 133 | b. Additional offer from the Licensor -- Adapted Material. 134 | Every recipient of Adapted Material from You 135 | automatically receives an offer from the Licensor to 136 | exercise the Licensed Rights in the Adapted Material 137 | under the conditions of the Adapter's License You apply. 138 | 139 | c. No downstream restrictions. You may not offer or impose 140 | any additional or different terms or conditions on, or 141 | apply any Effective Technological Measures to, the 142 | Licensed Material if doing so restricts exercise of the 143 | Licensed Rights by any recipient of the Licensed 144 | Material. 145 | 146 | 6. No endorsement. Nothing in this Public License constitutes or 147 | may be construed as permission to assert or imply that You 148 | are, or that Your use of the Licensed Material is, connected 149 | with, or sponsored, endorsed, or granted official status by, 150 | the Licensor or others designated to receive attribution as 151 | provided in Section 3(a)(1)(A)(i). 152 | 153 | b. Other rights. 154 | 155 | 1. Moral rights, such as the right of integrity, are not 156 | licensed under this Public License, nor are publicity, 157 | privacy, and/or other similar personality rights; however, to 158 | the extent possible, the Licensor waives and/or agrees not to 159 | assert any such rights held by the Licensor to the limited 160 | extent necessary to allow You to exercise the Licensed 161 | Rights, but not otherwise. 162 | 163 | 2. Patent and trademark rights are not licensed under this 164 | Public License. 165 | 166 | 3. 
To the extent possible, the Licensor waives any right to 167 | collect royalties from You for the exercise of the Licensed 168 | Rights, whether directly or through a collecting society 169 | under any voluntary or waivable statutory or compulsory 170 | licensing scheme. In all other cases the Licensor expressly 171 | reserves any right to collect such royalties. 172 | 173 | 174 | Section 3 -- License Conditions. 175 | 176 | Your exercise of the Licensed Rights is expressly made subject to the 177 | following conditions. 178 | 179 | a. Attribution. 180 | 181 | 1. If You Share the Licensed Material (including in modified 182 | form), You must: 183 | 184 | a. retain the following if it is supplied by the Licensor 185 | with the Licensed Material: 186 | 187 | i. identification of the creator(s) of the Licensed 188 | Material and any others designated to receive 189 | attribution, in any reasonable manner requested by 190 | the Licensor (including by pseudonym if 191 | designated); 192 | 193 | ii. a copyright notice; 194 | 195 | iii. a notice that refers to this Public License; 196 | 197 | iv. a notice that refers to the disclaimer of 198 | warranties; 199 | 200 | v. a URI or hyperlink to the Licensed Material to the 201 | extent reasonably practicable; 202 | 203 | b. indicate if You modified the Licensed Material and 204 | retain an indication of any previous modifications; and 205 | 206 | c. indicate the Licensed Material is licensed under this 207 | Public License, and include the text of, or the URI or 208 | hyperlink to, this Public License. 209 | 210 | 2. You may satisfy the conditions in Section 3(a)(1) in any 211 | reasonable manner based on the medium, means, and context in 212 | which You Share the Licensed Material. For example, it may be 213 | reasonable to satisfy the conditions by providing a URI or 214 | hyperlink to a resource that includes the required 215 | information. 216 | 217 | 3. 
If requested by the Licensor, You must remove any of the 218 | information required by Section 3(a)(1)(A) to the extent 219 | reasonably practicable. 220 | 221 | b. ShareAlike. 222 | 223 | In addition to the conditions in Section 3(a), if You Share 224 | Adapted Material You produce, the following conditions also apply. 225 | 226 | 1. The Adapter's License You apply must be a Creative Commons 227 | license with the same License Elements, this version or 228 | later, or a BY-SA Compatible License. 229 | 230 | 2. You must include the text of, or the URI or hyperlink to, the 231 | Adapter's License You apply. You may satisfy this condition 232 | in any reasonable manner based on the medium, means, and 233 | context in which You Share Adapted Material. 234 | 235 | 3. You may not offer or impose any additional or different terms 236 | or conditions on, or apply any Effective Technological 237 | Measures to, Adapted Material that restrict exercise of the 238 | rights granted under the Adapter's License You apply. 239 | 240 | 241 | Section 4 -- Sui Generis Database Rights. 242 | 243 | Where the Licensed Rights include Sui Generis Database Rights that 244 | apply to Your use of the Licensed Material: 245 | 246 | a. for the avoidance of doubt, Section 2(a)(1) grants You the right 247 | to extract, reuse, reproduce, and Share all or a substantial 248 | portion of the contents of the database; 249 | 250 | b. if You include all or a substantial portion of the database 251 | contents in a database in which You have Sui Generis Database 252 | Rights, then the database in which You have Sui Generis Database 253 | Rights (but not its individual contents) is Adapted Material, 254 | including for purposes of Section 3(b); and 255 | 256 | c. You must comply with the conditions in Section 3(a) if You Share 257 | all or a substantial portion of the contents of the database. 
258 | 259 | For the avoidance of doubt, this Section 4 supplements and does not 260 | replace Your obligations under this Public License where the Licensed 261 | Rights include other Copyright and Similar Rights. 262 | 263 | 264 | Section 5 -- Disclaimer of Warranties and Limitation of Liability. 265 | 266 | a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE 267 | EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS 268 | AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF 269 | ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS, 270 | IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION, 271 | WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR 272 | PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS, 273 | ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT 274 | KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT 275 | ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU. 276 | 277 | b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE 278 | TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION, 279 | NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT, 280 | INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES, 281 | COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR 282 | USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN 283 | ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR 284 | DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR 285 | IN PART, THIS LIMITATION MAY NOT APPLY TO YOU. 286 | 287 | c. The disclaimer of warranties and limitation of liability provided 288 | above shall be interpreted in a manner that, to the extent 289 | possible, most closely approximates an absolute disclaimer and 290 | waiver of all liability. 291 | 292 | 293 | Section 6 -- Term and Termination. 294 | 295 | a. 
This Public License applies for the term of the Copyright and 296 | Similar Rights licensed here. However, if You fail to comply with 297 | this Public License, then Your rights under this Public License 298 | terminate automatically. 299 | 300 | b. Where Your right to use the Licensed Material has terminated under 301 | Section 6(a), it reinstates: 302 | 303 | 1. automatically as of the date the violation is cured, provided 304 | it is cured within 30 days of Your discovery of the 305 | violation; or 306 | 307 | 2. upon express reinstatement by the Licensor. 308 | 309 | For the avoidance of doubt, this Section 6(b) does not affect any 310 | right the Licensor may have to seek remedies for Your violations 311 | of this Public License. 312 | 313 | c. For the avoidance of doubt, the Licensor may also offer the 314 | Licensed Material under separate terms or conditions or stop 315 | distributing the Licensed Material at any time; however, doing so 316 | will not terminate this Public License. 317 | 318 | d. Sections 1, 5, 6, 7, and 8 survive termination of this Public 319 | License. 320 | 321 | 322 | Section 7 -- Other Terms and Conditions. 323 | 324 | a. The Licensor shall not be bound by any additional or different 325 | terms or conditions communicated by You unless expressly agreed. 326 | 327 | b. Any arrangements, understandings, or agreements regarding the 328 | Licensed Material not stated herein are separate from and 329 | independent of the terms and conditions of this Public License. 330 | 331 | 332 | Section 8 -- Interpretation. 333 | 334 | a. For the avoidance of doubt, this Public License does not, and 335 | shall not be interpreted to, reduce, limit, restrict, or impose 336 | conditions on any use of the Licensed Material that could lawfully 337 | be made without permission under this Public License. 338 | 339 | b. 
To the extent possible, if any provision of this Public License is 340 | deemed unenforceable, it shall be automatically reformed to the 341 | minimum extent necessary to make it enforceable. If the provision 342 | cannot be reformed, it shall be severed from this Public License 343 | without affecting the enforceability of the remaining terms and 344 | conditions. 345 | 346 | c. No term or condition of this Public License will be waived and no 347 | failure to comply consented to unless expressly agreed to by the 348 | Licensor. 349 | 350 | d. Nothing in this Public License constitutes or may be interpreted 351 | as a limitation upon, or waiver of, any privileges and immunities 352 | that apply to the Licensor or You, including from the legal 353 | processes of any jurisdiction or authority. 354 | -------------------------------------------------------------------------------- /PRINT.md: -------------------------------------------------------------------------------- 1 | # Specific print customizations 2 | 3 | ## Add print margins to the document 4 | 5 | When the LaTeX file has been generated, you can add a print margin in the TEX file to allow a full-size print when the printer automatically crops the edges of the printing area. 6 | 7 | See the `\printmargin` length and the other tuning lengths in the preamble of the generated file. Starting values are suggested. 8 | 9 | ## Add watermark and notes section 10 | 11 | When the LaTeX file has been generated, you can add a watermark and/or a notes section at the end of the TEX file. The watermark contains the logo of CExA and of the Maison de la Simulation, aligned to the right. The notes section adds a section named Notes, and prints horizontal dotted lines down to the end of the page. 

To add the watermark alone:

```tex
\watermark
```

To add the notes section alone:

```tex
\notessection
```

Note that anything after `\notessection` will be displayed on the next page.

To add the notes section, then the watermark at the bottom of the same page:

```tex
\notessection{\watermark}
```

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Cheat sheets for Kokkos

## Resources

- Full documentation: https://kokkos.org/kokkos-core-wiki/index.html
- GitHub sources: https://github.com/kokkos
- Tutorials: https://github.com/kokkos/kokkos-tutorials
- Training lecture series: https://github.com/kokkos/kokkos-tutorials/tree/main/LectureSeries

## Pictogram symbols

Code Link to compilable code examples

Doc Link to documentation pages

Training Link to training pages

Warning Warning

## Cheat sheets

- [Installation](install.md)
- [Utilization](utilization.md)

--------------------------------------------------------------------------------
/VERSION:
--------------------------------------------------------------------------------
4.6.1.20250516

--------------------------------------------------------------------------------
/configs/print.yaml:
--------------------------------------------------------------------------------
variables:
  documentclass: article
  classoption:
    - table
    - twoside
  papersize: a4
  fontsize: 10pt
pdf-engine: pdflatex

--------------------------------------------------------------------------------
/convert.sh:
--------------------------------------------------------------------------------
#!/bin/bash

set -eu

GPP_USERMODE_START='<!--#'
GPP_USERMODE_END='-->'
GPP_USERMODE_ARG_START='\B'
GPP_USERMODE_ARG_SEPARATOR=' '
GPP_USERMODE_ARG_END='-->'
GPP_USERMODE_CHARACTER_STACK='('
GPP_USERMODE_CHARACTER_UNSTACK=')'
GPP_USERMODE_NUMBER='#'
GPP_USERMODE_QUOTE=''
PANDOC_VERSION_MAJOR=2

check_pandoc_version () {
    version=$(pandoc --version | grep "^pandoc" | sed 's/pandoc \([0-9]\+\).*/\1/')

    if [[ "$version" != "$PANDOC_VERSION_MAJOR" ]]
    then
        echo "Unsupported Pandoc version: $version" >&2
        return 1
    fi
}

convert () {
    local input_file="$1"
    local output_file="$2"

    if [[ ! -f "$input_file" ]]
    then
        echo "No such file $input_file" >&2
        return 1
    fi

    gpp \
        -U \
            "$GPP_USERMODE_START" \
            "$GPP_USERMODE_END" \
            "$GPP_USERMODE_ARG_START" \
            "$GPP_USERMODE_ARG_SEPARATOR" \
            "$GPP_USERMODE_ARG_END" \
            "$GPP_USERMODE_CHARACTER_STACK" \
            "$GPP_USERMODE_CHARACTER_UNSTACK" \
            "$GPP_USERMODE_NUMBER" \
            "$GPP_USERMODE_QUOTE" \
        -DPRINT \
        "$input_file" \
        | \
        pandoc \
            --defaults "configs/print.yaml" \
            --include-in-header "styles/header/print.tex" \
            --include-before-body "styles/before_body/print.tex" \
            --include-after-body "styles/after_body/print.tex" \
            --filter "filters/print.py" \
            --output "$output_file"

    echo "Converted to $output_file"
}

patch_modifs () {
    local output_file_diff="$1"

    patchfile="patches/print/$output_file_diff"

    if [[ ! -f "$patchfile" ]]
    then
        echo "No patch to apply"
        return
    fi

    # patch
    # Note: If there are parts of the patch that cannot be applied, the patch
    # command returns non-0 and stores them in a specific reject file. First,
    # the call is marked to never fail, and second, this reject file is
    # discarded.
    patch --quiet --forward --reject-file - <"$patchfile" || true

    echo "Applied patch from $output_file_diff"
}

build () {
    local input_file="$1"
    pdflatex -interaction=nonstopmode "$input_file"
}

usage () {
    cat <
[…]
    […] 1:
        return False

    if cell["c"][0].get("t") == "Code":
        return True

    return False

def has_long_line(cell, threshold=THRESHOLD_NORMAL):
    size = 0

    for word in cell["c"]:
        if word and "c" in word:
            size += len(word["c"])

    return size >= threshold

def tbl_alignment(s, h, v):
    aligns = {
        "AlignDefault": "l",
        "AlignLeft": "l",
        "AlignCenter": "c",
        "AlignRight": "r",
    }
    # detect columns with one single line of code
    one_single_code_cols = [False] * len(s)
    for row in [h] + v:
        for index_col, col in enumerate(row):
            if col:
                one_single_code_cols[index_col] |= has_one_single_code(col[0])

    # detect columns with long line
    long_line_cols = [False] * len(s)
    for row in [h] + v:
        for index_col, col in enumerate(row):
            if col:
                long_line_cols[index_col] |= has_long_line(col[0], threshold=THRESHOLD_CODE if one_single_code_cols[index_col] else THRESHOLD_NORMAL)

    # mark a left column as big if its size is over threshold
    alignments = []
    for index_col, e in enumerate(s):
        align = aligns[e["t"]]

        if align == "l" and long_line_cols[index_col]:
            alignments.append("X")
            continue

        alignments.append(align)

    # if there are no big columns, make the first left one big
    if "X" not in alignments:
        try:
            alignments[alignments.index("l")] = "X"

        except ValueError:
            pass

    return "".join(alignments)


def tbl_headers(s):
    result = s[0][0]["c"][:]
    # Build the columns. Note how every column value is altered.
    # We are still missing "\tblhead{" for the first column
    # and a "}" for the last column.
    for i in range(1, len(s)):
        result.append(inlatex(r"} & \tblhead{"))
        result.extend(s[i][0]["c"])
    # Don't forget to close the last column's "\tblhead{" before newline
    result.append(inlatex(r"} \\ \midrule"))
    # Put the missing "\tblhead{" in front of the list
    result.insert(0, inlatex(r"\tblhead{"))
    return pf.Para(result)


def tbl_contents(s):
    result = []
    for row in s:
        para = []
        for col in row:
            if col:
                content = col[0]["c"]

                # un-escape pipe for code within a table
                for word in content:
                    # use .get() so that an element without a "t" key is skipped
                    if word.get("t") == "Code":
                        word["c"] = [term.replace(r"\|", "|") if isinstance(term, str) else term for term in word["c"]]

                para.extend(content)

            para.append(inlatex(" & "))
        result.extend(para)
        result[-1] = inlatex(r" \\" "\n")
    return pf.Para(result)


def do_filter(k, v, f, m):
    if k == "Table":
        return [
            latex(
                r"\begin{tabularx}{\linewidth}{%s}" % tbl_alignment(v[1], v[3], v[4]) + "\n"
                r"\toprule"
            ),
            tbl_headers(v[3]),
            tbl_contents(v[4]),
            latex(r"\bottomrule" "\n" r"\end{tabularx}"),
        ]


if __name__ == "__main__":
    pf.toJSONFilter(do_filter)

--------------------------------------------------------------------------------
/generate_patch.sh:
--------------------------------------------------------------------------------
#!/bin/bash

set -eu

source convert.sh

get_out_file_ref () {
    local output_file="$1"
    echo ".$output_file"
}

start_patch () {
    local input_file="$1"
    local output_file="$2"
    local output_file_ref="$3"
    local output_file_diff="$4"

    # convert once and save a reference "raw" file (i.e.
without current patches) 19 | convert "$input_file" "$output_file" 20 | cp "$output_file" "$output_file_ref" 21 | 22 | patch_modifs "$output_file_diff" 23 | } 24 | 25 | end_patch () { 26 | local input_file="$1" 27 | local output_file="$2" 28 | local output_file_ref="$3" 29 | local output_file_diff="$4" 30 | 31 | if [[ ! -f "$output_file_ref" ]] 32 | then 33 | echo "Please call \"generate_patch.sh $input_file start\" first!" 34 | return 1 35 | fi 36 | 37 | mkdir -p "patches/print" 38 | 39 | # generate diff 40 | # Note: If there is an actual difference between the two files, the diff 41 | # command returns non-0. Consequently, the call is marked to never fail. 42 | diff -au "$output_file_ref" "$output_file" >"patches/print/$output_file_diff" || true 43 | 44 | # remove reference file 45 | rm --force "$output_file_ref" 46 | } 47 | 48 | usage () { 49 | cat < Link to compilable code examples 2 | - Doc Link to documentation pages 3 | - Training Link to training pages 4 | - Warning Warning -------------------------------------------------------------------------------- /images/cexa.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CExA-project/cheat-sheet-for-kokkos/60c16b3d7a2bc2d763ed6138f570e8e0fadd232f/images/cexa.png -------------------------------------------------------------------------------- /images/code.svg: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | {} 111 | -------------------------------------------------------------------------------- /images/code_2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CExA-project/cheat-sheet-for-kokkos/60c16b3d7a2bc2d763ed6138f570e8e0fadd232f/images/code_2.png -------------------------------------------------------------------------------- /images/code_txt.svg: 
-------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | Example 113 | -------------------------------------------------------------------------------- /images/doc_txt.svg: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | Doc 113 | -------------------------------------------------------------------------------- /images/documentation.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CExA-project/cheat-sheet-for-kokkos/60c16b3d7a2bc2d763ed6138f570e8e0fadd232f/images/documentation.png -------------------------------------------------------------------------------- /images/kokkos_wire.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CExA-project/cheat-sheet-for-kokkos/60c16b3d7a2bc2d763ed6138f570e8e0fadd232f/images/kokkos_wire.pdf -------------------------------------------------------------------------------- /images/kokkos_wire.svg: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 12 | 14 | 17 | 21 | 25 | 29 | 33 | 37 | 41 | 42 | 43 | -------------------------------------------------------------------------------- /images/maison_de_la_simulation.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CExA-project/cheat-sheet-for-kokkos/60c16b3d7a2bc2d763ed6138f570e8e0fadd232f/images/maison_de_la_simulation.png -------------------------------------------------------------------------------- /images/training.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CExA-project/cheat-sheet-for-kokkos/60c16b3d7a2bc2d763ed6138f570e8e0fadd232f/images/training.png -------------------------------------------------------------------------------- /images/tutorial_txt.svg: 
-------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | Tutorial 113 | -------------------------------------------------------------------------------- /images/warning.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CExA-project/cheat-sheet-for-kokkos/60c16b3d7a2bc2d763ed6138f570e8e0fadd232f/images/warning.png -------------------------------------------------------------------------------- /images/warning.svg: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | ! 115 | -------------------------------------------------------------------------------- /images/warning_txt.svg: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | Warning 113 | -------------------------------------------------------------------------------- /install.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Installation cheat sheet for Kokkos 3 | --- 4 | 5 | 6 | 7 | # Kokkos install cheat sheet 8 | 9 | Doc https://kokkos.org/kokkos-core-wiki/get-started/quick-start.html 10 | 11 | Doc https://kokkos.org/kokkos-core-wiki/ProgrammingGuide/Compiling.html 12 | 13 | Doc https://github.com/kokkos/kokkos-tutorials/blob/main/LectureSeries/KokkosTutorial_01_Introduction.pdf 14 | 15 | 16 | 17 | ## Requirements 18 | 19 | ### Compiler 20 | 21 | | Compiler | Minimum version | Notes | 22 | |------------|-----------------|----------| 23 | | ARM Clang | 20.1 | | 24 | | Clang | 10.0.0 | For CUDA | 25 | | Clang | 8.0.0 | For CPU | 26 | | GCC | 8.2.0 | | 27 | | Intel LLVM | 2023.0.0 | For SYCL | 28 | | Intel LLVM | 2021.1.1 | For CPU | 29 | | MSVC | 19.29 | | 30 | | NVCC | 11.0 | | 31 | | NVHPC | 22.3 | | 32 | | ROCM | 5.2.0 | | 33 | 34 | ### Build system 35 | 36 | | Build system | Minimum version | Notes | 37 | 
|--------------|-----------------|-----------------------------| 38 | | CMake | 3.25.2 | For Intel LLVM full support | 39 | | CMake | 3.21.1 | For NVHPC support | 40 | | CMake | 3.18 | For better Fortran linking | 41 | | CMake | 3.16 | | 42 | 43 | 44 | Doc https://kokkos.org/kokkos-core-wiki/get-started/requirements.html 45 | 46 | 47 | ## How to integrate Kokkos 48 | 49 | Note the difference in the version number between `x.y.z` and `x.y.zz`. 50 | 51 | ### As an external dependency 52 | 53 | #### Configure, build and install Kokkos 54 | 55 | ```sh 56 | git clone -b x.y.zz https://github.com/kokkos/kokkos.git 57 | cd kokkos 58 | cmake -B build \ 59 | -DCMAKE_CXX_COMPILER= \ 60 | -DCMAKE_INSTALL_PREFIX=path/to/kokkos/install \ 61 | 62 | cmake --build build 63 | cmake --install build 64 | ``` 65 | 66 | #### Setup, and configure your code 67 | 68 | ```cmake 69 | find_package(Kokkos x.y.z REQUIRED) 70 | target_link_libraries( 71 | my-app 72 | Kokkos::kokkos 73 | ) 74 | ``` 75 | 76 | ```sh 77 | cd path/to/your/code 78 | cmake -B build \ 79 | -DCMAKE_CXX_COMPILER= \ 80 | -DKokkos_ROOT=path/to/kokkos/install 81 | ``` 82 | 83 | 84 | Doc https://kokkos.org/kokkos-core-wiki/get-started/integrating-kokkos-into-your-cmake-project.html#external-kokkos-recommended-for-most-users 85 | Doc https://cmake.org/cmake/help/latest/guide/tutorial/index.html 86 | 87 | 88 | ### As an internal dependency 89 | 90 | #### Setup with a Git submodule 91 | 92 | ```sh 93 | git submodule add -b x.y.zz https://github.com/kokkos/kokkos.git tpls/kokkos 94 | ``` 95 | 96 | ```cmake 97 | add_subdirectory(path/to/kokkos) 98 | target_link_libraries( 99 | my-app 100 | Kokkos::kokkos 101 | ) 102 | ``` 103 | 104 | 105 | Doc https://kokkos.org/kokkos-core-wiki/get-started/integrating-kokkos-into-your-cmake-project.html#embedded-kokkos-via-add-subdirectory-and-git-submodules 106 | Doc https://cmake.org/cmake/help/latest/command/add_subdirectory.html#command:add_subdirectory 107 | Doc 
https://git-scm.com/book/en/v2/Git-Tools-Submodules 108 | 109 | 110 | #### Setup with FetchContent 111 | 112 | ```cmake 113 | include(FetchContent) 114 | FetchContent_Declare( 115 | kokkos 116 | URL https://github.com/kokkos/kokkos/releases/download/x.y.zz/kokkos-x.y.zz.tar.gz 117 | URL_HASH SHA256=<hash for x.y.z archive> 118 | ) 119 | FetchContent_MakeAvailable(kokkos) 120 | target_link_libraries( 121 | my-app 122 | Kokkos::kokkos 123 | ) 124 | ``` 125 | 126 | #### Configure your code 127 | 128 | ```sh 129 | cmake -B build \ 130 | -DCMAKE_CXX_COMPILER= \ 131 | 132 | ``` 133 | 134 | You may combine the external/internal dependency approaches. 135 | 136 | 137 | Doc https://kokkos.org/kokkos-core-wiki/get-started/integrating-kokkos-into-your-cmake-project.html#embedded-kokkos-via-fetchcontent 138 | Doc https://cmake.org/cmake/help/latest/module/FetchContent.html 139 | 140 | 141 | 142 | 143 | ### As an external or internal dependency 144 | 145 | ```cmake 146 | find_package(Kokkos x.y.z QUIET) 147 | if(Kokkos_FOUND) 148 | message(STATUS "Using installed Kokkos in ${Kokkos_DIR}") 149 | else() 150 | message(STATUS "Using Kokkos from ...") 151 | # with either a Git submodule or FetchContent 152 | endif() 153 | ``` 154 | 155 | Depending on whether Kokkos is already installed, you may have to call CMake with `-DKokkos_ROOT`, or with Kokkos compile options. 156 | Note that this setup may not scale for a library; in that case, you should use a package manager instead.
157 | 158 | Doc https://kokkos.org/kokkos-core-wiki/get-started/integrating-kokkos-into-your-cmake-project.html#supporting-both-external-and-embedded-kokkos 159 | 160 | 161 | 162 | 163 | ### As a Spack package 164 | 165 | TODO finish this part 166 | 167 | Doc https://kokkos.org/kokkos-core-wiki/get-started/package-managers.html?highlight=spack#spack-https-spack-io 168 | 169 | 170 | 171 | ## Kokkos compile options 172 | 173 | ### Host backends 174 | 175 | | Option | Backend | 176 | |------------------------------|---------| 177 | | `-DKokkos_ENABLE_SERIAL=ON` | Serial | 178 | | `-DKokkos_ENABLE_OPENMP=ON` | OpenMP | 179 | | `-DKokkos_ENABLE_THREADS=ON` | Threads | 180 | 181 | Warning The serial backend is enabled by default if no other host backend is enabled. 182 | 183 | ### Device backends 184 | 185 | | Option | Backend | Device | 186 | |---------------------------|---------|--------| 187 | | `-DKokkos_ENABLE_CUDA=ON` | CUDA | NVIDIA | 188 | | `-DKokkos_ENABLE_HIP=ON` | HIP | AMD | 189 | | `-DKokkos_ENABLE_SYCL=ON` | SYCL | Intel | 190 | 191 | Warning You can only select the serial backend, plus another host backend and one device backend at a time. 192 | 193 | See [architecture-specific options](#architecture-specific-options). 194 | 195 | ### Specific options 196 | 197 | | Option | Description | 198 | |-----------------------------------------|-----------------------------------------------------------| 199 | | `-DKokkos_ENABLE_DEBUG=ON` | Activate extra debug features, may increase compile times | 200 | | `-DKokkos_ENABLE_DEBUG_BOUNDS_CHECK=ON` | Use bounds checking, will increase runtime | 201 | | `-DKokkos_ENABLE_EXAMPLES=ON` | Build examples | 202 | | `-DKokkos_ENABLE_TUNING=ON` | Create bindings for tuning tools | 203 | 204 | 205 |
206 | Extra options 207 | 208 | | Option | Description | 209 | |--------------------------------------------------|--------------------------------------------| 210 | | `-DKokkos_ENABLE_AGGRESSIVE_VECTORIZATION=ON` | Aggressively vectorize loops | 211 | | `-DKokkos_ENABLE_DEBUG_DUALVIEW_MODIFY_CHECK=ON` | Debug check on dual views | 212 | | `-DKokkos_ENABLE_DEPRECATED_CODE=ON` | Enable deprecated code | 213 | | `-DKokkos_ENABLE_LARGE_MEM_TESTS=ON` | Perform extra large memory tests | 214 | 215 |
216 | 217 | Doc https://kokkos.org/kokkos-core-wiki/get-started/configuration-guide.html 218 | 219 | 220 | ### Architecture-specific options 221 | 222 | #### Host architectures 223 | 224 | Host options are used for controlling optimization and are optional. 225 | 226 | | Option | Architecture | 227 | |---------------------------|--------------| 228 | | `-DKokkos_ARCH_NATIVE=ON` | Local host | 229 | 230 | 231 | 232 |
233 | 234 | 235 | ##### AMD CPU architectures 236 | 237 | 238 | 239 | | Option | Architecture | 240 | |-------------------------|--------------| 241 | | `-DKokkos_ARCH_ZEN4=ON` | Zen4 | 242 | | `-DKokkos_ARCH_ZEN3=ON` | Zen3 | 243 | | `-DKokkos_ARCH_ZEN2=ON` | Zen2 | 244 | | `-DKokkos_ARCH_ZEN=ON` | Zen | 245 | 246 |
247 | 248 |
249 | 250 | 251 | ##### ARM CPU architectures 252 | 253 | 254 | 255 | | Option | Architecture | 256 | |--------------------------------|--------------------------| 257 | | `-DKokkos_ARCH_ARMV9_GRACE=ON` | Grace | 258 | | `-DKokkos_ARCH_A64FX=ON` | ARMv8.2 with SVE Support | 259 | | `-DKokkos_ARCH_ARMV81=ON` | ARMV8.1 | 260 | | `-DKokkos_ARCH_ARMV80=ON` | ARMV8.0 | 261 | 262 |
263 | 264 |
 265 | 266 | 267 | ##### Intel CPU architectures 268 | 269 | 270 | 271 | | Option | Architecture | 272 | |------------------------|-----------------------| 273 | | `-DKokkos_ARCH_SPR=ON` | Sapphire Rapids | 274 | | `-DKokkos_ARCH_SKX=ON` | Skylake | 275 | | `-DKokkos_ARCH_BDW=ON` | Intel Broadwell | 276 | | `-DKokkos_ARCH_HSW=ON` | Intel Haswell | 277 | | `-DKokkos_ARCH_KNL=ON` | Intel Knights Landing | 278 | | `-DKokkos_ARCH_SNB=ON` | Sandy Bridge | 279 | 280 |
281 | 282 |
283 | 284 | 285 | ##### RISC-V CPU architectures 286 | 287 | 288 | 289 | | Option | Architecture | 290 | |---------------------------------|--------------| 291 | | `-DKokkos_ARCH_RISCV_RVA22V=ON` | RVA22V | 292 | 293 |
294 | 295 | 296 | 297 | #### Device architectures 298 | 299 | Device options are mandatory. 300 | They can be deduced from the device if present at CMake configuration time. 301 | 302 |
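As a minimal sketch of the deduction described above, the following configuration enables a device backend without any `-DKokkos_ARCH_*` option, so that the architecture is left to be deduced at CMake configuration time. It assumes the command is run from a Kokkos source tree on a machine where the target GPU is visible; the CUDA backend is only an illustrative choice here.

```sh
# Sketch: no -DKokkos_ARCH_* flag is passed, so the device architecture
# is left to be deduced at CMake configuration time (needs a visible device)
cmake \
    -B build \
    -DCMAKE_BUILD_TYPE=Release \
    -DKokkos_ENABLE_CUDA=ON
```

When configuring on a machine where the target device is not present, pass the matching architecture option explicitly instead.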
303 | 304 | 305 | ##### AMD GPU architectures (HIP) 306 | 307 | 308 | 309 | | Option | Architecture | Associated cards | 310 | |-----------------------------------|--------------|--------------------------------------------------| 311 | | `-DKokkos_ARCH_AMD_GFX942_APU=ON` | GFX942 APU | MI300A | 312 | | `-DKokkos_ARCH_AMD_GFX942=ON` | GFX942 | MI300X | 313 | | `-DKokkos_ARCH_AMD_GFX90A=ON` | GFX90A | MI210, MI250, MI250X | 314 | | `-DKokkos_ARCH_AMD_GFX908=ON` | GFX908 | MI100 | 315 | | `-DKokkos_ARCH_AMD_GFX906=ON` | GFX906 | MI50, MI60 | 316 | | `-DKokkos_ARCH_AMD_GFX1103=ON` | GFX1103 | Ryzen 8000G, Radeon 740M, 760M, 780M, 880M, 980M | 317 | | `-DKokkos_ARCH_AMD_GFX1100=ON` | GFX1100 | 7900xt | 318 | | `-DKokkos_ARCH_AMD_GFX1030=ON` | GFX1030 | V620, W6800 | 319 | 320 | 321 | 322 | | Option | Description | 323 | |---------------------------------------------------------|-----------------------------------------------------------------------------------------------| 324 | | `-DKokkos_ENABLE_HIP_MULTIPLE_KERNEL_INSTANTIATIONS=ON` | Instantiate multiple kernels at compile time, improves performance but increases compile time | 325 | | `-DKokkos_ENABLE_HIP_RELOCATABLE_DEVICE_CODE=ON` | Enable Relocatable Device Code (RDC) for HIP | 326 | 327 | 328 | 329 |
330 | 331 |
332 | 333 | 334 | ##### Intel GPU architectures (SYCL) 335 | 336 | 337 | 338 | | Option | Architecture | 339 | |--------------------------------|-------------------------| 340 | | `-DKokkos_ARCH_INTEL_GEN=ON` | Generic JIT | 341 | | `-DKokkos_ARCH_INTEL_XEHP=ON` | Xe-HP | 342 | | `-DKokkos_ARCH_INTEL_PVC=ON` | GPU Max (Ponte Vecchio) | 343 | | `-DKokkos_ARCH_INTEL_DG1=ON` | Iris XeMAX | 344 | | `-DKokkos_ARCH_INTEL_GEN12=ON` | Gen12 | 345 | | `-DKokkos_ARCH_INTEL_GEN11=ON` | Gen11 | 346 | 347 | 348 | 349 | | Option | Description | 350 | |---------------------------------------------------|-----------------------------------------------| 351 | | `-DKokkos_ENABLE_SYCL_RELOCATABLE_DEVICE_CODE=ON` | Enable Relocatable Device Code (RDC) for SYCL | 352 | 353 | 354 | 355 |
356 | 357 |
 358 | 359 | 360 | ##### NVIDIA GPU architectures (CUDA) 361 | 362 | 363 | 364 | | Option | Architecture | CC | Associated cards | 365 | |------------------------------|--------------|-----|--------------------------------------------------------| 366 | | `-DKokkos_ARCH_HOPPER90=ON` | Hopper | 9.0 | H200, H100 | 367 | | `-DKokkos_ARCH_ADA89=ON` | Ada | 8.9 | GeForce RTX 40 series, RTX 6000/5000 series, L4, L40 | 368 | | `-DKokkos_ARCH_AMPERE86=ON` | Ampere | 8.6 | GeForce RTX 30 series, RTX A series, A40, A10, A16, A2 | 369 | | `-DKokkos_ARCH_AMPERE80=ON` | Ampere | 8.0 | A100, A30 | 370 | | `-DKokkos_ARCH_TURING75=ON` | Turing | 7.5 | T4 | 371 | | `-DKokkos_ARCH_VOLTA72=ON` | Volta | 7.2 | | 372 | | `-DKokkos_ARCH_VOLTA70=ON` | Volta | 7.0 | V100 | 373 | | `-DKokkos_ARCH_PASCAL61=ON` | Pascal | 6.1 | P6, P40, P4 | 374 | | `-DKokkos_ARCH_PASCAL60=ON` | Pascal | 6.0 | P100 | 375 | | `-DKokkos_ARCH_MAXWELL53=ON` | Maxwell | 5.3 | | 376 | | `-DKokkos_ARCH_MAXWELL52=ON` | Maxwell | 5.2 | M6, M60, M4, M40 | 377 | | `-DKokkos_ARCH_MAXWELL50=ON` | Maxwell | 5.0 | M10 | 378 | 379 | 380 | 381 | Doc See NVIDIA documentation on Compute Capability (CC): https://developer.nvidia.com/cuda-gpus 382 | 383 | | Option | Description | 384 | |---------------------------------------------------|---------------------------------------------------| 385 | | `-DKokkos_ENABLE_CUDA_CONSTEXPR=ON` | Activate experimental relaxed constexpr functions | 386 | | `-DKokkos_ENABLE_CUDA_LAMBDA=ON` | Activate experimental lambda features | 387 | | `-DKokkos_ENABLE_CUDA_LDG_INTRINSIC=ON` | Use CUDA LDG intrinsics | 388 | | `-DKokkos_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE=ON` | Enable relocatable device code (RDC) for CUDA | 389 | 390 | 391 | 392 |
 393 | 394 | 395 | 396 | ### Third-party Libraries (TPLs) 397 | 398 | Doc https://kokkos.org/kokkos-core-wiki/get-started/configuration-guide.html#keywords-tpls 399 | 400 | 401 | ### Examples for the most common architectures 402 | 403 | #### Current CPU with OpenMP 404 | 405 | ```sh 406 | cmake \ 407 | -B build \ 408 | -DCMAKE_BUILD_TYPE=Release \ 409 | -DKokkos_ENABLE_OPENMP=ON \ 410 | -DKokkos_ARCH_NATIVE=ON 411 | ``` 412 | 413 | #### AMD MI300A APU with HIP 414 | 415 | ```sh 416 | export HSA_XNACK=1 417 | cmake \ 418 | -B build \ 419 | -DCMAKE_CXX_COMPILER=hipcc \ 420 | -DCMAKE_BUILD_TYPE=Release \ 421 | -DKokkos_ENABLE_HIP=ON \ 422 | -DKokkos_ARCH_AMD_GFX942_APU=ON 423 | ``` 424 | 425 | The `HSA_XNACK=1` environment variable is required to access host allocations from the device. 426 | 427 | #### AMD MI250 GPU with HIP 428 | 429 | ```sh 430 | cmake \ 431 | -B build \ 432 | -DCMAKE_CXX_COMPILER=hipcc \ 433 | -DCMAKE_BUILD_TYPE=Release \ 434 | -DKokkos_ENABLE_HIP=ON \ 435 | -DKokkos_ARCH_AMD_GFX90A=ON 436 | ``` 437 | 438 | #### Intel GPU Max 1550 (Ponte Vecchio) with SYCL 439 | 440 | ```sh 441 | cmake \ 442 | -B build \ 443 | -DCMAKE_CXX_COMPILER=icpx \ 444 | -DCMAKE_BUILD_TYPE=Release \ 445 | -DKokkos_ENABLE_SYCL=ON \ 446 | -DKokkos_ARCH_INTEL_PVC=ON \ 447 | -DCMAKE_CXX_FLAGS="-fp-model=precise" 448 | ``` 449 | 450 | The last option (`-fp-model=precise`) is required for the precision of math operators.
451 | 452 | #### NVIDIA H100 GPU with CUDA 453 | 454 | ```sh 455 | cmake \ 456 | -B build \ 457 | -DCMAKE_BUILD_TYPE=Release \ 458 | -DKokkos_ENABLE_CUDA=ON \ 459 | -DKokkos_ARCH_HOPPER90=ON 460 | ``` 461 | 462 | #### NVIDIA A100 GPU with CUDA 463 | 464 | ```sh 465 | cmake \ 466 | -B build \ 467 | -DCMAKE_BUILD_TYPE=Release \ 468 | -DKokkos_ENABLE_CUDA=ON \ 469 | -DKokkos_ARCH_AMPERE80=ON 470 | ``` 471 | 472 | 473 | Code For more code examples: 474 | 475 | - https://github.com/kokkos/kokkos/tree/master/example/build_cmake_installed 476 | - https://github.com/kokkos/kokkos/tree/master/example/build_cmake_installed_different_compiler 477 | 478 | -------------------------------------------------------------------------------- /patches/print/install.tex.diff: -------------------------------------------------------------------------------- 1 | --- .install.tex 2025-04-29 15:05:59.861833564 +0200 2 | +++ install.tex 2025-04-29 15:09:46.159589195 +0200 3 | @@ -591,7 +591,7 @@ 4 | \KeywordTok{include}\NormalTok{(FetchContent)} 5 | \FunctionTok{FetchContent\_Declare}\NormalTok{(} 6 | \NormalTok{ kokkos} 7 | -\NormalTok{ URL https://github.com/kokkos/kokkos/releases/download/x.y.zz/kokkos{-}x.y.zz.tar.gz} 8 | +\NormalTok{ URL https://github.com/kokkos/kokkos/releases/download/x.y.zz\break{}/kokkos{-}x.y.zz.tar.gz} 9 | \NormalTok{ URL\_HASH SHA256=\textless{}hash for x.y.z archive\textgreater{}} 10 | \NormalTok{)} 11 | \FunctionTok{FetchContent\_MakeAvailable}\NormalTok{(kokkos)} 12 | @@ -709,7 +709,7 @@ 13 | \begin{tabularx}{\linewidth}{llX} 14 | \toprule 15 | 16 | -\tblhead{Option} & \tblhead{Architecture} & \tblhead{Associated 17 | +\tblhead{Option} & \tblhead{Arch.} & \tblhead{Associated 18 | cards} \\ \midrule 19 | 20 | \texttt{-DKokkos\_ARCH\_AMD\_GFX942\_APU=ON} & GFX942 APU & MI300A \\ 21 | @@ -752,7 +752,7 @@ 22 | \begin{tabularx}{\linewidth}{lllX} 23 | \toprule 24 | 25 | -\tblhead{Option} & \tblhead{Architecture} & \tblhead{CC} & \tblhead{Associated 26 | 
+\tblhead{Option} & \tblhead{Arch.} & \tblhead{CC} & \tblhead{Associated 27 | cards} \\ \midrule 28 | 29 | \texttt{-DKokkos\_ARCH\_HOPPER90=ON} & Hopper & 9.0 & H200, H100 \\ 30 | @@ -766,6 +766,15 @@ 31 | \texttt{-DKokkos\_ARCH\_VOLTA70=ON} & Volta & 7.0 & V100 \\ 32 | \texttt{-DKokkos\_ARCH\_PASCAL61=ON} & Pascal & 6.1 & P6, P40, P4 \\ 33 | \texttt{-DKokkos\_ARCH\_PASCAL60=ON} & Pascal & 6.0 & P100 \\ 34 | +\bottomrule 35 | +\end{tabularx} 36 | + 37 | +\begin{tabularx}{\linewidth}{lllX} 38 | +\toprule 39 | + 40 | +\tblhead{Option} & \tblhead{Arch.} & \tblhead{CC} & \tblhead{Associated 41 | +cards} \\ \midrule 42 | + 43 | \texttt{-DKokkos\_ARCH\_MAXWELL53=ON} & Maxwell & 5.3 & \\ 44 | \texttt{-DKokkos\_ARCH\_MAXWELL52=ON} & Maxwell & 5.2 & M6, M60, M4, 45 | M40 \\ 46 | -------------------------------------------------------------------------------- /patches/print/terminology.tex.diff: -------------------------------------------------------------------------------- 1 | --- .terminology.tex 2025-05-16 15:52:10.918644901 +0200 2 | +++ terminology.tex 2025-05-16 15:52:34.345787628 +0200 3 | @@ -430,8 +430,6 @@ 4 | 5 | \relscale{\textratio} 6 | 7 | -\begin{multicols}{2} 8 | - 9 | \hypertarget{kokkos-mapping}{% 10 | \subsection{Kokkos mapping}\label{kokkos-mapping}} 11 | 12 | @@ -439,7 +437,7 @@ 13 | \subsubsection{Hierarchical 14 | parallelism}\label{hierarchical-parallelism}} 15 | 16 | -\begin{tabularx}{\linewidth}{XlXl} 17 | +\begin{tabularx}{\linewidth}{lXlX} 18 | \toprule 19 | 20 | \tblhead{Kokkos} & \tblhead{Cuda} & \tblhead{HIP} & \tblhead{SYCL} \\ \midrule 21 | @@ -462,7 +460,7 @@ 22 | \hypertarget{memory}{% 23 | \subsubsection{Memory}\label{memory}} 24 | 25 | -\begin{tabularx}{\linewidth}{lXXX} 26 | +\begin{tabularx}{\linewidth}{lllX} 27 | \toprule 28 | 29 | \tblhead{Kokkos} & \tblhead{Cuda} & \tblhead{HIP} & \tblhead{SYCL} \\ \midrule 30 | @@ -484,7 +482,7 @@ 31 | \hypertarget{execution}{% 32 | \subsubsection{Execution}\label{execution}} 33 | 34 | 
-\begin{tabularx}{\linewidth}{Xlll} 35 | +\begin{tabularx}{\linewidth}{lXXX} 36 | \toprule 37 | 38 | \tblhead{Kokkos} & \tblhead{Cuda} & \tblhead{HIP} & \tblhead{SYCL} \\ \midrule 39 | @@ -514,8 +512,6 @@ 40 | \bottomrule 41 | \end{tabularx} 42 | 43 | -\end{multicols} 44 | - 45 | % add a watermark here 46 | 47 | % \watermark 48 | -------------------------------------------------------------------------------- /patches/print/utilization.tex.diff: -------------------------------------------------------------------------------- 1 | --- .utilization.tex 2025-05-16 14:59:22.520240134 +0200 2 | +++ utilization.tex 2025-05-16 15:45:59.825339129 +0200 3 | @@ -531,13 +531,12 @@ 4 | \hypertarget{generic-memory-spaces}{% 5 | \paragraph{Generic memory spaces}\label{generic-memory-spaces}} 6 | 7 | -\begin{tabularx}{\linewidth}{Xll} 8 | +\begin{tabularx}{\linewidth}{lXX} 9 | \toprule 10 | 11 | -\tblhead{Memory space} & \tblhead{Device backend} & \tblhead{Host 12 | -backend} \\ \midrule 13 | +\tblhead{Memory space} & \tblhead{Device backend} & \tblhead{Host backend} \\ \midrule 14 | 15 | -\texttt{Kokkos::DefaultExecutionSpace::memory\_space} & On device & On 16 | +\texttt{Kokkos::DefaultExecutionSpace::memory\_space} & On dev. 
& On 17 | host \\ 18 | \texttt{Kokkos::DefaultHostExecutionSpace::memory\_space} & On host & On 19 | host \\ 20 | @@ -581,7 +580,7 @@ 21 | \begin{tabularx}{\linewidth}{lX} 22 | \toprule 23 | 24 | -\tblhead{Template argument} & \tblhead{Description} \\ \midrule 25 | +\tblhead{Template arg.} & \tblhead{Description} \\ \midrule 26 | 27 | \texttt{DataType} & \texttt{ScalarType} for the data type, followed by a 28 | \texttt{*} for each runtime dimension, then by a 29 | @@ -778,7 +777,7 @@ 30 | \begin{tabularx}{\linewidth}{lX} 31 | \toprule 32 | 33 | -\tblhead{Template argument} & \tblhead{Description} \\ \midrule 34 | +\tblhead{Template arg.} & \tblhead{Description} \\ \midrule 35 | 36 | \texttt{Operation} & See \protect\hyperlink{scatter-operation}{scatter 37 | operation}; defaults to \texttt{Kokkos::Experimental::ScatterSum} \\ 38 | @@ -942,7 +941,7 @@ 39 | \begin{tabularx}{\linewidth}{lX} 40 | \toprule 41 | 42 | -\tblhead{Template argument} & \tblhead{Description} \\ \midrule 43 | +\tblhead{Template arg.} & \tblhead{Description} \\ \midrule 44 | 45 | \texttt{ExecutionSpace} & See 46 | \protect\hyperlink{execution-spaces}{execution spaces}; defaults to 47 | @@ -1068,7 +1067,6 @@ 48 | \NormalTok{ Kokkos::TeamPolicy(numberOfElementsI, Kokkos::AUTO),} 49 | \NormalTok{ KOKKOS\_LAMBDA (}\AttributeTok{const}\NormalTok{ Kokkos::TeamPolicy\textless{}\textgreater{}::}\DataTypeTok{member\_type}\NormalTok{\& teamMember) \{} 50 | \AttributeTok{const} \DataTypeTok{int}\NormalTok{ i = teamMember.team\_rank();} 51 | - 52 | \NormalTok{ Kokkos::parallel\_for(} 53 | \NormalTok{ Kokkos::TeamThreadRange(teamMember, firstJ, lastJ),} 54 | \NormalTok{ [=] (}\AttributeTok{const} \DataTypeTok{int}\NormalTok{ j) \{} 55 | @@ -1180,6 +1178,13 @@ 56 | \texttt{Kokkos::atomic\_lshift(\&x,\ y)} & \texttt{x\ =\ x\ \textless{}\textless{}\ y} \\ 57 | \texttt{Kokkos::atomic\_max(\&x,\ y)} & \texttt{x\ =\ std::max(x,\ y)} \\ 58 | \texttt{Kokkos::atomic\_min(\&x,\ y)} & \texttt{x\ =\ std::min(x,\ y)} \\ 
59 | +\bottomrule 60 | +\end{tabularx} 61 | + 62 | +\begin{tabularx}{\linewidth}{Xl} 63 | +\toprule 64 | + 65 | +\tblhead{Operation} & \tblhead{Replaces} \\ \midrule 66 | \texttt{Kokkos::atomic\_mod(\&x,\ y)} & \texttt{x\ \%=\ y} \\ 67 | \texttt{Kokkos::atomic\_nand(\&x,\ y)} & \texttt{x\ =\ !(x\ \&\&\ y)} \\ 68 | \texttt{Kokkos::atomic\_or(\&x,\ y)} & \texttt{x\ \textbar{}=\ y} \\ 69 | @@ -1191,10 +1196,13 @@ 70 | \bottomrule 71 | \end{tabularx} 72 | 73 | +% extra space 74 | +\vspace{-0.4em} 75 | + 76 | \hypertarget{atomic-exchanges}{% 77 | \subsubsection{Atomic exchanges}\label{atomic-exchanges}} 78 | 79 | -\begin{tabularx}{\linewidth}{lX} 80 | +\begin{tabularx}{\linewidth}{XX} 81 | \toprule 82 | 83 | \tblhead{Operation} & \tblhead{Description} \\ \midrule 84 | @@ -1208,19 +1216,22 @@ 85 | \bottomrule 86 | \end{tabularx} 87 | 88 | +% extra space 89 | +\vspace{-0.4em} 90 | + 91 | \hypertarget{mathematics}{% 92 | \subsection{Mathematics}\label{mathematics}} 93 | 94 | \hypertarget{math-functions}{% 95 | \subsubsection{Math functions}\label{math-functions}} 96 | 97 | -\begin{tabularx}{\linewidth}{XX} 98 | +\begin{tabularx}{\linewidth}{Xl} 99 | \toprule 100 | 101 | \tblhead{Function type} & \tblhead{List of functions (prefixed by 102 | \texttt{Kokkos::})} \\ \midrule 103 | 104 | -Basic operations & \texttt{abs}, \texttt{fabs}, \texttt{fmod}, 105 | +Basic ops. 
& \texttt{abs}, \texttt{fabs}, \texttt{fmod}, 106 | \texttt{remainder}, \texttt{fma}, \texttt{fmax}, \texttt{fmin}, 107 | \texttt{fdim}, \texttt{nan} \\ 108 | Exponential & \texttt{exp}, \texttt{exp2}, \texttt{expm1}, \texttt{log}, 109 | @@ -1230,7 +1241,7 @@ 110 | \texttt{acos}, \texttt{atan}, \texttt{atan2} \\ 111 | Hyperbolic & \texttt{sinh}, \texttt{cosh}, \texttt{tanh}, 112 | \texttt{asinh}, \texttt{acosh}, \texttt{atanh} \\ 113 | -Error and gamma & \texttt{erf}, \texttt{erfc}, \texttt{tgamma}, 114 | +Error, gamma & \texttt{erf}, \texttt{erfc}, \texttt{tgamma}, 115 | \texttt{lgamma} \\ 116 | Nearest & \texttt{ceil}, \texttt{floor}, \texttt{trunc}, \texttt{round}, 117 | \texttt{nearbyint} \\ 118 | @@ -1243,6 +1254,9 @@ 119 | 120 | Note that not all C++ standard math functions are available. 121 | 122 | +% extra space 123 | +\vspace{-0.4em} 124 | + 125 | \hypertarget{complex-numbers}{% 126 | \subsubsection{Complex numbers}\label{complex-numbers}} 127 | 128 | @@ -1255,6 +1269,9 @@ 129 | \end{Highlighting} 130 | \end{Shaded} 131 | 132 | +% extra space 133 | +\vspace{-0.4em} 134 | + 135 | \hypertarget{manage-1}{% 136 | \paragraph{Manage}\label{manage-1}} 137 | 138 | @@ -1269,6 +1286,9 @@ 139 | \bottomrule 140 | \end{tabularx} 141 | 142 | +% extra space 143 | +\vspace{-0.4em} 144 | + 145 | \hypertarget{utilities}{% 146 | \subsection{Utilities}\label{utilities}} 147 | 148 | -------------------------------------------------------------------------------- /personna.md: -------------------------------------------------------------------------------- 1 | # Personna 2 | 3 | | Name | Computer science | Plasticity | Experience | Time available | 4 | |------------|------------------|------------|------------|----------------| 5 | | Alphonse | 1 | 1 | 4 | 5 | 6 | | Béatrice | 4 | 4 | 3 | 2 | 7 | | Christophe | 5 | 2 | 5 | 3 | 8 | | Dolorès | 3 | 5 | 1 | 1 | 9 | | Estéban | 4 | 3 | 4 | 3 | 10 | | Félicie | 5 | 3 | 4 | 2 | 11 | 12 | Note: 1 is low, 5 is high. 
 13 | 14 | ## Alphonse 15 | 16 | Nuclear physicist with limited programming knowledge, he has been using Fortran for thirty years to implement mathematical and physical models. 17 | 18 | He has to use Kokkos for his three-dimensional code to run on GPU, and will likely hire some help for this task. 19 | 20 | ## Béatrice 21 | 22 | CFD programmer in the industry, she has been using Kokkos for one year on two different projects that run on CPU and GPU. 23 | 24 | She wants to code with Kokkos efficiently without getting lost in the documentation. 25 | 26 | ## Christophe 27 | 28 | Former CUDA developer for six years, he specializes in GPU code tuning. 29 | 30 | He wants to transition to Kokkos to target other GPUs without having to learn their specific language. 31 | 32 | ## Dolorès 33 | 34 | Undergraduate student in computer science, she knows GPU programming, machine learning, and various modern stacks. 35 | 36 | She wants to learn Kokkos for her six-month internship on GPU computing. 37 | 38 | ## Estéban 39 | 40 | Mid-career physicist who has worked on solver development for fifteen years, he has a couple of years of experience in Kokkos too. 41 | 42 | He wants to rewrite an existing code from scratch using Kokkos and make it as clean and portable as possible. 43 | 44 | ## Félicie 45 | 46 | System engineer who manages several clusters and supports the CI of various projects. 47 | 48 | She wants to deploy Kokkos for cluster users and for CI workflows.
49 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | pandocfilters>=1.5.0,<1.6.0 2 | -------------------------------------------------------------------------------- /styles/after_body/print.tex: -------------------------------------------------------------------------------- 1 | \end{multicols} 2 | 3 | % add a watermark here 4 | 5 | % \watermark 6 | 7 | % add a notes section here 8 | 9 | % \notessection 10 | 11 | % add a notes section followed by a watermark here 12 | 13 | % \notessection{\watermark} 14 | -------------------------------------------------------------------------------- /styles/before_body/print.tex: -------------------------------------------------------------------------------- 1 | \relscale{\textratio} 2 | 3 | \begin{multicols}{2} 4 | -------------------------------------------------------------------------------- /styles/header/print.tex: -------------------------------------------------------------------------------- 1 | % custom header 2 | 3 | \usepackage[hmargin=0.5cm, vmargin=0cm, includeheadfoot]{geometry} 4 | \usepackage{multicol} 5 | \usepackage{mdframed} 6 | \usepackage{fancyhdr} 7 | \usepackage{calc} 8 | \usepackage[nobottomtitles*]{titlesec} 9 | \usepackage{tabularx} 10 | \usepackage{url} 11 | \usepackage{graphicx} 12 | \usepackage{fvextra} 13 | \usepackage{ifthen} 14 | \usepackage{relsize} 15 | 16 | \definecolor{kokkos}{RGB}{30, 120, 152} 17 | 18 | \colorlet{main}{kokkos!80} 19 | \colorlet{lightmain}{kokkos!30} 20 | \colorlet{lightgray}{gray!20} 21 | \colorlet{darkgray}{gray!70!black} 22 | 23 | \renewcommand{\familydefault}{\sfdefault} 24 | 25 | % reset maketitle 26 | 27 | \renewcommand\maketitle{} 28 | 29 | % title 30 | 31 | \makeatletter 32 | \def\gettitle{\@title} 33 | \makeatother 34 | 35 | % images 36 | 37 | \graphicspath{{images/}} 38 | 39 | % headers and footers 40 | 41 | 
\newlength{\headerheight} 42 | \newlength{\footerheight} 43 | \newlength{\headerheightreduction} 44 | \newlength{\footerheightreduction} 45 | \newlength{\horizontalskip} 46 | \newlength{\printmargin} 47 | \newlength{\printmarginwidth} 48 | \newlength{\printmarginheight} 49 | \newlength{\marginwidth} 50 | \newlength{\marginwidthreduction} 51 | \newlength{\footheight} 52 | \newlength{\headerfooterskip} 53 | \newlength{\headerfooterskipreduction} 54 | 55 | % print sizes tuning 56 | 57 | % change here to add print margin 58 | % this adds \printmargin on the outer horizontal margin (think two-sided document) 59 | % and 0.75\printmargin on the top and bottom vertical margins 60 | % in other words, this adds \printmargin horizontally and 1.5\printmargin vertically 61 | % note that using this variable will certainly mess with the page layout! 62 | % to counterbalance this, you can tune the other lengths defined below 63 | % a good first guess is 0.4cm 64 | \setlength{\printmargin}{0cm} 65 | 66 | % change here to reduce the effect of \printmargin on the outer horizontal margin for the text body only 67 | % a good first guess is 0.2cm 68 | \setlength{\marginwidthreduction}{0cm} 69 | 70 | % change here to reduce the space between the text and the header/footer 71 | % a good first guess is 0.2cm 72 | \setlength{\headerfooterskipreduction}{0cm} 73 | 74 | % change here to reduce the height of the header 75 | % a good first guess is 0.2cm 76 | \setlength{\headerheightreduction}{0cm} 77 | 78 | % change here to reduce the height of the footer 79 | % a good first guess is 0.2cm 80 | \setlength{\footerheightreduction}{0cm} 81 | 82 | % change here to put a factor ratio on text size 83 | % change this value if nothing else helps! 
84 | % a good first guess is 0.95 85 | \newcommand{\textratio}{1} 86 | 87 | \fancypagestyle{kokkos}{% 88 | \setlength{\headerheight}{2cm - \headerheightreduction} 89 | \setlength{\footerheight}{0.5cm - \footerheightreduction} 90 | \setlength{\horizontalskip}{0.25cm} 91 | \setlength{\marginwidth}{0.5cm} 92 | \setlength{\headerfooterskip}{0.5cm - \headerfooterskipreduction} 93 | \setlength{\printmarginwidth}{\printmargin} 94 | \setlength{\printmarginheight}{0.75\printmargin} 95 | 96 | \renewcommand{\headrulewidth}{0pt} 97 | \renewcommand{\footrulewidth}{0pt} 98 | 99 | \newgeometry{ 100 | inner=\marginwidth, 101 | outer=\marginwidth + \printmarginwidth - \marginwidthreduction, 102 | vmargin=0cm, 103 | includeheadfoot, 104 | headheight=\headerheight + \printmarginheight, 105 | headsep=\headerfooterskip, 106 | footskip=\headerfooterskip + \footerheight + \printmarginheight, 107 | } 108 | 109 | % set missing footheight 110 | \setlength{\footheight}{\footerheight + \printmarginheight} 111 | 112 | % set offsets for the header and footer blocks to have exactly the width of the page 113 | \fancyhfoffset[RE,LO]{\marginwidth} 114 | \fancyhfoffset[LE,RO]{\marginwidth + \printmarginwidth - \marginwidthreduction} 115 | 116 | \fancyhf{} 117 | 118 | \fancyhead{% 119 | % Add the title, the page number, and the version 120 | \setlength{\fboxsep}{0pt}% 121 | \colorbox{main}{% 122 | % logo with continuation 123 | \parbox[b][\headheight][b]{\headheight}{% 124 | % add left continuation of the logo 125 | \ifthenelse{\isodd{\value{page}}}{% 126 | }{% 127 | \colorbox{lightmain}{% 128 | \parbox[b][\headheight]{\printmargin}{~}% 129 | }% 130 | }% 131 | \begin{minipage}[b][\headheight][b]{\headerheight} 132 | % add top continuation of the logo 133 | \colorbox{lightmain}{% 134 | \parbox[b][\printmargin][b]{ 135 | 0.575\headerheight% manual adjust logo height 136 | }{~}% 137 | }% 138 | \vspace{-0.04cm} % manual adjust vertical gap 139 | \\ 140 | \includegraphics[ 141 | height=\headerheight, 142 
| ]{images/kokkos_wire.pdf}% 143 | \end{minipage} 144 | }% 145 | % remove logo horizontal space 146 | \hspace{-\headheight} 147 | % title and page number 148 | \parbox[b][\headerheight][c]{\headwidth}{% 149 | \fancycenter{% 150 | % add print margin 151 | \ifthenelse{\isodd{\value{page}}}{% 152 | }{% 153 | \hspace{\printmarginwidth}% 154 | }% 155 | % add custom space for logo 156 | \hspace{0.4\headerheight} 157 | % title 158 | {\Huge\bfseries{}\href{https://kokkos.org/kokkos-core-wiki}{\gettitle}}% 159 | % page number 160 | {\Large\bfseries\color{lightmain}\ :: page \thepage}% 161 | }{}{% 162 | % version 163 | \small\color{lightmain}v\input{VERSION}% 164 | % add small margin 165 | \hspace{\horizontalskip} 166 | % add print margin 167 | \ifthenelse{\isodd{\value{page}}}{% 168 | \hspace{\printmarginwidth}% 169 | }{}% 170 | }% 171 | }% 172 | }% 173 | }% 174 | 175 | \fancyfoot{% 176 | % Add the Kokkos wiki link 177 | \setlength{\fboxsep}{0pt}% 178 | \colorbox{main}{% 179 | \parbox[b][\footheight][t]{\headwidth}{% 180 | \parbox[t][\footerheight][c]{\headwidth}{% 181 | % add print margin 182 | \ifthenelse{\isodd{\value{page}}}{% 183 | }{% 184 | \hspace{\printmarginwidth}% 185 | }% 186 | % add small margin 187 | \hspace{\horizontalskip}% 188 | % link 189 | \footnotesize\bfseries\color{lightmain}\strut\url{https://kokkos.org/kokkos-core-wiki}% 190 | }% 191 | }% 192 | }% 193 | }% 194 | } 195 | 196 | \pagestyle{kokkos} 197 | 198 | % sections 199 | 200 | \titleformat{\subsection}[display] 201 | {\Large\bfseries} 202 | {} 203 | {0pt} 204 | {\subsectionstyle} 205 | 206 | \titleformat{\subsubsection}[display] 207 | {\large\bfseries} 208 | {} 209 | {0pt} 210 | {\subsubsectionstyle} 211 | [\subsubsectionstyleextra] 212 | 213 | \titleformat{\paragraph}[display] 214 | {\normalsize\bfseries} 215 | {} 216 | {0pt} 217 | {\paragraphstyle} 218 | 219 | \titleformat{\subparagraph}[display] 220 | {\small\bfseries} 221 | {} 222 | {0pt} 223 | {\subparagraphstyle} 224 | 225 | 
\newcommand{\subsectionstyle}[1]{% 226 | \colorbox{main}{\parbox{\dimexpr\linewidth-2\fboxsep}{\strut#1}}% 227 | } 228 | 229 | \newcommand{\subsubsectionstyle}[1]{% 230 | \strut#1% 231 | } 232 | 233 | \newcommand{\subsubsectionstyleextra}{% 234 | \titleline{\color{main}\titlerule}% 235 | } 236 | 237 | \newcommand{\paragraphstyle}[1]{% 238 | #1% 239 | } 240 | 241 | \newcommand{\subparagraphstyle}[1]{% 242 | {\color{main}#1}% 243 | } 244 | 245 | % tables 246 | 247 | \newcommand{\toprule}{% 248 | \rowcolor{lightmain} 249 | } 250 | \newcommand{\midrule}{% 251 | } 252 | \newcommand{\bottomrule}{% 253 | } 254 | 255 | \newcommand{\tblhead}[1]{#1} 256 | 257 | \renewcommand{\arraystretch}{1.25} 258 | 259 | \AtBeginEnvironment{tabularx}{% 260 | \rowcolors*{2}{lightgray}{}% 261 | } 262 | 263 | % code block 264 | 265 | \newcommand\codesize\scriptsize 266 | 267 | \DefineVerbatimEnvironment{Highlighting}{Verbatim}{ 268 | breaklines, 269 | breakanywhere, 270 | commandchars=\\\{\} 271 | } 272 | 273 | \usepackage{tcolorbox} 274 | \tcbuselibrary{ 275 | breakable, 276 | skins, 277 | } 278 | 279 | \ifcsmacro{Shaded}{ 280 | \renewenvironment{Shaded}[1][]{% 281 | \begin{tcolorbox}[ 282 | skin=tilemiddle, 283 | colback=lightgray, 284 | leftrule=0pt, 285 | rightrule=0pt, 286 | toprule=0pt, 287 | bottomrule=0pt, 288 | fontupper=\codesize, 289 | before skip=0.5em, 290 | after skip=0.5em, 291 | left skip=0pt, 292 | right skip=0pt, 293 | top=\fboxsep, 294 | bottom=\fboxsep, 295 | left=\fboxsep, 296 | right=\fboxsep , 297 | pad at break=0.75\fboxsep, 298 | #1, 299 | ]% 300 | }{% 301 | \end{tcolorbox} 302 | } 303 | 304 | \colorlet{codemain}{kokkos} 305 | \colorlet{codelightmain}{kokkos!70} 306 | \colorlet{codegray}{gray!90!black} 307 | 308 | \renewcommand{\AttributeTok}[1]{{\color{codelightmain}#1}} 309 | \renewcommand{\BuiltInTok}[1]{{\color{codemain}#1}} 310 | \renewcommand{\CommentTok}[1]{{\color{codegray}\textit{#1}}} 311 | \renewcommand{\CommentVarTok}[1]{{\color{codegray}\textit{#1}}} 312 
| \renewcommand{\ConstantTok}[1]{{\color{codegray}#1}} 313 | \renewcommand{\ControlFlowTok}[1]{{\color{codemain}#1}} 314 | \renewcommand{\DataTypeTok}[1]{{\color{codemain}#1}} 315 | \renewcommand{\DecValTok}[1]{{\color{codelightmain}#1}} 316 | \renewcommand{\FunctionTok}[1]{{\color{codemain}#1}} 317 | \renewcommand{\KeywordTok}[1]{{\color{codemain}\textbf{#1}}} 318 | \renewcommand{\OtherTok}[1]{{\color{codelightmain}#1}} 319 | \renewcommand{\PreprocessorTok}[1]{{\color{codegray}#1}} 320 | \renewcommand{\StringTok}[1]{{\color{codemain}#1}} 321 | }{} 322 | 323 | % code inline 324 | 325 | \let\oldtexttt\texttt 326 | \renewcommand{\texttt}[1]{{\codesize\oldtexttt{#1}}} 327 | 328 | % widows and orphans 329 | 330 | \widowpenalty10000 331 | \clubpenalty10000 332 | 333 | % extra commands 334 | 335 | \makeatletter 336 | 337 | % fill with dotted lines until the end of the page 338 | \newcommand{\dotcolumnfill}[1]{% 339 | % https://tex.stackexchange.com/a/319436 340 | \par 341 | \null 342 | \vskip -\ht\strutbox 343 | \xleaders \hb@xt@ \hsize {% 344 | \strut \cleaders \hb@xt@ .44em{\hss.\hss}\hfill 345 | }\vfill 346 | #1 347 | \vskip \ht\strutbox 348 | \break 349 | } 350 | 351 | \makeatother 352 | 353 | % notes section with dotted lines 354 | \newcommand\notessection[1]{% 355 | \subsection{Notes} 356 | 357 | \vspace{1em} 358 | 359 | \dotcolumnfill{#1} 360 | } 361 | 362 | % watermark with CExA and MDLS logos 363 | \newcommand\watermark{% 364 | \begin{flushright} 365 | Brought to you by% 366 | \raisebox{-0.4\height}{% 367 | \hspace{1em}% 368 | \href{https://cexa-project.org}{\includegraphics[height=1cm]{cexa}}% 369 | \hspace{1em}% 370 | \href{https://mdls.fr}{\includegraphics[height=1cm]{maison_de_la_simulation}}% 371 | }% 372 | \end{flushright} 373 | } 374 | -------------------------------------------------------------------------------- /terminology.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Terminology cheat sheet 
for Kokkos 3 | --- 4 | 5 | 6 | 7 | # Terminology cheat sheet 8 | 9 | 10 | 11 | ## Kokkos mapping 12 | 13 | ### Hierarchical parallelism 14 | 15 | | Kokkos | Cuda | HIP | SYCL | 16 | |---------------------------------------------------------------------------------|--------|-------------------|------------| 17 | | Instance of `Kokkos::TeamPolicy` | Grid | Grid, index range | ND-Range | League 18 | | Instance of `Kokkos::TeamPolicy::member_type` (one-dimensional only) | Block | Block, work group | Work-group | Team 19 | | Instance of `Kokkos::TeamThread*Range` | Warp | Warp, wavefront | Sub-group | SIMD chunk 20 | | Instance of `Kokkos::TeamVector*Range`, `Kokkos::ThreadVector*Range` | Thread | Thread, work item | Work-item | OpenMP thread, SIMD lane 21 | 22 | 23 | Doc https://docs.nvidia.com/cuda/cuda-c-programming-guide/#thread-hierarchy 24 | Doc https://rocm.docs.amd.com/projects/HIP/en/develop/reference/kernel_language.html 25 | Doc https://www.intel.com/content/www/us/en/docs/oneapi/programming-guide/2025-0/sycl-thread-and-memory-hierarchy.html 26 | 27 | 28 | ### Memory 29 | 30 | | Kokkos | Cuda | HIP | SYCL | 31 | |-----------------------------------------------|------------------------------|--------------------|--------------------------------------------| 32 | | `Kokkos::DefaultExecutionSpace::memory_space` | Global memory | Global memory | Global memory | 33 | | `Kokkos::ScratchMemorySpace` (space level 0) | Shared memory | Shared memory | (Shared) local memory | 34 | | `Kokkos::ScratchMemorySpace` (space level 1) | Global memory | Global memory | Global memory | 35 | | `Kokkos::SharedHostPinnedSpace` | Pinned host memory | Pinned host memory | Unified shared memory (USM) of type host | 36 | | `Kokkos::SharedSpace` | Unified virtual memory (UVM) | Unified memory | Unified shared memory (USM) of type shared | 37 | 38 | 39 | Doc https://docs.nvidia.com/cuda/cuda-c-programming-guide/#memory-hierarchy 40 | Doc 
https://rocm.docs.amd.com/projects/HIP/en/docs-develop/how-to/hip_runtime_api/memory_management/device_memory.html 41 | Doc https://www.intel.com/content/www/us/en/docs/oneapi/programming-guide/2025-0/sycl-thread-and-memory-hierarchy.html 42 | 43 | 44 | ### Execution 45 | 46 | | Kokkos | Cuda | HIP | SYCL | 47 | |-----------------------------------------------|------------------------------|--------------------|------------------------------------| 48 | | Instance of `Kokkos::DefaultExecutionSpace` | Stream | Stream | Queue | 49 | 50 | ## GPU terminology equivalences 51 | 52 | | Cuda on NVIDIA | HIP on AMD | SYCL/OpenCL on Intel | Notes | 53 | |-------------------------------|--------------------|----------------------|-------------------------| 54 | | Streaming multiprocessor (SM) | Compute unit (CU) | Compute unit | | 55 | | Streaming processor | Processing element | Processing element | | 56 | | Warp | Wavefront, warp | Sub-group | | 57 | | Thread | Work item | Work-item | | 58 | | NVPTX | AMDIL | SPIR-V | Not strictly equivalent | 59 | 60 | -------------------------------------------------------------------------------- /typos.toml: -------------------------------------------------------------------------------- 1 | [files] 2 | extend-exclude = [ 3 | "*.svg", 4 | ] 5 | 6 | [default.extend-words] 7 | ND = "ND" 8 | HSA = "HSA" 9 | -------------------------------------------------------------------------------- /utilization.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Utilisation cheat sheet for Kokkos 3 | --- 4 | 5 | 6 | 7 | # Kokkos utilization cheat sheet 8 | 9 | Warning Only for Kokkos 4.5 and more, for older version look at the doc. 
10 | 11 | Doc https://kokkos.org/kokkos-core-wiki/programmingguide.html 12 | 13 | Doc https://kokkos.org/kokkos-core-wiki/API/core-index.html 14 | 15 | Doc https://github.com/kokkos/kokkos-tutorials/blob/main/LectureSeries/KokkosTutorial_01_Introduction.pdf 16 | 17 | 18 | 19 | ## Header 20 | 21 | ```cpp 22 | #include <Kokkos_Core.hpp> 23 | ``` 24 | 25 | ## Initialization 26 | 27 | ### Initialize and finalize 28 | 29 | ```cpp 30 | int main(int argc, char* argv[]) { 31 | Kokkos::initialize(argc, argv); 32 | { /* ... */ } 33 | Kokkos::finalize(); 34 | } 35 | ``` 36 | 37 | ### Scope guard 38 | 39 | ```cpp 40 | int main(int argc, char* argv[]) { 41 | Kokkos::ScopeGuard kokkos(argc, argv); 42 | /* ... */ 43 | } 44 | ``` 45 | 46 | ## Kokkos concepts 47 | 48 | ### Execution spaces 49 | 50 | | Execution space | Device backend | Host backend | 51 | |-------------------------------------|----------------|--------------| 52 | | `Kokkos::DefaultExecutionSpace` | On device | On host | 53 | | `Kokkos::DefaultHostExecutionSpace` | On host | On host | 54 | 55 | 56 | Doc https://kokkos.org/kokkos-core-wiki/API/core/execution_spaces.html 57 | 58 | 59 | ### Memory spaces 60 | 61 | 62 | Doc https://kokkos.org/kokkos-core-wiki/API/core/memory_spaces.html 63 | 64 | 65 | #### Generic memory spaces 66 | 67 | | Memory space | Device backend | Host backend | 68 | |---------------------------------------------------|----------------|--------------| 69 | | `Kokkos::DefaultExecutionSpace::memory_space` | On device | On host | 70 | | `Kokkos::DefaultHostExecutionSpace::memory_space` | On host | On host | 71 | 72 | #### Specific memory spaces 73 | 74 | | Memory space | Description | 75 | |---------------------------------|---------------------------------------------------------------------------| 76 | | `Kokkos::HostSpace` | Accessible from the host but maybe not from the device | 77 | | `Kokkos::SharedSpace` | Accessible from the host and the device; copy managed by the driver | 78 | 
`Kokkos::SharedHostPinnedSpace` | Accessible from the host and the device; zero copy access in small chunks | 79 | 80 | 81 |
82 | Examples 83 | 84 | ```cpp 85 | // Host space (double elements used for illustration) 86 | Kokkos::View<double*, Kokkos::HostSpace> hostView("hostView", numberOfElements); 87 | 88 | // Shared space 89 | Kokkos::View<double*, Kokkos::SharedSpace> sharedView("sharedView", numberOfElements); 90 | 91 | // Shared host pinned space 92 | Kokkos::View<double*, Kokkos::SharedHostPinnedSpace> sharedHostPinnedView("sharedHostPinnedView", numberOfElements); 93 | ``` 94 | 95 | 
96 | 97 | 98 | ## Memory management 99 | 100 | ### View 101 | 102 | 103 | Doc https://kokkos.org/kokkos-core-wiki/API/core/view/view.html 104 | 105 | 106 | #### Create 107 | 108 | ```cpp 109 | Kokkos::View<DataType, LayoutType, MemorySpace, MemoryTraits> view("label", numberOfElementsAtRuntimeI, numberOfElementsAtRuntimeJ); 110 | ``` 111 | 112 | | Template argument | Description | 113 | |-------------------|-------------| 114 | | `DataType` | `ScalarType` for the data type, followed by a `*` for each runtime dimension, then by a `[numberOfElements]` for each compile time dimension; mandatory | 115 | | `LayoutType` | See [memory layouts](#memory-layouts); optional | 116 | | `MemorySpace` | See [memory spaces](#memory-spaces); optional | 117 | | `MemoryTraits` | See [memory traits](#memory-traits); optional | 118 | 119 | Warning The order of template arguments is important. 120 | 121 | 122 | Doc https://kokkos.org/kokkos-core-wiki/API/core/view/view.html#constructors 123 | 124 | 
125 | Examples 126 | 127 | ```cpp 128 | // A 1D view of doubles 129 | Kokkos::View<double*> view1D("view1D", 100); 130 | 131 | // Const view of doubles (shallow copy of view1D) 132 | Kokkos::View<const double*> constView1D = view1D; 133 | 134 | // A 2D view of integers 135 | Kokkos::View<int**> view2D("view2D", 50, 50); 136 | 137 | // 3D view with 2 runtime dimensions and 1 compile time dimension 138 | Kokkos::View<double**[25]> view3D("view3D", 50, 42, 25); 139 | ``` 140 | 141 | 
142 | 143 | 144 | #### Manage 145 | 146 | | Method | Description | 147 | |---------------|-----------------------------------------------------------| 148 | | `(i, j...)` | Returns and sets the value at index `i`, `j`, etc. | 149 | | `size()` | Returns the total number of elements in the view | 150 | | `rank()` | Returns the number of dimensions | 151 | | `layout()` | Returns the layout of the view | 152 | | `extent(dim)` | Returns the number of elements in the requested dimension | 153 | | `data()` | Returns a pointer to the underlying data | 154 | 155 | 156 | Doc https://kokkos.org/kokkos-core-wiki/API/core/view/view.html#data-access-functions 157 | 158 | Doc https://kokkos.org/kokkos-core-wiki/API/core/view/view.html#data-layout-dimensions-strides 159 | 160 | 161 | ###### Resize and preserve content 162 | 163 | ```cpp 164 | Kokkos::resize(view, newNumberOfElementsI, newNumberOfElementsJ...); 165 | ``` 166 | 167 | 168 | Doc https://kokkos.org/kokkos-core-wiki/API/core/view/resize.html 169 | 170 | 171 | ###### Reallocate and do not preserve content 172 | 173 | ```cpp 174 | Kokkos::realloc(view, newNumberOfElementsI, newNumberOfElementsJ...); 175 | ``` 176 | 177 | 178 | Doc https://kokkos.org/kokkos-core-wiki/API/core/view/realloc.html 179 | 180 | 181 | ### Memory Layouts 182 | 183 | | Layout | Description | Default | 184 | |------------------------|-------------------------------------------------------------------------------------------------------------|---------| 185 | | `Kokkos::LayoutRight` | Strides increase from the right most to the left most dimension, also known as row-major or C-like | CPU | 186 | | `Kokkos::LayoutLeft` | Strides increase from the left most to the right most dimension, also known as column-major or Fortran-like | GPU | 187 | | `Kokkos::LayoutStride` | Strides can be arbitrary for each dimension | | 188 | 189 | By default, a layout suited for loops on the high frequency index is used. 
190 | 191 | 192 | Doc https://kokkos.org/kokkos-core-wiki/API/core/view/view.html#data-layout-dimensions-strides 193 | 194 |
195 | Example 196 | 197 | ```cpp 198 | // 2D view with LayoutRight (double elements used for illustration) 199 | Kokkos::View<double**, Kokkos::LayoutRight> view2D("view2D", 50, 50); 200 | ``` 201 | 202 | 
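
The practical difference between `Kokkos::LayoutRight` and `Kokkos::LayoutLeft` is which index is contiguous in memory. The following plain C++ sketch (no Kokkos required) computes the linear offset of element `(i, j)` in a 2D array of extents `(n0, n1)` under each convention. The helpers `offsetRight` and `offsetLeft` are hypothetical names introduced here for illustration; they are not Kokkos functions, and real views may add padding that this sketch ignores.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers (not part of Kokkos): linear offset of element (i, j)
// in a dense 2D array of extents (n0, n1), without padding.

// Row-major convention (as in LayoutRight): the right-most index j is contiguous.
inline std::size_t offsetRight(std::size_t i, std::size_t j,
                               std::size_t n0, std::size_t n1) {
    assert(i < n0 && j < n1);
    return i * n1 + j;
}

// Column-major convention (as in LayoutLeft): the left-most index i is contiguous.
inline std::size_t offsetLeft(std::size_t i, std::size_t j,
                              std::size_t n0, std::size_t n1) {
    assert(i < n0 && j < n1);
    return i + j * n0;
}
```

With extents 3 x 4, `offsetRight(1, 2, 3, 4)` is 6 and `offsetLeft(1, 2, 3, 4)` is 7. Incrementing `j` moves by one element in the row-major case, while incrementing `i` moves by one element in the column-major case.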
203 | 204 | 205 | ### Memory traits 206 | 207 | Memory traits are indicated with `Kokkos::MemoryTraits<>` and are combined with the `|` (pipe) operator. 208 | 209 | | Memory trait | Description | 210 | |------------------------|-------------| 211 | | `Kokkos::Unmanaged` | The allocation has to be managed manually | 212 | | `Kokkos::Atomic` | All accesses to the view are atomic | 213 | | `Kokkos::RandomAccess` | Hint that the view is used in a random access manner; if the view is also `const` this may trigger more efficient load operations on GPUs | 214 | | `Kokkos::Restrict` | There is no aliasing of the view by other data structures in the current scope | 215 | 216 | 217 | Doc https://kokkos.org/kokkos-core-wiki/ProgrammingGuide/View.html#access-traits 218 | 219 | 
220 | Examples 221 | 222 | ```cpp 223 | // Unmanaged view on CPU (each example below is independent of the others) 224 | double* data = new double[numberOfElements]; 225 | Kokkos::View<double*, Kokkos::HostSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged>> unmanagedView(data, numberOfElements); 226 | 227 | // Unmanaged view on GPU using CUDA 228 | double* data; 229 | cudaMalloc(&data, numberOfElements * sizeof(double)); 230 | Kokkos::View<double*, Kokkos::CudaSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged>> unmanagedView(data, numberOfElements); 231 | 232 | // Atomic view 233 | Kokkos::View<double*, Kokkos::MemoryTraits<Kokkos::Atomic>> atomicView("atomicView", numberOfElements); 234 | 235 | // Random access with constant data 236 | // first, allocate non constant view 237 | Kokkos::View<double*> nonConstView("data", numberOfElements); 238 | // then, make it constant 239 | Kokkos::View<const double*, Kokkos::MemoryTraits<Kokkos::RandomAccess>> randomAccessView = nonConstView; 240 | 241 | // Unmanaged, atomic, random access view on GPU using CUDA 242 | double* data; 243 | cudaMalloc(&data, numberOfElements * sizeof(double)); 244 | Kokkos::View<double*, Kokkos::CudaSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged | Kokkos::Atomic | Kokkos::RandomAccess>> unmanagedView(data, numberOfElements); 245 | ``` 246 | 247 | 
248 | 249 | 250 | ### Deep copy 251 | 252 | 253 | Warning Copying or assigning a view does a shallow copy, data are not synchronized in this case. 254 | 255 | 256 | ```cpp 257 | Kokkos::deep_copy(dest, src); 258 | ``` 259 | 260 | The views must have the same dimensions, data type, and reside in the same memory space ([mirror views](#mirror-view) can be deep copied on different memory spaces). 261 | 262 | 263 | Doc https://kokkos.org/kokkos-core-wiki/API/core/view/deep_copy.html 264 | 265 | Code 266 | 267 | - [Kokkos example - simple memoryspace](https://github.com/kokkos/kokkos/blob/master/example/tutorial/04_simple_memoryspaces/simple_memoryspaces.cpp) 268 | - [Kokkos example - overlapping deepcopy](https://github.com/kokkos/kokkos/blob/master/example/tutorial/Advanced_Views/07_Overlapping_DeepCopy/overlapping_deepcopy.cpp) 269 | - [Kokkos Tutorials - Exercise 3](https://github.com/kokkos/kokkos-tutorials/blob/main/Exercises/03/Solution/exercise_3_solution.cpp) 270 | 271 |
272 | Example 273 | 274 | ```cpp 275 | Kokkos::View<double*> view1("view1", numberOfElements); 276 | Kokkos::View<double*> view2("view2", numberOfElements); 277 | 278 | // Deep copy of view1 to view2 (double elements used for illustration) 279 | Kokkos::deep_copy(view2, view1); 280 | ``` 281 | 282 | 
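
The shallow versus deep copy distinction can be illustrated without Kokkos. The sketch below is only an analogy in plain C++: `ToyView` and `toyDeepCopy` are hypothetical names introduced here, and the reference-counted buffer merely mimics the copy semantics described above, not how Kokkos implements views.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a 1D view: a reference-counted handle to a buffer.
// Copying the handle shares the buffer (shallow copy).
struct ToyView {
    std::shared_ptr<std::vector<double>> buffer;

    explicit ToyView(std::size_t n)
        : buffer(std::make_shared<std::vector<double>>(n, 0.0)) {}

    // Like a view, a const handle still gives write access to the data.
    double& operator()(std::size_t i) const { return (*buffer)[i]; }
    std::size_t size() const { return buffer->size(); }
};

// Element-wise copy between two allocations of the same size (deep copy).
inline void toyDeepCopy(const ToyView& dest, const ToyView& src) {
    assert(dest.size() == src.size());
    std::copy(src.buffer->begin(), src.buffer->end(), dest.buffer->begin());
}
```

Writing through a copied handle (`ToyView alias = a;`) changes the data seen by `a`, whereas a destination filled with `toyDeepCopy` keeps its own values when the source is modified afterwards.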
283 | 284 | 285 | ### Mirror view 286 | 287 | 288 | Doc https://kokkos.org/kokkos-core-wiki/API/core/view/create_mirror.html 289 | 290 | Code 291 | 292 | - [Kokkos example - simple memoryspace](https://github.com/kokkos/kokkos/blob/master/example/tutorial/04_simple_memoryspaces/simple_memoryspaces.cpp) 293 | - [Kokkos Tutorials - Exercise 3](https://github.com/kokkos/kokkos-tutorials/blob/main/Exercises/03/Solution/exercise_3_solution.cpp) 294 | 295 | 296 | #### Create and always allocate on host 297 | 298 | ```cpp 299 | auto mirrorView = Kokkos::create_mirror(view); 300 | ``` 301 | 302 | #### Create and allocate on host if source view is not in host space 303 | 304 | ```cpp 305 | auto mirrorView = Kokkos::create_mirror_view(view); 306 | ``` 307 | 308 | #### Create, allocate and synchronize if source view is not in same space as destination view 309 | 310 | ```cpp 311 | auto mirrorView = Kokkos::create_mirror_view_and_copy(ExecutionSpace(), view); 312 | ``` 313 | 314 | ### Subview 315 | 316 | A subview has the same reference count as its parent view, so the parent view won't be deallocated before all subviews go away. 
317 | 318 | 319 | Doc https://kokkos.org/kokkos-core-wiki/API/core/view/subview.html 320 | 321 | Doc https://kokkos.org/kokkos-core-wiki/ProgrammingGuide/Subviews.html 322 | 323 | 324 | ```cpp 325 | auto subview = Kokkos::subview(view, selector1, selector2, ...); 326 | ``` 327 | 328 | | Subset selector | Description | 329 | |-----------------------------|-------------------------------------| 330 | | `Kokkos::ALL` | All elements in this dimension | 331 | | `Kokkos::pair(first, last)` | Range of elements in this dimension | 332 | | `value` | Specific element in this dimension | 333 | 334 | ### Scatter view (experimental) 335 | 336 | 337 | Doc https://kokkos.org/kokkos-core-wiki/API/containers/ScatterView.html 338 | 339 | Training https://github.com/kokkos/kokkos-tutorials/blob/main/LectureSeries/KokkosTutorial_03_MDRangeMoreViews.pdf 340 | 341 | Code 342 | 343 | - [Kokkos Tutorials - ScatterView](https://github.com/kokkos/kokkos-tutorials/tree/main/Exercises/scatter_view) 344 | 345 | 346 | #### Specific header 347 | 348 | ```cpp 349 | #include <Kokkos_ScatterView.hpp> 350 | ``` 351 | 352 | #### Create 353 | 354 | ```cpp 355 | auto scatterView = Kokkos::Experimental::create_scatter_view<Operation, Duplication, Contribution>(targetView); 356 | ``` 357 | 358 | | Template argument | Description | 359 | |-------------------|-------------| 360 | | `Operation` | See [scatter operation](#scatter-operation); defaults to `Kokkos::Experimental::ScatterSum` | 361 | | `Duplication` | Whether to duplicate the grid or not; choices are `Kokkos::Experimental::ScatterDuplicated`, or `Kokkos::Experimental::ScatterNonDuplicated`; defaults to the option that is the most optimised for `targetView`'s execution space | 362 | | `Contribution` | Whether to contribute using atomics or not; choices are 
`Kokkos::Experimental::ScatterAtomic`, or `Kokkos::Experimental::ScatterNonAtomic`; defaults to the option that is the most optimised for `targetView`'s execution space | 363 | 364 | #### Scatter operation 365 | 366 | | Operation | Description | 367 | |-------------------------------------|---------------| 368 | | `Kokkos::Experimental::ScatterSum` | Sum | 369 | | `Kokkos::Experimental::ScatterProd` | Product | 370 | | `Kokkos::Experimental::ScatterMin` | Minimum value | 371 | | `Kokkos::Experimental::ScatterMax` | Maximum value | 372 | 373 | #### Scatter, compute, and gather 374 | 375 | ```cpp 376 | Kokkos::parallel_for( 377 | "label", 378 | /* ... */, 379 | KOKKOS_LAMBDA (/* ... */) { 380 | // scatter 381 | auto scatterAccess = scatterView.access(); 382 | 383 | // compute 384 | scatterAccess(/* index */) /* operation */ /* contribution */; 385 | } 386 | ); 387 | 388 | // gather 389 | Kokkos::Experimental::contribute(targetView, scatterView); 390 | ``` 391 | 392 | 393 |
394 | Full example 395 | 396 | ```cpp 397 | #include <Kokkos_ScatterView.hpp> 398 | 399 | // Compute a weighted histogram of positions 400 | KOKKOS_INLINE_FUNCTION int getIndex(double position) { /* ... */ } 401 | KOKKOS_INLINE_FUNCTION double compute(double weight) { /* ... */ } 402 | 403 | // Views of 100 elements to process 404 | Kokkos::View<double*> position("position", 100); 405 | Kokkos::View<double*> weight("weight", 100); 406 | 407 | // Histogram of 10 slots 408 | Kokkos::View<double*> histogram("histogram", 10); 409 | auto histogramScatter = Kokkos::Experimental::create_scatter_view(histogram); 410 | 411 | Kokkos::parallel_for( 412 | 100, 413 | KOKKOS_LAMBDA (const int i) { 414 | // scatter 415 | auto access = histogramScatter.access(); 416 | 417 | // compute 418 | const auto index = getIndex(position(i)); 419 | const auto contribution = compute(weight(i)); 420 | access(index) += contribution; 421 | } 422 | ); 423 | 424 | // gather 425 | Kokkos::Experimental::contribute(histogram, histogramScatter); 426 | ``` 427 | 428 | 
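
To make the scatter, compute, and gather steps more concrete, here is a plain C++ sketch of the duplicated, non-atomic strategy: each worker accumulates into its own private copy of the target, and the copies are summed at the end. This is an illustration under stated assumptions, not the Kokkos implementation; `toyScatterSum` is a hypothetical name, and the workers are emulated sequentially rather than run in parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch (not Kokkos code): sum contributions into slots using
// one private duplicate of the target per worker, then combine the duplicates.
inline std::vector<double> toyScatterSum(const std::vector<int>& index,
                                         const std::vector<double>& contribution,
                                         std::size_t numberOfSlots,
                                         std::size_t numberOfWorkers) {
    assert(index.size() == contribution.size());
    assert(numberOfWorkers > 0);

    // scatter: allocate one duplicate of the target per worker
    std::vector<std::vector<double>> duplicates(
        numberOfWorkers, std::vector<double>(numberOfSlots, 0.0));

    // compute: element i is handled by worker i % numberOfWorkers
    // (emulated sequentially here), which only touches its own duplicate
    for (std::size_t i = 0; i < index.size(); ++i) {
        assert(index[i] >= 0 && static_cast<std::size_t>(index[i]) < numberOfSlots);
        duplicates[i % numberOfWorkers][static_cast<std::size_t>(index[i])] += contribution[i];
    }

    // gather: sum all duplicates into the target
    std::vector<double> target(numberOfSlots, 0.0);
    for (const auto& duplicate : duplicates) {
        for (std::size_t slot = 0; slot < numberOfSlots; ++slot) {
            target[slot] += duplicate[slot];
        }
    }
    return target;
}
```

Because no two workers write to the same duplicate, no atomic operation is needed during the compute step; the cost is the extra memory for the duplicates and the final combination pass.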
429 | 430 | 431 | ## Parallel constructs 432 | 433 | ### For loop 434 | 435 | ```cpp 436 | Kokkos::parallel_for( 437 | "label", 438 | ExecutionPolicy(/* ... */), 439 | KOKKOS_LAMBDA (/* ... */) { /* ... */ } 440 | ); 441 | ``` 442 | 443 | ### Reduction 444 | 445 | ```cpp 446 | ScalarType result; 447 | Kokkos::parallel_reduce( 448 | "label", 449 | ExecutionPolicy(/* ... */), 450 | KOKKOS_LAMBDA (/* ... */, ScalarType& resultLocal) { /* ... */ }, 451 | Kokkos::ReducerConcept<ScalarType>(result) 452 | ); 453 | ``` 454 | 455 | With `Kokkos::ReducerConcept` being one of the following: 456 | 457 | | Reducer | Operation | Description | 458 | |---------------------|-----------------------|--------------------------------------------| 459 | | `Kokkos::BAnd` | `&` | Binary and | 460 | | `Kokkos::BOr` | `\|` | Binary or | 461 | | `Kokkos::LAnd` | `&&` | Logical and | 462 | | `Kokkos::LOr` | `\|\|` | Logical or | 463 | | `Kokkos::Max` | `std::max` | Maximum | 464 | | `Kokkos::MaxLoc` | `std::max_element` | Maximum and associated index | 465 | | `Kokkos::Min` | `std::min` | Minimum | 466 | | `Kokkos::MinLoc` | `std::min_element` | Minimum and associated index | 467 | | `Kokkos::MinMax` | `std::minmax` | Minimum and maximum | 468 | | `Kokkos::MinMaxLoc` | `std::minmax_element` | Minimum and maximum and associated indices | 469 | | `Kokkos::Prod` | `*` | Product | 470 | | `Kokkos::Sum` | `+` | Sum | 471 | 472 | Instead of a reducer, a plain scalar value may be passed, in which case the reduction is limited to a sum. 473 | When using the `TeamVectorMDRange`, the `TeamThreadMDRange`, or the `ThreadVectorMDRange` execution policy, only a scalar value may be passed, so the reduction is also limited to a sum. 
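
What a "value and location" reducer such as `Kokkos::MinLoc` computes can be sketched in plain C++ as a join of partial results. The names `ToyMinLoc`, `joinMinLoc`, and `toyMinLocReduce` are hypothetical and introduced only for this illustration; the identity element and the tie-breaking rule below are choices made for the sketch, not a statement about the Kokkos implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical value/location pair carried through the reduction.
struct ToyMinLoc {
    double val;
    int loc;
};

// Join step: keep the smaller of two partial results (the first one wins on ties).
inline ToyMinLoc joinMinLoc(const ToyMinLoc& a, const ToyMinLoc& b) {
    return (b.val < a.val) ? b : a;
}

// Sequential emulation of the reduction: each element becomes a partial
// result that is joined into the running result.
inline ToyMinLoc toyMinLocReduce(const std::vector<double>& data) {
    assert(data.size() <= static_cast<std::size_t>(std::numeric_limits<int>::max()));
    // Identity chosen for this sketch: largest double, no valid location.
    ToyMinLoc result{std::numeric_limits<double>::max(), -1};
    for (std::size_t i = 0; i < data.size(); ++i) {
        result = joinMinLoc(result, ToyMinLoc{data[i], static_cast<int>(i)});
    }
    return result;
}
```

Because the join is associative, partial results computed by different threads can be combined in any grouping, which is what makes this kind of operation usable as a parallel reduction.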
474 | 475 | 476 | Doc https://kokkos.org/kokkos-core-wiki/API/core/parallel-dispatch/parallel_reduce.html 477 | 478 | Doc https://kokkos.org/kokkos-core-wiki/API/core/builtin_reducers.html 479 | 480 | 481 | ### Fences 482 | 483 | #### Global fence 484 | 485 | ```cpp 486 | Kokkos::fence("label"); 487 | ``` 488 | 489 | 490 | Doc https://kokkos.org/kokkos-core-wiki/API/core/parallel-dispatch/fence.html 491 | 492 | 493 | #### Execution space fence 494 | 495 | ```cpp 496 | ExecutionSpace().fence("label"); 497 | ``` 498 | 499 | #### Team barrier 500 | 501 | ```cpp 502 | Kokkos::TeamPolicy<>::member_type().team_barrier(); 503 | ``` 504 | 505 | 506 | Doc https://kokkos.org/kokkos-core-wiki/API/core/execution_spaces.html#functionality 507 | 508 | 509 | ## Execution policy 510 | 511 | ### Create 512 | 513 | ```cpp 514 | ExecutionPolicy<ExecutionSpace, Schedule, IndexType, LaunchBounds, WorkTag> policy(/* ... */); 515 | ``` 516 | 517 | | Template argument | Description | 518 | |-------------------|----------------------------------------------------------------------------------------| 519 | | `ExecutionSpace` | See [execution spaces](#execution-spaces); defaults to `Kokkos::DefaultExecutionSpace` | 520 | | `Schedule` | How to schedule work items; defaults to machine and backend specifics | 521 | | `IndexType` | Integer type to be used for the index; defaults to `int64_t` | 522 | | `LaunchBounds` | Hints for CUDA and HIP launch bounds | 523 | | `WorkTag` | Empty tag class to call the functor | 524 | 525 | 526 | Doc https://kokkos.org/kokkos-core-wiki/API/core/Execution-Policies.html 527 | 528 | 529 | ### Ranges 530 | 531 | #### One-dimensional range 532 | 533 | ```cpp 534 | Kokkos::RangePolicy<ExecutionSpace, Schedule, IndexType, LaunchBounds, WorkTag> policy(first, last); 535 | ``` 536 | 537 | If the range starts at 0 and uses default parameters, the policy can be replaced by just the number of elements. 
538 | 539 | 540 | Doc https://kokkos.org/kokkos-core-wiki/API/core/policies/RangePolicy.html 541 | 542 | 543 | #### Multi-dimensional (dimension 2) 544 | 545 | ```cpp 546 | Kokkos::MDRangePolicy<ExecutionSpace, Schedule, IndexType, LaunchBounds, WorkTag, Kokkos::Rank<2>> policy({firstI, firstJ}, {lastI, lastJ}); 547 | ``` 548 | 549 | 550 | Doc https://kokkos.org/kokkos-core-wiki/API/core/policies/MDRangePolicy.html 551 | 552 | 553 | ### Hierarchical parallelism 554 | 555 | 556 | Doc https://kokkos.org/kokkos-core-wiki/ProgrammingGuide/HierarchicalParallelism.html 557 | 558 | 559 | #### Team policy 560 | 561 | ```cpp 562 | Kokkos::TeamPolicy<ExecutionSpace, Schedule, IndexType, LaunchBounds, WorkTag> policy(leagueSize, teamSize); 563 | ``` 564 | 565 | Usually, `teamSize` is replaced by `Kokkos::AUTO` to let Kokkos determine it. 566 | A kernel running in a team policy has a `Kokkos::TeamPolicy<>::member_type` argument: 567 | 568 | | Method | Description | 569 | |-----------------|-------------------------------------| 570 | | `league_size()` | Number of teams in the league | 571 | | `league_rank()` | Index of the team within the league | 572 | | `team_size()` | Number of threads in the team | 573 | | `team_rank()` | Index of the thread within the team | 574 | 575 | Note that nested parallel constructs do not use `KOKKOS_LAMBDA` to create lambdas. One must use the C++ syntax, for example `[=]` or `[&]`. 576 | 577 | 578 | Doc https://kokkos.org/kokkos-core-wiki/API/core/policies/TeamPolicy.html 579 | 580 | 581 | #### Team vector level (2-level hierarchy) 582 | 583 | ```cpp 584 | Kokkos::parallel_for( 585 | "label", 586 | Kokkos::TeamPolicy(numberOfElementsI, Kokkos::AUTO), 587 | KOKKOS_LAMBDA (const Kokkos::TeamPolicy<>::member_type& teamMember) { 588 | const int i = teamMember.league_rank(); 589 | 590 | Kokkos::parallel_for( 591 | Kokkos::TeamVectorRange(teamMember, firstJ, lastJ), 592 | [=] (const int j) { /* ... 
*/ } 593 | ); 594 | } 595 | ); 596 | ``` 597 | 598 | ##### One-dimensional range 599 | 600 | ```cpp 601 | Kokkos::TeamVectorRange range(teamMember, firstJ, lastJ); 602 | ``` 603 | 604 | 605 | Doc https://kokkos.org/kokkos-core-wiki/API/core/policies/TeamVectorRange.html 606 | 607 | 608 | ##### Multi-dimensional range (dimension 2) 609 | 610 | ```cpp 611 | Kokkos::TeamVectorMDRange, Kokkos::TeamPolicy<>::member_type> range(teamMember, numberOfElementsJ, numberOfElementsK); 612 | ``` 613 | 614 | 615 | Doc https://kokkos.org/kokkos-core-wiki/API/core/policies/TeamVectorMDRange.html 616 | 617 | 618 | #### Team thread vector level (3-level hierarchy) 619 | 620 | ```cpp 621 | Kokkos::parallel_for( 622 | "label", 623 | Kokkos::TeamPolicy(numberOfElementsI, Kokkos::AUTO), 624 | KOKKOS_LAMBDA (const Kokkos::TeamPolicy<>::member_type& teamMember) { 625 | const int i = teamMember.team_rank(); 626 | 627 | Kokkos::parallel_for( 628 | Kokkos::TeamThreadRange(teamMember, firstJ, lastJ), 629 | [=] (const int j) { 630 | Kokkos::parallel_for( 631 | Kokkos::ThreadVectorRange(teamMember, firstK, lastK), 632 | [=] (const int k) { /* ... 
*/ } 633 | ); 634 | } 635 | ); 636 | } 637 | ); 638 | ``` 639 | 640 | ##### One-dimensional range 641 | 642 | ```cpp 643 | Kokkos::TeamThreadRange range(teamMember, firstJ, lastJ); 644 | Kokkos::ThreadVectorRange range(teamMember, firstK, lastK); 645 | ``` 646 | 647 | 648 | Doc https://kokkos.org/kokkos-core-wiki/API/core/policies/TeamThreadRange.html 649 | 650 | Doc https://kokkos.org/kokkos-core-wiki/API/core/policies/ThreadVectorRange.html 651 | 652 | 653 | ##### Multi-dimensional range (dimension 2) 654 | 655 | ```cpp 656 | Kokkos::TeamThreadMDRange, Kokkos::TeamPolicy<>::member_type> range(teamMember, numberOfElementsJ, numberOfElementsK); 657 | Kokkos::ThreadVectorMDRange, Kokkos::TeamPolicy<>::member_type> range(teamMember, numberOfElementsL, numberOfElementsM); 658 | ``` 659 | 660 | 661 | Doc https://kokkos.org/kokkos-core-wiki/API/core/policies/TeamThreadMDRange.html 662 | 663 | Doc https://kokkos.org/kokkos-core-wiki/API/core/policies/ThreadVectorMDRange.html 664 | 665 | 666 | ## Scratch memory 667 | 668 | Each team has access to a scratch memory pad, which has the team's lifetime, and is only accessible by the team's threads. 669 | 670 | 671 | Doc https://kokkos.org/kokkos-core-wiki/ProgrammingGuide/HierarchicalParallelism.html#team-scratch-pad-memory 672 | 673 | 674 | ### Scratch memory space 675 | 676 | | Space level | Memory size | Access speed | 677 | |-------------|-----------------------------|--------------| 678 | | 0 | Limited (tens of kilobytes) | Fast | 679 | | 1 | Larger (few gigabytes) | Medium | 680 | 681 | Used when passing the team policy to the parallel construct and when creating the scratch memory pad. 

### Create and populate

```cpp
// Define a scratch memory pad type
// (DataType and ExecutionSpace are placeholders for your own types)
using ScratchPad = Kokkos::View<DataType*, ExecutionSpace::scratch_memory_space, Kokkos::MemoryTraits<Kokkos::Unmanaged>>;

// Compute how much scratch memory is needed (in bytes)
size_t bytes = ScratchPad::shmem_size(vectorSize);

// Create the team policy and specify the total scratch memory needed
Kokkos::parallel_for(
    "label",
    Kokkos::TeamPolicy<>(leagueSize, teamSize).set_scratch_size(spaceLevel, Kokkos::PerTeam(bytes)),
    KOKKOS_LAMBDA (const Kokkos::TeamPolicy<>::member_type& teamMember) {
        const int i = teamMember.league_rank();

        // Create the scratch pad
        ScratchPad scratch(teamMember.team_scratch(spaceLevel), vectorSize);

        // Initialize it
        Kokkos::parallel_for(
            Kokkos::TeamVectorRange(teamMember, vectorSize),
            [=] (const int j) { scratch(j) = getScratchData(i, j); }
        );

        // Synchronize
        teamMember.team_barrier();
    }
);
```

## Atomics

Doc https://kokkos.org/kokkos-core-wiki/ProgrammingGuide/Atomic-Operations.html

Doc https://kokkos.org/kokkos-core-wiki/API/core/atomics.html

### Atomic operations

| Operation                      | Replaces             |
|--------------------------------|----------------------|
| `Kokkos::atomic_add(&x, y)`    | `x += y`             |
| `Kokkos::atomic_and(&x, y)`    | `x &= y`             |
| `Kokkos::atomic_dec(&x)`       | `x--`                |
| `Kokkos::atomic_inc(&x)`       | `x++`                |
| `Kokkos::atomic_lshift(&x, y)` | `x = x << y`         |
| `Kokkos::atomic_max(&x, y)`    | `x = std::max(x, y)` |
| `Kokkos::atomic_min(&x, y)`    | `x = std::min(x, y)` |
| `Kokkos::atomic_mod(&x, y)`    | `x %= y`             |
| `Kokkos::atomic_nand(&x, y)`   | `x = ~(x & y)`       |
| `Kokkos::atomic_or(&x, y)`     | `x \|= y`            |
| `Kokkos::atomic_rshift(&x, y)` | `x = x >> y`         |
| `Kokkos::atomic_sub(&x, y)`    | `x -= y`             |
| `Kokkos::atomic_store(&x, y)`  | `x = y`              |
| `Kokkos::atomic_xor(&x, y)`    | `x ^= y`             |


Example

```cpp
Kokkos::parallel_for(
    numberOfElements,
    KOKKOS_LAMBDA (const int i) {
        const int value = /* ... */;
        const int bucketIndex = computeBucketIndex(value);
        Kokkos::atomic_inc(&histogram(bucketIndex));
    }
);
```


### Atomic exchanges

| Operation                                                | Description                                                                                  |
|----------------------------------------------------------|----------------------------------------------------------------------------------------------|
| `Kokkos::atomic_exchange(&x, desired)`                   | Assign desired value to object and return old value                                          |
| `Kokkos::atomic_compare_exchange(&x, expected, desired)` | Assign desired value to object if the object has the expected value and return the old value |


Example

```cpp
// Assign desired value to object and return old value
const int desired = 20;
int obj = 10;
int old = Kokkos::atomic_exchange(&obj, desired);  // obj == 20, old == 10

// Assign desired value to object if the object has the expected value and return the old value
obj = 10;
const int expected = 10;
old = Kokkos::atomic_compare_exchange(&obj, expected, desired);  // obj == 20, old == 10
```

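
The return-the-old-value semantics of both operations can be illustrated without a Kokkos build. The sketch below is a host-only model written with `std::atomic` from the C++ standard library; `exchangeModel`, `compareExchangeModel` and `storeMaxModel` are hypothetical names, not Kokkos functions. The last helper shows the usual retry loop that compare-exchange makes possible, here used to keep a running maximum.

```cpp
#include <atomic>

// Hypothetical host-only models built on std::atomic; not Kokkos functions.

// Stores `desired` and returns the value held before the store.
inline int exchangeModel(std::atomic<int>& object, int desired) {
    return object.exchange(desired);
}

// Stores `desired` only if the object currently holds `expected`.
// Returns the value observed before the operation in both cases.
inline int compareExchangeModel(std::atomic<int>& object, int expected, int desired) {
    // On failure, compare_exchange_strong writes the observed value into
    // `expected`; on success `expected` already equals the old value.
    object.compare_exchange_strong(expected, desired);
    return expected;
}

// Retry loop built on compare-exchange: atomically keep the larger value.
inline void storeMaxModel(std::atomic<int>& object, int candidate) {
    int observed = object.load();
    while (observed < candidate &&
           !object.compare_exchange_weak(observed, candidate)) {
        // `observed` now holds the current value; the loop condition retries.
    }
}
```

Comparing the returned value with `expected` tells the caller whether the compare-exchange actually stored `desired`.
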

## Mathematics

### Math functions

| Function type    | List of functions (prefixed by `Kokkos::`)                               |
|------------------|--------------------------------------------------------------------------|
| Basic operations | `abs`, `fabs`, `fmod`, `remainder`, `fma`, `fmax`, `fmin`, `fdim`, `nan` |
| Exponential      | `exp`, `exp2`, `expm1`, `log`, `log2`, `log10`, `log1p`                  |
| Power            | `pow`, `sqrt`, `cbrt`, `hypot`                                           |
| Trigonometric    | `sin`, `cos`, `tan`, `asin`, `acos`, `atan`, `atan2`                     |
| Hyperbolic       | `sinh`, `cosh`, `tanh`, `asinh`, `acosh`, `atanh`                        |
| Error and gamma  | `erf`, `erfc`, `tgamma`, `lgamma`                                        |
| Nearest          | `ceil`, `floor`, `trunc`, `round`, `nearbyint`                           |
| Floating point   | `logb`, `nextafter`, `copysign`                                          |
| Comparisons      | `isfinite`, `isinf`, `isnan`, `signbit`                                  |

Note that not all C++ standard math functions are available.

Doc https://kokkos.org/kokkos-core-wiki/API/core/numerics/mathematical-functions.html?highlight=math

### Complex numbers

#### Create

```cpp
Kokkos::complex<double> complex(realPart, imagPart);
```

#### Manage

| Method   | Description                        |
|----------|------------------------------------|
| `real()` | Returns or sets the real part      |
| `imag()` | Returns or sets the imaginary part |

Doc https://kokkos.org/kokkos-core-wiki/API/core/utilities/complex.html

## Utilities

Doc https://kokkos.org/kokkos-core-wiki/API/core/Utilities.html

### Code interruption

```cpp
Kokkos::abort("message");
```

### Print inside a kernel

```cpp
Kokkos::printf("format string", arg1, arg2);
```

Similar to `std::printf`.

### Timer

#### Create

```cpp
Kokkos::Timer timer;
```

#### Manage

| Method      | Description                                                  |
|-------------|--------------------------------------------------------------|
| `seconds()` | Returns the time in seconds since construction or last reset |
| `reset()`   | Resets the timer to zero                                     |

### Manage parallel environment

| Function                | Description                                                            |
|-------------------------|------------------------------------------------------------------------|
| `Kokkos::device_id()`   | Returns the device ID of the current device                            |
| `Kokkos::num_devices()` | Returns the number of devices available to the current execution space |

## Macros

Doc https://kokkos.org/kokkos-core-wiki/API/core/Macros.html

### Essential macros

| Macro                    | Description                                            |
|--------------------------|--------------------------------------------------------|
| `KOKKOS_LAMBDA`          | Replaces capture argument for lambdas                  |
| `KOKKOS_CLASS_LAMBDA`    | Replaces capture argument for lambdas, captures `this` |
| `KOKKOS_FUNCTION`        | Functor attribute                                      |
| `KOKKOS_INLINE_FUNCTION` | Inlined functor attribute                              |

### Extra macros

| Macro                  | Description                                                                           |
|------------------------|---------------------------------------------------------------------------------------|
| `KOKKOS_VERSION`       | Kokkos full version                                                                   |
| `KOKKOS_VERSION_MAJOR` | Kokkos major version                                                                  |
| `KOKKOS_VERSION_MINOR` | Kokkos minor version                                                                  |
| `KOKKOS_VERSION_PATCH` | Kokkos patch level                                                                    |
| `KOKKOS_ENABLE_*`      | Any equivalent CMake option passed when building Kokkos, see installation cheat sheet |
| `KOKKOS_ARCH_*`        | Any equivalent CMake option passed when building Kokkos, see installation cheat sheet |
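
The relation between the full version and its three parts can be sketched in plain C++. The helpers below are hypothetical and not part of Kokkos; they assume the encoding given in the Kokkos macro documentation, where the full version is `major * 10000 + minor * 100 + patch`. In a Kokkos build, one would pass `KOKKOS_VERSION` as the argument.

```cpp
// Hypothetical helpers, not part of Kokkos.
// Assumption: fullVersion == major * 10000 + minor * 100 + patch.
struct DecodedVersion {
    int majorPart;
    int minorPart;
    int patchPart;
};

// Splits a full version number into its three components.
constexpr DecodedVersion decodeVersion(int fullVersion) {
    return {fullVersion / 10000, fullVersion / 100 % 100, fullVersion % 100};
}

// Returns true if fullVersion is at least majorPart.minorPart.patchPart.
constexpr bool isAtLeast(int fullVersion, int majorPart, int minorPart, int patchPart) {
    return fullVersion >= majorPart * 10000 + minorPart * 100 + patchPart;
}
```

For instance, under this encoding `decodeVersion(40501)` yields the parts 4, 5 and 1, and `isAtLeast(40501, 4, 3, 0)` holds.
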
--------------------------------------------------------------------------------