├── CONTRIBUTING.md ├── LICENSE.txt ├── README.md └── OpenCLCToOpenCLCppPortingGuidelines.md /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing to the Porting Guidelines 2 | 3 | We hope the Porting Guidelines will be a constantly evolving document, and that's why 4 | all comments, suggestions for improvements, and contributions are most welcome. 5 | 6 | All pull requests should be done against the `develop` branch. The editors review all 7 | changes and periodically increments the version date in the introduction. 8 | There are no explicit style guidelines, however, it is recommended to follow current 9 | style of the document. 10 | -------------------------------------------------------------------------------- /LICENSE.txt: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2017 StreamComputing 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # [OpenCL C to OpenCL C++ Porting Guidelines](./OpenCLCToOpenCLCppPortingGuidelines.md) 2 | 3 | [This document](./OpenCLCToOpenCLCppPortingGuidelines.md) is a set of guidelines for 4 | developers who know OpenCL™ C and plan to port their kernels to OpenCL C++, and therefore 5 | they need to know the main differences between those two kernel languages. 6 | 7 | The main focus is on exposing the most important differences between OpenCL C++ and 8 | OpenCL C, and also those which may cause hard-to-detect bugs when porting to OpenCL C++. 9 | Developers who are familiar with OpenCL C and C++ should find OpenCL C++ easy to learn. 10 | 11 | ## Background 12 | 13 | On May 16, 2017, OpenCL 2.2 was released by Khronos Group 14 | ([release note](https://www.khronos.org/news/press/khronos-releases-opencl-2.2-with-spir-v-1.2)). 15 | The most important part of new OpenCL version is support for OpenCL C++ kernel language, 16 | which is defined as a static subset of the C++14 standard. OpenCL C++ introduces the 17 | long-awaited features like classes, templates, lambda expressions, function and operator 18 | overloads, and many other constructs which increase parallel programing productivity 19 | through generic programming. 20 | 21 | The aim of [the Porting Guidelines](./OpenCLCToOpenCLCppPortingGuidelines.md) 22 | is to help people who are familiar with OpenCL C and C++ to switch to OpenCL C++. 
23 | The focus is not on highlighting all the differences between those two kernel languages, 24 | but rather on exposing and explaining those that are the most important, and those that 25 | may cause hard-to-detect bugs when porting from OpenCL C to OpenCL C++. 26 | In the future the guidelines may also provide chapters or sections about new features 27 | introduced in OpenCL 2.2 and OpenCL C++. 28 | 29 | ## Contributions and LICENSE 30 | 31 | Comments, suggestions for improvements, and contributions are most welcome. 32 | More details are found at [CONTRIBUTING](./CONTRIBUTING.md) and [LICENSE](./LICENSE.txt). 33 | 34 | ## Trademarks 35 | 36 | OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos. 37 | Other names are for informational purposes only and may be trademarks of their respective owners. 38 | -------------------------------------------------------------------------------- /OpenCLCToOpenCLCppPortingGuidelines.md: -------------------------------------------------------------------------------- 1 | # OpenCL C to OpenCL C++ Porting Guidelines 2 | 3 | May 16, 2017 4 | 5 | Editors: 6 | 7 | * [Jakub Szuppe, Stream HPC](https://streamhpc.com) 8 | 9 | This document is a set of guidelines for developers who know OpenCL C and plan to 10 | port their kernels to OpenCL C++, and therefore they need to know the main 11 | differences between those two kernel languages. 12 | The focus is not on highlighting all the differences, but rather on exposing 13 | and explaining those that are the most important, and those that may cause 14 | hard-to-detect bugs when porting from OpenCL C to OpenCL C++. 15 | 16 | Comments, suggestions for improvements, and contributions are most welcome. 
17 | 18 | **[Differences](#S-Differences)**: 19 | 20 | * [OpenCL C++ Programming Language](#S-OpenCLCXX): 21 | * [OpenCL C Vector Literals](#S-OpenCLCXX-VectorLiterals) 22 | * [boolN Type](#S-OpenCLCXX-BoolNType) 23 | * [End Of Explicit Named Address Spaces](#S-OpenCLCXX-EndOfExplicitNamedAddressSpaces) 24 | * [Kernel Function Restrictions](#S-OpenCLCXX-KernelRestrictions) 25 | * [Kernel Parameter Restrictions](#S-OpenCLCXX-KernelParamsRestrictions) 26 | * [General Restrictions](#S-OpenCLCXX-GeneralRestrictions) 27 | * [OpenCL C++ Standard Library](#S-OpenCLCXXSTL): 28 | * [Namespace cl::](#S-OpenCLCXXSTL-NamespaceCL) 29 | * [Conversions Library (`convert_*()`)](#S-OpenCLCXXSTL-ConversionsLibrary) 30 | * [Reinterpreting Data Library (as_type())](#S-OpenCLCXXSTL-ReinterpretingDataLibrary) 31 | * [Address Spaces Library](#S-OpenCLCXXSTL-AddressSpacesLibrary) 32 | * [Marker Types](#S-OpenCLCXXSTL-MarkerTypes) 33 | * [Images and Samplers Library](#S-OpenCLCXXSTL-ImagesAndSamplersLibrary) 34 | * [Pipes Library](#S-OpenCLCXXSTL-PipesLibrary) 35 | * [Device Enqueue Library](#S-OpenCLCXXSTL-DeviceEnqueueLibrary) 36 | * [Relational Functions](#S-OpenCLCXXSTL-RelationalFunctions) 37 | * [Vector Data Load and Store Functions](#S-OpenCLCXXSTL-VectorDataLoadandStoreFunctions) 38 | * [Atomic Operations Library](#S-OpenCLCXXSTL-AtomicOperationsLibrary) 39 | * [OpenCL C++ Compilation Process](#S-OpenCLCXXCompilation): 40 | * [OpenCL C++ Compilation to SPIR-V](#S-OpenCLCXXCompilationToSPIRV) 41 | * [Building program created from SPIR-V](#S-OpenCLCXXCompilationBuildSPIRV) 42 | 43 | **[Bibliography](#S-Bibliography)** 44 | 45 | # Differences 46 | 47 | ## OpenCL C++ Programming Language 48 | 49 | ### OpenCL C Vector Literals 50 | 51 | Vector literals, expression used for creating vectors from a list of scalars, 52 | vectors or a mixture thereof, known from OpenCL C are not part of the OpenCL C++ 53 | kernel language. 
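OpenCL C-style literals cannot simply be carried over because OpenCL C++ follows C++ expression rules, where a parenthesised, comma-separated list is a single comma expression. The following is a minimal sketch in plain host C++ (not OpenCL C++ kernel code); `tag` and `last_of_list` are hypothetical names used only for illustration.

```cpp
#include <cassert>

// Hypothetical helper: returns its argument unchanged. Wrapping each operand
// in a function call means compilers typically emit no "unused" warning.
int tag(int value) { return value; }

// The parenthesised list is ONE expression built from comma operators:
// tag(1), tag(2) and tag(3) are evaluated and discarded; tag(4) is the result.
int last_of_list()
{
    return (tag(1), tag(2), tag(3), tag(4));
}
```

Calling `last_of_list()` yields `4`. This is the single scalar that an OpenCL C-style `(int4)(1, 2, 3, 4)` ends up converting to a vector.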
54 | 55 | In OpenCL C++, vector types are initialized like any other class: with 56 | constructors. For example, the following are available for `float4`: 57 | 58 | ```cpp 59 | float4(float, float, float, float) 60 | float4(float2, float, float) 61 | float4(float, float2, float) 62 | float4(float, float, float2) 63 | float4(float2, float2) 64 | float4(float3, float) 65 | float4(float, float3) 66 | float4(float) 67 | ``` 68 | 69 | ##### Note 70 | > In OpenCL C++, vector literals are NOT evaluated as a user might expect and, 71 | unfortunately, they never cause compilation errors. 72 | 73 | Vector literals in OpenCL C++ are not evaluated as a user might expect. 74 | In OpenCL C++ the expression `(int4)(1, 2, 3, 4)` is evaluated to `(int4)4`. 75 | This happens because of how the comma operator works: every operand enclosed in 76 | parentheses except for the last is evaluated and discarded, and then scalar-to-vector 77 | conversion is used for `4`. 78 | 79 | In certain situations vector literals in OpenCL C++ code can cause warnings 80 | during compilation, but they do not cause compilation errors. 81 | 82 | #### Solution 83 | 84 | Do not use vector literals. Replace them with vector constructors. 85 | 86 | #### Examples, bad 87 | 88 | ```cpp 89 | int4 i = (int4)(1, 2, 3, 4); 90 | // This expression will be evaluated to (int4)4, 91 | // and i will be (4, 4, 4, 4). 92 | // In OpenCL C++ compiler (clang) provided by Khronos 93 | // it causes 'expression result unused' warnings. 94 | 95 | int4 i = (int4)(cl::max(0, 1), cl::max(0, 2), cl::max(0, 3), cl::max(0, 4)); 96 | // This expression will be evaluated to (int4)4, 97 | // and i will be (4, 4, 4, 4). 98 | // In OpenCL C++ compiler (clang) provided by Khronos 99 | // it DOES NOT cause any warnings. 
100 | ``` 101 | 102 | #### Examples, correct 103 | 104 | ```cpp 105 | uint4 u = uint4(1); // u will be (1, 1, 1, 1) 106 | int4 i = int4{-1, -2, 3, 4}; // i will be (-1, -2, 3, 4) 107 | 108 | // in each case f will be (1.0f, 2.0f, 3.0f, 4.0f) 109 | float4 f = float4(1.0f, 2.0f, 3.0f, 4.0f); 110 | float4 f = float4(float2(1.0f, 2.0f), float2(3.0f, 4.0f)); 111 | float4 f = float4(1.0f, float2(2.0f, 3.0f), 4.0f); 112 | ``` 113 | 114 | ### boolN Type 115 | 116 | OpenCL C++ introduces a new built-in vector type: `boolN` (where `N` is 2, 3, 4, 8, or 16). This addition 117 | resolves the problem with using the relational (`<`, `>`, `<=`, `>=`, `==`, `!=`) and the logical operators 118 | (`!`, `&&`, `||`) with built-in vector types. 119 | 120 | In OpenCL C, for built-in vector types the relational and the logical operators return a vector signed 121 | integer type of the same size as the source operands. In OpenCL C++ this was simplified: 122 | those operators return `boolN` for vector types and `bool` for scalars. 123 | 124 | [The OpenCL C 2.0 Specification](https://www.khronos.org/registry/OpenCL/specs/opencl-2.0-openclc.pdf#page=27) 125 | on the results of the relational operators: 126 | >The result is a scalar signed integer of type `int` if the source operands are scalar and a vector 127 | signed integer type of the same size as the source operands if the source operands are vector 128 | types. Vector source operands of type `charn` and `ucharn` return a `charn` result; vector 129 | source operands of type `shortn` and `ushortn` return a `shortn` result; vector source 130 | operands of type `intn`, `uintn` and `floatn` return an `intn` result; vector source operands 131 | of type `longn`, `ulongn` and `doublen` return a `longn` result. 132 | 133 | >For scalar types, the relational operators shall return `0` if the specified relation is `false` and `1` if 134 | the specified relation is `true`. 
For vector types, the relational operators shall return `0` if the specified 135 | relation is `false` and `–1` (i.e. all bits set) if the specified relation is `true`. The relational 136 | operators always return `0` if either argument is not a number (`NaN`). 137 | 138 | 139 | Including `boolN` vector types in OpenCL C++ also caused changes in the signatures and/or behavior of 140 | built-in relational functions such as `all()`, `any()` and `select()`. 141 | See the [Relational Functions](#S-OpenCLCXXSTL-RelationalFunctions) section for more details. 142 | 143 | #### Examples 144 | 145 | ```cpp 146 | bool2 b = bool2(1 == 0); // { false, false } 147 | 148 | // In OpenCL C: int b = 2 > 1, and b is 1 149 | bool b = 2 > 1; // true 150 | 151 | // In OpenCL C: int b = 2 == 1, and b is 0 152 | bool b = 2 == 1; // false 153 | 154 | // OpenCL C-related note: 155 | // -1 for signed integer type means that all bits are set 156 | 157 | // In OpenCL C: int2 b = (uint2)(0, 1) > (uint2)(0, 0), 158 | // and b is { 0, -1 } 159 | bool2 b = uint2(0, 1) > uint2(0, 0); // { false, true } 160 | 161 | // In OpenCL C: long2 b = (ulong2)(0, 0) > (ulong2)(0, 0), 162 | // and b is { 0, 0 } 163 | bool2 b = ulong2(0, 0) > ulong2(0, 0); // { false, false } 164 | 165 | // In OpenCL C: long2 b = (long2)(1, 1) > (long2)(0, 0), 166 | // and b is { -1, -1 } 167 | bool2 b = long2(1, 1) > long2(0, 0); // { true, true } 168 | ``` 169 | 170 | ```cpp 171 | #include 172 | 173 | // In OpenCL C: int2 b = isnan((float2)(0.0f)), 174 | // and b is { 0, 0 } 175 | bool2 b = isnan(float2(0.0f)); // { false, false } 176 | 177 | // In OpenCL C: long2 b = isfinite((double2)(0.0)) 178 | // and b is { -1, -1 } 179 | bool2 b = isfinite(double2(0.0)); // { true, true } 180 | ``` 181 | 182 | #### OpenCL C++ Specification References 183 | 184 | * [OpenCL C++ Programming Language: Expressions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#expressions) 185 | 186 | ### End Of Explicit Named Address Spaces 187 | 
188 | [OpenCL C++ 1.0 Specification in Address Spaces section](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#address-spaces) 189 | says: 190 | >The OpenCL C++ kernel language doesn’t introduce any explicit named address spaces, but they are 191 | implemented as part of the standard library described in Address Spaces Library section. 192 | There are 4 types of memory supported by all OpenCL devices: global, local, private and constant. 193 | The developers should be aware of them and know their limitations. 194 | 195 | That means that instead of using keywords `global`, `constant`, `local`, and `private`, in order 196 | to explicitly specify address space for variable or pointer you have to use address space pointers 197 | and address space storage classes. 198 | 199 | ##### Note 200 | > Go to [Address Spaces Library](#S-OpenCLCXXSTL-AddressSpacesLibrary) section of 201 | The Porting Guidelines to read more about address space pointers and address space storage classes. 202 | 203 | It is still possible for OpenCL C++ compiler to deduce an address space based on the scope where 204 | an object is declared: 205 | 206 | * If a variable is declared in program scope, with `static` or `extern` specifier and the standard 207 | library storage class (see 208 | [Explicit address space storage classes](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#explicit-address-space-storage-classes) 209 | section) is not used, the variable is allocated in the global memory of a device. 210 | * If a variable is declared in function scope, without static specifier and the standard library storage class 211 | (see [Explicit address space storage classes](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#explicit-address-space-storage-classes) 212 | section) is not used, the variable is allocated in the private memory of a device. 
213 | 214 | #### OpenCL C++ Specification References 215 | 216 | * [OpenCL C++ Programming Language: Address Spaces](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#address-spaces) 217 | * [OpenCL C++ Standard Library: Address Spaces Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#address-spaces-library) 218 | 219 | #### Examples, bad (OpenCL C-style) 220 | 221 | ```cpp 222 | // Compilation error, "global" address space is not defined 223 | // in OpenCL C++ kernel language 224 | kernel void example_kernel(global int * input) 225 | { 226 | // Compilation error, "local" address space is not defined 227 | // in OpenCL C++ kernel language 228 | local int array[256]; 229 | // ... 230 | } 231 | 232 | // Compilation error, "constant" address space is not defined 233 | // in OpenCL C++ kernel language 234 | kernel void example_kernel(constant int * input) 235 | { 236 | // Compilation error, "private" address space is not defined 237 | // in OpenCL C++ kernel language 238 | private int x; 239 | // ... 240 | } 241 | ``` 242 | 243 | #### Examples, correct (OpenCL C++) 244 | 245 | ```cpp 246 | #include 247 | #include 248 | 249 | kernel void example_kernel(cl::global_ptr input) 250 | { 251 | cl::local array; 252 | 253 | uint gid = cl::get_global_id(0); 254 | array[gid] = input[gid]; 255 | // ... 256 | } 257 | 258 | kernel void example_kernel(cl::constant_ptr input) 259 | { 260 | int x = 0; 261 | // ... 262 | } 263 | 264 | int y; // Allocated in global memory 265 | static int z; // Allocated in global memory 266 | 267 | kernel void example_kernel(cl::constant_ptr input) 268 | { 269 | int x = 0; // Allocated in private memory 270 | static cl::global w; // Allocated in global memory 271 | // ... 272 | } 273 | ``` 274 | 275 | ##### Note 276 | > More examples on address spaces can be found in subsections 277 | [3.4.5. 
Restrictions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#restrictions-2) and 278 | [3.4.6. Examples](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#examples-3) of section 279 | [Address Spaces Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#address-spaces-library) in 280 | [OpenCL C++ specification](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html). 281 | 282 | ### Kernel Function Restrictions 283 | 284 | Since the OpenCL C++ kernel language is based on C++14, several restrictions were defined for 285 | kernel functions to make them resemble the kernel functions known from OpenCL C: 286 | 287 | * Kernel functions are implicitly declared as `extern "C"`. 288 | * A kernel function cannot be overloaded. 289 | * A kernel function cannot be a function template. 290 | * A kernel function cannot be called by another kernel function. 291 | * A kernel function cannot have parameters specified with default values. 292 | * A kernel function must have the return type `void`. 293 | * A kernel function cannot be called `main`. 294 | 295 | ##### Note 296 | > Unlike in OpenCL C, in OpenCL C++ you cannot call a kernel function from another kernel function. 297 | 298 | #### OpenCL C++ Specification References 299 | 300 | * [OpenCL C++ Programming Language: Kernel Functions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#kernel-functions) 301 | 302 | #### Examples, bad 303 | 304 | ```cpp 305 | // A kernel function cannot be a function template. 306 | template 307 | kernel void example_kernel(cl::global_ptr input, uint size) 308 | { /* ... */ } 309 | 310 | // A kernel function cannot have parameters specified with default values. 311 | kernel void foo(cl::global_ptr input, uint size = 10) 312 | { /* ... */ } 313 | 314 | kernel void bar(cl::global_ptr input, uint size) 315 | { 316 | // A kernel function cannot be called by another kernel function. 
317 | foo(input, size); 318 | } 319 | 320 | // A kernel function cannot be overloaded. 321 | kernel void bar(cl::global_ptr input, uint size) 322 | { /* ... */ } 323 | ``` 324 | 325 | #### Examples, correct 326 | 327 | ```cpp 328 | template 329 | void function_template(cl::global_ptr input, uint size) 330 | { /* ... */ } 331 | 332 | // Specialization for T = float 333 | template<> 334 | void function_template(cl::global_ptr input, uint size) 335 | { /* ... */ } 336 | 337 | kernel void kernel_uint(cl::global_ptr input, uint size) 338 | { 339 | function_template(input, size); 340 | } 341 | 342 | kernel void kernel_float(cl::global_ptr input, uint size) 343 | { 344 | function_template(input, size); 345 | } 346 | ``` 347 | 348 | ### Kernel Parameter Restrictions 349 | 350 | The OpenCL host compiler and the OpenCL C++ kernel language device compiler can have 351 | different requirements for, e.g., type sizes, data packing and alignment. Therefore 352 | the kernel parameters must meet the following requirements: 353 | 354 | * Types passed by pointer or reference must be standard layout types. 355 | * Types passed by value must be POD types. 356 | * Types cannot be declared with the built-in bool scalar type, vector type or a class that 357 | contains bool scalar or vector type fields. 358 | * Types cannot be structures and classes with bit field members. 359 | * Marker types must be passed by value 360 | ([Marker Types section](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#marker-types)). 361 | * `global`, `constant`, `local` storage classes can be passed only by reference or pointer. 362 | More details in the 363 | [Explicit address space storage classes](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#explicit-address-space-storage-classes) 364 | section. 365 | * Pointers and references must point to one of the following address spaces: global, local 366 | or constant. 
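The layout-related requirements above can be checked at compile time on the host side. The following is a minimal sketch in standard C++14, assuming the host structure mirrors the type used as a kernel parameter. `particle`, `with_vtable` and `is_kernel_friendly` are hypothetical names, and the check does not cover the `bool` and bit-field rules.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical parameter type: plain data members only.
struct particle
{
    float position[4];
    float mass;
    int   id;
};

// Hypothetical counter-example: a virtual function makes the type
// neither trivial nor standard-layout.
struct with_vtable
{
    virtual ~with_vtable() = default;
    int value;
};

// A type is POD when it is both trivial and standard-layout.
template<typename T>
constexpr bool is_kernel_friendly()
{
    return std::is_trivial<T>::value && std::is_standard_layout<T>::value;
}

// Fails the build early if someone adds a non-POD member to particle.
static_assert(is_kernel_friendly<particle>(), "particle must be a POD type");
```

Placing such a `static_assert` next to the host-side definition is one way to catch an accidental layout change before it shows up as a hard-to-detect mismatch between host and device.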
367 | 368 | #### OpenCL C++ Specification References 369 | 370 | * [OpenCL C++ Programming Language: Kernel Functions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#kernel-functions) 371 | 372 | ### General Restrictions 373 | 374 | The following C++14 features are not supported by OpenCL C++: 375 | 376 | * the `dynamic_cast` operator (ISO C++ Section 5.2.7), 377 | * type identification (ISO C++ Section 5.2.8), 378 | * recursive function calls (ISO C++ Section 5.2.2, item 9) unless they are a compile-time constant expression, 379 | * non-placement `new` and `delete` operators (ISO C++ Sections 5.3.4 and 5.3.5), 380 | * `goto` statement (ISO C++ Section 6.6), 381 | * `register` and `thread_local` storage qualifiers (ISO C++ Section 7.1.1), 382 | * `virtual` function qualifier (ISO C++ Section 7.1.2), 383 | * **function pointers** (ISO C++ Sections 8.3.5 and 8.5.3) **unless they are a compile-time constant expression**, 384 | * virtual functions and abstract classes (ISO C++ Sections 10.3 and 10.4), 385 | * exception handling (ISO C++ Section 15), 386 | * the C++ standard library (ISO C++ Sections 17 . . . 30), 387 | * `asm` declaration (ISO C++ Section 7.4), 388 | * no implicit lambda to function pointer conversion (ISO C++ Section 5.1.2, item 6), 389 | * variadic functions (ISO C99 Section 7.15, Variable arguments ), 390 | * and, like C++, OpenCL C++ does not support variable length arrays (ISO C99, Section 6.7.5). 391 | 392 | To avoid potential confusion with the above, please note the following 393 | features are supported in OpenCL C++: 394 | 395 | * **All variadic templates** (ISO C++ Section 14.5.3) **including variadic function templates are supported**. 
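Two items from the lists above are easy to confuse: recursion is allowed only when it is evaluated as a compile-time constant expression, and variadic function templates remain available. The following is a minimal sketch in standard C++14, written as plain C++ so it can be checked with a host compiler. Whether a particular OpenCL C++ compiler accepts it unchanged is an assumption, and `factorial` and `sum_all` are hypothetical names.

```cpp
#include <cassert>

// Recursive, but usable in a constant expression.
constexpr int factorial(int n)
{
    return n <= 1 ? 1 : n * factorial(n - 1);
}

// static_assert forces the recursion to be evaluated by the compiler.
static_assert(factorial(5) == 120, "evaluated at compile time");

// Variadic function template: base case for a single argument.
template<typename T>
constexpr T sum_all(T last)
{
    return last;
}

// Variadic function template: peels off one argument per instantiation.
template<typename T, typename... Rest>
constexpr T sum_all(T first, Rest... rest)
{
    return first + sum_all(rest...);
}
```

Each `sum_all` instantiation calls a different function (one fewer argument), so the expansion is not a recursive call in the run-time sense.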
396 | 397 | #### OpenCL C++ Specification References 398 | 399 | * [OpenCL C++ Programming Language: Restrictions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#opencl_cxx_restrictions) 400 | 401 | --- 402 | ## OpenCL C++ Standard Library 403 | 404 | OpenCL C++ does not support the C++14 standard library, but instead implements its 405 | own standard library. It is a replacement for the built-in functions provided in 406 | OpenCL C. 407 | 408 | ##### Note 409 | > OpenCL C++ classes and functions are NOT auto-included. 410 | 411 | ### Namespace cl:: 412 | 413 | All classes and functions provided in the OpenCL C++ Standard Library are located in 414 | namespace `cl::`. 415 | 416 | #### OpenCL C++ Specification References 417 | 418 | * [OpenCL C++ Standard Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#opencl-c-standard-library) 419 | 420 | #### Solution 421 | 422 | Adding the using-directive `using namespace cl;` right after including all required headers 423 | can reduce the work needed to port OpenCL C programs to OpenCL C++. 
424 | 425 | #### Examples 426 | 427 | ```cpp 428 | #include 429 | #include // cl::abs(gentype x) 430 | 431 | kernel void foo(cl::global_ptr input /* note cl:: prefix */, uint size) 432 | { 433 | uint global_id = cl::get_global_id(0); // note cl:: prefix 434 | if(global_id < size) 435 | { 436 | using namespace cl; // no need for cl:: prefix in this scope 437 | input[global_id] = abs(input[global_id]); 438 | } 439 | } 440 | ``` 441 | 442 | ```cpp 443 | #include 444 | #include // cl::abs(gentype x) 445 | using namespace cl; // No need for cl:: prefix after this using-directive 446 | 447 | kernel void foo(global_ptr input, uint size) 448 | { 449 | uint global_id = get_global_id(0); 450 | if(global_id < size) 451 | { 452 | input[global_id] = abs(input[global_id]); 453 | } 454 | } 455 | ``` 456 | 457 | ### Conversions Library 458 | 459 | The OpenCL C `convert_type<_sat><_roundingMode>()` 460 | and `convert_typeN<_sat><_roundingMode>()` built-in 461 | functions were replaced in OpenCL C++ with the `convert_cast<>` function template. The behavior of the conversion 462 | may be modified by one or two optional modifiers that specify saturation for out-of-range 463 | inputs and rounding behavior. 464 | 465 | **Rounding Modes** 466 | 467 | ```cpp 468 | namespace cl 469 | { 470 | enum class rounding_mode 471 | { 472 | rte, // Round to nearest even 473 | rtz, // Round toward zero 474 | rtp, // Round toward positive infinity 475 | rtn // Round toward negative infinity 476 | }; 477 | } 478 | ``` 479 | 480 | ##### Note 481 | > If a rounding mode is not specified, conversions to integer types use the `rtz` (round toward zero) 482 | rounding mode and conversions to floating-point types use the `rte` (round to nearest even) rounding mode. 
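The four rounding modes differ mainly for negative values and for ties. The following is a minimal host-side sketch in standard C++ that emulates them for a float-to-int conversion. It is not `cl::convert_cast`, `host_rounding` and `to_int` are hypothetical names, and it assumes the default floating-point environment (round to nearest, ties to even) so that `std::nearbyint` behaves like `rte`.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical host-side mirror of the four rounding modes.
enum class host_rounding { rte, rtz, rtp, rtn };

// Converts a float to int using the requested rounding behavior.
int to_int(float value, host_rounding mode)
{
    switch (mode)
    {
        // Round to nearest, ties to even (default FP environment assumed).
        case host_rounding::rte: return static_cast<int>(std::nearbyint(value));
        // Round toward zero: drop the fractional part.
        case host_rounding::rtz: return static_cast<int>(std::trunc(value));
        // Round toward positive infinity.
        case host_rounding::rtp: return static_cast<int>(std::ceil(value));
        // Round toward negative infinity.
        case host_rounding::rtn: return static_cast<int>(std::floor(value));
    }
    return 0; // not reached for valid enumerators
}
```

With this sketch, `-1.5f` becomes `-1` under `rtz` and `-2` under `rte`, while `2.5f` becomes `2` under both because the tie goes to the even neighbor.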
483 | 484 | #### OpenCL C++ Specification References 485 | 486 | * [OpenCL C++ Standard Library: Conversions Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#conversions-library) 487 | 488 | #### Examples 489 | 490 | ```cpp 491 | #include 492 | using namespace cl; // No need for cl:: prefix after this using-directive 493 | 494 | kernel void convert_foo_bar() 495 | { 496 | int4 i { -1, 0, 1, 2 }; 497 | float4 f { -1.5f, -0.5f, 0.5f, 1.5f}; 498 | 499 | // Convert ints to floats using the default rounding mode (rte). 500 | // In OpenCL C: convert_float4(i) 501 | float4 f1 = convert_cast(i); 502 | 503 | // In OpenCL C: convert_float4_rtp(i) 504 | float4 f2 = convert_cast(i); 505 | 506 | // In OpenCL C: convert_int4_sat(f) 507 | int4 i1 = convert_cast(f); 508 | 509 | // In OpenCL C: convert_int4_sat_rte(f) 510 | int4 i2 = convert_cast(f); 511 | } 512 | ``` 513 | 514 | ### Reinterpreting Data Library 515 | 516 | The OpenCL C `as_type()` and `as_typeN()` operators, used for 517 | reinterpreting the bits of one data type as another data type, were replaced in OpenCL C++ 518 | with the `TargetType as_type(InputType const&)` function template. 519 | 520 | ##### Note 521 | > All data types described in the 522 | [Device built-in scalar data types](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#device_builtin_scalar_data_types) 523 | and 524 | [Device built-in vector data types](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#device_builtin_vector_data_types) 525 | tables (except `bool` and `void`) may also be reinterpreted as another data type of the same size 526 | using the `as_type()` function template for scalar and vector data types. 
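The idea of reinterpreting bits between types of the same size can be tried out on the host. The following is a minimal sketch in standard C++ and is not the OpenCL C++ `as_type` implementation. `reinterpret_bits` is a hypothetical name, and the expected bit pattern assumes a 32-bit IEEE-754 `float`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical host-side analogue: copies the object representation of
// `input` into a value of type To without any numeric conversion.
template<typename To, typename From>
To reinterpret_bits(const From& input)
{
    // Mirrors the "result and operand have different sizes" error case.
    static_assert(sizeof(To) == sizeof(From),
                  "result and operand must have the same size");
    To output;
    std::memcpy(&output, &input, sizeof(To));
    return output;
}
```

For example, `reinterpret_bits<std::uint32_t>(1.0f)` gives `0x3f800000` on such a platform, whereas a `static_cast` would give `1`.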
527 | 528 | #### OpenCL C++ Specification References 529 | 530 | * [OpenCL C++ Standard Library: Reinterpreting Data Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#reinterpreting-data-library) 531 | 532 | #### Examples 533 | 534 | ```cpp 535 | #include 536 | using namespace cl; // No need for cl:: prefix after this using-directive 537 | 538 | kernel void reinterpret_bar_foo() 539 | { 540 | float f = 1.0f; 541 | uint u = as_type(f); // Legal. Contains: 0x3f800000 542 | 543 | float4 f = float4(1.0f, 2.0f, 3.0f, 4.0f); 544 | // Legal. Contains: 545 | // int4(0x3f800000, 0x40000000, 0x40400000, 0x40800000) 546 | int4 i = as_type(f); 547 | 548 | int i; 549 | // Legal. Result is implementation-defined. 550 | short2 j = as_type(i); 551 | 552 | float4 f; 553 | // Error: result and operand have different sizes 554 | double4 g = as_type(f); 555 | 556 | float4 f; 557 | // Legal. 558 | // g.xyz will have same values as f.xyz. 559 | // g.w is undefined 560 | float3 g = as_type(f); 561 | } 562 | ``` 563 | 564 | ### Address Spaces Library 565 | 566 | As mentioned in [End of explicit named address spaces](#S-OpenCLCXX-EndOfExplicitNamedAddressSpaces), 567 | in OpenCL C++ explicit named address spaces known from OpenCL C were replaced by explicit address space 568 | storage and pointer classes. 569 | 570 | **Explicit address space storage classes:** 571 | 572 | * `cl::global x` - allocated in global memory. 573 | * The global storage class can only be used to declare variables at program, function and class scope. 574 | * The variables at function and class scope must be declared with `static` specifier. 575 | * `cl::local x` - allocated in local memory. 576 | * The local storage class can only be used to declare variables at program, kernel and class scope. 577 | * The variables at class scope must be declared with `static` specifier. 578 | * `cl::priv x` - allocated in private memory. 
579 | * The priv storage class cannot be used to declare variables in the program scope, or with the `static` or `extern` specifier. 580 | * `cl::constant x` - allocated in global memory, read-only. 581 | * The constant storage class can only be used to declare variables at program, kernel and class scope. 582 | * The variables at class scope must be declared with the `static` specifier. 583 | 584 | **Explicit address space pointer classes:** 585 | 586 | * `cl::global_ptr` 587 | * `cl::local_ptr` 588 | * `cl::private_ptr` 589 | * `cl::constant_ptr` 590 | 591 | The explicit address space pointer classes are just like pointers: they can be converted to and from pointers 592 | with compatible address spaces, qualifiers and types. Assignment or casting between explicit pointer types of 593 | incompatible address spaces is illegal. 594 | 595 | All named address spaces are incompatible with all other address spaces, but local, global and private pointers 596 | can be converted to standard C++ pointers. 597 | 598 | #### Restrictions 599 | 600 | [The OpenCL C++ specification](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html), 601 | in subsection [3.4.5. Restrictions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#restrictions-2) 602 | of section [Address Spaces Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#address-spaces-library), 603 | contains a detailed list of restrictions, with examples, regarding explicit address space storage and pointer classes. 604 | It is very important to read and understand those restrictions. 
605 | 606 | #### OpenCL C++ Specification References 607 | 608 | * [OpenCL C++ Programming Language: Address Spaces](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#address-spaces) 609 | * [OpenCL C++ Standard Library: Address Spaces Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#address-spaces-library) 610 | 611 | #### Examples 612 | 613 | ```cpp 614 | #include 615 | #include 616 | #include 617 | 618 | int x; // Allocated in global address space 619 | cl::global y; // Allocated in global address space 620 | 621 | cl::constant z {0}; // Allocated in global address space, read-only, 622 | // must be initialized 623 | 624 | // Program scope array of 5 ints allocated in local address space 625 | cl::local> w = { 10 }; 626 | 627 | // Explicit address space class object passed by value 628 | kernel void example_kernel(cl::global_ptr input) 629 | { 630 | cl::local array; 631 | 632 | static cl::global a; 633 | static cl::constant b {0}; 634 | } 635 | 636 | // Explicit address space storage object passed by reference 637 | kernel void example_kernel(cl::global>& input) 638 | { /* ... */ } 639 | 640 | // Explicit address space storage object passed by pointer 641 | kernel void example_kernel(cl::global * input) 642 | { /* ... */ } 643 | ``` 644 | 645 | ##### Note 646 | > More examples on address spaces can be found in subsections 647 | [3.4.5. Restrictions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#restrictions-2) and 648 | [3.4.6. Examples](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#examples-3) of section 649 | [Address Spaces Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#address-spaces-library) in 650 | [OpenCL C++ specification](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html). 651 | 652 | ### Marker Types 653 | 654 | Like OpenCL C, OpenCL C++ includes special types - images, pipes. 
655 | All those types are considered marker types. 656 | Being a marker type comes with the following set of restrictions: 657 | 658 | * Marker types have the default constructor deleted. 659 | * Marker types have all default copy and move assignment operators deleted. 660 | * Marker types have address-of operator deleted. 661 | * Marker types cannot be used in divergent control flow. It can result in undefined behavior. 662 | * Size of marker types is undefined. 663 | 664 | All marker types can be passed to functions only by a reference. 665 | 666 | #### OpenCL C++ Specification References 667 | 668 | * [OpenCL C++ Standard Library: Marker Types](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#marker-types) 669 | 670 | #### Examples 671 | 672 | ```cpp 673 | #include 674 | #include 675 | using namespace cl; 676 | 677 | float4 bar_val(image2d img) { 678 | return img.read({get_global_id(0), get_global_id(1)}); 679 | } 680 | 681 | float4 bar_ref(image2d& img) { 682 | return img.read({get_global_id(0), get_global_id(1)}); 683 | } 684 | 685 | kernel void foo(image2d img) 686 | { 687 | // Error: marker type cannot be passed by value 688 | float4 val = bar_val(img); 689 | 690 | // Correct, passing marker type by reference 691 | float4 val = bar_ref(img); 692 | } 693 | ``` 694 | 695 | ```cpp 696 | #include 697 | #include 698 | using namespace cl; 699 | 700 | float4 bar(image2d img) { 701 | return img.read({get_global_id(0), get_global_id(1)}); 702 | } 703 | 704 | kernel void foo(image2d img1, image2d img2) 705 | { 706 | // Error: marker type cannot be declared in the kernel 707 | image2d img3; 708 | 709 | // Error: marker type cannot be assigned 710 | img1 = img2; 711 | 712 | // Error: taking address of marker type 713 | image2d *imgPtr = &img1; 714 | 715 | // Undefined behavior: size of marker type is not defined 716 | size_t s = sizeof(img1); 717 | 718 | // Undefined behavior: divergent control flow 719 | float4 val = bar(get_global_id(0) ? 
img1: img2); 720 | } 721 | ``` 722 | 723 | ### Images and Samplers Library 724 | 725 | Images are another part of the OpenCL that changed a lot compared to OpenCL C. 726 | Instead of image types and built-in image read/write functions in OpenCL C++ there are 727 | image class templates with corresponding methods. Image and sampler class templates are [marker types](#S-OpenCLCXXSTL-MarkerTypes). 728 | 729 | #### Image types 730 | 731 | | OpenCL C | OpenCL C++ | 732 | |----------------------- |------------------------- | 733 | | image1d\_t | cl::image1d | 734 | | image1d\_buffer\_t | cl::image1d\_buffer | 735 | | image1d\_array\_t | cl::image1d\_array | 736 | | image2d\_t | cl::image2d | 737 | | image2d\_array\_t | cl::image2d\_array | 738 | | image2d\_depth\_t | cl::image2d\_depth | 739 | | image2d\_array\_depth\_t | cl::image2d\_array\_depth | 740 | | image3d\_t | cl::image3d | 741 | | sampler\_t | cl::sampler | 742 | 743 | To instantiate image template class user has to specify image element type (which is 744 | type returned when reading from an image, and required when writing pixel to an image), 745 | and access mode (`cl::image_access::read` is the default access mode). 746 | 747 | #### Image dimension 748 | 749 | Based on the dimension of an image different methods are available. All image types have 750 | `int width()` method, images of dimension 2 or 3 have `int height()`, 3D images have 751 | `int depth()`, and arrayed images have one additional method - `int array_size()`. 752 | See subsection 753 | [Image dimension](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#image-dimension) 754 | of OpenCL C++ Specification for more details. 755 | 756 | #### Image element type 757 | 758 | Depending on the type of an image different types are allowed to be specified as 759 | image element type template parameter. Image type with invalid pixel type is ill formed. 
760 | See subsection 761 | [Image element types](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#image-element-types) 762 | of OpenCL C++ Specification for more details. 763 | 764 | Image processing kernels written in OpenCL C++ can be made more readable using `.rgba` vector 765 | component access (compared to `.xyzw` in OpenCL C). 766 | Like `xyzw` selector, `rgba` selector works only for vector types with 4 or less elements. 767 | See also Vector Component Access part of subsection 768 | [Built-in Vector Data Types](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#builtin-vector-data-types) 769 | and section 770 | [Vector Utilities Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#vector-utilities-library) 771 | of OpenCL C++ Specification. 772 | 773 | ```cpp 774 | // OpenCL C++ 775 | kernel void openclcxx(image2d img) 778 | { 779 | uint4 color; 780 | // rgba selector 781 | color.r = 255; 782 | color.gb = uint2(0); 783 | color.a = 255; 784 | //... 785 | } 786 | 787 | // OpenCL C 788 | kernel void openclc(read_only image2d_t img) // read_only keyword sets access mode 789 | // image element type not defined 790 | { 791 | uint4 color; 792 | // xyzw selector 793 | color.x = 255; 794 | color.yz = (uint2)(0); 795 | color.w = 255; 796 | //... 797 | } 798 | ``` 799 | 800 | #### Image access mode 801 | 802 | Based on the image access mode different read and write methods are present in 803 | the instantiated image class. See subsection 804 | [Image access](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#image-access) 805 | of OpenCL C++ Specification for more details. 806 | 807 | ```cpp 808 | namespace cl 809 | { 810 | enum class image_access 811 | { 812 | sample, 813 | read, 814 | write, 815 | read_write 816 | }; 817 | } 818 | ``` 819 | 820 | #### Sampler 821 | 822 | Like in OpenCL C, in OpenCL C++ there only two ways of acquiring a sampler inside of a kernel. 
823 | One is to pass it as a kernel parameter from host using `clSetKernelArg` function, 824 | the other is to create `cl::sampler` using `make_sampler` function in the kernel code. 825 | The sampler objects at non-program scope must be declared with static specifier. 826 | 827 | ```cpp 828 | template 829 | constexpr sampler make_sampler(); 830 | ``` 831 | 832 | Sampler parameters and their behavior are described in subsection 833 | [Sampler Modes](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#sampler-modes) 834 | of OpenCL C++ Specification. 835 | 836 | #### OpenCL C++ Specification References 837 | 838 | * [OpenCL C++ Standard Library: Images and Samplers Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#images-and-samplers-library) 839 | * [OpenCL C++ Standard Library: Marker Types](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#marker-types) 840 | 841 | #### Examples 842 | 843 | ```cpp 844 | // OpenCL C++ 845 | #include 846 | #include 847 | using namespace cl; 848 | 849 | using my_image1d_type = image1d; // access mode 851 | 852 | using my_image2d_type = image2d; // access mode is image_access::read 853 | 854 | kernel void openclcxx(my_image1d_type img1d, my_image2d_type img2d) 855 | { 856 | const int coords1d(get_global_id(0)); 857 | const int2 coords2d(get_global_id(0), get_global_id(1)); 858 | 859 | float4 val1d(0.0f); 860 | // 1) write() is enabled because the access mode of my_image1d_type 861 | // is image_access::write 862 | // 2) write() takes int value as pixel coordinates because my_image1d_type 863 | // is a 1d image type 864 | // 3) write() takes float4 value as pixel value because float4 is the image 865 | // element type of my_image1d_type 866 | img1d.write(coords1d, val1d); 867 | 868 | // 1) read() is enabled because the access mode of my_image2d_type 869 | // is image_access::read 870 | // 2) read() takes int2 as an input argument because my_image2d_type 871 | // 
is a 2d image type 872 | // 3) read() returns float4 because float4 is the image element type 873 | // of my_image2d_type 874 | float4 val2d = img2d.read(coords2d); 875 | } 876 | ``` 877 | 878 | ```cpp 879 | // OpenCL C 880 | kernel void openclc(write_only image1d_t img1d, // write_only keyword sets access mode 881 | read_only image2d_t img2d) // read_only keyword sets access mode 882 | { 883 | const int coords1d = get_global_id(0); 884 | const int2 coords2d = (int2)(get_global_id(0), get_global_id(1)); 885 | 886 | float4 val1d = (float4)(0.0f); 887 | write_imagef(img1d, coords1d, val1d); 888 | 889 | // float4 read_imagef(image2d_t, int2) function is used to 890 | // read from img 2d image. 891 | float4 val2d = read_imagef(img2d, coords2d); 892 | } 893 | ``` 894 | 895 | ### Pipes Library 896 | 897 | In OpenCL C++ `pipe` keyword was replaced with `cl::pipe` class template. 898 | Reserve operations return `cl::pipe::reservation` object, instead of returning 899 | reservation id of type `reserve_id_t`. 900 | 901 | All `pipe`s-related function were moved to `cl::pipe` or `reservation` as 902 | their methods. 903 | 904 | #### Pipe storage 905 | 906 | OpenCL C++ introduces new pipe-related type - `cl::pipe_storage` class template. 907 | It enables programmers to create `cl::pipe` objects in an OpenCL program without 908 | need to create `cl_pipe` on host using API. `cl::pipe_storage` class template has 909 | two template parameters: `T` - element type, and `N` - the maximum number of packets 910 | which can be held by an object. 911 | 912 | ##### Note 913 | One kernel can have only one pipe accessor (`cl::pipe` object) associated with 914 | one `cl::pipe_storage` object. 915 | 916 | #### Requirements and Restictions 917 | 918 | `cl::pipe::reservation`, `cl::pipe_storage` and `cl::pipe` are marker types. 
919 | However, they also have additional sets of requirements and restrictions beyond 920 | those specified in the 921 | [Marker Types](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#marker-types) 922 | section. The most important are: 923 | 924 | * The element type `T` of `pipe` and `pipe_storage` class templates 925 | must be a POD type, i.e. satisfy `is_pod<T>::value == true`. 926 | * A kernel cannot read from and write to the same pipe object. 927 | * Variables of type `pipe_storage` can only be declared at program scope or 928 | with the `static` specifier. 929 | * Variables of type `pipe` created from `pipe_storage` can only be declared 930 | inside a kernel function at kernel scope. 931 | * The `reservation`, `pipe_storage`, and `pipe` types cannot be used as a class or 932 | union field, a pointer type, an array or the return type of a function. 933 | * The `reservation`, `pipe_storage`, and `pipe` types cannot be used with the 934 | `global`, `local`, `priv` and `constant` address space storage classes. 935 | 936 | The full lists of requirements and restrictions can be found in subsections 937 | [Requirements](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#requirements) and 938 | [Restrictions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#restrictions-5) 939 | of the Pipes Library section in the OpenCL C++ Specification.
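
The POD requirement on the pipe element type can be checked on the host before a type is ever used in a kernel. The following is only a sketch in standard C++14, not OpenCL C++ kernel code: `checked_pipe_storage`, `packet` and `is_valid_pipe_element` are hypothetical names introduced for illustration, and the POD test is spelled as "trivial and standard-layout", which is how C++14 defines a POD type.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// True when T satisfies the POD rule for pipe element types.
// (Hypothetical helper; in C++14 a POD type is trivial and standard-layout.)
template <class T>
constexpr bool is_valid_pipe_element() {
    return std::is_trivial<T>::value && std::is_standard_layout<T>::value;
}

// Hypothetical host-side stand-in for cl::pipe_storage<T, N>.
// It only mirrors the two template parameters and the element-type check.
template <class T, std::size_t N>
struct checked_pipe_storage {
    static_assert(is_valid_pipe_element<T>(),
                  "pipe element type must be a POD type");
    static constexpr std::size_t capacity() { return N; }
};

// A plain aggregate of scalars is a POD type, so it is accepted.
struct packet {
    int   id;
    float payload[4];
};

static_assert(is_valid_pipe_element<packet>(), "packet is a POD type");
static_assert(!is_valid_pipe_element<std::string>(), "std::string is not a POD type");
```

Instantiating `checked_pipe_storage<std::string, 8>` in this sketch fails to compile, which is the kind of early diagnostic the restriction is meant to give.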
940 | 941 | #### OpenCL C++ Specification References 942 | 943 | * [OpenCL C++ Standard Library: Pipes Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#pipes-library) 944 | * [Requirements](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#requirements) 945 | * [Restrictions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#restrictions-5) 946 | * [OpenCL C++ Standard Library: Marker Types](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#marker-types) 947 | 948 | #### Examples 949 | 950 | Reading from and writing to a pipe: 951 | 952 | ```cpp 953 | // OpenCL C++ 954 | #include 955 | 956 | kernel void foobar(cl::pipe wp, 957 | cl::pipe rp) 958 | { 959 | int val; 960 | // ... 961 | // write() method is enabled only for pipes with 962 | // pipe_access::write access mode 963 | if(wp.write(val)) { // val passed by const reference 964 | // ... 965 | } 966 | 967 | // read() method is enabled only for pipes with 968 | // pipe_access::read access mode 969 | if(rp.read(val)) { // val passed by reference 970 | // ... 971 | } 972 | } 973 | ``` 974 | 975 | ```cpp 976 | // OpenCL C 977 | kernel void foobar(write_only /* access mode */ pipe /* keyword */ int /* type */ wp, 978 | read_only /* access mode */ pipe /* keyword */ int /* type */ rp) 979 | { 980 | int val; 981 | // ... 982 | 983 | // In OpenCL C, write_pipe(...) and read_pipe(...) operations 984 | // return 0 when the write/read is successful, and a negative 985 | // value otherwise 986 | if(write_pipe(wp, &val) == 0) { 987 | // ... 988 | } 989 | 990 | if(read_pipe(rp, &val) == 0) { 991 | // ... 992 | } 993 | } 994 | ``` 995 | 996 | ```cpp 997 | // OpenCL C++ 998 | #include 999 | 1000 | kernel void foobar(cl::pipe p) 1001 | { 1002 | int val; 1003 | // cl::pipe::reservation 1004 | auto r = p.reserve(3); 1005 | // ... 
1006 | // read() method is available because pipe p is in 1007 | // pipe_access::read access mode 1008 | if(r.read(2, val)) { 1009 | // ... 1010 | } 1011 | r.commit(); 1012 | } 1013 | ``` 1014 | 1015 | Making and using a reservation: 1016 | 1017 | ```cpp 1018 | // OpenCL C 1019 | kernel void foobar(read_only pipe int p) 1020 | { 1021 | int val; 1022 | reserve_id_t rid = reserve_read_pipe(p, 3); 1023 | // ... 1024 | if(read_pipe(p, rid, 2, &val)) { 1025 | // ... 1026 | } 1027 | commit_read_pipe(p, rid); 1028 | } 1029 | ``` 1030 | 1031 | ```cpp 1032 | // OpenCL C++ 1033 | #include 1034 | 1035 | kernel void foobar(cl::pipe p) 1036 | { 1037 | int val; 1038 | // cl::pipe::reservation 1039 | auto r = p.reserve(3); 1040 | // ... 1041 | // read() method is available because pipe p is in 1042 | // pipe_access::read access mode 1043 | if(r.read(2, val)) { 1044 | // ... 1045 | } 1046 | r.commit(); 1047 | } 1048 | ``` 1049 | 1050 | Using `pipe_storage`: 1051 | 1052 | ```cpp 1053 | // OpenCL C++ 1054 | #include 1055 | 1056 | cl::pipe_storage my_pipe; 1057 | 1058 | kernel void reader() 1059 | { 1060 | auto p = my_pipe.get(); 1061 | // ... 1062 | p.read(...); 1063 | // ... 1064 | } 1065 | 1066 | kernel void writer() 1067 | { 1068 | auto p = my_pipe.get(); 1069 | // ... 1070 | p.write(...); 1071 | // ... 1072 | } 1073 | 1074 | kernel void error_kernel() 1075 | { 1076 | auto p1 = my_pipe.get(); 1077 | // Error, one kernel can have only one pipe accessor 1078 | // (cl::pipe object) associated with one cl::pipe_storage object. 1079 | auto p2 = my_pipe.get(); 1080 | // ... 1081 | } 1082 | ``` 1083 | 1084 | ### Device Enqueue Library 1085 | 1086 | When it comes to enqueuing a kernel without host interaction, the biggest difference between 1087 | OpenCL C and OpenCL C++ is that in OpenCL C++ enqueued kernel can be a lambda expression or 1088 | a function, whereas in OpenCL C it is defined using block syntax. 
1089 | 1090 | All functions except function which returns default device queue and kernel query functions 1091 | were moved to appropriate classes as their methods. 1092 | See [Header Synopsis](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#header-opencl_device_queue-synopsis) 1093 | subsections of OpenCL C++ specification. 1094 | 1095 | #### Device Queue 1096 | 1097 | In OpenCL C++ `cl::device_queue` class represents device queue (`queue_t` in OpenCL C). 1098 | `cl::device_queue` is a marker type (see [Marker Types](#S-OpenCLCXXSTL-MarkerTypes)). 1099 | 1100 | | OpenCL C | OpenCL C++ | 1101 | |----------------------- |------------------------- | 1102 | | queue\_t | cl::device\_queue | 1103 | 1104 | ```cpp 1105 | namespace cl 1106 | { 1107 | struct device_queue: marker_type 1108 | { 1109 | // ... 1110 | 1111 | template 1112 | enqueue_status enqueue_kernel(enqueue_policy flag, 1113 | const ndrange &ndrange, 1114 | Fun fun, 1115 | Args... args) noexcept; 1116 | 1117 | // In OpenCL C: 1118 | // int enqueue_kernel(queue_t queue, 1119 | // kernel_enqueue_flags_t flags, 1120 | // const ndrange_t ndrange, 1121 | // void (^block)(local void *, ...), 1122 | // uint size0, ...); 1123 | 1124 | // ... 1125 | }; 1126 | } 1127 | ``` 1128 | 1129 | ##### Note 1130 | >`args` are the arguments that will be passed to `fun` when kernel will be enqueued with 1131 | the exception for `local_ptr` parameters. For local pointers user must supply the size of 1132 | local memory that will be allocated using local\_ptr::size\_type{num_elements}. 1133 | In OpenCL C user has to pass `uint` value for a corresponding local pointer, which specifies 1134 | the size of a local memory accessible using that local pointer. 1135 | 1136 | #### Event 1137 | 1138 | In OpenCL C++ `cl::event` class represents device-side event (`clk_event_t` in OpenCL C). 
1139 | 1140 | | OpenCL C | OpenCL C++ | 1141 | |----------------------- |------------------------- | 1142 | | clk\_event\_t | cl::event | 1143 | 1144 | `cl::event` has the same possible states as `clk_event_t`; however, in OpenCL C++ an error is 1145 | not represented by a negative value, but by the `cl::event_status::error` enumerator. 1146 | 1147 | | OpenCL C | OpenCL C++ | Description | 1148 | |----------------------- |------------------------- |------------------------- | 1149 | | CL\_SUBMITTED | cl::event\_status::submitted | Initial status of a user event | 1150 | | CL\_COMPLETE | cl::event\_status::complete | | 1151 | | Any negative integer value | cl::event\_status::error | Status indicating an error | 1152 | 1153 | See [Event Class Methods](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#event-class-methods) and 1154 | [Event Status](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#event-status) 1155 | subsections of the OpenCL C++ specification. 1156 | 1157 | #### Enqueue Policy 1158 | 1159 | The available enqueue policies have not changed compared to OpenCL C. 1160 | In OpenCL C the enqueue policy type was the `kernel_enqueue_flags_t` enum; in OpenCL C++ it is the 1161 | `cl::enqueue_policy` enum class. 1162 | 1163 | | OpenCL C | OpenCL C++ | 1164 | |----------------------- |------------------------- | 1165 | | CLK_ENQUEUE_FLAGS_NO_WAIT | cl::enqueue\_policy::no\_wait | 1166 | | CLK_ENQUEUE_FLAGS_WAIT_KERNEL | cl::enqueue\_policy::wait\_kernel | 1167 | | CLK_ENQUEUE_FLAGS_WAIT_WORK_GROUP | cl::enqueue\_policy::wait\_work\_group | 1168 | 1169 | See [Enqueue Policy](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#enqueue-policy) 1170 | subsection of the OpenCL C++ specification. 1171 | 1172 | #### Requirements 1173 | 1174 | Functor and lambda objects passed to the `enqueue_kernel()` method of a device queue have to follow 1175 | specific restrictions: 1176 | 1177 | * It has to be trivially copyable.
1178 | * It has to be trivially copy constructible. 1179 | * It has to be trivially destructible. 1180 | 1181 | Code enqueuing function objects that do not meet these criteria is ill-formed. 1182 | 1183 | #### OpenCL C++ Specification References 1184 | 1185 | * [OpenCL C++ Standard Library: Device Enqueue Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#device-enqueue-library) 1186 | * [Restrictions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#restrictions-6) 1187 | * [Examples](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#examples-7) 1188 | * [OpenCL C++ Standard Library: Marker Types](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#marker-types) 1189 | 1190 | #### Examples 1191 | 1192 | Block syntax vs. lambda expression: 1193 | 1194 | ```cpp 1195 | // OpenCL C++ 1196 | #include 1197 | #include 1198 | 1199 | kernel void my_func(cl::global_ptr a, cl::global_ptr b, cl::global_ptr c) 1200 | { 1201 | // ... 1202 | auto dq = cl::get_default_device_queue(); 1203 | dq.enqueue_kernel( 1204 | cl::enqueue_policy::no_wait, 1205 | cl::ndrange({10, 10}), 1206 | [=](){ // 1207 | *a = *b + *c; // Lambda expression 1208 | } // 1209 | ); 1210 | // ... 1211 | } 1212 | ``` 1213 | 1214 | ```cpp 1215 | // OpenCL C 1216 | kernel void my_func(global int *a, global int *b, global int *c) 1217 | { 1218 | // ... 1219 | enqueue_kernel( 1220 | get_default_queue(), 1221 | CLK_ENQUEUE_FLAGS_NO_WAIT, 1222 | ndrange_2D(1, 1), 1223 | ^{ // 1224 | *a = *b + *c; // Block syntax 1225 | } // 1226 | ); 1227 | // ... 1228 | } 1229 | ``` 1230 | 1231 | Enqueuing a functor: 1232 | 1233 | ```cpp 1234 | // OpenCL C++ 1235 | #include 1236 | #include 1237 | 1238 | struct my_functor { 1239 | void operator ()(cl::local_ptr p, int x) const 1240 | { /* ... */ } 1241 | }; 1242 | 1243 | kernel void my_func(cl::device_queue dq) 1244 | { 1245 | // ... 
1246 | my_functor f; 1247 | dq.enqueue_kernel( 1248 | cl::enqueue_policy::no_wait, 1249 | cl::ndrange(1), 1250 | f, // functor 1251 | cl::local_ptr::size_type{10}, // define size of p 1252 | 2 // x 1253 | ); 1254 | // ... 1255 | } 1256 | ``` 1257 | 1258 | ### Relational Functions 1259 | 1260 | OpenCL C++ significantly changes the signatures and/or behaviour of the 1261 | built-in relational functions. This is because OpenCL C++ introduces the 1262 | `boolN` type, which replaces `intN` as the type 1263 | returned by relational functions. 1264 | 1265 | #### `all()` and `any()` 1266 | 1267 | In OpenCL C: 1268 | ```cpp 1269 | // igentype can be char, charN, short, shortN, int, intN, long, and longN 1270 | int any (igentype x); 1271 | int all (igentype x); 1272 | ``` 1273 | >`any()` returns 1 if **the most significant bit** in any component of `x` is set; otherwise returns 0. 1274 | 1275 | >`all()` returns 1 if **the most significant bit** in all components of `x` is set; otherwise returns 0. 1276 | 1277 | In OpenCL C++: 1278 | 1279 | ```cpp 1280 | bool any(booln t); 1281 | bool all(booln t); 1282 | ``` 1283 | >`any()` returns `true` if any component of `t` is `true`; otherwise returns `false`. 1284 | 1285 | >`all()` returns `true` if all components of `t` are `true`; otherwise returns `false`. 1286 | 1287 | #### `select()` 1288 | 1289 | In OpenCL C: 1290 | ```cpp 1291 | // igentype can be char, charN, short, shortN, int, intN, long, and longN 1292 | // ugentype can be uchar, ucharN, ushort, ushortN, uint, uintN, ulong, and ulongN 1293 | gentype select (gentype a, gentype b, igentype c); 1294 | gentype select (gentype a, gentype b, ugentype c); 1295 | ``` 1296 | > For each component of a vector type, `result[i] = if MSB of c[i] is set ? b[i] : a[i]`. 1297 | 1298 | > For a scalar type, `result = c ? b : a`. 1299 | 1300 | > `igentype` and `ugentype` must have the same number of elements and bits as `gentype`.
1301 | 1302 | > NOTE: The above definition means that the behavior of select and the ternary operator 1303 | for vector and scalar types is dependent on different interpretations of the bit pattern of `c`. 1304 | 1305 | In OpenCL C++ `select()` is less confusing: 1306 | ```cpp 1307 | gentype select(gentype a, gentype b, booln c); 1308 | ``` 1309 | > For each component of a vector type, `result[i] = c[i] ? b[i] : a[i]`. 1310 | 1311 | > For a scalar type, `result = c ? b : a`. 1312 | 1313 | > `boolN` must have the same number of elements as `gentype`. 1314 | 1315 | #### OpenCL C++ Specification References 1316 | 1317 | * [OpenCL C++ Standard Library: Relational Functions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#relational-functions) 1318 | * [OpenCL C++ Programming Language: Expressions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#expressions) 1319 | 1320 | #### Examples 1321 | 1322 | ```cpp 1323 | // OpenCL C++ 1324 | #include 1325 | kernel void foobar() 1326 | { 1327 | bool b1 = isequal(1.0f, 1.0f); // true 1328 | bool b2 = isequal(1.0, 2.0); // false 1329 | 1330 | bool2 b3 = isequal(float2(1.0f), float2(1.0f)); // { true, true } 1331 | bool2 b4 = isequal(double2(1.0), double2(2.0)); // { false, false } 1332 | 1333 | bool2 b5 = { true, false }; 1334 | auto b6 = all(b5); // false 1335 | auto b7 = any(b5); // true 1336 | 1337 | bool2 c { true, false }; 1338 | float2 a { 1.0f, 1.0f }; 1339 | float2 b { -1.0f, -1.0f }; 1340 | auto r1 = select(a, b, c); // { -1.0f, 1.0f } 1341 | 1342 | auto r2 = select(1.0f, 2.0f, false); // 1.0f 1343 | } 1344 | ``` 1345 | 1346 | ```cpp 1347 | // OpenCL C 1348 | kernel void foobar() 1349 | { 1350 | // Note: in integer value -1 MSB is set to 1 1351 | 1352 | int b1 = isequal(1.0f, 1.0f); // 1 (true) 1353 | long b2 = isequal(1.0, 2.0); // 0 (false) 1354 | 1355 | int2 b3 = isequal((float2)(1.0f), (float2)(1.0f)); // { -1, -1 } ({ true, true }) 1356 | long2 b4 = isequal((double2)(1.0), 
(double2)(2.0)); // { 0, 0 } ({ false, false }) 1357 | 1358 | int b5 = all( (int2)(-1, 10) ); // 0 1359 | int b6 = all( (int2)(-1, -1) ); // 1 1360 | 1361 | int b7 = any( (int2)(-1, 0) ); // 1 1362 | int b8 = any( (int2)(1, 1) ); // 0 1363 | 1364 | int2 c = (int2)(-1, 1); 1365 | float2 a = (float2)(1.0f, 1.0f); 1366 | float2 b = (float2)(-1.0f, -1.0f); 1367 | float2 r1 = select(a, b, c); // { -1.0f, 1.0f } 1368 | 1369 | float r2 = select(1.0f, 2.0f, -1); // 2.0f 1370 | float r3 = select(1.0f, 2.0f, 1); // 2.0f (scalar case: c ? b : a) 1371 | float r4 = select(1.0f, 2.0f, 0); // 1.0f 1372 | } 1373 | ``` 1374 | 1375 | ### Vector Data Load and Store Functions 1376 | 1377 | In OpenCL C++ vector data load and store functions were greatly simplified compared to OpenCL C: instead of 1378 | 39 different functions, there are now just 9 function templates. The requirements and the behaviour of the 1379 | functions have not been changed, and neither have the arguments or their order. 1380 | 1381 | | OpenCL C | OpenCL C++ | 1382 | |----------------------- |------------------------- | 1383 | | gentypeN vloadN | `template make_vector_t vload` | 1384 | | void vstoreN(...) | `template void vstore(…, vector_element_t* p)` | 1385 | | floatN vload_half\[N\] | `template make_vector_t vload_half` | 1386 | | void vstore_half[N]\[\_rounding\_mode\] | `template void vstore_half(…, half* p)` | 1387 | | floatN vloada_halfN | `template make_vector_t vloada_half` | 1388 | | void vstorea_halfN\[\_rounding\_mode\] | `template void vstorea_half(…, half* p)` | 1389 | 1390 | See the [Header Synopsis](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#header-opencl_vector_load_store) 1391 | subsection of the [Vector Data Load and Store Functions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#vector-data-load-and-store-functions) 1392 | section for the declarations of the vector data load and store function templates.
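
The move from a family of `vloadN`/`vstoreN` functions to a single template parameterised on the vector width can be pictured with a host-side analog. The sketch below is standard C++, not OpenCL C++: `vload_sketch` and `vstore_sketch` are hypothetical names, `std::array` stands in for the built-in vector types, and the only behaviour taken from the text is the addressing rule used throughout this section, i.e. element `i` of the vector lives at `p + offset * N + i`.

```cpp
#include <array>
#include <cstddef>

// Hypothetical analog of vload<N>(offset, p):
// reads N consecutive elements starting at p + offset * N.
template <std::size_t N, class T>
std::array<T, N> vload_sketch(std::size_t offset, const T* p) {
    std::array<T, N> v{};
    for (std::size_t i = 0; i < N; ++i) {
        v[i] = p[offset * N + i];
    }
    return v;
}

// Hypothetical analog of vstore(v, offset, p):
// writes the N elements of v starting at p + offset * N.
// N is deduced from the vector argument, so no width suffix is needed.
template <class T, std::size_t N>
void vstore_sketch(const std::array<T, N>& v, std::size_t offset, T* p) {
    for (std::size_t i = 0; i < N; ++i) {
        p[offset * N + i] = v[i];
    }
}
```

With a buffer `float buf[8] = {0, 1, 2, 3, 4, 5, 6, 7}`, `vload_sketch<2>(2, buf)` reads the elements at indices 4 and 5, mirroring the `vload<2>(2, fptr)` comment "reads from (fptr + (2 * 2))".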
1393 | 1394 | #### OpenCL C++ Specification References 1395 | 1396 | * [OpenCL C++ Standard Library: Vector Data Load and Store Functions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#vector-data-load-and-store-functions) 1397 | 1398 | #### Examples 1399 | 1400 | `vload` and `vstore`: 1401 | 1402 | ```cpp 1403 | // OpenCL C++ 1404 | #include 1405 | using namespace cl; 1406 | 1407 | kernel void foobar(float * fptr, const constant_ptr hptr) 1408 | { 1409 | auto f4 = vload<4>(0, fptr); // reads from (fptr + (0 * 4)), float4 returned 1410 | auto f2 = vload<2>(2, fptr); // reads from (fptr + (2 * 2)), float2 returned 1411 | 1412 | #ifdef cl_khr_fp16 // cl_khr_fp16 must be defined and supported 1413 | auto h8 = vload<8>(0, hptr); // reads from (hptr + (0 * 8)), half8 returned 1414 | #endif 1415 | 1416 | vstore(float4{ 1, 2, 3, 4}, 0, fptr); // float4 stored at (fptr + (0 * 4)) 1417 | vstore(f2, 2, fptr); // float2 stored at (fptr + (2 * 2)) 1418 | } 1419 | ``` 1420 | 1421 | ```cpp 1422 | // OpenCL C 1423 | kernel void foobar(float * fptr, const constant half * hptr) 1424 | { 1425 | float4 f4 = vload4(0, fptr); // reads from (fptr + (0 * 4)), float4 returned 1426 | float2 f2 = vload2(2, fptr); // reads from (fptr + (2 * 2)), float2 returned 1427 | 1428 | #ifdef cl_khr_fp16 // cl_khr_fp16 must be defined and supported 1429 | half8 h8 = vload8(0, hptr); // reads from (hptr + (0 * 8)), half8 returned 1430 | #endif 1431 | 1432 | vstore4(f4, 0, fptr); // float4 stored at (fptr + (0 * 4)) 1433 | vstore2(f2, 2, fptr); // float2 stored at (fptr + (2 * 2)) 1434 | } 1435 | ``` 1436 | 1437 | `vload_half`, `vstore_half`, `vloada_half`, and `vstorea_half`: 1438 | 1439 | ```cpp 1440 | // OpenCL C++ 1441 | #include 1442 | using namespace cl; 1443 | 1444 | kernel void foobar_half(half * hptr) 1445 | { 1446 | // half vload 1447 | auto f4 = vload_half<4>(0, hptr); // reads from (hptr + (0 * 4)), float4 returned 1448 | auto f3 = vload_half<3>(0, hptr); // reads 
from (hptr + (0 * 3)), float3 returned 1449 | 1450 | // half array vload 1451 | auto f4a = vloada_half<4>(0, hptr); // reads from (hptr + (0 * 4)), float4 returned 1452 | auto f3a = vloada_half<3>(0, hptr); // reads from (hptr + (0 * 4)), float3 returned 1453 | 1454 | // half vstore 1455 | vstore_half(f3, 0, hptr); // float3 stored at (hptr + (0 * 3)), 1456 | // rounded to nearest even (rounding_mode::rte) 1457 | vstore_half(f4, 0, hptr); // float4 stored at (hptr + (0 * 4)), 1458 | // rounded toward zero 1459 | // half array vstore 1460 | vstorea_half(f3a, 0, hptr); // float3 stored at (hptr + (0 * 4)) 1461 | // rounded to nearest even (rounding_mode::rte) 1462 | vstorea_half(f4a, 0, hptr); // float4 stored at (hptr + (0 * 4)) 1463 | // rounded toward zero 1464 | } 1465 | ``` 1466 | 1467 | ```cpp 1468 | // OpenCL C 1469 | kernel void foobar_half(half * hptr) 1470 | { 1471 | // half vload 1472 | float4 f4 = vload_half4(0, hptr); // reads from (hptr + (0 * 4)), float4 returned 1473 | float3 f3 = vload_half3(0, hptr); // reads from (hptr + (0 * 3)), float3 returned 1474 | 1475 | // half array vload 1476 | float4 f4a = vloada_half4(0, hptr); // reads from (hptr + (0 * 4)), float4 returned 1477 | float3 f3a = vloada_half3(0, hptr); // reads from (hptr + (0 * 4)), float3 returned 1478 | 1479 | // half vstore 1480 | vstore_half3(f3, 0, hptr); // float3 stored at (hptr + (0 * 3)), 1481 | // rounded to nearest even 1482 | vstore_half4_rtz(f4, 0, hptr); // float4 stored at (hptr + (0 * 4)), 1483 | // rounded toward zero 1484 | 1485 | // half array vstore 1486 | vstorea_half3(f3a, 0, hptr); // float3 stored at (hptr + (0 * 4)) 1487 | // rounded to nearest even 1488 | vstorea_half4_rtz(f4a, 0, hptr); // float4 stored at (hptr + (0 * 4)) 1489 | // rounded toward zero 1490 | } 1491 | 1492 | ``` 1493 | 1494 | ### Atomic Operations Library 1495 | 1496 | OpenCL C atomic operation are based on C11 atomics. 
In OpenCL C++ atomics are based on 1497 | C++14 atomics and synchronization operations. 1498 | Section [Atomic Operations Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#atomic-operations-library) 1499 | of the OpenCL C++ specification presents a synopsis of the atomics library and its differences from the C++14 specification. 1500 | 1501 | Because atomic functions in OpenCL C and OpenCL C++ have virtually the same argument lists, adding 1502 | `using namespace cl;` can significantly speed up porting kernels to OpenCL C++. 1503 | 1504 | #### Atomic types 1505 | 1506 | In OpenCL C++ the separate OpenCL C atomic types such as `atomic_int` and `atomic_float` were replaced with one class 1507 | template, `atomic<T>`; however, type aliases are declared for the supported types 1508 | (for example: `using atomic_int = atomic<int>;`). 1509 | 1510 | * There are explicit specializations for integral types. Each of these specializations provides a set of extra 1511 | operators suitable for integral types. 1512 | * There is an explicit specialization of the atomic template for pointer types. 1513 | * All atomic classes have a deleted copy constructor and deleted copy assignment operators. 1514 | * 64-bit atomic types require the `cl_khr_int64_base_atomics` and `cl_khr_int64_extended_atomics` extensions, 1515 | and `atomic<double>` in addition requires `cl_khr_fp64`. 1516 | 1517 | #### Restrictions 1518 | 1519 | * The generic `atomic<T>` class template is only available if `T` is `int`, `uint`, `long`, 1520 | `ulong`, `float`, `double`, `intptr_t`, `uintptr_t`, `size_t`, `ptrdiff_t`. 1521 | * The atomic data types cannot be declared inside a kernel or non-kernel function unless they are declared with the `static` keyword or in `local` and `global` containers. See examples. 1522 | * The atomic operations on the private memory can result in undefined behavior. 1523 | * `memory_order_consume` from C++14 is not supported by OpenCL C++.
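
Since the text states that OpenCL C++ atomics are based on C++14 atomics, some of the properties listed above can be observed with host-side `std::atomic`. This is an analogy in standard C++, not OpenCL C++ kernel code: `counter` and `bump` are hypothetical names, and the `cl::` address space rules have no counterpart here. The sketch shows the deleted copy operations, an atomic with static storage duration, and memory orders other than `memory_order_consume`.

```cpp
#include <atomic>
#include <type_traits>

// As in OpenCL C++, atomic objects can be neither copied nor copy-assigned.
static_assert(!std::is_copy_constructible<std::atomic<int>>::value,
              "atomic<int> has a deleted copy constructor");
static_assert(!std::is_copy_assignable<std::atomic<int>>::value,
              "atomic<int> has a deleted copy assignment operator");

// Static storage duration; loosely comparable to a program-scope atomic.
std::atomic<int> counter{0};

// Adds n to the counter one increment at a time and returns the new value.
// Only relaxed and acquire orderings are used; memory_order_consume is avoided.
int bump(int n) {
    for (int i = 0; i < n; ++i) {
        counter.fetch_add(1, std::memory_order_relaxed);
    }
    return counter.load(std::memory_order_acquire);
}
```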
1524 | 1525 | The full list of restrictions can be found in subsection 1526 | [Restrictions](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#restrictions-3) of section 1527 | [Atomic Operations Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#atomic-operations-library) 1528 | in the OpenCL C++ specification. 1529 | 1530 | #### OpenCL C++ Specification References 1531 | 1532 | * [OpenCL C++ Standard Library: Atomic Operations Library](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#atomic-operations-library) 1533 | 1534 | #### Examples 1535 | 1536 | ```cpp 1537 | // OpenCL C++ 1538 | #include <opencl_memory> 1539 | #include <opencl_atomic> 1540 | using namespace cl; 1541 | 1542 | atomic_int a; // OK: program scope atomic in the global memory 1543 | // atomic_int is alias for atomic<int> 1544 | local<atomic<int>> b(1); // OK: program scope atomic in the local memory 1545 | // Initialized to 1. The initialization is not atomic. 1546 | global<atomic<int>> c = ATOMIC_VAR_INIT(2); // OK: program scope atomic in the global memory 1547 | // Initialized to 2. The initialization is not atomic. 1548 | 1549 | kernel void foo() 1550 | { 1551 | static global<atomic<int>> d; // OK: atomic in the global memory 1552 | static atomic<int> e; // OK: atomic in the global memory 1553 | local<atomic<int>> f; // OK: atomic in the local memory 1554 | 1555 | atomic<global<int>> g; // Error: class members cannot be 1556 | // in address space 1557 | 1558 | atomic<int> h; // undefined behavior 1559 | 1560 | atomic_init(&a, 123); // Initialize a to 123. The initialization is not atomic. 1561 | } 1562 | ``` 1563 | 1564 | ```cpp 1565 | // OpenCL C 1566 | global atomic_int a; // OK: program scope atomic in the global memory 1567 | local atomic_int b; // Error: program scope local variables not supported in OpenCL C 1568 | global atomic_int c = ATOMIC_VAR_INIT(2); // OK: program scope atomic in the global memory 1569 | // Initialized to 2. The initialization is not atomic.
1570 | 1571 | kernel void foo() 1572 | { 1573 | static global atomic_int d; // OK: atomic in the global memory 1574 | static atomic_int e; // OK: atomic in the global memory 1575 | local atomic_int f; // OK: atomic in the local memory 1576 | 1577 | atomic_int h; // undefined behavior 1578 | 1579 | atomic_init(&a, 123); // Initialize a to 123. The initialization is not atomic. 1580 | } 1581 | ``` 1582 | 1583 | --- 1584 | ## OpenCL C++ Compilation Process 1585 | 1586 | The OpenCL C++ kernel language cannot be consumed by the `clCreateProgramWithSource()` API function, which 1587 | is used to create a program from OpenCL C source. OpenCL C++ source first has to be compiled 1588 | to a SPIR-V 1.2 binary, which can later be passed to `clCreateProgramWithIL()` to create an OpenCL program. 1589 | After that, the program can be built with `clBuildProgram()`. 1590 | 1591 | ### OpenCL C++ Compilation to SPIR-V 1592 | 1593 | To compile the OpenCL C++ kernel language to SPIR-V, the user has to use a compiler that is not part 1594 | of the OpenCL framework. The Khronos Group provides a reference 1595 | [offline compiler based on Clang 3.6](https://github.com/KhronosGroup/SPIR/tree/spirv-1.1) 1596 | and an implementation of the OpenCL C++ Standard Library called [libclcxx](https://github.com/KhronosGroup/libclcxx). 1597 | 1598 | #### Preprocessor options 1599 | 1600 | Every preprocessor option that would normally be passed to `clBuildProgram()` must, for OpenCL C++, 1601 | be passed to the compiler when the source is compiled to SPIR-V. 1602 | 1603 | ``` 1604 | -D name 1605 | ``` 1606 | 1607 | Predefine name as a macro, with definition 1. 1608 | 1609 | ``` 1610 | -D name=definition 1611 | ``` 1612 | 1613 | The contents of definition are tokenized and processed as if they appeared during translation phase 1614 | three in a `#define` directive. 1615 | In particular, the definition will be truncated by embedded newline characters.
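The effect of the two `-D` forms can be illustrated with the equivalent `#define` directives. This is a minimal sketch in plain C++; the macro names `USE_FAST_PATH` and `TILE_SIZE` and both helper functions are hypothetical and are not taken from any specification.

```cpp
#include <cassert>

// Passing `-D USE_FAST_PATH -D TILE_SIZE=16` would behave as if the source
// started with the two directives below (hypothetical macro names).
#define USE_FAST_PATH 1  // -D name            : macro defined as 1
#define TILE_SIZE 16     // -D name=definition : macro defined as the given tokens

// Hypothetical helper: exposes the macro value as a compile-time constant.
constexpr int tile_size() { return TILE_SIZE; }

// Hypothetical helper: selects a code path based on the predefined macro.
constexpr bool fast_path_enabled()
{
#if USE_FAST_PATH
    return true;
#else
    return false;
#endif
}
```

Because the macros are fixed when the source is compiled to SPIR-V, changing one of them means recompiling to SPIR-V rather than passing a different option to `clBuildProgram()`.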
1616 | 1617 | #### Other compilation options 1618 | 1619 | Some feature-related options must be specified during compilation to SPIR-V: 1620 | 1621 | * `-cl-fp16-enable` - enables full half data type support and defines the `cl_khr_fp16` macro. Disabled by default. 1622 | * `-cl-fp64-enable` - enables full double data type support and defines the `cl_khr_fp64` macro. Disabled by default. 1623 | * `-cl-zero-init-local-mem-vars` - enables software zero-initialization of variables allocated in local memory. 1624 | 1625 | ### Building program created from SPIR-V 1626 | When an OpenCL program created using `clCreateProgramWithIL()` is built with `clBuildProgram()`, not 1627 | all build options are allowed. The options that are not allowed have to be passed when compiling to SPIR-V instead. Otherwise, there is 1628 | no difference between building a program created from SPIR-V and a program created from OpenCL C source. 1629 | Which options are ignored and which are not is described in the 1630 | [OpenCL 2.2 API Specification](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2.html#_compiler_options).
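On the host side, the SPIR-V binary has to be loaded into memory before it can be handed to `clCreateProgramWithIL()`. The sketch below covers only that preparation step in standard C++, so it does not depend on OpenCL headers; the helper names `read_binary_file` and `looks_like_spirv` are hypothetical, and the check assumes the module was written in the host's byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Every SPIR-V module starts with this magic number.
constexpr std::uint32_t kSpirvMagic = 0x07230203u;

// Hypothetical helper: reads a whole file into memory (empty on failure).
inline std::vector<char> read_binary_file(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}

// Hypothetical helper: cheap sanity check before creating a program.
// A module is a sequence of 32-bit words whose first word is the magic number.
inline bool looks_like_spirv(const std::vector<char>& il)
{
    if (il.size() < sizeof(std::uint32_t) || il.size() % sizeof(std::uint32_t) != 0) {
        return false;
    }
    std::uint32_t first_word = 0;
    std::memcpy(&first_word, il.data(), sizeof(first_word));
    return first_word == kSpirvMagic;
}

// With an OpenCL 2.1 or newer host API, a checked buffer `il` would then be used as:
//   cl_program program = clCreateProgramWithIL(context, il.data(), il.size(), &err);
//   err = clBuildProgram(program, 1, &device, "", nullptr, nullptr);
// where `context`, `device` and `err` are assumed to be set up elsewhere.
```

Checking the buffer first gives a clearer error message than a failing `clCreateProgramWithIL()` call when, for example, an OpenCL C++ source file is passed by mistake instead of the compiled binary.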
1631 | 1632 | #### OpenCL C++ Specification and OpenCL 2.2 API References 1633 | 1634 | * [OpenCL C++ Specification: Compiler options](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html#compiler_options) 1635 | * [OpenCL 2.2 Specification: Compiler Options](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2.html#_compiler_options) 1636 | 1637 | 1638 | # Bibliography 1639 | 1640 | ### OpenCL Specifications 1641 | 1642 | * [The OpenCL C++ 1.0 Specification](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.pdf) 1643 | ([HTML](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-cplusplus.html)) 1644 | * [The OpenCL 2.2 API Specification](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2.pdf) 1645 | ([HTML](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2.html)) 1646 | * [The OpenCL 2.2 Extension Specification](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-extension.pdf) 1647 | ([HTML](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-extension.html)) 1648 | * [OpenCL 2.2 SPIR-V Environment Specification](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-environment.pdf) 1649 | ([HTML](https://www.khronos.org/registry/OpenCL/specs/opencl-2.2-environment.html)) 1650 | * [The OpenCL C 2.0 Language Specification](https://www.khronos.org/registry/OpenCL/specs/opencl-2.0-openclc.pdf) 1651 | * [The OpenCL 2.1 API Specification](https://www.khronos.org/registry/OpenCL/specs/opencl-2.1.pdf) 1652 | 1653 | ### OpenCL Reference Pages 1654 | 1655 | * The OpenCL 2.2 Reference Page (not published yet) 1656 | * [The OpenCL 2.1 Reference Page](http://www.khronos.org/registry/cl/sdk/2.1/docs/man/xhtml/) 1657 | 1658 | ### OpenCL Headers 1659 | 1660 | * [OpenCL 2.2 Headers](https://github.com/KhronosGroup/OpenCL-Headers/tree/master/opencl22) 1661 | * [OpenCL 2.1 Headers](https://github.com/KhronosGroup/OpenCL-Headers/tree/master/opencl21) 1662 | 1663 | ### Other 1664 | 1665 | * [Khronos OpenCL 
Registry](https://www.khronos.org/registry/OpenCL/) 1666 | ([GitHub](https://github.com/KhronosGroup/OpenCL-Registry)) 1667 | * [OpenCL 2.2 Release Note](https://www.khronos.org/news/press/khronos-releases-opencl-2.2-with-spir-v-1.2) 1668 | * Michael Wong, Adam Stanski, Maria Rovatsou, Ruyman Reyes, Ben Gaster, and Bartok Sochaski. 2016. 1669 | C++ for OpenCL Workshop, IWOCL 2016. In Proceedings of the 4th International Workshop 1670 | on OpenCL (IWOCL '16). 1671 | * [Dive into OpenCL C++](http://www.iwocl.org/wp-content/uploads/iwocl-2016-dive-into-opencl-c.pdf) 1672 | * [OpenCL C++ kernel language](http://www.iwocl.org/wp-content/uploads/iwcol-2016-opencl-ckernel-language.pdf) 1673 | 1674 | *** 1675 | 1676 | OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos. 1677 | Other names are for informational purposes only and may be trademarks of their respective owners. 1678 | --------------------------------------------------------------------------------